track_planning causing performance regression
Hi,
During a fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
a ~45% performance drop [2] at high DB connection counts (when compared with v12.3).
Disabling pg_stat_statements.track_planning (which is 'on' by default)
brings the TPS numbers back up to v12.3 levels.
The inflection point (in this test case) is 128 connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier, possibly because the regression is trivial at low connection counts.
It would be great if this could be optimized further, or track_planning
disabled (by default) so as not to trip users upgrading from v12 with pg_stat_statements
enabled (but otherwise not particularly interested in track_planning).
These are some details around the above test:
pgbench: scale - 100 / threads - 16
test-duration - 30s each
server - 96 vCPUs / 768GB - r5.24xl (AWS EC2 instance)
client - 72 vCPUs / 144GB - c5.18xl (AWS EC2 instance) (co-located with the DB server - Same AZ)
v12 - REL_12_STABLE (v12.3)
v13Beta1 - REL_13_STABLE (v13Beta1)
max_connections = 10000
shared_preload_libraries = 'pg_stat_statements'
shared_buffers = 128MB
Reference:
1) /messages/by-id/1554150919882-0.post@n3.nabble.com
2) Fully-cached select-only TPS drops at >= 128 connections.
Conn v12.3 v13Beta1 v13Beta1 (track_planning=off)
1 6,764 6,734 6,905
2 14,978 14,961 15,316
4 31,641 32,012 36,961
8 71,989 68,848 69,204
16 129,056 131,157 132,773
32 231,910 226,718 253,316
64 381,778 371,782 385,402
128 534,661 ====> 353,944 539,231
256 636,794 ====> 248,825 643,631
512 574,447 ====> 213,033 555,099
768 493,912 ====> 214,801 502,014
1024 484,993 ====> 222,492 490,716
1280 480,571 ====> 223,296 483,843
1536 475,030 ====> 228,137 477,153
1792 472,145 ====> 229,027 474,423
2048 471,385 ====> 228,665 470,238
3) perf - v13Beta1
- 88.38% 0.17% postgres postgres [.] PostgresMain
- 88.21% PostgresMain
- 80.09% exec_simple_query
- 25.34% pg_plan_queries
- 25.28% pg_plan_query
- 25.21% pgss_planner
- 14.36% pgss_store
+ 13.54% s_lock
+ 10.71% standard_planner
+ 18.29% PortalRun
- 15.12% PortalDrop
- 14.73% PortalCleanup
- 13.78% pgss_ExecutorEnd
- 13.72% pgss_store
+ 12.83% s_lock
0.72% standard_ExecutorEnd
+ 6.18% PortalStart
+ 4.86% pg_analyze_and_rewrite
+ 3.52% GetTransactionSnapshot
+ 2.56% pg_parse_query
+ 1.83% finish_xact_command
0.51% start_xact_command
+ 3.93% pq_getbyte
+ 3.40% ReadyForQuery
4) perf - v12.3
- 84.32% 0.21% postgres postgres [.] PostgresMain
- 84.11% PostgresMain
- 72.56% exec_simple_query
+ 26.71% PortalRun
- 15.33% pg_plan_queries
- 15.29% pg_plan_query
+ 15.21% standard_planner
+ 7.81% PortalStart
+ 6.76% pg_analyze_and_rewrite
+ 4.37% GetTransactionSnapshot
+ 3.69% pg_parse_query
- 2.96% PortalDrop
- 2.42% PortalCleanup
- 1.35% pgss_ExecutorEnd
- 1.22% pgss_store
0.57% s_lock
0.77% standard_ExecutorEnd
+ 2.16% finish_xact_command
+ 0.78% start_xact_command
+ 0.59% pg_rewrite_query
+ 5.67% pq_getbyte
+ 4.73% ReadyForQuery
--
robins
Hi,
On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tharar@amazon.com> wrote:
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
~45% performance drop [2] at high DB connection counts (when compared with v12.3)
Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.
The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low connection counts.
It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).
These are some details around the above test:
pgbench: scale - 100 / threads - 16
test-duration - 30s each
server - 96 vCPUs / 768GB - r5.24xl (AWS EC2 instance)
client - 72 vCPUs / 144GB - c5.18xl (AWS EC2 instance) (co-located with the DB server - Same AZ)
v12 - REL_12_STABLE (v12.3)
v13Beta1 - REL_13_STABLE (v13Beta1)
max_connections = 10000
shared_preload_libraries = 'pg_stat_statements'
shared_buffers 128MB
I can't reproduce this on my laptop, but I can certainly believe that
running the same 3 queries using more connections than available cores
will lead to extra overhead.
I disagree with the conclusion though. It seems to me that if you
really have this workload that consists in these few queries and want
to get better performance, you'll anyway use a connection pooler
and/or use prepared statements, which will make this overhead
disappear entirely, and will also yield an even bigger performance
improvement. A quick test using pgbench -M prepared, with
track_planning enabled, with still way too many connections already
shows a 25% improvement over the -M simple without track_planning.
On 2020/06/29 16:05, Julien Rouhaud wrote:
Hi,
On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tharar@amazon.com> wrote:
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
Thanks for the benchmark!
~45% performance drop [2] at high DB connection counts (when compared with v12.3)
That's bad :(
Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.
The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low connection counts.
It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, every pgss counter like shared_blks_hit also
needs to be separated into two, i.e., for planning and execution.
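The split described here can be sketched in miniature outside of PostgreSQL. Everything below is a hypothetical illustration (names and layout are invented; real pgss entries live in shared memory and take their lock with SpinLockAcquire()), but the shape is the same: one spinlock and one counter set per "kind", so planning and execution updates to the same query entry stop contending on a single lock.

```c
#include <stdatomic.h>
#include <string.h>

typedef enum { PGSS_PLAN = 0, PGSS_EXEC = 1 } pgssStoreKind;

/* Hypothetical stand-in for a pg_stat_statements hash entry: one spinlock
 * and one set of counters per "kind". */
typedef struct
{
    atomic_flag lock[2];
    long        calls[2];
    double      total_time[2];
} pgssEntry;

static void
pgss_entry_init(pgssEntry *e)
{
    memset(e->calls, 0, sizeof(e->calls));
    memset(e->total_time, 0, sizeof(e->total_time));
    atomic_flag_clear(&e->lock[PGSS_PLAN]);
    atomic_flag_clear(&e->lock[PGSS_EXEC]);
}

static void
pgss_store_sketch(pgssEntry *e, pgssStoreKind kind, double elapsed_ms)
{
    /* Planner and executor callers now take different locks for the
     * same query entry. */
    while (atomic_flag_test_and_set_explicit(&e->lock[kind],
                                             memory_order_acquire))
        ;                       /* spin; cheap when uncontended */
    e->calls[kind]++;
    e->total_time[kind] += elapsed_ms;
    atomic_flag_clear_explicit(&e->lock[kind], memory_order_release);
}
```

As the thread notes, this roughly doubles the per-entry counter footprint, and planner-vs-planner contention on the same query is unchanged.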
These are some details around the above test:
pgbench: scale - 100 / threads - 16
test-duration - 30s each
server - 96 vCPUs / 768GB - r5.24xl (AWS EC2 instance)
client - 72 vCPUs / 144GB - c5.18xl (AWS EC2 instance) (co-located with the DB server - Same AZ)
v12 - REL_12_STABLE (v12.3)
v13Beta1 - REL_13_STABLE (v13Beta1)
max_connections = 10000
shared_preload_libraries = 'pg_stat_statements'
shared_buffers 128MB
I can't reproduce this on my laptop, but I can certainly believe that
running the same 3 queries using more connections than available cores
will lead to extra overhead.
I disagree with the conclusion though. It seems to me that if you
really have this workload that consists in these few queries and want
to get better performance, you'll anyway use a connection pooler
and/or use prepared statements, which will make this overhead
disappear entirely, and will also yield an even bigger performance
improvement. A quick test using pgbench -M prepared, with
track_planning enabled, with still way too many connections already
shows a 25% improvement over the -M simple without track_planning.
I understand your point. But IMO the default setting basically should
be the safer value, i.e., off, at least until the problem disappears.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
On 2020/06/29 16:05, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tharar@amazon.com> wrote:
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
Thanks for the benchmark!
~45% performance drop [2] at high DB connection counts (when compared with v12.3)
That's bad :(
Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.
The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low connection counts.
It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
It'll also quite increase the shared memory consumption.
I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.
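The atomics alternative can be sketched the same way (hypothetical names again; the real pgss counters are doubles and int64s in shared memory): each counter becomes an atomic bumped with a relaxed fetch-add, so writers never spin, at exactly the cost noted here: a concurrent reader may see calls already incremented while total_time is not yet.

```c
#include <stdatomic.h>

/* Hypothetical lock-free counter pair: time kept as integer microseconds
 * because C11 has no atomic double. */
typedef struct
{
    atomic_long  calls;
    atomic_ulong total_time_us;
} pgssAtomicCounters;

static void
pgss_store_lockfree(pgssAtomicCounters *c, unsigned long elapsed_us)
{
    /* Two independent atomic adds: no spinlock, but no guarantee a reader
     * observes both updates together (the per-row consistency loss). */
    atomic_fetch_add_explicit(&c->calls, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&c->total_time_us, elapsed_us,
                              memory_order_relaxed);
}
```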
On 2020/06/29 18:17, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
On 2020/06/29 16:05, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tharar@amazon.com> wrote:
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
Thanks for the benchmark!
~45% performance drop [2] at high DB connection counts (when compared with v12.3)
That's bad :(
Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.
The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low connection counts.
It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
Yes. But maybe we can expect that the idea would improve
the performance to the near same level as v12?
It'll also quite increase the shared memory consumption.
Yes.
I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.
Yeah, we can consider more improvements against this issue.
But I'm afraid these (maybe including my idea) basically should
be items for v14...
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
Yes. But maybe we can expect that the idea would improve
the performance to the near same level as v12?
A POC patch should be easy to do and see how much it solves this
problem. However I'm not able to reproduce the issue, and IMHO unless
we specifically want to be able to distinguish planner-time counters
from execution-time counters, I'd prefer to disable track_planning by
default than going this way, so that users with a sane usage won't
have to suffer from a memory increase.
I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.
Yeah, we can consider more improvements against this issue.
But I'm afraid these (maybe including my idea) basically should
be items for v14...
Yes, that's clearly not something I'd vote to push in v13 at this point.
On 2020/06/29 18:53, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
Yes. But maybe we can expect that the idea would improve
the performance to the near same level as v12?
A POC patch should be easy to do and see how much it solves this
problem. However I'm not able to reproduce the issue, and IMHO unless
we specifically want to be able to distinguish planner-time counters
from execution-time counters, I'd prefer to disable track_planning by
default than going this way, so that users with a sane usage won't
have to suffer from a memory increase.
Agreed. +1 to change that default to off.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/06/29 18:56, Fujii Masao wrote:
On 2020/06/29 18:53, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
Yes. But maybe we can expect that the idea would improve
the performance to the near same level as v12?
A POC patch should be easy to do and see how much it solves this
problem. However I'm not able to reproduce the issue, and IMHO unless
we specifically want to be able to distinguish planner-time counters
from execution-time counters, I'd prefer to disable track_planning by
default than going this way, so that users with a sane usage won't
have to suffer from a memory increase.
Agreed. +1 to change that default to off.
Attached patch does this.
I also add the following into the description about each *_plan_time column
in the docs. IMO this is helpful for users when they see that those columns
report zero by default and try to understand why.
(if <varname>pg_stat_statements.track_planning</varname> is enabled, otherwise zero)
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
disable_pgss_planning_default_v1.patch (text/plain)
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index f615f8c2bf..c3f013860a 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -3,6 +3,7 @@ CREATE EXTENSION pg_stat_statements;
-- simple and compound statements
--
SET pg_stat_statements.track_utility = FALSE;
+SET pg_stat_statements.track_planning = TRUE;
SELECT pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index cef8bb5a49..65ac301b99 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -442,7 +442,7 @@ _PG_init(void)
"Selects whether planning duration is tracked by pg_stat_statements.",
NULL,
&pgss_track_planning,
- true,
+ false,
PGC_SUSET,
0,
NULL,
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 75c10554a8..6ed8e38028 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -4,6 +4,7 @@ CREATE EXTENSION pg_stat_statements;
-- simple and compound statements
--
SET pg_stat_statements.track_utility = FALSE;
+SET pg_stat_statements.track_planning = TRUE;
SELECT pg_stat_statements_reset();
SELECT 1 AS "int";
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index a13e28a84c..430d8bf07c 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -101,6 +101,8 @@
</para>
<para>
Number of times the statement was planned
+ (if <varname>pg_stat_statements.track_planning</varname> is enabled,
+ otherwise zero)
</para></entry>
</row>
@@ -110,6 +112,8 @@
</para>
<para>
Total time spent planning the statement, in milliseconds
+ (if <varname>pg_stat_statements.track_planning</varname> is enabled,
+ otherwise zero)
</para></entry>
</row>
@@ -119,6 +123,8 @@
</para>
<para>
Minimum time spent planning the statement, in milliseconds
+ (if <varname>pg_stat_statements.track_planning</varname> is enabled,
+ otherwise zero)
</para></entry>
</row>
@@ -128,6 +134,8 @@
</para>
<para>
Maximum time spent planning the statement, in milliseconds
+ (if <varname>pg_stat_statements.track_planning</varname> is enabled,
+ otherwise zero)
</para></entry>
</row>
@@ -137,6 +145,8 @@
</para>
<para>
Mean time spent planning the statement, in milliseconds
+ (if <varname>pg_stat_statements.track_planning</varname> is enabled,
+ otherwise zero)
</para></entry>
</row>
@@ -145,7 +155,10 @@
<structfield>stddev_plan_time</structfield> <type>double precision</type>
</para>
<para>
- Population standard deviation of time spent planning the statement, in milliseconds
+ Population standard deviation of time spent planning the statement,
+ in milliseconds
+ (if <varname>pg_stat_statements.track_planning</varname> is enabled,
+ otherwise zero)
</para></entry>
</row>
@@ -594,7 +607,7 @@
<para>
<varname>pg_stat_statements.track_planning</varname> controls whether
planning operations and duration are tracked by the module.
- The default value is <literal>on</literal>.
+ The default value is <literal>off</literal>.
Only superusers can change this setting.
</para>
</listitem>
On Mon, Jun 29, 2020 at 1:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/06/29 18:56, Fujii Masao wrote:
On 2020/06/29 18:53, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 11:38 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
Yes. But maybe we can expect that the idea would improve
the performance to the near same level as v12?
A POC patch should be easy to do and see how much it solves this
problem. However I'm not able to reproduce the issue, and IMHO unless
we specifically want to be able to distinguish planner-time counters
from execution-time counters, I'd prefer to disable track_planning by
default than going this way, so that users with a sane usage won't
have to suffer from a memory increase.
Agreed. +1 to change that default to off.
Attached patch does this.
Patch looks good to me.
I also add the following into the description about each *_plan_time column
in the docs. IMO this is helpful for users when they see that those columns
report zero by default and try to understand why.
(if <varname>pg_stat_statements.track_planning</varname> is enabled, otherwise zero)
+1
Do you intend to wait for other input before pushing? FWIW I'm still
not convinced that the exposed problem is representative of any
realistic workload. I of course entirely agree with the other
documentation changes.
On Mon, 29 Jun 2020 at 12:17, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
On 2020/06/29 16:05, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tharar@amazon.com> wrote:
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
Thanks for the benchmark!
~45% performance drop [2] at high DB connection counts (when compared with v12.3)
That's bad :(
Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.
The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low connection counts.
It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).
Your benchmark result seems to suggest that the cause of the problem is
the contention of per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
It'll also quite increase the shared memory consumption.
I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.
The problem looks to be that spinlocks are terrible with overloaded CPU and
a contended spinlock. A process holding the spinlock might easily get
scheduled out leading to excessive spinning by everybody. I think a simple
thing to try would be to replace the spinlock with LWLock.
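A much-simplified, portable sketch of that spin-then-sleep shape (hypothetical code, not PostgreSQL's LWLock, which maintains a proper wait queue, and not the futex prototype below): spin a bounded number of times on the assumption that the holder is still on-CPU, then stop burning cycles and sleep so a descheduled holder gets a chance to run.

```c
#include <stdatomic.h>
#include <time.h>

typedef atomic_flag NaiveLock;

#define SPIN_TRIES 100

static void
naive_lock(NaiveLock *l)
{
    int spins = 0;

    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
    {
        if (++spins < SPIN_TRIES)
            continue;           /* holder probably still running: keep spinning */

        /* Holder may be scheduled out: stop spinning and give up the CPU.
         * A real implementation would sleep on a wait queue (LWLock) or a
         * futex; nanosleep is the portable stand-in here. */
        struct timespec ts = {0, 100000};   /* 100 microseconds */
        nanosleep(&ts, NULL);
        spins = 0;
    }
}

static void
naive_unlock(NaiveLock *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}
```

The pure-spin version is this code with the nanosleep branch deleted, which is exactly the behavior that collapses when runnable backends outnumber cores.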
I did a prototype patch that replaces spinlocks with futexes, but was not
able to find a workload where it mattered. We have done a great job at
eliminating spinlocks from contended code paths. Robins, perhaps you could
try it to see if it reduces the regression you are observing. The patch is
against v13 stable branch.
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
Attachments:
futex-prototype.patch (text/x-patch)
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 7fac0703419..56d45b7cfce 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -90,6 +90,7 @@ s_lock_stuck(const char *file, int line, const char *func)
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
+#ifndef HAS_FUTEX
SpinDelayStatus delayStatus;
init_spin_delay(&delayStatus, file, line, func);
@@ -102,6 +103,8 @@ s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
finish_spin_delay(&delayStatus);
return delayStatus.delays;
+#endif
+ elog(FATAL, "Should not be called");
}
#ifdef USE_DEFAULT_S_UNLOCK
@@ -218,6 +221,71 @@ update_spins_per_delay(int shared_spins_per_delay)
return (shared_spins_per_delay * 15 + spins_per_delay) / 16;
}
+#ifdef HAS_FUTEX
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <linux/futex.h>
+
+static int
+futex(volatile uint32 *uaddr, int futex_op, int val,
+ const struct timespec *timeout, int *uaddr2, int val3)
+{
+ return syscall(SYS_futex, uaddr, futex_op, val,
+ timeout, uaddr, val3);
+}
+
+int
+futex_lock(volatile slock_t *lock, uint32 current, const char *file, int line, const char *func)
+{
+ int i, s;
+ /*
+ * First lets wait for a bit without involving the kernel, it is quite likely
+ * the lock holder is still running.
+ **/
+ if (likely(current < 2))
+ {
+ uint32 expected;
+ for (i = 0; i < DEFAULT_SPINS_PER_DELAY; i++)
+ {
+ SPIN_DELAY();
+ expected = lock->value;
+ if (expected == 0 && pg_atomic_compare_exchange_u32(lock, &expected, 1))
+ return i;
+ }
+
+ while (expected != 2 && !pg_atomic_compare_exchange_u32(lock, &expected, 2)) {
+ if (expected == 0 && pg_atomic_compare_exchange_u32(lock, &expected, 2))
+ return i;
+ }
+ }
+
+ /* At this point lock value is 2 and we will get waken up */
+ while (true)
+ {
+ uint32 expected = 0;
+ s = futex(&(lock->value), FUTEX_WAIT, 2, NULL, NULL, 0);
+ if (s == -1 && errno != EAGAIN)
+ elog(FATAL, "Futex wait failed with error: %m");
+
+ /* Maybe someone else was waiting too, we will try to wake them up. */
+ if (pg_atomic_compare_exchange_u32(lock, &expected, 2))
+ break;
+
+ }
+
+ return i;
+}
+
+int futex_unlock(volatile slock_t *lock, uint32 current)
+{
+ lock->value = 0;
+ if (futex(&(lock->value), FUTEX_WAKE, 1, NULL, NULL, 0) == -1)
+ elog(FATAL, "Futex wake failed with error: %m");
+
+ return 0;
+}
+
+#endif /* HAS_FUTEX */
/*
* Various TAS implementations that cannot live in s_lock.h as no inline
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 31a5ca6fb32..ba0b0d7ef91 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -205,6 +205,52 @@ spin_delay(void)
#ifdef __x86_64__ /* AMD Opteron, Intel EM64T */
#define HAS_TEST_AND_SET
+#if defined(__linux__)
+#define HAS_FUTEX 1 /* TODO: move to configure to check for old kernels */
+#endif
+
+#ifdef HAS_FUTEX
+
+#include "port/atomics.h"
+
+typedef pg_atomic_uint32 slock_t;
+
+#define S_LOCK(lock) \
+ do { \
+ uint32 expected = 0; \
+ if (unlikely(!pg_atomic_compare_exchange_u32((lock), &expected, 1))) \
+ futex_lock((lock), expected, __FILE__, __LINE__, PG_FUNCNAME_MACRO); \
+ } while (0)
+
+
+#define S_UNLOCK(lock) \
+ do { \
+ uint32 actual = pg_atomic_exchange_u32((lock), 0); \
+ if (unlikely(actual == 2)) \
+ futex_unlock((lock), actual); \
+ } while (0)
+extern int futex_lock(volatile slock_t *lock, uint32 current, const char *file, int line, const char *func);
+extern int futex_unlock(volatile slock_t *lock, uint32 current);
+
+/* TAS only needed for regress */
+#define TAS(lock) tas(lock)
+
+static __inline__ int
+tas(volatile slock_t *lock)
+{
+ register uint32 _res = 1;
+
+ __asm__ __volatile__(
+ " lock \n"
+ " xchgl %0,%1 \n"
+: "+q"(_res), "+m"(*lock)
+: /* no inputs */
+: "memory", "cc");
+ return (int) _res;
+}
+
+
+#else
typedef unsigned char slock_t;
#define TAS(lock) tas(lock)
@@ -247,6 +293,7 @@ spin_delay(void)
" rep; nop \n");
}
+#endif /* HAS_FUTEX */
#endif /* __x86_64__ */
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 02397f2eb10..a5ba0212460 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -832,7 +832,11 @@ test_spinlock(void)
S_UNLOCK(&struct_w_lock.lock);
/* and that "contended" acquisition works */
+#ifdef HAS_FUTEX
+ futex_lock(&struct_w_lock.lock, 1, "testfile", 17, "testfunc");
+#else
s_lock(&struct_w_lock.lock, "testfile", 17, "testfunc");
+#endif
S_UNLOCK(&struct_w_lock.lock);
/*
On Mon, Jun 29, 2020 at 1:55 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I disagree with the conclusion though. It seems to me that if you
really have this workload that consists in these few queries and want
to get better performance, you'll anyway use a connection pooler
and/or use prepared statements, which will make this overhead
disappear entirely, and will also yield an even bigger performance
improvement. A quick test using pgbench -M prepared, with
track_planning enabled, with still way too many connections already
shows a 25% improvement over the -M simple without track_planning.
I understand your point. But IMO the default setting basically should
be safer value, i.e., off at least until the problem disappears.
+1 -- this regression seems unacceptable to me.
--
Peter Geoghegan
On Mon, Jun 29, 2020 at 3:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
+1 -- this regression seems unacceptable to me.
I added an open item to track this.
Thanks
--
Peter Geoghegan
Hi,
On 2020-06-29 09:05:18 +0200, Julien Rouhaud wrote:
I can't reproduce this on my laptop, but I can certainly believe that
running the same 3 queries using more connections than available cores
will lead to extra overhead.
I disagree with the conclusion though. It seems to me that if you
really have this workload that consists in these few queries and want
to get better performance, you'll anyway use a connection pooler
and/or use prepared statements, which will make this overhead
disappear entirely, and will also yield an even bigger performance
improvement.
It's extremely common to have times when there are more active
queries than CPUs. And a pooler won't avoid that fully, at least not
without drastically reducing overall throughput.
Greetings,
Andres Freund
Hi,
On 2020-06-29 17:55:28 +0900, Fujii Masao wrote:
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
I suspect that the best thing would be to just turn the spinlock into an
lwlock. Spinlocks deal terribly with contention. I suspect it'd solve
the performance issue entirely. And it might even be possible, further
down the line, to just use a shared lock, and use atomics for the
counters.
Greetings,
Andres Freund
On 2020/06/29 22:23, Ants Aasma wrote:
On Mon, 29 Jun 2020 at 12:17, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Jun 29, 2020 at 10:55 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
On 2020/06/29 16:05, Julien Rouhaud wrote:
On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tharar@amazon.com> wrote:
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows
Thanks for the benchmark!
~45% performance drop [2] at high DB connection counts (when compared with v12.3)
That's bad :(
Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.
The inflection point (in this test-case) is 128 Connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier possibly since the regression is trivial at low connection counts.
It would be great if this could be optimized further, or track_planning
disabled (by default) so as to not trip users upgrading from v12 with pg_stat_statement
enabled (but otherwise not particularly interested in track_planning).
Your benchmark result seems to suggest that the cause of the problem is
the contention on the per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.
One idea to reduce that lock contention is to separate per-query spinlock
into two; one is for planning, and the other is for execution. pgss_store()
determines which lock to use based on the given "kind" argument.
To make this idea work, also every pgss counters like shared_blks_hit
need to be separated into two, i.e., for planning and execution.
This can probably remove some overhead, but won't it eventually hit
the same issue when multiple connections try to plan the same query,
given the number of different queries and very low execution runtime?
It'll also quite increase the shared memory consumption.
I'm wondering if we could instead use atomics to store the counters.
The only downside is that we won't guarantee per-row consistency
anymore, which may be problematic.
The problem looks to be that spinlocks are terrible with overloaded CPU and a contended spinlock. A process holding the spinlock might easily get scheduled out leading to excessive spinning by everybody. I think a simple thing to try would be to replace the spinlock with LWLock.
Yes. Attached is the POC patch that replaces per-counter spinlock with LWLock.
I did a prototype patch that replaces spinlocks with futexes, but was not able to find a workload where it mattered.
I'm not familiar with futex, but could you tell me why you used futex instead
of LWLock that we already have? Is futex portable?
We have done a great job at eliminating spinlocks from contended code paths. Robins, perhaps you could try it to see if it reduces the regression you are observing.
Yes. Also we need to check that this change doesn't increase performance
overhead in other workloads.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
pgss_lwlock_v1.patch (text/plain)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index cef8bb5a49..aa506f6c11 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -39,7 +39,7 @@
* in an entry except the counters requires the same. To look up an entry,
* one must hold the lock shared. To read or update the counters within
* an entry, one must hold the lock shared or exclusive (so the entry doesn't
- * disappear!) and also take the entry's mutex spinlock.
+ * disappear!) and also take the entry's partition lock.
* The shared state variable pgss->extent (the next free spot in the external
* query-text file) should be accessed only while holding either the
* pgss->mutex spinlock, or exclusive lock on pgss->lock. We use the mutex to
@@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
#define JUMBLE_SIZE 1024 /* query serialization buffer size */
+#define PGSS_NUM_LOCK_PARTITIONS() (pgss_max)
+#define PGSS_HASH_PARTITION_LOCK(key) \
+ (&(pgss->base + \
+ (get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
+
/*
* Extension version number, for supporting older extension versions' objects
*/
@@ -207,7 +212,7 @@ typedef struct pgssEntry
Size query_offset; /* query text offset in external file */
int query_len; /* # of valid bytes in query string, or -1 */
int encoding; /* query text encoding */
- slock_t mutex; /* protects the counters only */
+ LWLock *lock; /* protects the counters only */
} pgssEntry;
/*
@@ -216,6 +221,7 @@ typedef struct pgssEntry
typedef struct pgssSharedState
{
LWLock *lock; /* protects hashtable search/modification */
+ LWLockPadded *base; /* base address of this LWLock */
double cur_median_usage; /* current median usage in hashtable */
Size mean_query_len; /* current mean entry text length */
slock_t mutex; /* protects following fields only: */
@@ -468,7 +474,8 @@ _PG_init(void)
* resources in pgss_shmem_startup().
*/
RequestAddinShmemSpace(pgss_memsize());
- RequestNamedLWLockTranche("pg_stat_statements", 1);
+ RequestNamedLWLockTranche("pg_stat_statements",
+ PGSS_NUM_LOCK_PARTITIONS() + 1);
/*
* Install hooks.
@@ -547,7 +554,8 @@ pgss_shmem_startup(void)
if (!found)
{
/* First time through ... */
- pgss->lock = &(GetNamedLWLockTranche("pg_stat_statements"))->lock;
+ pgss->base = GetNamedLWLockTranche("pg_stat_statements");
+ pgss->lock = &(pgss->base + pgss_max)->lock;
pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
pgss->mean_query_len = ASSUMED_LENGTH_INIT;
SpinLockInit(&pgss->mutex);
@@ -1376,14 +1384,14 @@ pgss_store(const char *query, uint64 queryId,
if (!jstate)
{
/*
- * Grab the spinlock while updating the counters (see comment about
- * locking rules at the head of the file)
+ * Grab the partition lock while updating the counters (see comment
+ * about locking rules at the head of the file)
*/
volatile pgssEntry *e = (volatile pgssEntry *) entry;
Assert(kind == PGSS_PLAN || kind == PGSS_EXEC);
- SpinLockAcquire(&e->mutex);
+ LWLockAcquire(e->lock, LW_EXCLUSIVE);
/* "Unstick" entry if it was previously sticky */
if (IS_STICKY(e->counters))
@@ -1435,7 +1443,7 @@ pgss_store(const char *query, uint64 queryId,
e->counters.wal_fpi += walusage->wal_fpi;
e->counters.wal_bytes += walusage->wal_bytes;
- SpinLockRelease(&e->mutex);
+ LWLockRelease(e->lock);
}
done:
@@ -1762,9 +1770,9 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
{
volatile pgssEntry *e = (volatile pgssEntry *) entry;
- SpinLockAcquire(&e->mutex);
+ LWLockAcquire(e->lock, LW_SHARED);
tmp = e->counters;
- SpinLockRelease(&e->mutex);
+ LWLockRelease(e->lock);
}
/* Skip entry if unexecuted (ie, it's a pending "sticky" entry) */
@@ -1908,8 +1916,8 @@ entry_alloc(pgssHashKey *key, Size query_offset, int query_len, int encoding,
memset(&entry->counters, 0, sizeof(Counters));
/* set the appropriate initial usage count */
entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
- /* re-initialize the mutex each time ... we assume no one using it */
- SpinLockInit(&entry->mutex);
+ /* determine which partition lock to use for this query */
+ entry->lock = PGSS_HASH_PARTITION_LOCK(key);
/* ... and don't forget the query text metadata */
Assert(query_len >= 0);
entry->query_offset = query_offset;
On Tue, 30 Jun 2020 at 08:43, Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
The problem looks to be that spinlocks are terrible with overloaded
CPU and a contended spinlock. A process holding the spinlock might easily
get scheduled out leading to excessive spinning by everybody. I think a
simple thing to try would be to replace the spinlock with LWLock.
Yes. Attached is the POC patch that replaces per-counter spinlock with
LWLock.
Great. I think this is the one that should get considered for testing.
I did a prototype patch that replaces spinlocks with futexes, but was
not able to find a workload where it mattered.
I'm not familiar with futex, but could you tell me why you used futex
instead
of LWLock that we already have? Is futex portable?
Futex is a Linux kernel call that allows to build a lock that has
uncontended cases work fully in user space almost exactly like a spinlock,
while falling back to syscalls that wait for wakeup in case of contention.
It's not portable, but probably something similar could be implemented for
other operating systems. I did not pursue this further because it became
apparent that every performance critical spinlock had already been removed.
To be clear, I am not advocating for this patch to get included. I just had
the patch immediately available and it could have confirmed that using a
better lock fixes things.
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
On 2020/06/30 20:30, Ants Aasma wrote:
On Tue, 30 Jun 2020 at 08:43, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
The problem looks to be that spinlocks are terrible with overloaded CPU and a contended spinlock. A process holding the spinlock might easily get scheduled out leading to excessive spinning by everybody. I think a simple thing to try would be to replace the spinlock with LWLock.
Yes. Attached is the POC patch that replaces per-counter spinlock with LWLock.
Great. I think this is the one that should get considered for testing.
I did a prototype patch that replaces spinlocks with futexes, but was not able to find a workload where it mattered.
I'm not familiar with futex, but could you tell me why you used futex instead
of LWLock that we already have? Is futex portable?
Futex is a Linux kernel call that allows to build a lock that has uncontended cases work fully in user space almost exactly like a spinlock, while falling back to syscalls that wait for wakeup in case of contention. It's not portable, but probably something similar could be implemented for other operating systems. I did not pursue this further because it became apparent that every performance critical spinlock had already been removed.
To be clear, I am not advocating for this patch to get included. I just had the patch immediately available and it could have confirmed that using a better lock fixes things.
Understood. Thanks for the explanation!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/06/30 7:29, Peter Geoghegan wrote:
On Mon, Jun 29, 2020 at 3:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
+1 -- this regression seems unacceptable to me.
I added an open item to track this.
Thanks!
I'm thinking to change the default value of track_planning to off for v13.
Ants and Andres suggested to replace the spinlock used in pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,
On 2020-06-30 14:43:39 +0900, Fujii Masao wrote:
I did a prototype patch that replaces spinlocks with futexes, but was not able to find a workload where it mattered.
I'm not familiar with futex, but could you tell me why you used futex instead
of LWLock that we already have? Is futex portable?
We can't rely on futexes, they're linux only. I also don't see much of a
reason to use spinlocks (rather than lwlocks) here in the first place.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index cef8bb5a49..aa506f6c11 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -39,7 +39,7 @@
 * in an entry except the counters requires the same. To look up an entry,
 * one must hold the lock shared. To read or update the counters within
 * an entry, one must hold the lock shared or exclusive (so the entry doesn't
- * disappear!) and also take the entry's mutex spinlock.
+ * disappear!) and also take the entry's partition lock.
 * The shared state variable pgss->extent (the next free spot in the external
 * query-text file) should be accessed only while holding either the
 * pgss->mutex spinlock, or exclusive lock on pgss->lock. We use the mutex to
@@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define JUMBLE_SIZE 1024 /* query serialization buffer size */
+#define PGSS_NUM_LOCK_PARTITIONS() (pgss_max)
+#define PGSS_HASH_PARTITION_LOCK(key) \
+ (&(pgss->base + \
+ (get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
+
 /*
 * Extension version number, for supporting older extension versions' objects
 */
@@ -207,7 +212,7 @@ typedef struct pgssEntry
 Size query_offset; /* query text offset in external file */
 int query_len; /* # of valid bytes in query string, or -1 */
 int encoding; /* query text encoding */
- slock_t mutex; /* protects the counters only */
+ LWLock *lock; /* protects the counters only */
 } pgssEntry;
Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is negligible
compared to the size of pgssEntry.
Greetings,
Andres Freund
Hi,
On 2020-06-30 14:30:03 +0300, Ants Aasma wrote:
Futex is a Linux kernel call that allows to build a lock that has
uncontended cases work fully in user space almost exactly like a spinlock,
while falling back to syscalls that wait for wakeup in case of contention.
It's not portable, but probably something similar could be implemented for
other operating systems. I did not pursue this further because it became
apparent that every performance critical spinlock had already been removed.
Our lwlock implementation does have that property already, though. While
the kernel wait is implemented using semaphores, those are implemented
using futexes internally (posix ones, not sysv ones, so only after
whatever version we switched the default to posix semas on linux).
I'd rather move towards removing spinlocks from postgres than making
their implementation more complicated...
Greetings,
Andres Freund
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Ants and Andres suggested to replace the spinlock used in pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?
I agree that it's too late for v13.
Thanks
--
Peter Geoghegan
On 2020/07/01 4:03, Andres Freund wrote:
Hi,
On 2020-06-30 14:43:39 +0900, Fujii Masao wrote:
I did a prototype patch that replaces spinlocks with futexes, but was not able to find a workload where it mattered.
I'm not familiar with futex, but could you tell me why you used futex instead
of LWLock that we already have? Is futex portable?
We can't rely on futexes, they're linux only.
Understood. Thanks!
I also don't see much of a
reason to use spinlocks (rather than lwlocks) here in the first place.
Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is negligible
compared to the size of pgssEntry.
Because pgssEntry is not an array entry but a hashtable entry. First I was
thinking to assign an lwlock to each entry in the array at
startup. But each entry is created whenever a new entry is required,
so an lwlock needs to be assigned to each entry at creation time.
We cannot easily assign lwlocks to all the entries at startup.
Also each entry can be dropped from the hashtable. In that case,
the already-assigned lwlock would need to be moved back to a "freelist"
so that it can be assigned again to a new entry later. We could
probably implement this, but it looks a bit complicated.
Since the hashing addresses these issues, I just used it in the POC patch.
But I'd like to hear a better idea!
+#define PGSS_NUM_LOCK_PARTITIONS() (pgss_max)
Currently pgss_max is used as the number of lwlocks for entries.
But since too large a number of lwlocks is useless (or a bit harmful?), we can
set an upper limit here, e.g., min(pgss_max, 10000).
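The partition-lock scheme, including the proposed cap, amounts to plain lock striping and can be sketched standalone. This is an illustrative example, not the POC patch's code; the names and the 10000 cap are assumptions for the example, and note that capping means min(), since min() bounds the pool size.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Cap the lock pool: illustrative value, mirroring the 10000 above. */
#define MAX_LOCK_PARTITIONS 10000

static pthread_mutex_t *lock_pool;
static int	n_partitions;

void
init_locks(int pgss_max)
{
	/* min(), not max(): the pool never exceeds MAX_LOCK_PARTITIONS */
	n_partitions = pgss_max < MAX_LOCK_PARTITIONS ? pgss_max : MAX_LOCK_PARTITIONS;
	lock_pool = malloc(n_partitions * sizeof(pthread_mutex_t));
	for (int i = 0; i < n_partitions; i++)
		pthread_mutex_init(&lock_pool[i], NULL);
}

/* Map an entry's hash value onto a pool slot.  Entries never own a
 * lock, so nothing has to be returned to a freelist when an entry is
 * dropped from the hashtable: the mapping is stateless. */
pthread_mutex_t *
partition_lock(uint64_t hash)
{
	return &lock_pool[hash % n_partitions];
}
```

Two entries whose hashes collide modulo the pool size share a lock, which is the trade-off Julien raises above: striping bounds memory at the cost of occasional false sharing of a lock.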
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,
On 2020-07-01 22:20:50 +0900, Fujii Masao wrote:
On 2020/07/01 4:03, Andres Freund wrote:
Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is neglegible
compared to the size of pgssEntry.
Because pgssEntry is not array entry but hashtable entry. First I was
thinking to assign per-process lwlock to each entry in the array at the
startup. But each entry is created every time new entry is required.
So lwlock needs to be assigned to each entry at that creation time.
We cannot easily assign lwlock to all the entries at the startup.
But why not just do it exactly at the place the SpinLockInit() is done
currently?
Greetings,
Andres Freund
On 2020/07/02 1:54, Andres Freund wrote:
Hi,
On 2020-07-01 22:20:50 +0900, Fujii Masao wrote:
On 2020/07/01 4:03, Andres Freund wrote:
Why did you add the hashing here? It seems a lot better to just add an
lwlock in-place instead of the spinlock? The added size is neglegible
compared to the size of pgssEntry.
Because pgssEntry is not array entry but hashtable entry. First I was
thinking to assign per-process lwlock to each entry in the array at the
startup. But each entry is created every time new entry is required.
So lwlock needs to be assigned to each entry at that creation time.
We cannot easily assign lwlock to all the entries at the startup.
But why not just do it exactly at the place the SpinLockInit() is done
currently?
Sorry I failed to understand your point... You mean that new lwlock should
be initialized at the place the SpinLockInit() is done currently instead of
requesting postmaster to initialize all the lwlocks required for pgss
at _PG_init()?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/07/01 7:37, Peter Geoghegan wrote:
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Ants and Andres suggested to replace the spinlock used in pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?
I agree that it's too late for v13.
Thanks for the comment!
So I pushed the patch and changed default of track_planning to off.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Jul 2, 2020 at 7:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
So I pushed the patch and changed default of track_planning to off.
I have closed out the open item I created for this.
Thanks!
--
Peter Geoghegan
On 2020/07/03 11:43, Peter Geoghegan wrote:
On Thu, Jul 2, 2020 at 7:39 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
So I pushed the patch and changed default of track_planning to off.
I have closed out the open item I created for this.
Thanks!!
I added the patch that replaces spinlock with lwlock in pgss, into CF-2020-09.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi
On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/01 7:37, Peter Geoghegan wrote:
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
Ants and Andres suggested to replace the spinlock used in pgss_store()
with
LWLock. I agreed with them and posted the POC patch doing that. But I
think
the patch is an item for v14. The patch may address the reported
performance
issue, but may cause other performance issues in other workloads. We
would
need to measure how the patch affects the performance in various
workloads.
It seems too late to do that at this stage of v13. Thought?
I agree that it's too late for v13.
Thanks for the comment!
So I pushed the patch and changed default of track_planning to off.
Maybe it can be documented that enabling this option can have a negative
impact on performance.
Regards
Pavel
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/07/03 13:05, Pavel Stehule wrote:
Hi
On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/01 7:37, Peter Geoghegan wrote:
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
Ants and Andres suggested to replace the spinlock used in pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that. But I think
the patch is an item for v14. The patch may address the reported performance
issue, but may cause other performance issues in other workloads. We would
need to measure how the patch affects the performance in various workloads.
It seems too late to do that at this stage of v13. Thought?
I agree that it's too late for v13.
Thanks for the comment!
So I pushed the patch and changed default of track_planning to off.
Maybe it can be documented that enabling this option can have a negative impact on performance.
Yes. What about adding either of the following into the doc?
Enabling this parameter may incur a noticeable performance penalty.
or
Enabling this parameter may incur a noticeable performance penalty,
especially when only a few kinds of queries are executed on many
concurrent connections.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Jul 3, 2020 at 8:57 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/03 13:05, Pavel Stehule wrote:
Hi
On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/01 7:37, Peter Geoghegan wrote:
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Ants and Andres suggested to replace the spinlock used in
pgss_store() with
LWLock. I agreed with them and posted the POC patch doing that.
But I think
the patch is an item for v14. The patch may address the reported
performance
issue, but may cause other performance issues in other
workloads. We would
need to measure how the patch affects the performance in various
workloads.
It seems too late to do that at this stage of v13. Thought?
I agree that it's too late for v13.
Thanks for the comment!
So I pushed the patch and changed default of track_planning to off.
Maybe there can be documented so enabling this option can have a
negative impact on performance.
Yes. What about adding either of the following into the doc?
Enabling this parameter may incur a noticeable performance penalty.
or
Enabling this parameter may incur a noticeable performance penalty,
especially when only a few kinds of queries are executed on many
concurrent connections.
This second variant looks perfect for this case.
Thank you
Pavel
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/07/03 16:02, Pavel Stehule wrote:
On Fri, Jul 3, 2020 at 8:57 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/03 13:05, Pavel Stehule wrote:
Hi
On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/01 7:37, Peter Geoghegan wrote:
> On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> Ants and Andres suggested to replace the spinlock used in pgss_store() with
>> LWLock. I agreed with them and posted the POC patch doing that. But I think
>> the patch is an item for v14. The patch may address the reported performance
>> issue, but may cause other performance issues in other workloads. We would
>> need to measure how the patch affects the performance in various workloads.
>> It seems too late to do that at this stage of v13. Thought?
>
> > I agree that it's too late for v13.
Thanks for the comment!
So I pushed the patch and changed default of track_planning to off.
Maybe there can be documented so enabling this option can have a negative impact on performance.
Yes. What about adding either of the followings into the doc?
Enabling this parameter may incur a noticeable performance penalty.
or
Enabling this parameter may incur a noticeable performance penalty,
especially when only a few kinds of queries are executed on many
concurrent connections.
This second variant looks perfect for this case.
Ok, so patch attached.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
document_overhead_by_track_planning.patch (text/plain)
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 430d8bf07c..cf2d25b7b2 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -607,6 +607,9 @@
<para>
<varname>pg_stat_statements.track_planning</varname> controls whether
planning operations and duration are tracked by the module.
+ Enabling this parameter may incur a noticeable performance penalty,
+ especially when only a few kinds of queries are executed on many
+ concurrent connections.
The default value is <literal>off</literal>.
Only superusers can change this setting.
</para>
On Fri, Jul 3, 2020 at 1:02 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/03 16:02, Pavel Stehule wrote:
On Fri, Jul 3, 2020 at 8:57 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/03 13:05, Pavel Stehule wrote:
Hi
On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/01 7:37, Peter Geoghegan wrote:
On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Ants and Andres suggested to replace the spinlock used in
pgss_store() with
LWLock. I agreed with them and posted the POC patch doing
that. But I think
the patch is an item for v14. The patch may address the
reported performance
issue, but may cause other performance issues in other
workloads. We would
need to measure how the patch affects the performance in
various workloads.
It seems too late to do that at this stage of v13.
Thought?
I agree that it's too late for v13.
Thanks for the comment!
So I pushed the patch and changed default of track_planning
to off.
Maybe it can be documented that enabling this option can have a
negative impact on performance.
Yes. What about adding either of the following into the doc?
Enabling this parameter may incur a noticeable performance
penalty.
or
Enabling this parameter may incur a noticeable performance
penalty,
especially when only a few kinds of queries are executed on many
concurrent connections.
This second variant looks perfect for this case.
Ok, so patch attached.
+1
Thank you
Pavel
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/07/04 12:22, Pavel Stehule wrote:
On Fri, Jul 3, 2020 at 1:02 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/03 16:02, Pavel Stehule wrote:
On Fri, Jul 3, 2020 at 8:57 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/07/03 13:05, Pavel Stehule wrote:
> Hi
>
> On Fri, Jul 3, 2020 at 4:39 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2020/07/01 7:37, Peter Geoghegan wrote:
> > On Tue, Jun 30, 2020 at 6:40 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >> Ants and Andres suggested to replace the spinlock used in pgss_store() with
> >> LWLock. I agreed with them and posted the POC patch doing that. But I think
> >> the patch is an item for v14. The patch may address the reported performance
> >> issue, but may cause other performance issues in other workloads. We would
> >> need to measure how the patch affects the performance in various workloads.
> >> It seems too late to do that at this stage of v13. Thought?
> >
> > I agree that it's too late for v13.
>
> Thanks for the comment!
>
> So I pushed the patch and changed default of track_planning to off.
>
>
> Maybe there can be documented so enabling this option can have a negative impact on performance.Yes. What about adding either of the followings into the doc?
Enabling this parameter may incur a noticeable performance penalty.
or
Enabling this parameter may incur a noticeable performance penalty,
especially when a fewer kinds of queries are executed on many
concurrent connections.This second variant looks perfect for this case.
Ok, so patch attached.
+1
Thanks for the review! Pushed.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Jul 6, 2020 at 10:29 AM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
> Thanks for the review! Pushed.
You might also want to update this patch's status in the commitfest:
https://commitfest.postgresql.org/29/2634/
--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus
On 2020/07/31 21:40, Hamid Akhtar wrote:
> You might also want to update this patch's status in the commitfest:
> https://commitfest.postgresql.org/29/2634/
The patch added into this CF entry has not been committed yet.
So I was thinking that there is no need to update the status yet. No?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Aug 17, 2020 at 2:21 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
> The patch added into this CF entry has not been committed yet.
> So I was thinking that there is no need to update the status yet. No?
Your previous email suggested that it's been pushed, hence my comment.
Checking the git log, I see a commit was pushed on July 6 (321fa6a) with
the changes that match the latest patch.
Am I missing something here?
--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus
On 2020/08/17 18:34, Hamid Akhtar wrote:
> Your previous email suggested that it's been pushed, hence my comment.
> Checking the git log, I see a commit was pushed on July 6 (321fa6a) with
> the changes that match the latest patch.
> Am I missing something here?
Yes, I pushed the document_overhead_by_track_planning.patch, but this
CF entry is for pgss_lwlock_v1.patch, which replaces spinlocks with lwlocks
in pg_stat_statements. The latter patch has not been committed yet.
Probably attaching the different patches to the same thread caused
this confusion... Anyway, thanks for your comment!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
To avoid further confusion, I attached the rebased version of
the patch that was registered at CF. I'd appreciate it if
you review this version.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
pgss_lwlock_v2.patch (text/plain)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6b91c62c31..3035c14ed5 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -39,7 +39,7 @@
* in an entry except the counters requires the same. To look up an entry,
* one must hold the lock shared. To read or update the counters within
* an entry, one must hold the lock shared or exclusive (so the entry doesn't
- * disappear!) and also take the entry's mutex spinlock.
+ * disappear!) and also take the entry's partition lock.
* The shared state variable pgss->extent (the next free spot in the external
* query-text file) should be accessed only while holding either the
* pgss->mutex spinlock, or exclusive lock on pgss->lock. We use the mutex to
@@ -115,6 +115,11 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
#define JUMBLE_SIZE 1024 /* query serialization buffer size */
+#define PGSS_NUM_LOCK_PARTITIONS() (pgss_max)
+#define PGSS_HASH_PARTITION_LOCK(key) \
+ (&(pgss->base + \
+ (get_hash_value(pgss_hash, key) % PGSS_NUM_LOCK_PARTITIONS()))->lock)
+
/*
* Extension version number, for supporting older extension versions' objects
*/
@@ -207,7 +212,7 @@ typedef struct pgssEntry
Size query_offset; /* query text offset in external file */
int query_len; /* # of valid bytes in query string, or -1 */
int encoding; /* query text encoding */
- slock_t mutex; /* protects the counters only */
+ LWLock *lock; /* protects the counters only */
} pgssEntry;
/*
@@ -216,6 +221,7 @@ typedef struct pgssEntry
typedef struct pgssSharedState
{
LWLock *lock; /* protects hashtable search/modification */
+ LWLockPadded *base; /* base address of this LWLock */
double cur_median_usage; /* current median usage in hashtable */
Size mean_query_len; /* current mean entry text length */
slock_t mutex; /* protects following fields only: */
@@ -468,7 +474,8 @@ _PG_init(void)
* resources in pgss_shmem_startup().
*/
RequestAddinShmemSpace(pgss_memsize());
- RequestNamedLWLockTranche("pg_stat_statements", 1);
+ RequestNamedLWLockTranche("pg_stat_statements",
+ PGSS_NUM_LOCK_PARTITIONS() + 1);
/*
* Install hooks.
@@ -547,7 +554,8 @@ pgss_shmem_startup(void)
if (!found)
{
/* First time through ... */
- pgss->lock = &(GetNamedLWLockTranche("pg_stat_statements"))->lock;
+ pgss->base = GetNamedLWLockTranche("pg_stat_statements");
+ pgss->lock = &(pgss->base + pgss_max)->lock;
pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
pgss->mean_query_len = ASSUMED_LENGTH_INIT;
SpinLockInit(&pgss->mutex);
@@ -1384,14 +1392,14 @@ pgss_store(const char *query, uint64 queryId,
if (!jstate)
{
/*
- * Grab the spinlock while updating the counters (see comment about
- * locking rules at the head of the file)
+ * Grab the partition lock while updating the counters (see comment
+ * about locking rules at the head of the file)
*/
volatile pgssEntry *e = (volatile pgssEntry *) entry;
Assert(kind == PGSS_PLAN || kind == PGSS_EXEC);
- SpinLockAcquire(&e->mutex);
+ LWLockAcquire(e->lock, LW_EXCLUSIVE);
/* "Unstick" entry if it was previously sticky */
if (IS_STICKY(e->counters))
@@ -1443,7 +1451,7 @@ pgss_store(const char *query, uint64 queryId,
e->counters.wal_fpi += walusage->wal_fpi;
e->counters.wal_bytes += walusage->wal_bytes;
- SpinLockRelease(&e->mutex);
+ LWLockRelease(e->lock);
}
done:
@@ -1770,9 +1778,9 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
{
volatile pgssEntry *e = (volatile pgssEntry *) entry;
- SpinLockAcquire(&e->mutex);
+ LWLockAcquire(e->lock, LW_SHARED);
tmp = e->counters;
- SpinLockRelease(&e->mutex);
+ LWLockRelease(e->lock);
}
/* Skip entry if unexecuted (ie, it's a pending "sticky" entry) */
@@ -1916,8 +1924,8 @@ entry_alloc(pgssHashKey *key, Size query_offset, int query_len, int encoding,
memset(&entry->counters, 0, sizeof(Counters));
/* set the appropriate initial usage count */
entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
- /* re-initialize the mutex each time ... we assume no one using it */
- SpinLockInit(&entry->mutex);
+ /* determine which partition lock to use for this query */
+ entry->lock = PGSS_HASH_PARTITION_LOCK(key);
/* ... and don't forget the query text metadata */
Assert(query_len >= 0);
entry->query_offset = query_offset;
On Tue, Aug 18, 2020 at 8:43 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
> To avoid further confusion, I attached the rebased version of
> the patch that was registered at CF. I'd appreciate it if
> you review this version.
Thank you. Reviewing it now.
--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
Overall, the patch works fine. However, I have a few observations:
(1) Code Comments:
- Code comments should be added for the 2 new macros, in particular for PGSS_NUM_LOCK_PARTITIONS. As you explained in your email, this may be used to limit the number of locks if a very large value for pgss_max is specified.
- From the code I inferred that the number of locks may in future be less than pgss_max (per your email, where this macro could be used to limit the number of locks). I suggest adding some notes to help future changes in this area of the code.
(2) It seems that the "pgss->lock = &(pgss->base + pgss_max)->lock;" statement should not use pgss_max directly and should instead use the PGSS_NUM_LOCK_PARTITIONS() macro; when a limit is imposed on the number of locks, this statement will cause an overrun.
--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus
The new status of this patch is: Waiting on Author
Hi,
On 2020-08-19 00:43, Fujii Masao wrote:
> To avoid further confusion, I attached the rebased version of
> the patch that was registered at CF. I'd appreciate it if
> you review this version.
I tested pgss_lwlock_v2.patch with 3 workloads. I couldn't observe any
performance improvement in our environment, and I'm afraid it is even
worse in some cases.
- Workload1: pgbench select-only mode
- Workload2: pgbench custom scripts which run "SELECT 1;"
- Workload3: pgbench custom scripts which run 1000 types of different
simple queries
- Workload1
First we set pg_stat_statements.track_planning to on/off and ran the
fully-cached pgbench select-only mode on pg14head, which is installed on an
on-premises server (32 CPUs, 256GB mem).
However, in this environment we couldn't reproduce the 45% performance drop
due to s_lock contention
(Tharakan-san mentioned in his post on
2895b53b033c47ccb22972b589050dd9@EX13D05UWC001.ant.amazon.com).
- Workload2
Then we adopted a pgbench custom script running "SELECT 1;", which is
supposed to increase s_lock contention and make it easier to reproduce the
issue. In this case we saw around a 10% performance decrease, along with a
slight increase in s_lock (~10%). With this scenario, despite the absence
of s_lock, the patch shows more than 50% performance degradation
regardless of track_planning.
We also couldn't see any performance improvement in this workload.
pgbench:
initialization: pgbench -i -s 100
benchmarking : pgbench -j16 -c128 -T180 -r -n -f <script> -h <address>
-U <user> -p <port> -d <db>
# VACUUMed and pg_prewarmed manually before running the benchmark
query: SELECT 1;
pgss_lwlock_v2.patch  track_planning  TPS       decline rate  s_lock  CPU usage
-                     OFF             810509.4  standard      0.17%   98.8% (sys 24.9%, user 73.9%)
-                     ON              732823.1  -9.6%         1.94%   95.1% (sys 22.8%, user 72.3%)
+                     OFF             371035.0  -49.4%        -       65.2% (sys 20.6%, user 44.6%)
+                     ON              193965.2  -47.7%        -       41.8% (sys 12.1%, user 29.7%)
# "-" means that s_lock was not reported by perf.
- Workload3
Next, there was a concern that the replacement with LWLock may reduce
performance in some other workloads
(Fujii-san mentioned in his post on
42a13b4c-e60c-c6e7-3475-8eff8050bed4@oss.nttdata.com).
To clarify this, we prepared 1000 types of simple queries, which should
prevent s_lock contention and let us see the behavior without s_lock. In
this case, no performance decline was observed, and we also couldn't see
any additional memory consumption or CPU usage.
pgbench:
initialization: pgbench -n -i -s 100 --partitions=1000
--partition-method=range
benchmarking : command is same as (Workload1)
query: SELECT abalance FROM pgbench_accounts_xxxx WHERE aid = :aid +
(10000 * :random_num - 10000);
pgss_lwlock_v2.patch  track_planning  TPS      decline rate  CPU usage
-                     OFF             88329.1  standard      82.1% (sys 6.5%, user 75.6%)
-                     ON              88015.3  -0.36%        82.6% (sys 6.5%, user 76.1%)
+                     OFF             88177.5  0.18%         82.2% (sys 6.5%, user 75.7%)
+                     ON              88079.1  -0.11%        82.5% (sys 6.5%, user 76.0%)
(Environment)
machine:
server/client - 32 CPUs / 256GB # used same machine as server & client
postgres:
version: v14 (6eee73e)
configure: '--prefix=/usr/pgsql-14a' 'CFLAGS=-O2'
GUC param (changed from defaults):
shared_preload_libraries = 'pg_stat_statements, pg_prewarm'
autovacuum = off
checkpoint = 120min
max_connections=300
listen_address='*'
shared_buffers=64GB
Regards,
--
Hibiki Tanaka
On 2020/09/11 16:23, bttanakahbk wrote:
> I tested pgss_lwlock_v2.patch with 3 workloads. I couldn't observe any
> performance improvement in our environment, and I'm afraid it is even
> worse in some cases.
>
> - Workload1: pgbench select-only mode
> - Workload2: pgbench custom scripts which run "SELECT 1;"
> - Workload3: pgbench custom scripts which run 1000 types of different simple queries
Thanks for running the benchmarks!
> With this scenario, despite the absence of s_lock, the patch shows more
> than 50% performance degradation regardless of track_planning.
Ok, so my proposed patch degraded the performance in this case :(
This means that replacing the spinlock with an LWLock in pgss is not the
proper approach for the lock contention issue on pgss...
I proposed upthread to split the spinlock for each pgss entry into two
to reduce the lock contention: one for planner stats,
and the other for executor stats. Is it worth working on
this approach as an alternative? Or does anyone have a better idea?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Sep 11, 2020 at 4:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2020/09/11 16:23, bttanakahbk wrote:
pgbench:
initialization: pgbench -i -s 100
benchmarking : pgbench -j16 -c128 -T180 -r -n -f <script> -h <address> -U <user> -p <port> -d <db>
# VACUUMed and pg_prewarmed manually before running the benchmark
query: SELECT 1;

pgss_lwlock_v2.patch  track_planning  TPS       decline rate  s_lock  CPU usage
-                     OFF             810509.4  standard      0.17%   98.8% (sys 24.9%, user 73.9%)
-                     ON              732823.1  -9.6%         1.94%   95.1% (sys 22.8%, user 72.3%)
+                     OFF             371035.0  -49.4%        -       65.2% (sys 20.6%, user 44.6%)
+                     ON              193965.2  -47.7%        -       41.8% (sys 12.1%, user 29.7%)

# "-" means that s_lock was not reported by perf.
Ok, so my proposed patch degraded the performance in this case :(
This means that replacing the spinlock with an lwlock in pgss is not a proper
approach for the lock contention issue in pgss...
I proposed to split the spinlock for each pgss entry into two
to reduce the lock contention, upthread. One is for planner stats,
and the other is for executor stats. Is it worth working on
this approach as an alternative idea? Or does anyone have any better idea?
For now only calls and [min|max|mean|total]_time are split between
planning and execution, so we'd have to do the same for the rest of
the counters to be able to have 2 different spinlocks. That'll
increase the size of the struct quite a lot, and we'd also have to
change the SRF output, which is already quite wide.
On 2020-Sep-11, Fujii Masao wrote:
Ok, so my proposed patch degraded the performance in this case :(
This means that replacing the spinlock with an lwlock in pgss is not a proper
approach for the lock contention issue in pgss...
I proposed to split the spinlock for each pgss entry into two
to reduce the lock contention, upthread. One is for planner stats,
and the other is for executor stats. Is it worth working on
this approach as an alternative idea? Or does anyone have any better idea?
It does seem that the excl-locked section in pgss_store is rather large.
(I admit I don't understand why would a LWLock decrease performance.)
Andres suggested in [1] to use atomics for the counters together with a
single lwlock to be used in shared mode only. I didn't quite understand
what the lwlock is *for*, but maybe you do.
[1]: /messages/by-id/20200629231015.qlej5b3qpfe4uijo@alap3.anarazel.de
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2020-09-11 19:10:05 -0300, Alvaro Herrera wrote:
Andres suggested in [1] to use atomics for the counters together with a
single lwlock to be used in shared mode only. I didn't quite understand
what the lwlock is *for*, but maybe you do.
[1] /messages/by-id/20200629231015.qlej5b3qpfe4uijo@alap3.anarazel.de
Just to be clear - I am saying that in the first iteration I would just
straight up replace the spinlock with an lwlock, i.e. having many
lwlocks.
The piece about a single shared lwlock is/was about protecting the set
of entries that are currently in-memory - which can't easily be
implemented just using atomics (at least without the risk of incrementing
the counters of an entry that has since been replaced with another query).
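A rough sketch of this idea as I read it (hypothetical names; Postgres would use
pg_atomic_uint64 and a real LWLock, plain C11 atomics stand in here): counter
updates become lock-free atomic adds, and the single lwlock, held in shared mode
around them, exists so that entry replacement (done under the lock in exclusive
mode) cannot recycle an entry while someone is still bumping its counters.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch: per-entry counters as atomics, updated without any
 * per-entry lock. A single shared LWLock (not modeled here) would be held
 * in shared mode while touching an entry, and exclusively when entries are
 * added or replaced, so a counter is never bumped on a recycled entry. */
typedef struct pgssAtomicCounters
{
    atomic_uint_fast64_t calls;
    atomic_uint_fast64_t rows;
    atomic_uint_fast64_t total_time_us;   /* microseconds, kept integral */
} pgssAtomicCounters;

static void
pgss_bump(pgssAtomicCounters *c, uint64_t rows, uint64_t time_us)
{
    /* Relaxed ordering suffices: these are independent statistics counters. */
    atomic_fetch_add_explicit(&c->calls, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&c->rows, rows, memory_order_relaxed);
    atomic_fetch_add_explicit(&c->total_time_us, time_us, memory_order_relaxed);
}
```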
Greetings,
Andres Freund
Hi,
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1
shows ~45% performance drop [2] at high DB connection counts (when
compared with v12.3)
It'd be great if we could also give credit to "Sean Massey" for this find.
This study of the performance regression was done after his (internal)
triaging found that the v13 regression disappeared before this commit.
I regret the omission in the original post.
-
thanks
robins
Hi,
Attached is a patch that adds Sean's name to the Release 13 Docs.
It'd be great if we could also give credit to "Sean Massey" for this find.
This study of the performance regression was done after his (internal)
triaging found that the v13 regression disappeared before this commit.
I regret the omission in the original post.
Once again, I regret not mentioning this earlier, but hope this is in time before GA :(
-
thanks
robins
Attachment: trackplanning_docs_attribution.patch (application/octet-stream)
diff --git a/doc/src/sgml/release-13.sgml b/doc/src/sgml/release-13.sgml
index 3fd97c9d10..0dd6104f55 100644
--- a/doc/src/sgml/release-13.sgml
+++ b/doc/src/sgml/release-13.sgml
@@ -3453,6 +3453,7 @@ Author: Peter Geoghegan <pg@bowt.ie>
<member>Ryohei Takahashi</member>
<member>Scott Ribe</member>
<member>Sean Farrell</member>
+ <member>Sean Massey</member>
<member>Sehrope Sarkuni</member>
<member>Sergei Agalakov</member>
<member>Sergei Kornilov</member>
"Tharakan, Robins" <tharar@amazon.com> writes:
During fully-cached SELECT-only test using pgbench, Postgres v13Beta1
shows ~45% performance drop [2] at high DB connection counts (when
compared with v12.3)
It'd be great if we could also give credit to "Sean Massey" for this find.
I poked through my Postgres inbox, and couldn't find any previous
mail from or mentioning a Sean Massey.
While there's not much of a formal policy around this, we usually limit
ourselves to crediting people who have directly interacted with the PG
community. I'm aware that there's more than a few cases where someone
talks to the community on behalf of a company-internal team ... but
since we have little visibility into such situations, whoever's doing
the talking gets all the credit.
Given that background, it seems like crediting Sean here would be
slightly unfair special treatment. I'd encourage him to join the
mailing lists and be part of the community directly --- then we'll
definitely know what he's contributed.
regards, tom lane
On Fri, Sep 11, 2020 at 03:32:54PM -0700, Andres Freund wrote:
The piece about a single shared lwlocks is/was about protecting the set
of entries that are currently in-memory - which can't easily be
implemented just using atomics (at least without the risk of increasing
the counters of an entry since replaced with another query).
This discussion has stalled, and the patch proposed is incorrect, so I
have marked it as RwF in the CF app.
--
Michael
Reviewing this change which was committed last year as
321fa6a4a26c9b649a0fbec9fc8b019f19e62289
On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote:
On 2020/07/03 13:05, Pavel Stehule wrote:
pá 3. 7. 2020 v 4:39 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com> napsal:
Maybe there can be documented so enabling this option can have a negative impact on performance.
Yes. What about adding either of the followings into the doc?
Enabling this parameter may incur a noticeable performance penalty.
or
Enabling this parameter may incur a noticeable performance penalty,
especially when a fewer kinds of queries are executed on many
concurrent connections.
Something seems wrong with this sentence, and I'm not sure what it's trying
to say. Is this right?
Enabling this parameter may incur a noticeable performance penalty,
especially when a small number of queries are executed on many
concurrent connections.
--
Justin
On 2021/04/19 8:36, Justin Pryzby wrote:
Reviewing this change which was committed last year as
321fa6a4a26c9b649a0fbec9fc8b019f19e62289
On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote:
On 2020/07/03 13:05, Pavel Stehule wrote:
pá 3. 7. 2020 v 4:39 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com> napsal:
Maybe there can be documented so enabling this option can have a negative impact on performance.
Yes. What about adding either of the followings into the doc?
Enabling this parameter may incur a noticeable performance penalty.
or
Enabling this parameter may incur a noticeable performance penalty,
especially when a fewer kinds of queries are executed on many
concurrent connections.
Something seems wrong with this sentence, and I'm not sure what it's trying
to say. Is this right?
pg_stat_statements uses a different spinlock for each kind of query.
So the fewer kinds of queries many sessions execute, the fewer spinlocks
they try to acquire. This may lead to spinlock contention and
significant performance degradation. This is what the statement is
trying to say.
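To illustrate the point with a toy model (names are hypothetical, this is not
pgss code): each normalized query maps to one entry with its own spinlock, so a
workload where every backend runs the same query funnels all backends onto one
lock, while distinct queries spread them across many locks.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define SKETCH_NENTRIES 64      /* stand-in for the pgss hash table size */

/* Toy mapping from queryid to an entry slot (pgss really uses a hash table). */
static size_t
entry_for_queryid(uint64_t queryid)
{
    return (size_t) (queryid % SKETCH_NENTRIES);
}

/* How many of nbackends land on the hottest entry, i.e. how many
 * would serialize on that single entry's spinlock. */
static unsigned
max_collisions(const uint64_t *queryids, unsigned nbackends)
{
    unsigned counts[SKETCH_NENTRIES] = {0};
    unsigned max = 0;
    for (unsigned i = 0; i < nbackends; i++)
    {
        unsigned c = ++counts[entry_for_queryid(queryids[i])];
        if (c > max)
            max = c;
    }
    return max;
}
```

With one distinct query every backend collides on a single entry; with many
distinct queries the per-entry contention drops accordingly.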
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Apr 19, 2021 at 11:44:05PM +0900, Fujii Masao wrote:
On 2021/04/19 8:36, Justin Pryzby wrote:
Reviewing this change which was committed last year as
321fa6a4a26c9b649a0fbec9fc8b019f19e62289On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote:
On 2020/07/03 13:05, Pavel Stehule wrote:
pá 3. 7. 2020 v 4:39 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com> napsal:
Maybe there can be documented so enabling this option can have a negative impact on performance.
Yes. What about adding either of the followings into the doc?
Enabling this parameter may incur a noticeable performance penalty.
or
Enabling this parameter may incur a noticeable performance penalty,
especially when a fewer kinds of queries are executed on many
concurrent connections.
Something seems wrong with this sentence, and I'm not sure what it's trying
to say. Is this right?
pg_stat_statements uses a different spinlock for each kind of query.
So the fewer kinds of queries many sessions execute, the fewer spinlocks
they try to acquire. This may lead to spinlock contention and
significant performance degradation. This is what the statement is
trying to say.
What does "kind" mean ? I think it means a "normalized" query or a "query
structure".
"a fewer kinds" is wrong, so I think the docs should say "a small number of
queries" or maybe:
Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are run by many concurrent connections and
compete to update the same pg_stat_statements entry
--
Justin
On 2021/04/19 23:55, Justin Pryzby wrote:
What does "kind" mean ? I think it means a "normalized" query or a "query
structure".
"a fewer kinds" is wrong, so I think the docs should say "a small number of
queries" or maybe:
Okay, I agree to update the description.
Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are run by many concurrent connections and
compete to update the same pg_stat_statements entry
"a small number of" is better than "similar" at the above because
"similar" sounds a bit unclear in this case?
It's better to use "entries" rather than "entry" at the above?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Apr 21, 2021 at 11:38:52PM +0900, Fujii Masao wrote:
On 2021/04/19 23:55, Justin Pryzby wrote:
What does "kind" mean ? I think it means a "normalized" query or a "query
structure".
"a fewer kinds" is wrong, so I think the docs should say "a small number of
queries" or maybe:
Okay, I agree to update the description.
Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are run by many concurrent connections and
compete to update the same pg_stat_statements entry
"a small number of" is better than "similar" at the above because
"similar" sounds a bit unclear in this case?
It's better to use "entries" rather than "entry" at the above?
How about like this?
Enabling this parameter may incur a noticeable performance penalty,
- especially when a fewer kinds of queries are executed on many
+ especially when queries with the same queryid are executed by many
concurrent connections.
Or:
Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are executed by many concurrent connections
and compete to update a small number of pg_stat_statements entries.
--
Justin
On 2021/04/21 23:53, Justin Pryzby wrote:
Or:
Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are executed by many concurrent connections
and compete to update a small number of pg_stat_statements entries.
I prefer this. But what about using "identical" instead of "similar"
because pg_stat_statements docs already uses "identical" in some places?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Apr 22, 2021 at 12:13:17AM +0900, Fujii Masao wrote:
On 2021/04/21 23:53, Justin Pryzby wrote:
Or:
Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are executed by many concurrent connections
and compete to update a small number of pg_stat_statements entries.
I prefer this. But what about using "identical" instead of "similar"
because pg_stat_statements docs already uses "identical" in some places?
I also missed "when", again...
Enabling this parameter may incur a noticeable performance penalty,
especially when queries with identical structure are executed by many concurrent connections
which compete to update a small number of pg_stat_statements entries.
On Wed, Apr 21, 2021 at 10:40:07AM -0500, Justin Pryzby wrote:
On Thu, Apr 22, 2021 at 12:13:17AM +0900, Fujii Masao wrote:
On 2021/04/21 23:53, Justin Pryzby wrote:
Or:
Enabling this parameter may incur a noticeable performance penalty,
especially similar queries are executed by many concurrent connections
and compete to update a small number of pg_stat_statements entries.
I prefer this. But what about using "identical" instead of "similar"
because pg_stat_statements docs already uses "identical" in some places?
I also missed "when", again...
Enabling this parameter may incur a noticeable performance penalty,
especially when queries with identical structure are executed by many concurrent connections
which compete to update a small number of pg_stat_statements entries.
Checking back - here's the latest patch.
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 930081c429..9e98472c5c 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -696,8 +696,9 @@
<varname>pg_stat_statements.track_planning</varname> controls whether
planning operations and duration are tracked by the module.
Enabling this parameter may incur a noticeable performance penalty,
- especially when queries with the same queryid are executed on many
- concurrent connections.
+ especially when queries with identical structure are executed by many
+ concurrent connections which compete to update a small number of
+ pg_stat_statements entries.
The default value is <literal>off</literal>.
Only superusers can change this setting.
</para>
On Tue, Jun 29, 2021 at 10:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Checking back - here's the latest patch.
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 930081c429..9e98472c5c 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -696,8 +696,9 @@
 <varname>pg_stat_statements.track_planning</varname> controls whether
 planning operations and duration are tracked by the module.
 Enabling this parameter may incur a noticeable performance penalty,
- especially when queries with the same queryid are executed on many
- concurrent connections.
+ especially when queries with identical structure are executed by many
+ concurrent connections which compete to update a small number of
+ pg_stat_statements entries.
 The default value is <literal>off</literal>.
 Only superusers can change this setting.
 </para>
Is "identical structure" really accurate here? For instance a multi-tenant
application could rely on the search_path and only use unqualified relation
names. So while they have queries with identical structure, those will
generate a large number of different query_id.
On Tue, Jun 29, 2021 at 10:29:43AM +0800, Julien Rouhaud wrote:
On Tue, Jun 29, 2021 at 10:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Is "identical structure" really accurate here? For instance a multi-tenant
application could rely on the search_path and only use unqualified relation
names. So while they have queries with identical structure, those will
generate a large number of different query_id.
We borrowed that language from the previous text:
| Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) are combined into a single pg_stat_statements entry whenever they have identical query structures according to an internal hash calculation
Note that it continues to say:
|In some cases, queries with visibly different texts might get merged into a single pg_stat_statements entry. Normally this will happen only for semantically equivalent queries, but there is a small chance of hash collisions causing unrelated queries to be merged into one entry. (This cannot happen for queries belonging to different users or databases, however.)
|
|Since the queryid hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings.
Really, I'm only trying to fix where it currently says "a fewer kinds".
It looks like I'd sent the wrong diff (git diff with a previous patch applied).
I think this is the latest proposal:
Enabling this parameter may incur a noticeable performance penalty,
- especially when a fewer kinds of queries are executed on many
- concurrent connections.
+ especially when queries with identical structure are executed by many
+ concurrent connections which compete to update a small number of
+ pg_stat_statements entries.
It could say "identical structure" or "the same queryid" or "identical queryid".
--
Justin
On Tue, Jun 29, 2021 at 10:45 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
We borrowed that language from the previous text:
| Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) are combined into a single pg_stat_statements entry whenever they have identical query structures according to an internal hash calculation
Yes, but here it's "identical query structure", which seems less
ambiguous than "identical structure", as I think one could take the
latter to refer to the internal representation as much as to the query
text. And it also removes any doubt with the final "internal hash
calculation".
Really, I'm only trying to fix where it currently says "a fewer kinds".
I agree that "fewer kinds" should be improved.
Enabling this parameter may incur a noticeable performance penalty,
- especially when a fewer kinds of queries are executed on many
- concurrent connections.
+ especially when queries with identical structure are executed by many
+ concurrent connections which compete to update a small number of
+ pg_stat_statements entries.
It could say "identical structure" or "the same queryid" or "identical queryid".
I think we should try to reuse the previous formulation. How about
"statements with identical query structure"? Or replace query
structure with "internal representation", in both places?
On 2021/06/30 0:12, Julien Rouhaud wrote:
Enabling this parameter may incur a noticeable performance penalty,
- especially when a fewer kinds of queries are executed on many
- concurrent connections.
+ especially when queries with identical structure are executed by many
+ concurrent connections which compete to update a small number of
+ pg_stat_statements entries.
It could say "identical structure" or "the same queryid" or "identical queryid".
I think we should try to reuse the previous formulation. How about
"statements with identical query structure"?
I'm fine with this. So what about the following diff? I added <structname> tag.
<varname>pg_stat_statements.track_planning</varname> controls whether
planning operations and duration are tracked by the module.
Enabling this parameter may incur a noticeable performance penalty,
- especially when a fewer kinds of queries are executed on many
- concurrent connections.
+ especially when statements with identical query structure are executed
+ by many concurrent connections which compete to update a small number of
+ <structname>pg_stat_statements</structname> entries.
The default value is <literal>off</literal>.
Only superusers can change this setting.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Jul 1, 2021 at 4:28 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I'm fine with this. So what about the following diff? I added <structname> tag.
<varname>pg_stat_statements.track_planning</varname> controls whether
planning operations and duration are tracked by the module.
Enabling this parameter may incur a noticeable performance penalty,
- especially when a fewer kinds of queries are executed on many
- concurrent connections.
+ especially when statements with identical query structure are executed
+ by many concurrent connections which compete to update a small number of
+ <structname>pg_stat_statements</structname> entries.
The default value is <literal>off</literal>.
Only superusers can change this setting.
It seems perfect, thanks!
On 2021/07/07 18:09, Julien Rouhaud wrote:
On Thu, Jul 1, 2021 at 4:28 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I'm fine with this. So what about the following diff? I added <structname> tag.
<varname>pg_stat_statements.track_planning</varname> controls whether
planning operations and duration are tracked by the module.
Enabling this parameter may incur a noticeable performance penalty,
- especially when a fewer kinds of queries are executed on many
- concurrent connections.
+ especially when statements with identical query structure are executed
+ by many concurrent connections which compete to update a small number of
+ <structname>pg_stat_statements</structname> entries.
The default value is <literal>off</literal>.
Only superusers can change this setting.
It seems perfect, thanks!
Pushed. Thanks!
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION