Perf Benchmarking and regression.
I tried to do some benchmarking on postgres master head
commit 72a98a639574d2e25ed94652848555900c81a799
Author: Andres Freund <andres@anarazel.de>
Date: Tue Apr 26 20:32:51 2016 -0700
CASE : Read-Write Tests when data exceeds shared buffers.
Non Default settings and test
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9 &
./pgbench -i -s 1000 postgres
./pgbench -c $threads -j $threads -T 1800 -M prepared postgres
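The per-session numbers in the tables below come from repeating that pgbench invocation once per client count; a minimal driver sketch (the results file name and one-run-per-point are my assumptions, not necessarily what was actually used):

```shell
#!/bin/sh
# Run the prepared read-write test once per client count and record tps.
for threads in 1 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128; do
    tps=$(./pgbench -c "$threads" -j "$threads" -T 1800 -M prepared postgres |
          awk '/excluding connections/ {print $3}')
    echo "$threads $tps" >>results.txt
done
```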
Machine: "cthulhu", an 8-node NUMA machine with 128 hyperthreads.
numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 37885 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 31215 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 15331 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 36774 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 62 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 9653 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 50209 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 43966 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 21 21 21 21 21 21 21
1: 21 10 21 21 21 21 21 21
2: 21 21 10 21 21 21 21 21
3: 21 21 21 10 21 21 21 21
4: 21 21 21 21 10 21 21 21
5: 21 21 21 21 21 10 21 21
6: 21 21 21 21 21 21 10 21
7: 21 21 21 21 21 21 21 10
I see some regression when compared to 9.5
Sessions   PostgreSQL-9.5 scale 1000   PostgreSQL-9.6 scale 1000   %diff
1          747.367249                  892.149891                  19.3723557185
8          5281.282799                 4941.905008                 -6.4260484416
16         9000.915419                 8695.396233                 -3.3943123758
24         11852.839627                10843.328776                -8.5170379653
32         14323.048334                11977.505153                -16.3760054864
40         16098.926583                12195.447024                -24.2468312336
48         16959.646965                12639.951087                -25.4704351271
56         17157.737762                12543.212929                -26.894715941
64         17201.914922                12628.002422                -26.5895542487
72         16956.994835                11280.870599                -33.4736448954
80         16775.954896                11348.830603                -32.3506132834
88         16609.137558                10823.465121                -34.834273705
96         16510.099404                11091.757753                -32.8183466278
104        16275.724927                10665.743275                -34.4683980416
112        16141.815128                10977.84664                 -31.9912503461
120        15904.086614                10716.17755                 -32.6199749153
128        15738.391503                10962.333439                -30.3465450271
When I ran git bisect on master (for 128 clients), two commit IDs affected
the performance:
1. # first bad commit: [ac1d7945f866b1928c2554c0f80fd52d7f977772] Make idle
backends exit if the postmaster dies.
This made performance drop from 15947.21546 (15K+) to 13409.758510
(around 13K+).
2. # first bad commit: [428b1d6b29ca599c5700d4bc4f4ce4c5880369bf] Allow to
trigger kernel writeback after a configurable number of writes.
This made performance drop further to 10962.333439 (10K+); I do not think
it recovered afterwards.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Hi,
Thanks for benchmarking!
On 2016-05-06 19:43:52 +0530, Mithun Cy wrote:
> 1. # first bad commit: [ac1d7945f866b1928c2554c0f80fd52d7f977772] Make idle
> backends exit if the postmaster dies.
> This made performance drop from 15947.21546 (15K+) to 13409.758510 (around 13K+).
Let's debug this one first; it's a lot more local. I'm rather surprised
that you're seeing a big effect with that "few" TPS/socket operations,
and even more that our efforts to address that problem haven't been
fruitful (given we've verified the fix on a number of machines).
Can you verify that removing
AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
in src/backend/libpq/pqcomm.c : pq_init() restores performance?
I think it'd be best to test the back/forth on master with
bgwriter_flush_after = 0
checkpointer_flush_after = 0
backend_flush_after = 0
to isolate the issue.
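As a postgresql.conf fragment (one could equally pass these as -c options on the server command line; note that the GUC in the tree is spelled checkpoint_flush_after), the suggested isolation settings would be:

```ini
# Disable flush control entirely to separate its effect
# from the WL_POSTMASTER_DEATH wait-event change.
bgwriter_flush_after = 0
checkpoint_flush_after = 0
backend_flush_after = 0
```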
Also, do you see read-only workloads to be affected too?
2. # first bad commit: [428b1d6b29ca599c5700d4bc4f4ce4c5880369bf] Allow to
trigger kernel writeback after a configurable number of writes.
FWIW, it'd be very interesting to test again with a bigger
backend_flush_after setting.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, May 6, 2016 at 8:35 PM, Andres Freund <andres@anarazel.de> wrote:
> Also, do you see read-only workloads to be affected too?
Thanks. I have not tested with the specific commit IDs that showed the
performance issue, but at HEAD, commit 72a98a639574d2e25ed94652848555900c81a799
Author: Andres Freund <andres@anarazel.de>
Date: Tue Apr 26 20:32:51 2016 -0700
READ-only (prepared) tests, both when data fits into shared buffers and
when it exceeds them (shared_buffers=8GB), show that performance of master
has improved over 9.5:
Sessions   PostgreSQL-9.5 scale 300   PostgreSQL-9.6 scale 300   %diff
1          5287.561594                5213.723197                -1.396454598
8          84265.389083               84871.305689               0.719057507
16         148330.4155                158661.128315              6.9646624936
24         207062.803697              219958.12974               6.2277366155
32         265145.089888              290190.501443              9.4459269699
40         311688.752973              340000.551772              9.0833559212
48         327169.9673                372408.073033              13.8270960829
56         274426.530496              390629.24948               42.3438356248
64         261777.692042              384613.9666                46.9238893505
72         210747.55937               376390.162022              78.5976374517
80         220192.818648              398128.779329              80.8091570713
88         185176.91888               423906.711882              128.9198429512
96         161579.719039              421541.656474              160.8877271115
104        146935.568434              450672.740567              206.7145316618
112        136605.466232              432047.309248              216.2738074582
120        127687.175016              455458.086889              256.6983816753
128        120413.936453              428127.879242              255.5467845776
Sessions   PostgreSQL-9.5 scale 1000   PostgreSQL-9.6 scale 1000   %diff
1          5103.812202                 5155.434808                 1.01145191
8          47741.9041                  53117.805096                11.2603405694
16         89722.57031                 86965.10079                 -3.0733287182
24         130914.537373               153849.634245               17.5191367836
32         197125.725706               212454.474264               7.7761279017
40         248489.551052               270304.093767               8.7788571482
48         291884.652232               317257.836746               8.6928806705
56         304526.216047               359676.785476               18.1102862489
64         301440.463174               388324.710185               28.8230206709
72         194239.941979               393676.628802               102.6754254511
80         144879.527847               383365.678053               164.6099719885
88         122894.325326               372905.436117               203.4358463076
96         109836.31148                362208.867756               229.7715144249
104        103791.981583               352330.402278               239.4582094921
112        105189.206682               345722.499429               228.6672752217
120        108095.811432               342597.969088               216.939171416
128        113242.59492                333821.98763                194.7848270925
Even for READ-WRITE, when data fits into shared buffers (scale_factor=300
and shared_buffers=8GB), performance has improved. The only case where I
see some regression is when data exceeds shared_buffers (scale_factor=1000
and shared_buffers=8GB).
I will try to run the tests as you have suggested and will report the same.
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2016-05-06 21:21:11 +0530, Mithun Cy wrote:
> I will try to run the tests as you have suggested and will report the same.
Any news on that front?
Regards,
Andres
Hi Andres,
I am extremely sorry for the delayed response. As suggested, I have taken
the performance readings at 128 client count after making the following
two changes:
1) Removed "AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL,
NULL);" from pq_init(). Below is the git diff for the same.
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 8d6eb0b..399d54b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -206,7 +206,9 @@ pq_init(void)
AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
MyProcPort->sock,
NULL, NULL);
AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
+#if 0
AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+#endif
2) Disabled the GUC variables "bgwriter_flush_after",
"checkpoint_flush_after" and "backend_flush_after" by setting them to
zero.
After making the above two changes, below are the readings I got for 128
client count:
CASE: Read-Write Tests when data exceeds shared buffers.
Non Default settings and test
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9 &
./pgbench -i -s 1000 postgres
./pgbench -c 128 -j 128 -T 1800 -M prepared postgres
Run1: tps = 9690.678225
Run2: tps = 9904.320645
Run3: tps = 9943.547176
Please let me know if I need to take readings with other client counts as
well.
Note: I have taken these readings on postgres master HEAD at,
commit 91fd1df4aad2141859310564b498a3e28055ee28
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun May 8 16:53:55 2016 -0400
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Wed, May 11, 2016 at 3:53 AM, Andres Freund <andres@anarazel.de> wrote:
> Any news on that front?
On Wed, May 11, 2016 at 12:51 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> After making the above two changes, below are the readings I got for 128
> client count:
> Run1: tps = 9690.678225
> Run2: tps = 9904.320645
> Run3: tps = 9943.547176
> Please let me know if I need to take readings with other client counts as
> well.
Can you please take four new sets of readings, like this:
- Unpatched master, default *_flush_after
- Unpatched master, *_flush_after=0
- That line removed with #if 0, default *_flush_after
- That line removed with #if 0, *_flush_after=0
128 clients is fine. But I want to see four sets of numbers that were
all taken by the same person at the same time using the same script.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
Please find the test results for the following set of combinations taken at
128 client counts:
1) Unpatched master, default *_flush_after : TPS = 10925.882396
2) Unpatched master, *_flush_after=0 : TPS = 18613.343529
3) That line removed with #if 0, default *_flush_after : TPS = 9856.809278
4) That line removed with #if 0, *_flush_after=0 : TPS = 18158.648023
Here, "that line" refers to "AddWaitEventToSet(FeBeWaitSet,
WL_POSTMASTER_DEATH, -1, NULL, NULL);" in pq_init().
Please note that earlier I had taken readings with the data directory and
the pg_xlog directory at the same location on HDD. This time I have moved
pg_xlog to an SSD and taken the readings. With pg_xlog and the data
directory at the same location on HDD I was seeing much lower performance;
for the "That line removed with #if 0, *_flush_after=0" case I was getting
7367.709378 tps.
Also, the commit id on which I have taken the above readings, along with
the pgbench commands used, are mentioned below:
commit 8a13d5e6d1bb9ff9460c72992657077e57e30c32
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed May 11 17:06:53 2016 -0400
Fix infer_arbiter_indexes() to not barf on system columns.
Non Default settings and test:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9 &
./pgbench -i -s 1000 postgres
./pgbench -c 128 -j 128 -T 1800 -M prepared postgres
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 12, 2016 at 9:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Can you please take four new sets of readings, like this:
> - Unpatched master, default *_flush_after
> - Unpatched master, *_flush_after=0
> - That line removed with #if 0, default *_flush_after
> - That line removed with #if 0, *_flush_after=0
> 128 clients is fine. But I want to see four sets of numbers that were
> all taken by the same person at the same time using the same script.
On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Please find the test results for the following set of combinations taken at
> 128 client count:
> 1) Unpatched master, default *_flush_after : TPS = 10925.882396
> 2) Unpatched master, *_flush_after=0 : TPS = 18613.343529
> 3) That line removed with #if 0, default *_flush_after : TPS = 9856.809278
> 4) That line removed with #if 0, *_flush_after=0 : TPS = 18158.648023
I'm getting increasingly unhappy about the checkpoint flush control.
I saw major regressions on my parallel COPY test, too:
/messages/by-id/CA+TgmoYoUQf9cGcpgyGNgZQHcY-gCcKRyAqQtDU8KFE4N6HVkA@mail.gmail.com
That was a completely different machine (POWER7 instead of Intel,
lousy disks instead of good ones) and a completely different workload.
Considering these results, I think there's now plenty of evidence to
suggest that this feature is going to be horrible for a large number
of users. A 45% regression on pgbench is horrible. (Nobody wants to
take even a 1% hit for snapshot too old, right?) Sure, it might not
be that way for every user on every Linux system, and I'm sure it
performed well on the systems where Andres benchmarked it, or he
wouldn't have committed it. But our goal can't be to run well only on
the newest hardware with the least-buggy kernel...
> Here, "that line" refers to "AddWaitEventToSet(FeBeWaitSet,
> WL_POSTMASTER_DEATH, -1, NULL, NULL);" in pq_init().
Given the above results, it's not clear whether that is making things
better or worse.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2016-05-12 18:09:07 +0530, Ashutosh Sharma wrote:
> Please find the test results for the following set of combinations taken at
> 128 client count:
Thanks.
> 1) Unpatched master, default *_flush_after : TPS = 10925.882396
Could you run this one with a number of different backend_flush_after
settings? I suspect the primary issue is that the default is too low.
Greetings,
Andres Freund
On Thu, May 12, 2016 at 11:13 AM, Andres Freund <andres@anarazel.de> wrote:
> Could you run this one with a number of different backend_flush_after
> settings? I suspect the primary issue is that the default is too low.
What values do you think would be good to test? Maybe provide 3 or 4
suggested values to try?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-05-12 11:27:31 -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 11:13 AM, Andres Freund <andres@anarazel.de> wrote:
> > Could you run this one with a number of different backend_flush_after
> > settings? I suspect the primary issue is that the default is too low.
> What values do you think would be good to test? Maybe provide 3 or 4
> suggested values to try?
0 (disabled), 16 (current default), 32, 64, 128, 256?
I suspect that only backend_flush_after has these negative performance
implications at this point. One path is to increase that option's default
value; another is to disable only backend-guided flushing, and add a
strong hint that if you care about predictable throughput you might want
to enable it.
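A sweep over those candidate values could look like the sketch below (the restart procedure and the reuse of the original server flags are my assumptions). Note the GUC is in units of 8kB pages, so 16 means a 128kB flush window:

```shell
#!/bin/sh
# Benchmark each candidate backend_flush_after value (8kB pages; 0 disables).
for pages in 0 16 32 64 128 256; do
    kb=$((pages * 8))        # window size in kB, e.g. 16 pages -> 128kB
    pg_ctl -D "$PGDATA" -w restart -o "-c backend_flush_after=$pages"
    tps=$(./pgbench -c 128 -j 128 -T 1800 -M prepared postgres |
          awk '/excluding connections/ {print $3}')
    echo "backend_flush_after=$pages (${kb}kB): tps=$tps"
done
```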
Greetings,
Andres Freund
On 2016-05-12 10:49:06 -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > Please find the test results for the following set of combinations taken at
> > 128 client count:
> > 1) Unpatched master, default *_flush_after : TPS = 10925.882396
> > 2) Unpatched master, *_flush_after=0 : TPS = 18613.343529
> > 3) That line removed with #if 0, default *_flush_after : TPS = 9856.809278
> > 4) That line removed with #if 0, *_flush_after=0 : TPS = 18158.648023
> I'm getting increasingly unhappy about the checkpoint flush control.
> I saw major regressions on my parallel COPY test, too:
Yes, I'm concerned too.
The workload in this thread is a bit of an "artificial" workload (all
data is constantly updated, doesn't fit into shared_buffers, fits into
the OS page cache), and only measures throughput not latency. But I
agree that that's way too large a regression to accept, and that there's
a significant number of machines with way undersized shared_buffer
values.
> /messages/by-id/CA+TgmoYoUQf9cGcpgyGNgZQHcY-gCcKRyAqQtDU8KFE4N6HVkA@mail.gmail.com
> That was a completely different machine (POWER7 instead of Intel,
> lousy disks instead of good ones) and a completely different workload.
> Considering these results, I think there's now plenty of evidence to
> suggest that this feature is going to be horrible for a large number
> of users. A 45% regression on pgbench is horrible.
I asked you over there whether you could benchmark with just different
values for backend_flush_after... I chose the current value because it
gives the best latency / most consistent throughput numbers, but 128kb
isn't a large window. I suspect we might need to disable backend guided
flushing if that's not sufficient :(
> > Here, "that line" refers to "AddWaitEventToSet(FeBeWaitSet,
> > WL_POSTMASTER_DEATH, -1, NULL, NULL);" in pq_init().
> Given the above results, it's not clear whether that is making things
> better or worse.
Yea, me neither. I think it's doubtful that you'd see a performance
difference due to the original ac1d7945f866b1928c2554c0f80fd52d7f977772,
independent of the WaitEventSet stuff, at these throughput rates.
Greetings,
Andres Freund
> > I'm getting increasingly unhappy about the checkpoint flush control.
> > I saw major regressions on my parallel COPY test, too:
> Yes, I'm concerned too.
A few thoughts:
- Focussing on raw tps is not a good idea, because it may be a lot of tps
  followed by a sync panic with an unresponsive database. I wish the
  performance reports would include some indication of the distribution
  (e.g. min/q1/median/q3/max tps per second seen, standard deviation), not
  just the final "tps" figure.
- Checkpoint flush control (checkpoint_flush_after) should mostly always
  be beneficial, because it flushes sorted data. I would be surprised to
  see significant regressions with this on. A lot of tests showed somewhat
  improved tps, but mostly greatly improved performance stability: a
  database that was unresponsive 60% of the time (60% of seconds in the
  tps log show very low or zero tps) becomes always responsive.
- The other flush controls ({backend,bgwriter}_flush_after) may just
  increase random writes, so they are riskier in nature because the data
  is not sorted, and they may or may not be a good idea depending on
  detailed conditions. A "parallel copy" would be just such a special IO
  load which degrades performance under these settings. Maybe these two
  should be disabled by default because they lead to possibly surprising
  regressions?
- For any particular load, the admin can decide to disable these if they
  think it is better not to flush. Also, as suggested by Andres, with 128
  parallel queries the default value may not be appropriate at all.
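The distribution asked for above can be computed from pgbench's per-second progress output (pgbench run with -P 1); a small sketch, assuming the stock "progress: ... tps ..." line format:

```python
import re
import statistics

def tps_distribution(progress_lines):
    """Summarize per-second tps from pgbench -P 1 progress lines."""
    tps = [float(m.group(1))
           for line in progress_lines
           if (m := re.search(r"([\d.]+) tps", line))]
    tps.sort()
    # Quartile cut points of the per-second throughput samples.
    q1, median, q3 = statistics.quantiles(tps, n=4)
    return {
        "min": tps[0], "q1": q1, "median": median, "q3": q3,
        "max": tps[-1], "stddev": statistics.pstdev(tps),
    }
```

Seconds of near-zero tps (the "unresponsive database" case) then show up directly as a min or q1 far below the median, which a single aggregate tps figure hides.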
--
Fabien.
Hi,
Following are the performance results for the read-write test observed
with different values of "backend_flush_after":
1) backend_flush_after = 256kB (32*8kB), tps = 10841.178815
2) backend_flush_after = 512kB (64*8kB), tps = 11098.702707
3) backend_flush_after = 1MB (128*8kB), tps = 11434.964545
4) backend_flush_after = 2MB (256*8kB), tps = 13477.089417
Note: The above test has been performed on unpatched master with default
values for checkpoint_flush_after, bgwriter_flush_after
and wal_writer_flush_after.
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 12, 2016 at 9:20 PM, Andres Freund <andres@anarazel.de> wrote:
> 0 (disabled), 16 (current default), 32, 64, 128, 256?
On Fri, May 13, 2016 at 7:08 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> Following are the performance results for the read-write test observed
> with different values of "backend_flush_after":
> 1) backend_flush_after = 256kB (32*8kB), tps = 10841.178815
> 2) backend_flush_after = 512kB (64*8kB), tps = 11098.702707
> 3) backend_flush_after = 1MB (128*8kB), tps = 11434.964545
> 4) backend_flush_after = 2MB (256*8kB), tps = 13477.089417
So even at 2MB we don't come close to recovering all of the lost
performance. Can you please test these three scenarios?
1. Default settings for *_flush_after
2. backend_flush_after=0, rest defaults
3. backend_flush_after=0, bgwriter_flush_after=0,
wal_writer_flush_after=0, checkpoint_flush_after=0
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-05-13 10:20:04 -0400, Robert Haas wrote:
> On Fri, May 13, 2016 at 7:08 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > Following are the performance results for the read-write test observed
> > with different values of "backend_flush_after":
> > 1) backend_flush_after = 256kB (32*8kB), tps = 10841.178815
> > 2) backend_flush_after = 512kB (64*8kB), tps = 11098.702707
> > 3) backend_flush_after = 1MB (128*8kB), tps = 11434.964545
> > 4) backend_flush_after = 2MB (256*8kB), tps = 13477.089417
> So even at 2MB we don't come close to recovering all of the lost
> performance. Can you please test these three scenarios?
> 1. Default settings for *_flush_after
> 2. backend_flush_after=0, rest defaults
> 3. backend_flush_after=0, bgwriter_flush_after=0,
>    wal_writer_flush_after=0, checkpoint_flush_after=0
4) 1) + a shared_buffers setting appropriate to the workload.
I just want to emphasize what we're discussing here is a bit of an
extreme setup. A workload that's bigger than shared buffers, but smaller
than the OS's cache size; with a noticeable likelihood of rewriting
individual OS page cache pages within 30s.
Greetings,
Andres Freund
On Fri, May 13, 2016 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-05-13 10:20:04 -0400, Robert Haas wrote:
> > So even at 2MB we don't come close to recovering all of the lost
> > performance. Can you please test these three scenarios?
> > 1. Default settings for *_flush_after
> > 2. backend_flush_after=0, rest defaults
> > 3. backend_flush_after=0, bgwriter_flush_after=0,
> >    wal_writer_flush_after=0, checkpoint_flush_after=0
> 4) 1) + a shared_buffers setting appropriate to the workload.
> I just want to emphasize what we're discussing here is a bit of an
> extreme setup. A workload that's bigger than shared buffers, but smaller
> than the OS's cache size; with a noticeable likelihood of rewriting
> individual OS page cache pages within 30s.
You're just describing pgbench with a scale factor too large to fit in
shared_buffers. I think it's unfair to paint that as some kind of
niche use case.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-05-13 14:43:15 -0400, Robert Haas wrote:
> On Fri, May 13, 2016 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
> > I just want to emphasize what we're discussing here is a bit of an
> > extreme setup. A workload that's bigger than shared buffers, but smaller
> > than the OS's cache size; with a noticeable likelihood of rewriting
> > individual OS page cache pages within 30s.
> You're just describing pgbench with a scale factor too large to fit in
> shared_buffers.
Well, that *and* a scale factor smaller than 20% of the memory available,
*and* a scale factor small enough to make re-dirtying of already
written-out pages likely.
> I think it's unfair to paint that as some kind of niche use case.
I'm not saying we don't need to do something about it, just that it's a
hard tradeoff to make. The massive performance / latency issues we've
observed originate from the kernel caching too much dirty IO; the fix is
making it cache fewer dirty pages. But there are workloads where the
kernel's buffer cache works as an extension of our page cache.
On Fri, May 13, 2016 at 11:13 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-05-13 10:20:04 -0400, Robert Haas wrote:
> > So even at 2MB we don't come close to recovering all of the lost
> > performance. Can you please test these three scenarios?
> > 1. Default settings for *_flush_after
> > 2. backend_flush_after=0, rest defaults
> > 3. backend_flush_after=0, bgwriter_flush_after=0,
> >    wal_writer_flush_after=0, checkpoint_flush_after=0
> 4) 1) + a shared_buffers setting appropriate to the workload.
If by the 4th point you mean the case when data fits in shared buffers,
then Mithun has already reported above [1] that he didn't see any
regression for that case.
[1]: /messages/by-id/CAD__OuiObzNVTt_hO__P5AEnU4iNqcFWgArXR4TbLKe-UXyukQ@mail.gmail.com
Read line - "Even for READ-WRITE, when data fits into shared buffers
(scale_factor=300 and shared_buffers=8GB), performance has improved."
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi,
Please find the results for the following 3 scenarios with unpatched master:
1. Default settings for *_flush_after : TPS = 10677.662356
2. backend_flush_after=0, rest defaults : TPS = 18452.655936
3. backend_flush_after=0, bgwriter_flush_after=0,
   wal_writer_flush_after=0, checkpoint_flush_after=0 : TPS = 18614.479962
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Fri, May 13, 2016 at 7:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> So even at 2MB we don't come close to recovering all of the lost
> performance. Can you please test these three scenarios?
> 1. Default settings for *_flush_after
> 2. backend_flush_after=0, rest defaults
> 3. backend_flush_after=0, bgwriter_flush_after=0,
>    wal_writer_flush_after=0, checkpoint_flush_after=0