Scaling PostgreSQL at multicore Power8

Started by Yury Zhuravlevover 10 years ago11 messageshackers
Jump to latest
#1Yury Zhuravlev
u.zhuravlev@postgrespro.ru

Hello hackers

Recently, we were given access to the test server is IBM, 9119-MHE with 8 CPUs
* 8 cores * 8 threads. We decided to take advantage of this and to find
bottlenecks for read scalability (pgbench -S).

All detail you can read here:
http://www.postgrespro.ru/blog/pgsql/2015/08/30/p8scaling

Performance 9.4 stopped growing after 100 clients, and 9.5 / 9.6 stopped after
150 (at 4 NUMA nodes). After research using pref we saw that inhibits
ProcArrayLock in GetSnaphotData. But inserting the stub instead of
GetSnapshotData not significantly increased scalability. Trying to find the
bottleneck with gdb, we found another place. We have noticed s_lock in
PinBuffer and UnpinBuffer. For the test we rewrited PinBuffer and UnpinBuffer
by atomic operations and we liked the result. Degradation of performance
almost completely disappeared, and went scaling up to 400 clients (4 NUMA
nodes with 256 "CPUs").

To scale Postgres for large NUMA machine must be ported to the atomic
operations bufmgr. During our tests, we no found errors in our patch, but most
likely it is not true and this patch only for test.

Who has any thoughts?

--
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

atomic_bufmgr_v3.patchtext/x-patch; charset=UTF-8; name=atomic_bufmgr_v3.patchDownload+120-118
#2Andres Freund
andres@anarazel.de
In reply to: Yury Zhuravlev (#1)
Re: Scaling PostgreSQL at multicore Power8

On 2015-08-31 13:54:57 +0300, YUriy Zhuravlev wrote:

We have noticed s_lock in PinBuffer and UnpinBuffer. For the test we
rewrited PinBuffer and UnpinBuffer by atomic operations and we liked
the result. Degradation of performance almost completely disappeared,
and went scaling up to 400 clients (4 NUMA nodes with 256 "CPUs").

To scale Postgres for large NUMA machine must be ported to the atomic
operations bufmgr. During our tests, we no found errors in our patch, but most
likely it is not true and this patch only for test.

I agree that this is necessary, and it matches with what I've
profiled.

Unfortunately I don't think the patch can be quite as simple as
presented here - we rely on the exclusion provided by the spinlock in a
bunch of places for more than the manipulation of individual values. And
those currently are only correct if there's no other possible
manipulations going on. But it's definitely doable.

I've initial patch doing this, but for me at it seemed to be necessary
to merge flags, usage and refcount into a single value - otherwise the
consistency is hard to maintain because you can't do a CAS over all
values.

diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 3ae2848..5fdaca7 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -95,9 +95,9 @@ InitBufferPool(void)
BufferDesc *buf = GetBufferDescriptor(i);
CLEAR_BUFFERTAG(buf->tag);
-			buf->flags = 0;
-			buf->usage_count = 0;
-			buf->refcount = 0;
+			buf->flags.value = 0;
+			buf->usage_count.value = 0;
+			buf->refcount.value = 0;
buf->wait_backend_pid = 0;

That's definitely not correct, you should initialize the atomics using
pg_atomic_init_u32() and write to by using pg_atomic_write_u32() - not
access them directly. This breaks the fallback paths.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Yury Zhuravlev
u.zhuravlev@postgrespro.ru
In reply to: Andres Freund (#2)
Re: Scaling PostgreSQL at multicore Power8

On Monday 31 August 2015 13:03:07 you wrote:

That's definitely not correct, you should initialize the atomics using
pg_atomic_init_u32() and write to by using pg_atomic_write_u32() - not
access them directly. This breaks the fallback paths.

You right. Now it's just to silence the compiler.
This patch is concept only.

--
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Yury Zhuravlev (#1)
Re: Scaling PostgreSQL at multicore Power8

On 08/31/2015 12:54 PM, YUriy Zhuravlev wrote:

Hello hackers

Recently, we were given access to the test server is IBM, 9119-MHE with 8 CPUs
* 8 cores * 8 threads. We decided to take advantage of this and to find
bottlenecks for read scalability (pgbench -S).

All detail you can read here:
http://www.postgrespro.ru/blog/pgsql/2015/08/30/p8scaling

Performance 9.4 stopped growing after 100 clients, and 9.5 / 9.6 stopped after
150 (at 4 NUMA nodes). After research using pref we saw that inhibits
ProcArrayLock in GetSnaphotData. But inserting the stub instead of
GetSnapshotData not significantly increased scalability. Trying to find the
bottleneck with gdb, we found another place. We have noticed s_lock in
PinBuffer and UnpinBuffer. For the test we rewrited PinBuffer and UnpinBuffer
by atomic operations and we liked the result. Degradation of performance
almost completely disappeared, and went scaling up to 400 clients (4 NUMA
nodes with 256 "CPUs").

To scale Postgres for large NUMA machine must be ported to the atomic
operations bufmgr. During our tests, we no found errors in our patch, but most
likely it is not true and this patch only for test.

Who has any thoughts?

Well, I could test the patch on a x86 machine with 4 sockets (64 cores),
but I wonder whether it makes sense at this point, as the patch really
is not correct (judging by what Andres says).

Also, what pgbench scale was used for the testing?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#4)
Re: Scaling PostgreSQL at multicore Power8

On 2015-08-31 17:43:08 +0200, Tomas Vondra wrote:

Well, I could test the patch on a x86 machine with 4 sockets (64 cores), but
I wonder whether it makes sense at this point, as the patch really is not
correct (judging by what Andres says).

Additionally it's, for default pgbench, really mostly a bottlneck after
GetSnapshotData() is fixed. You can make it a problem much earlier if
you have index nested loops over a lot of rows.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#5)
Re: Scaling PostgreSQL at multicore Power8

On 08/31/2015 05:48 PM, Andres Freund wrote:

On 2015-08-31 17:43:08 +0200, Tomas Vondra wrote:

Well, I could test the patch on a x86 machine with 4 sockets (64 cores), but
I wonder whether it makes sense at this point, as the patch really is not
correct (judging by what Andres says).

Additionally it's, for default pgbench, really mostly a bottlneck after
GetSnapshotData() is fixed. You can make it a problem much earlier if
you have index nested loops over a lot of rows.

[scratches head] So does this mean it's worth testing the patch on x86
or not, in it's current state? Or should we come up with another test
case, exercising the nested loops?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#6)
Re: Scaling PostgreSQL at multicore Power8

On 2015-08-31 17:54:17 +0200, Tomas Vondra wrote:

[scratches head] So does this mean it's worth testing the patch on x86 or
not, in it's current state?

You could try if you're interested. But I don't think it's super
meaningful. The patch is just a POC and rather widely incorrect.

Don't get me wrong, I think it's rather important that we fix this. But
that requires, imo, conceptual/development work right now, not
performance testing.

Or should we come up with another test case, exercising the nested
loops?

You'd need to do that, yes.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Yury Zhuravlev
u.zhuravlev@postgrespro.ru
In reply to: Tomas Vondra (#6)
Re: Scaling PostgreSQL at multicore Power8

On Monday 31 August 2015 17:54:17 Tomas Vondra wrote:

So does this mean it's worth testing the patch on x86
or not, in it's current state?

Its realy intersting. But you need have true 64 cores without HT. (32 core +HT
not have effect)

--
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Dmitry Vasilyev
d.vasilyev@postgrespro.ru
In reply to: Tomas Vondra (#4)
Re: Scaling PostgreSQL at multicore Power8

We did not got any affect on core 64 with smt = 8, and we not have a 64
-cpu x86 machine with disable HT feature.
You can set scale > 1000 and with shared_buffers >> size of index
pgbench_accounts_pkey.
You can also increase the concurrency: not only access top of b-tree
index, but also to a specific buffer: select * from pgbench_accounts
where aid = 1;

On Mon, 2015-08-31 at 17:43 +0200, Tomas Vondra wrote:

On 08/31/2015 12:54 PM, YUriy Zhuravlev wrote:

Hello hackers

Recently, we were given access to the test server is IBM, 9119-MHE
with 8 CPUs
* 8 cores * 8 threads. We decided to take advantage of this and to
find
bottlenecks for read scalability (pgbench -S).

All detail you can read here:
http://www.postgrespro.ru/blog/pgsql/2015/08/30/p8scaling

Performance 9.4 stopped growing after 100 clients, and 9.5 / 9.6
stopped after
150 (at 4 NUMA nodes). After research using pref we saw that
inhibits
ProcArrayLock in GetSnaphotData. But inserting the stub instead of
GetSnapshotData not significantly increased scalability. Trying to
find the
bottleneck with gdb, we found another place. We have noticed s_lock
in
PinBuffer and UnpinBuffer. For the test we rewrited PinBuffer and
UnpinBuffer
by atomic operations and we liked the result. Degradation of
performance
almost completely disappeared, and went scaling up to 400 clients
(4 NUMA
nodes with 256 "CPUs").

To scale Postgres for large NUMA machine must be ported to the
atomic
operations bufmgr. During our tests, we no found errors in our
patch, but most
likely it is not true and this patch only for test.

Who has any thoughts?

Well, I could test the patch on a x86 machine with 4 sockets (64
cores),
but I wonder whether it makes sense at this point, as the patch
really
is not correct (judging by what Andres says).

Also, what pgbench scale was used for the testing?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Yury Zhuravlev
u.zhuravlev@postgrespro.ru
In reply to: Andres Freund (#5)
Re: Scaling PostgreSQL at multicore Power8

On Monday 31 August 2015 17:48:50 Andres Freund wrote:

Additionally it's, for default pgbench, really mostly a bottlneck after
GetSnapshotData() is fixed. You can make it a problem much earlier if
you have index nested loops over a lot of rows.

100 000 000 is a lot? Simple select query from pgbech is common task not for
all but...
--
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Yury Zhuravlev
u.zhuravlev@postgrespro.ru
In reply to: Tomas Vondra (#4)
Re: Scaling PostgreSQL at multicore Power8

On Monday 31 August 2015 17:43:08 Tomas Vondra wrote:

Well, I could test the patch on a x86 machine with 4 sockets (64 cores),
but I wonder whether it makes sense at this point, as the patch really
is not correct (judging by what Andres says).

Can you test patch from this thread: /messages/by-id/2400449.GjM57CE0Yg@dinodell ?

In our view it is correct, although this is not obvious.

--
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers