Performance degradation in commit ac1d794

Started by Dmitry Vasilyev · over 10 years ago · 94 messages · pgsql-hackers
#1Dmitry Vasilyev
d.vasilyev@postgrespro.ru

Hello hackers!

I was surprised to find that commit ac1d794 causes up to a 3x performance degradation.

I ran pgbench -s 1000 -j 48 -c 48 -S -M prepared on a 70-core machine:
commit ac1d794 gives me 363,474 tps,
the previous commit a05dc4d gives me 956,146,
and master (3d0c50f) with ac1d794 reverted gives me 969,265.

it's shocking

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#2Andres Freund
andres@anarazel.de
In reply to: Dmitry Vasilyev (#1)
Re: Performance degradation in commit ac1d794

On December 25, 2015 6:08:15 PM GMT+01:00, "Dmitry Vasilyev" <d.vasilyev@postgrespro.ru> wrote:

Hello hackers!

I suddenly found commit ac1d794 gives up to 3 times performance
degradation.

I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70
CPU-core
machine:
commit ac1d794 gives me 363,474 tps
and previous commit a05dc4d gives me 956,146
and master( 3d0c50f ) with revert ac1d794 gives me 969,265

it's shocking

You're talking about http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/libpq/be-secure.c;h=2ddcf428f89fd12c230d6f417c2f707fbd97bf39;hp=26d8faaf773a818b388b899b8d83d617bdf7af9b;hb=ac1d794;hpb=a05dc4d7fd57d4ae084c1f0801973e5c1a1aa26e

If so, could you provide a hierarchical before/after profile?

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Dmitry Vasilyev
d.vasilyev@postgrespro.ru
In reply to: Andres Freund (#2)
Re: Performance degradation in commit ac1d794

2015-12-25 20:18 GMT+03:00 Andres Freund <andres@anarazel.de>:

On December 25, 2015 6:08:15 PM GMT+01:00, "Dmitry Vasilyev" <
d.vasilyev@postgrespro.ru> wrote:

Hello hackers!

I suddenly found commit ac1d794 gives up to 3 times performance
degradation.

I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70
CPU-core
machine:
commit ac1d794 gives me 363,474 tps
and previous commit a05dc4d gives me 956,146
and master( 3d0c50f ) with revert ac1d794 gives me 969,265

it's shocking


You're talking about
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/libpq/be-secure.c;h=2ddcf428f89fd12c230d6f417c2f707fbd97bf39;hp=26d8faaf773a818b388b899b8d83d617bdf7af9b;hb=ac1d794;hpb=a05dc4d7fd57d4ae084c1f0801973e5c1a1aa26e

If so, could you provide a hierarchical before/after profile?

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile
phone.

> You're talking about
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/backend/libpq/be-secure.c;h=2ddcf428f89fd12c230d6f417c2f707fbd97bf39;hp=26d8faaf773a818b388b899b8d83d617bdf7af9b;hb=ac1d794;hpb=a05dc4d7fd57d4ae084c1f0801973e5c1a1aa26e

Yes, about this.

If so, could you provide a hierarchical before/after profile?

Performance | Git commit hash                           | Date
~360k tps   | c3e7c24a1d60dc6ad56e2a0723399f1570c54224 | Thu Nov 12 09:12:18 2015 -0500
~360k tps   | ac1d7945f866b1928c2554c0f80fd52d7f977772 | Thu Nov 12 09:00:33 2015 -0500
~960k tps   | a05dc4d7fd57d4ae084c1f0801973e5c1a1aa26e | Thu Nov 12 07:40:31 2015 -0500

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#4Dmitry Vasilyev
d.vasilyev@postgrespro.ru
In reply to: Dmitry Vasilyev (#3)
Re: Performance degradation in commit ac1d794

2015-12-25 20:27 GMT+03:00 Dmitry Vasilyev <d.vasilyev@postgrespro.ru>:

I came across it while studying these results:

vanilla 9.5 = 30020c3fc3b6de5592978313df929d370f5770ce
vanilla 9.6 = c4a8812cf64b142685e39a69694c5276601f40e4


info | clients | tps
-----------------------+---------+---------
vanilla 9.5 | 1 | 30321
vanilla 9.5 | 8 | 216542
vanilla 9.5 | 16 | 412526
vanilla 9.5 | 32 | 751331
vanilla 9.5 | 48 | 956146
 <- this point
vanilla 9.5 | 56 | 990122
vanilla 9.5 | 64 | 842436
vanilla 9.5 | 72 | 913272
vanilla 9.5 | 82 | 659332
vanilla 9.5 | 92 | 630111
vanilla 9.5 | 96 | 616863
vanilla 9.5 | 110 | 592080
vanilla 9.5 | 120 | 575831
vanilla 9.5 | 130 | 557521
vanilla 9.5 | 140 | 537951
vanilla 9.5 | 150 | 517019
vanilla 9.5 | 160 | 502312
vanilla 9.5 | 170 | 489162
vanilla 9.5 | 180 | 477178
vanilla 9.5 | 190 | 464620
vanilla 9.6 | 1 | 31738
vanilla 9.6 | 8 | 219692
vanilla 9.6 | 16 | 422933
vanilla 9.6 | 32 | 375546
vanilla 9.6 | 48 | 363474
 <- this point
vanilla 9.6 | 56 | 352943
vanilla 9.6 | 64 | 334498
vanilla 9.6 | 72 | 369802
vanilla 9.6 | 82 | 604867
vanilla 9.6 | 92 | 871048
vanilla 9.6 | 96 | 969265
vanilla 9.6 | 105 | 996794
vanilla 9.6 | 110 | 932853
vanilla 9.6 | 115 | 758485
vanilla 9.6 | 120 | 721365
vanilla 9.6 | 125 | 632265
vanilla 9.6 | 130 | 624666
vanilla 9.6 | 135 | 582120
vanilla 9.6 | 140 | 583080
vanilla 9.6 | 150 | 555608
vanilla 9.6 | 160 | 533340
vanilla 9.6 | 170 | 520308
vanilla 9.6 | 180 | 504536
vanilla 9.6 | 190 | 496967

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#5Andres Freund
andres@anarazel.de
In reply to: Dmitry Vasilyev (#3)
Re: Performance degradation in commit ac1d794

On December 25, 2015 6:27:06 PM GMT+01:00, "Dmitry Vasilyev" <d.vasilyev@postgrespro.ru> wrote:

If so, could you provide a hierarchical before/after profile?

Performance | Git commit hash                           | Date
~360k tps   | c3e7c24a1d60dc6ad56e2a0723399f1570c54224 | Thu Nov 12 09:12:18 2015 -0500
~360k tps   | ac1d7945f866b1928c2554c0f80fd52d7f977772 | Thu Nov 12 09:00:33 2015 -0500
~960k tps   | a05dc4d7fd57d4ae084c1f0801973e5c1a1aa26e | Thu Nov 12 07:40:31 2015 -0500

Profile as in perf, oprofile, or something similar.

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


#6Dmitry Vasilyev
d.vasilyev@postgrespro.ru
In reply to: Andres Freund (#5)
Re: Performance degradation in commit ac1d794


2015-12-25 20:44 GMT+03:00 Andres Freund <andres@anarazel.de>:

Profile as in perf, oprofile, or something similar.

ac1d794:

Samples: 1M of event 'cycles', Event count (approx.): 816922259995, UID: pgpro
Overhead  Shared Object  Symbol

69,72% [kernel] [k] _raw_spin_lock_irqsave
1,43% postgres [.] _bt_compare
1,19% postgres [.] LWLockAcquire
0,99% postgres [.] hash_search_with_hash_value
0,61% postgres [.] PinBuffer

0,46% postgres [.] GetSnapshotData

a05dc4d:

Samples: 1M of event 'cycles', Event count (approx.): 508150718694, UID: pgpro
Overhead  Shared Object  Symbol

4,77% postgres [.] GetSnapshotData
4,30% postgres [.] _bt_compare
3,13% postgres [.] hash_search_with_hash_value
3,08% postgres [.] LWLockAcquire
2,09% postgres [.] LWLockRelease
2,03% postgres [.] PinBuffer

Perf record generates a lot of data:

time perf record -u pgpro -g --call-graph=dwarf
^C[ perf record: Woken up 0 times to write data ]
Warning:
Processed 1078453 events and lost 18257 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 8507.985 MB perf.data (1055663 samples) ]
real 0m8.791s
user 0m0.678s
sys 0m8.120s

If you want, I can give you ssh access.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Vasilyev (#6)
Re: Performance degradation in commit ac1d794

Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:

Samples: 1M of event 'cycles', Event count (approx.): 816922259995, UID: pgpro
Overhead  Shared Object  Symbol

69,72% [kernel] [k] _raw_spin_lock_irqsave
1,43% postgres [.] _bt_compare
1,19% postgres [.] LWLockAcquire
0,99% postgres [.] hash_search_with_hash_value
0,61% postgres [.] PinBuffer

Seems like what you've got here is a kernel bug.

regards, tom lane


#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: Performance degradation in commit ac1d794

On December 25, 2015 7:10:23 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:

Samples: 1M of event 'cycles', Event count (approx.): 816922259995, UID: pgpro
Overhead  Shared Object  Symbol

69,72% [kernel] [k] _raw_spin_lock_irqsave
1,43% postgres [.] _bt_compare
1,19% postgres [.] LWLockAcquire
0,99% postgres [.] hash_search_with_hash_value
0,61% postgres [.] PinBuffer

Seems like what you've got here is a kernel bug.

I wouldn't go as far as calling it a kernel bug. We're still doing 300k tps. And we're triggering the performance degradation by adding another socket (IIRC) to the poll(2) call.

It'd certainly be interesting to see the expanded call tree below the spinlock. I wonder if this is related to directed wakeups.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: Performance degradation in commit ac1d794

Andres Freund <andres@anarazel.de> writes:

On December 25, 2015 7:10:23 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Seems like what you've got here is a kernel bug.

I wouldn't go as far as calling it a kernel bug. We're still doing 300k tps. And we're triggering the performance degradation by adding another socket (IIRC) to the poll(2) call.

Hmm. And all those FDs point to the same pipe. I wonder if we're looking
at contention for some pipe-related data structure inside the kernel.

regards, tom lane


#10Dmitry Vasilyev
d.vasilyev@postgrespro.ru
In reply to: Tom Lane (#9)
Re: Performance degradation in commit ac1d794


2015-12-25 21:28 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:

Andres Freund <andres@anarazel.de> writes:

On December 25, 2015 7:10:23 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us>

wrote:

Seems like what you've got here is a kernel bug.

I wouldn't go as far as calling it a kernel bug. We're still doing 300k
tps. And we're triggering the performance degradation by adding another
socket (IIRC) to the poll(2) call.

Hmm. And all those FDs point to the same pipe. I wonder if we're looking
at contention for some pipe-related data structure inside the kernel.

regards, tom lane

I ran bt on the backends and found them in the following state:

#0  0x00007f77b0e5bb60 in __poll_nocancel () from /lib64/libc.so.6
#1  0x00000000006a7cd0 in WaitLatchOrSocket (latch=0x7f779e2e96c4, wakeEvents=wakeEvents@entry=19, sock=9, timeout=timeout@entry=0) at pg_latch.c:333
#2  0x0000000000612c7d in secure_read (port=0x17e6af0, ptr=0xcc94a0 <PqRecvBuffer>, len=8192) at be-secure.c:147
#3  0x000000000061be36 in pq_recvbuf () at pqcomm.c:915
#4  pq_getbyte () at pqcomm.c:958
#5  0x0000000000728ad5 in SocketBackend (inBuf=0x7ffd8b6b1460) at postgres.c:345

Perf shows _raw_spin_lock_irqsave call remove_wait_queue add_wait_queue
Screenshots: http://i.imgur.com/pux2bGJ.png and
http://i.imgur.com/LJQbm2V.png

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#11Dmitry Vasilyev
d.vasilyev@postgrespro.ru
In reply to: Dmitry Vasilyev (#10)
Re: Performance degradation in commit ac1d794


2015-12-25 22:42 GMT+03:00 Dmitry Vasilyev <d.vasilyev@postgrespro.ru>:

Sorry, I meant that the remove_wait_queue and add_wait_queue functions
call _raw_spin_lock_irqsave, which is what takes most of the CPU time.

uname -a: 3.10.0-229.20.1.el7.x86_64 #1 SMP Tue Nov 3 19:10:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#12Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: Performance degradation in commit ac1d794

On 2015-12-25 13:28:55 -0500, Tom Lane wrote:

Hmm. And all those FDs point to the same pipe. I wonder if we're looking
at contention for some pipe-related data structure inside the kernel.

Sounds fairly likely - and not too surprising. In this scenario we have a
couple hundred thousand registrations/unregistrations of a pretty
fundamentally shared resource (the wait queue for changes to the pipe).
Not that surprising that it becomes a problem.

There's a couple solutions I can think of to that problem:
1) Use epoll()/kqueue, or other similar interfaces that don't require
re-registering fds at every invocation. My guess is that that'd be
desirable for performance anyway.

2) Create a pair of fds between postmaster/backend for each
backend. While obviously increasing the number of FDs noticeably,
it's interesting for other features as well: If we ever want to do FD
passing from postmaster to existing backends, we're going to need
that anyway.

3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.

Andres


#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
Re: Performance degradation in commit ac1d794

Andres Freund <andres@anarazel.de> writes:

There's a couple solutions I can think of to that problem:
1) Use epoll()/kqueue, or other similar interfaces that don't require
re-registering fds at every invocation. My guess is that that'd be
desirable for performance anyway.

Portability, on the other hand, would be problematic.

2) Create a pair of fds between postmaster/backend for each
backend. While obviously increasing the number of FDs noticeably,
it's interesting for other features as well: If we ever want to do FD
passing from postmaster to existing backends, we're going to need
that anyway.

Maybe; it'd provide another limit on how many backends we could run.

3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.

And what would send the signal? The entire point here is to notice the
situation where the postmaster has crashed. It can *not* depend on the
postmaster taking some action.

regards, tom lane


#14Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: Performance degradation in commit ac1d794

On 2015-12-25 16:29:53 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

There's a couple solutions I can think of to that problem:
1) Use epoll()/kqueue, or other similar interfaces that don't require
re-registering fds at every invocation. My guess is that that'd be
desirable for performance anyway.

Portability, on the other hand, would be problematic.

Indeed. But we might be able to get away with it because there's
realistically just one platform on which people run four-socket
servers. Obviously we'd leave poll and select support in place. It'd be
a genuine improvement for less extreme loads on Linux, too.

3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.

And what would send the signal? The entire point here is to notice the
situation where the postmaster has crashed. It can *not* depend on the
postmaster taking some action.

Ahem. Um. Look, over there --->

I blame it on all the food.

Andres


#15Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#14)
Re: Performance degradation in commit ac1d794

On 2015-12-26 12:22:48 +0100, Andres Freund wrote:

3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.

And what would send the signal? The entire point here is to notice the
situation where the postmaster has crashed. It can *not* depend on the
postmaster taking some action.

Ahem. Um. Look, over there --->

I blame it on all the food.

An unportable but easy version of this, actually making sense this time,
would be to use prctl(PR_SET_PDEATHSIG, SIGQUIT). That'd send SIGQUIT to
backends whenever the postmaster dies. Obviously that's not portable
either - but doing this for Linux only wouldn't be all that kludgey.


#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#15)
Re: Performance degradation in commit ac1d794

Andres Freund <andres@anarazel.de> writes:

An unportable but easy version of this, actually making sense this time,
would be to use prctl(PR_SET_PDEATHSIG, SIGQUIT). That'd send SIGQUIT to
backends whenever the postmaster dies. Obviously that's not portable
either - but doing this for Linux only wouldn't be all that kludgey.

Hmm. That would have semantics rather substantially different from
the way that the WL_POSTMASTER_DEATH code behaves. But I don't know
how much we care about that, since the whole scenario is something
that should not happen under normal circumstances. Maybe cross-platform
variation is OK as long as it doesn't make the code too hairy.

regards, tom lane


#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#15)
Re: Performance degradation in commit ac1d794

On Sat, Dec 26, 2015 at 5:41 PM, Andres Freund <andres@anarazel.de> wrote:

On 2015-12-26 12:22:48 +0100, Andres Freund wrote:

An unportable but easy version of this, actually making sense this time,
would be to use prctl(PR_SET_PDEATHSIG, SIGQUIT). That'd send SIGQUIT to
backends whenever the postmaster dies. Obviously that's not portable
either - but doing this for Linux only wouldn't be all that kludgey.

There is a way to make backends exit on Windows as well: put them in a
Job Object created with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE set in the
LimitFlags of its JOBOBJECT_BASIC_LIMIT_INFORMATION.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#18Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#14)
Re: Performance degradation in commit ac1d794

On 2015-12-26 12:22:48 +0100, Andres Freund wrote:

On 2015-12-25 16:29:53 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

There's a couple solutions I can think of to that problem:
1) Use epoll()/kqueue, or other similar interfaces that don't require
re-registering fds at every invocation. My guess is that that'd be
desirable for performance anyway.

Portability, on the other hand, would be problematic.

Indeed. But we might be able to get away with it because there's
realistically just one platform on which people run four-socket
servers. Obviously we'd leave poll and select support in place. It'd be
a genuine improvement for less extreme loads on Linux, too.

I finally got back to working on this. Attached is a WIP patch series
implementing:
0001: Allow to easily choose between the readiness primitives in unix_latch.c
Pretty helpful for testing, not useful for anything else.
0002: Error out if waiting on socket readiness without a specified socket.
0003: Only clear unix_latch.c's self-pipe if it actually contains data.
~2% on high qps workloads
0004: Support using epoll as the polling primitive in unix_latch.c.
~3% on high qps workloads, massive scalability improvements (x3)
on very large machines.

With 0004 obviously being the relevant bit for this thread. I verified
that using epoll addresses the performance problem, using the hardware
the OP noticed the performance problem on.

The reason I went with epoll over the PR_SET_PDEATHSIG approach is
that it provides semantics more similar to the other platforms,
while being just as platform-dependent as PR_SET_PDEATHSIG. It is also
measurably faster, at least here.

0004 currently contains one debatable optimization, which I'd like to
discuss: Currently the 'sock' passed to WaitLatchOrSocket is not
removed/added to the epoll fd, if it's the numerically same as in the
last call. That's good for performance, but would be wrong if the socket
were close and a new one with the same value would be waited on. I
think a big warning sign somewhere is sufficient to deal with that
problem - it's not something we're likely to start doing. And even if
it's done at some point, we can just offer an API to reset the last used
socket fd.

Unless somebody comes up with a platform independent way of addressing
this, I'm inclined to press forward using epoll(). Opinions?

Andres

Attachments:

0001-Make-it-easier-to-choose-the-used-waiting-primitive-.patch (text/x-patch, +34/-17)
0002-Error-out-if-waiting-on-socket-readiness-without-a-s.patch (text/x-patch, +8/-6)
0003-Only-clear-unix_latch.c-s-self-pipe-if-it-actually-c.patch (text/x-patch, +51/-27)
0004-Support-using-epoll-as-the-polling-primitive-in-unix.patch (text/x-patch, +234/-6)
#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
Re: Performance degradation in commit ac1d794

Andres Freund <andres@anarazel.de> writes:

0004 currently contains one debatable optimization, which I'd like to
discuss: Currently the 'sock' passed to WaitLatchOrSocket is not
removed/added to the epoll fd, if it's the numerically same as in the
last call. That's good for performance, but would be wrong if the socket
were closed and a new one with the same fd value were waited on. I
think a big warning sign somewhere is sufficient to deal with that
problem - it's not something we're likely to start doing. And even if
it's done at some point, we can just offer an API to reset the last used
socket fd.

Perhaps a cleaner API solution would be to remove the socket argument per
se from the function altogether, instead providing a separate
SetSocketToWaitOn() call.

(Also, if there is a need for it, we could provide a function that still
takes a socket argument, with the understanding that it's to be used for
short-lived sockets where you don't want to change the process's main
epoll state.)

regards, tom lane


#20Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#18)
Re: Performance degradation in commit ac1d794

On Thu, Jan 14, 2016 at 9:39 AM, Andres Freund <andres@anarazel.de> wrote:

I finally got back to working on this. Attached is a WIP patch series
implementing:
0001: Allow to easily choose between the readiness primitives in unix_latch.c
Pretty helpful for testing, not useful for anything else.

Looks good.

0002: Error out if waiting on socket readiness without a specified socket.

Looks good.

0003: Only clear unix_latch.c's self-pipe if it actually contains data.
~2% on high qps workloads

everytime -> every time

+        if ((wakeEvents & WL_LATCH_SET) && latch->is_set)
+        {
+            result |= WL_LATCH_SET;
+        }

Excess braces.

Doesn't this code make it possible for the self-pipe to fill up,
self-deadlocking the process? Suppose we repeatedly enter
WaitLatchOrSocket(). Each time we do, just after waiting = true is
set, somebody sets the latch. We handle the signal and put a byte
into the pipe. Returning from the signal handler, we then notice that
is_set is true and return at once, without draining the pipe. Repeat
until something bad happens.

0004: Support using epoll as the polling primitive in unix_latch.c.
~3% on high qps workloads, massive scalability improvements (x3)
on very large machines.

With 0004 obviously being the relevant bit for this thread. I verified
that using epoll addresses the performance problem, using the hardware
the OP noticed the performance problem on.

+                /*
+                 * Unnecessarily pass data for delete due to bug errorneously
+                 * requiring it in the past.
+                 */

This is pretty vague. And it has a spelling mistake.

Further down, nodified -> notified.

+ if (wakeEvents != latch->lastmask || latch->lastwatchfd != sock)

I don't like this very much. I think it's a bad idea to test
latch->lastwatchfd != sock. That has an excellent change of letting
people write code that appears to work but then doesn't. I think it
would be better, if we're going to change the API contract, to make it
a hard break, as I see Tom has also suggested while I've been writing
this.

Incidentally, if we're going to whack around the latch API, it would
be nice to pick a design which wouldn't be too hard to extend to
waiting on multiple sockets. The application I have in mind is to
send off queries to several foreign servers at once and then wait until
bytes come back from any of them. It's mostly pie in the sky at this
point, but it seems highly likely to me that we'd want to do such a
thing by waiting for bytes from any of the sockets involved OR a latch
event.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#20)
#22Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#20)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
#24Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#22)
#26Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#27)
#29Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#27)
#30Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#30)
#32Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#29)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#29)
#34Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#32)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#31)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#37)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#38)
#40Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#39)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#41)
#43Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#42)
#44Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#33)
#45Andres Freund
andres@anarazel.de
In reply to: Kyotaro Horiguchi (#34)
#46Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#27)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#46)
#48Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#47)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#48)
#50Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#49)
#51Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#49)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#51)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#51)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#53)
#55Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#54)
#56Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#53)
#57Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#53)
#58Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#53)
#59Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#56)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#55)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#57)
#62Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#58)
#63Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#60)
#64Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#63)
#65Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#64)
#66Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#62)
#67Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#58)
#68Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#67)
#69Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#66)
#70Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#69)
#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#70)
#72Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#71)
#73Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#72)
#74Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#58)
#75Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#73)
#76Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#74)
#77Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#76)
#78Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#77)
#79Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#78)
#80Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#79)
#81Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#79)
#82Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#75)
#83Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#82)
#84Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#83)
#85Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#81)
#86Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#84)
#87Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#86)
#88Andres Freund
andres@anarazel.de
In reply to: Dmitry Vasilyev (#1)
#89Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#85)
#90Noah Misch
noah@leadboat.com
In reply to: Dmitry Vasilyev (#1)
#91Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#90)
#92Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#91)
#93Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#92)
#94Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#93)