Performance degradation in commit ac1d794
Hello hackers!
I suddenly found that commit ac1d794 causes up to a 3x performance degradation.
I ran pgbench -s 1000 -j 48 -c 48 -S -M prepared on a 70-CPU-core machine:
commit ac1d794 gives me 363,474 tps,
the previous commit a05dc4d gives me 956,146 tps,
and master (3d0c50f) with ac1d794 reverted gives me 969,265 tps.
It's shocking.
---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On December 25, 2015 6:08:15 PM GMT+01:00, Dmitry Vasilyev <d.vasilyev@postgrespro.ru> wrote:
> I suddenly found commit ac1d794 gives up to 3 times performance
> degradation. [...] it's shocking

Hi,

If so, could you provide a hierarchical before/after profile?

Andres
---
Please excuse brevity and formatting - I am writing this on my mobile phone.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2015-12-25 20:18 GMT+03:00 Andres Freund <andres@anarazel.de>:
> If so, could you provide a hierarchical before/after profile?

Yes, about this:
Performance | Git hash commit                          | Date
~ 360k tps  | c3e7c24a1d60dc6ad56e2a0723399f1570c54224 | Thu Nov 12 09:12:18 2015 -0500
~ 360k tps  | ac1d7945f866b1928c2554c0f80fd52d7f977772 | Thu Nov 12 09:00:33 2015 -0500
~ 960k tps  | a05dc4d7fd57d4ae084c1f0801973e5c1a1aa26e | Thu Nov 12 07:40:31 2015 -0500
---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
2015-12-25 20:27 GMT+03:00 Dmitry Vasilyev <d.vasilyev@postgrespro.ru>:
> Yes, about this:
> Performance | Git hash commit                          | Date
> ~ 360k tps  | c3e7c24a1d60dc6ad56e2a0723399f1570c54224 | Thu Nov 12 09:12:18 2015 -0500
> ~ 360k tps  | ac1d7945f866b1928c2554c0f80fd52d7f977772 | Thu Nov 12 09:00:33 2015 -0500
> ~ 960k tps  | a05dc4d7fd57d4ae084c1f0801973e5c1a1aa26e | Thu Nov 12 07:40:31 2015 -0500
I came across it by studying the results:
vanilla 9.5 = 30020c3fc3b6de5592978313df929d370f5770ce
vanilla 9.6 = c4a8812cf64b142685e39a69694c5276601f40e4
info | clients | tps
-----------------------+---------+---------
vanilla 9.5 | 1 | 30321
vanilla 9.5 | 8 | 216542
vanilla 9.5 | 16 | 412526
vanilla 9.5 | 32 | 751331
vanilla 9.5 | 48 | 956146   <- this point
vanilla 9.5 | 56 | 990122
vanilla 9.5 | 64 | 842436
vanilla 9.5 | 72 | 913272
vanilla 9.5 | 82 | 659332
vanilla 9.5 | 92 | 630111
vanilla 9.5 | 96 | 616863
vanilla 9.5 | 110 | 592080
vanilla 9.5 | 120 | 575831
vanilla 9.5 | 130 | 557521
vanilla 9.5 | 140 | 537951
vanilla 9.5 | 150 | 517019
vanilla 9.5 | 160 | 502312
vanilla 9.5 | 170 | 489162
vanilla 9.5 | 180 | 477178
vanilla 9.5 | 190 | 464620
vanilla 9.6 | 1 | 31738
vanilla 9.6 | 8 | 219692
vanilla 9.6 | 16 | 422933
vanilla 9.6 | 32 | 375546
vanilla 9.6 | 48 | 363474   <- this point
vanilla 9.6 | 56 | 352943
vanilla 9.6 | 64 | 334498
vanilla 9.6 | 72 | 369802
vanilla 9.6 | 82 | 604867
vanilla 9.6 | 92 | 871048
vanilla 9.6 | 96 | 969265
vanilla 9.6 | 105 | 996794
vanilla 9.6 | 110 | 932853
vanilla 9.6 | 115 | 758485
vanilla 9.6 | 120 | 721365
vanilla 9.6 | 125 | 632265
vanilla 9.6 | 130 | 624666
vanilla 9.6 | 135 | 582120
vanilla 9.6 | 140 | 583080
vanilla 9.6 | 150 | 555608
vanilla 9.6 | 160 | 533340
vanilla 9.6 | 170 | 520308
vanilla 9.6 | 180 | 504536
vanilla 9.6 | 190 | 496967
---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On December 25, 2015 6:27:06 PM GMT+01:00, Dmitry Vasilyev wrote:
> > If so, could you provide a hierarchical before/after profile?
>
> Performance | Git hash commit | Date
> [...]

Profile as in perf, oprofile or something.
---
Please excuse brevity and formatting - I am writing this on my mobile phone.
2015-12-25 20:44 GMT+03:00 Andres Freund <andres@anarazel.de>:
> Profile as in perf, oprofile or something.
ac1d794:
Samples: 1M of event 'cycles', Event count (approx.): 816922259995, UID: pgpro
Overhead Shared Object Symbol
69,72% [kernel] [k] _raw_spin_lock_irqsave
1,43% postgres [.] _bt_compare
1,19% postgres [.] LWLockAcquire
0,99% postgres [.] hash_search_with_hash_value
0,61% postgres [.] PinBuffer
0,46% postgres [.] GetSnapshotData
a05dc4d:
Samples: 1M of event 'cycles', Event count (approx.): 508150718694, UID: pgpro
Overhead Shared Object Symbol
4,77% postgres [.] GetSnapshotData
4,30% postgres [.] _bt_compare
3,13% postgres [.] hash_search_with_hash_value
3,08% postgres [.] LWLockAcquire
2,09% postgres [.] LWLockRelease
2,03% postgres [.] PinBuffer
perf record generates a lot of traffic:
time perf record -u pgpro -g --call-graph=dwarf
^C[ perf record: Woken up 0 times to write data ]
Warning:
Processed 1078453 events and lost 18257 chunks!
Check IO/CPU overload!
[ perf record: Captured and wrote 8507.985 MB perf.data (1055663 samples) ]
real 0m8.791s
user 0m0.678s
sys 0m8.120s
If you want, I can give you ssh access.
Dmitry Vasilyev <d.vasilyev@postgrespro.ru> writes:
Samples: 1M of event 'cycles', Event count (approx.): 816922259995, UID: pgpro
Overhead Shared Object Symbol
69,72% [kernel] [k] _raw_spin_lock_irqsave
1,43% postgres [.] _bt_compare
1,19% postgres [.] LWLockAcquire
0,99% postgres [.] hash_search_with_hash_value
0,61% postgres [.] PinBuffer
Seems like what you've got here is a kernel bug.
regards, tom lane
On December 25, 2015 7:10:23 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Seems like what you've got here is a kernel bug.
I wouldn't go as far as calling it a kernel bug. We're still doing 300k tps. And we're triggering the performance degradation by adding another socket (IIRC) to the poll(2) call.
It would certainly be interesting to see the expanded tree below the spinlock. I wonder if this is related to directed wakeups.
Andres
---
Please excuse brevity and formatting - I am writing this on my mobile phone.
Andres Freund <andres@anarazel.de> writes:
On December 25, 2015 7:10:23 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Seems like what you've got here is a kernel bug.
I wouldn't go as far as calling it a kernel bug. We're still doing 300k tps. And we're triggering the performance degradation by adding another socket (IIRC) to the poll(2) call.
Hmm. And all those FDs point to the same pipe. I wonder if we're looking
at contention for some pipe-related data structure inside the kernel.
regards, tom lane
2015-12-25 21:28 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
> Hmm. And all those FDs point to the same pipe. I wonder if we're looking
> at contention for some pipe-related data structure inside the kernel.
I did a backtrace on the backends and found them in the following state:
#0 0x00007f77b0e5bb60 in __poll_nocancel () from /lib64/libc.so.6
#1 0x00000000006a7cd0 in WaitLatchOrSocket (latch=0x7f779e2e96c4,
wakeEvents=wakeEvents@entry=19, sock=9, timeout=timeout@entry=0) at
pg_latch.c:333
#2 0x0000000000612c7d in secure_read (port=0x17e6af0, ptr=0xcc94a0
<PqRecvBuffer>, len=8192) at be-secure.c:147
#3 0x000000000061be36 in pq_recvbuf () at pqcomm.c:915
#4 pq_getbyte () at pqcomm.c:958
#5 0x0000000000728ad5 in SocketBackend (inBuf=0x7ffd8b6b1460) at
postgres.c:345
Perf shows _raw_spin_lock_irqsave call remove_wait_queue add_wait_queue
Here are screenshots: http://i.imgur.com/pux2bGJ.png
http://i.imgur.com/LJQbm2V.png
---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
2015-12-25 22:42 GMT+03:00 Dmitry Vasilyev <d.vasilyev@postgrespro.ru>:
> Perf shows _raw_spin_lock_irqsave call remove_wait_queue add_wait_queue
I'm sorry, I meant that the remove_wait_queue and add_wait_queue functions
call _raw_spin_lock_irqsave, which is what consumes most of the CPU time.
uname -a: 3.10.0-229.20.1.el7.x86_64 #1 SMP Tue Nov 3 19:10:07 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux
---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 2015-12-25 13:28:55 -0500, Tom Lane wrote:
Hmm. And all those FDs point to the same pipe. I wonder if we're looking
at contention for some pipe-related data structure inside the kernel.
Sounds fairly likely - and not too surprising. In this scenario we have a
couple hundred thousand registrations/unregistrations of a pretty
fundamentally shared resource (the wait queue for changes to the pipe).
Not that surprising that it becomes a problem.
There's a couple solutions I can think of to that problem:
1) Use epoll()/kqueue, or other similar interfaces that don't require
re-registering fds at every invocation. My guess is that that'd be
desirable for performance anyway.
2) Create a pair of fds between postmaster/backend for each
backend. While obviously increasing the number of FDs noticeably,
it's interesting for other features as well: If we ever want to do FD
passing from postmaster to existing backends, we're going to need
that anyway.
3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.
Andres
Andres Freund <andres@anarazel.de> writes:
There's a couple solutions I can think of to that problem:
1) Use epoll()/kqueue, or other similar interfaces that don't require
re-registering fds at every invocation. My guess is that that'd be
desirable for performance anyway.
Portability, on the other hand, would be problematic.
2) Create a pair of fds between postmaster/backend for each
backend. While obviously increasing the number of FDs noticeably,
it's interesting for other features as well: If we ever want to do FD
passing from postmaster to existing backends, we're going to need
that anyway.
Maybe; it'd provide another limit on how many backends we could run.
3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.
And what would send the signal? The entire point here is to notice the
situation where the postmaster has crashed. It can *not* depend on the
postmaster taking some action.
regards, tom lane
On 2015-12-25 16:29:53 -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
There's a couple solutions I can think of to that problem:
1) Use epoll()/kqueue, or other similar interfaces that don't require
re-registering fds at every invocation. My guess is that that'd be
desirable for performance anyway.

Portability, on the other hand, would be problematic.
Indeed. But we might be able to get away with it because there's
realistically just one platform on which people run four-socket
servers. Obviously we'd leave poll and select support in place. It'd be
a genuine improvement for less extreme loads on Linux, too.
3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.

And what would send the signal? The entire point here is to notice the
situation where the postmaster has crashed. It can *not* depend on the
postmaster taking some action.
Ahem. Um. Look, over there --->
I blame it on all the food.
Andres
On 2015-12-26 12:22:48 +0100, Andres Freund wrote:
3) Replace the postmaster_alive_fds socketpair by some other signalling
mechanism. E.g. sending a procsignal to each backend, which sets the
latch and a special flag in the latch structure.

And what would send the signal? The entire point here is to notice the
situation where the postmaster has crashed. It can *not* depend on the
postmaster taking some action.

Ahem. Um. Look, over there --->
I blame it on all the food.
An unportable and easy version of this, actually making sense this time,
would be to use prctl(PR_SET_PDEATHSIG, SIGQUIT). That'd send SIGQUIT to
backends whenever postmaster dies. Obviously that's not portable
either - doing this for linux only wouldn't be all that kludgey tho.
Andres Freund <andres@anarazel.de> writes:
An unportable and easy version of this, actually making sense this time,
would be to use prctl(PR_SET_PDEATHSIG, SIGQUIT). That'd send SIGQUIT to
backends whenever postmaster dies. Obviously that's not portable
either - doing this for linux only wouldn't be all that kludgey tho.
Hmm. That would have semantics rather substantially different from
the way that the WL_POSTMASTER_DEATH code behaves. But I don't know
how much we care about that, since the whole scenario is something
that should not happen under normal circumstances. Maybe cross-platform
variation is OK as long as it doesn't make the code too hairy.
regards, tom lane
On Sat, Dec 26, 2015 at 5:41 PM, Andres Freund <andres@anarazel.de> wrote:
> An unportable and easy version of this, actually making sense this time,
> would be to use prctl(PR_SET_PDEATHSIG, SIGQUIT). That'd send SIGQUIT to
> backends whenever postmaster dies. Obviously that's not portable
> either - doing this for linux only wouldn't be all that kludgey tho.
There is a way to make backends exit on Windows as well: use Job Objects
with limitFlags set to JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE in
JOBOBJECT_BASIC_LIMIT_INFORMATION.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2015-12-26 12:22:48 +0100, Andres Freund wrote:
> > 1) Use epoll()/kqueue, or other similar interfaces that don't require
> >    re-registering fds at every invocation.
>
> Indeed. But we might be able to get away with it because there's
> realistically just one platform on which people run four-socket servers.
I finally got back to working on this. Attached is a WIP patch series
implementing:
0001: Allow to easily choose between the readiness primitives in unix_latch.c
Pretty helpful for testing, not useful for anything else.
0002: Error out if waiting on socket readiness without a specified socket.
0003: Only clear unix_latch.c's self-pipe if it actually contains data.
~2% on high qps workloads
0004: Support using epoll as the polling primitive in unix_latch.c.
~3% on high qps workloads, massive scalability improvements (x3)
on very large machines.
With 0004 obviously being the relevant bit for this thread. I verified
that using epoll addresses the performance problem, using the hardware
on which the OP noticed it.
The reason I went with epoll over the PR_SET_PDEATHSIG approach is
that it provides semantics more similar to the other platforms,
while being just as platform-dependent as PR_SET_PDEATHSIG. It is also
measurably faster, at least here.
0004 currently contains one debatable optimization, which I'd like to
discuss: currently the 'sock' passed to WaitLatchOrSocket is not
removed from/added to the epoll fd if it's numerically the same as in the
last call. That's good for performance, but would be wrong if the socket
were closed and a new one with the same value were waited on. I
think a big warning sign somewhere is sufficient to deal with that
problem - it's not something we're likely to start doing. And even if
it's done at some point, we can just offer an API to reset the last used
socket fd.
Unless somebody comes up with a platform independent way of addressing
this, I'm inclined to press forward using epoll(). Opinions?
Andres
Attachments:
0001-Make-it-easier-to-choose-the-used-waiting-primitive-.patch (+34/-17)
0002-Error-out-if-waiting-on-socket-readiness-without-a-s.patch (+8/-6)
0003-Only-clear-unix_latch.c-s-self-pipe-if-it-actually-c.patch (+51/-27)
0004-Support-using-epoll-as-the-polling-primitive-in-unix.patch (+234/-6)
Andres Freund <andres@anarazel.de> writes:
> Currently the 'sock' passed to WaitLatchOrSocket is not removed/added to
> the epoll fd, if it's numerically the same as in the last call. [...]
> And even if it's done at some point, we can just offer an API to reset
> the last used socket fd.
Perhaps a cleaner API solution would be to remove the socket argument per
se from the function altogether, instead providing a separate
SetSocketToWaitOn() call.
(Also, if there is a need for it, we could provide a function that still
takes a socket argument, with the understanding that it's to be used for
short-lived sockets where you don't want to change the process's main
epoll state.)
regards, tom lane
On Thu, Jan 14, 2016 at 9:39 AM, Andres Freund <andres@anarazel.de> wrote:
I finally got back to working on this. Attached is a WIP patch series
implementing:
0001: Allow to easily choose between the readiness primitives in unix_latch.c
Pretty helpful for testing, not useful for anything else.
Looks good.
0002: Error out if waiting on socket readiness without a specified socket.
Looks good.
0003: Only clear unix_latch.c's self-pipe if it actually contains data.
~2% on high qps workloads
everytime -> every time
+ if ((wakeEvents & WL_LATCH_SET) && latch->is_set)
+ {
+ result |= WL_LATCH_SET;
+ }
Excess braces.
Doesn't this code make it possible for the self-pipe to fill up,
self-deadlocking the process? Suppose we repeatedly enter
WaitLatchOrSocket(). Each time we do, just after waiting = true is
set, somebody sets the latch. We handle the signal and put a byte
into the pipe. Returning from the signal handler, we then notice that
is_set is true and return at once, without draining the pipe. Repeat
until something bad happens.
0004: Support using epoll as the polling primitive in unix_latch.c.
~3% on high qps workloads, massive scalability improvements (x3)
on very large machines.

With 0004 obviously being the relevant bit for this thread. I verified
that using epoll addresses the performance problem, using the hardware
the OP noticed the performance problem on.
+ /*
+ * Unnecessarily pass data for delete due to bug errorneously
+ * requiring it in the past.
+ */
This is pretty vague. And it has a spelling mistake.
Further down, nodified -> notified.
+ if (wakeEvents != latch->lastmask || latch->lastwatchfd != sock)
I don't like this very much. I think it's a bad idea to test
latch->lastwatchfd != sock. That has an excellent chance of letting
people write code that appears to work but then doesn't. I think it
would be better, if we're going to change the API contract, to make it
a hard break, as I see Tom has also suggested while I've been writing
this.
Incidentally, if we're going to whack around the latch API, it would
be nice to pick a design which wouldn't be too hard to extend to
waiting on multiple sockets. The application I have in mind is to
send off queries to several foreign servers at once and then wait until
bytes come back from any of them. It's mostly pie in the sky at this
point, but it seems highly likely to me that we'd want to do such a
thing by waiting for bytes from any of the sockets involved OR a latch
event.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company