Performance degradation in commit 6150a1b0

Started by Amit Kapila almost 10 years ago · 28 messages
#1Amit Kapila
Amit Kapila
amit.kapila16@gmail.com

For the past few weeks, we have been facing some performance degradation in
read-only benchmarks on high-end machines. My colleague Mithun tried
reverting commit ac1d794, which seems to degrade performance in HEAD on
high-end machines as reported previously[1], but we were still seeing the
degradation. We then did some profiling and found that it is mainly caused
by the spinlock taken when pinning/unpinning buffers, so we tried reverting
commit 6150a1b0, which recently changed the structures in that area, and it
turns out that with that patch reverted we don't see any degradation. The
important point to note is that the performance degradation doesn't occur
every time, but if the tests are repeated two or three times, it is easily
visible.

Machine details:
IBM POWER8
24 cores, 192 hardware threads
RAM: 492GB

Non-default postgresql.conf settings-
shared_buffers=16GB
max_connections=200
min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout=900
maintenance_work_mem=1GB
checkpoint_completion_target=0.9

scale_factor - 300
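For reference, a pgbench setup matching the settings above might look like the following; the exact flags (thread count, query mode) are assumptions, since the thread only states the scale factor, duration, and client count.

```shell
# Initialize pgbench tables at scale factor 300, then run a 15-minute
# read-only (SELECT-only) test with 64 clients, matching the numbers
# reported below.  Thread count (-j) is a guess.
pgbench -i -s 300 postgres
pgbench -S -c 64 -j 64 -T 900 postgres
```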

Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is 469002 TPS
at 64 clients, and at 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 it went down
to 200807. These numbers are the median of three 15-minute pgbench
read-only runs. We see similar results even when we revert the patch on the
latest commit. We have yet to perform a detailed analysis of why commit
6150a1b08a9fe7ead2b25240be46dddeae9d98e1 leads to the degradation, but any
ideas are welcome.

[1]: /messages/by-id/CAB-SwXZh44_2ybvS5Z67p_CDz=XFn4hNAD=CnMEF+QqkXwFrGg@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#2Simon Riggs
Simon Riggs
simon@2ndQuadrant.com
In reply to: Amit Kapila (#1)
Re: Performance degradation in commit 6150a1b0

On 24 February 2016 at 23:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

From past few weeks, we were facing some performance degradation in the
read-only performance bench marks in high-end machines. My colleague
Mithun, has tried by reverting commit ac1d794 which seems to degrade the
performance in HEAD on high-end m/c's as reported previously[1], but still
we were getting degradation, then we have done some profiling to see what
has caused it and we found that it's mainly caused by spin lock when
called via pin/unpin buffer and then we tried by reverting commit 6150a1b0
which has recently changed the structures in that area and it turns out
that reverting that patch, we don't see any degradation in performance.
The important point to note is that the performance degradation doesn't
occur every time, but if the tests are repeated twice or thrice, it
is easily visible.

Not seen that on the original patch I posted. 6150a1b0 contains multiple
changes to the lwlock structures, one written by me, others by Andres.

Perhaps we should revert that patch and re-apply the various changes in
multiple commits so we can see the differences.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Simon Riggs (#2)
Re: Performance degradation in commit 6150a1b0

On Thu, Feb 25, 2016 at 11:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 24 February 2016 at 23:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

From past few weeks, we were facing some performance degradation in the
read-only performance bench marks in high-end machines. My colleague
Mithun, has tried by reverting commit ac1d794 which seems to degrade the
performance in HEAD on high-end m/c's as reported previously[1], but still
we were getting degradation, then we have done some profiling to see what
has caused it and we found that it's mainly caused by spin lock when
called via pin/unpin buffer and then we tried by reverting commit 6150a1b0
which has recently changed the structures in that area and it turns out
that reverting that patch, we don't see any degradation in performance.
The important point to note is that the performance degradation doesn't
occur every time, but if the tests are repeated twice or thrice, it
is easily visible.

Not seen that on the original patch I posted. 6150a1b0 contains multiple
changes to the lwlock structures, one written by me, others by Andres.

Perhaps we should revert that patch and re-apply the various changes in
multiple commits so we can see the differences.

Yes, that's one choice; the other is that we locally narrow down the root
cause of the problem and then try to address it. Last time a similar issue
came up on the list, the agreement [1] was to note it down in the
PostgreSQL 9.6 open items and then work on it. For this problem we haven't
yet got to the root cause, so we can try to investigate it. If nobody else
steps up to reproduce and look into the problem, I will look into it in a
few days.

[1]: /messages/by-id/CA+TgmoYjYqegXzrBizL-Ov7zDsS=GavCnxYnGn9WZ1S=rP8DaA@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#4Simon Riggs
Simon Riggs
simon@2ndQuadrant.com
In reply to: Amit Kapila (#3)
Re: Performance degradation in commit 6150a1b0

On 25 February 2016 at 18:42, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 25, 2016 at 11:38 PM, Simon Riggs <simon@2ndquadrant.com>
wrote:

On 24 February 2016 at 23:26, Amit Kapila <amit.kapila16@gmail.com>
wrote:

From past few weeks, we were facing some performance degradation in the
read-only performance bench marks in high-end machines. My colleague
Mithun, has tried by reverting commit ac1d794 which seems to degrade the
performance in HEAD on high-end m/c's as reported previously[1], but still
we were getting degradation, then we have done some profiling to see what
has caused it and we found that it's mainly caused by spin lock when
called via pin/unpin buffer and then we tried by reverting commit 6150a1b0
which has recently changed the structures in that area and it turns out
that reverting that patch, we don't see any degradation in performance.
The important point to note is that the performance degradation doesn't
occur every time, but if the tests are repeated twice or thrice, it
is easily visible.

Not seen that on the original patch I posted. 6150a1b0 contains multiple
changes to the lwlock structures, one written by me, others by Andres.

Perhaps we should revert that patch and re-apply the various changes in
multiple commits so we can see the differences.

Yes, thats one choice, other is locally we can narrow down the root cause
of problem and then try to address the same. Last time similar issue came
up on list, agreement [1] was to note down it in PostgreSQL 9.6 open items
and then work on it. I think for this problem, we haven't got to the root
cause of problem, so we can try to investigate it. If nobody else steps up
to reproduce and look into problem, in few days, I will look into it.

[1] -
/messages/by-id/CA+TgmoYjYqegXzrBizL-Ov7zDsS=GavCnxYnGn9WZ1S=rP8DaA@mail.gmail.com

Don't understand this. If a problem is caused by one of two things, first
you check one, then the other.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#4)
Re: Performance degradation in commit 6150a1b0

On Fri, Feb 26, 2016 at 8:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Don't understand this. If a problem is caused by one of two things, first
you check one, then the other.

I don't quite understand how you think that patch can be decomposed
into multiple, independent changes. It was one commit because every
change in there is interdependent with every other one, at least as
far as I can see. I don't really understand how you'd split it up, or
what useful information you'd hope to gain from testing a split patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#1)
Re: Performance degradation in commit 6150a1b0

Hi,

On 2016-02-25 12:56:39 +0530, Amit Kapila wrote:

From past few weeks, we were facing some performance degradation in the
read-only performance bench marks in high-end machines. My colleague
Mithun, has tried by reverting commit ac1d794 which seems to degrade the
performance in HEAD on high-end m/c's as reported previously[1], but still
we were getting degradation, then we have done some profiling to see what
has caused it and we found that it's mainly caused by spin lock when
called via pin/unpin buffer and then we tried by reverting commit 6150a1b0
which has recently changed the structures in that area and it turns out
that reverting that patch, we don't see any degradation in performance.
The important point to note is that the performance degradation doesn't
occur every time, but if the tests are repeated twice or thrice, it
is easily visible.

m/c details
IBM POWER-8
24 cores,192 hardware threads
RAM - 492GB

Non-default postgresql.conf settings-
shared_buffers=16GB
max_connections=200
min_wal_size=15GB
max_wal_size=20GB
checkpoint_timeout=900
maintenance_work_mem=1GB
checkpoint_completion_target=0.9

scale_factor - 300

Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is 469002 at
64-client count and then at 6150a1b08a9fe7ead2b25240be46dddeae9d98e1, it
went down to 200807. This performance numbers are median of 3 15-min
pgbench read-only tests. The similar data is seen even when we revert the
patch on latest commit. We have yet to perform detail analysis as to why
the commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 lead to degradation,
but any ideas are welcome.

Ugh. Especially the varying performance is odd. Does it vary between
restarts, or is it just happenstance? If it's the former, we might be
dealing with some alignment issues.

If not, I wonder if the issue is massive buffer header contention. On an
LL/SC architecture, acquiring the content lock might interrupt buffer
spinlock acquisition and vice versa.

Does applying the patch from http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
change the picture?

Regards,

Andres


#7Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#6)
Re: Performance degradation in commit 6150a1b0

On Sat, Feb 27, 2016 at 12:41 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

Ugh. Especially the varying performance is odd. Does it vary between
restarts, or is it just happenstance? If it's the former, we might be
dealing with some alignment issues.

It varies between restarts.

If not, I wonder if the issue is massive buffer header contention. As a
LL/SC architecture acquiring the content lock might interrupt buffer
spinlock acquisition and vice versa.

Does applying the patch from
http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
change the picture?

Not tried, but if this is an alignment issue as you suspect above, does it
make sense to try this out?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#8Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#7)
Re: Performance degradation in commit 6150a1b0

On February 26, 2016 7:55:18 PM PST, Amit Kapila <amit.kapila16@gmail.com> wrote:

Not tried, but if this is alignment issue as you are suspecting above, then
does it make sense to try this out?

It's the other theory I had. And it's additionally useful testing regardless of this regression...

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


#9Ashutosh Sharma
Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Andres Freund (#8)
1 attachment(s)
Re: Performance degradation in commit 6150a1b0

Hi All,

I have been working on this issue for the last few days, trying to
investigate the probable reasons for the performance degradation at commit
6150a1b0. After going through Andres's patch for moving the buffer I/O and
content locks out of the main tranche, the following two things come to
mind.

1. The content lock is no longer referenced through a pointer in the
BufferDesc structure; instead the LWLock structure is embedded directly.
This increases the overall structure size from 64 bytes to 80 bytes. To
investigate this, I reverted the content-lock changes from commit 6150a1b0
and took at least 10 readings; with this change the overall performance is
similar to what was observed earlier, i.e. before commit 6150a1b0.

2. Secondly, the BufferDesc structure is padded to 64 bytes, whereas the PG
cache line alignment is 128 bytes. After changing the BufferDesc padding
size to 128 bytes along with the changes mentioned in point #1 above, the
overall performance is again similar to what was observed before commit
6150a1b0.

Please have a look into the attached test report that contains the
performance test results for all the scenarios discussed above and let me
know your thoughts.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


Attachments:

Performance_Results.xlsxapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet; name=Performance_Results.xlsx
#10Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#9)
Re: Performance degradation in commit 6150a1b0

On Wed, Mar 23, 2016 at 1:59 PM, Ashutosh Sharma <ashu.coek88@gmail.com>
wrote:

Hi All,

I have been working on this issue for last few days trying to investigate
what could be the probable reasons for Performance degradation at commit
6150a1b0. After going through Andres patch for moving buffer I/O and
content lock out of Main Tranche, the following two things come into my
mind.

1. Content Lock is no more used as a pointer in BufferDesc structure
instead it is included as LWLock structure. This basically increases the
overall structure size from 64bytes to 80 bytes. Just to investigate on
this, I have reverted the changes related to content lock from commit
6150a1b0 and taken at least 10 readings and with this change i can see that
the overall performance is similar to what it was observed earlier i.e.
before commit 6150a1b0.

2. Secondly, i can see that the BufferDesc structure padding is 64 bytes
however the PG CACHE LINE ALIGNMENT is 128 bytes. Also, after changing the
BufferDesc structure padding size to 128 bytes along with the changes
mentioned in above point #1, I see that the overall performance is again
similar to what is observed before commit 6150a1b0.

Please have a look into the attached test report that contains the
performance test results for all the scenarios discussed above and let me
know your thoughts.

So this indicates that changing the content lock back to an LWLock* in
BufferDesc brings back the performance, which suggests that the increase in
BufferDesc size beyond 64 bytes on this platform caused the regression. I
think it is worth trying the patch [1] suggested by Andres, as it will
reduce the size of BufferDesc and may bring back the performance. Can you
try the same?

[1]: /messages/by-id/CAPpHfdsRoT1JmsnRnCCqpNZEU9vUT7TX6B-N1wyOuWWfhD6F+g@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#11Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#10)
Re: Performance degradation in commit 6150a1b0

On 2016-03-25 09:29:34 +0530, Amit Kapila wrote:

2. Secondly, i can see that the BufferDesc structure padding is 64 bytes
however the PG CACHE LINE ALIGNMENT is 128 bytes. Also, after changing the
BufferDesc structure padding size to 128 bytes along with the changes
mentioned in above point #1, I see that the overall performance is again
similar to what is observed before commit 6150a1b0.

That makes sense, as it restores alignment.

So this indicates that changing back content lock as LWLock* in BufferDesc
brings back the performance which indicates that increase in BufferDesc
size to more than 64bytes on this platform has caused regression. I think
it is worth trying the patch [1] as suggested by Andres as that will reduce
the size of BufferDesc which can bring back the performance. Can you once
try the same?

[1] -
/messages/by-id/CAPpHfdsRoT1JmsnRnCCqpNZEU9vUT7TX6B-N1wyOuWWfhD6F+g@mail.gmail.com

Yes please. I'll try to review that once more ASAP.

Regards,

Andres


#12Ashutosh Sharma
Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#10)
Re: Performance degradation in commit 6150a1b0

Hi,

I am getting some reject files while trying to apply "pinunpin-cas-5.patch"
attached with the thread:

/messages/by-id/CAPpHfdsRoT1JmsnRnCCqpNZEU9vUT7TX6B-N1wyOuWWfhD6F+g@mail.gmail.com

Note: I am applying this patch on top of commit
6150a1b08a9fe7ead2b25240be46dddeae9d98e1.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


#13Ashutosh Sharma
Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Sharma (#12)
1 attachment(s)
Re: Performance degradation in commit 6150a1b0

Hi,

As mentioned in my earlier mail, I was not able to apply
pinunpin-cas-5.patch on commit 6150a1b0, so I applied it on the latest
commit instead, which worked. I have now taken performance readings at the
latest commit, i.e. 76281aa9, with and without pinunpin-cas-5.patch, and my
observations are as follows:

1. The current performance still lags 2-3% behind the expected performance
when pinunpin-cas-5.patch is applied on commit 76281aa9.
2. Without pinunpin-cas-5.patch, performance measured at commit 76281aa9
lags 50-60% behind the expected performance.

Note: Here, the expected performance is the performance observed before
commit 6150a1b0 with ac1d794 reverted.

Please refer to the attached performance report sheet for more insights.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


Attachments:

Performance_Results_with_pinunpin_cas_changes.xlsxapplication/vnd.openxmlformats-officedocument.spreadsheetml.sheet; name=Performance_Results_with_pinunpin_cas_changes.xlsx
#14Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Ashutosh Sharma (#13)
Re: Performance degradation in commit 6150a1b0

Hi,

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:

As mentioned in my earlier mail i was not able to apply
pinunpin-cas-5.patch on commit 6150a1b0,

That's not surprising; that's pretty old.

therefore i thought of applying it on the latest commit and i was
able to do it successfully. I have now taken the performance readings
at latest commit i.e. 76281aa9 with and without applying
pinunpin-cas-5.patch and my observations are as follows,

1. I can still see that the current performance lags by 2-3% from the
expected performance when pinunpin-cas-5.patch is applied on the commit
76281aa9.
2. When pinunpin-cas-5.patch is ignored and performance is measured at
commit 76281aa9 the overall performance lags by 50-60% from the expected
performance.

Note: Here, the expected performance is the performance observed before
commit 6150a1b0 when ac1d794 is reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

Thanks,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Ashutosh Sharma
Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Andres Freund (#14)
Re: Performance degradation in commit 6150a1b0

Hi,

I am unable to revert 6150a1b0 on top of a recent commit in the master
branch. It seems that some commit made recently depends on 6150a1b0.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Sun, Mar 27, 2016 at 5:45 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:

As mentioned in my earlier mail I was not able to apply
pinunpin-cas-5.patch on commit 6150a1b0,

That's not surprising; that's pretty old.

therefore I thought of applying it on the latest commit and I was able
to do it successfully. I have now taken the performance readings at the
latest commit, i.e. 76281aa9, with and without applying
pinunpin-cas-5.patch, and my observations are as follows:

1. I can still see that the current performance lags by 2-3% from the
expected performance when pinunpin-cas-5.patch is applied on commit
76281aa9.
2. When pinunpin-cas-5.patch is ignored and performance is measured at
commit 76281aa9, the overall performance lags by 50-60% from the expected
performance.

Note: Here, the expected performance is the performance observed before
commit 6150a1b0 when ac1d794 is reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

Thanks,

Andres

#16Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#14)
Re: Performance degradation in commit 6150a1b0

On Sun, Mar 27, 2016 at 02:15:50PM +0200, Andres Freund wrote:

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:

As mentioned in my earlier mail I was not able to apply
pinunpin-cas-5.patch on commit 6150a1b0,

That's not surprising; that's pretty old.

therefore I thought of applying it on the latest commit and I was able
to do it successfully. I have now taken the performance readings at the
latest commit, i.e. 76281aa9, with and without applying
pinunpin-cas-5.patch, and my observations are as follows:

1. I can still see that the current performance lags by 2-3% from the
expected performance when pinunpin-cas-5.patch is applied on commit
76281aa9.
2. When pinunpin-cas-5.patch is ignored and performance is measured at
commit 76281aa9, the overall performance lags by 50-60% from the expected
performance.

Note: Here, the expected performance is the performance observed before
commit 6150a1b0 when ac1d794 is reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Andres,
since you committed the patch believed to have created it, you own this open
item. If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this. Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution. Please
present, within 72 hours, a plan to fix the defect within seven days of this
message. Thanks.


#17Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#16)
Re: Performance degradation in commit 6150a1b0

On Thu, Mar 31, 2016 at 01:10:56AM -0400, Noah Misch wrote:

On Sun, Mar 27, 2016 at 02:15:50PM +0200, Andres Freund wrote:

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:

As mentioned in my earlier mail I was not able to apply
pinunpin-cas-5.patch on commit 6150a1b0,

That's not surprising; that's pretty old.

therefore I thought of applying it on the latest commit and I was able
to do it successfully. I have now taken the performance readings at the
latest commit, i.e. 76281aa9, with and without applying
pinunpin-cas-5.patch, and my observations are as follows:

1. I can still see that the current performance lags by 2-3% from the
expected performance when pinunpin-cas-5.patch is applied on commit
76281aa9.
2. When pinunpin-cas-5.patch is ignored and performance is measured at
commit 76281aa9, the overall performance lags by 50-60% from the expected
performance.

Note: Here, the expected performance is the performance observed before
commit 6150a1b0 when ac1d794 is reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Andres,
since you committed the patch believed to have created it, you own this open
item. If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this. Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution. Please
present, within 72 hours, a plan to fix the defect within seven days of this
message. Thanks.

My attribution above was incorrect. Robert Haas is the committer and owner of
this one. I apologize.


#18Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#17)
Re: Performance degradation in commit 6150a1b0

On March 31, 2016 7:16:33 AM GMT+02:00, Noah Misch <noah@leadboat.com> wrote:

On Thu, Mar 31, 2016 at 01:10:56AM -0400, Noah Misch wrote:

On Sun, Mar 27, 2016 at 02:15:50PM +0200, Andres Freund wrote:

On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:

As mentioned in my earlier mail I was not able to apply
pinunpin-cas-5.patch on commit 6150a1b0,

That's not surprising; that's pretty old.

therefore I thought of applying it on the latest commit and I was able
to do it successfully. I have now taken the performance readings at the
latest commit, i.e. 76281aa9, with and without applying
pinunpin-cas-5.patch, and my observations are as follows:

1. I can still see that the current performance lags by 2-3% from the
expected performance when pinunpin-cas-5.patch is applied on commit
76281aa9.
2. When pinunpin-cas-5.patch is ignored and performance is measured at
commit 76281aa9, the overall performance lags by 50-60% from the expected
performance.

Note: Here, the expected performance is the performance observed before
commit 6150a1b0 when ac1d794 is reverted.

Thanks for doing these benchmarks. What's the performance if you revert
6150a1b0 on top of a recent master? There've been a lot of other patches
influencing performance since 6150a1b0, so minor performance differences
aren't necessarily meaningful; especially when that older version then
had other patches reverted.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.
Andres, since you committed the patch believed to have created it, you
own this open item. If that responsibility lies elsewhere, please let us
know whose responsibility it is to fix this. Since new open items may be
discovered at any time and I want to plan to have them all fixed well in
advance of the ship date, I will appreciate your efforts toward speedy
resolution. Please present, within 72 hours, a plan to fix the defect
within seven days of this message. Thanks.

My attribution above was incorrect. Robert Haas is the committer and
owner of this one. I apologize.

Fine in this case I guess. I've posted a proposal nearby either way, it appears to be a !x86 problem.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


#19Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#18)
Re: Performance degradation in commit 6150a1b0

On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund <andres@anarazel.de> wrote:

My attribution above was incorrect. Robert Haas is the committer and
owner of
this one. I apologize.

Fine in this case I guess. I've posted a proposal nearby either way, it appears to be a !x86 problem.

To which proposal are you referring?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#20Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#19)
Re: Performance degradation in commit 6150a1b0

On 2016-03-31 06:43:19 -0400, Robert Haas wrote:

On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund <andres@anarazel.de> wrote:

My attribution above was incorrect. Robert Haas is the committer and
owner of
this one. I apologize.

Fine in this case I guess. I've posted a proposal nearby either way, it appears to be a !x86 problem.

To which proposal are you referring?

1) in /messages/by-id/20160328130904.4mhugvkf4f3wg4qb@awork2.anarazel.de


#21Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#20)
Re: Performance degradation in commit 6150a1b0

On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-03-31 06:43:19 -0400, Robert Haas wrote:

On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund <andres@anarazel.de> wrote:

My attribution above was incorrect. Robert Haas is the committer and
owner of
this one. I apologize.

Fine in this case I guess. I've posted a proposal nearby either way, it appears to be a !x86 problem.

To which proposal are you referring?

1) in /messages/by-id/20160328130904.4mhugvkf4f3wg4qb@awork2.anarazel.de

OK. So, Noah, my proposed strategy is to wait and see if Andres can
make that work, and if not, then revisit the issue of what to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#22Tom Lane
Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#21)
Re: Performance degradation in commit 6150a1b0

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-03-31 06:43:19 -0400, Robert Haas wrote:

To which proposal are you referring?

1) in /messages/by-id/20160328130904.4mhugvkf4f3wg4qb@awork2.anarazel.de

OK. So, Noah, my proposed strategy is to wait and see if Andres can
make that work, and if not, then revisit the issue of what to do.

I thought that proposal had already crashed and burned, on the grounds
that byte-size spinlocks require instructions that many PPC machines
don't have.

regards, tom lane


#23Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#22)
Re: Performance degradation in commit 6150a1b0

On Thu, Mar 31, 2016 at 10:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-03-31 06:43:19 -0400, Robert Haas wrote:

To which proposal are you referring?

1) in /messages/by-id/20160328130904.4mhugvkf4f3wg4qb@awork2.anarazel.de

OK. So, Noah, my proposed strategy is to wait and see if Andres can
make that work, and if not, then revisit the issue of what to do.

I thought that proposal had already crashed and burned, on the grounds
that byte-size spinlocks require instructions that many PPC machines
don't have.

So the current status of this issue is:

1. Andres committed a patch (008608b9d51061b1f598c197477b3dc7be9c4a64)
to reduce the size of an LWLock by an amount equal to the size of a
mutex (modulo alignment).

2. Andres also committed a patch
(48354581a49c30f5757c203415aa8412d85b0f70) to remove the spinlock from
a BufferDesc, which also reduces its size, I think, because it
replaces members of types BufFlags (2 bytes), uint8, slock_t, and
unsigned with a single member of type pg_atomic_uint32.

These changes are relevant because Andres thought
the observed regression might be related to the BufferDesc growing to
more than 64 bytes on POWER, which in turn could cause buffer
descriptors to get split across cache lines. However, in the
meantime, I did some performance tests on the same machine that Amit
used for testing in the email that started this thread:

/messages/by-id/CA+TgmoZJdA6K7-17K4A48rVB0UPR98HVuaNcfNNLrGsdb1uChg@mail.gmail.com

The upshot of that is that (1) the performance degradation I saw was
significant but smaller than what Amit reported in the OP, and (2) it
looked like the patches Andres gave me to test at the time got
performance back to about the same level we were at before 6150a1b0.
So there's room for optimism that this is fixed, but perhaps some
retesting is in order, since what was committed was, I think, not
identical to what I tested.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#24Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#23)
Re: Performance degradation in commit 6150a1b0

On Tue, Apr 12, 2016 at 05:36:07PM -0400, Robert Haas wrote:

So the current status of this issue is:

1. Andres committed a patch (008608b9d51061b1f598c197477b3dc7be9c4a64)
to reduce the size of an LWLock by an amount equal to the size of a
mutex (modulo alignment).

2. Andres also committed a patch
(48354581a49c30f5757c203415aa8412d85b0f70) to remove the spinlock from
a BufferDesc, which also reduces its size, I think, because it
replaces members of types BufFlags (2 bytes), uint8, slock_t, and
unsigned with a single member of type pg_atomic_uint32.

The reason why these changes are relevant is because Andres thought
the observed regression might be related to the BufferDesc growing to
more than 64 bytes on POWER, which in turn could cause buffer
descriptors to get split across cache lines. However, in the
meantime, I did some performance tests on the same machine that Amit
used for testing in the email that started this thread:

/messages/by-id/CA+TgmoZJdA6K7-17K4A48rVB0UPR98HVuaNcfNNLrGsdb1uChg@mail.gmail.com

The upshot of that is that (1) the performance degradation I saw was
significant but smaller than what Amit reported in the OP, and (2) it
looked like the patches Andres gave me to test at the time got
performance back to about the same level we were at before 6150a1b0.
So there's room for optimism that this is fixed, but perhaps some
retesting is in order, since what was committed was, I think, not
identical to what I tested.

That sounds like this open item is ready for CLOSE_WAIT status; is it?

If someone does retest this, it would be informative to see how the system
performs with 6150a1b0 reverted. Your testing showed performance of 6150a1b0
alone and of 6150a1b0 plus predecessors of 008608b and 4835458. I don't
recall seeing figures for 008608b + 4835458 - 6150a1b0, though.


#25Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#24)
Re: Performance degradation in commit 6150a1b0

On Tue, Apr 12, 2016 at 10:30 PM, Noah Misch <noah@leadboat.com> wrote:

That sounds like this open item is ready for CLOSE_WAIT status; is it?

I just retested this on power2. Here are the results. I retested
3fed4174 and 6150a1b0 plus master as of deb71fa9. 5-minute pgbench -S
runs, scale factor 300, with predictable prewarming to minimize
variation, as well as numactl --interleave. Each result is a median
of three.

1 client:    3fed4174 = 13701.014931,  6150a1b0 = 13669.626916,  master = 19685.571089
8 clients:   3fed4174 = 126676.357079, 6150a1b0 = 125239.911105, master = 122940.079404
32 clients:  3fed4174 = 323989.685428, 6150a1b0 = 338638.095126, master = 333656.861590
64 clients:  3fed4174 = 495434.372578, 6150a1b0 = 457794.475129, master = 493034.922791
128 clients: 3fed4174 = 376412.090366, 6150a1b0 = 363157.294391, master = 625498.280370

On this test 8, 32, and 64 clients are coming out about the same as
3fed4174, but 1 client and 128 clients are dramatically improved with
current master. The 1-client result is a lot more surprising than the
128-client result; I don't know what's going on there. But anyway I
don't see a regression here.

So, yes, I would say this should go to CLOSE_WAIT at this point,
unless Amit or somebody else turns up further evidence of a continuing
issue here.

Random points of possible interest:

1. During a 128-client run, top shows about 45% user time, 10% system
time, 45% idle.

2. About 3 minutes into a 128-client run, perf looks like this
(substantially abridged):

3.55% postgres postgres [.] GetSnapshotData
2.15% postgres postgres [.] LWLockAttemptLock
|--32.82%-- LockBuffer
| |--48.59%-- _bt_relandgetbuf
| |--44.07%-- _bt_getbuf
|--29.81%-- ReadBuffer_common
|--23.88%-- GetSnapshotData
|--5.30%-- LockAcquireExtended
2.12% postgres postgres [.] LWLockRelease
2.02% postgres postgres [.] _bt_compare
1.88% postgres postgres [.] hash_search_with_hash_value
|--47.21%-- BufTableLookup
|--10.93%-- LockAcquireExtended
|--5.43%-- GetPortalByName
|--5.21%-- ReadBuffer_common
|--4.68%-- RelationIdGetRelation
1.87% postgres postgres [.] AllocSetAlloc
1.42% postgres postgres [.] PinBuffer.isra.3
0.96% postgres libc-2.17.so [.] __memcpy_power7
0.89% postgres postgres [.] UnpinBuffer.constprop.7
0.80% postgres postgres [.] PostgresMain
0.80% postgres postgres [.] pg_encoding_mbcliplen
0.71% postgres postgres [.] hash_any
0.62% postgres postgres [.] AllocSetFree
0.59% postgres postgres [.] palloc
0.57% postgres libc-2.17.so [.] _int_free

A context-switch profile, somewhat amazingly, shows no context
switches for anything other than waiting on client read, implying that
performance is entirely constrained by memory bandwidth and CPU speed,
not lock contention.

If someone does retest this, it would be informative to see how the system
performs with 6150a1b0 reverted. Your testing showed performance of 6150a1b0
alone and of 6150a1b0 plus predecessors of 008608b and 4835458. I don't
recall seeing figures for 008608b + 4835458 - 6150a1b0, though.

That revert isn't trivial: even what exactly that would mean at this
point is somewhat subjective. I'm also not sure there is much point.
6150a1b08a9fe7ead2b25240be46dddeae9d98e1 was written in such a way
that only platforms with single-byte spinlocks were going to have a
BufferDesc that fits into 64 bytes, which in retrospect was a bit
short-sighted. Because the changes that were made to get it back down
to 64 bytes might also have other performance-relevant consequences,
it's a bit hard to be sure that that was the precise thing that caused
the regression. And of course there was a flurry of other commits going
in at the same time, some even on related topics, which further adds
to the difficulty of pinpointing this precisely. All that is a bit
unfortunate in some sense, but I think we're just going to have to
keep moving forward and hope for the best.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#26Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#25)
Re: Performance degradation in commit 6150a1b0

On Tue, Apr 12, 2016 at 11:40:43PM -0400, Robert Haas wrote:

On Tue, Apr 12, 2016 at 10:30 PM, Noah Misch <noah@leadboat.com> wrote:

That sounds like this open item is ready for CLOSE_WAIT status; is it?

I just retested this on power2.

So, yes, I would say this should go to CLOSE_WAIT at this point,
unless Amit or somebody else turns up further evidence of a continuing
issue here.

Thanks for testing again.

If someone does retest this, it would be informative to see how the system
performs with 6150a1b0 reverted. Your testing showed performance of 6150a1b0
alone and of 6150a1b0 plus predecessors of 008608b and 4835458. I don't
recall seeing figures for 008608b + 4835458 - 6150a1b0, though.

That revert isn't trivial: even what exactly that would mean at this
point is somewhat subjective. I'm also not sure there is much point.
6150a1b08a9fe7ead2b25240be46dddeae9d98e1 was written in such a way
that only platforms with single-byte spinlocks were going to have a
BufferDesc that fits into 64 bytes, which in retrospect was a bit
short-sighted. Because the changes that were made to get it back down
to 64 bytes might also have other performance-relevant consequences,
it's a bit hard to be sure that that was the precise thing that caused
the regression. And of course there was a fury of other commits going
in at the same time, some even on related topics, which further adds
to the difficulty of pinpointing this precisely. All that is a bit
unfortunate in some sense, but I think we're just going to have to
keep moving forward and hope for the best.

I can live with that.


#27Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#25)
Re: Performance degradation in commit 6150a1b0

On Wed, Apr 13, 2016 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Apr 12, 2016 at 10:30 PM, Noah Misch <noah@leadboat.com> wrote:

That sounds like this open item is ready for CLOSE_WAIT status; is it?

I just retested this on power2. Here are the results. I retested
3fed4174 and 6150a1b0 plus master as of deb71fa9. 5-minute pgbench -S
runs, scale factor 300, with predictable prewarming to minimize
variation, as well as numactl --interleave. Each result is a median
of three.

1 client:    3fed4174 = 13701.014931,  6150a1b0 = 13669.626916,  master = 19685.571089
8 clients:   3fed4174 = 126676.357079, 6150a1b0 = 125239.911105, master = 122940.079404
32 clients:  3fed4174 = 323989.685428, 6150a1b0 = 338638.095126, master = 333656.861590
64 clients:  3fed4174 = 495434.372578, 6150a1b0 = 457794.475129, master = 493034.922791
128 clients: 3fed4174 = 376412.090366, 6150a1b0 = 363157.294391, master = 625498.280370

On this test 8, 32, and 64 clients are coming out about the same as
3fed4174, but 1 client and 128 clients are dramatically improved with
current master. The 1-client result is a lot more surprising than the
128-client result; I don't know what's going on there. But anyway I
don't see a regression here.

So, yes, I would say this should go to CLOSE_WAIT at this point,
unless Amit or somebody else turns up further evidence of a continuing
issue here.

Yes, I also think that this particular issue can be closed. However, I feel
that the observation related to performance variation is still present, as I
never needed to perform prewarming or anything else to get consistent results
during my work on 9.5 or early 9.6. Also, Andres, Alexander and I are
working on a similar observation (run-to-run performance variation) in a
nearby thread [1].

[1]: /messages/by-id/20160412160246.nyzil35w3wein5fm@alap3.anarazel.de

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#28Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#27)
Re: Performance degradation in commit 6150a1b0

On Wed, Apr 13, 2016 at 11:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Yes, I also think that this particular issue can be closed. However, I feel
that the observation related to performance variation is still present, as I
never needed to perform prewarming or anything else to get consistent results
during my work on 9.5 or early 9.6. Also, Andres, Alexander and I are
working on a similar observation (run-to-run performance variation) in a
nearby thread [1].

Yeah. My own measurements do not seem to support the idea that the
variance recently increased, but I haven't tested incredibly widely.
It may be that whatever is causing the variance is something that used
to be hidden by locking bottlenecks and now no longer is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
