Is the unfair lwlock behavior intended?

Started by Tsunakawa, Takayukiover 9 years ago16 messages
#1Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com

Hello,

I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far as I examine the code. Could you tell me if the behavior is intended or needs fix?

Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later.

PROBLEM
====================

Under a heavy read/write workload on a big machine with dozens of CPUs and hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect to the database (and actually, it failed to connect due to our connect_timeout setting.) The backend corresponding to the psql was waiting to acquire exclusive mode lock on ProcArrayLock. Some other backends took more than 10 seconds to commit their transactions, waiting for exclusive mode lock on ProcArrayLock.

At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on ProcArrayLock, most of which were from TransactionIsInProgress().

CAUSE
====================

Going into the 9.2 code, I realized that those who request share mode don't pay attention to the wait queue. That is, if some processes hold share mode lock and someone is waiting for exclusive mode in the wait queue, other processes who come later can get share mode overtaking those who are already waiting. If many processes repeatedly request share mode, the waiters can't get exclusive mode for a long time.

Is this intentional, or should we make the later share-lockers if someone is in the wait queue?

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Tsunakawa, Takayuki (#1)
Re: Is the unfair lwlock behavior intended?

Hi!

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki <
tsunakawa.takay@jp.fujitsu.com> wrote:

I encountered a strange behavior of lightweight lock in PostgreSQL 9.2.
That appears to apply to 9.6, too, as far as I examine the code. Could you
tell me if the behavior is intended or needs fix?

Simply put, the unfair behavior is that waiters for exclusive mode are
overtaken by share-mode lockers who arrive later.

PROBLEM
====================

Under a heavy read/write workload on a big machine with dozens of CPUs and
hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect
to the database (and actually, it failed to connect due to our
connect_timeout setting.) The backend corresponding to the psql was
waiting to acquire exclusive mode lock on ProcArrayLock. Some other
backends took more than 10 seconds to commit their transactions, waiting
for exclusive mode lock on ProcArrayLock.

At that time, many backend processes (I forgot the number) were acquiring
and releasing share mode lock on ProcArrayLock, most of which were from
TransactionIsInProgress().

CAUSE
====================

Going into the 9.2 code, I realized that those who request share mode
don't pay attention to the wait queue. That is, if some processes hold
share mode lock and someone is waiting for exclusive mode in the wait
queue, other processes who come later can get share mode overtaking those
who are already waiting. If many processes repeatedly request share mode,
the waiters can't get exclusive mode for a long time.

Is this intentional, or should we make the later share-lockers if someone
is in the wait queue?

I've already observed such behavior, see [1]. I think that now there is no
consensus on how to fix that. For instance, Andres express opinion that
this shouldn't be fixed from LWLock side [2].
FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix
various scalability issues including high ProcArrayLock contention.

References.

1.
/messages/by-id/CAPpHfdsytkTFMy3N-zfSo+kAuUx=u-7JG6q2bYB6Fpuw2cD5DQ@mail.gmail.com
2.
/messages/by-id/20151211130413.GO14789@awork2.anarazel.de
3.
/messages/by-id/CA+CSw_tEpJ=md1zgxPkjH6CWDnTDft4gBi=+P9SnoC+Wy3pKdA@mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#3Ants Aasma
ants.aasma@eesti.ee
In reply to: Tsunakawa, Takayuki (#1)
Re: Is the unfair lwlock behavior intended?

On Tue, May 24, 2016 at 1:29 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

Is this intentional, or should we make the later share-lockers if someone
is in the wait queue?

I've already observed such behavior, see [1]. I think that now there is no
consensus on how to fix that. For instance, Andres express opinion that
this shouldn't be fixed from LWLock side [2].
FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix
various scalability issues including high ProcArrayLock contention.

Some amount of non-fairness is ok, but degrading to the point of
complete denial of service is not very graceful. I don't think it's
realistic to hope that all lwlock contention issues will be fixed any
time soon. Some fallback mechanism would be extremely nice until then.

It seems to me that we could fix complete starvation with no
noticeable effect on common cases. The general idea would be to have a
separate flag bit for starving exclusive lockers. When lock is taken
in shared mode an exclusive locker puts itself on the wait queue like
it does now, and then proceeds to wait on its semaphore. But
differently from now, the semaphore wait has a (configurable?) timeout
after which the exclusive locker wakes up and sets the new
LW_FLAG_STARVING flag. When this flag is set new shared lockers will
now consider the lock taken and will queue up to wait. The exclusive
locker that marked itself as starving will get a chance to run once
the existing shared lockers complete and wake it up. If there are
other nearly starved exclusive lockers they are likely to be at the
head of the wait queue and will also get a chance to run. If the wait
queue is empty or has a shared locker at the head the starvation flag
gets reset and all of the queued up shared lockers get their turn.

With the proposed scheme non-starved cases will execute pretty much
exactly what they do now, except for a slightly different check for
shared lock availability. In starvation scenarios the exclusive
lockers will get a chance to run once per starvation grace period.
That might still not be ideal or "fair", but it is a lot better than
the status quo of indefinitely blocking.

PS: if/when you are picking up the CSN work, ping me to write up some
of the insights and ideas I have had while thinking about this topic.

Regards,
Ants Aasma

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Ants Aasma
ants.aasma@eesti.ee
In reply to: Ants Aasma (#3)
Re: Is the unfair lwlock behavior intended?

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far as I examine the code. Could you tell me if the behavior is intended or needs fix?

Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later.

9.5 had significant LWLock scalability improvements. This might
improve performance enough so that exclusive lockers don't get
completely starved. It would be helpful if you could test if it's
still possible to trigger starvation with the new code.

Regards,
Ants Aasma

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Robert Haas
robertmhaas@gmail.com
In reply to: Ants Aasma (#4)
Re: Is the unfair lwlock behavior intended?

On Tue, May 24, 2016 at 4:39 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far as I examine the code. Could you tell me if the behavior is intended or needs fix?

Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later.

9.5 had significant LWLock scalability improvements. This might
improve performance enough so that exclusive lockers don't get
completely starved. It would be helpful if you could test if it's
still possible to trigger starvation with the new code.

9.5 didn't just increase the scalability; it also whacked the fairness
aspects of this code around.

Author: Andres Freund <andres@anarazel.de>
Branch: master Release: REL9_5_BR [7882c3b0b] 2014-12-25 17:24:30 +0100

Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it.

Besides being shorter and much easier to read it changes the logic in
LWLockRelease() to release all shared lockers when waking up any. This
can yield some significant performance improvements - and the fairness
isn't really much worse than before, as we always allowed new shared
lockers to jump the queue.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: Is the unfair lwlock behavior intended?

On 2016-05-24 17:20:48 -0400, Robert Haas wrote:

On Tue, May 24, 2016 at 4:39 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

I encountered a strange behavior of lightweight lock in PostgreSQL 9.2. That appears to apply to 9.6, too, as far as I examine the code. Could you tell me if the behavior is intended or needs fix?

Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later.

Are you sure you're actually queued behind share locks, and not
primarily behind the lwlock's spinlocks? The latter is what I've seen in
similar cases.

9.5 had significant LWLock scalability improvements. This might
improve performance enough so that exclusive lockers don't get
completely starved. It would be helpful if you could test if it's
still possible to trigger starvation with the new code.

9.5 didn't just increase the scalability; it also whacked the fairness
aspects of this code around.

True, but the difference isn't that big. As the commit says:

and the fairness isn't really much worse than before, as we always
allowed new shared lockers to jump the queue.

if you have lots of incoming locks, as it appears to be the case for th
OP, the fact that queued share locks are woken up earlier doesn't make
much of a difference.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Peter Geoghegan
pg@heroku.com
In reply to: Ants Aasma (#3)
Re: Is the unfair lwlock behavior intended?

On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

I've already observed such behavior, see [1]. I think that now there is no
consensus on how to fix that. For instance, Andres express opinion that
this shouldn't be fixed from LWLock side [2].
FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix
various scalability issues including high ProcArrayLock contention.

Some amount of non-fairness is ok, but degrading to the point of
complete denial of service is not very graceful. I don't think it's
realistic to hope that all lwlock contention issues will be fixed any
time soon. Some fallback mechanism would be extremely nice until then.

Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:

http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf

I could believe that there's a case to be made for per-LWLock fairness
characteristics, which may be roughly what Andres meant.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#7)
Re: Is the unfair lwlock behavior intended?

On 2016-05-24 15:34:31 -0700, Peter Geoghegan wrote:

On Tue, May 24, 2016 at 1:38 PM, Ants Aasma <ants.aasma@eesti.ee> wrote:

I've already observed such behavior, see [1]. I think that now there is no
consensus on how to fix that. For instance, Andres express opinion that
this shouldn't be fixed from LWLock side [2].
FYI, I'm planning to pickup work on CSN patch [3] for 10.0. CSN should fix
various scalability issues including high ProcArrayLock contention.

Some amount of non-fairness is ok, but degrading to the point of
complete denial of service is not very graceful. I don't think it's
realistic to hope that all lwlock contention issues will be fixed any
time soon. Some fallback mechanism would be extremely nice until then.

Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:

http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf

I could believe that there's a case to be made for per-LWLock fairness
characteristics, which may be roughly what Andres meant.

The problem is that half-way fair locks, which are frequently acquired
both in shared and exclusive mode, have really bad throughput
characteristics on modern multi-socket systems. We mostly get away with
fair locking on object level (after considerable work re fast-path
locking), because nearly all access are non-conflicting. But
prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE
ProcArrayLock waiter, can reduce throughput dramatically.

I don't think randomly processing the wait queue - which is what the
quoted paper essentially describes - is really useful here. We
intentionally *ignore* the wait queue entirely if a lock is not
conflicting, and that's what can prohibit exclusive locks from ever
succeeding, because you essentially can get repetitions of:

S1: acq(SHARED) -> shared = 1
S2: acq(EXCLUSIVE) -> shared = 1, waiters = 1 <block>
...
S3: acq(SHARED) -> shared = 2
S1: rel(SHARED) -> shared = 1
S1: acq(SHARED) -> shared = 2
S3: rel(SHARED) -> shared = 1
...

Now we potentially could mark individual lwlocks as being fair
locks. But which ones would those be? Certainly not ProcArrayLock, it's
way too heavily contended.

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Peter Geoghegan
pg@heroku.com
In reply to: Andres Freund (#8)
Re: Is the unfair lwlock behavior intended?

On Tue, May 24, 2016 at 3:50 PM, Andres Freund <andres@anarazel.de> wrote:

Jim Gray's paper on the "Convoy phenomenon" remains relevant, decades later:

http://www.msr-waypoint.com/en-us/um/people/gray/papers/Convoy%20Phenomenon%20RJ%202516.pdf

I could believe that there's a case to be made for per-LWLock fairness
characteristics, which may be roughly what Andres meant.

The problem is that half-way fair locks, which are frequently acquired
both in shared and exclusive mode, have really bad throughput
characteristics on modern multi-socket systems. We mostly get away with
fair locking on object level (after considerable work re fast-path
locking), because nearly all access are non-conflicting. But
prohibiting any snapshot acquisitions when there's a single LW_EXCLUSIVE
ProcArrayLock waiter, can reduce throughput dramatically.

That's a useful example of where things are different for different,
well known cases.

I don't think randomly processing the wait queue - which is what the
quoted paper essentially describes - is really useful here. We
intentionally *ignore* the wait queue entirely if a lock is not
conflicting, and that's what can prohibit exclusive locks from ever
succeeding, because you essentially can get repetitions of:

My point about the paper is that it made me reconsider my assumption
that fairness, according to some definition, was essential for
something like LWLocks. The paper made me aware that it's perfectly
acceptable practice to make informed trade-offs around fairness for
what it calls "latches" (what we'd call LWLocks). Maybe that was
obvious to you, but it wasn't to me.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Andres Freund (#8)
Re: Is the unfair lwlock behavior intended?

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Alexander Korotkov
I've already observed such behavior, see [1]. I think that now there is no consensus on how to fix that. For instance, Andres express opinion that this shouldn't be fixed from LWLock side [2].

Thank you for nice pointers. I understood.

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ants Aasma
9.5 had significant LWLock scalability improvements. This might
improve performance enough so that exclusive lockers don't get
completely starved. It would be helpful if you could test if it's
still possible to trigger starvation with the new code.

Unfortunately, we cannot test anymore because the customer's system is now in production. The heavy ProcArray contention was caused mainly by too many tuple visibility tests, which in turn were caused by unintended sequential scans. Then the customer avoided the contention problem by adding an index and reducing the number of concurrent active sessions.

From: Andres Freund [mailto:andres@anarazel.de]
Are you sure you're actually queued behind share locks, and not primarily
behind the lwlock's spinlocks? The latter is what I've seen in similar cases.

I think so, because the stack trace showed that the backends were waiting in TransactionIsInProgress (or some function in the commit processing) -> LWLockAcquire -> PGSemaphoreLock -> semop(), not including spinlock-related functions.

The problem is that half-way fair locks, which are frequently acquired both
in shared and exclusive mode, have really bad throughput characteristics
on modern multi-socket systems. We mostly get away with fair locking on
object level (after considerable work re fast-path locking), because nearly
all access are non-conflicting. But prohibiting any snapshot acquisitions
when there's a single LW_EXCLUSIVE ProcArrayLock waiter, can reduce
throughput dramatically.

Thanks, I understood that you chose total throughput over stable response time. I feel empathetic with the decision, and I think it's the way to go.

OTOH, maybe I'll object if I'm the pitiful waiter... I'll get out of the Disneyland if their staff said "Please stay in the line as long as there are efficient guests behind you. That's the benefit for the whole Disneyland."

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#8)
Re: Is the unfair lwlock behavior intended?

On Tue, May 24, 2016 at 6:50 PM, Andres Freund <andres@anarazel.de> wrote:

Now we potentially could mark individual lwlocks as being fair
locks. But which ones would those be? Certainly not ProcArrayLock, it's
way too heavily contended.

I think this is looking at the problem from the wrong angle. The OP's
complaint is pretty fair: a 30-second wait for ProcArrayLock is
horrendous, and if that's actually something that is happening with
any significant regularity on well-configured systems, we need to fix
it somehow. Whether FIFO queueing on LWLocks is the right way to fix
it is a separate question, and I agree that the answer is probably
"no" ... although it does strike me that Amit's work on group XID
clearing might make that a lot more palatable, since you won't
normally have more than one exclusive waiter in the queue at a time.
But regardless of that, I don't think we can just say "oh, well,
sometimes the system becomes totally unresponsive for more than 30
seconds, but we don't care". We have to care about that.

What I think we need to understand better from is Tsunakawa-san is (1)
whether this behavior still occurs in 9.6 and (2) what sort of test
case is required to produce it. Depending on how extreme that test
case is, we can decide how much of a problem we think this is. And we
can test potential solutions to see whether and to what degree they
address it, as well as how they affect throughput.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#11)
Re: Is the unfair lwlock behavior intended?

On 2016-05-25 14:09:43 -0400, Robert Haas wrote:

I think this is looking at the problem from the wrong angle. The OP's
complaint is pretty fair: a 30-second wait for ProcArrayLock is
horrendous, and if that's actually something that is happening with
any significant regularity on well-configured systems, we need to fix
it somehow.

No disagreement there.

I don't think we can just say "oh, well,
sometimes the system becomes totally unresponsive for more than 30
seconds, but we don't care". We have to care about that.

I don't think anybody was doing that? The first questions on this thread
were about upgrading and retesting...

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#12)
Re: Is the unfair lwlock behavior intended?

On 2016-05-25 11:15:37 -0700, Andres Freund wrote:

On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
I don't think anybody was doing that? The first questions on this thread
were about upgrading and retesting...

Something I've repeatedly wondered about around this topic is whether we
could split ProcArrayLock into one that governs entering or leaving the
procarray from the one that's for consistent snapshots. I think there's
no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
CountUserBackends()/CountOtherDBBackends() (and potentially some more)
to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
as long as we're careful to ensure that by the time a entry is removed
ProcArrayEndTransaction() has been called.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Andres Freund
andres@anarazel.de
In reply to: Tsunakawa, Takayuki (#1)
Re: Is the unfair lwlock behavior intended?

Hi,

On 2016-05-24 06:03:07 +0000, Tsunakawa, Takayuki wrote:

At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on ProcArrayLock, most of which were from TransactionIsInProgress().

FWIW, I suspect that 9.6 might be a fair bit better here, due to
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8a7d0701814a4e293efad22091d6f6fb441bbe1c

Regards,

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#13)
Re: Is the unfair lwlock behavior intended?

On Wed, May 25, 2016 at 3:26 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-05-25 11:15:37 -0700, Andres Freund wrote:

On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
I don't think anybody was doing that? The first questions on this thread
were about upgrading and retesting...

Something I've repeatedly wondered about around this topic is whether we
could split ProcArrayLock into one that governs entering or leaving the
procarray from the one that's for consistent snapshots. I think there's
no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
CountUserBackends()/CountOtherDBBackends() (and potentially some more)
to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
as long as we're careful to ensure that by the time a entry is removed
ProcArrayEndTransaction() has been called.

I'm doubtful about how much that would reduce contention, because when
I've used perf or inserted instrumentation to see which actual call
sides are the problem, it's always been entirely down to
GetSnapshotData() and ProcArrayEndTransaction(). However, I think it
might be worth doing anyway, because redesigning the whole mechanism
might be easier if that lock weren't doing so many only-semi-related
things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#15)
Re: Is the unfair lwlock behavior intended?

On 2016-05-25 17:24:22 -0400, Robert Haas wrote:

On Wed, May 25, 2016 at 3:26 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-05-25 11:15:37 -0700, Andres Freund wrote:

On 2016-05-25 14:09:43 -0400, Robert Haas wrote:
I don't think anybody was doing that? The first questions on this thread
were about upgrading and retesting...

Something I've repeatedly wondered about around this topic is whether we
could split ProcArrayLock into one that governs entering or leaving the
procarray from the one that's for consistent snapshots. I think there's
no need for ProcArrayAdd/ProcArrayRemove/CountDBBackends()/CancelDBBackends()/
CountUserBackends()/CountOtherDBBackends() (and potentially some more)
to conflict with GetSnapshotData()/ProcArrayEndTransaction()/
TransactionIdIsInProgress()/TransactionIdIsActive()/GetOldestXmin()/...
as long as we're careful to ensure that by the time a entry is removed
ProcArrayEndTransaction() has been called.

I'm doubtful about how much that would reduce contention,

Yea, I don't think it'll usually will make a huge difference. It'd
likely have solved this specific complaint though; and it'd remove
nearly all other exclusive acquisitions of ProcArrayLock other than
ProcArrayEndTransaction().

However, I think it might be worth doing anyway, because redesigning
the whole mechanism might be easier if that lock weren't doing so many
only-semi-related things.

Very much agreed upon this. Having a distinct 'SnapshotLock', for
EndTransaction() and GetSnapshotData() would be a first step in making
the locking more granular.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers