can while loop in ClockSweepTick function be kind of infinite loop in some cases?
This question is about ClockSweepTick function and the code is below.
https://github.com/postgres/postgres/blob/24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031/src/backend/storage/buffer/freelist.c#L146-L165
The value of expected, NBuffers, wrapped variable is fixed in the while
loop, so that when the value of expected variable is not equal to
StrategyControl->nextVictimBuffer, CAS operation fails and the while loop
will be run kind-of infinitely.
It is possible for this problem to occur when ClockSweepTick function is
concurrently called and nextVictimBuffer is incremented by other process
before CAS operation in the loop (ex: in this case, the value of expected
variable is NBuffers+1 while the value of nextVictimBuffer variable is
NBuffers+2. so CAS operation fails)
I think. `expected = originalVictim + 1;` line should be in while loop
(before acquiring spin lock) so that, even in the case above, expected
variable is incremented for each loop and CAS operation will be successful
at some point.
Is my understanding correct? If so, I will send PR for fixing this issue.
Thank you in advance
Hayato Shiba
Hi,
On 2023-01-11 01:25:06 +0900, 斯波隼斗 wrote:
This question is about ClockSweepTick function and the code is below.
https://github.com/postgres/postgres/blob/24d2b2680a8d0e01b30ce8a41c4eb3b47aca5031/src/backend/storage/buffer/freelist.c#L146-L165The value of expected, NBuffers, wrapped variable is fixed in the while
loop, so that when the value of expected variable is not equal to
StrategyControl->nextVictimBuffer, CAS operation fails and the while loop
will be run kind-of infinitely.
It is possible for this problem to occur when ClockSweepTick function is
concurrently called and nextVictimBuffer is incremented by other process
before CAS operation in the loop (ex: in this case, the value of expected
variable is NBuffers+1 while the value of nextVictimBuffer variable is
NBuffers+2. so CAS operation fails)
I think. `expected = originalVictim + 1;` line should be in while loop
(before acquiring spin lock) so that, even in the case above, expected
variable is incremented for each loop and CAS operation will be successful
at some point.
Is my understanding correct? If so, I will send PR for fixing this issue.
Yes, I think your understanding might be correct. Interesting that this
apparently has never occurred.
Yes, please send a patch.
Thanks,
Andres
On Tue, Jan 10, 2023 at 12:40 PM Andres Freund <andres@anarazel.de> wrote:
I think. `expected = originalVictim + 1;` line should be in while loop
(before acquiring spin lock) so that, even in the case above, expected
variable is incremented for each loop and CAS operation will be successful
at some point.
Is my understanding correct? If so, I will send PR for fixing this issue.Yes, I think your understanding might be correct. Interesting that this
apparently has never occurred.
Doesn't pg_atomic_compare_exchange_u32 set expected if it fails?
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2023-01-10 13:11:35 -0500, Robert Haas wrote:
On Tue, Jan 10, 2023 at 12:40 PM Andres Freund <andres@anarazel.de> wrote:
I think. `expected = originalVictim + 1;` line should be in while loop
(before acquiring spin lock) so that, even in the case above, expected
variable is incremented for each loop and CAS operation will be successful
at some point.
Is my understanding correct? If so, I will send PR for fixing this issue.Yes, I think your understanding might be correct. Interesting that this
apparently has never occurred.Doesn't pg_atomic_compare_exchange_u32 set expected if it fails?
Indeed, so there's no problem.
I wonder if we should make ->nextVictimBuffer a 64bit atomic. At the time the
changes went in we didn't (or rather, couldn't) rely on it, but these days we
could. I think with a 64bit number we could get rid of ->completePasses and
just infer it from ->nextVictimBuffer/NBuffers, removing th need for the
spinlock. It's very unlikely that 64bit would ever wrap, and even if, it'd
just be a small inaccuracy in the assumed number of passes. OTOH, it's
doubtful the overflow handling / the spinlock matters performance wise.
Greetings,
Andres Freund
Hi, Thank you for your quick reply
I misunderstood the logic of pg_atomic_compare_exchange_u32, so the loop
cannot be infinite.
I wonder if we should make ->nextVictimBuffer a 64bit atomic. At the time
the changes went in we didn't (or rather, couldn't) rely on it, but these
days we could. I think with a 64bit number we could get rid of
->completePasses and just infer it from ->nextVictimBuffer/NBuffers,
removing th need for the spinlock. It's very unlikely that 64bit would
ever wrap, and even if, it'd just be a small inaccuracy in the assumed
number of passes. OTOH, it's doubtful the overflow handling / the spinlock
matters performance wise.
I'm not sure why 64 bit was not used at the time, so I'm concerned about
it.
but, except for it, you have a point and I completely agree with you. as
you have said, we should use 64 bit whose higher-order 32 bit indicates
completePasses, and should remove spinlock.
maybe we don't have to exceptionally worry about the overflow here mainly
because, even now, the completePasses can overflow and the possibility of
overflow may not be so different so that the 64 bit atomic operation is
better.
if overflow would happen, passes_delta variable in the function called by
bgwriter would be negative high value and it would lead to the failure of
assert. (the code is below
https://github.com/postgres/postgres/blob/d9d873bac67047cfacc9f5ef96ee488f2cb0f1c3/src/backend/storage/buffer/bufmgr.c#L2298-L2303
Do you send patch for the replacement with 64 bit? If you don't mind, I
would like to send patch. ( or is there some procedure before sending patch?
Thanks
hayato
2023年1月11日(水) 3:59 Andres Freund <andres@anarazel.de>:
Show quoted text
Hi,
On 2023-01-10 13:11:35 -0500, Robert Haas wrote:
On Tue, Jan 10, 2023 at 12:40 PM Andres Freund <andres@anarazel.de>
wrote:
I think. `expected = originalVictim + 1;` line should be in while
loop
(before acquiring spin lock) so that, even in the case above,
expected
variable is incremented for each loop and CAS operation will be
successful
at some point.
Is my understanding correct? If so, I will send PR for fixing thisissue.
Yes, I think your understanding might be correct. Interesting that this
apparently has never occurred.Doesn't pg_atomic_compare_exchange_u32 set expected if it fails?
Indeed, so there's no problem.
I wonder if we should make ->nextVictimBuffer a 64bit atomic. At the time
the
changes went in we didn't (or rather, couldn't) rely on it, but these days
we
could. I think with a 64bit number we could get rid of ->completePasses
and
just infer it from ->nextVictimBuffer/NBuffers, removing th need for the
spinlock. It's very unlikely that 64bit would ever wrap, and even if, it'd
just be a small inaccuracy in the assumed number of passes. OTOH, it's
doubtful the overflow handling / the spinlock matters performance wise.Greetings,
Andres Freund