Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

Started by Alexey Kondratovover 5 years ago8 messageshackers
Jump to latest
#1Alexey Kondratov
a.kondratov@postgrespro.ru

Hi Hackers,

Today I have accidentally noticed that autoprewarm feature of pg_prewarm
used TimestampDifference()'s results in a wrong way.

First, it used *seconds* result from it as a *milliseconds*. It was
causing it to make dump file autoprewarm.blocks ~every second with
default setting of autoprewarm_interval = 300s.

Here is a log part with debug output in this case:

```
2020-11-09 19:09:00.162 MSK [85328] LOG: dumping autoprewarm.blocks
2020-11-09 19:09:01.161 MSK [85328] LOG: dumping autoprewarm.blocks
2020-11-09 19:09:02.160 MSK [85328] LOG: dumping autoprewarm.blocks
2020-11-09 19:09:03.159 MSK [85328] LOG: dumping autoprewarm.blocks
```

After fixing this issue I have noticed that it still dumps blocks twice
at each timeout (here I set autoprewarm_interval to 15s):

```
2020-11-09 19:18:59.692 MSK [85662] LOG: dumping autoprewarm.blocks
2020-11-09 19:18:59.700 MSK [85662] LOG: dumping autoprewarm.blocks

2020-11-09 19:19:14.694 MSK [85662] LOG: dumping autoprewarm.blocks
2020-11-09 19:19:14.704 MSK [85662] LOG: dumping autoprewarm.blocks
```

This happens because at timeout time we were using continue, but
actually we still have to wait the entire autoprewarm_interval after
successful dumping.

I have fixed both issues in the attached patches and also added a
minimalistic tap test as a first one to verify that this automatic
damping still works after refactoring. I put Robert into CC, since he is
an author of this feature.

What do you think?

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachments:

v1-0003-pg_prewarm-refactor-autoprewarm-waits.patchtext/x-diff; name=v1-0003-pg_prewarm-refactor-autoprewarm-waits.patchDownload+3-2
v1-0002-pg_prewarm-fix-autoprewarm_interval-behaviour.patchtext/x-diff; name=v1-0002-pg_prewarm-fix-autoprewarm_interval-behaviour.patchDownload+1-2
v1-0001-pg_prewarm-add-tap-test-for-autoprewarm-feature.patchtext/x-diff; name=v1-0001-pg_prewarm-add-tap-test-for-autoprewarm-feature.patchDownload+53-1
#2Jeevan Ladhe
jeevan.ladhe@enterprisedb.com
In reply to: Alexey Kondratov (#1)
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

Hi,

On Mon, Nov 9, 2020 at 10:15 PM Alexey Kondratov <a.kondratov@postgrespro.ru>
wrote:

Hi Hackers,

Today I have accidentally noticed that autoprewarm feature of pg_prewarm
used TimestampDifference()'s results in a wrong way.

First, it used *seconds* result from it as a *milliseconds*. It was
causing it to make dump file autoprewarm.blocks ~every second with
default setting of autoprewarm_interval = 300s.

I had a look at this, and I agree that this is an issue. I also had a look
at
the patch 0002, and the patch looks good to me.

In patch 0003 there is a typo:

+ /* We have to sleep even after a successfull dump */

s/successfull/successful

Regards,
Jeevan Ladhe

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Kondratov (#1)
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

Alexey Kondratov <a.kondratov@postgrespro.ru> writes:

After fixing this issue I have noticed that it still dumps blocks twice
at each timeout (here I set autoprewarm_interval to 15s):
...
This happens because at timeout time we were using continue, but
actually we still have to wait the entire autoprewarm_interval after
successful dumping.

I don't think your 0001 is correct. It would be okay if apw_dump_now()
could be counted on to take negligible time, but we shouldn't assume
that should we?

I agree that the "continue" seems a bit bogus, because it's skipping
the ResetLatch call at the bottom of the loop; it's not quite clear
to me whether that's a good thing or not. But the general idea of
the existing code seems to be to loop around and make a fresh calculation
of how-long-to-wait, and that doesn't seem wrong.

0002 seems like a pretty clear bug fix, though I wonder if this is exactly
what we want to do going forward. It seems like a very large fraction of
the callers of TimestampDifference would like to have the value in msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again. Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).

BTW, I see another bug of a related ilk. Look what
postgres_fdw/connection.c is doing:

TimestampDifference(now, endtime, &secs, &microsecs);

/* To protect against clock skew, limit sleep to one minute. */
cur_timeout = Min(60000, secs * USECS_PER_SEC + microsecs);

/* Sleep until there's something to do */
wc = WaitLatchOrSocket(MyLatch,
WL_LATCH_SET | WL_SOCKET_READABLE |
WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
PQsocket(conn),
cur_timeout, PG_WAIT_EXTENSION);

WaitLatchOrSocket's timeout is measured in msec not usec. I think the
comment about "clock skew" is complete BS, and the Min() calculation was
put in as a workaround by somebody observing that the sleep waited too
long, but not understanding why.

regards, tom lane

#4Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Tom Lane (#3)
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

On 2020-11-09 21:53, Tom Lane wrote:

Alexey Kondratov <a.kondratov@postgrespro.ru> writes:

After fixing this issue I have noticed that it still dumps blocks
twice
at each timeout (here I set autoprewarm_interval to 15s):
...
This happens because at timeout time we were using continue, but
actually we still have to wait the entire autoprewarm_interval after
successful dumping.

I don't think your 0001 is correct. It would be okay if apw_dump_now()
could be counted on to take negligible time, but we shouldn't assume
that should we?

Yes, it seems so, if I understand you correctly. I had a doubt about
possibility of pg_ctl to exit earlier than a dumping process. Now I
added an explicit wait for dump file into test.

I agree that the "continue" seems a bit bogus, because it's skipping
the ResetLatch call at the bottom of the loop; it's not quite clear
to me whether that's a good thing or not. But the general idea of
the existing code seems to be to loop around and make a fresh
calculation
of how-long-to-wait, and that doesn't seem wrong.

I have left the last patch intact, since it resolves the 'double dump'
issue, but I agree with нщгк point about existing logic of the code,
although it is a bit broken. So I have to think more about how to fix it
in a better way.

0002 seems like a pretty clear bug fix, though I wonder if this is
exactly
what we want to do going forward. It seems like a very large fraction
of
the callers of TimestampDifference would like to have the value in
msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again. Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).

Yeah, I get into this problem after a bug in another extension —
pg_wait_sampling. I have attached 0002, which implements
TimestampDifferenceMilliseconds(), so 0003 just uses this new function
to solve the initial issues. If it looks good to you, then we can switch
all similar callers to it.

BTW, I see another bug of a related ilk. Look what
postgres_fdw/connection.c is doing:

TimestampDifference(now, endtime, &secs, &microsecs);

/* To protect against clock skew, limit sleep to one
minute. */
cur_timeout = Min(60000, secs * USECS_PER_SEC +
microsecs);

/* Sleep until there's something to do */
wc = WaitLatchOrSocket(MyLatch,
WL_LATCH_SET |
WL_SOCKET_READABLE |
WL_TIMEOUT |
WL_EXIT_ON_PM_DEATH,
PQsocket(conn),
cur_timeout, PG_WAIT_EXTENSION);

WaitLatchOrSocket's timeout is measured in msec not usec. I think the
comment about "clock skew" is complete BS, and the Min() calculation
was
put in as a workaround by somebody observing that the sleep waited too
long, but not understanding why.

I wonder how many troubles one can get with all these unit conversions.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachments:

v2-0004-pg_prewarm-refactor-autoprewarm-waits.patchtext/x-diff; name=v2-0004-pg_prewarm-refactor-autoprewarm-waits.patchDownload+3-2
v2-0003-pg_prewarm-fix-autoprewarm_interval-behaviour.patchtext/x-diff; name=v2-0003-pg_prewarm-fix-autoprewarm_interval-behaviour.patchDownload+3-6
v2-0002-Implement-TimestampDifferenceMilliseconds.patchtext/x-diff; name=v2-0002-Implement-TimestampDifferenceMilliseconds.patchDownload+22-1
v2-0001-pg_prewarm-add-tap-test-for-autoprewarm-feature.patchtext/x-diff; name=v2-0001-pg_prewarm-add-tap-test-for-autoprewarm-feature.patchDownload+61-1
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Kondratov (#4)
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

Alexey Kondratov <a.kondratov@postgrespro.ru> writes:

On 2020-11-09 21:53, Tom Lane wrote:

0002 seems like a pretty clear bug fix, though I wonder if this is exactly
what we want to do going forward. It seems like a very large fraction of
the callers of TimestampDifference would like to have the value in msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again. Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).

Yeah, I get into this problem after a bug in another extension —
pg_wait_sampling. I have attached 0002, which implements
TimestampDifferenceMilliseconds(), so 0003 just uses this new function
to solve the initial issues. If it looks good to you, then we can switch
all similar callers to it.

Yeah, let's move forward with that --- in fact, I'm inclined to
back-patch it. (Not till the current release cycle is done, though.
I don't find this important enough to justify a last-moment patch.)

BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds
round any fractional millisecond up rather than down. Rounding down
seems to create a hazard of uselessly waking just before the delay is
completed. Better to wake just after.

I still think your 0001 is fishy, but don't have time today to stare at
it more closely.

regards, tom lane

#6Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Tom Lane (#5)
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

On 2020-11-09 23:25, Tom Lane wrote:

Alexey Kondratov <a.kondratov@postgrespro.ru> writes:

On 2020-11-09 21:53, Tom Lane wrote:

0002 seems like a pretty clear bug fix, though I wonder if this is
exactly
what we want to do going forward. It seems like a very large
fraction of
the callers of TimestampDifference would like to have the value in
msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again. Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).

Yeah, I get into this problem after a bug in another extension —
pg_wait_sampling. I have attached 0002, which implements
TimestampDifferenceMilliseconds(), so 0003 just uses this new function
to solve the initial issues. If it looks good to you, then we can
switch
all similar callers to it.

Yeah, let's move forward with that --- in fact, I'm inclined to
back-patch it. (Not till the current release cycle is done, though.
I don't find this important enough to justify a last-moment patch.)

BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds
round any fractional millisecond up rather than down. Rounding down
seems to create a hazard of uselessly waking just before the delay is
completed. Better to wake just after.

Yes, it make sense. I have changed TimestampDifferenceMilliseconds() to
round result up if there is a reminder.

After looking on the autoprewarm code more closely I have realised that
this 'double dump' issues was not an issues at all. I have just
misplaced a debug elog(), so its second output in the log was only
indicating that we calculated delay_in_ms one more time. Actually, even
with wrong calculation of delay_in_ms the only problem was that we were
busy looping with ~1 second interval instead of waiting on latch.

It is still a buggy behaviour, but much less harmful than I have
originally thought.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachments:

v3-0003-pg_prewarm-add-tap-test-for-autoprewarm-feature.patchtext/x-diff; name=v3-0003-pg_prewarm-add-tap-test-for-autoprewarm-feature.patchDownload+61-1
v3-0002-pg_prewarm-fix-autoprewarm_interval-behaviour.patchtext/x-diff; name=v3-0002-pg_prewarm-fix-autoprewarm_interval-behaviour.patchDownload+3-6
v3-0001-Implement-TimestampDifferenceMilliseconds.patchtext/x-diff; name=v3-0001-Implement-TimestampDifferenceMilliseconds.patchDownload+34-1
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Kondratov (#6)
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

Alexey Kondratov <a.kondratov@postgrespro.ru> writes:

After looking on the autoprewarm code more closely I have realised that
this 'double dump' issues was not an issues at all. I have just
misplaced a debug elog(), so its second output in the log was only
indicating that we calculated delay_in_ms one more time.

Ah --- that explains why I couldn't see a problem.

I've pushed 0001+0002 plus some followup work to fix other places
that could usefully use TimestampDifferenceMilliseconds(). I have
not done anything with 0003 (the TAP test for pg_prewarm), and will
leave that to the judgment of somebody who's worked with pg_prewarm
before. To me it looks like it's not really testing things very
carefully at all; on the other hand, we have exactly zero test
coverage of that module today, so maybe something is better than
nothing.

regards, tom lane

#8Alexey Kondratov
a.kondratov@postgrespro.ru
In reply to: Tom Lane (#7)
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

On 2020-11-11 06:59, Tom Lane wrote:

Alexey Kondratov <a.kondratov@postgrespro.ru> writes:

After looking on the autoprewarm code more closely I have realised
that
this 'double dump' issues was not an issues at all. I have just
misplaced a debug elog(), so its second output in the log was only
indicating that we calculated delay_in_ms one more time.

Ah --- that explains why I couldn't see a problem.

I've pushed 0001+0002 plus some followup work to fix other places
that could usefully use TimestampDifferenceMilliseconds(). I have
not done anything with 0003 (the TAP test for pg_prewarm), and will
leave that to the judgment of somebody who's worked with pg_prewarm
before. To me it looks like it's not really testing things very
carefully at all; on the other hand, we have exactly zero test
coverage of that module today, so maybe something is better than
nothing.

Great, thank you for generalisation of the issue and working on it.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company