Eager page freeze criteria clarification

Started by Melanie Plageman · over 2 years ago · 91 messages · pgsql-hackers
#1 Melanie Plageman
melanieplageman@gmail.com

Hi,

While hacking on the pruning and page-level freezing code, I became a bit
confused by the test guarding eager freezing [1]:

/*
* Freeze the page when heap_prepare_freeze_tuple indicates that at least
* one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
(prunestate->all_visible && prunestate->all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{

I'm trying to understand the condition fpi_before !=
pgWalUsage.wal_fpi -- don't eager freeze if pruning emitted an FPI.

Is this test meant to guard against unnecessary freezing or to avoid
freezing when the cost is too high? That is, are we trying to
determine how likely it is that the page has been recently modified
and avoid eager freezing when it would be pointless (because the page
will soon be modified again)? Or are we trying to determine how likely
the freeze record is to emit an FPI and avoid eager freezing when it
isn't worth the cost? Or something else?

The commit message says:

Also teach VACUUM to trigger page-level freezing whenever it detects
that heap pruning generated an FPI. We'll have already written a large
amount of WAL just to do that much, so it's very likely a good idea to
get freezing out of the way for the page early.

And I found the thread where it was discussed [2]. Several possible
explanations are mentioned in the thread.

But, the final rationale is still not clear to me. Could we add a
comment above the if condition specifying both:
a) what the test is a proxy for
b) the intended outcome (when do we expect to eager freeze)
And perhaps we could even describe a scenario where this heuristic is effective?

- Melanie

[1]: https://github.com/postgres/postgres/blob/master/src/backend/access/heap/vacuumlazy.c#L1802
[2]: /messages/by-id/CAH2-Wzm_=mrWO+kUAJbR_gM_6RzpwVA8n8e4nh3dJGHdw_urew@mail.gmail.com

#2 Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#1)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 11:13 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

if (pagefrz.freeze_required || tuples_frozen == 0 ||
(prunestate->all_visible && prunestate->all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{

I'm trying to understand the condition fpi_before !=
pgWalUsage.wal_fpi -- don't eager freeze if pruning emitted an FPI.

You mean "prunestate->all_visible && prunestate->all_frozen", which is
a condition of applying FPI-based eager freezing, but not traditional
lazy freezing?

Obviously, the immediate impact of that is that the FPI trigger
condition is not met unless we know for sure that the page will be
marked all-visible and all-frozen in the visibility map afterwards. A
page that's eligible to become all-visible will also be seen as
eligible to become all-frozen in the vast majority of cases, but there
are some rare and obscure cases involving MultiXacts that must be
considered here. There is little point in freezing early unless we
have at least some chance of not having to freeze the same page again
in the future (ideally forever).

There is generally no point in freezing most of the tuples on a page
when there'll still be one or more not-yet-eligible unfrozen tuples
that get left behind -- we might as well not bother with freezing at
all when we see a page where that'll be the outcome from freezing.
However, there is (once again) a need to account for rare and extreme
cases -- it still needs to be possible to do that. Specifically, we
may be forced to freeze a page that's left with some remaining
unfrozen tuples when VACUUM is fundamentally incapable of freezing
them due to its OldestXmin/removable cutoff being too old. That can
happen when VACUUM needs to freeze according to the traditional
age-based settings, and yet the OldestXmin/removable cutoff gets held
back (by a leaked replication slot or whatever).

(Actually, VACUUM FREEZE will freeze only a subset of the
tuples from some heap pages far more often. VACUUM FREEZE seems like a
bad design to me, though -- it uses the most aggressive possible XID
cutoff for freezing when it should probably hold off on freezing those
individual pages where we determine that it makes little sense. We
need to focus more on physical pages and their costs, and less on XID
cutoffs.)

Is this test meant to guard against unnecessary freezing or to avoid
freezing when the cost is too high? That is, are we trying to
determine how likely it is that the page has been recently modified
and avoid eager freezing when it would be pointless (because the page
will soon be modified again)?

Sort of. The cost of freezing over time is weirdly nonlinear, so it's
hard to give a simple answer.

The justification for the FPI trigger optimization is that FPIs are
overwhelmingly the cost that really matters when it comes to freezing
(and vacuuming in general) -- so we might as well make the best out of
a bad situation when pruning happens to get an FPI. There can easily
be a 10x or more cost difference (measured in total WAL volume)
between freezing without an FPI and freezing with an FPI.
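
A back-of-the-envelope sketch of that gap (the sizes below are
illustrative assumptions, not measured WAL record sizes): a full-page
image is roughly one 8KB block, while an FPI-free freeze record is a
small header plus a few bytes of "freeze plan" per tuple.

```python
BLCKSZ = 8192          # default PostgreSQL block size
RECORD_HEADER = 50     # assumed fixed overhead of a freeze record, bytes
PER_TUPLE_PLAN = 12    # assumed per-tuple freeze payload, bytes

def freeze_wal_bytes(ntuples: int, with_fpi: bool) -> int:
    """Rough WAL cost of freezing one page with ntuples frozen tuples."""
    base = RECORD_HEADER + ntuples * PER_TUPLE_PLAN
    return base + BLCKSZ if with_fpi else base

cheap = freeze_wal_bytes(20, with_fpi=False)   # 290 bytes
costly = freeze_wal_bytes(20, with_fpi=True)   # 8482 bytes
assert costly / cheap > 10                     # well over a 10x difference
```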

Or are we trying to determine how likely
the freeze record is to emit an FPI and avoid eager freezing when it
isn't worth the cost?

No, that's not something that we're doing right now (we just talked
about doing something like that). In 16 VACUUM just "makes the best
out of a bad situation" when an FPI was already required during
pruning. We have already "paid for the privilege" of writing some WAL
for the page at that point, so it's reasonable to not squander a
window of opportunity to avoid future FPIs in future VACUUM
operations, by freezing early.

We're "taking a chance" on being able to get freezing out of the way
early when an FPI triggers freezing. It's not guaranteed to work out
in each individual case, of course, but even if we assume it's fairly
unlikely to work out (which is very pessimistic) it's still very
likely a good deal.

This strategy (the 16 strategy of freezing eagerly because we already
got an FPI) seems safer than a strategy involving freezing eagerly
because we won't get an FPI as a result. If for no other reason than
this: with the approach in 16 we already know for sure that we'll have
written an FPI anyway. It's hard to imagine somebody being okay with
the FPIs, but not being okay with the other extra WAL.

But, the final rationale is still not clear to me. Could we add a
comment above the if condition specifying both:
a) what the test is a proxy for
b) the intended outcome (when do we expect to eager freeze)
And perhaps we could even describe a scenario where this heuristic is effective?

There are lots of scenarios where it'll be effective. I agree that
there is a need to document this stuff a lot better. I have a pending
doc patch that overhauls the user-facing docs in this area.

My latest draft is here:

/messages/by-id/CAH2-Wz=UUJz+MMb1AxFzz-HDA=1t1Fx_KmrdOVoPZXkpA-TFwg@mail.gmail.com
/messages/by-id/attachment/146830/routine-vacuuming.html

I've been meaning to get back to that, but other commitments have kept
me from it. I'd welcome your involvement with that effort.

--
Peter Geoghegan

#3 Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 3:00 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Jul 28, 2023 at 11:13 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

if (pagefrz.freeze_required || tuples_frozen == 0 ||
(prunestate->all_visible && prunestate->all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{

I'm trying to understand the condition fpi_before !=
pgWalUsage.wal_fpi -- don't eager freeze if pruning emitted an FPI.

You mean "prunestate->all_visible && prunestate->all_frozen", which is
a condition of applying FPI-based eager freezing, but not traditional
lazy freezing?

Ah no, a very key word in my sentence was off:
I meant "don't eager freeze *unless* pruning emitted an FPI"
I was asking about the FPI-trigger optimization specifically.
I'm clear on the all_frozen/all_visible criteria.

Is this test meant to guard against unnecessary freezing or to avoid
freezing when the cost is too high? That is, are we trying to
determine how likely it is that the page has been recently modified
and avoid eager freezing when it would be pointless (because the page
will soon be modified again)?

Sort of. The cost of freezing over time is weirdly nonlinear, so it's
hard to give a simple answer.

The justification for the FPI trigger optimization is that FPIs are
overwhelmingly the cost that really matters when it comes to freezing
(and vacuuming in general) -- so we might as well make the best out of
a bad situation when pruning happens to get an FPI. There can easily
be a 10x or more cost difference (measured in total WAL volume)
between freezing without an FPI and freezing with an FPI.

...

In 16 VACUUM just "makes the best
out of a bad situation" when an FPI was already required during
pruning. We have already "paid for the privilege" of writing some WAL
for the page at that point, so it's reasonable to not squander a
window of opportunity to avoid future FPIs in future VACUUM
operations, by freezing early.

We're "taking a chance" on being able to get freezing out of the way
early when an FPI triggers freezing. It's not guaranteed to work out
in each individual case, of course, but even if we assume it's fairly
unlikely to work out (which is very pessimistic) it's still very
likely a good deal.

This strategy (the 16 strategy of freezing eagerly because we already
got an FPI) seems safer than a strategy involving freezing eagerly
because we won't get an FPI as a result. If for no other reason than
this: with the approach in 16 we already know for sure that we'll have
written an FPI anyway. It's hard to imagine somebody being okay with
the FPIs, but not being okay with the other extra WAL.

I see. I don't have an opinion on the "best of a bad situation"
argument. Though, I think it is worth amending the comment in the code
to include this explanation.

But, ISTM that there should also be some independent heuristic to
determine whether or not it makes sense to freeze the page. That could
be related to whether or not it will be cheap to do so (in which case
we can check if we will have to emit an FPI as part of the freeze
record) or it could be related to whether or not the freezing is
likely to be pointless (we are likely to update the page again soon).

It sounds like it was discussed before, but I'd be interested in
revisiting it and happy to test out various ideas.

- Melanie

#4 Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#3)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 3:27 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I see. I don't have an opinion on the "best of a bad situation"
argument. Though, I think it is worth amending the comment in the code
to include this explanation.

I think that the user-facing docs should be completely overhauled to
make that clear, and reasonably accessible. It's hard to talk about in
code comments because it's something that can only be effective over
time, across multiple VACUUM operations.

But, ISTM that there should also be some independent heuristic to
determine whether or not it makes sense to freeze the page. That could
be related to whether or not it will be cheap to do so (in which case
we can check if we will have to emit an FPI as part of the freeze
record) or it could be related to whether or not the freezing is
likely to be pointless (we are likely to update the page again soon).

It sounds like you're interested in adding additional criteria to
trigger page-level freezing. That's something that the current
structure anticipates.

To give a slightly contrived example: it would be very easy to add
another condition, where (say) we call random() to determine whether
or not to freeze the page. You'd very likely want to gate this new
trigger criteria in the same way as the FPI criteria (i.e. only do it
when "prunestate->all_visible && prunestate->all_frozen" hold). We've
decoupled the decision to freeze from what it means to execute
freezing itself.

Having additional trigger criteria makes a lot of sense. I'm sure that
it makes sense to add at least one more, and it seems possible that
adding several more makes sense. Obviously, that will have to be new
work targeting 17, though. I made a decision to stop working on
VACUUM, though, so I'm afraid I won't be able to offer much help with
any of this. (Happy to give more background information, though.)

--
Peter Geoghegan

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 3:00 PM Peter Geoghegan <pg@bowt.ie> wrote:

Or are we trying to determine how likely
the freeze record is to emit an FPI and avoid eager freezing when it
isn't worth the cost?

No, that's not something that we're doing right now (we just talked
about doing something like that). In 16 VACUUM just "makes the best
out of a bad situation" when an FPI was already required during
pruning. We have already "paid for the privilege" of writing some WAL
for the page at that point, so it's reasonable to not squander a
window of opportunity to avoid future FPIs in future VACUUM
operations, by freezing early.

This doesn't make sense to me. It's true that if the pruning record
emitted an FPI, then the freezing record probably won't need to do so
either, unless by considerable ill-fortune the redo pointer was moved
in the tiny window between pruning and freezing. But isn't that also
true that if the pruning record *didn't* emit an FPI? In that case,
the pruning record wasn't the first WAL-logged modification to the
page during the then-current checkpoint cycle, and some earlier
modification already did it. But in that case also the freezing won't
need to emit a new FPI, except for the same identical case where the
redo pointer moves at just the wrong time.

Put differently, I can't see any reason to care whether pruning
emitted an FPI or not. Either way, it's very unlikely that freezing
needs to do so.
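
The rule I'm appealing to can be sketched with a toy model (my
simplification of full_page_writes behavior, not PostgreSQL code): a
page's first WAL-logged change after each redo-pointer advance carries
an FPI, and later changes in the same checkpoint cycle do not. Either
way, a freeze record that immediately follows pruning needs no FPI of
its own:

```python
class WalToyModel:
    """Toy model of full-page-image emission: the first WAL-logged
    modification of a page after each checkpoint carries an FPI."""
    def __init__(self):
        self.checkpoint_id = 0
        self.last_fpi_checkpoint = {}   # page -> checkpoint of last FPI

    def checkpoint(self):
        self.checkpoint_id += 1         # redo pointer advances

    def log_change(self, page: int) -> bool:
        """Log a change to page; return True if an FPI was emitted."""
        if self.last_fpi_checkpoint.get(page) == self.checkpoint_id:
            return False                # page already imaged this cycle
        self.last_fpi_checkpoint[page] = self.checkpoint_id
        return True

wal = WalToyModel()
wal.checkpoint()

# Case 1: pruning is the first change this cycle -> pruning emits the FPI.
assert wal.log_change(page=1) is True    # prune
assert wal.log_change(page=1) is False   # freeze: no new FPI

# Case 2: an earlier DML change already imaged the page -> pruning emits
# no FPI, and the freeze record *still* needs none.
assert wal.log_change(page=2) is True    # earlier modification
assert wal.log_change(page=2) is False   # prune: no FPI
assert wal.log_change(page=2) is False   # freeze: no FPI either
```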

--
Robert Haas
EDB: http://www.enterprisedb.com

#6 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: Eager page freeze criteria clarification

Hi,

On 2023-07-28 16:19:47 -0400, Robert Haas wrote:

On Fri, Jul 28, 2023 at 3:00 PM Peter Geoghegan <pg@bowt.ie> wrote:

Or are we trying to determine how likely
the freeze record is to emit an FPI and avoid eager freezing when it
isn't worth the cost?

No, that's not something that we're doing right now (we just talked
about doing something like that). In 16 VACUUM just "makes the best
out of a bad situation" when an FPI was already required during
pruning. We have already "paid for the privilege" of writing some WAL
for the page at that point, so it's reasonable to not squander a
window of opportunity to avoid future FPIs in future VACUUM
operations, by freezing early.

This doesn't make sense to me. It's true that if the pruning record
emitted an FPI, then the freezing record probably won't need to do so
either, unless by considerable ill-fortune the redo pointer was moved
in the tiny window between pruning and freezing. But isn't that also
true that if the pruning record *didn't* emit an FPI? In that case,
the pruning record wasn't the first WAL-logged modification to the
page during the then-current checkpoint cycle, and some earlier
modification already did it. But in that case also the freezing won't
need to emit a new FPI, except for the same identical case where the
redo pointer moves at just the wrong time.

Put differently, I can't see any reason to care whether pruning
emitted an FPI or not. Either way, it's very unlikely that freezing
needs to do so.

+many

Using FPIs having been generated during pruning as part of the logic to decide
whether to freeze makes no sense to me. We likely need some heuristic here,
but I suspect it has to look quite different than the FPI condition.

Greetings,

Andres

#7 Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#6)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:

Put differently, I can't see any reason to care whether pruning
emitted an FPI or not. Either way, it's very unlikely that freezing
needs to do so.

+many

Using FPIs having been generated during pruning as part of the logic to decide
whether to freeze makes no sense to me. We likely need some heuristic here,
but I suspect it has to look quite different than the FPI condition.

I think that it depends on how you frame it. It makes zero sense that
that's the only thing that can do something like this at all. It makes
some sense as an incremental improvement, since there is no way that
we can lose by too much, even if you make arbitrarily pessimistic
assumptions. In other words, you won't be able to come up with an
adversarial benchmark where this loses by too much.

As I said, I'm no longer interested in working on VACUUM (though I do
hope that Melanie or somebody else picks up where I left off). I have
nothing to say about any new work in this area. If you want me to do
something in the scope of the work on 16, as a release management
task, please be clear about what that is.

--
Peter Geoghegan

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#6)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:

Using FPIs having been generated during pruning as part of the logic to decide
whether to freeze makes no sense to me. We likely need some heuristic here,
but I suspect it has to look quite different than the FPI condition.

Why do we need a heuristic at all?

What we're talking about here, AIUI, is under what circumstance we
should eagerly freeze a page that can be fully frozen, leaving no
tuples that vacuum ever needs to worry about again except if the page
is further modified. Freezing in such circumstances seems highly
likely to work out in our favor. The only argument against it is that
the page might soon be modified again, in which case the effort of
freezing would be mostly wasted. But I'm not sure whether that's
really a valid rationale, because the win from not having to vacuum
the page again if it isn't modified seems pretty big, and emitting a
freeze record for a page we've just pruned doesn't seem that
expensive. If it is a valid rationale, my first thought would be to
make the heuristic based on vacuum_freeze_min_age. XID age isn't a
particularly great proxy for whether the page is still actively being
modified, but at least it has tradition going for it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9 Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#7)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 4:49 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Jul 28, 2023 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:

Put differently, I can't see any reason to care whether pruning
emitted an FPI or not. Either way, it's very unlikely that freezing
needs to do so.

+many

Using FPIs having been generated during pruning as part of the logic to decide
whether to freeze makes no sense to me. We likely need some heuristic here,
but I suspect it has to look quite different than the FPI condition.

I think that it depends on how you frame it. It makes zero sense that
that's the only thing that can do something like this at all. It makes
some sense as an incremental improvement, since there is no way that
we can lose by too much, even if you make arbitrarily pessimistic
assumptions. In other words, you won't be able to come up with an
adversarial benchmark where this loses by too much.

When you were working on this, what were the downsides of only having the
criteria that 1) page would be all_visible/all_frozen and 2) we did prune
something (i.e. no consideration of FPIs)?

As I said, I'm no longer interested in working on VACUUM (though I do
hope that Melanie or somebody else picks up where I left off). I have
nothing to say about any new work in this area. If you want me to do
something in the scope of the work on 16, as a release management
task, please be clear about what that is.

I'm only talking about this in the context of future work for 17.

- Melanie

#10 Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#9)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 5:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When you were working on this, what were the downsides of only having the
criteria that 1) page would be all_visible/all_frozen and 2) we did prune
something (i.e. no consideration of FPIs)?

Conditioning freezing on "all_visible && all_frozen" seems likely to
always be a good idea when it comes to any sort of speculative trigger
criteria. Most of the wins in this area will come from avoiding future
FPIs, and you can't hope to do that unless you freeze everything on
the page. The cost of freezing when "all_visible && all_frozen" does
not hold may be quite low, but the benefits will also be very low.
Also, the way that recovery conflicts are generated for freeze records
somewhat depends on this right now -- though you could probably fix
that if you had to.

Note that we *don't* really limit ourselves to cases where an FPI was
generated from pruning, exactly -- even on 16. With page-level
checksums we can also generate an FPI just to set a hint bit. That
triggers freezing too, even on insert-only tables (assuming checksums
are enabled). I expect that that will be an important condition for
triggering freezing in practice.

Your question about 2 seems equivalent to "why not just always
freeze?". I don't think that that's a bad question -- quite the
opposite. Even trying to give an answer to this question would amount
to getting involved in new work on VACUUM, though. So I don't think I
can help you here.

--
Peter Geoghegan

#11 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#3)
Re: Eager page freeze criteria clarification

On Fri, Jul 28, 2023 at 3:27 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Fri, Jul 28, 2023 at 3:00 PM Peter Geoghegan <pg@bowt.ie> wrote:

Is this test meant to guard against unnecessary freezing or to avoid
freezing when the cost is too high? That is, are we trying to
determine how likely it is that the page has been recently modified
and avoid eager freezing when it would be pointless (because the page
will soon be modified again)?

Sort of. The cost of freezing over time is weirdly nonlinear, so it's
hard to give a simple answer.

The justification for the FPI trigger optimization is that FPIs are
overwhelmingly the cost that really matters when it comes to freezing
(and vacuuming in general) -- so we might as well make the best out of
a bad situation when pruning happens to get an FPI. There can easily
be a 10x or more cost difference (measured in total WAL volume)
between freezing without an FPI and freezing with an FPI.

...

In 16 VACUUM just "makes the best
out of a bad situation" when an FPI was already required during
pruning. We have already "paid for the privilege" of writing some WAL
for the page at that point, so it's reasonable to not squander a
window of opportunity to avoid future FPIs in future VACUUM
operations, by freezing early.

We're "taking a chance" on being able to get freezing out of the way
early when an FPI triggers freezing. It's not guaranteed to work out
in each individual case, of course, but even if we assume it's fairly
unlikely to work out (which is very pessimistic) it's still very
likely a good deal.

This strategy (the 16 strategy of freezing eagerly because we already
got an FPI) seems safer than a strategy involving freezing eagerly
because we won't get an FPI as a result. If for no other reason than
this: with the approach in 16 we already know for sure that we'll have
written an FPI anyway. It's hard to imagine somebody being okay with
the FPIs, but not being okay with the other extra WAL.

I see. I don't have an opinion on the "best of a bad situation"
argument. Though, I think it is worth amending the comment in the code
to include this explanation.

But, ISTM that there should also be some independent heuristic to
determine whether or not it makes sense to freeze the page. That could
be related to whether or not it will be cheap to do so (in which case
we can check if we will have to emit an FPI as part of the freeze
record) or it could be related to whether or not the freezing is
likely to be pointless (we are likely to update the page again soon).

It sounds like it was discussed before, but I'd be interested in
revisiting it and happy to test out various ideas.

Hi, in service of "testing various ideas", I've benchmarked the
heuristics proposed in this thread, as well a few others that Andres and
I considered, for determining whether or not to opportunistically freeze
a page during vacuum. Note that this heuristic would be in addition to
the existing criterion that we only opportunistically freeze pages that
can be subsequently set all frozen in the visibility map.

I believe that there are two goals that should dictate whether or not we
should perform opportunistic freezing:

1. Is it cheap? For example, if the buffer is already dirty, then no
write amplification occurs, since it must be written out anyway.
Freezing is also less expensive if we can do it without emitting an
FPI.

2. Will it be effective; that is, will it stay frozen?
Opportunistically freezing a page that will immediately be modified is
a waste.

The current heuristic on master meets neither of these goals: it freezes
a page if pruning emitted an FPI for it. This doesn't evaluate whether
or not freezing itself would be cheap, but rather attempts to hide
freezing behind an expensive operation. Furthermore, it often fails to
freeze cold data and may indiscriminately freeze hot data.

For the second goal, I've relied on past data to predict future
behavior, so I tried several criteria to estimate the likelihood that a
page will not be imminently modified. What was most effective was
Andres' suggestion of comparing the page LSN to the insert LSN at the
end of the last vacuum of that table; this approximates whether the page
has been recently modified, which is a decent proxy for whether it'll be
modified in the future. To do this, we need to save that insert LSN
somewhere. In the attached WIP patch, I saved it in the table stats, for
now -- knowing that those are not crash-safe.

Other discarded heuristic ideas included comparing the next transaction
ID at the end of the vacuum of a relation to the visibility cutoff xid
in the page -- but that wasn't effective for freezing data from bulk
loads.

The algorithms I evaluated all attempt to satisfy goal (1) by freezing
only if the buffer is already dirty and also by considering whether or
not an FPI would be emitted. Those that attempt to satisfy goal (2) do
so using the LSN comparison with varying thresholds. I ended up testing
master and the following five alternatives:

1. Dirty buffer, no FPI required

2. Dirty buffer, no FPI required OR page LSN is older than 10% of the
LSNs since the last vacuum of the table.

3. Dirty buffer, no FPI required AND page LSN is older than 10% of the
LSNs since the last vacuum of the table.

4. Dirty buffer, no FPI required OR page LSN is older than 33% of the
LSNs since the last vacuum of the table.

5. Dirty buffer, no FPI required AND page LSN is older than 33% of the
LSNs since the last vacuum of the table.
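
As a sketch, alternatives (4) and (5) could be written as decision
functions like the following (illustrative Python, not the WIP patch;
my reading of "older than N% of the LSNs since the last vacuum" is
that the page LSN falls in the oldest N% of the LSN range between the
insert LSN saved at the end of the previous vacuum and the current
insert LSN):

```python
def lsn_is_old(page_lsn, last_vacuum_lsn, current_lsn, frac):
    """True if page_lsn lies in the oldest `frac` of the LSN range
    covered since the last vacuum of the table."""
    if current_lsn <= last_vacuum_lsn:
        return False
    cutoff = last_vacuum_lsn + frac * (current_lsn - last_vacuum_lsn)
    return page_lsn < cutoff

def should_freeze_alg4(dirty, fpi_required,
                       page_lsn, last_vacuum_lsn, current_lsn):
    # (4): dirty buffer and no FPI required, OR the page looks cold.
    return ((dirty and not fpi_required)
            or lsn_is_old(page_lsn, last_vacuum_lsn, current_lsn, 0.33))

def should_freeze_alg5(dirty, fpi_required,
                       page_lsn, last_vacuum_lsn, current_lsn):
    # (5): dirty buffer and no FPI required, AND the page looks cold.
    return (dirty and not fpi_required
            and lsn_is_old(page_lsn, last_vacuum_lsn, current_lsn, 0.33))

# A cold page (LSN at the last-vacuum mark) whose freeze record would
# need an FPI: (4) freezes it anyway; (5) holds off.
assert should_freeze_alg4(True, True, 100, 100, 1000)
assert not should_freeze_alg5(True, True, 100, 100, 1000)

# A hot, recently-modified page: (5) declines even when freezing is cheap.
assert not should_freeze_alg5(True, False, 950, 100, 1000)
```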

I ran several benchmarks and compared these based on two metrics:

1. Percentage of pages frozen at the end of the benchmark. For
workloads with a working set much smaller than their data set, this
metric should be high. Conversely, for workloads with a working set
that is more-or-less their entire data set, this metric should be low.

2. Page freezes per page frozen at the end of the benchmark. This
should be as low as possible. Since each benchmark starts with zero
frozen pages, a metric of 1 indicates that each frozen page was frozen
only once.
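
For concreteness, this metric is just total page-freeze operations
divided by pages left frozen at the end (the counter values here are
hypothetical):

```python
def page_freezes_per_page_frozen(total_freeze_ops: int,
                                 pages_frozen_at_end: int) -> float:
    """Average number of times each page that ended the run frozen had
    to be frozen to get there; 1.0 means no wasted re-freezing."""
    return total_freeze_ops / pages_frozen_at_end

# 150 freeze operations but only 50 pages frozen at the end means pages
# were re-frozen after being modified again: a wasteful 3.0.
assert page_freezes_per_page_frozen(150, 50) == 3.0
assert page_freezes_per_page_frozen(50, 50) == 1.0
```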

Some of the benchmarks were run for a fixed number of transactions.
Those that do not specify a number of transactions were run for 45
minutes. I collected metrics from OS utilities and Postgres statistics
to examine throughput, FPIs emitted, and many other performance metrics
over the course of the benchmark. Below, I've summarized the results and
pointed out any notable negative performance impacts.

Overall, the two algorithms that seem to strike the best balance are (4)
and (5).

The OR condition in algorithm (4), as you might expect, results in
freezing much more of the cold data in workloads with a smaller working
set than data set. It tends to cause more FPIs to be emitted, since the
age criteria alone can trigger freezing -- even when the freeze record
would contain an FPI. Though, these FPIs may happen when the cold data
is eventually frozen in a wraparound vacuum. That is, the absence of
FPIs tracks the absence of frozen data quite closely.

For a workload in which only 10% of the data is being updated, master
often freezes the wrong data and still emits FPIs. For this kind of
workload, algorithms 1 and 2 also did not perform well and emitted more
FPIs than the other algorithms. In my examples, I found that the 10%
cutoff froze data too aggressively -- freezing data that was modified
soon after.

The workloads I benchmarked were as follows:

A. gaussian tpcb-like + select-only:
pgbench scale 600 (DB < SB) with indexes on updated columns
WL 1: 2 clients tpcb-like pgbench with gaussian access distribution
WL 2: 16 clients select-only pgbench
freezing more is better

B. tpcb-like
pgbench scale 600, 16 clients
freezing less is better

C. shifting hot set, autovacuum off, vacuum at end
1 client inserting a single row and updating an indexed column of that
row. 2 million transactions.
freezing more is better

D. shifting hot set, delete old data
10 MB table with index on updated column
WL 1: 1 client inserting one row, updating that row
WL 2: 1 client, rate limited to 0.02 TPS, delete old data keeping
table at 5000 rows
freezing less is better

E. shifting hot set, delete new data, access new data
WL 1: 1 client cycling through 2 inserts of a single row each,
updating an indexed column of the most recently inserted row, and then
deleting that row
WL 2: rate-limited to 0.2 TPS, selecting data from the last 300
seconds
freezing more is better

F. many COPYs, autovacuum on
1 client, copying a total of 50 GB of data, autovacuum will run ~2x,
~2 checkpoints
freezing more is better

G. several COPYs, autovacuum off, vacuum at end
1 client, copying a total of 10 GB of data, no checkpoints
freezing more is better

H. append only table, autovacuum off, vacuum at end
1 client, inserting a single row at a time for 3 million transactions
freezing more is better

I. work queue
1 client, inserting a row, sleep for half a second, delete that row
for 5000 transactions.
freezing less is better

Note that the page freezes/page frozen metric can be misleading when the
overall number of pages freezes is low. This is the case for master. It
did few page freezes but those tended to be pages that were modified
again soon after.

Page Freezes/Page Frozen (less is better)

| | Master | (1) | (2) | (3) | (4) | (5) |
|---+--------+---------+---------+---------+---------+---------|
| A | 28.50 | 3.89 | 1.08 | 1.15 | 1.10 | 1.10 |
| B | 1.00 | 1.06 | 1.65 | 1.03 | 1.59 | 1.00 |
| C | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| D | 2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
| E | 7.90 | 3.21 | 2.73 | 2.70 | 2.69 | 2.43 |
| F | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| G | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| H | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| I | N/A | 42.00 | 42.00 | N/A | 41.00 | N/A |

% Frozen at end of run

| | Master | (1) | (2) | (3) | (4) | (5) |
|---+--------+-----+-----+-----+-----+-----|
| A | 0 | 1 | 99 | 0 | 81 | 0 |
| B | 71 | 96 | 99 | 3 | 98 | 2 |
| C | 0 | 9 | 100 | 6 | 92 | 5 |
| D | 0 | 1 | 1 | 1 | 1 | 1 |
| E | 0 | 63 | 100 | 68 | 100 | 67 |
| F | 0 | 5 | 14 | 6 | 14 | 5 |
| G | 0 | 100 | 100 | 92 | 100 | 67 |
| H | 0 | 11 | 100 | 9 | 86 | 5 |
| I | 0 | 100 | 100 | 0 | 100 | 0 |

I can provide exact pgbench commands, configurations, or detailed
results upon request.

- Melanie

#12Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#11)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 10:00 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

For the second goal, I've relied on past data to predict future
behavior, so I tried several criteria to estimate the likelihood that a
page will not be imminently modified. What was most effective was
Andres' suggestion of comparing the page LSN to the insert LSN at the
end of the last vacuum of that table; this approximates whether the page
has been recently modified, which is a decent proxy for whether it'll be
modified in the future. To do this, we need to save that insert LSN
somewhere. In the attached WIP patch, I saved it in the table stats, for
now -- knowing that those are not crash-safe.

I wonder what the real plan here is for where to store this. It's not
obvious that we need this to be crash-safe; it's after all only for
use by a heuristic, and there's no actual breakage if the heuristic
goes wrong. At the same time, it doesn't exactly feel like a
statistic.

Then there's the question of whether it's the right metric. My first
reaction is to think that it sounds pretty good. One thing I really
like about it is that if the table is being vacuumed frequently, then
we freeze less aggressively, and if the table is being vacuumed
infrequently, then we freeze more aggressively. That seems like a very
desirable property. It also seems broadly good that this metric
doesn't really care about reads. If there are a lot of reads on the
system, or no reads at all, it doesn't really change the chances that
a certain page is going to be written again soon, and since reads
don't change the insert LSN, here again it seems to do the right
thing. I'm a little less clear about whether it's good that it doesn't
really depend on wall-clock time. Certainly, that's desirable from the
point of view of not wanting to have to measure wall-clock time in
places where we otherwise wouldn't have to, which tends to end up
being expensive. However, if I were making all of my freezing
decisions manually, I might be more freeze-positive on a low-velocity
system where writes are more stretched out across time than on a
high-velocity system where we're blasting through the LSN space at a
higher rate. But maybe that's not a very important consideration, and
I don't know what we'd do about it anyway.

Page Freezes/Page Frozen (less is better)

| | Master | (1) | (2) | (3) | (4) | (5) |
|---+--------+---------+---------+---------+---------+---------|
| A | 28.50 | 3.89 | 1.08 | 1.15 | 1.10 | 1.10 |
| B | 1.00 | 1.06 | 1.65 | 1.03 | 1.59 | 1.00 |
| C | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| D | 2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
| E | 7.90 | 3.21 | 2.73 | 2.70 | 2.69 | 2.43 |
| F | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| G | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| H | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| I | N/A | 42.00 | 42.00 | N/A | 41.00 | N/A |

Hmm. I would say that the interesting rows here are A, D, and I, with
rows C and E deserving honorable mention. In row A, master is bad. In
row D, your algorithms are all bad, really bad. I don't quite
understand how it can be that bad, actually. Row I looks bad for
algorithms 1, 2, and 4: they freeze pages because it looks cheap, but
the work doesn't really pay off.

% Frozen at end of run

| | Master | (1) | (2) | (3) | (4) | (5) |
|---+--------+-----+-----+-----+-----+-----|
| A | 0 | 1 | 99 | 0 | 81 | 0 |
| B | 71 | 96 | 99 | 3 | 98 | 2 |
| C | 0 | 9 | 100 | 6 | 92 | 5 |
| D | 0 | 1 | 1 | 1 | 1 | 1 |
| E | 0 | 63 | 100 | 68 | 100 | 67 |
| F | 0 | 5 | 14 | 6 | 14 | 5 |
| G | 0 | 100 | 100 | 92 | 100 | 67 |
| H | 0 | 11 | 100 | 9 | 86 | 5 |
| I | 0 | 100 | 100 | 0 | 100 | 0 |

So all of the algorithms here, but especially 1, 2, and 4, freeze a
lot more often than master.

If I understand correctly, we'd like to see small numbers for B, D,
and I, and large numbers for the other workloads. None of the
algorithms seem to achieve that. (3) and (5) seem like they always
behave as well or better than master, but they produce small numbers
for A, C, F, and H. (1), (2), and (4) regress B and I relative to
master but do better than (3) and (5) on A, C, and the latter two also
on E.

B is such an important benchmarking workload that I'd be loath to
regress it, so if I had to pick on the basis of this data, my vote
would be (3) or (5), provided whatever is happening with (D) in the
previous metric is not as bad as it looks. What's your reason for
preferring (4) and (5) over (2) and (3)? I'm not clear that these
numbers give us much of an idea whether 10% or 33% or something else
is better in general.

To be honest, having now spent more time looking at the benchmark
results, I feel slightly less good about using the LSN as a metric
here. These results, to me, clearly suggest that some recency metric
is needed. But they don't seem to make a compelling case for this
particular one. Neither do they make a case that this is the wrong
one. They just don't seem that revealing either way.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#11)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 7:00 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I believe that there are two goals that should dictate whether or not we
should perform opportunistic freezing:

1. Is it cheap? For example, if the buffer is already dirty, then no
write amplification occurs, since it must be written out anyway.
Freezing is also less expensive if we can do it without emitting an
FPI.

2. Will it be effective; that is, will it stay frozen?
Opportunistically freezing a page that will immediately be modified is
a waste.

The current heuristic on master meets neither of these goals: it freezes
a page if pruning emitted an FPI for it. This doesn't evaluate whether
or not freezing itself would be cheap, but rather attempts to hide
freezing behind an expensive operation.

The goal is to minimize the number of FPIs over time, in general.
That's hardly the same thing as hiding the cost.

Furthermore, it often fails to
freeze cold data and may indiscriminately freeze hot data.

You need to account for the cost of not freezing, too. Controlling the
overall freeze debt at the level of whole tables (and the level of the
whole system) is very important. In fact it's probably the single most
important thing. A good model might end up increasing the cost of
freezing on a simple quantitative basis, while making life much better
overall. Huge balloon payments for freezing are currently a huge
problem.

Performance stability might come with a cost in some cases. There
isn't necessarily anything wrong with that at all.

--
Peter Geoghegan

#14Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#11)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 7:00 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

To do this, we need to save that insert LSN
somewhere. In the attached WIP patch, I saved it in the table stats, for
now -- knowing that those are not crash-safe.

What patch? You didn't actually attach one.

Other discarded heuristic ideas included comparing the next transaction
ID at the end of the vacuum of a relation to the visibility cutoff xid
in the page -- but that wasn't effective for freezing data from bulk
loads.

I've long emphasized the importance of designs that just try to avoid
disaster. With that in mind, I wonder: have you thought about
conditioning page freezing on whether or not there are already some
frozen tuples on the page? You could perhaps give some weight to
whether or not the page already has at least one or two preexisting
frozen tuples when deciding on whether to freeze it once again now.
You'd be more eager about freezing pages that have no frozen tuples
whatsoever, compared to what you'd do with an otherwise equivalent
page that has no unfrozen tuples.

Small mistakes are inevitable here. They have to be okay. What's not
okay is a small mistake that effectively becomes a big mistake because
it gets repeated across each VACUUM operation, again and again,
ceaselessly. You can probably be quite aggressive about freezing, to
good effect -- provided you can be sure that the downside when it
turns out to have been a bad idea is self-limiting, in whatever way.
Making more small mistakes might actually be a good thing --
especially if it can dramatically lower the chances of ever making any
really big mistakes.

VACUUM is not a passive observer of the system. It's just another
component in the system. So what VACUUM sees in the table depends in
no small part on what previous VACUUMs actually did. It follows that
VACUUM should be concerned about how what it does might either help or
hinder future VACUUMs. My preexisting frozen tuples suggestion is
really just an example of how that principle might be applied. Many
variations on the same general idea are possible.

There are already various extremely weird feedback loops where what
VACUUM did last time affects what it'll do this time. These are
vicious circles. So ISTM that there is a lot to be said for disrupting
those vicious circles, and even deliberately engineering heuristics
that have the potential to create virtuous circles for the right
workload.

--
Peter Geoghegan

#15Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#14)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 3:09 PM Peter Geoghegan <pg@bowt.ie> wrote:

I've long emphasized the importance of designs that just try to avoid
disaster. With that in mind, I wonder: have you thought about
conditioning page freezing on whether or not there are already some
frozen tuples on the page? You could perhaps give some weight to
whether or not the page already has at least one or two preexisting
frozen tuples when deciding on whether to freeze it once again now.
You'd be more eager about freezing pages that have no frozen tuples
whatsoever, compared to what you'd do with an otherwise equivalent
page that has no unfrozen tuples.

I'm sure this could be implemented, but it's unclear to me why you
would expect it to perform well. Freezing a page that has no frozen
tuples yet isn't cheaper than freezing one that does, so for this idea
to be a win, the presence of frozen tuples on the page would have to
be a signal that the page is likely to be modified again in the near
future. In general, I don't see any reason why we should expect that
to be the case. One could easily construct a workload where it is the
case -- for instance, set up one table T1 where 90% of the tuples are
repeatedly updated and the other 10% are never touched, and another
table T2 that is insert-only. Once frozen, the never-updated tuples in
T1 become sentinels that we can use to know that the table isn't
insert-only. But I don't think that's very interesting: you can
construct a test case like this for any proposed criterion, just by
structuring the test workload so that whatever criterion is being
tested is a perfect predictor of whether the page will be modified
soon.

What really matters here is finding a criterion that is likely to
perform well in general, on a test case not known to us beforehand.
This isn't an entirely feasible goal, because just as you can
construct a test case where any given criterion performs well, so you
can also construct one where any given criterion performs poorly. But
I think a rule that has a clear theory of operation must be preferable
to one that doesn't. The theory that Melanie and Andres are advancing
is that a page that has been modified recently (in insert-LSN-time) is
more likely to be modified again soon than one that has not i.e. the
near future will be like the recent past. I'm not sure what the theory
behind the rule you propose here might be; if you articulated it
somewhere in your email, I seem to have missed it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#12)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 12:26 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Aug 28, 2023 at 10:00 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Then there's the question of whether it's the right metric. My first
reaction is to think that it sounds pretty good. One thing I really
like about it is that if the table is being vacuumed frequently, then
we freeze less aggressively, and if the table is being vacuumed
infrequently, then we freeze more aggressively. That seems like a very
desirable property. It also seems broadly good that this metric
doesn't really care about reads. If there are a lot of reads on the
system, or no reads at all, it doesn't really change the chances that
a certain page is going to be written again soon, and since reads
don't change the insert LSN, here again it seems to do the right
thing. I'm a little less clear about whether it's good that it doesn't
really depend on wall-clock time. Certainly, that's desirable from the
point of view of not wanting to have to measure wall-clock time in
places where we otherwise wouldn't have to, which tends to end up
being expensive. However, if I were making all of my freezing
decisions manually, I might be more freeze-positive on a low-velocity
system where writes are more stretched out across time than on a
high-velocity system where we're blasting through the LSN space at a
higher rate. But maybe that's not a very important consideration, and
I don't know what we'd do about it anyway.

By low-velocity, do you mean lower overall TPS? In that case, wouldn't you be
less likely to run into xid wraparound and thus need less aggressive
opportunistic freezing?

Page Freezes/Page Frozen (less is better)

| | Master | (1) | (2) | (3) | (4) | (5) |
|---+--------+---------+---------+---------+---------+---------|
| A | 28.50 | 3.89 | 1.08 | 1.15 | 1.10 | 1.10 |
| B | 1.00 | 1.06 | 1.65 | 1.03 | 1.59 | 1.00 |
| C | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| D | 2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
| E | 7.90 | 3.21 | 2.73 | 2.70 | 2.69 | 2.43 |
| F | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| G | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| H | N/A | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| I | N/A | 42.00 | 42.00 | N/A | 41.00 | N/A |

Hmm. I would say that the interesting rows here are A, D, and I, with
rows C and E deserving honorable mention. In row A, master is bad.

So, this is where the caveat about absolute number of page freezes
matters. In workload A, master only did 57 page freezes (spread across
the various pgbench tables). At the end of the run, 2 pages were still
frozen.

In row D, your algorithms are all bad, really bad. I don't quite
understand how it can be that bad, actually.

So, I realize now that this test was poorly designed. I meant it to be a
worst case scenario, but I think one critical part was wrong. In this
example one client is going at full speed inserting a row and then
updating it. Then another rate-limited client is deleting old data
periodically to keep the table at a constant size. I meant to bulk load
the table with enough data that the delete job would have data to delete
from the start. With the default autovacuum settings, over the course of
45 minutes, I usually saw around 40 autovacuums of the table. Due to the
rate limiting, the first autovacuum of the table ends up freezing many
pages that are deleted soon after. Thus the total number of page freezes
is very high.

I will redo benchmarking of workload D and start the table with the
number of rows which the DELETE job seeks to maintain. My back of the
envelope math says that this will mean ratios closer to a dozen (and not
5000).

Also, I had doubled checkpoint timeout, which likely led master to
freeze so few pages (2 total freezes, neither of which were still frozen
at the end of the run). This is an example where master's overall low
number of page freezes makes it difficult to compare to the alternatives
using a ratio.

I didn't initially question the numbers because it seems like freezing
data and then deleting it right after would naturally be one of the
worst cases for opportunistic freezing, but certainly not this bad.

Row I looks bad for algorithms 1, 2, and 4: they freeze pages because
it looks cheap, but the work doesn't really pay off.

Yes, the work queue example looks like it is hard to handle.

% Frozen at end of run

| | Master | (1) | (2) | (3) | (4) | (5) |
|---+--------+-----+-----+-----+-----+-----|
| A | 0 | 1 | 99 | 0 | 81 | 0 |
| B | 71 | 96 | 99 | 3 | 98 | 2 |
| C | 0 | 9 | 100 | 6 | 92 | 5 |
| D | 0 | 1 | 1 | 1 | 1 | 1 |
| E | 0 | 63 | 100 | 68 | 100 | 67 |
| F | 0 | 5 | 14 | 6 | 14 | 5 |
| G | 0 | 100 | 100 | 92 | 100 | 67 |
| H | 0 | 11 | 100 | 9 | 86 | 5 |
| I | 0 | 100 | 100 | 0 | 100 | 0 |

So all of the algorithms here, but especially 1, 2, and 4, freeze a
lot more often than master.

If I understand correctly, we'd like to see small numbers for B, D,
and I, and large numbers for the other workloads. None of the
algorithms seem to achieve that. (3) and (5) seem like they always
behave as well or better than master, but they produce small numbers
for A, C, F, and H. (1), (2), and (4) regress B and I relative to
master but do better than (3) and (5) on A, C, and the latter two also
on E.

B is such an important benchmarking workload that I'd be loath to
regress it, so if I had to pick on the basis of this data, my vote
would be (3) or (5), provided whatever is happening with (D) in the
previous metric is not as bad as it looks. What's your reason for
preferring (4) and (5) over (2) and (3)? I'm not clear that these
numbers give us much of an idea whether 10% or 33% or something else
is better in general.

(1) seems bad to me because it doesn't consider whether or not freezing
will be useful -- only if it will be cheap. It froze very little of the
cold data in a workload where a small percentage of it was being
modified (especially workloads A, C, H). And it froze a lot of data in
workloads where it was being uniformly modified (workload B).

I suggested (4) and (5) because I think the "older than 33%" threshold
is better than the "older than 10%" threshold. I chose both because I am
still unclear on our values. Are we willing to freeze more aggressively
at the expense of emitting more FPIs? As long as it doesn't affect
throughput? For pretty much all of these workloads, the algorithms which
froze based on page modification recency OR FPI required emitted many
more FPIs than those which froze based only on page modification
recency.

I've attached the WIP patch that I forgot in my previous email.

I'll rerun workload D in a more reasonable way and be back with results.

- Melanie

Attachments:

WIP-opp_freeze_cold_data.patch (text/x-patch; charset=US-ASCII) +98/-9
#17Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#15)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 1:17 PM Robert Haas <robertmhaas@gmail.com> wrote:

I'm sure this could be implemented, but it's unclear to me why you
would expect it to perform well. Freezing a page that has no frozen
tuples yet isn't cheaper than freezing one that does, so for this idea
to be a win, the presence of frozen tuples on the page would have to
be a signal that the page is likely to be modified again in the near
future. In general, I don't see any reason why we should expect that
to be the case.

What I've described is a scheme for deciding to not freeze where that
would usually happen -- a scheme for *vetoing* page freezing -- rather
than a scheme for deciding to freeze. On its own, what I suggested
cannot help at all. It assumes a world in which we're already deciding
to freeze much more frequently, based on whatever other criteria. It's
intended to complement something like the LSN scheme.

What really matters here is finding a criterion that is likely to
perform well in general, on a test case not known to us beforehand.
This isn't an entirely feasible goal, because just as you can
construct a test case where any given criterion performs well, so you
can also construct one where any given criterion performs poorly. But
I think a rule that has a clear theory of operation must be preferable
to one that doesn't. The theory that Melanie and Andres are advancing
is that a page that has been modified recently (in insert-LSN-time) is
more likely to be modified again soon than one that has not i.e. the
near future will be like the recent past.

I don't think that it's all that useful on its own. You just cannot
ignore the fact that the choice to not freeze now doesn't necessarily
mean that you get to rereview that choice in the near future.
Particularly with large tables, the opportunities to freeze at all are
few and far between -- if for no other reason than the general design
of autovacuum scheduling. Worse still, any unfrozen all-visible pages
can just accumulate as all-visible pages, until the next aggressive
VACUUM happens whenever. How can that not be extremely important?

That isn't an argument against a scheme that uses LSNs (many kinds of
information might be weighed) -- it's an argument in favor of paying
attention to the high level cadence of VACUUM. That much seems
essential. I think that there might well be room for having several
complementary schemes like the LSN scheme. Or one big scheme that
weighs multiple factors together, if you prefer. That all seems
basically reasonable to me.

Adaptive behavior is important with something as complicated as this.
Adaptive schemes all seem to involve trial and error. The cost of
freezing too much is relatively well understood, and can be managed
sensibly. So we should err in that direction -- a direction that is
relatively easy to understand, to notice, and to pull back from having
gone too far. Putting off freezing for a very long time is a source of
much of the seemingly intractable complexity in this area.

Another way of addressing that is getting rid of aggressive VACUUM as
a concept. But I'm not going to revisit that topic now, or likely
ever.

--
Peter Geoghegan

#18Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#17)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 5:06 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Aug 28, 2023 at 1:17 PM Robert Haas <robertmhaas@gmail.com> wrote:

I'm sure this could be implemented, but it's unclear to me why you
would expect it to perform well. Freezing a page that has no frozen
tuples yet isn't cheaper than freezing one that does, so for this idea
to be a win, the presence of frozen tuples on the page would have to
be a signal that the page is likely to be modified again in the near
future. In general, I don't see any reason why we should expect that
to be the case.

What I've described is a scheme for deciding to not freeze where that
would usually happen -- a scheme for *vetoing* page freezing -- rather
than a scheme for deciding to freeze. On its own, what I suggested
cannot help at all. It assumes a world in which we're already deciding
to freeze much more frequently, based on whatever other criteria. It's
intended to complement something like the LSN scheme.

I like the direction of this idea. It could avoid repeatedly freezing
a page that is being modified over and over. I tried it out with a
short-running version of workload I (approximating a work queue) --
and it prevents unnecessary freezing.

The problem with this criterion is that if you freeze a page and then
ever modify it again -- even once, vacuum will not be allowed to
freeze it again until it is forced to. Perhaps we could use a very
strict LSN cutoff for pages which contain any frozen tuples -- for
example: only freeze a page containing a mix of frozen and unfrozen
tuples if it has not been modified since before the last vacuum of the
relation ended.

- Melanie

#19Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#18)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 4:15 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Aug 28, 2023 at 5:06 PM Peter Geoghegan <pg@bowt.ie> wrote:

What I've described is a scheme for deciding to not freeze where that
would usually happen -- a scheme for *vetoing* page freezing -- rather
than a scheme for deciding to freeze. On its own, what I suggested
cannot help at all. It assumes a world in which we're already deciding
to freeze much more frequently, based on whatever other criteria. It's
intended to complement something like the LSN scheme.

I like the direction of this idea. It could avoid repeatedly freezing
a page that is being modified over and over. I tried it out with a
short-running version of workload I (approximating a work queue) --
and it prevents unnecessary freezing.

I see a lot of value in it as an enabler of freezing at the earliest
opportunity, which is usually the way to go.

The problem with this criterion is that if you freeze a page and then
ever modify it again -- even once, vacuum will not be allowed to
freeze it again until it is forced to.

Right. Like you, I was thinking of something that dampened VACUUM's
newfound enthusiasm for freezing, in whatever way -- not a discrete
rule. A mechanism with a sense of proportion about what it meant for
the page to look a certain way. The strength of the signal would
perhaps be highest (i.e. most discouraging of further freezing) on a
page that has only a few relatively recent unfrozen tuples. XID age
could actually be useful here!

Perhaps we could use a very
strict LSN cutoff for pages which contain any frozen tuples -- for
example: only freeze a page containing a mix of frozen and unfrozen
tuples if it has not been modified since before the last vacuum of the
relation ended.

You might also need to account for things like interactions with the
FSM. If the new unfrozen tuple packs the page to the brim, and the new
row is from an insert, then maybe the mechanism shouldn't dampen our
enthusiasm for freezing the page at all. Similarly, on a page that has
no unfrozen tuples at all just yet, it would make sense for the
algorithm to be most enthusiastic about freezing those pages that can
fit no more tuples. Perhaps we'd be willing to accept the cost of an
extra FPI with such a page, for example.

It will take time to validate these sorts of ideas thoroughly, of
course. I agree with what you said upthread about it also being a
question of values and priorities.

--
Peter Geoghegan

#20Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#16)
Re: Eager page freeze criteria clarification

On Mon, Aug 28, 2023 at 4:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

By low-velocity, do you mean lower overall TPS? In that case, wouldn't you be
less likely to run into xid wraparound and thus need less aggressive
opportunistic freezing?

Yes. But it also means that we've got slack capacity to do extra work
now without really hurting anything. If we can leverage that capacity
to reduce the pain of future bulk operations, that seems good. When
resources are tight, doing speculative work immediately becomes less
appealing. It's pretty hard to take such things into account, though.
I was just mentioning it.

So, this is where the caveat about absolute number of page freezes
matters. In workload A, master only did 57 page freezes (spread across
the various pgbench tables). At the end of the run, 2 pages were still
frozen.

I'm increasingly feeling that it's hard to make sense of the ratio.
Maybe show the number of pages frozen, the number that are frozen at
the end, and the number of pages in the database at the end of the run
as three separate values.

(1) seems bad to me because it doesn't consider whether or not freezing
will be useful -- only if it will be cheap. It froze very little of the
cold data in a workload where a small percentage of it was being
modified (especially workloads A, C, H). And it froze a lot of data in
workloads where it was being uniformly modified (workload B).

Sure, but if the cost of being wrong is low, you can be wrong a lot
and still be pretty happy. It's unclear to me to what extent we should
gamble on making only inexpensive mistakes and to what extent we
should gamble on making only infrequent mistakes, but they're both
valid strategies.

I suggested (4) and (5) because I think the "older than 33%" threshold
is better than the "older than 10%" threshold. I chose both because I am
still unclear on our values. Are we willing to freeze more aggressively
at the expense of emitting more FPIs? As long as it doesn't affect
throughput? For pretty much all of these workloads, the algorithms which
froze based on page modification recency OR FPI required emitted many
more FPIs than those which froze based only on page modification
recency.

Let's assume for a moment that the rate at which the insert LSN is
advancing is roughly constant over time, so that it serves as a good
proxy for wall-clock time. Consider four tables A, B, C, and D that
are, respectively, vacuumed once per minute, once per hour, once per
day, and once per week. With a 33% threshold, pages in table A will be
frozen if they haven't been modified in 20 seconds, page in table B
will be frozen if they haven't been modified in 20 minutes, pages in
table C will be frozen if they haven't been modified in 8 hours, and
pages in table D will be frozen if they haven't been modified in 2
days, 8 hours. My intuition is that this feels awfully aggressive for
A and awfully passive for D.

To expand on that: apparently D doesn't actually get much write
activity, else there would be more vacuuming happening. So it's very
likely that pages in table D are going to get checkpointed and evicted
before they get modified again. Freezing them therefore seems like a
good bet: it's a lot cheaper to freeze those pages when they're
already in shared_buffers and dirty than it is if we have to read and
write them specifically for freezing. It's less obvious that what
we're doing in table A is wrong, and it could be exactly right, but
workloads where a row is modified, a human thinks about something
(e.g. whether to complete the proposed purchase), and then the same
row is modified again are not uncommon, and human thinking times can
easily exceed 20 seconds. On the other hand, workloads where a row is
modified, a computer thinks about something, and then the row is
modified again are also quite common, and computer thinking times can
easily be less than 20 seconds. It feels like a toss-up whether we get
it right. For this kind of table, I suspect we'd be happier freezing
pages that are about to be evicted or about to be written as part of a
checkpoint rather than freezing pages opportunistically in vacuum.

Maybe that's something we need to think harder about. If we froze
dirty pages that wouldn't need a new FPI just before evicting them,
and just before they would be written out for a checkpoint, under what
circumstances would we still want vacuum to opportunistically freeze?
I think the answer might be "only when it's very cheap." If it's very
cheap to freeze now, it's appealing to gamble on doing it before a
checkpoint comes along and makes the same operation require an extra
FPI. But if it isn't too cheap, then why not just wait and see what
happens? If the buffer gets modified again before it gets written out,
then freezing immediately is a waste and freeze-on-evict is better. If
it doesn't, freezing immediately and freeze-on-evict are the same
price.

I'm hand-waving a bit here because maybe freeze-on-evict is subject to
some conditions and freezing-immediately is more unconditional or
something like that. But I think the basic point is valid, namely,
sometimes it might be better to defer the decision to the last point
in time at which we can reasonably make it, and by focusing on what
vacuum (or even HOT pruning) does, we're pulling that decision forward
in time, which is good if it makes it cheaper by piggybacking on a
previous FPI, but not good if it means that we're guessing whether the
page will be accessed again soon when we could just as well wait and
see whether that happens or not.

--
Robert Haas
EDB: http://www.enterprisedb.com
