free space map and visibility map
With some intensive crash-recovery testing, I've run into a situation where
I get some bad table bloat. There will be large swaths of the table which
are empty (all results from heap_page_items other than lp are either zero
or NULL), but have zero available space in the fsm, and are marked as
all-visible and all-frozen in the vm.
I guess it is a result of a crash causing updates to the fsm to be lost.
Then due to the (crash-recovered) visibility map showing them as all
visible and all frozen, vacuum never touches the pages again, so the fsm
never gets corrected.
'VACUUM (DISABLE_PAGE_SKIPPING) foo;' does fix it, but that seems to be
the only thing that will.
Is there a way to improve this, short of making updates to the fsm be a
wal-logged operation?
It is probably not a very pressing issue, as crashes are normally pretty
rare, I would hope. But it seems worth improving if there is a good way to
do so.
Cheers,
Jeff
On Fri, Mar 17, 2017 at 9:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
With some intensive crash-recovery testing, I've run into a situation where
I get some bad table bloat. There will be large swaths of the table which
are empty (all results from heap_page_items other than lp are either zero or
NULL), but have zero available space in the fsm, and are marked as
all-visible and all-frozen in the vm.
I guess it is a result of a crash causing updates to the fsm to be lost.
Then due to the (crash-recovered) visibility map showing them as all visible
and all frozen, vacuum never touches the pages again, so the fsm never gets
corrected.
I guess that this happens only if heap_xlog_clean applies an FPI (full-page image). Right?
Updates to the fsm can be lost, but the fsm is updated by replaying the
HEAP2_CLEAN record during crash recovery.
'VACUUM (DISABLE_PAGE_SKIPPING) foo;' does fix it, but that seems to be
the only thing that will.
If the above is correct, another option is to allow
heap_xlog_clean to update the fsm even when applying an FPI.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Mar 18, 2017 at 2:09 PM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
I guess that this happens only if heap_xlog_clean applies an FPI (full-page image). Right?
Updates to the fsm can be lost, but the fsm is updated by replaying the
HEAP2_CLEAN record during crash recovery.
Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
can't leave the block all-visible or all-frozen.) I think the issue
here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
that neither of those ever updates the FSM, regardless of FPI?
I don't know how to test which record is most responsible. I
could turn off full_page_writes globally and see what happens, with some
tweaking to my testing harness.
Cheers,
Jeff
On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
can't leave the block all-visible or all-frozen.) I think the issue
here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
that neither of those ever updates the FSM, regardless of FPI?
Yes, updates to the FSM are never logged. Forcing replay of
HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Yes, updates to the FSM are never logged. Forcing replay of
HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
I think I was missing something. I had imagined your situation as an FPI
being replayed during crash recovery after the crashed server had vacuumed
the page and marked it all-frozen. But that situation is also resolved
by that solution.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ@mail.gmail.com>
On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
can't leave the block all-visible or all-frozen.) I think the issue
here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
that neither of those ever updates the FSM, regardless of FPI?
Yes, updates to the FSM are never logged. Forcing replay of
HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
I think I was missing something. I had imagined your situation as an FPI
being replayed during crash recovery after the crashed server had vacuumed
the page and marked it all-frozen. But that situation is also resolved
by that solution.
# HEAP2_CLEAN is also issued in lazy_vacuum_page.
It will work, but I'm not sure it is the right direction for
HEAP2_FREEZE_PAGE to touch the FSM.
As Masahiko said, the situation must be created either by HEAP2_VISIBLE
without a preceding HEAP2_CLEAN, or by HEAP2_CLEAN with an FPI. I
think only the latter can happen. The comment in heap_xlog_clean
below is right in general, but if a page filled with tuples becomes
almost empty and freezable by this cleanup, a problematic
situation like this one occurs.
/*
* Update the FSM as well.
*
* XXX: Don't do this if the page was restored from full page image. We
* don't bother to update the FSM in that case, it doesn't need to be
* totally accurate anyway.
*/
if (action == BLK_NEEDS_REDO)
XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
HEAP_INSERT/HEAP2_MULTI_INSERT/UPDATE do something similar. All of
these reduce free space, whereas HEAP2_CLEAN increases it. HEAP2_CLEAN
occurs less frequently than those three, so I suppose HEAP2_CLEAN could
always update the FSM.
Even if the page is not frozen, a similar situation arises
with just ALL_VISIBLE. Without any further updates to the page, the free space
information for the page won't be corrected until the next
freezing (or 'aggressive') vacuum.
From this point of view, it is not HEAP2_FREEZE_PAGE's job to
update the FSM. But if always updating the FSM on
HEAP2_CLEAN is considered too much, HEAP2_FREEZE_PAGE would be the next way
to go.
(I don't understand the reason for skipping the FSM update only for
FPI. This seems to have been introduced by f8f42279.)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Mar 24, 2017 at 11:01 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
(I don't understand the reason for skipping the FSM update only for
FPI. This seems to have been introduced by f8f42279.)
This code was introduced by e9816533e39be464227b748ee5eeb3d9f688cd76,
and the discussion is here [1].
ISTM that this code was implemented on the assumption that every page
will be vacuumed eventually. But now that we have the freeze map, some
pages may never be vacuumed again, so it would be worth reconsidering
that behavior.
[1]: /messages/by-id/49072021.7010801@enterprisedb.com
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:
As Masahiko said, the situation must be created either by HEAP2_VISIBLE
without a preceding HEAP2_CLEAN, or by HEAP2_CLEAN with an FPI. I
think only the latter can happen. The comment in heap_xlog_clean
below is right in general, but if a page filled with tuples becomes
almost empty and freezable by this cleanup, a problematic
situation like this one occurs.
I now think this is not the cause of the problem I am seeing. I made the
replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
did not fix it. With frequent crashes, it still accumulated a lot of
frozen and empty (but full according to the FSM) pages. I also set up
streaming replication to a replica and turned off crashing on the master,
and the FSM of the replica stays accurate, so the WAL stream and replay
logic are doing the right thing on the replica.
I now think the dirtied FSM pages are somehow not getting marked as dirty,
or are getting marked as dirty but the checkpoint is somehow skipping
them. It looks like MarkBufferDirtyHint does some operations unlocked,
which could explain a lost update, but it seems unlikely that would
happen often enough to account for the number of lost updates I am seeing.
/*
* Update the FSM as well.
*
* XXX: Don't do this if the page was restored from full page image. We
* don't bother to update the FSM in that case, it doesn't need to be
* totally accurate anyway.
*/
What does that save us? If we restored from FPI, we already have the block
in memory (we don't need to see the old version, just the new one), so it
doesn't save us a random read IO.
Cheers,
Jeff
At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA@mail.gmail.com>
I now think the dirtied FSM pages are somehow not getting marked as dirty,
or are getting marked as dirty but the checkpoint is somehow skipping
them. It looks like MarkBufferDirtyHint does some operations unlocked,
which could explain a lost update, but it seems unlikely that would
happen often enough to account for the number of lost updates I am seeing.
Hmm... clearing the dirty hint already seems to be protected by an
exclusive lock. And I think the problem can occur without any locking failure.
Other than with an FPI, the FSM update is also skipped when the record's LSN
is not newer than the page LSN. If, after vacuuming and before a power cut,
the heap page is evicted (written out) but the FSM page is not, replaying
HEAP2_CLEAN skips the FSM update even though no FPI is attached. Of
course, this cannot occur on a standby. One FSM page covers about 4k
heap pages, so an FSM page can stay in memory far longer than a heap page.
ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE. When a page
is already empty on entering lazy_scan_heap, or a page of a heap without
indexes is emptied in lazy_scan_heap, HEAP2_VISIBLE is
issued to set ALL_FROZEN.
Perhaps the problem will be fixed by forcing heap_xlog_visible to
update the FSM (in addition to FREEZE_PAGE), or by doing the same in
heap_xlog_clean. (As mentioned in my previous mail, I prefer the
latter.)
/*
* Update the FSM as well.
*
* XXX: Don't do this if the page was restored from full page image. We
* don't bother to update the FSM in that case, it doesn't need to be
* totally accurate anyway.
*/
What does that save us? If we restored from FPI, we already have the block
in memory (we don't need to see the old version, just the new one), so it
doesn't save us a random read IO.
Updates on random pages can cause visits to many FSM pages that are not
loaded yet. It may be intended to avoid that. Or, especially for
INSERT, successive operations tend to hit the same heap page, so the
relative cost of updating the FSM each time wouldn't be small. The FSM
then lies that the page has spare space, but that does no harm. But I
think things are different for operations that increase free space.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Mar 27, 2017 at 2:38 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Perhaps the problem will be fixed by forcing heap_xlog_visible to
update the FSM (in addition to FREEZE_PAGE), or by doing the same in
heap_xlog_clean. (As mentioned in my previous mail, I prefer the
latter.)
Maybe it's enough just to make both heap_xlog_visible and
heap_xlog_freeze_page forcibly update the FSM (heap_xlog_freeze_page
might be unnecessary), because the problem happens on a page that is
full according to the FSM but is empty and marked all-visible or
all-frozen. Although heap_xlog_clean loads the heap page into memory
for the redo operation, forcing heap_xlog_clean to update the FSM might be
overkill for this solution, because it would happen on every page, even
those marked neither all-visible nor all-frozen. Basically, 100%
accuracy of the FSM is not required. On the other hand, if we make
heap_xlog_visible update the FSM, it requires loading both the heap page
and the FSM page, which can also be overhead. Another idea is to have
the HEAP2_VISIBLE record carry the free space of the corresponding heap page,
and have heap_xlog_visible update the FSM from it during recovery.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
I'd like to have a comment from Heikki or Tom.
At Mon, 27 Mar 2017 16:49:08 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAnCw8y37dJSEhdbpue5H1v5FVLKUA_uh2MhZe331HNyw@mail.gmail.com>
Maybe it's enough just to make both heap_xlog_visible and
heap_xlog_freeze_page forcibly update the FSM (heap_xlog_freeze_page
might be unnecessary), because the problem happens on a page that is
full according to the FSM but is empty and marked all-visible or
It would work, and it is straightforward.
Currently the FSM seems to be treated as part of the heap from the
WAL point of view. From that point of view, the problem is that
heap_xlog_clean omits updating the FSM in certain cases. My only concern
is whether updating heap information from a visibility map record is right
or not. The code intends to reduce FSM updates without causing
problems. For the insert/update cases, the problem is that too-large
free space information in the FSM can cause needless fetches of heap
pages. But things are a bit different for the clean case: there the
problem is too-small free space information, which causes
everlasting empty pages.
I dug out the original discussion. The relevant remark was found
here:
/messages/by-id/24334.1225205478@sss.pgh.pa.us
Tom Lane wrote:
| Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
| > One issue with this patch is that it doesn't update the FSM at all when
| > pages are restored from full page images. It would require fetching the
| > page and checking the free space on it, or peeking into the size of the
| > backup block data, and I'm not sure if it's worth the extra code to do that.
|
| I'd vote not to bother, at least not in the first cut. As you say, 100%
| accuracy isn't required, and I think that in typical scenarios an
| insert/update that causes a page to become full would be relatively less
| likely to have a full-page image.
This is the 'first cut' shape, which hadn't caused an apparent
problem before ALL_FROZEN existed.
all-frozen. Although heap_xlog_clean loads the heap page into memory
for the redo operation, forcing heap_xlog_clean to update the FSM might be
overkill for this solution, because it would happen on every page, even
those marked neither all-visible nor all-frozen. Basically, 100%
I'm not sure that it is definitely not overkill, but it seems
to me the same as the 20% rule for the insert/update cases. We must
avoid 0% or too-small (under 20%?) FSM info from heap_clean,
especially in the FREEZE case.
accuracy of the FSM is not required. On the other hand, if we make
Yes, what is needed here is not accuracy but a minimum guarantee
against causing a critical problem.
heap_xlog_visible update the FSM, it requires loading both the heap page
and the FSM page, which can also be overhead. Another idea is to have
the HEAP2_VISIBLE record carry the free space of the corresponding heap page,
and have heap_xlog_visible update the FSM from it during recovery.
I hadn't considered that. Computing free space while replaying visibility
records is worse from an I/O perspective. It seems somewhat arbitrary, but
having the free space in VM records seems to work.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
I accidentally sent this off-list, sending to the list now:
On Sun, Mar 26, 2017 at 10:38 PM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Other than with an FPI, the FSM update is also skipped when the record's LSN
is not newer than the page LSN. If, after vacuuming and before a power cut,
the heap page is evicted (written out) but the FSM page is not, replaying
HEAP2_CLEAN skips the FSM update even though no FPI is attached. Of
course, this cannot occur on a standby. One FSM page covers about 4k
heap pages, so an FSM page can stay in memory far longer than a heap page.
This corresponds to the action == BLK_DONE case, right?
Perhaps the problem will be fixed by forcing heap_xlog_visible to
update the FSM (in addition to FREEZE_PAGE), or by doing the same in
heap_xlog_clean. (As mentioned in my previous mail, I prefer the
latter.)
When I make heap_xlog_clean update the FSM even on BLK_RESTORED (but not on
BLK_DONE), it solves the problem I was seeing. That still leaves me
wondering why the problem doesn't show up on the standby, because, unlike
BLK_DONE, BLK_RESTORED should have the same issue on a standby as on
a recovering master, shouldn't it? Maybe the difference is that the
existence of a replication slot delays the cleanup in a way that causes a
different pattern of WAL records.
Updates on random pages can cause visits to many FSM pages that are not
loaded yet. It may be intended to avoid that.
But I think that that would be no worse for BLK_RESTORED than it is for
BLK_NEEDS_REDO. Why optimize only one of the cases, if it is worth
optimizing either one?
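The point is that free space is a property of the page contents alone, so it can be read back equally well after restoring an image as after replaying the record. A sketch of the page-level calculation, assuming PageGetFreeSpace-style arithmetic (pd_upper minus pd_lower, less room for one 4-byte line pointer); the struct and function are simplified stand-ins, not PostgreSQL's own, and the heap-specific MaxHeapTuplesPerPage check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_POINTER_SIZE 4   /* assumed sizeof(ItemIdData) */

/* Simplified page header: only the two offsets that bound the free hole. */
typedef struct
{
    uint16_t pd_lower;   /* end of the line pointer array */
    uint16_t pd_upper;   /* start of tuple data */
} PageModel;

/* Space usable for one more tuple, reserving room for its line pointer. */
static size_t page_free_space(const PageModel *page)
{
    int space = (int) page->pd_upper - (int) page->pd_lower;

    if (space < LINE_POINTER_SIZE)
        return 0;
    return (size_t) (space - LINE_POINTER_SIZE);
}
```

Whether the page bytes arrived via redo or via a full-page image, the same header yields the same answer; skipping the FSM update on BLK_RESTORED saves only the FSM page access itself.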
Cheers,
Jeff
Attachments:
fsm_clean.patch
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
new file mode 100644
index b147f64..7b2a390
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** heap_xlog_clean(XLogReaderState *record)
*** 8011,8018 ****
nowdead, ndead,
nowunused, nunused);
- freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
-
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
--- 8011,8016 ----
*************** heap_xlog_clean(XLogReaderState *record)
*** 8021,8037 ****
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
}
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
/*
* Update the FSM as well.
*
! * XXX: Don't do this if the page was restored from full page image. We
! * don't bother to update the FSM in that case, it doesn't need to be
! * totally accurate anyway.
*/
! if (action == BLK_NEEDS_REDO)
XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
}
--- 8019,8038 ----
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
}
+ if (action == BLK_NEEDS_REDO || action == BLK_RESTORED)
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer)); /* needed to update FSM below */
+
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
/*
* Update the FSM as well.
*
! * Do this even if the page was restored from full page image. Otherwise
! * a cleaned up page which is also all_visible and nearly empty can go
! * unreused for prolonged time, or forever if all_frozen.
*/
! if (action == BLK_NEEDS_REDO || action == BLK_RESTORED)
XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
}
Hello,
At Tue, 28 Mar 2017 08:50:58 -0700, Jeff Janes <jeff.janes@gmail.com> wrote in <CAMkU=1zKfqGePWG+qqKthmWERBn8UAA2_9Sb+qTUUREhFkqLCA@mail.gmail.com>
I now think this is not the cause of the problem I am seeing. I made the
replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
did not fix it. With frequent crashes, it still accumulated a lot of
frozen and empty (but full according to the FSM) pages. I also set up replica
streaming and turned off crashing on the master, and the FSM of the replica
stays accurate, so the WAL stream and replay logic is doing the right thing
on the replica.
I now think the dirtied FSM pages are somehow not getting marked as
dirty, or are getting marked as dirty but somehow the checkpoint is skipping
them. It looks like MarkBufferDirtyHint does do some operations unlocked
which could explain lost update, but it seems unlikely that that would
happen often enough to see the amount of lost updates I am seeing.

Hmm.. clearing the dirty hint seems to be already protected by the exclusive
lock. And I think it can occur without a locking failure.

Other than by FPI, the FSM update is also omitted when the record LSN is older
than the page LSN. If the heap page is evicted but the FSM page is not, after
vacuuming and before a power cut, replaying HEAP2_CLEAN skips the
FSM update even though no FPI is attached. Of course this
cannot occur on a standby. One FSM page covers about 4k heap pages,
so an FSM page can stay in shared buffers far longer than the heap pages it covers.

This corresponds to the action == BLK_DONE case, right?
Yes. A WAL record with an older LSN results in BLK_DONE. That works as long as
the heap page and the FSM are consistent, but in this situation it leaves the
FSM broken after crash recovery.
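The crash scenario can be simulated in a few lines. This is a hypothetical model, not PostgreSQL code: the numbers are illustrative, the full-page-image path is not modelled, and the FSM policy is the pre-patch one (update only when the record is actually replayed). The heap page was written out after vacuum freed space, the FSM page was not, and the machine crashed.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Only the two non-FPI outcomes are modelled here. */
typedef enum { BLK_NEEDS_REDO, BLK_DONE } RedoAction;

/* Hypothetical on-disk state for one heap page and its FSM slot. */
typedef struct
{
    XLogRecPtr heap_page_lsn;   /* LSN stamped on the heap page */
    int        heap_free;       /* actual free space on the heap page */
    int        fsm_free;        /* what the FSM slot claims */
} DiskState;

/* Replay a CLEAN-like record without FPI that leaves 'free_after' bytes free. */
static RedoAction replay_clean(DiskState *d, XLogRecPtr rec_lsn, int free_after)
{
    RedoAction action = (rec_lsn <= d->heap_page_lsn) ? BLK_DONE : BLK_NEEDS_REDO;

    if (action == BLK_NEEDS_REDO)
    {
        d->heap_free = free_after;
        d->heap_page_lsn = rec_lsn;
        d->fsm_free = free_after;   /* FSM refreshed only on this path */
    }
    return action;
}
```

In the crashed state the heap page already carries the record's LSN, so replay returns BLK_DONE and the FSM keeps claiming the page is full. By contrast, when the page on disk predates the record, the same replay refreshes the FSM.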
ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE. When a page
is already empty on entering lazy_scan_heap, or a page of a
heap without indexes is emptied in lazy_scan_heap, HEAP2_VISIBLE is
issued to set ALL_FROZEN.

Perhaps the problem will be fixed by forcing heap_xlog_visible to
update the FSM (in addition to FREEZE_PAGE), or by doing the same in
heap_xlog_clean. (As mentioned in the previous mail, I prefer the
latter.)

When I make heap_xlog_clean update the FSM even on BLK_RESTORED (but not on
BLK_DONE), it solves the problem I was seeing. That still leaves me
wondering why the problem doesn't show up on the standby because, unlike
BLK_DONE, BLK_RESTORED should have the same issue on a standby as it does on
a recovering master, shouldn't it? Maybe the difference is that the
existence of a replication slot delays the cleanup in a way that causes a
different pattern of WAL records.
While every WAL record is new to the target page during standby
recovery, several WAL records at the beginning of crash recovery can be
older than the page.
/*
* Update the FSM as well.
*
* XXX: Don't do this if the page was restored from full page image. We
* don't bother to update the FSM in that case, it doesn't need to be
* totally accurate anyway.
*/

What does that save us? If we restored from FPI, we already have the block
in memory (we don't need to see the old version, just the new one), so it
doesn't save us a random read IO.

Updates on random pages can cause visits to many unloaded FSM
pages. It may be intended to avoid that.

But I think that that would be no worse for BLK_RESTORED than it is for
BLK_NEEDS_REDO. Why optimize only one of the cases, if it is worth
optimizing either one?
I agree with you. An FPI increases and decreases free space just
the same as redoing the WAL record does. The following is the discussion
about that.
/messages/by-id/49072021.7010801@enterprisedb.com
/messages/by-id/24334.1225205478@sss.pgh.pa.us
Tom Lane wrote:
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
One issue with this patch is that it doesn't update the FSM at all when
pages are restored from full page images. It would require fetching the
page and checking the free space on it, or peeking into the size of the
backup block data, and I'm not sure if it's worth the extra code to do that.

I'd vote not to bother, at least not in the first cut. As you say, 100%
accuracy isn't required, and I think that in typical scenarios an
insert/update that causes a page to become full would be relatively less
likely to have a full-page image.
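The "100% accuracy isn't required" stance rests partly on the FSM being coarse by design: each heap page gets a single byte. A sketch of that quantization, modelled on fsm_space_avail_to_cat in PostgreSQL's freespace.c and assuming the default 8 kB BLCKSZ, 256 categories (32-byte steps) and a maximum request size of 8160 bytes; treat the constants as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ           8192
#define FSM_CATEGORIES   256
#define FSM_CAT_STEP     (BLCKSZ / FSM_CATEGORIES)   /* 32 bytes per step */
#define MAX_FSM_REQUEST  8160   /* assumed MaxHeapTupleSize for 8 kB pages */

/* Map bytes of free space on a heap page to its 1-byte FSM category. */
static uint8_t space_avail_to_cat(int avail)
{
    int cat;

    assert(avail >= 0 && avail < BLCKSZ);
    if (avail >= MAX_FSM_REQUEST)
        return 255;            /* room for any tuple */
    cat = avail / FSM_CAT_STEP;
    if (cat > 254)
        cat = 254;             /* 255 is kept for "fits anything" */
    return (uint8_t) cat;
}
```

An error of up to 31 bytes is harmless, but a stale category of 0 on an empty page makes the page look full, which is the failure Jeff reports.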
So, the reason seems to be that it just doesn't seem necessary.
Including another branch of this thread, the following options
are proposed.
- Let FREEZE_PAGE and VISIBLE update the FSM.
This causes an extra fetch of the heap page, summing up of free
space, and an FSM update for every frozen page.
- Let CLEAN always update the FSM.
This causes extra counting of free space and an FSM update for
every vacuumed heap page regardless of frozen-ness.
- Let FREEZE_PAGE/VISIBLE or CLEAN records carry the free space.
This doesn't need to fetch the heap page, but it breaks the policy
(really?) that the FSM is not WAL-logged, or that the FSM is updated
only as a result of heap updates.
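The third option can be sketched as a record that carries the post-vacuum free space, so redo can update the FSM without reading the heap page, whatever the redo action for the heap block. Everything below is hypothetical: the struct layout, field names, in-memory FSM and redo function do not exist in PostgreSQL and only illustrate the trade-off.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical record body: block number plus the free space left after vacuum. */
typedef struct
{
    BlockNumber blkno;
    uint16_t    freespace;   /* hypothetical field: bytes free after cleanup */
} xl_clean_with_freespace;

/* Hypothetical in-memory FSM: one entry per heap block. */
typedef struct
{
    uint16_t    *slots;
    BlockNumber  nblocks;
} FsmModel;

/*
 * Hypothetical redo step: the FSM is set from the record itself, so it no
 * longer matters whether the heap page needed redo, was restored from an
 * image, or was already up to date (BLK_DONE).
 */
static void redo_fsm_from_record(FsmModel *fsm, const xl_clean_with_freespace *rec)
{
    assert(rec->blkno < fsm->nblocks);
    fsm->slots[rec->blkno] = rec->freespace;
}
```

The cost is the one noted in the list: free space becomes part of the WAL stream, which departs from the policy that the FSM is not WAL-logged.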
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers