"page is not marked all-visible" warning in regression tests
I got this last night in a perfectly standard build of HEAD:
*** /home/tgl/pgsql/src/test/regress/expected/sanity_check.out Thu Jan 12 14:06:14 2012
--- /home/tgl/pgsql/src/test/regress/results/sanity_check.out Mon Jun 4 20:28:39 2012
***************
*** 1,4 ****
--- 1,5 ----
VACUUM;
+ WARNING: page is not marked all-visible but visibility map bit is set in relation "pg_db_role_setting" page 0
--
-- sanity check, if we don't have indices the test will take years to
-- complete. But skip TOAST relations (since they will have varying
======================================================================
I didn't manage to reproduce it in a few tries, and I don't recall
having seen the like reported from the buildfarm either. So there's
some low-probability bug in there ...
regards, tom lane
On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote:
I got this last night in a perfectly standard build of HEAD:
*** /home/tgl/pgsql/src/test/regress/expected/sanity_check.out Thu Jan 12 14:06:14 2012 --- /home/tgl/pgsql/src/test/regress/results/sanity_check.out Mon Jun 4 20:28:39 2012 *************** *** 1,4 **** --- 1,5 ---- VACUUM; + WARNING: page is not marked all-visible but visibility map bit is set in relation "pg_db_role_setting" page 0 -- -- sanity check, if we don't have indices the test will take years to -- complete. But skip TOAST relations (since they will have varying======================================================================
I didn't manage to reproduce it in a few tries, and I don't recall
having seen the like reported from the buildfarm either. So there's
some low-probability bug in there ...
I have seen that twice just yesterday. Couldn't reproduce it so far.
Workload was (pretty exactly):
initdb
postgres -c fsync=off
pgbench -i -s 100
CREATE TABLE data(id serial primary key, data int);
ALTER SEQUENCE data_id_seq INCREMENT 2;
VACUUM FREEZE;
normal shutdown
postgres -c fsync=on
pgbench -c 20 -j 20 -T 100
WARNING: ... pg_depend ...
WARNING: ... can't remember ...
Greetings,
Andres
Andres Freund <andres@2ndquadrant.com> writes:
On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote:
I got this last night in a perfectly standard build of HEAD:
+ WARNING: page is not marked all-visible but visibility map bit is set in
relation "pg_db_role_setting" page 0 --
I have seen that twice just yesterday. Couldn't reproduce it so far.
Workload was (pretty exactly):
initdb
postgres -c fsync=off
pgbench -i -s 100
CREATE TABLE data(id serial primary key, data int);
ALTER SEQUENCE data_id_seq INCREMENT 2;
VACUUM FREEZE;
normal shutdown
postgres -c fsync=on
pgbench -c 20 -j 20 -T 100
WARNING: ... pg_depend ...
WARNING: ... can't remember ...
Hmm ... from memory, what I did was
configure/build/install from a fresh pull
initdb
start postmaster, fsync off
make installcheck
stop postmaster
apply Hanada-san's json patch, replace postgres executable
start postmaster, fsync off
make installcheck
and it was the second of these runs that failed. Could we be missing
flushing some blocks out to disk at shutdown? Maybe fsync off is a
contributing factor?
regards, tom lane
On Tuesday, June 05, 2012 04:18:44 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote:
I got this last night in a perfectly standard build of HEAD:
+ WARNING: page is not marked all-visible but visibility map bit is set
in relation "pg_db_role_setting" page 0 --I have seen that twice just yesterday. Couldn't reproduce it so far.
Workload was (pretty exactly):initdb
postgres -c fsync=off
pgbench -i -s 100
CREATE TABLE data(id serial primary key, data int);
ALTER SEQUENCE data_id_seq INCREMENT 2;
VACUUM FREEZE;
normal shutdown
postgres -c fsync=on
pgbench -c 20 -j 20 -T 100
WARNING: ... pg_depend ...
WARNING: ... can't remember ...Hmm ... from memory, what I did was
configure/build/install from a fresh pull
initdb
start postmaster, fsync off
make installcheck
stop postmaster
apply Hanada-san's json patch, replace postgres executable
start postmaster, fsync off
make installcheckand it was the second of these runs that failed. Could we be missing
flushing some blocks out to disk at shutdown? Maybe fsync off is a
contributing factor?
On a cursory lock it might just be a race condition in
vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the
warning to be visible, all_visible_according_to_vm is determined before we
loop over all blocks. At the point where one specific heap block is actually
read and locked that knowledge might be completely outdated by any concurrent
backend. Am I missing something?
I have to say the whole visibilitymap correctness and crash-safety seems to be
quite under documented, especially as it seems to be somewhat intricate (to
me). E.g. not having any note why visibilitymap_test doesn't need locking. (I
guess the theory is that a 1 byte read will always be consistent. But how does
that ensure other backends see an up2date value?).
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On a cursory lock it might just be a race condition in
vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the
warning to be visible, all_visible_according_to_vm is determined before we
loop over all blocks. At the point where one specific heap block is actually
read and locked that knowledge might be completely outdated by any concurrent
backend. Am I missing something?
No, I think you're right. I think that warning is bogus. I added it
in place of some older warning which no longer made sense, but I think
this one doesn't make sense either.
I have to say the whole visibilitymap correctness and crash-safety seems to be
quite under documented, especially as it seems to be somewhat intricate (to
me). E.g. not having any note why visibilitymap_test doesn't need locking. (I
guess the theory is that a 1 byte read will always be consistent. But how does
that ensure other backends see an up2date value?).
It's definitely intricate, and it's very possible that we should have
some more documentation. I am not sure exactly what and where, but
feel free to suggest something.
visibilitymap_test() does have a comment saying that:
/*
* We don't need to lock the page, as we're only looking at a
single bit.
*/
But that's a bit unsatisfying, because, as you say, it doesn't address
the question of memory-ordering issues. I think that there's no
situation in which it causes a problem to see the visibility map bit
as unset when in reality it has just recently been set by some other
back-end. It would be bad if someone did something like:
if (visibilitymap_test(...))
visibilitymap_clear();
...because then memory-ordering issues could cause us to accidentally
fail to clear the bit. No one should be doing that, though; the
relevant locking and conditional logic is built directly into
visibilitymap_clear().
On the flip side, if someone sees the visibility map bit as set when
it's actually just been cleared, that could cause a problem - most
seriously, index-only scans could return wrong answers. For that to
happen, someone would have to insert a heap tuple onto a previously
all-visible page, clearing the visibility map bit, and then insert an
index tuple; concurrently, some other backend would need to see the
index tuple but not the fact that the visibility map bit had been
cleared. I don't think that can happen: after inserting the heap
tuple, the inserting backend would release buffer content lock, which
acts as a full memory barrier; before reading the index tuple, the
index-only-scanning backend would have to take the content lock on the
index buffer, which also acts as a full memory barrier. So the
inserter can't do the writes out of order, and the index-only-scanner
can't do the reads out of order, so I think it's safe.... but we
probably do need to explain that somewhere.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wednesday, June 06, 2012 08:19:15 PM Robert Haas wrote:
On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
On a cursory lock it might just be a race condition in
vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for
the warning to be visible, all_visible_according_to_vm is determined
before we loop over all blocks. At the point where one specific heap
block is actually read and locked that knowledge might be completely
outdated by any concurrent backend. Am I missing something?No, I think you're right. I think that warning is bogus. I added it
in place of some older warning which no longer made sense, but I think
this one doesn't make sense either.
Agreed.
It might be interesting to recheck the visibility and warn if its wrong. That
should be infrequent enough to bearable and it does check for an actually
dangerous case in a new code path.
I have to say the whole visibilitymap correctness and crash-safety seems
to be quite under documented, especially as it seems to be somewhat
intricate (to me). E.g. not having any note why visibilitymap_test
doesn't need locking. (I guess the theory is that a 1 byte read will
always be consistent. But how does that ensure other backends see an
up2date value?).It's definitely intricate, and it's very possible that we should have
some more documentation. I am not sure exactly what and where, but
feel free to suggest something.
I think some addition to the LOCKING part of visibilitymap.c's header
explaining some of what you wrote in your email might be a good start.
I would also suggest explictly mentioning that its ok to have the
visibilitymap and the page disagreeing about the visibility if its the
visibility map that think that the page contains invisible data but not the
other way round (I think that can currently happen).
visibilitymap_test() should explain that its results can be outdated if youre
not holding a buffer lock.
visibilitymap_test() does have a comment saying that:
/*
* We don't need to lock the page, as we're only looking at a
single bit.
*/
Oh. I conveniently skipped that comment in my brain ;)
But that's a bit unsatisfying, because, as you say, it doesn't address
the question of memory-ordering issues. I think that there's no
situation in which it causes a problem to see the visibility map bit
as unset when in reality it has just recently been set by some other
back-end. It would be bad if someone did something like:if (visibilitymap_test(...))
visibilitymap_clear();...because then memory-ordering issues could cause us to accidentally
fail to clear the bit. No one should be doing that, though; the
relevant locking and conditional logic is built directly into
visibilitymap_clear().
Then _test should document that... I don't think its impossible that we will
grow more uses of the visibilitymap logic.
On the flip side, if someone sees the visibility map bit as set when
it's actually just been cleared, that could cause a problem - most
seriously, index-only scans could return wrong answers. For that to
happen, someone would have to insert a heap tuple onto a previously
all-visible page, clearing the visibility map bit, and then insert an
index tuple; concurrently, some other backend would need to see the
index tuple but not the fact that the visibility map bit had been
cleared. I don't think that can happen: after inserting the heap
tuple, the inserting backend would release buffer content lock, which
acts as a full memory barrier; before reading the index tuple, the
index-only-scanning backend would have to take the content lock on the
index buffer, which also acts as a full memory barrier. So the
inserter can't do the writes out of order, and the index-only-scanner
can't do the reads out of order, so I think it's safe.... but we
probably do need to explain that somewhere.
Hm. For a short while I thought there would be an issue with heap_delete and
IOS because the deleting transaction can commit without any barriers happening
on the IOS side. But that only seems to be possible with non MVCC snapshots
which are currently not allowed with index only scans.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 6, 2012 at 3:07 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hm. For a short while I thought there would be an issue with heap_delete and
IOS because the deleting transaction can commit without any barriers happening
on the IOS side. But that only seems to be possible with non MVCC snapshots
which are currently not allowed with index only scans.
Well, one, commits are irrelevant; the page ceases to be all-visible
as soon as the delete happens. And two, you can't do a delete or a
commit without a memory barrier - every LWLockAcquire() or
LWLockRelease() is a full barrier, so any operation that requires a
buffer content lock is both preceded and followed by a full barrier.
Proposed patch attached. This adds some more comments in various
places, and implements your suggestion of retesting the visibility-map
bit when we detect a possible mismatch with the page-level bit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
vm-test-cleanup.patchapplication/octet-stream; name=vm-test-cleanup.patchDownload+37-9
On Thursday, June 07, 2012 03:20:50 PM Robert Haas wrote:
On Wed, Jun 6, 2012 at 3:07 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
Hm. For a short while I thought there would be an issue with heap_delete
and IOS because the deleting transaction can commit without any barriers
happening on the IOS side. But that only seems to be possible with non
MVCC snapshots which are currently not allowed with index only scans.Well, one, commits are irrelevant; the page ceases to be all-visible
as soon as the delete happens.
Its not irrelevant for the code as it stands if non-mvcc snapshots were
allowed. Unless I miss something, even disregarding memory ordering issues,
there is no interlocking providing protection against the index doing the
visibilitymap_test() and some concurrent backend doing a heap_delete + commit
directly after that. If a SnapshotNow were used this could result in different
results between a IOS and a normal indexscan because the normal indexscan
would lock the heap page before doing the visibility testing and thus would
see the new visibility information. But then a SnapshotNow wouldn't be used if
that were a problem...
For normal snapshots its not a problem because with regards to the snapshot
everything on the page is still visible as I think that can only happen for
deletions and HOT updates because the index page would need to get locked
otherwise. Deletions aren't visible yet and HOT is irrelevant for the IOS by
definition.
And two, you can't do a delete or a commit without a memory barrier - every
LWLockAcquire() or LWLockRelease() is a full barrier, so any operation that
requires a buffer content lock is both preceded and followed by a full
barrier.
The memory barrier on the deleting side is irrelevant if there is no memory
barrier on the reading side.
Proposed patch attached. This adds some more comments in various
places, and implements your suggestion of retesting the visibility-map
bit when we detect a possible mismatch with the page-level bit.
Thanks, will look at it in a bit.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Well, one, commits are irrelevant; the page ceases to be all-visible
as soon as the delete happens.Its not irrelevant for the code as it stands if non-mvcc snapshots were
allowed. Unless I miss something, even disregarding memory ordering issues,
there is no interlocking providing protection against the index doing the
visibilitymap_test() and some concurrent backend doing a heap_delete + commit
directly after that.
Hmm. Well, the backend that did the heap_delete would clear the
visibility map bit when it did the delete, before releasing the lock
on the heap buffer. So normally we'll see that buffer as
not-all-visible as soon as the delete is performed, even if no commit
has happened yet. But I guess there could be a memory-ordering issue
there for a SnapshotNow scan, because as you say there's no barrier on
the read side in that case. The delete + commit would have to happen
after we finish fetching the index TID but before we test the
visibility map bit. So I suppose that if we wanted to support
index-only scans under SnapshotNow maybe we'd want to lock and unlock
the visibility map page when testing each bit. But that almost seems
like overkill anyway: for it to matter, someone would have to be
relying on starting a scan, deleting a tuple in another transaction
while the scan was in progress, and having the scan see the delete
when it reached the deleted tuple. But of course if the scan had take
a microsecond longer to reach the deleted tuple it wouldn't have seen
it as deleted anyway, so it's a bit hard to see how any such code
could be race-free anyhow.
Proposed patch attached. This adds some more comments in various
places, and implements your suggestion of retesting the visibility-map
bit when we detect a possible mismatch with the page-level bit.Thanks, will look at it in a bit.
Thanks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thursday, June 07, 2012 04:27:32 PM Robert Haas wrote:
On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Proposed patch attached. This adds some more comments in various
places, and implements your suggestion of retesting the visibility-map
bit when we detect a possible mismatch with the page-level bit.Thanks, will look at it in a bit.
I wonder if
/* mark page all-visible, if appropriate */
if (all_visible && !PageIsAllVisible(page))
{
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
visibility_cutoff_xid);
}
shouldn't test
if (all_visible &&
(!PageIsAllVisible(page) || !all_visible_according_to_vm)
instead.
Commentwise I am not totally content with the emphasis on memory ordering
because some of the stuff is more locking than memory ordering. Except that I
think its a pretty clear improvement. I can reformulate the places where I
find that relevant but I have the feeling that wouldn't help the legibility.
Its the big comment in vacuumlazy.c, the comment in nodeIndexonly.c and the
one in the header of visibilitymap_test. Should be s/memory-
ordering/concurrency/ except in nodeIndexonlyscan.c
The visibilitymap_clear/PageClearAllVisible in heap_multi_insert should be
moved into the critical section, shouldn't it?
Regards,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 7, 2012 at 11:04 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On Thursday, June 07, 2012 04:27:32 PM Robert Haas wrote:
On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Proposed patch attached. This adds some more comments in various
places, and implements your suggestion of retesting the visibility-map
bit when we detect a possible mismatch with the page-level bit.Thanks, will look at it in a bit.
I wonder if
/* mark page all-visible, if appropriate */
if (all_visible && !PageIsAllVisible(page))
{
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
visibility_cutoff_xid);
}
shouldn't test
if (all_visible &&
(!PageIsAllVisible(page) || !all_visible_according_to_vm)
instead.
Hmm, I think you're right.
Commentwise I am not totally content with the emphasis on memory ordering
because some of the stuff is more locking than memory ordering. Except that I
think its a pretty clear improvement. I can reformulate the places where I
find that relevant but I have the feeling that wouldn't help the legibility.
Its the big comment in vacuumlazy.c, the comment in nodeIndexonly.c and the
one in the header of visibilitymap_test. Should be s/memory-
ordering/concurrency/ except in nodeIndexonlyscan.c
Hmm, I see your point.
The visibilitymap_clear/PageClearAllVisible in heap_multi_insert should be
moved into the critical section, shouldn't it?
Yes, it should. I was thinking maybe we could go the other way and
have heap_insert do it before starting the critical section, but
that's no good: clearing the visibility map bit is part of the
critical data change, and we can't do it and then forget to WAL-log
it.
Updated patch attached.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
vm-test-cleanup-v2.patchapplication/octet-stream; name=vm-test-cleanup-v2.patchDownload+61-15
On Thursday, June 07, 2012 06:08:34 PM Robert Haas wrote:
On Thu, Jun 7, 2012 at 11:04 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
On Thursday, June 07, 2012 04:27:32 PM Robert Haas wrote:
On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Proposed patch attached. This adds some more comments in various
places, and implements your suggestion of retesting the
visibility-map bit when we detect a possible mismatch with the
page-level bit.Thanks, will look at it in a bit.
The visibilitymap_clear/PageClearAllVisible in heap_multi_insert should
be moved into the critical section, shouldn't it?Yes, it should. I was thinking maybe we could go the other way and
have heap_insert do it before starting the critical section, but
that's no good: clearing the visibility map bit is part of the
critical data change, and we can't do it and then forget to WAL-log
it.
You could do a visibilitymap_pin outside, but I don't really see the point as
the page is already locked. There might be some slight benefit in doing so in
multi_insert but that would be more complicated. And of doubtful benefit.
Updated patch attached.
Looks good.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 7, 2012 at 12:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
You could do a visibilitymap_pin outside, but I don't really see the point as
the page is already locked. There might be some slight benefit in doing so in
multi_insert but that would be more complicated. And of doubtful benefit.
Doesn't RelationGetBufferForTuple() already do exactly that?
Updated patch attached.
Looks good.
Thanks for the review.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thursday, June 07, 2012 06:23:58 PM Robert Haas wrote:
On Thu, Jun 7, 2012 at 12:19 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
You could do a visibilitymap_pin outside, but I don't really see the
point as the page is already locked. There might be some slight benefit
in doing so in multi_insert but that would be more complicated. And of
doubtful benefit.Doesn't RelationGetBufferForTuple() already do exactly that?
Forget it, sorry. I mis(read|remembered) how multi_insert works and concluded
from that on how insert works...
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes:
Updated patch attached.
The comments need a pass of copy-editing, eg here and here:
+ * so somebody else could be change the bit just after we look at it. In fact,
^^^^^^^^^^^^^^^
+ * got cleared after we checked it and before we got took the buffer
^^^^^^^^
Seems reasonable beyond that.
regards, tom lane
On Thu, Jun 7, 2012 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Updated patch attached.
The comments need a pass of copy-editing, eg here and here:
+ * so somebody else could be change the bit just after we look at it. In fact,
^^^^^^^^^^^^^^^
+ * got cleared after we checked it and before we got took the buffer
^^^^^^^^
Seems reasonable beyond that.
Thanks, committed with those fixes and a correction for one more typo
I spotted elsewhere.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company