Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Started by Andres Freund · December 2018 · 25 messages · pgsql-hackers
#1Andres Freund
andres@anarazel.de

Hi,

The zheap patchset, even after being based on pluggable storage,
currently has the following condition in RelationAddExtraBlocks():
    if (RelationStorageIsZHeap(relation))
    {
        Assert(BufferGetBlockNumber(buffer) != ZHEAP_METAPAGE);
        ZheapInitPage(page, BufferGetPageSize(buffer));
        freespace = PageGetZHeapFreeSpace(page);
    }
    else
    {
        PageInit(page, BufferGetPageSize(buffer), 0);
        freespace = PageGetHeapFreeSpace(page);
    }

I.e. it initializes the page differently when zheap is used versus
heap.

Thinking about whether it's worth making that function extensible made
me wonder: is it actually a good idea to initialize the page at that
point, including marking it dirty?

As far as I can tell, that has several downsides:
- Dirtying the buffer for initialization will cause potentially
superfluous IO, with no interesting data in the write except for a
newly initialized page.
- As there's no sort of interlock, it's entirely possible that, after a
crash, the blocks will come up uninitialized but with the FSM still
returning them as empty, so that path would be good to support anyway.
- It adds heap-specific code to a routine that otherwise could be
generic for different table access methods.

It seems to me this could be optimized by *not* initializing the page,
and instead having a PageIsNew() check at the places that use the page,
initializing it there in that case.
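
As a rough sketch of what I mean at a consumer of such pages (the
helper name is made up; the bufmgr/bufpage calls are the existing
ones):

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"
    #include "utils/rel.h"

    /* hypothetical helper: initialize a still-new page lazily, at first use */
    static Buffer
    get_block_init_if_new(Relation relation, BlockNumber blkno)
    {
        Buffer      buffer = ReadBuffer(relation, blkno);

        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

        if (PageIsNew(BufferGetPage(buffer)))
        {
            /* block was extended but never initialized, e.g. after a crash */
            PageInit(BufferGetPage(buffer), BufferGetPageSize(buffer), 0);
            MarkBufferDirty(buffer);
        }

        return buffer;
    }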

Right now we'll just ignore such pages when we encounter them
(e.g. after a crash) until vacuum initializes them. But given that we've
accepted that we'll potentially create empty pages anyway, I don't see
what advantage that provides. The warnings emitted by vacuum are
pretty scary, and can happen in a lot of legitimate cases, so removing
them seems like a good idea anyway.

I'm pretty tired, so I might be missing something large and obvious
here.

Greetings,

Andres Freund

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#1)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

On Wed, Dec 19, 2018 at 2:09 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

The zheap patchset, even after being based on pluggable storage,
currently has the following condition in RelationAddExtraBlocks():
    if (RelationStorageIsZHeap(relation))
    {
        Assert(BufferGetBlockNumber(buffer) != ZHEAP_METAPAGE);
        ZheapInitPage(page, BufferGetPageSize(buffer));
        freespace = PageGetZHeapFreeSpace(page);
    }
    else
    {
        PageInit(page, BufferGetPageSize(buffer), 0);
        freespace = PageGetHeapFreeSpace(page);
    }

I.e. it initializes the page differently when zheap is used versus
heap.

Thinking about whether it's worth making that function extensible made
me wonder: is it actually a good idea to initialize the page at that
point, including marking it dirty?

As far as I can tell, that has several downsides:
- Dirtying the buffer for initialization will cause potentially
superfluous IO, with no interesting data in the write except for a
newly initialized page.
- As there's no sort of interlock, it's entirely possible that, after a
crash, the blocks will come up uninitialized but with the FSM still
returning them as empty, so that path would be good to support anyway.
- It adds heap-specific code to a routine that otherwise could be
generic for different table access methods.

IIUC, your proposal is to remove the page initialization and
MarkBufferDirty() from RelationAddExtraBlocks(), but still record the
new blocks in the FSM. Is my understanding correct? If so, I don't see
any problem with that, and as you have mentioned, it will be generally
advantageous as well.
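
Something like the below in RelationAddExtraBlocks() is what I have in
mind (just a sketch, not your actual patch), i.e. skip the PageInit()
and MarkBufferDirty() and only advertise the block in the FSM:

    /*
     * Don't initialize the page; just enter it into the FSM.  An empty
     * page has everything after the page header available.
     */
    freespace = BufferGetPageSize(buffer) - SizeOfPageHeaderData;
    RecordPageWithFreeSpace(relation, BufferGetBlockNumber(buffer),
                            freespace);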

It seems to me this could be optimized by *not* initializing the page,
and instead having a PageIsNew() check at the places that use the page,
initializing it there in that case.

Makes sense to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#3Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#1)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

On Wed, Dec 19, 2018 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:

Thinking about whether it's worth making that function extensible made
me wonder: is it actually a good idea to initialize the page at that
point, including marking it dirty?

As far as I can recall, my motivation was to avoid increasing the
number of warnings produced by VACUUM. If those warnings are going
away, then I don't know that there's any reason to keep that code as
it is. But I am not sure whether such a move would provoke any
opposition.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#3)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2018-12-19 17:28:17 -0500, Robert Haas wrote:

On Wed, Dec 19, 2018 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:

Thinking about whether it's worth making that function extensible made
me wonder: is it actually a good idea to initialize the page at that
point, including marking it dirty?

As far as I can recall, my motivation was to avoid increasing the
number of warnings produced by VACUUM. If those warnings are going
away, then I don't know that there's any reason to keep that code as
it is. But I am not sure whether such a move would provoke any
opposition.

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it. If the logic
were that we'd never reuse empty pages because they can hide corruption
that normally would be discovered by checksums, then we shouldn't
reinitialize them at all and instead error out hard - but we can't do
that, because it's normal that they occur. Right now we have empty
pages on-disk whenever a busy server is restarted in immediate mode,
after all.

Greetings,

Andres Freund

#5Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#4)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it. If the logic
were that we'd never reuse empty pages because they can hide corruption
that normally would be discovered by checksums, then we shouldn't
reinitialize them at all and instead error out hard - but we can't do
that, because it's normal that they occur. Right now we have empty
pages on-disk whenever a busy server is restarted in immediate mode,
after all.

I don't know. I am just normally reluctant to change things
precipitously that are of long tenure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it.

I don't know. I am just normally reluctant to change things
precipitously that are of long tenure.

Me too, but I think Andres has a point here. Those warnings in VACUUM
are ancient, probably predating the introduction of WAL :-(. At the
time there was good reason to be suspicious of zeroed pages in tables.
Now, though, we have (what we think is) a bulletproof crash recovery
procedure in which possibly-zeroed pages are to be expected; so we're
just causing users unnecessary alarm by warning about them. I think
it's reasonable to, if not remove the messages entirely, at least
downgrade them to a low DEBUG level.
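
(A sketch of that downgrade, assuming the existing message text in
vacuumlazy.c and its relname/blkno variables stay as-is:

    /* was ereport(WARNING, ...) */
    ereport(DEBUG1,
            (errmsg("relation \"%s\" page %u is uninitialized --- fixing",
                    relname, blkno)));
)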

regards, tom lane

#7Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#6)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2018-12-19 19:39:33 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it.

I don't know. I am just normally reluctant to change things
precipitously that are of long tenure.

Me too, but I think Andres has a point here. Those warnings in VACUUM
are ancient, probably predating the introduction of WAL :-(. At the
time there was good reason to be suspicious of zeroed pages in tables.
Now, though, we have (what we think is) a bulletproof crash recovery
procedure in which possibly-zeroed pages are to be expected; so we're
just causing users unnecessary alarm by warning about them. I think
it's reasonable to, if not remove the messages entirely, at least
downgrade them to a low DEBUG level.

Yea, I think that'd be reasonable.

I think we ought to add them, as new/zero pages, to the FSM. If we did
so both during vacuum and RelationAddExtraBlocks() we'd avoid the
redundant writes, and avoid depending on heap layout in
RelationAddExtraBlocks().

I wonder if it'd make sense to only log a DEBUG if there's no
corresponding FSM entry. That'd obviously not be bulletproof, but it'd
reduce the spamminess considerably.
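
Roughly like this on the vacuum side (a sketch, not the actual patch;
onerel/blkno/buf/page being the variables the existing vacuumlazy.c
code already has at hand):

    if (PageIsNew(page))
    {
        Size        freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;

        /* only complain if the FSM doesn't already know about the page */
        if (GetRecordedFreeSpace(onerel, blkno) == 0)
            elog(DEBUG1, "relation \"%s\" page %u is uninitialized",
                 RelationGetRelationName(onerel), blkno);

        RecordPageWithFreeSpace(onerel, blkno, freespace);
    }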

Greetings,

Andres Freund

#8Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#7)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2018-12-19 16:56:36 -0800, Andres Freund wrote:

On 2018-12-19 19:39:33 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it.

I don't know. I am just normally reluctant to change things
precipitously that are of long tenure.

Me too, but I think Andres has a point here. Those warnings in VACUUM
are ancient, probably predating the introduction of WAL :-(. At the
time there was good reason to be suspicious of zeroed pages in tables.
Now, though, we have (what we think is) a bulletproof crash recovery
procedure in which possibly-zeroed pages are to be expected; so we're
just causing users unnecessary alarm by warning about them. I think
it's reasonable to, if not remove the messages entirely, at least
downgrade them to a low DEBUG level.

Yea, I think that'd be reasonable.

I think we ought to add them, as new/zero pages, to the FSM. If we did
so both during vacuum and RelationAddExtraBlocks() we'd avoid the
redundant writes, and avoid depending on heap layout in
RelationAddExtraBlocks().

I wonder if it'd make sense to only log a DEBUG if there's no
corresponding FSM entry. That'd obviously not be bulletproof, but it'd
reduce the spamminess considerably.

Here's a prototype patch implementing this change. I'm not sure the new
DEBUG message is really worth it?

Looking at the surrounding code made me wonder about the wisdom of
entering empty pages as all-visible and all-frozen into the VM. That'll
mean we'll never re-discover them on a primary after promotion: there's
no mechanism to add such pages to the FSM on a standby (in contrast to
e.g. pages where tuples are modified), as vacuum will never visit that
page again. Now obviously it has the advantage of avoiding
re-processing the page in the next vacuum, but is that really an
important goal? If there are frequent vacuums, there's got to be a fair
amount of modifications, which ought to lead to re-use of such pages at
a not-too-distant point.

Greetings,

Andres Freund

Attachments:

new-blocks-as-new.diff (text/x-diff; charset=us-ascii, +49/-41)
#9Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#8)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2018-12-20 15:04:11 -0800, Andres Freund wrote:

On 2018-12-19 16:56:36 -0800, Andres Freund wrote:

On 2018-12-19 19:39:33 -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it.

I don't know. I am just normally reluctant to change things
precipitously that are of long tenure.

Me too, but I think Andres has a point here. Those warnings in VACUUM
are ancient, probably predating the introduction of WAL :-(. At the
time there was good reason to be suspicious of zeroed pages in tables.
Now, though, we have (what we think is) a bulletproof crash recovery
procedure in which possibly-zeroed pages are to be expected; so we're
just causing users unnecessary alarm by warning about them. I think
it's reasonable to, if not remove the messages entirely, at least
downgrade them to a low DEBUG level.

Yea, I think that'd be reasonable.

I think we ought to add them, as new/zero pages, to the FSM. If we did
so both during vacuum and RelationAddExtraBlocks() we'd avoid the
redundant writes, and avoid depending on heap layout in
RelationAddExtraBlocks().

I wonder if it'd make sense to only log a DEBUG if there's no
corresponding FSM entry. That'd obviously not be bulletproof, but it'd
reduce the spamminess considerably.

Here's a prototype patch implementing this change. I'm not sure the new
DEBUG message is really worth it?

Looking at the surrounding code made me wonder about the wisdom of
entering empty pages as all-visible and all-frozen into the VM. That'll
mean we'll never re-discover them on a primary after promotion: there's
no mechanism to add such pages to the FSM on a standby (in contrast to
e.g. pages where tuples are modified), as vacuum will never visit that
page again. Now obviously it has the advantage of avoiding
re-processing the page in the next vacuum, but is that really an
important goal? If there are frequent vacuums, there's got to be a fair
amount of modifications, which ought to lead to re-use of such pages at
a not-too-distant point.

Any comments on the approach in this patch?

Greetings,

Andres Freund

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Andres Freund <andres@anarazel.de> writes:

Looking at the surrounding code made me wonder about the wisdom of
entering empty pages as all-visible and all-frozen into the VM. That'll
mean we'll never re-discover them on a primary after promotion: there's
no mechanism to add such pages to the FSM on a standby (in contrast to
e.g. pages where tuples are modified), as vacuum will never visit that
page again. Now obviously it has the advantage of avoiding
re-processing the page in the next vacuum, but is that really an
important goal? If there are frequent vacuums, there's got to be a fair
amount of modifications, which ought to lead to re-use of such pages at
a not-too-distant point.

Any comments on the approach in this patch?

I agree with the concept of postponing page init till we're actually
going to do something with the page. However, the patch seems to need
some polish:

* There's a comment in RelationAddExtraBlocks, just above where you
changed, that

* Extend by one page. This should generally match the main-line
* extension code in RelationGetBufferForTuple, except that we hold
* the relation extension lock throughout.

This seems to be falsified by this patch, in that one of the two paths
does PageInit and the other doesn't.

* s/unitialized pages/uninitialized pages/

* This bit in vacuumlazy seems unnecessarily confusing:

+            Size        freespace = 0;
...
+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+                freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+
+            if (freespace > 0)
+            {
+                RecordPageWithFreeSpace(onerel, blkno, freespace);

I'd write that as just

+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+            {
+                Size    freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+                RecordPageWithFreeSpace(onerel, blkno, freespace);

I tend to agree that the DEBUG message isn't very necessary, or at least
could be lower than DEBUG1.

regards, tom lane

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#10)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2019-01-15 17:35:21 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Looking at the surrounding code made me wonder about the wisdom of
entering empty pages as all-visible and all-frozen into the VM. That'll
mean we'll never re-discover them on a primary after promotion: there's
no mechanism to add such pages to the FSM on a standby (in contrast to
e.g. pages where tuples are modified), as vacuum will never visit that
page again. Now obviously it has the advantage of avoiding
re-processing the page in the next vacuum, but is that really an
important goal? If there are frequent vacuums, there's got to be a fair
amount of modifications, which ought to lead to re-use of such pages at
a not-too-distant point.

Any comments on the approach in this patch?

I agree with the concept of postponing page init till we're actually
going to do something with the page. However, the patch seems to need
some polish:

Yea, as I'd written in an earlier message, it was really meant as a
prototype to see whether there's buy-in for the change.

* There's a comment in RelationAddExtraBlocks, just above where you
changed, that

* Extend by one page. This should generally match the main-line
* extension code in RelationGetBufferForTuple, except that we hold
* the relation extension lock throughout.

This seems to be falsified by this patch, in that one of the two paths
does PageInit and the other doesn't.

* s/unitialized pages/uninitialized pages/

* This bit in vacuumlazy seems unnecessarily confusing:

+            Size        freespace = 0;
...
+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+                freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+
+            if (freespace > 0)
+            {
+                RecordPageWithFreeSpace(onerel, blkno, freespace);

I'd write that as just

+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+            {
+                Size    freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+                RecordPageWithFreeSpace(onerel, blkno, freespace);

Fixed, thanks for looking!

I tend to agree that the DEBUG message isn't very necessary, or at least
could be lower than DEBUG1.

After a bit of back and forth waffling, I've removed it now.

Besides a fair number of comment changes, the latest version has just
one functional change: lazy_check_needs_freeze() no longer indicates
that new pages require freezing. If we can't get a cleanup lock on such
a page, it's about to be written to anyway, and we'd do no more than
enter it into the FSM.

Greetings,

Andres Freund

#12Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#11)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

So, I'd pushed the latest version. And longfin came back with an
interesting error:

ERROR: page 135 of relation "pg_class" should be empty but is not

The only way I can currently imagine this happening is that there's a
concurrent vacuum that discovers the page is empty, enters it into the
FSM (which now isn't happening under an extension lock anymore), and
then a separate backend starts to actually use that buffer. That seems
tight but possible. Seems we need to re-add the
LockRelationForExtension/UnlockRelationForExtension() logic :(

Greetings,

Andres Freund

#13Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#12)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2019-01-28 14:10:55 -0800, Andres Freund wrote:

So, I'd pushed the latest version. And longfin came back with an
interesting error:

ERROR: page 135 of relation "pg_class" should be empty but is not

The only way I can currently imagine this happening is that there's a
concurrent vacuum that discovers the page is empty, enters it into the
FSM (which now isn't happening under an extension lock anymore), and
then a separate backend starts to actually use that buffer. That seems
tight but possible. Seems we need to re-add the
LockRelationForExtension/UnlockRelationForExtension() logic :(

Hm, but thinking about this, isn't this a risk independent of this
change? The FSM isn't WAL-logged, and given its tree-ish structure it's
possible to "rediscover" free space (e.g. FreeSpaceMapVacuum()). ISTM
that after a crash the FSM might point to free space that doesn't
actually exist, and is rediscovered after independent changes. Not sure
if that's actually a realistic issue.

I'm inclined to put back the
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockRelationForExtension(onerel, ExclusiveLock);
UnlockRelationForExtension(onerel, ExclusiveLock);
LockBufferForCleanup(buf);
if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

But I do wonder if we should just make hio.c cope with this instead.

Greetings,

Andres Freund

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#13)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Andres Freund <andres@anarazel.de> writes:

I'm inclined to put back the
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockRelationForExtension(onerel, ExclusiveLock);
UnlockRelationForExtension(onerel, ExclusiveLock);
LockBufferForCleanup(buf);
if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

The buildfarm's got half a dozen reports now of a failure of this ilk,
so you'd better do something.

But I do wonder if we should just make hio.c cope with this instead.

Probably should not try to go that way under time pressure.

regards, tom lane

#15Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#14)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2019-01-28 18:08:59 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I'm inclined to put back the
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockRelationForExtension(onerel, ExclusiveLock);
UnlockRelationForExtension(onerel, ExclusiveLock);
LockBufferForCleanup(buf);
if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

The buildfarm's got half a dozen reports now of a failure of this ilk,
so you'd better do something.

Yea, I was working on a patch. Was trying to come up with an explanation
of how this can be realistically hit on the BF, but failed. I've pushed
something now, let's see whether that fixes it.

But I do wonder if we should just make hio.c cope with this instead.

Probably should not try to go that way under time pressure.

Definitely.

Greetings,

Andres Freund

#16Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#15)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2019-01-28 15:49:33 -0800, Andres Freund wrote:

On 2019-01-28 18:08:59 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I'm inclined to put back the
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockRelationForExtension(onerel, ExclusiveLock);
UnlockRelationForExtension(onerel, ExclusiveLock);
LockBufferForCleanup(buf);
if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

The buildfarm's got half a dozen reports now of a failure of this ilk,
so you'd better do something.

Yea, I was working on a patch. Was trying to come up with an explanation
of how this can be realistically hit on the BF, but failed. I've pushed
something now, let's see whether that fixes it.

It has not. Given that I don't understand what's happening here I'm
going to revert both commits unless I figure it out in the next ~30min.

Greetings,

Andres Freund

#17Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#16)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Hi,

On 2019-01-28 16:40:36 -0800, Andres Freund wrote:

On 2019-01-28 15:49:33 -0800, Andres Freund wrote:

On 2019-01-28 18:08:59 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I'm inclined to put back the
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockRelationForExtension(onerel, ExclusiveLock);
UnlockRelationForExtension(onerel, ExclusiveLock);
LockBufferForCleanup(buf);
if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

The buildfarm's got half a dozen reports now of a failure of this ilk,
so you'd better do something.

Yea, I was working on a patch. Was trying to come up with an explanation
of how this can be realistically hit on the BF, but failed. I've pushed
something now, let's see whether that fixes it.

It has not. Given that I don't understand what's happening here I'm
going to revert both commits unless I figure it out in the next ~30min.

I did that now. I couldn't reproduce it locally, despite a lot of
runs. Looking at the buildfarm it looks like the failures were,
excluding handfish which failed without recognizable symptoms before and
after, on BSD-derived platforms (netbsd, freebsd, OS X), which certainly
is interesting. I asked Thomas Munro whether he could run on freebsd,
and he gave me a login, where I just now reproduced the issue (2 of 5
make check runs failing).

Greetings,

Andres Freund

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#17)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

Andres Freund <andres@anarazel.de> writes:

I did that now. I couldn't reproduce it locally, despite a lot of
runs. Looking at the buildfarm it looks like the failures were,
excluding handfish which failed without recognizable symptoms before and
after, on BSD-derived platforms (netbsd, freebsd, OS X), which certainly
is interesting.

Isn't it now. Something about the BSD scheduler perhaps? But we've
got four or five different BSD-ish platforms that reported failures,
and it's hard to believe they've all got identical schedulers.

That second handfish failure does match the symptoms elsewhere:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22

--- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
+++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
@@ -0,0 +1,20 @@
+SQL error: page 0 of relation "test_thread" should be empty but is not on line 125

so it's not quite 100% BSD, but certainly the failure rate on BSD is
way higher than elsewhere. Puzzling.

regards, tom lane

#19Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#18)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

On 2019-01-28 22:37:53 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I did that now. I couldn't reproduce it locally, despite a lot of
runs. Looking at the buildfarm it looks like the failures were,
excluding handfish which failed without recognizable symptoms before and
after, on BSD-derived platforms (netbsd, freebsd, OS X), which certainly
is interesting.

Isn't it now. Something about the BSD scheduler perhaps? But we've
got four or five different BSD-ish platforms that reported failures,
and it's hard to believe they've all got identical schedulers.

That second handfish failure does match the symptoms elsewhere:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22

--- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
+++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
@@ -0,0 +1,20 @@
+SQL error: page 0 of relation "test_thread" should be empty but is not on line 125

so it's not quite 100% BSD, but certainly the failure rate on BSD is
way higher than elsewhere. Puzzling.

Interesting.

While chatting with Robert about this issue I came across the following
section of code:

    /*
     * If the FSM knows nothing of the rel, try the last page before we
     * give up and extend.  This avoids one-tuple-per-page syndrome during
     * bootstrapping or in a recently-started system.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

I think that explains the issue (albeit not why it is so much more
frequent on BSDs). Because we're not going through the FSM, it's
perfectly possible to find a page that is uninitialized *and* not yet
in the FSM. The only reason this wasn't previously actively broken, I
think, is that when we previously *also* looked at that page (before
the extending backend acquired a lock!), PageGetHeapFreeSpace(), via
PageGetFreeSpace(), decided there was no free space, because it just
interprets the zeroes in pd_upper and pd_lower as no free space.
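
For reference, PageGetFreeSpace() boils down to roughly this
(paraphrased from bufpage.c, not verbatim):

    Size
    PageGetFreeSpace(Page page)
    {
        int         space;

        /* on an all-zeroes page, pd_upper == pd_lower == 0 */
        space = (int) ((PageHeader) page)->pd_upper -
            (int) ((PageHeader) page)->pd_lower;

        /* too small to fit even one line pointer: report no free space */
        if (space < (int) sizeof(ItemIdData))
            return 0;

        return (Size) (space - sizeof(ItemIdData));
    }

so a new page reports zero free space and was simply skipped over.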

Hm, thinking about what a good solution here could be.

Greetings,

Andres Freund

#20Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#19)
Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

On 2019-01-29 11:25:41 -0800, Andres Freund wrote:

On 2019-01-28 22:37:53 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I did that now. I couldn't reproduce it locally, despite a lot of
runs. Looking at the buildfarm it looks like the failures were,
excluding handfish which failed without recognizable symptoms before and
after, on BSD-derived platforms (netbsd, freebsd, OS X), which certainly
is interesting.

Isn't it now. Something about the BSD scheduler perhaps? But we've
got four or five different BSD-ish platforms that reported failures,
and it's hard to believe they've all got identical schedulers.

That second handfish failure does match the symptoms elsewhere:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22

--- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
+++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
@@ -0,0 +1,20 @@
+SQL error: page 0 of relation "test_thread" should be empty but is not on line 125

so it's not quite 100% BSD, but certainly the failure rate on BSD is
way higher than elsewhere. Puzzling.

Interesting.

While chatting with Robert about this issue I came across the following
section of code:

    /*
     * If the FSM knows nothing of the rel, try the last page before we
     * give up and extend.  This avoids one-tuple-per-page syndrome during
     * bootstrapping or in a recently-started system.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

I think that explains the issue (albeit not why it is so much more
frequent on BSDs). Because we're not going through the FSM, it's
perfectly possible to find a page that is uninitialized *and* not yet
in the FSM. The only reason this wasn't previously actively broken, I
think, is that when we previously *also* looked at that page (before
the extending backend acquired a lock!), PageGetHeapFreeSpace(), via
PageGetFreeSpace(), decided there was no free space, because it just
interprets the zeroes in pd_upper and pd_lower as no free space.

Hm, thinking about what a good solution here could be.

I wonder if we should just expand the logic we have for
RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could just
use it without any changes, but the name seems a bit confusing) -
because that'd prevent the current weirdness that the buffer can be
locked by somebody else between the ReadBufferBI(P_NEW) and the
LockBuffer(BUFFER_LOCK_EXCLUSIVE). I think that'd allow us to
altogether drop the cleanup lock logic we currently have, and also
protect us against the FSM issue I'd outlined upthread?
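
I.e. something roughly like this in hio.c's extension path (a sketch
only; whether RBM_ZERO_AND_LOCK's semantics fit P_NEW unchanged would
need checking):

    /*
     * Extend and come back with the new buffer already exclusively
     * locked, so nobody can touch the page between the extension and
     * our lock acquisition.
     */
    buffer = ReadBufferExtended(relation, MAIN_FORKNUM, P_NEW,
                                RBM_ZERO_AND_LOCK,
                                bistate ? bistate->strategy : NULL);
    page = BufferGetPage(buffer);
    PageInit(page, BufferGetPageSize(buffer), 0);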

Greetings,

Andres Freund

#21Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#19)
#22Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#20)
#23Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#22)
#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#23)
#25Thomas Munro
thomas.munro@gmail.com
In reply to: Amit Kapila (#24)