Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.
Hi all,
I faced suspicious behaviour on hot standby server related to visibility map.
The scenario is,
1. Create table and check internal of visibility map on master server.
postgres(1)=# create table hoge (col int);
CREATE TABLE
postgres(1)=# insert into hoge select generate_series(1,10);
INSERT 0 10
postgres(1)=# select * from pg_visibility('hoge');
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
(1 row)
2. Check internal of visibility map on standby server.
postgres(1)=# select * from pg_visibility('hoge');
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
(1 row)
3. Do VACUUM on master server.
postgres(1)=# vacuum hoge;
VACUUM
postgres(1)=# select * from pg_visibility('hoge');
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | t | f | t
(1 row)
4. Check internal of visibility map on standby server again.
** Note that the we use the connection we established at #2 again **
postgres(1)=# select * from pg_visibility('hoge');
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | t
(1 row)
Even on standby server, the result should be (t,f,t), but returned (f,f,t)
(XLOG_HEAP2_VISIBLE record actually has been reached to standby, and
has been surely applied)
As a result of looked into code around the recvoery, ISTM that the
cause is related to relation cache clear.
In heap_xlog_visible, if the standby server receives WAL record then
relation cache is eventually cleared in vm_extend, but If standby
server receives FPI then relation cache would not be cleared.
For example, after I applied attached patch to HEAD, (it might not be
right way but) this problem seems to be resolved.
Is this a bug? or not?
Regards,
--
Masahiko Sawada
Attachments:
heap_xlog_visible_invalidate_cache.patchapplication/octet-stream; name=heap_xlog_visible_invalidate_cache.patchDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34ba385..3d891a7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7816,6 +7816,7 @@ heap_xlog_visible(XLogReaderState *record)
RelFileNode rnode;
BlockNumber blkno;
XLogRedoAction action;
+ Relation reln;
XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
@@ -7870,6 +7871,8 @@ heap_xlog_visible(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ reln = CreateFakeRelcacheEntry(rnode);
+
/*
* Even if we skipped the heap page update due to the LSN interlock, it's
* still safe to update the visibility map. Any WAL record that clears
@@ -7880,7 +7883,6 @@ heap_xlog_visible(XLogReaderState *record)
&vmbuffer) == BLK_NEEDS_REDO)
{
Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
/* initialize the page if it was read as zeros */
if (PageIsNew(vmpage))
@@ -7892,7 +7894,6 @@ heap_xlog_visible(XLogReaderState *record)
*/
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
- reln = CreateFakeRelcacheEntry(rnode);
visibilitymap_pin(reln, blkno, &vmbuffer);
/*
@@ -7911,10 +7912,13 @@ heap_xlog_visible(XLogReaderState *record)
xlrec->cutoff_xid, xlrec->flags);
ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
}
else if (BufferIsValid(vmbuffer))
UnlockReleaseBuffer(vmbuffer);
+
+ RelationOpenSmgr(reln);
+ CacheInvalidateSmgr(reln->rd_smgr->smgr_rnode);
+ FreeFakeRelcacheEntry(reln);
}
/*
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
As a result of looked into code around the recvoery, ISTM that the
cause is related to relation cache clear.
In heap_xlog_visible, if the standby server receives WAL record then
relation cache is eventually cleared in vm_extend, but If standby
server receives FPI then relation cache would not be cleared.
For example, after I applied attached patch to HEAD, (it might not be
right way but) this problem seems to be resolved.Is this a bug? or not?
It's a bug. I don't expect it causes queries to return wrong answers, because
visibilitymap.c says "it's always safe to clear a bit in the map from
correctness point of view." (The bug makes a visibility map bit temporarily
appear to have been cleared.) I still call it a bug, because recovery
behavior becomes too difficult to verify when xlog replay produces conditions
that don't happen outside of recovery. Even if there's no way to get a wrong
query answer today, this would be too easy to break later. I wonder if we
make the same omission in other xlog replay functions. Similar omissions may
cause wrong query answers, even if this particular one does not.
Would you like to bisect for the commit, or at least the major release, at
which the bug first appeared?
I wonder if your discovery has any relationship to this recently-reported case
of insufficient smgr invalidation:
/messages/by-id/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
Thanks,
nm
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
As a result of looked into code around the recvoery, ISTM that the
cause is related to relation cache clear.
In heap_xlog_visible, if the standby server receives WAL record then
relation cache is eventually cleared in vm_extend, but If standby
server receives FPI then relation cache would not be cleared.
For example, after I applied attached patch to HEAD, (it might not be
right way but) this problem seems to be resolved.Is this a bug? or not?
It's a bug. I don't expect it causes queries to return wrong answers, because
visibilitymap.c says "it's always safe to clear a bit in the map from
correctness point of view." (The bug makes a visibility map bit temporarily
appear to have been cleared.) I still call it a bug, because recovery
behavior becomes too difficult to verify when xlog replay produces conditions
that don't happen outside of recovery. Even if there's no way to get a wrong
query answer today, this would be too easy to break later. I wonder if we
make the same omission in other xlog replay functions. Similar omissions may
cause wrong query answers, even if this particular one does not.Would you like to bisect for the commit, or at least the major release, at
which the bug first appeared?I wonder if your discovery has any relationship to this recently-reported case
of insufficient smgr invalidation:
/messages/by-id/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
I'm not sure this bug has relationship to another issue you mentioned
but after further investigation, this bug seems to be reproduced even
on more older version.
At least I reproduced it at 9.0.0.
Regards,
--
Masahiko Sawada
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
As a result of looked into code around the recvoery, ISTM that the
cause is related to relation cache clear.
In heap_xlog_visible, if the standby server receives WAL record then
relation cache is eventually cleared in vm_extend, but If standby
server receives FPI then relation cache would not be cleared.
For example, after I applied attached patch to HEAD, (it might not be
right way but) this problem seems to be resolved.Is this a bug? or not?
It's a bug. I don't expect it causes queries to return wrong answers, because
visibilitymap.c says "it's always safe to clear a bit in the map from
correctness point of view." (The bug makes a visibility map bit temporarily
appear to have been cleared.) I still call it a bug, because recovery
behavior becomes too difficult to verify when xlog replay produces conditions
that don't happen outside of recovery. Even if there's no way to get a wrong
query answer today, this would be too easy to break later. I wonder if we
make the same omission in other xlog replay functions. Similar omissions may
cause wrong query answers, even if this particular one does not.Would you like to bisect for the commit, or at least the major release, at
which the bug first appeared?I wonder if your discovery has any relationship to this recently-reported case
of insufficient smgr invalidation:
/messages/by-id/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.comI'm not sure this bug has relationship to another issue you mentioned
but after further investigation, this bug seems to be reproduced even
on more older version.
At least I reproduced it at 9.0.0.
Would you try PostgreSQL 9.2.16? The visibility map was not crash safe and
had no correctness implications until 9.2. If 9.2 behaves this way, it's
almost certainly not a recent regression.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Apr 1, 2016 at 9:10 AM, Noah Misch <noah@leadboat.com> wrote:
On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
As a result of looked into code around the recvoery, ISTM that the
cause is related to relation cache clear.
In heap_xlog_visible, if the standby server receives WAL record then
relation cache is eventually cleared in vm_extend, but If standby
server receives FPI then relation cache would not be cleared.
For example, after I applied attached patch to HEAD, (it might not be
right way but) this problem seems to be resolved.Is this a bug? or not?
It's a bug. I don't expect it causes queries to return wrong answers, because
visibilitymap.c says "it's always safe to clear a bit in the map from
correctness point of view." (The bug makes a visibility map bit temporarily
appear to have been cleared.) I still call it a bug, because recovery
behavior becomes too difficult to verify when xlog replay produces conditions
that don't happen outside of recovery. Even if there's no way to get a wrong
query answer today, this would be too easy to break later. I wonder if we
make the same omission in other xlog replay functions. Similar omissions may
cause wrong query answers, even if this particular one does not.Would you like to bisect for the commit, or at least the major release, at
which the bug first appeared?I wonder if your discovery has any relationship to this recently-reported case
of insufficient smgr invalidation:
/messages/by-id/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.comI'm not sure this bug has relationship to another issue you mentioned
but after further investigation, this bug seems to be reproduced even
on more older version.
At least I reproduced it at 9.0.0.Would you try PostgreSQL 9.2.16? The visibility map was not crash safe and
had no correctness implications until 9.2. If 9.2 behaves this way, it's
almost certainly not a recent regression.
Yeah, I reproduced it on 9.2.0 and 9.2.16, it's not recent regression.
The commit is 503c7305a1e379f95649eef1a694d0c1dbdc674a which
introduces crash-safe visibility map.
Regards,
--
Masahiko Sawada
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-03-31 01:02:06 -0400, Noah Misch wrote:
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
As a result of looked into code around the recvoery, ISTM that the
cause is related to relation cache clear.
In heap_xlog_visible, if the standby server receives WAL record then
relation cache is eventually cleared in vm_extend, but If standby
server receives FPI then relation cache would not be cleared.
For example, after I applied attached patch to HEAD, (it might not be
right way but) this problem seems to be resolved.Is this a bug? or not?
It's a bug.
I agree it's a bug. If I understand correctly it's not really
visibilitymap related though:
Debugging shows that vm_extend() (on the master) correctly issues a
CacheInvalidateSmgr(), which ought to clear the smgr state on the
standby. But replay doesn't do anything like that. That kind of sounded
familiar. The issue is that vacuum doesn't assign an xid and thus
RecordTransactionCommit() doesn't emit a commit record, which in turn
means invalidation messages aren't sent to the standby.
Ugh.
We've recently discussed a very similar issue around
/messages/by-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
Unfortunately Simon over in that thread disagreed there about fixing
this by always emitting a commit record when nmsgs > 0 in
RecordTransactionCommit(). I think this thread is a pretty strong hint
that we actually should do so.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
We've recently discussed a very similar issue around
/messages/by-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.deUnfortunately Simon over in that thread disagreed there about fixing
this by always emitting a commit record when nmsgs > 0 in
RecordTransactionCommit(). I think this thread is a pretty strong hint
that we actually should do so.
Yes. I'm pretty confident that you had the right idea there, and that
Simon's objection was off-base.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
We've recently discussed a very similar issue around
/messages/by-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.deUnfortunately Simon over in that thread disagreed there about fixing
this by always emitting a commit record when nmsgs > 0 in
RecordTransactionCommit(). I think this thread is a pretty strong hint
that we actually should do so.Yes. I'm pretty confident that you had the right idea there, and that
Simon's objection was off-base.
The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should be safe
to do so at commit time. Seems less invasive to backpatch than to either
support commit records without xids, or a separate record just
transporting invalidation messages.
Although a separate record with invalidations is going to be needed to
support logical decoding of running xacts anyway...
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
We've recently discussed a very similar issue around
/messages/by-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.deUnfortunately Simon over in that thread disagreed there about fixing
this by always emitting a commit record when nmsgs > 0 in
RecordTransactionCommit(). I think this thread is a pretty strong hint
that we actually should do so.Yes. I'm pretty confident that you had the right idea there, and that
Simon's objection was off-base.The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should be safe
to do so at commit time. Seems less invasive to backpatch than to either
support commit records without xids, or a separate record just
transporting invalidation messages.
I agree that's better for back-patching. I hope it won't suck
performance-wise. In master, we might think of inventing something
new.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-15 13:07:19 -0400, Robert Haas wrote:
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
We've recently discussed a very similar issue around
/messages/by-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.deUnfortunately Simon over in that thread disagreed there about fixing
this by always emitting a commit record when nmsgs > 0 in
RecordTransactionCommit(). I think this thread is a pretty strong hint
that we actually should do so.Yes. I'm pretty confident that you had the right idea there, and that
Simon's objection was off-base.The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should be safe
to do so at commit time. Seems less invasive to backpatch than to either
support commit records without xids, or a separate record just
transporting invalidation messages.I agree that's better for back-patching.
It's a bit ugly though, since we're at that stage pretty heavily
assuming there's no xid assigned (on the caller level)... I've toyed
with the idea of emitting a commit record that doesn't have an assigned
xid, but that'd require changes on the apply side, which would be uglier
than a new record type :(
I hope it won't suck performance-wise.
I can't see that be the case, there's not many places where we send
invalidation messages without an assigned xid. Running an instrumented
build with an appropriate wal level reveals about a handfull places.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should be safe
to do so at commit time. Seems less invasive to backpatch than to either
support commit records without xids, or a separate record just
transporting invalidation messages.
I agree that's better for back-patching. I hope it won't suck
performance-wise. In master, we might think of inventing something
new.
I'm a little worried about whether this will break assumptions that
vacuum doesn't have an XID. I don't immediately see how it would,
but it seems a bit shaky.
I find it hard to believe that the act of assigning an XID would add
measurably to the cost of a vacuum, so Robert's performance concern
doesn't sound very exciting. If this works, I think it's fine to
adopt as a permanent solution.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should be safe
to do so at commit time. Seems less invasive to backpatch than to either
support commit records without xids, or a separate record just
transporting invalidation messages.I agree that's better for back-patching. I hope it won't suck
performance-wise. In master, we might think of inventing something
new.I'm a little worried about whether this will break assumptions that
vacuum doesn't have an XID. I don't immediately see how it would,
but it seems a bit shaky.
Actually, I think there's a bigger problem. If you manage to almost
wrap around the XID space, and the cluster shuts down, you are
guaranteed to be able to vacuum the whole cluster without actually
running out of XIDs. Forcing an XID to be assigned here would make
that uncertain - it would depend on how many tables you have versus
how many XIDs you have left. That seems unacceptable to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-15 13:44:27 -0400, Robert Haas wrote:
On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should be safe
to do so at commit time. Seems less invasive to backpatch than to either
support commit records without xids, or a separate record just
transporting invalidation messages.I agree that's better for back-patching. I hope it won't suck
performance-wise. In master, we might think of inventing something
new.I'm a little worried about whether this will break assumptions that
vacuum doesn't have an XID. I don't immediately see how it would,
but it seems a bit shaky.Actually, I think there's a bigger problem. If you manage to almost
wrap around the XID space, and the cluster shuts down, you are
guaranteed to be able to vacuum the whole cluster without actually
running out of XIDs.
Currently you're unfortunately not, c.f.
/messages/by-id/CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com
Forcing an XID to be assigned here would make
that uncertain - it would depend on how many tables you have versus
how many XIDs you have left. That seems unacceptable to me.
But I agree that that's a concern. I'm kinda leaning towards
introducing an invalidation WAL record for that case atm. It's quite
possible to force an xid to be assigned in xact.c, but it's certainly
not pretty (the least ugly way is to duplicate the
xactGetCommittedInvalidationMessages() call, and force an xid to be
assigned early in CommitTransaction().
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15 April 2016 at 18:44, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de>
wrote:
The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should besafe
to do so at commit time. Seems less invasive to backpatch than to
either
support commit records without xids, or a separate record just
transporting invalidation messages.I agree that's better for back-patching. I hope it won't suck
performance-wise. In master, we might think of inventing something
new.I'm a little worried about whether this will break assumptions that
vacuum doesn't have an XID. I don't immediately see how it would,
but it seems a bit shaky.Actually, I think there's a bigger problem. If you manage to almost
wrap around the XID space, and the cluster shuts down, you are
guaranteed to be able to vacuum the whole cluster without actually
running out of XIDs. Forcing an XID to be assigned here would make
that uncertain - it would depend on how many tables you have versus
how many XIDs you have left. That seems unacceptable to me.
I agree this is a bug, similar to the last one.
For me, the issue is that we need to do something to catch bugs. The
existing code does not have any test at all to check whether we are doing
the wrong thing - it just lets the wrong thing happen.
Fixing it by forcing a new behaviour might be the right thing to do going
forwards, but I don't much like the idea of forcing new behaviour in back
branches. It might fix this bug, but can easily cause others.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-15 19:59:06 +0100, Simon Riggs wrote:
For me, the issue is that we need to do something to catch bugs. The
existing code does not have any test at all to check whether we are doing
the wrong thing - it just lets the wrong thing happen.
But sending the message, without assigning an xid, *IS* the right thing
to do here? We shouldn't assign an xid, and we need to send the message
out to the standbys.
Fixing it by forcing a new behaviour might be the right thing to do going
forwards, but I don't much like the idea of forcing new behaviour in back
branches. It might fix this bug, but can easily cause others.
What's your alternative? Assigning an xid in the middle of vacuum isn't
ok, breaking vacuum isn't either?
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15 April 2016 at 20:01, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-15 19:59:06 +0100, Simon Riggs wrote:
For me, the issue is that we need to do something to catch bugs. The
existing code does not have any test at all to check whether we are doing
the wrong thing - it just lets the wrong thing happen.But sending the message, without assigning an xid, *IS* the right thing
to do here? We shouldn't assign an xid, and we need to send the message
out to the standbys.Fixing it by forcing a new behaviour might be the right thing to do going
forwards, but I don't much like the idea of forcing new behaviour in back
branches. It might fix this bug, but can easily cause others.What's your alternative? Assigning an xid in the middle of vacuum isn't
ok, breaking vacuum isn't either?
In my understanding we have two choices for this bug
1) assign an xid so it forces sending a message (message plus xid)
2) send a message without assigning an xid (message only)
(1) seems like it is worse for backpatching, IMHO, though I am willing to
hear other thoughts or options
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
In my understanding we have two choices for this bug
1) assign an xid so it forces sending a message (message plus xid)
2) send a message without assigning an xid (message only)
(1) seems like it is worse for backpatching, IMHO, though I am willing to
hear other thoughts or options
The problem with (1) is that it creates side-effects that could be bad;
Robert's already pointed out one close-to-show-stopper consequence,
and I have little confidence that there are not others. In general,
if we got here without assigning an xid, there's a reason.
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.
I think we got to piggyback it onto a commit record, as long as there's
one. Otherwise it's going to be more complex (queuing messages when
reading an inval record) and slower (more wal records). I can easily
develop a patch for that, the question is what we do on the back
branches...
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund wrote:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.I think we got to piggyback it onto a commit record, as long as there's
one. Otherwise it's going to be more complex (queuing messages when
reading an inval record) and slower (more wal records). I can easily
develop a patch for that, the question is what we do on the back
branches...
We have introduced new wal records in back branches previously --
nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user
just needs to make sure to upgrade the standbys first. If they don't,
they would die upon replay of the first such record, which they can take
as an indication that they need to be upgraded; the standby is down for
some time, but there is no data loss or corruption.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.
I think we got to piggyback it onto a commit record, as long as there's
one.
No objection to that part. What I'm saying is that when there isn't one,
the answer is a new record type, not forcing xid assignment. It might
look almost like a commit record, but it shouldn't imply that there
was a transaction.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Andres Freund wrote:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.I think we got to piggyback it onto a commit record, as long as there's
one. Otherwise it's going to be more complex (queuing messages when
reading an inval record) and slower (more wal records). I can easily
develop a patch for that, the question is what we do on the back
branches...We have introduced new wal records in back branches previously --
nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user
just needs to make sure to upgrade the standbys first. If they don't,
they would die upon replay of the first such record, which they can take
as an indication that they need to be upgraded; the standby is down for
some time, but there is no data loss or corruption.
Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 18 April 2016 at 12:43, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:Andres Freund wrote:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.I think we got to piggyback it onto a commit record, as long as there's
one. Otherwise it's going to be more complex (queuing messages when
reading an inval record) and slower (more wal records). I can easily
develop a patch for that, the question is what we do on the back
branches...We have introduced new wal records in back branches previously --
nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user
just needs to make sure to upgrade the standbys first. If they don't,
they would die upon replay of the first such record, which they can take
as an indication that they need to be upgraded; the standby is down for
some time, but there is no data loss or corruption.Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.
(non-reply just because of travel)
OK, I'll write up a patch today to fix, with a view to backpatching.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-15 17:37:03 -0300, Alvaro Herrera wrote:
Andres Freund wrote:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.I think we got to piggyback it onto a commit record, as long as there's
one. Otherwise it's going to be more complex (queuing messages when
reading an inval record) and slower (more wal records). I can easily
develop a patch for that, the question is what we do on the back
branches...We have introduced new wal records in back branches previously --
nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).
Yea, I remember ;). We made that decision because we couldn't find
another way, and because the consequences were pretty grave.
The user just needs to make sure to upgrade the standbys first. If
they don't, they would die upon replay of the first such record, which
they can take as an indication that they need to be upgraded; the
standby is down for some time, but there is no data loss or
corruption.
There could, if they're using wal_keep_segments, and the standby cannot
be caught up anymore.
I think it's still worth to go for the new record type, but it's a
pretty close call. We could also just decide to document the issues :/ -
but I'm not sure we're eing all of them yet.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-18 20:43:30 +0900, Michael Paquier wrote:
Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.
I'm not sure why/how you'd do it that way in HEAD? I mean the only
reason not to use a separate record is the standby incompatibility.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-18 13:15:19 +0100, Simon Riggs wrote:
OK, I'll write up a patch today to fix, with a view to backpatching.
Cool. I've spent a couple minutes on this yesterday, and have the start
of a patch - do you want that?
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-15 16:53:37 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.I think we got to piggyback it onto a commit record, as long as there's
one.No objection to that part. What I'm saying is that when there isn't one,
the answer is a new record type, not forcing xid assignment. It might
look almost like a commit record, but it shouldn't imply that there
was a transaction.
Here's a patch doing so. Note that, after putting the record into RM_XACT_ID
first, I've decided to make it a RM_STANDBY_ID type record. I think it's
likely that this is going to be needed beyond transaction commits, and
it generally seems to fit better into standby.c; even if it's a bit
awkward that commit records contain similar data. To avoid duplicating
the *desc.c code, I've exposed standby_desc_invalidations() to also be
used by xactdesc.c.
It fixes the problem at hand, but essentially it's just luck that the
patch is sufficient. The first layer of the issue is that queued
invalidation messages aren't sent; but for vm_extend() there's another
side to it. Namely vm_extend() does
/*
* Send a shared-inval message to force other backends to close any smgr
* references they may have for this rel, which we are about to change.
* This is a useful optimization because it means that backends don't have
* to keep checking for creation or extension of the file, which happens
* infrequently.
*/
CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);
but CacheInvalidateSmgr is non-transactional as it's comment explains:
*
* Note: because these messages are nontransactional, they won't be captured
* in commit/abort WAL entries. Instead, calls to CacheInvalidateSmgr()
* should happen in low-level smgr.c routines, which are executed while
* replaying WAL as well as when creating it.
*
as far as I can see vm_extend() is the only current caller forgetting
that rule. I don't think it's safe to use CacheInvalidateSmgr() outside
smgr.c.
The reason this all ends up working nonetheless is that the
heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
which queues a relcache invalidation, which in turn does a
RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
doing a transactional CacheInvalidateSmgr(). But that seems more than
fragile.
ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation. Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.
Comments?
- Andres
Attachments:
0001-Emit-invalidations-to-standby-for-transactions-witho.patchtext/x-patch; charset=utf-8Download
From b2295700fb9e1d01b03bdebca3c71692941144c9 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 23 Apr 2016 19:18:00 -0700
Subject: [PATCH] Emit invalidations to standby for transactions without xid.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
So far, when a transaction with pending invalidations, but without an
assigned xid, committed, we simply ignored those invalidation
messages. That's problematic, because those are actually sent for a
reason.
Known symptoms of this include that existing sessions on a hot-standby
replica sometimes fail to notice new concurrently built indexes and
visibility map updates.
The solution is to WAL log such invalidations in transactions without an
xid. We considered to alternatively force-assign an xid, but that'd be
problematic for vacuum, which might be run in systems with few xids.
Important: This adds a new WAL record, but as the patch has to be
back-patched, we can't bump the WAL page magic. This means that standbys
have to be updated before primaries; otherwise
"PANIC: standby_redo: unknown op code 32" errors can be encountered.
XXX:
Reported-By: ÐаÑилÑев ÐмиÑÑий, Masahiko Sawada
Discussion:
CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com
CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com
---
src/backend/access/rmgrdesc/standbydesc.c | 54 +++++++++++++++++++++++++
src/backend/access/rmgrdesc/xactdesc.c | 30 ++------------
src/backend/access/transam/xact.c | 18 +++++++++
src/backend/replication/logical/decode.c | 9 +++++
src/backend/replication/logical/reorderbuffer.c | 53 +++++++++++++++---------
src/backend/storage/ipc/standby.c | 35 ++++++++++++++++
src/backend/utils/cache/inval.c | 5 ++-
src/include/replication/reorderbuffer.h | 2 +
src/include/storage/standby.h | 2 +
src/include/storage/standbydefs.h | 21 ++++++++++
10 files changed, 181 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index 4872cfb..e6172cc 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -58,6 +58,14 @@ standby_desc(StringInfo buf, XLogReaderState *record)
standby_desc_running_xacts(buf, xlrec);
}
+ else if (info == XLOG_INVALIDATIONS)
+ {
+ xl_invalidations *xlrec = (xl_invalidations *) rec;
+
+ standby_desc_invalidations(buf, xlrec->nmsgs, xlrec->msgs,
+ xlrec->dbId, xlrec->tsId,
+ xlrec->relcacheInitFileInval);
+ }
}
const char *
@@ -73,7 +81,53 @@ standby_identify(uint8 info)
case XLOG_RUNNING_XACTS:
id = "RUNNING_XACTS";
break;
+ case XLOG_INVALIDATIONS:
+ id = "INVALIDATIONS";
+ break;
}
return id;
}
+
+/*
+ * This routine is used by both standby_desc and xact_desc, because
+ * transaction commits and XLOG_INVALIDATIONS messages contain invalidations;
+ * it seems pointless to duplicate the code.
+ */
+void
+standby_desc_invalidations(StringInfo buf,
+ int nmsgs, SharedInvalidationMessage *msgs,
+ Oid dbId, Oid tsId,
+ bool relcacheInitFileInval)
+{
+ int i;
+
+ if (relcacheInitFileInval)
+ appendStringInfo(buf, "; relcache init file inval dbid %u tsid %u",
+ dbId, tsId);
+
+ appendStringInfoString(buf, "; inval msgs:");
+ for (i = 0; i < nmsgs; i++)
+ {
+ SharedInvalidationMessage *msg = &msgs[i];
+
+ if (msg->id >= 0)
+ appendStringInfo(buf, " catcache %d", msg->id);
+ else if (msg->id == SHAREDINVALCATALOG_ID)
+ appendStringInfo(buf, " catalog %u", msg->cat.catId);
+ else if (msg->id == SHAREDINVALRELCACHE_ID)
+ appendStringInfo(buf, " relcache %u", msg->rc.relId);
+ /* not expected, but print something anyway */
+ else if (msg->id == SHAREDINVALSMGR_ID)
+ appendStringInfoString(buf, " smgr");
+ /* not expected, but print something anyway */
+ else if (msg->id == SHAREDINVALRELMAP_ID)
+ appendStringInfoString(buf, " relmap");
+ else if (msg->id == SHAREDINVALRELMAP_ID)
+ appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
+ else if (msg->id == SHAREDINVALSNAPSHOT_ID)
+ appendStringInfo(buf, " snapshot %u", msg->sn.relId);
+ else
+ appendStringInfo(buf, " unknown id %d", msg->id);
+ }
+}
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index e8a334c..6f07c5c 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -18,6 +18,7 @@
#include "access/xact.h"
#include "catalog/catalog.h"
#include "storage/sinval.h"
+#include "storage/standbydefs.h"
#include "utils/timestamp.h"
/*
@@ -203,32 +204,9 @@ xact_desc_commit(StringInfo buf, uint8 info, xl_xact_commit *xlrec, RepOriginId
}
if (parsed.nmsgs > 0)
{
- if (XactCompletionRelcacheInitFileInval(parsed.xinfo))
- appendStringInfo(buf, "; relcache init file inval dbid %u tsid %u",
- parsed.dbId, parsed.tsId);
-
- appendStringInfoString(buf, "; inval msgs:");
- for (i = 0; i < parsed.nmsgs; i++)
- {
- SharedInvalidationMessage *msg = &parsed.msgs[i];
-
- if (msg->id >= 0)
- appendStringInfo(buf, " catcache %d", msg->id);
- else if (msg->id == SHAREDINVALCATALOG_ID)
- appendStringInfo(buf, " catalog %u", msg->cat.catId);
- else if (msg->id == SHAREDINVALRELCACHE_ID)
- appendStringInfo(buf, " relcache %u", msg->rc.relId);
- /* not expected, but print something anyway */
- else if (msg->id == SHAREDINVALSMGR_ID)
- appendStringInfoString(buf, " smgr");
- /* not expected, but print something anyway */
- else if (msg->id == SHAREDINVALRELMAP_ID)
- appendStringInfoString(buf, " relmap");
- else if (msg->id == SHAREDINVALSNAPSHOT_ID)
- appendStringInfo(buf, " snapshot %u", msg->sn.relId);
- else
- appendStringInfo(buf, " unknown id %d", msg->id);
- }
+ standby_desc_invalidations(
+ buf, parsed.nmsgs, parsed.msgs, parsed.dbId, parsed.tsId,
+ XactCompletionRelcacheInitFileInval(parsed.xinfo));
}
if (XactCompletionForceSyncCommit(parsed.xinfo))
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7e37331..95690ff 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1164,6 +1164,24 @@ RecordTransactionCommit(void)
Assert(nchildren == 0);
/*
+ * Transactions without an assigned xid can contain invalidation
+ * messages (e.g. explicit relcache invalidations or catcache
+ * invalidations for inplace updates); standbys need to process
+ * those. We can't emit a commit record without an xid, and we don't
+ * want to force assigning an xid, because that'd be problematic for
+ * e.g. vacuum. Hence we emit a bespoke record for the
+ * invalidations. We don't want to use that in case a commit record is
+ * emitted, so they happen synchronously with commits (besides not
+ * wanting to emit more WAL recoreds).
+ */
+ if (nmsgs != 0)
+ {
+ LogStandbyInvalidations(nmsgs, invalMessages,
+ RelcacheInitFileInval);
+ wrote_xlog = true; /* not strictly necessary */
+ }
+
+ /*
* If we didn't create XLOG entries, we're done here; otherwise we
* should trigger flushing those entries the same as a commit record
* would. This will primarily happen for HOT pruning and the like; we
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 0cdb0b8..0c248f0 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -327,6 +327,15 @@ DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
break;
case XLOG_STANDBY_LOCK:
break;
+ case XLOG_INVALIDATIONS:
+ {
+ xl_invalidations *invalidations =
+ (xl_invalidations *) XLogRecGetData(r);
+
+ ReorderBufferImmediateInvalidation(
+ ctx->reorder, invalidations->nmsgs, invalidations->msgs);
+ }
+ break;
default:
elog(ERROR, "unexpected RM_STANDBY_ID record type: %u", info);
}
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4520708..57821c3 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1810,26 +1810,8 @@ ReorderBufferForget(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
* catalog and we need to update the caches according to that.
*/
if (txn->base_snapshot != NULL && txn->ninvalidations > 0)
- {
- bool use_subtxn = IsTransactionOrTransactionBlock();
-
- if (use_subtxn)
- BeginInternalSubTransaction("replay");
-
- /*
- * Force invalidations to happen outside of a valid transaction - that
- * way entries will just be marked as invalid without accessing the
- * catalog. That's advantageous because we don't need to setup the
- * full state necessary for catalog access.
- */
- if (use_subtxn)
- AbortCurrentTransaction();
-
- ReorderBufferExecuteInvalidations(rb, txn);
-
- if (use_subtxn)
- RollbackAndReleaseCurrentSubTransaction();
- }
+ ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
+ txn->invalidations);
else
Assert(txn->ninvalidations == 0);
@@ -1837,6 +1819,37 @@ ReorderBufferForget(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
ReorderBufferCleanupTXN(rb, txn);
}
+/*
+ * Execute invalidations happening outside the context of a decoded
+ * transaction. That currently happens either for xid-less commits
+ * (c.f. RecordTransactionCommit()) or for invalidations in uninteresting
+ * transactions (via ReorderBufferForget()).
+ */
+void
+ReorderBufferImmediateInvalidation(ReorderBuffer *rb, uint32 ninvalidations,
+ SharedInvalidationMessage *invalidations)
+{
+ bool use_subtxn = IsTransactionOrTransactionBlock();
+ int i;
+
+ if (use_subtxn)
+ BeginInternalSubTransaction("replay");
+
+ /*
+ * Force invalidations to happen outside of a valid transaction - that
+ * way entries will just be marked as invalid without accessing the
+ * catalog. That's advantageous because we don't need to setup the
+ * full state necessary for catalog access.
+ */
+ if (use_subtxn)
+ AbortCurrentTransaction();
+
+ for (i = 0; i < ninvalidations; i++)
+ LocalExecuteInvalidationMessage(&invalidations[i]);
+
+ if (use_subtxn)
+ RollbackAndReleaseCurrentSubTransaction();
+}
/*
* Tell reorderbuffer about an xid seen in the WAL stream. Has to be called at
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6a9bf84..762dfa6 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -825,6 +825,16 @@ standby_redo(XLogReaderState *record)
ProcArrayApplyRecoveryInfo(&running);
}
+ else if (info == XLOG_INVALIDATIONS)
+ {
+ xl_invalidations *xlrec = (xl_invalidations *) XLogRecGetData(record);
+
+ ProcessCommittedInvalidationMessages(xlrec->msgs,
+ xlrec->nmsgs,
+ xlrec->relcacheInitFileInval,
+ xlrec->dbId,
+ xlrec->tsId);
+ }
else
elog(PANIC, "standby_redo: unknown op code %u", info);
}
@@ -1068,3 +1078,28 @@ LogAccessExclusiveLockPrepare(void)
*/
(void) GetTopTransactionId();
}
+
+/*
+ * Emit WAL for invalidations. This currently is only used for commits without
+ * an xid but which contain invalidations.
+ */
+void
+LogStandbyInvalidations(int nmsgs, SharedInvalidationMessage *msgs,
+ bool relcacheInitFileInval)
+{
+ xl_invalidations xlrec;
+
+ /* prepare record */
+ memset(&xlrec, 0, sizeof(xlrec));
+ xlrec.dbId = MyDatabaseId;
+ xlrec.tsId = MyDatabaseTableSpace;
+ xlrec.relcacheInitFileInval = relcacheInitFileInval;
+ xlrec.nmsgs = nmsgs;
+
+ /* perform insertion */
+ XLogBeginInsert();
+ XLogRegisterData((char *) (&xlrec), MinSizeOfInvalidations);
+ XLogRegisterData((char *) msgs,
+ nmsgs * sizeof(SharedInvalidationMessage));
+ XLogInsert(RM_STANDBY_ID, XLOG_INVALIDATIONS);
+}
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 924bebb..5803518 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -842,8 +842,9 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
}
/*
- * ProcessCommittedInvalidationMessages is executed by xact_redo_commit()
- * to process invalidation messages added to commit records.
+ * ProcessCommittedInvalidationMessages is executed by xact_redo_commit() or
+ * standby_redo() to process invalidation messages. Currently that happens
+ * only at end-of-xact.
*
* Relcache init file invalidation requires processing both
* before and after we send the SI messages. See AtEOXact_Inval()
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 4c54953..e070894 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -391,6 +391,8 @@ void ReorderBufferAddNewTupleCids(ReorderBuffer *, TransactionId, XLogRecPtr lsn
CommandId cmin, CommandId cmax, CommandId combocid);
void ReorderBufferAddInvalidations(ReorderBuffer *, TransactionId, XLogRecPtr lsn,
Size nmsgs, SharedInvalidationMessage *msgs);
+void ReorderBufferImmediateInvalidation(ReorderBuffer *, uint32 ninvalidations,
+ SharedInvalidationMessage *invalidations);
void ReorderBufferProcessXid(ReorderBuffer *, TransactionId xid, XLogRecPtr lsn);
void ReorderBufferXidSetCatalogChanges(ReorderBuffer *, TransactionId xid, XLogRecPtr lsn);
bool ReorderBufferXidHasCatalogChanges(ReorderBuffer *, TransactionId xid);
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index aafc9b8..5205884 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -85,5 +85,7 @@ extern void LogAccessExclusiveLock(Oid dbOid, Oid relOid);
extern void LogAccessExclusiveLockPrepare(void);
extern XLogRecPtr LogStandbySnapshot(void);
+extern void LogStandbyInvalidations(int nmsgs, SharedInvalidationMessage *msgs,
+ bool relcacheInitFileInval);
#endif /* STANDBY_H */
diff --git a/src/include/storage/standbydefs.h b/src/include/storage/standbydefs.h
index 609d06e..bd3c97f 100644
--- a/src/include/storage/standbydefs.h
+++ b/src/include/storage/standbydefs.h
@@ -17,17 +17,23 @@
#include "access/xlogreader.h"
#include "lib/stringinfo.h"
#include "storage/lockdefs.h"
+#include "storage/sinval.h"
/* Recovery handlers for the Standby Rmgr (RM_STANDBY_ID) */
extern void standby_redo(XLogReaderState *record);
extern void standby_desc(StringInfo buf, XLogReaderState *record);
extern const char *standby_identify(uint8 info);
+extern void standby_desc_invalidations(StringInfo buf,
+ int nmsgs, SharedInvalidationMessage *msgs,
+ Oid dbId, Oid tsId,
+ bool relcacheInitFileInval);
/*
* XLOG message types
*/
#define XLOG_STANDBY_LOCK 0x00
#define XLOG_RUNNING_XACTS 0x10
+#define XLOG_INVALIDATIONS 0x20
typedef struct xl_standby_locks
{
@@ -50,4 +56,19 @@ typedef struct xl_running_xacts
TransactionId xids[FLEXIBLE_ARRAY_MEMBER];
} xl_running_xacts;
+/*
+ * Invalidations for standby, currently only when transactions without an
+ * assigned xid commit.
+ */
+typedef struct xl_invalidations
+{
+ Oid dbId; /* MyDatabaseId */
+ Oid tsId; /* MyDatabaseTableSpace */
+ bool relcacheInitFileInval; /* invalidate relcache init file */
+ int nmsgs; /* number of shared inval msgs */
+ SharedInvalidationMessage msgs[FLEXIBLE_ARRAY_MEMBER];
+} xl_invalidations;
+
+#define MinSizeOfInvalidations offsetof(xl_invalidations, msgs)
+
#endif /* STANDBYDEFS_H */
--
2.7.0.229.g701fa7f
On 18 April 2016 at 13:15, Simon Riggs <simon@2ndquadrant.com> wrote:
On 18 April 2016 at 12:43, Michael Paquier <michael.paquier@gmail.com>
wrote:On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:Andres Freund wrote:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record. It's time to fix that.I think we got to piggyback it onto a commit record, as long as there's
one. Otherwise it's going to be more complex (queuing messages when
reading an inval record) and slower (more wal records). I can easily
develop a patch for that, the question is what we do on the back
branches...We have introduced new wal records in back branches previously --
nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user
just needs to make sure to upgrade the standbys first. If they don't,
they would die upon replay of the first such record, which they can take
as an indication that they need to be upgraded; the standby is down for
some time, but there is no data loss or corruption.Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.(non-reply just because of travel)
OK, I'll write up a patch today to fix, with a view to backpatching.
Patch from Tuesday. On various planes.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
inval_only.v1.patchapplication/octet-stream; name=inval_only.v1.patchDownload
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index a65048b..f33cd41 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -2053,12 +2053,10 @@ RecordTransactionCommitPrepared(TransactionId xid,
MyPgXact->delayChkpt = true;
/* Emit the XLOG commit record */
- recptr = XactLogCommitRecord(committs,
+ recptr = XactLogCommitRecord(committs, xid, true,
nchildren, children, nrels, rels,
ninvalmsgs, invalmsgs,
- initfileinval, false,
- xid);
-
+ initfileinval, false);
if (replorigin)
/* Move LSNs forward for this replication origin */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7e37331..71ab3f4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1149,7 +1149,7 @@ RecordTransactionCommit(void)
* If we haven't been assigned an XID yet, we neither can, nor do we want
* to write a COMMIT record.
*/
- if (!markXidCommitted)
+ if (!markXidCommitted && nmsgs == 0)
{
/*
* We expect that every smgrscheduleunlink is followed by a catalog
@@ -1212,10 +1212,10 @@ RecordTransactionCommit(void)
SetCurrentTransactionStopTimestamp();
XactLogCommitRecord(xactStopTimestamp,
+ xid, false, /* plain commit */
nchildren, children, nrels, rels,
nmsgs, invalMessages,
- RelcacheInitFileInval, forceSyncCommit,
- InvalidTransactionId /* plain commit */ );
+ RelcacheInitFileInval, forceSyncCommit);
if (replorigin)
/* Move LSNs forward for this replication origin */
@@ -1235,7 +1235,8 @@ RecordTransactionCommit(void)
if (!replorigin || replorigin_session_origin_timestamp == 0)
replorigin_session_origin_timestamp = xactStopTimestamp;
- TransactionTreeSetCommitTsData(xid, nchildren, children,
+ if (markXidCommitted)
+ TransactionTreeSetCommitTsData(xid, nchildren, children,
replorigin_session_origin_timestamp,
replorigin_session_origin, false);
}
@@ -1305,7 +1306,7 @@ RecordTransactionCommit(void)
* If we entered a commit critical section, leave it now, and let
* checkpoints proceed.
*/
- if (markXidCommitted)
+ if (markXidCommitted || nmsgs > 0)
{
MyPgXact->delayChkpt = false;
END_CRIT_SECTION();
@@ -5079,18 +5080,23 @@ xactGetCommittedChildren(TransactionId **ptr)
/*
- * Log the commit record for a plain or twophase transaction commit.
+ * Log information at transaction commit.
*
- * A 2pc commit will be emitted when twophase_xid is valid, a plain one
- * otherwise.
+ * In the common case this produces standard commit records, though we
+ * support other cases:
+ * * If xid is invalid and we have msgs, send an INVAL_ONLY record
+ * * If xid is valid and twophase is true, send a COMMIT_PREPARED record
+ *
+ * In earlier releases, as a bug fix, the XLOG_XACT_INVAL_ONLY record is
+ * sent as an XLOG_NOOP record so it can avoid screwing up archives.
*/
XLogRecPtr
XactLogCommitRecord(TimestampTz commit_time,
+ TransactionId xid, bool twophase,
int nsubxacts, TransactionId *subxacts,
int nrels, RelFileNode *rels,
int nmsgs, SharedInvalidationMessage *msgs,
- bool relcacheInval, bool forceSync,
- TransactionId twophase_xid)
+ bool relcacheInval, bool forceSync)
{
xl_xact_commit xlrec;
xl_xact_xinfo xl_xinfo;
@@ -5108,10 +5114,15 @@ XactLogCommitRecord(TimestampTz commit_time,
xl_xinfo.xinfo = 0;
/* decide between a plain and 2pc commit */
- if (!TransactionIdIsValid(twophase_xid))
- info = XLOG_XACT_COMMIT;
- else
+ if (!TransactionIdIsValid(xid))
+ {
+ Assert(nmsgs > 0);
+ info = XLOG_XACT_INVAL_ONLY;
+ }
+ else if (twophase)
info = XLOG_XACT_COMMIT_PREPARED;
+ else
+ info = XLOG_XACT_COMMIT;
/* First figure out and collect all the information needed */
@@ -5126,7 +5137,8 @@ XactLogCommitRecord(TimestampTz commit_time,
* Check if the caller would like to ask standbys for immediate feedback
* once this commit is applied.
*/
- if (synchronous_commit >= SYNCHRONOUS_COMMIT_REMOTE_APPLY)
+ if (synchronous_commit >= SYNCHRONOUS_COMMIT_REMOTE_APPLY &&
+ info != XLOG_XACT_INVAL_ONLY)
xl_xinfo.xinfo |= XACT_COMPLETION_APPLY_FEEDBACK;
/*
@@ -5142,12 +5154,14 @@ XactLogCommitRecord(TimestampTz commit_time,
if (nsubxacts > 0)
{
+ Assert(info != XLOG_XACT_INVAL_ONLY);
xl_xinfo.xinfo |= XACT_XINFO_HAS_SUBXACTS;
xl_subxacts.nsubxacts = nsubxacts;
}
if (nrels > 0)
{
+ Assert(info != XLOG_XACT_INVAL_ONLY);
xl_xinfo.xinfo |= XACT_XINFO_HAS_RELFILENODES;
xl_relfilenodes.nrels = nrels;
}
@@ -5158,10 +5172,11 @@ XactLogCommitRecord(TimestampTz commit_time,
xl_invals.nmsgs = nmsgs;
}
- if (TransactionIdIsValid(twophase_xid))
+ if (twophase)
{
+ Assert(info == XLOG_XACT_COMMIT_PREPARED);
xl_xinfo.xinfo |= XACT_XINFO_HAS_TWOPHASE;
- xl_twophase.xid = twophase_xid;
+ xl_twophase.xid = xid;
}
/* dump transaction origin information */
@@ -5569,7 +5584,8 @@ xact_redo(XLogReaderState *record)
/* Backup blocks are not used in xact records */
Assert(!XLogRecHasAnyBlockRefs(record));
- if (info == XLOG_XACT_COMMIT || info == XLOG_XACT_COMMIT_PREPARED)
+ if (info == XLOG_XACT_COMMIT || info == XLOG_XACT_COMMIT_PREPARED ||
+ info == XLOG_XACT_INVAL_ONLY)
{
xl_xact_commit *xlrec = (xl_xact_commit *) XLogRecGetData(record);
xl_xact_parsed_commit parsed;
@@ -5583,13 +5599,25 @@ xact_redo(XLogReaderState *record)
xact_redo_commit(&parsed, XLogRecGetXid(record),
record->EndRecPtr, XLogRecGetOrigin(record));
}
- else
+ else if (info == XLOG_XACT_COMMIT_PREPARED)
{
Assert(TransactionIdIsValid(parsed.twophase_xid));
xact_redo_commit(&parsed, parsed.twophase_xid,
record->EndRecPtr, XLogRecGetOrigin(record));
RemoveTwoPhaseFile(parsed.twophase_xid, false);
}
+ else /* XLOG_XACT_INVAL_ONLY */
+ {
+ Assert(!TransactionIdIsValid(parsed.twophase_xid));
+
+ /*
+ * Send cache invalidations attched to this message.
+ */
+ ProcessCommittedInvalidationMessages(
+ parsed.msgs, parsed.nmsgs,
+ XactCompletionRelcacheInitFileInval(parsed.xinfo),
+ parsed.dbId, parsed.tsId);
+ }
}
else if (info == XLOG_XACT_ABORT || info == XLOG_XACT_ABORT_PREPARED)
{
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 503ae1b..cbd430f 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -118,7 +118,7 @@ typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
#define XLOG_XACT_COMMIT_PREPARED 0x30
#define XLOG_XACT_ABORT_PREPARED 0x40
#define XLOG_XACT_ASSIGNMENT 0x50
-/* free opcode 0x60 */
+#define XLOG_XACT_INVAL_ONLY 0x60
/* free opcode 0x70 */
/* mask for filtering opcodes out of xl_info */
@@ -361,11 +361,11 @@ extern void UnregisterSubXactCallback(SubXactCallback callback, void *arg);
extern int xactGetCommittedChildren(TransactionId **ptr);
extern XLogRecPtr XactLogCommitRecord(TimestampTz commit_time,
+ TransactionId xid, bool twophase,
int nsubxacts, TransactionId *subxacts,
int nrels, RelFileNode *rels,
int nmsgs, SharedInvalidationMessage *msgs,
- bool relcacheInval, bool forceSync,
- TransactionId twophase_xid);
+ bool relcacheInval, bool forceSync);
extern XLogRecPtr XactLogAbortRecord(TimestampTz abort_time,
int nsubxacts, TransactionId *subxacts,
On 2016-04-24 22:10:19 +0100, Simon Riggs wrote:
On 18 April 2016 at 13:15, Simon Riggs <simon@2ndquadrant.com> wrote:
(non-reply just because of travel)
OK, I'll write up a patch today to fix, with a view to backpatching.
Patch from Tuesday. On various planes.
I posted a version of this at
http://archives.postgresql.org/message-id/20160424025117.2cmf6ku4ldfcoo44%40alap3.anarazel.de
(because it was blocking me from fixing a newer bug)
I think my approach of a separate record is going to be easier to
backpatch - the new commit record format you took advantage of is
new. If we go your way, this also needs pg_xlogdump/xact_desc() &
logical decoding integration.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Dean, Robert,
Afaics the problem described below was introduced in b4e07417, do you
have a different/better proposal than
s/CacheInvalidateSmgr/CacheInvalidateRelcache/? Doing that doesn't feel
quite right either, because it only makes the file extension visible at
end-of-xact - which is mostly harmless, but still.
On 2016-04-23 19:51:17 -0700, Andres Freund wrote:
It fixes the problem at hand, but essentially it's just luck that the
patch is sufficient. The first layer of the issue is that queued
invalidation messages aren't sent; but for vm_extend() there's another
side to it. Namely vm_extend() does/*
* Send a shared-inval message to force other backends to close any smgr
* references they may have for this rel, which we are about to change.
* This is a useful optimization because it means that backends don't have
* to keep checking for creation or extension of the file, which happens
* infrequently.
*/
CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);but CacheInvalidateSmgr is non-transactional as it's comment explains:
*
* Note: because these messages are nontransactional, they won't be captured
* in commit/abort WAL entries. Instead, calls to CacheInvalidateSmgr()
* should happen in low-level smgr.c routines, which are executed while
* replaying WAL as well as when creating it.
*as far as I can see vm_extend() is the only current caller forgetting
that rule. I don't think it's safe to use CacheInvalidateSmgr() outside
smgr.c.The reason this all ends up working nonetheless is that the
heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
which queues a relcache invalidation, which in turn does a
RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
doing a transactional CacheInvalidateSmgr(). But that seems more than
fragile.ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation. Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 26, 2016 at 1:13 AM, Andres Freund <andres@anarazel.de> wrote:
I think my approach of a separate record is going to be easier to
backpatch - the new commit record format you took advantage of is
new.
Sorry for the late reply.
After reading both patches yesterday, I found your approach with
standby.c to be neater.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
Here's a patch doing so. Note that, after putting the record into RM_XACT_ID
first, I've decided to make it a RM_STANDBY_ID type record. I think it's
likely that this is going to be needed beyond transaction commits, and
it generally seems to fit better into standby.c; even if it's a bit
awkward that commit records contain similar data. To avoid duplicating
the *desc.c code, I've exposed standby_desc_invalidations() to also be
used by xactdesc.c.
I have been eyeballing this patch for some time since yesterday and
did some tests, and that's really neat. Having the invalidation stuff
in standby.c makes quite some sense as well.
+ * wanting to emit more WAL recoreds).
s/recoreds/records/
The reason this all ends up working nonetheless is that the
heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
which queues a relcache invalidation, which in turn does a
RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
doing a transactional CacheInvalidateSmgr(). But that seems more than
fragile.
You have spent quite some time on that. As things stand that's indeed
a bit fragile..
ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation. Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.Comments?
Yeah, this looks like a good idea at the end. As the invalidation
patch is aimed at being backpatched, this may be something to do as
well in back-branches.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
Thanks for looking into this.
On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation. Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.Comments?
Yeah, this looks like a good idea at the end.
You mean the bit about making smgr invalidations logged?
As the invalidation patch is aimed at being backpatched, this may be
something to do as well in back-branches.
I'm a bit split here. I think forcing processing of invalidations at
moments they've previously never been processed is a bit risky for the
back branches. But on the other hand relcache invalidations are only
processed at end-of-xact, which isn't really correct for the code at
hand :/
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 26, 2016 at 11:47 AM, Andres Freund <andres@anarazel.de> wrote:
Thanks for looking into this.
On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation. Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.Comments?
Yeah, this looks like a good idea at the end.
You mean the bit about making smgr invalidations logged?
Sorry for the lack of precision in my words. I was referring to your
idea of replacing CacheInvalidateSmgr() by CacheInvalidateRelcache()
sounds like a good option as this would make the invalidation to be
logged at commit. Thinking about the logging of smgr invalidations,
this is quite interesting. But what would we actually gain in doing
that? Do you foresee any advantages in doing so? The only case where
this would be useful now is for vm_extend by looking at the code.
As the invalidation patch is aimed at being backpatched, this may be
something to do as well in back-branches.I'm a bit split here. I think forcing processing of invalidations at
moments they've previously never been processed is a bit risky for the
back branches. But on the other hand relcache invalidations are only
processed at end-of-xact, which isn't really correct for the code at
hand :/
Oh, OK. So you mean that this patch is not aimed for back-branches
with this new record type, but that's only for HEAD.. To be honest, I
thought that we had better address this issue on back-branches with a
patch close to what you are proposing (which is more or less what
Simon has actually sent), and keep the change of CacheInvalidateSmgr()
in visibilitymap.c to be HEAD-only.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 26, 2016 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Apr 26, 2016 at 11:47 AM, Andres Freund <andres@anarazel.de> wrote:
Thanks for looking into this.
On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation. Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.Comments?
Yeah, this looks like a good idea at the end.
You mean the bit about making smgr invalidations logged?
Sorry for the lack of precision in my words. I was referring to your
idea of replacing CacheInvalidateSmgr() by CacheInvalidateRelcache()
sounds like a good option as this would make the invalidation to be
logged at commit. Thinking about the logging of smgr invalidations,
this is quite interesting. But what would we actually gain in doing
that? Do you foresee any advantages in doing so? The only case where
this would be useful now is for vm_extend by looking at the code.As the invalidation patch is aimed at being backpatched, this may be
something to do as well in back-branches.I'm a bit split here. I think forcing processing of invalidations at
moments they've previously never been processed is a bit risky for the
back branches. But on the other hand relcache invalidations are only
processed at end-of-xact, which isn't really correct for the code at
hand :/Oh, OK. So you mean that this patch is not aimed for back-branches
with this new record type, but that's only for HEAD.. To be honest, I
thought that we had better address this issue on back-branches with a
patch close to what you are proposing (which is more or less what
Simon has actually sent), and keep the change of CacheInvalidateSmgr()
in visibilitymap.c to be HEAD-only.
Ah, and actually as I'm on it from your previous patch there is this bit:
+ else if (msg->id == SHAREDINVALRELMAP_ID)
+ appendStringInfoString(buf, " relmap");
+ else if (msg->id == SHAREDINVALRELMAP_ID)
+ appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
You surely want to check for !OidIsValid(msg->rm.dbId) in the first one.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-26 12:39:37 +0900, Michael Paquier wrote:
Thinking about the logging of smgr invalidations, this is quite
interesting. But what would we actually gain in doing that? Do you
foresee any advantages in doing so? The only case where this would be
useful now is for vm_extend by looking at the code.
Well, it'd make vm_extend actually correct, which replacing the
invalidation with a relcache one would not. Relcache invalidations are
transactional, whereas smgr ones are not (intentionally so!). I don't
think it's currently a big problem, but it does make me rather wary.
As the invalidation patch is aimed at being backpatched, this may be
something to do as well in back-branches.I'm a bit split here. I think forcing processing of invalidations at
moments they've previously never been processed is a bit risky for the
back branches. But on the other hand relcache invalidations are only
processed at end-of-xact, which isn't really correct for the code at
hand :/Oh, OK. So you mean that this patch is not aimed for back-branches
with this new record type, but that's only for HEAD.
No, I think we got to do this in all branches. I was just wondering
about how to fix vm_extend(). Which I do think we got to fix, even in
the back-branches.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 26 April 2016 at 04:48, Andres Freund <andres@anarazel.de> wrote:
No, I think we got to do this in all branches. I was just wondering
about how to fix vm_extend(). Which I do think we got to fix, even in
the back-branches.
I think replacing CacheInvalidateSmgr() with CacheInvalidateRelcache()
in vm_extend() is probably the safer thing to do, and ought to be
relatively harmless.
It means that an index-only scan won't be notified of VM extension
until the end of the other transaction, which might lead to extra heap
fetches, but I think that's unlikely to have any performance impact
because it ought to be a fairly rare event, and if it was another
transaction adding tuples, they wouldn't be all visible before it was
committed anyway, so the extra heap fetches would be required in any
case.
Regards,
Dean
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Apr 25, 2016 at 2:10 PM, Andres Freund <andres@anarazel.de> wrote:
Afaics the problem described below was introduced in b4e07417, do you
have a different/better proposal than
s/CacheInvalidateSmgr/CacheInvalidateRelcache/? Doing that doesn't feel
quite right either, because it only makes the file extension visible at
end-of-xact - which is mostly harmless, but still.
Maybe I'm all wet here, but it seems to me like this is a bug in
heap_xlog_visible(). If the relation has been extended on the master
but no bits are set, then it doesn't really matter whether any
invalidation happens on the standby. On the other hand, if the
relation has been extended on the master and a bit has actually been
set on the new page, then there should be an XLOG_HEAP2_VISIBLE record
that gets replayed on the standby. And replay of that record should
call visibilitymap_pin(), which should call vm_readbuf(), which should
call vm_extend(), which should issue the same smgr invalidation on the
standby that got issued on the master. So no problem! However, in
the FPI case, heap_xlog_visible() optimizes all of that away, so the
invalidation doesn't get issued. Oops. But we could cure that just
by having heap_xlog_visible() still call CacheInvalidateSmgr() even in
that case, and then we'd be actually be following the rule that
non-transactional invalidations should be issued on the standby even
while in replay. That seems cleaner to me than switching to a
transactional invalidation, which isn't really the right thing anyway.
In other words, I think Masahiko Sawada's patch in the original post
is pretty much right on target, except that we don't need to do that
always, but rather only in the FPI case when the call to
visibilitymap_pin() is being optimized away. If we solve the problem
that way, I don't think we even need a new WAL record for this case,
which is a non-trivial fringe benefit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
In other words, I think Masahiko Sawada's patch in the original post
is pretty much right on target, except that we don't need to do that
always, but rather only in the FPI case when the call to
visibilitymap_pin() is being optimized away. If we solve the problem
that way, I don't think we even need a new WAL record for this case,
which is a non-trivial fringe benefit.
The visibility map is not the only thing that need to be addressed,
no? For example take this report from Dmitry Vasilyev of a couple of
months back where index relations are not visible on a standby:
/messages/by-id/CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com
This is really leading to a solution where we need to take a more
general approach to this problem instead of trying to patch multiple
WAL replay code paths. And Andres' stuff does so.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-27 14:27:33 +0900, Michael Paquier wrote:
On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
In other words, I think Masahiko Sawada's patch in the original post
is pretty much right on target, except that we don't need to do that
always, but rather only in the FPI case when the call to
visibilitymap_pin() is being optimized away. If we solve the problem
that way, I don't think we even need a new WAL record for this case,
which is a non-trivial fringe benefit.The visibility map is not the only thing that need to be addressed,
no?
If I understand Robert correctly his point is about fixing the smgr
inval alone - without the invalidation fix that'd not be enough because
the relcache info with outdated information (particularly relallvisible
et al), would continue to be valid. Relcache invalidations imply an smgr
one, but not the other way round.
The reason the fix for nmsg > 0 && !markXidCommitted isn't entirely
sufficient is because the smgr invalidation isn't transaction bound,
i.e. sent out immediately. So, to have the same behaviour on master/HS,
we need a way to send out the invalidiation properly in lockstep with
replay.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 27, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-27 14:27:33 +0900, Michael Paquier wrote:
On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
In other words, I think Masahiko Sawada's patch in the original post
is pretty much right on target, except that we don't need to do that
always, but rather only in the FPI case when the call to
visibilitymap_pin() is being optimized away. If we solve the problem
that way, I don't think we even need a new WAL record for this case,
which is a non-trivial fringe benefit.The visibility map is not the only thing that need to be addressed,
no?If I understand Robert correctly his point is about fixing the smgr
inval alone - without the invalidation fix that'd not be enough because
the relcache info with outdated information (particularly relallvisible
et al), would continue to be valid. Relcache invalidations imply an smgr
one, but not the other way round.The reason the fix for nmsg > 0 && !markXidCommitted isn't entirely
sufficient is because the smgr invalidation isn't transaction bound,
i.e. sent out immediately. So, to have the same behaviour on master/HS,
we need a way to send out the invalidiation properly in lockstep with
replay.
What I'm confused about here is:
Masahiko Sawada posted a patch that fixes the problem for him, which
does not involve any new WAL record type. It also seems to be fixing
the problem in a way that is clean and consistent with what we've done
elsewhere.
The patch actually under discussion here manages to introduce a new
WAL record type without fixing that problem.
Therefore I include that the committed patch fixes some *other*
problem, not the one that this thread is ostensibly about.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-27 11:59:39 -0400, Robert Haas wrote:
Masahiko Sawada posted a patch that fixes the problem for him, which
does not involve any new WAL record type. It also seems to be fixing
the problem in a way that is clean and consistent with what we've done
elsewhere.
It only fixes one symptom, the relcache entry is still wrong
afterwards. Which is pretty relevant for planning.
The patch actually under discussion here manages to introduce a new
WAL record type without fixing that problem.
It does fix the problem, just not in a super robust way. Which is why I
think we should add something like Masahiko's fix additionally.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 27, 2016 at 12:03 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-27 11:59:39 -0400, Robert Haas wrote:
Masahiko Sawada posted a patch that fixes the problem for him, which
does not involve any new WAL record type. It also seems to be fixing
the problem in a way that is clean and consistent with what we've done
elsewhere.It only fixes one symptom, the relcache entry is still wrong
afterwards. Which is pretty relevant for planning.The patch actually under discussion here manages to introduce a new
WAL record type without fixing that problem.It does fix the problem, just not in a super robust way. Which is why I
think we should add something like Masahiko's fix additionally.
OK, that works for me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers