Improving replay of XLOG_BTREE_VACUUM records
Hi all.
There are situations in which vacuuming big btree index causes stuck in WAL replaying on hot standby servers for quite a long time. I’ve described the problem in more details in this thread [0]/messages/by-id/058C9D59-9200-45FD-A565-0E4431A6F1E3@simply.name </messages/by-id/058C9D59-9200-45FD-A565-0E4431A6F1E3@simply.name>. Below in that thread Kevin Grittner proposed a good way for improving btree scans so that btree vacuuming logic could be seriously simplified. Since I don’t know when that may happen I’ve done a patch that makes some improvement right now. If Kevin or someone else would expand [1]http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069 <http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069> for handling all types of btree scans, I suppose, my patch could be thrown away and vacuuming logic should be strongly rewritten.
The idea of the patch is not to read pages from disk to make sure they are unpinned (like btree_xlog_vacuum does it right now). This is done with creating a new ReadBufferMode which returns locked buffer without setting BM_VALID flag on it. I don’t know if that is the right way of doing that but it seems to work well.
Testing it my environment gives a good win [2]https://yadi.sk/i/l13PZUNhgNB8u <https://yadi.sk/i/l13PZUNhgNB8u> - green is unpatched replica, blue is replica with a patch, two spikes are results of calling manual vacuuming of big table. Since the picture could be unavailable I’ll write here that:
1. replication delay reduced from ~1250 MB to 200 MB of replay_location lag,
2. patched replica caught master in less than a minute against 12 minutes of unpatched replica,
3. Physical I/O load on patched replica didn’t change compared with the normal workload while unpatched replica did lots of reads from PGDATA during spikes.
There is still a stuck in WAL replay but much smaller that right now. Also this change seems not to affect any other scenarios.
I’ll add it to 2015-06 commitfest.
[0]: /messages/by-id/058C9D59-9200-45FD-A565-0E4431A6F1E3@simply.name </messages/by-id/058C9D59-9200-45FD-A565-0E4431A6F1E3@simply.name>
[1]: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069 <http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ed5b87f96d473962ec5230fd820abfeaccb2069>
[2]: https://yadi.sk/i/l13PZUNhgNB8u <https://yadi.sk/i/l13PZUNhgNB8u>
--
May the force be with you…
https://simply.name
Attachments:
btree_vacuum_v1.patchapplication/octet-stream; name=btree_vacuum_v1.patchDownload+36-21
On 5/1/15 11:19 AM, Vladimir Borodin wrote:
There are situations in which vacuuming big btree index causes stuck in
WAL replaying on hot standby servers for quite a long time. I’ve
described the problem in more details in this thread [0]. Below in that
thread Kevin Grittner proposed a good way for improving btree scans so
that btree vacuuming logic could be seriously simplified. Since I don’t
know when that may happen I’ve done a patch that makes some improvement
right now. If Kevin or someone else would expand [1] for handling all
types of btree scans, I suppose, my patch could be thrown away and
vacuuming logic should be strongly rewritten.
This looks like a good way to address this until the more significant
work can be done.
I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID?
or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
the code; I see the logic to NO_BM_VALID...
+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set
+ * BM_VALID bit before returning buffer so that noone could pin it.
It would be better to explain why we want that mode. How about:
RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
BM_VALID before returning the buffer. This is done to ensure that no one
can pin the buffer without actually reading the buffer contents in. This
is necessary while replying XLOG_BTREE_VACUUM records in hot standby.
+ if (mode == RBM_ZERO_NO_BM_VALID)
+ TerminateBufferIO(bufHdr, false, 0);
+ else
+ TerminateBufferIO(bufHdr, false, BM_VALID);
Simply passing in a 0 seems a bit odd to me; is there anywhere else we
do that?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi, Jim.
Thanks for review.
2 мая 2015 г., в 2:10, Jim Nasby <Jim.Nasby@BlueTreble.com> написал(а):
On 5/1/15 11:19 AM, Vladimir Borodin wrote:
There are situations in which vacuuming big btree index causes stuck in
WAL replaying on hot standby servers for quite a long time. I’ve
described the problem in more details in this thread [0]. Below in that
thread Kevin Grittner proposed a good way for improving btree scans so
that btree vacuuming logic could be seriously simplified. Since I don’t
know when that may happen I’ve done a patch that makes some improvement
right now. If Kevin or someone else would expand [1] for handling all
types of btree scans, I suppose, my patch could be thrown away and
vacuuming logic should be strongly rewritten.This looks like a good way to address this until the more significant work can be done.
I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID? or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on the code; I see the logic to NO_BM_VALID…
Perhaps, RBM_ZERO_NO_BM_VALID is not so good (it makes more difficult to grep BM_VALID in code), but I don’t actually like BM_INVALID and BM_NOT_VALID, sorry :( But I also don’t insist on NO_BM_VALID, any other suggestions?
+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set + * BM_VALID bit before returning buffer so that noone could pin it.It would be better to explain why we want that mode. How about:
RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set BM_VALID before returning the buffer. This is done to ensure that no one can pin the buffer without actually reading the buffer contents in. This is necessary while replying XLOG_BTREE_VACUUM records in hot standby.
Good point, fixed in attached patch.
+ if (mode == RBM_ZERO_NO_BM_VALID) + TerminateBufferIO(bufHdr, false, 0); + else + TerminateBufferIO(bufHdr, false, BM_VALID);Simply passing in a 0 seems a bit odd to me; is there anywhere else we do that?
Yes, it is done the same way in FlushBuffer function [0]http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/bufmgr.c;h=a68eae81695f3f3b711d35d6c910e46b031f1cbc;hb=HEAD#l2426 <http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/bufmgr.c;h=a68eae81695f3f3b711d35d6c910e46b031f1cbc;hb=HEAD#l2426>. Also comments before TerminateBufferIO say that 0 is expected value for third argument.
[0]: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/bufmgr.c;h=a68eae81695f3f3b711d35d6c910e46b031f1cbc;hb=HEAD#l2426 <http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/bufmgr.c;h=a68eae81695f3f3b711d35d6c910e46b031f1cbc;hb=HEAD#l2426>
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
May the force be with you…
https://simply.name
Attachments:
btree_vacuum_v2.patchapplication/octet-stream; name=btree_vacuum_v2.patchDownload+39-21
On 05/02/2015 02:10 AM, Jim Nasby wrote:
This looks like a good way to address this until the more significant
work can be done.I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID?
or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
the code; I see the logic to NO_BM_VALID...+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set + * BM_VALID bit before returning buffer so that noone could pin it.It would be better to explain why we want that mode. How about:
RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
BM_VALID before returning the buffer. This is done to ensure that no one
can pin the buffer without actually reading the buffer contents in. This
is necessary while replying XLOG_BTREE_VACUUM records in hot standby.
That's a rather strange mode. The BM_VALID flag is internal to the
buffer manager - if you don't know how the buffer manager works, you
cannot understand that description. I couldn't quite understand what
that means myself. What can or can you not do with a buffer that's not
marked as BM_VALID? I'd suggest a mode like this instead:
In RBM_IF_IN_CACHE mode, the page is returned if it is in the buffer
cache already. If it's not, it is not read from disk, and InvalidBuffer
is returned instead.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 7/24/15 1:53 AM, Heikki Linnakangas wrote:
On 05/02/2015 02:10 AM, Jim Nasby wrote:
This looks like a good way to address this until the more significant
work can be done.I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID?
or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
the code; I see the logic to NO_BM_VALID...+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set + * BM_VALID bit before returning buffer so that noone could pin it.It would be better to explain why we want that mode. How about:
RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
BM_VALID before returning the buffer. This is done to ensure that no one
can pin the buffer without actually reading the buffer contents in. This
is necessary while replying XLOG_BTREE_VACUUM records in hot standby.That's a rather strange mode. The BM_VALID flag is internal to the
buffer manager - if you don't know how the buffer manager works, you
cannot understand that description. I couldn't quite understand what
that means myself. What can or can you not do with a buffer that's not
marked as BM_VALID? I'd suggest a mode like this instead:In RBM_IF_IN_CACHE mode, the page is returned if it is in the buffer
cache already. If it's not, it is not read from disk, and InvalidBuffer
is returned instead.
+1, though I think it's still worth mentioning why we're doing this (so
we know no one else can pin the buffer).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-07-24 09:53:49 +0300, Heikki Linnakangas wrote:
On 05/02/2015 02:10 AM, Jim Nasby wrote:
This looks like a good way to address this until the more significant
work can be done.I'm not a fan of "RBM_ZERO_NO_BM_VALID"; how about RBM_ZERO_BM_INVALID?
or BM_NOT_VALID? Or maybe I'm just trying to impose too much English on
the code; I see the logic to NO_BM_VALID...+ * RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK, but does not set + * BM_VALID bit before returning buffer so that noone could pin it.It would be better to explain why we want that mode. How about:
RBM_ZERO_NO_BM_VALID is the same as RBM_ZERO_AND_LOCK but does not set
BM_VALID before returning the buffer. This is done to ensure that no one
can pin the buffer without actually reading the buffer contents in. This
is necessary while replying XLOG_BTREE_VACUUM records in hot standby.That's a rather strange mode. The BM_VALID flag is internal to the buffer
manager - if you don't know how the buffer manager works, you cannot
understand that description. I couldn't quite understand what that means
myself. What can or can you not do with a buffer that's not marked as
BM_VALID? I'd suggest a mode like this instead:In RBM_IF_IN_CACHE mode, the page is returned if it is in the buffer cache
already. If it's not, it is not read from disk, and InvalidBuffer is
returned instead.
To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases to simply have something like
bool
BufferInCache(Relation, ForkNumber, BlockNumber)
{
/* XXX: setup tag, hash, partition */
LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
LWLockRelease(newPartitionLock);
return buf_id != -1;
}
and then fall back to the normal ReadBuffer() when it's in cache.
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jul 26, 2015 at 9:46 PM, Andres Freund wrote:
On 2015-07-24 09:53:49 +0300, Heikki Linnakangas wrote:
To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases to simply have something likebool
BufferInCache(Relation, ForkNumber, BlockNumber)
{
/* XXX: setup tag, hash, partition */LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
LWLockRelease(newPartitionLock);
return buf_id != -1;
}and then fall back to the normal ReadBuffer() when it's in cache.
Patch marked as returned with feedback as input from the author has
been waited for some time now.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
25 авг. 2015 г., в 16:03, Michael Paquier <michael.paquier@gmail.com> написал(а):
On Sun, Jul 26, 2015 at 9:46 PM, Andres Freund wrote:
On 2015-07-24 09:53:49 +0300, Heikki Linnakangas wrote:
To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases to simply have something likebool
BufferInCache(Relation, ForkNumber, BlockNumber)
{
/* XXX: setup tag, hash, partition */LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
LWLockRelease(newPartitionLock);
return buf_id != -1;
}and then fall back to the normal ReadBuffer() when it's in cache.
Yep, sounds good. Patch with implementation attached.
Patch marked as returned with feedback as input from the author has
been waited for some time now.
Sorry for delay, I will link it to the current commitfest.
--
Michael
--
May the force be with you…
https://simply.name
Attachments:
btree_vacuum_v3.patchapplication/octet-stream; name=btree_vacuum_v3.patchDownload+42-4
On Sun, Jul 26, 2015 at 1:46 PM, Andres Freund <andres@anarazel.de> wrote:
To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases to simply have something likebool
BufferInCache(Relation, ForkNumber, BlockNumber)
{
/* XXX: setup tag, hash, partition */LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
LWLockRelease(newPartitionLock);
return buf_id != -1;
}and then fall back to the normal ReadBuffer() when it's in cache.
I guess I'm missing something but how is this API useful? As soon as
its returned it the result might be invalid since before you actually
make use of the return value another process could come and read in
and pin the page in question. Is this part of some interlock where you
only have to know it was unpinned at some point in time since some
other event?
--
greg
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-09-02 22:46:59 +0100, Greg Stark wrote:
On Sun, Jul 26, 2015 at 1:46 PM, Andres Freund <andres@anarazel.de> wrote:
To me it sounds like this shouldn't go through the full ReadBuffer()
rigamarole. That code is already complex enough, and here it's really
not needed. I think it'll be much easier to review - and actually faster
in many cases to simply have something likebool
BufferInCache(Relation, ForkNumber, BlockNumber)
{
/* XXX: setup tag, hash, partition */LWLockAcquire(newPartitionLock, LW_SHARED);
buf_id = BufTableLookup(&newTag, newHash);
LWLockRelease(newPartitionLock);
return buf_id != -1;
}and then fall back to the normal ReadBuffer() when it's in cache.
I guess I'm missing something but how is this API useful? As soon as
its returned it the result might be invalid since before you actually
make use of the return value another process could come and read in
and pin the page in question. Is this part of some interlock where you
only have to know it was unpinned at some point in time since some
other event?
Yes. We're talking about this block:
for (blkno = xlrec->lastBlockVacuumed + 1; blkno < thisblkno; blkno++)
{
/*
* We use RBM_NORMAL_NO_LOG mode because it's not an error
* condition to see all-zero pages. The original btvacuumpage
* scan would have skipped over all-zero pages, noting them in FSM
* but not bothering to initialize them just yet; so we mustn't
* throw an error here. (We could skip acquiring the cleanup lock
* if PageIsNew, but it's probably not worth the cycles to test.)
*
* XXX we don't actually need to read the block, we just need to
* confirm it is unpinned. If we had a special call into the
* buffer manager we could optimise this so that if the block is
* not in shared_buffers we confirm it as unpinned.
*/
buffer = XLogReadBufferExtended(thisrnode, MAIN_FORKNUM, blkno,
RBM_NORMAL_NO_LOG);
if (BufferIsValid(buffer))
{
LockBufferForCleanup(buffer);
UnlockReleaseBuffer(buffer);
}
}
}
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 3, 2015 at 6:10 AM, Vladimir Borodin wrote:
Patch with implementation attached.
Sorry for delay, I will link it to the current commitfest.
This patch looks correct, and is doing what it claims it does. It is
also not really worth refactoring the bit of code in PrefetchBuffer
that does a similar operation, so I am marking it as ready for
committer.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Vladimir Borodin wrote:
There are situations in which vacuuming big btree index causes stuck
in WAL replaying on hot standby servers for quite a long time. I’ve
described the problem in more details in this thread [0]. Below in
that thread Kevin Grittner proposed a good way for improving btree
scans so that btree vacuuming logic could be seriously simplified.
Since I don’t know when that may happen I’ve done a patch that makes
some improvement right now. If Kevin or someone else would expand [1]
for handling all types of btree scans, I suppose, my patch could be
thrown away and vacuuming logic should be strongly rewritten.
You submitted this patch in May 2015 -- and 7 months later, Simon came
up with another patch that's supposed to fix the underlying problem, so
that this shouldn't be a problem anymore.
Would you please have a look at Simon's patch, in particular verify
whether it solves the performance dip in your testing environment?
/messages/by-id/CANP8+jJuyExr1HnTAdZraWsWkfc-octhug7YPtzPtJcYbyi4pA@mail.gmail.com
(Note there's an updated patch a few emails down the thread.)
If it seems to fix the problem for you, I think we should mark yours
rejected and just apply Simon's.
Thanks!
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jan 7, 2016 at 12:20 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Vladimir Borodin wrote:
There are situations in which vacuuming big btree index causes stuck
in WAL replaying on hot standby servers for quite a long time. I’ve
described the problem in more details in this thread [0]. Below in
that thread Kevin Grittner proposed a good way for improving btree
scans so that btree vacuuming logic could be seriously simplified.
Since I don’t know when that may happen I’ve done a patch that makes
some improvement right now. If Kevin or someone else would expand [1]
for handling all types of btree scans, I suppose, my patch could be
thrown away and vacuuming logic should be strongly rewritten.You submitted this patch in May 2015 -- and 7 months later, Simon came
up with another patch that's supposed to fix the underlying problem, so
that this shouldn't be a problem anymore.Would you please have a look at Simon's patch, in particular verify
whether it solves the performance dip in your testing environment?
/messages/by-id/CANP8+jJuyExr1HnTAdZraWsWkfc-octhug7YPtzPtJcYbyi4pA@mail.gmail.com
(Note there's an updated patch a few emails down the thread.)If it seems to fix the problem for you, I think we should mark yours
rejected and just apply Simon's.
Simon's patch (justly) does not update lastBlockVacuumed in the case
of toast indexes, but Vladimir's patch would still optimize this case,
no?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
7 янв. 2016 г., в 5:26, Michael Paquier <michael.paquier@gmail.com> написал(а):
On Thu, Jan 7, 2016 at 12:20 AM, Alvaro Herrera
<alvherre@2ndquadrant.com <mailto:alvherre@2ndquadrant.com>> wrote:Vladimir Borodin wrote:
There are situations in which vacuuming big btree index causes stuck
in WAL replaying on hot standby servers for quite a long time. I’ve
described the problem in more details in this thread [0]. Below in
that thread Kevin Grittner proposed a good way for improving btree
scans so that btree vacuuming logic could be seriously simplified.
Since I don’t know when that may happen I’ve done a patch that makes
some improvement right now. If Kevin or someone else would expand [1]
for handling all types of btree scans, I suppose, my patch could be
thrown away and vacuuming logic should be strongly rewritten.You submitted this patch in May 2015 -- and 7 months later, Simon came
up with another patch that's supposed to fix the underlying problem, so
that this shouldn't be a problem anymore.Would you please have a look at Simon's patch, in particular verify
whether it solves the performance dip in your testing environment?
/messages/by-id/CANP8+jJuyExr1HnTAdZraWsWkfc-octhug7YPtzPtJcYbyi4pA@mail.gmail.com
(Note there's an updated patch a few emails down the thread.)If it seems to fix the problem for you, I think we should mark yours
rejected and just apply Simon’s.
Ok, I’ll try this patch with my use case. Basically, it’s not so easy now since I’ve partitioned that big table to not have such problems but there is a way to reproduce it once again. If it helps, I agree that my should be rejected in favor of the Simon’s patch because my patch just reduces replication lag but Simon’s seems to remove lag at all.
Simon's patch (justly) does not update lastBlockVacuumed in the case
of toast indexes, but Vladimir's patch would still optimize this case,
no?
I suppose, in case of _not_ toast indexes. But yes, seems that my patch should help in that case too.
--
Michael
--
May the force be with you…
https://simply.name
Vladimir Borodin wrote:
7 янв. 2016 г., в 5:26, Michael Paquier <michael.paquier@gmail.com> написал(а):
On Thu, Jan 7, 2016 at 12:20 AM, Alvaro Herrera
<alvherre@2ndquadrant.com <mailto:alvherre@2ndquadrant.com>> wrote:
Would you please have a look at Simon's patch, in particular verify
whether it solves the performance dip in your testing environment?
/messages/by-id/CANP8+jJuyExr1HnTAdZraWsWkfc-octhug7YPtzPtJcYbyi4pA@mail.gmail.com
(Note there's an updated patch a few emails down the thread.)If it seems to fix the problem for you, I think we should mark yours
rejected and just apply Simon’s.Ok, I’ll try this patch with my use case. Basically, it’s not so easy
now since I’ve partitioned that big table to not have such problems
but there is a way to reproduce it once again. If it helps, I agree
that my should be rejected in favor of the Simon’s patch because my
patch just reduces replication lag but Simon’s seems to remove lag at
all.
I would agree except for the observation on toast indexes. I think
that's an important enough use case that perhaps we should have both.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 8 January 2016 at 13:36, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Vladimir Borodin wrote:
7 янв. 2016 г., в 5:26, Michael Paquier <michael.paquier@gmail.com>
написал(а):
On Thu, Jan 7, 2016 at 12:20 AM, Alvaro Herrera
<alvherre@2ndquadrant.com <mailto:alvherre@2ndquadrant.com>> wrote:Would you please have a look at Simon's patch, in particular verify
whether it solves the performance dip in your testing environment?/messages/by-id/CANP8+jJuyExr1HnTAdZraWsWkfc-octhug7YPtzPtJcYbyi4pA@mail.gmail.com
(Note there's an updated patch a few emails down the thread.)
If it seems to fix the problem for you, I think we should mark yours
rejected and just apply Simon’s.Ok, I’ll try this patch with my use case. Basically, it’s not so easy
now since I’ve partitioned that big table to not have such problems
but there is a way to reproduce it once again. If it helps, I agree
that my should be rejected in favor of the Simon’s patch because my
patch just reduces replication lag but Simon’s seems to remove lag at
all.I would agree except for the observation on toast indexes. I think
that's an important enough use case that perhaps we should have both.
The exclusion of toast indexes is something we can remove also, I have
recently discovered. When we access toast data we ignore MVCC, but we still
have the toast pointer and chunkid to use for rechecking our scan results.
So a later patch will add some rechecks.
So I don't think it is worth applying this patch now. I should add that I
also had a patch that did this, posted earlier IIRC. That is not the reason
to reject this, just me pointing out that I'm effectively rejecting my own
earlier patch also.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs wrote:
On 8 January 2016 at 13:36, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I would agree except for the observation on toast indexes. I think
that's an important enough use case that perhaps we should have both.The exclusion of toast indexes is something we can remove also, I have
recently discovered. When we access toast data we ignore MVCC, but we still
have the toast pointer and chunkid to use for rechecking our scan results.
So a later patch will add some rechecks.
Ah, interesting, glad to hear. I take it you're pushing your patch
soon, then?
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 1/8/16 9:34 AM, Alvaro Herrera wrote:
Simon Riggs wrote:
On 8 January 2016 at 13:36, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I would agree except for the observation on toast indexes. I think
that's an important enough use case that perhaps we should have both.The exclusion of toast indexes is something we can remove also, I have
recently discovered. When we access toast data we ignore MVCC, but we still
have the toast pointer and chunkid to use for rechecking our scan results.
So a later patch will add some rechecks.Ah, interesting, glad to hear. I take it you're pushing your patch
soon, then?
ISTM that this patch should be "returned with feedback" or "rejected"
based on the thread. I'm marking it "waiting for author" for the time
being.
Thanks,
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Mar 10, 2016 at 1:29 AM, David Steele <david@pgmasters.net> wrote:
On 1/8/16 9:34 AM, Alvaro Herrera wrote:
Simon Riggs wrote:
On 8 January 2016 at 13:36, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:I would agree except for the observation on toast indexes. I think
that's an important enough use case that perhaps we should have both.The exclusion of toast indexes is something we can remove also, I have
recently discovered. When we access toast data we ignore MVCC, but we
still
have the toast pointer and chunkid to use for rechecking our scan
results.
So a later patch will add some rechecks.Ah, interesting, glad to hear. I take it you're pushing your patch
soon, then?ISTM that this patch should be "returned with feedback" or "rejected" based
on the thread. I'm marking it "waiting for author" for the time being.
I think that we are still waiting for some input from Simon here...
Simon, are you going to finish wrapping up your other patch?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10 March 2016 at 06:27, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Thu, Mar 10, 2016 at 1:29 AM, David Steele <david@pgmasters.net> wrote:
On 1/8/16 9:34 AM, Alvaro Herrera wrote:
Simon Riggs wrote:
On 8 January 2016 at 13:36, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:I would agree except for the observation on toast indexes. I think
that's an important enough use case that perhaps we should have both.The exclusion of toast indexes is something we can remove also, I have
recently discovered. When we access toast data we ignore MVCC, but we
still
have the toast pointer and chunkid to use for rechecking our scan
results.
So a later patch will add some rechecks.Ah, interesting, glad to hear. I take it you're pushing your patch
soon, then?ISTM that this patch should be "returned with feedback" or "rejected"
based
on the thread. I'm marking it "waiting for author" for the time being.
I think that we are still waiting for some input from Simon here...
Simon, are you going to finish wrapping up your other patch?
Yes, I have done the research, so think patch should be rejected now.
Thanks to everyone for their input. It's good to have alternate approaches.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services