FSM versus GIN pending list bloat
For a GIN index with fastupdate turned on, both the user backends and
autoanalyze routine will clear out the pending list, pushing the entries
into the normal index structure and deleting the pages used by the pending
list. But those deleted pages will not get added to the freespace map
until a vacuum is done. This leads to horrible bloat on insert only
tables, as it is never vacuumed and so the pending list space is never
reused. And the pending list is very inefficient in space usage to start
with, even compared to the old style posting lists and especially compared
to the new compressed ones. (If they were aggressively recycled, this
inefficient use wouldn't be much of a problem.)
Even on a table receiving mostly updates after its initial population (and
so being vacuumed regularly) with default autovac setting, there is a lot
of bloat.
The attached proof of concept patch greatly improves the bloat for both the
insert and the update cases. You need to turn on both features: adding the
pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3). The
first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with freespace already recorded in it, which the
vast majority of the time would mean observing up one level and then not
writing to it, assuming the pending list pages remain well clustered.
Or would a completely different approach be better, like managing the
vacated pending list pages directly in the index without going to the fsm?
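To make the first of those ideas concrete, here is a rough sketch of what a
bubbling variant could look like if it lived inside freespace.c, where the
file-local helpers are visible. The helper names and signatures used below
(fsm_space_avail_to_cat, fsm_get_location, fsm_get_parent, fsm_set_and_search,
FSM_ROOT_LEVEL) are assumptions about the current internals, so read this as
an illustration of the idea rather than a working patch:

/*
 * Sketch: record free space for one block and immediately push the new
 * value up through the FSM's upper levels, instead of waiting for a
 * FreeSpaceMapVacuum() pass to do it.
 */
static void
fsm_record_and_bubble(Relation rel, BlockNumber freeBlock, Size spaceAvail)
{
	uint8		cat = fsm_space_avail_to_cat(spaceAvail);
	uint16		slot;
	FSMAddress	addr = fsm_get_location(freeBlock, &slot);

	/* Update the leaf slot, as RecordPageWithFreeSpace() does today. */
	fsm_set_and_search(rel, addr, slot, cat, 0);

	/*
	 * Walk toward the root, updating each parent slot.  This could stop
	 * early once a parent already records at least this much free space,
	 * which with well-clustered pending-list pages would usually mean
	 * looking one level up and then writing nothing.
	 */
	while (addr.level != FSM_ROOT_LEVEL)
	{
		addr = fsm_get_parent(addr, &slot);
		fsm_set_and_search(rel, addr, slot, cat, 0);
	}
}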
Cheers,
Jeff
Attachments:
gin_fast_freespace.patch (application/octet-stream)
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
new file mode 100644
index c5732c3..1270471
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 27,32 ****
--- 27,33 ----
/* GUC parameter */
int gin_pending_list_limit = 0;
+ int JJ_GIN=0;
#define GIN_PAGE_FREESIZE \
( BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - MAXALIGN(sizeof(GinPageOpaqueData)) )
*************** shiftList(Relation index, Buffer metabuf
*** 521,530 ****
--- 522,533 ----
int64 nDeletedHeapTuples = 0;
ginxlogDeleteListPages data;
Buffer buffers[GIN_NDELETE_AT_ONCE];
+ BlockNumber freespace[GIN_NDELETE_AT_ONCE];
data.ndeleted = 0;
while (data.ndeleted < GIN_NDELETE_AT_ONCE && blknoToDelete != newHead)
{
+ freespace[data.ndeleted] = blknoToDelete;
buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete);
LockBuffer(buffers[data.ndeleted], GIN_EXCLUSIVE);
page = BufferGetPage(buffers[data.ndeleted]);
*************** shiftList(Relation index, Buffer metabuf
*** 609,614 ****
--- 612,624 ----
UnlockReleaseBuffer(buffers[i]);
END_CRIT_SECTION();
+
+ if(JJ_GIN&1)
+ for (i = 0; i < data.ndeleted; i++)
+ RecordFreeIndexPage(index, freespace[i]);
+ if(JJ_GIN&2)
+ FreeSpaceMapVacuum(index);
+
} while (blknoToDelete != newHead);
return false;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 1b7b914..8cdad78
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 107,112 ****
--- 107,113 ----
extern bool Log_disconnections;
extern int CommitDelay;
extern int CommitSiblings;
+ extern int JJ_GIN;
extern char *default_tablespace;
extern char *temp_tablespaces;
extern bool ignore_checksum_failure;
*************** static struct config_int ConfigureNamesI
*** 1659,1664 ****
--- 1660,1674 ----
NULL, NULL, NULL
},
{
+ {"JJ_GIN", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("recycle GIN pages more aggressively."),
+ gettext_noop(" ")
+ },
+ &JJ_GIN,
+ 0, 0, 10000,
+ NULL, NULL, NULL
+ },
+ {
{"from_collapse_limit", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Sets the FROM-list size beyond which subqueries "
"are not collapsed."),
On 08/04/2015 08:03 AM, Jeff Janes wrote:
For a GIN index with fastupdate turned on, both the user backends and
autoanalyze routine will clear out the pending list, pushing the entries
into the normal index structure and deleting the pages used by the pending
list. But those deleted pages will not get added to the freespace map
until a vacuum is done. This leads to horrible bloat on insert only
tables, as it is never vacuumed and so the pending list space is never
reused. And the pending list is very inefficient in space usage to start
with, even compared to the old style posting lists and especially compared
to the new compressed ones. (If they were aggressively recycled, this
inefficient use wouldn't be much of a problem.)
Good point.
The attached proof of concept patch greatly improves the bloat for both the
insert and the update cases. You need to turn on both features: adding the
pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3). The
first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with freespace already recorded in it, which the
vast majority of the time would mean observing up one level and then not
writing to it, assuming the pending list pages remain well clustered.
Yep, that sounds good.
- Heikki
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
The attached proof of concept patch greatly improves the bloat for both
the insert and the update cases. You need to turn on both features: adding
the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).
The first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with freespace already recorded in it, which the
vast majority of the time would mean observing up one level and then not
writing to it, assuming the pending list pages remain well clustered.
You make a good case for action here since insert only tables with GIN
indexes on text are a common use case for GIN.
Why would vacuuming the FSM be unacceptable? With a
large gin_pending_list_limit it makes sense.
If it is unacceptable, perhaps we can avoid calling it every time, or
simply have FreeSpaceMapVacuum() terminate more quickly on some kind of
80/20 heuristic for this case.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4 August 2015 at 09:39, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
The attached proof of concept patch greatly improves the bloat for both
the insert and the update cases. You need to turn on both features: adding
the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).
The first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with freespace already recorded in it, which the
vast majority of the time would mean observing up one level and then not
writing to it, assuming the pending list pages remain well clustered.
You make a good case for action here since insert only tables with GIN
indexes on text are a common use case for GIN.
Why would vacuuming the FSM be unacceptable? With a
large gin_pending_list_limit it makes sense.
If it is unacceptable, perhaps we can avoid calling it every time, or
simply have FreeSpaceMapVacuum() terminate more quickly on some kind of
80/20 heuristic for this case.
Couple of questions here...
* the docs say "it's desirable to have pending-list cleanup occur in the
background", but there is no way to invoke that, except via VACUUM. I think
we need a separate function to be able to call this as a background action.
If we had that, we wouldn't need much else, would we?
* why do we have two parameters: gin_pending_list_limit and fastupdate?
What happens if we set gin_pending_list_limit but don't set fastupdate?
* how do we know how to set that parameter? Is there a way of knowing
gin_pending_list_limit has been reached?
This and the OP seem like 9.5 open items to me.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 08/04/2015 04:35 PM, Simon Riggs wrote:
This and the OP seem like 9.5 open items to me.
Why? This is nothing new in 9.5.
- Heikki
On 4 August 2015 at 14:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 08/04/2015 04:35 PM, Simon Riggs wrote:
This and the OP seem like 9.5 open items to me.
Why? This is nothing new in 9.5.
gin_pending_list_limit is new in 9.5
We're in Alpha, so if something has been added and isn't very usable, we
should change it while we can.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-08-04 14:59:11 +0100, Simon Riggs wrote:
On 4 August 2015 at 14:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 08/04/2015 04:35 PM, Simon Riggs wrote:
This and the OP seem like 9.5 open items to me.
Why? This is nothing new in 9.5.
gin_pending_list_limit is new in 9.5
We're in Alpha, so if something has been added and isn't very usable, we
should change it while we can.
The only thing that variable does is change what the pending size limit
is determined by. Previously it was work_mem, now it's
gin_pending_list_limit. Imo that has pretty much nothing to do with not
registering pages as free.
Andres
On 4 August 2015 at 15:18, Andres Freund <andres@anarazel.de> wrote:
On 2015-08-04 14:59:11 +0100, Simon Riggs wrote:
On 4 August 2015 at 14:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 08/04/2015 04:35 PM, Simon Riggs wrote:
This and the OP seem like 9.5 open items to me.
Why? This is nothing new in 9.5.
gin_pending_list_limit is new in 9.5
We're in Alpha, so if something has been added and isn't very usable, we
should change it while we can.
The only thing that variable does is change what the pending size limit
is determined by. Previously it was work_mem, now it's
gin_pending_list_limit. Imo that has pretty much nothing to do with not
registering pages as free.
We've made a change to the way GIN fast update works and that needs to be
an effective change, not one that immediately triggers a secondary
annoyance for the user. I've asked some questions; they may turn into
actions, or not, but they are open items.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
On 4 August 2015 at 15:18, Andres Freund <andres@anarazel.de> wrote:
The only thing that variable does is change what the pending size limit
is determined by. Previously it was work_mem, now it's
gin_pending_list_limit. Imo that has pretty much nothing to do with not
registering pages as free.
We've made a change to the way GIN fast update works and that needs to be
an effective change, not one that immediately triggers a secondary
annoyance for the user.
Said annoyance has been there since day one, no?
I've asked some questions; they may turn into
actions, or not, but they are open items.
If it's not a new problem introduced by 9.5, I don't think it's
appropriate to insist that 9.5 must solve it.
regards, tom lane
Jeff Janes wrote:
The attached proof of concept patch greatly improves the bloat for both the
insert and the update cases. You need to turn on both features: adding the
pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3). The
first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
I think that's the right way, only that I wouldn't call FSMVacuum()
inside the loop but only once you're out of it.
FWIW I had to add a FSMVacuum call somewhere in the BRIN code also,
which I didn't like at all. If there's a way to optimize this "bubbling
up" of some individual page all the way to the root level, that would
probably be a lot better than what it's currently doing.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Aug 4, 2015 at 1:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
The attached proof of concept patch greatly improves the bloat for both
the insert and the update cases. You need to turn on both features: adding
the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).
The first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with freespace already recorded in it, which the
vast majority of the time would mean observing up one level and then not
writing to it, assuming the pending list pages remain well clustered.
You make a good case for action here since insert only tables with GIN
indexes on text are a common use case for GIN.
Why would vacuuming the FSM be unacceptable? With a
large gin_pending_list_limit it makes sense.
But with a smallish gin_pending_list_limit (like the default 4MB) this
could be called a lot (multiple times a second during some spurts), and
would read the entire fsm each time.
If it is unacceptable, perhaps we can avoid calling it every time, or
simply have FreeSpaceMapVacuum() terminate more quickly on some kind of
80/20 heuristic for this case.
Or maybe it could be passed a range of blocks which need vacuuming, so it
concentrated on that range.
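For what it's worth, a call site for that range idea might look something like
the sketch below. FreeSpaceMapVacuumRange() is purely hypothetical here (no
such function exists today); the point is just that when the pending list
pages stay well clustered, the range handed to it would be tiny:

#include "postgres.h"

#include "storage/block.h"
#include "storage/indexfsm.h"
#include "utils/rel.h"

/*
 * Record a batch of freed pending-list pages in the FSM, then ask for only
 * the block range covering them to be re-summarized up the FSM tree.
 * FreeSpaceMapVacuumRange() is hypothetical.
 */
static void
record_freed_pending_pages(Relation index, BlockNumber *freed, int nfreed)
{
	BlockNumber minblk = InvalidBlockNumber;
	BlockNumber maxblk = 0;
	int			i;

	for (i = 0; i < nfreed; i++)
	{
		RecordFreeIndexPage(index, freed[i]);

		if (minblk == InvalidBlockNumber || freed[i] < minblk)
			minblk = freed[i];
		if (freed[i] > maxblk)
			maxblk = freed[i];
	}

	if (nfreed > 0)
		FreeSpaceMapVacuumRange(index, minblk, maxblk + 1); /* hypothetical */
}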
But from the README file, it sounds like it is already supposed to be
bubbling up. I'll have to see just what's going on there when I get a
chance.
Cheers,
Jeff
On Tue, Aug 4, 2015 at 6:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 09:39, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
The attached proof of concept patch greatly improves the bloat for both
the insert and the update cases. You need to turn on both features: adding
the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).
The first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with freespace already recorded in it, which the
vast majority of the time would mean observing up one level and then not
writing to it, assuming the pending list pages remain well clustered.
You make a good case for action here since insert only tables with GIN
indexes on text are a common use case for GIN.
Why would vacuuming the FSM be unacceptable? With a
large gin_pending_list_limit it makes sense.
If it is unacceptable, perhaps we can avoid calling it every time, or
simply have FreeSpaceMapVacuum() terminate more quickly on some kind of
80/20 heuristic for this case.
Couple of questions here...
* the docs say "it's desirable to have pending-list cleanup occur in the
background", but there is no way to invoke that, except via VACUUM. I
think we need a separate function to be able to call this as a background
action. If we had that, we wouldn't need much else, would we?
I thought maybe the new bgworker framework would be a way to have a backend
signal a bgworker to do the cleanup when it notices the pending list is
getting large. But that wouldn't directly fix this issue, because the
bgworker still wouldn't recycle that space (without further changes), only
vacuum workers do that currently.
But I don't think this could be implemented as an extension, because the
signalling code has to be in core, so (not having studied the matter at
all) I don't know if it is good fit for bgworker.
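The "notices the pending list is getting large" part at least looks cheap,
since the count is already kept on the metapage. A minimal sketch of reading
it, using the field and macro names from access/gin_private.h (how and whether
to expose this at SQL level is a separate question):

#include "postgres.h"

#include "access/gin_private.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Return the number of pending-list pages of a GIN index, as recorded in
 * its metapage.
 */
static BlockNumber
gin_pending_pages(Relation index)
{
	Buffer		metabuffer;
	GinMetaPageData *metadata;
	BlockNumber result;

	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
	LockBuffer(metabuffer, GIN_SHARE);
	metadata = GinPageGetMeta(BufferGetPage(metabuffer));

	result = metadata->nPendingPages;

	UnlockReleaseBuffer(metabuffer);
	return result;
}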
* why do we have two parameters: gin_pending_list_limit and fastupdate?
What happens if we set gin_pending_list_limit but don't set fastupdate?
Fastupdate is on by default. If it were turned off, then
gin_pending_list_limit would be mostly irrelevant for those tables.
Fastupdate could have been implemented as a magic value (0 or -1) for
gin_pending_list_limit but that would break backwards compatibility (and
arguably would not be a better way of doing things, anyway).
* how do we know how to set that parameter? Is there a way of knowing
gin_pending_list_limit has been reached?
I don't think there is an easy answer to that. The trade offs are
complex and depend on things like how well cached the parts of the index
needing insertions are, how many lexemes/array elements are in an average
document, and how many documents inserted near the same time as each other
share lexemes in common. And of course what you need to optimize for,
latency or throughput, and if latency, whether search latency or insert latency.
This and the OP seem like 9.5 open items to me.
I don't think so. Freeing gin_pending_list_limit from being forcibly tied
to work_mem is a good thing. Even if I don't know exactly how to set
gin_pending_list_limit, I know I don't want it to be 4GB just because work_mem
was set there for some temporary reason. I'm happy to leave it at its
default and let its fine tuning be a topic for people who really care about
every microsecond of performance.
Cheers,
Jeff
On 4 August 2015 at 21:04, Jeff Janes <jeff.janes@gmail.com> wrote:
Couple of questions here...
* the docs say "it's desirable to have pending-list cleanup occur in the
background", but there is no way to invoke that, except via VACUUM. I
think we need a separate function to be able to call this as a background
action. If we had that, we wouldn't need much else, would we?
I thought maybe the new bgworker framework would be a way to have a
backend signal a bgworker to do the cleanup when it notices the pending
list is getting large. But that wouldn't directly fix this issue, because
the bgworker still wouldn't recycle that space (without further changes),
only vacuum workers do that currently.
But I don't think this could be implemented as an extension, because the
signalling code has to be in core, so (not having studied the matter at
all) I don't know if it is good fit for bgworker.
We need to expose 2 functions:
1. a function to perform the recycling directly (BRIN has an equivalent
function)
2. a function to see how big the pending list is for a particular index,
i.e. do we need to run function 1?
We can then build a bgworker that polls the pending list and issues a
recycle if and when needed - which is how autovac started.
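A polling worker along those lines could sit on the standard bgworker API; a
bare-bones sketch follows. What it does per wakeup (which indexes to look at,
comparing something like the gin_pending_pages() sketch above against
gin_pending_list_limit, and invoking the cleanup itself) is hand-waved in a
comment, because that is exactly the piece that needs core support; the
database name is a placeholder and the bgworker member names and signatures
vary a little between releases:

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/latch.h"

PG_MODULE_MAGIC;

void		_PG_init(void);
void		gin_cleanup_main(Datum main_arg);

void
gin_cleanup_main(Datum main_arg)
{
	BackgroundWorkerUnblockSignals();

	/* the database name is only a placeholder here */
	BackgroundWorkerInitializeConnection("postgres", NULL);

	for (;;)
	{
		int			rc;

		/* Wake up every ten seconds, or when the latch is set. */
		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   10 * 1000L);
		ResetLatch(MyLatch);
		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);

		/*
		 * Here one would walk the GIN indexes of interest, compare
		 * something like gin_pending_pages() above against
		 * gin_pending_list_limit, and trigger pending-list cleanup for
		 * those that have grown too large.  That trigger is the missing
		 * piece that needs to be exposed from core.
		 */
	}
}

void
_PG_init(void)
{
	BackgroundWorker worker;

	MemSet(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = 60;
	worker.bgw_main = gin_cleanup_main;
	snprintf(worker.bgw_name, BGW_MAXLEN, "gin pending-list cleaner");
	RegisterBackgroundWorker(&worker);
}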
* why do we have two parameters: gin_pending_list_limit and fastupdate?
What happens if we set gin_pending_list_limit but don't set fastupdate?
Fastupdate is on by default. If it were turned off, then
gin_pending_list_limit would be mostly irrelevant for those tables.
Fastupdate could have been implemented as a magic value (0 or -1) for
gin_pending_list_limit but that would break backwards compatibility (and
arguably would not be a better way of doing things, anyway).
* how do we know how to set that parameter? Is there a way of knowing
gin_pending_list_limit has been reached?
I don't think there is an easy answer to that. The trade offs are
complex and depend on things like how well cached the parts of the index
needing insertions are, how many lexemes/array elements are in an average
document, and how many documents inserted near the same time as each other
share lexemes in common. And of course what you need to optimize for,
latency or throughput, and if latency, whether search latency or insert latency.
So we also need a way to count the number of times the pending list is
flushed. Perhaps record that on the metapage, so we can see how often it
has happened - and another function to view the stats on that.
This and the OP seem like 9.5 open items to me.
I don't think so. Freeing gin_pending_list_limit from being forcibly tied
to work_mem is a good thing. Even if I don't know exactly how to set
gin_pending_list_limit, I know I don't want it to be 4GB just because work_mem
was set there for some temporary reason. I'm happy to leave it at its
default and let its fine tuning be a topic for people who really care about
every microsecond of performance.
OK, I accept this.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Aug 4, 2015 at 12:38 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Aug 4, 2015 at 1:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 06:03, Jeff Janes <jeff.janes@gmail.com> wrote:
The attached proof of concept patch greatly improves the bloat for both
the insert and the update cases. You need to turn on both features: adding
the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3).
The first of those two things could probably be adopted for real, but the
second probably is not acceptable. What is the right way to do this?
Could a variant of RecordFreeIndexPage bubble the free space up the map
immediately rather than waiting for a vacuum? It would only have to move
up until it found a page with freespace already recorded in it, which the
vast majority of the time would mean observing up one level and then not
writing to it, assuming the pending list pages remain well clustered.
You make a good case for action here since insert only tables with GIN
indexes on text are a common use case for GIN.
Why would vacuuming the FSM be unacceptable? With a
large gin_pending_list_limit it makes sense.
But with a smallish gin_pending_list_limit (like the default 4MB) this
could be called a lot (multiple times a second during some spurts), and
would read the entire fsm each time.
If it is unacceptable, perhaps we can avoid calling it every time, or
simply have FreeSpaceMapVacuum() terminate more quickly on some kind of
80/20 heuristic for this case.
Or maybe it could be passed a range of blocks which need vacuuming, so it
concentrated on that range.
But from the README file, it sounds like it is already supposed to be
bubbling up. I'll have to see just what's going on there when I get a
chance.
Before making changes to the FSM code to make immediate summarization
possible, I decided to quantify the effect of vacuuming the entire fsm.
Going up to 5 GB of index size, the time taken to vacuum the entire FSM one
time for each GIN_NDELETE_AT_ONCE was undetectable.
Based on that, I made this patch which vacuums it one time per completed
ginInsertCleanup, which should be far less than once per
GIN_NDELETE_AT_ONCE.
I would be interested in hearing what people with very large GIN indexes
think of it. It does seem like at some point the time needed must become
large, but from what I can tell that point is way beyond what someone is
likely to have for an index on an unpartitioned table.
I have a simple test case that inserts an array of 101 md5 digests into
each row. With 10_000 of these rows inserted into an already indexed
table, I get 40MB for the table and 80MB for the index unpatched. With the
patch, I get 7.3 MB for the index.
Cheers,
Jeff
Attachments:
gin_fast_freespace_v001.patch (application/octet-stream)
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
new file mode 100644
index c5732c3..a13f243
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***************
*** 24,29 ****
--- 24,30 ----
#include "miscadmin.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+ #include "storage/indexfsm.h"
/* GUC parameter */
int gin_pending_list_limit = 0;
*************** shiftList(Relation index, Buffer metabuf
*** 521,530 ****
--- 522,533 ----
int64 nDeletedHeapTuples = 0;
ginxlogDeleteListPages data;
Buffer buffers[GIN_NDELETE_AT_ONCE];
+ BlockNumber freespace[GIN_NDELETE_AT_ONCE];
data.ndeleted = 0;
while (data.ndeleted < GIN_NDELETE_AT_ONCE && blknoToDelete != newHead)
{
+ freespace[data.ndeleted] = blknoToDelete;
buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete);
LockBuffer(buffers[data.ndeleted], GIN_EXCLUSIVE);
page = BufferGetPage(buffers[data.ndeleted]);
*************** shiftList(Relation index, Buffer metabuf
*** 609,614 ****
--- 612,621 ----
UnlockReleaseBuffer(buffers[i]);
END_CRIT_SECTION();
+
+ for (i = 0; i < data.ndeleted; i++)
+ RecordFreeIndexPage(index, freespace[i]);
+
} while (blknoToDelete != newHead);
return false;
*************** ginInsertCleanup(GinState *ginstate,
*** 744,749 ****
--- 751,757 ----
BuildAccumulator accum;
KeyArray datums;
BlockNumber blkno;
+ bool fsm_vac = false;
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
LockBuffer(metabuffer, GIN_SHARE);
*************** ginInsertCleanup(GinState *ginstate,
*** 793,798 ****
--- 801,807 ----
{
/* another cleanup process is running concurrently */
UnlockReleaseBuffer(buffer);
+ fsm_vac = false;
break;
}
*************** ginInsertCleanup(GinState *ginstate,
*** 857,862 ****
--- 866,872 ----
/* another cleanup process is running concurrently */
UnlockReleaseBuffer(buffer);
LockBuffer(metabuffer, GIN_UNLOCK);
+ fsm_vac = false;
break;
}
*************** ginInsertCleanup(GinState *ginstate,
*** 895,903 ****
--- 905,917 ----
{
/* another cleanup process is running concurrently */
LockBuffer(metabuffer, GIN_UNLOCK);
+ fsm_vac = false;
break;
}
+ /* At this point, some pending pages have been freed up */
+ fsm_vac=true;
+
Assert(blkno == metadata->head);
LockBuffer(metabuffer, GIN_UNLOCK);
*************** ginInsertCleanup(GinState *ginstate,
*** 931,936 ****
--- 945,959 ----
ReleaseBuffer(metabuffer);
+ /*
+ * As pending list pages can have a high churn rate, it is
+ * desirable to recycle them immediately to the FreeSpace Map when
+ * ordinary backends clean the list.
+ */
+ if (fsm_vac && !vac_delay)
+ IndexFreeSpaceMapVacuum(index);
+
+
/* Clean up temporary space */
MemoryContextSwitchTo(oldCtx);
MemoryContextDelete(opCtx);
On Wed, Aug 5, 2015 at 5:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 4 August 2015 at 21:04, Jeff Janes <jeff.janes@gmail.com> wrote:
Couple of questions here...
* the docs say "it's desirable to have pending-list cleanup occur in the
background", but there is no way to invoke that, except via VACUUM. I think
we need a separate function to be able to call this as a background action.
If we had that, we wouldn't need much else, would we?
I thought maybe the new bgworker framework would be a way to have a
backend signal a bgworker to do the cleanup when it notices the pending list
is getting large. But that wouldn't directly fix this issue, because the
bgworker still wouldn't recycle that space (without further changes), only
vacuum workers do that currently.
But I don't think this could be implemented as an extension, because the
signalling code has to be in core, so (not having studied the matter at all)
I don't know if it is good fit for bgworker.
We need to expose 2 functions:
1. a function to perform the recycling directly (BRIN has an equivalent
function)
2. a function to see how big the pending list is for a particular index,
i.e. do we need to run function 1?
Probably you can use pgstatginindex(), which the pgstattuple contrib module
provides, for this case.
Regards,
--
Fujii Masao
On Mon, Aug 10, 2015 at 1:16 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
I have a simple test case that inserts an array of 101 md5 digests into each
row. With 10_000 of these rows inserted into an already indexed table, I
get 40MB for the table and 80MB for the index unpatched. With the patch, I
get 7.3 MB for the index.
Uh, wow. Seems like we should do something about this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 3, 2015 at 3:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 10, 2015 at 1:16 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
I have a simple test case that inserts an array of 101 md5 digests into each
row. With 10_000 of these rows inserted into an already indexed table, I
get 40MB for the table and 80MB for the index unpatched. With the patch, I
get 7.3 MB for the index.
Uh, wow. Seems like we should do something about this.
I looked into this patch.
The patch applies cleanly and compiles without error.
I think it works as we expected, but the patch has some unnecessary whitespace.
Also, the current documentation says that the pending list is cleaned up only by
vacuum; do we need to update the documentation for this change?
Regards,
--
Masahiko Sawada