use AV worker items infrastructure for GIN pending list's cleanup

Started by Jaime Casanovaalmost 5 years ago8 messages
#1Jaime Casanova
jcasanov@systemguards.com.ec
1 attachment(s)

Hi,

When AV worker items where introduced 4 years ago, i was suggested that
it could be used for other things like cleaning the pending list of GIN
index when it reaches gin_pending_list_limit instead of making user
visible operation pay the price.

That never happened though. So, here is a little patch for that.

Should I add an entry for this on next commitfest?

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachments:

av_workitem_gin_cleanup_pending_list.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index e0d9940946..305326119c 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -459,7 +459,17 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 	 * pending list not forcibly.
 	 */
 	if (needCleanup)
-		ginInsertCleanup(ginstate, false, true, false, NULL);
+	{
+		bool		recorded;
+
+		recorded = AutoVacuumRequestWork(AVW_GINCleanPendingList, 
+					RelationGetRelid(ginstate->index), InvalidBlockNumber);
+		if (!recorded)
+			ereport(LOG,
+					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+					 errmsg("request for GIN clean pending list for index \"%s\" was not recorded",
+							RelationGetRelationName(ginstate->index))));
+	}
 }
 
 /*
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 23ef23c13e..ee981640e3 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2680,6 +2680,10 @@ perform_work_item(AutoVacuumWorkItem *workitem)
 									ObjectIdGetDatum(workitem->avw_relation),
 									Int64GetDatum((int64) workitem->avw_blockNumber));
 				break;
+			case AVW_GINCleanPendingList:
+				DirectFunctionCall1(gin_clean_pending_list,
+									ObjectIdGetDatum(workitem->avw_relation));
+				break;
 			default:
 				elog(WARNING, "unrecognized work item found: type %d",
 					 workitem->avw_type);
@@ -3291,6 +3295,10 @@ autovac_report_workitem(AutoVacuumWorkItem *workitem,
 			snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 					 "autovacuum: BRIN summarize");
 			break;
+		case AVW_GINCleanPendingList:
+			snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
+					 "autovacuum: GIN pending list cleanup");
+			break;
 	}
 
 	/*
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index aacdd0f575..2cbdde41c4 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -22,7 +22,8 @@
  */
 typedef enum
 {
-	AVW_BRINSummarizeRange
+	AVW_BRINSummarizeRange,
+	AVW_GINCleanPendingList
 } AutoVacuumWorkItemType;
 
 
#2Euler Taveira
euler@eulerto.com
In reply to: Jaime Casanova (#1)
Re: use AV worker items infrastructure for GIN pending list's cleanup

On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:

When AV worker items where introduced 4 years ago, i was suggested that
it could be used for other things like cleaning the pending list of GIN
index when it reaches gin_pending_list_limit instead of making user
visible operation pay the price.

That never happened though. So, here is a little patch for that.

Should I add an entry for this on next commitfest?

+1. It slipped through the cracks along the years. It is even suggested in the
current docs since the fast update support.

https://www.postgresql.org/docs/current/gin-tips.html

To avoid fluctuations in observed response time, it's desirable to have
pending-list cleanup occur in the background (i.e., via autovacuum).

Could you provide a link from the previous discussion?

--
Euler Taveira
EDB https://www.enterprisedb.com/

#3Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Euler Taveira (#2)
Re: use AV worker items infrastructure for GIN pending list's cleanup

On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote:

On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:

When AV worker items where introduced 4 years ago, i was suggested that
it could be used for other things like cleaning the pending list of GIN
index when it reaches gin_pending_list_limit instead of making user
visible operation pay the price.

That never happened though. So, here is a little patch for that.

Should I add an entry for this on next commitfest?

+1. It slipped through the cracks along the years. It is even suggested in the
current docs since the fast update support.

https://www.postgresql.org/docs/current/gin-tips.html

Interesting, that comment maybe needs to be rewritten. I would go for
remove completely the first paragraph under gin_pending_list_limit entry

Could you provide a link from the previous discussion?

It happened here:
/messages/by-id/20170301045823.vneqdqkmsd4as4ds@alvherre.pgsql

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

#4Joel Jacobson
joel@compiler.org
In reply to: Jaime Casanova (#3)
Re: use AV worker items infrastructure for GIN pending list's cleanup

On Mon, Apr 5, 2021, at 16:47, Jaime Casanova wrote:

On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote:

On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:

When AV worker items where introduced 4 years ago, i was suggested that
it could be used for other things like cleaning the pending list of GIN
index when it reaches gin_pending_list_limit instead of making user
visible operation pay the price.

That never happened though. So, here is a little patch for that.

Should I add an entry for this on next commitfest?

+1. It slipped through the cracks along the years. It is even suggested in the
current docs since the fast update support.

https://www.postgresql.org/docs/current/gin-tips.html

Interesting, that comment maybe needs to be rewritten. I would go for
remove completely the first paragraph under gin_pending_list_limit entry

Thanks for working on this patch.

I found this thread searching for "gin_pending_list_limit" in pg hackers after reading an interesting article found via the front page of Hacker News: "Debugging random slow writes in PostgreSQL" (https://iamsafts.com/posts/postgres-gin-performance/).

I thought it could be interesting to read about a real user story where this patch would be helpful.

Hacker News discussion: https://news.ycombinator.com/item?id=27152507

/Joel

#5Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Joel Jacobson (#4)
Re: use AV worker items infrastructure for GIN pending list's cleanup

On Sat, May 15, 2021 at 08:12:51AM +0200, Joel Jacobson wrote:

On Mon, Apr 5, 2021, at 16:47, Jaime Casanova wrote:

On Mon, Apr 05, 2021 at 10:41:22AM -0300, Euler Taveira wrote:

On Mon, Apr 5, 2021, at 3:31 AM, Jaime Casanova wrote:

When AV worker items where introduced 4 years ago, i was suggested that
it could be used for other things like cleaning the pending list of GIN
index when it reaches gin_pending_list_limit instead of making user
visible operation pay the price.

That never happened though. So, here is a little patch for that.

Should I add an entry for this on next commitfest?

+1. It slipped through the cracks along the years. It is even suggested in the
current docs since the fast update support.

https://www.postgresql.org/docs/current/gin-tips.html

Interesting, that comment maybe needs to be rewritten. I would go for
remove completely the first paragraph under gin_pending_list_limit entry

Thanks for working on this patch.

I found this thread searching for "gin_pending_list_limit" in pg hackers after reading an interesting article found via the front page of Hacker News: "Debugging random slow writes in PostgreSQL" (https://iamsafts.com/posts/postgres-gin-performance/).

I thought it could be interesting to read about a real user story where this patch would be helpful.

A customer here has 20+ GIN indexes in a big heavily used table and
every time one of the indexes reaches gin_pending_list_limit (because of
an insert or update) a user feels the impact.

So, currently we have a cronjob running periodically and checking
pending list sizes to process the index before the limit get fired by an
user operation. While the index still is processed and locked the fact
that doesn't happen in the user face make the process less notorious and
in the mind of users faster.

This will provide the same facility, the process will happen "in the
background".

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

#6Joel Jacobson
joel@compiler.org
In reply to: Jaime Casanova (#5)
Re: use AV worker items infrastructure for GIN pending list's cleanup

On Sat, May 15, 2021, at 08:42, Jaime Casanova wrote:

A customer here has 20+ GIN indexes in a big heavily used table and
every time one of the indexes reaches gin_pending_list_limit (because of
an insert or update) a user feels the impact.

So, currently we have a cronjob running periodically and checking
pending list sizes to process the index before the limit get fired by an
user operation. While the index still is processed and locked the fact
that doesn't happen in the user face make the process less notorious and
in the mind of users faster.

This will provide the same facility, the process will happen "in the
background".

Sounds like a great improvement, many thanks.

/Joel

#7Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Jaime Casanova (#1)
Re: use AV worker items infrastructure for GIN pending list's cleanup

On Mon, Apr 5, 2021 at 3:31 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

Hi,

When AV worker items where introduced 4 years ago, i was suggested that
it could be used for other things like cleaning the pending list of GIN
index when it reaches gin_pending_list_limit instead of making user
visible operation pay the price.

That never happened though. So, here is a little patch for that.

Thank you for working on this.

I like the idea of cleaning the GIN pending list using by autovacuum
work item. But with the patch, we request and skip the pending list
cleanup if the pending list size exceeds gin_pending_list_limit during
insertion. But autovacuum work items are executed after an autovacuum
runs. So if many insertions happen before executing the autovacuum
work item, we will end up greatly exceeding the threshold
(gin_pending_list_limit) and registering the same work item again and
again. Maybe we need something like a soft limit and a hard limit?
That is, if the pending list size exceeds the soft limit, we request
the work item. OTOH, if it exceeds the hard limit
(gin_pending_list_limit) we cleanup the pending list before insertion.
We might also need to have autovacuum work items ignore the work item
if the same work item with the same arguments is already registered.
In addition to that, I think we should avoid the work item for
cleaning the pending list from being executed if an autovacuum runs on
the gin index before executing the work item.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#8Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Masahiko Sawada (#7)
Re: use AV worker items infrastructure for GIN pending list's cleanup

On Mon, May 17, 2021 at 01:46:37PM +0900, Masahiko Sawada wrote:

On Mon, Apr 5, 2021 at 3:31 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:

Hi,

When AV worker items where introduced 4 years ago, i was suggested that
it could be used for other things like cleaning the pending list of GIN
index when it reaches gin_pending_list_limit instead of making user
visible operation pay the price.

That never happened though. So, here is a little patch for that.

Thank you for working on this.

I like the idea of cleaning the GIN pending list using by autovacuum
work item. But with the patch, we request and skip the pending list
cleanup if the pending list size exceeds gin_pending_list_limit during
insertion. But autovacuum work items are executed after an autovacuum
runs. So if many insertions happen before executing the autovacuum
work item, we will end up greatly exceeding the threshold
(gin_pending_list_limit) and registering the same work item again and
again. Maybe we need something like a soft limit and a hard limit?
That is, if the pending list size exceeds the soft limit, we request
the work item. OTOH, if it exceeds the hard limit
(gin_pending_list_limit) we cleanup the pending list before insertion.
We might also need to have autovacuum work items ignore the work item
if the same work item with the same arguments is already registered.
In addition to that, I think we should avoid the work item for
cleaning the pending list from being executed if an autovacuum runs on
the gin index before executing the work item.

Thanks for your comments on this. I have been working on a rebased
version, but ENOTIME right now.

Will mark this one as "Returned with feedback" and resubmit for
november.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL