brin autosummarization -- autovacuum "work items"

Started by Alvaro Herrera · almost 9 years ago · 11 messages
#1 Alvaro Herrera
alvherre@2ndquadrant.com
1 attachment(s)

I think one of the most serious issues with BRIN indexes is how they
don't get updated automatically as the table is filled. This patch
attempts to improve on that. During brininsert() time, we check whether
we're inserting the first item on the first page in a range. If we are,
request autovacuum to do a summarization run on that table. This is
dependent on a new reloption for BRIN called "autosummarize", default
off.
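
As a usage sketch (the table and index names here are hypothetical; the
reloption is as defined by this patch):

    CREATE INDEX measurements_brin_idx ON measurements
        USING brin (logged_at) WITH (autosummarize = on);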

The way the request works is that autovacuum maintains a DSA which can
be filled by backends with "work items". Currently, work items can
specify a BRIN summarization of some specific index; in the future we
could use this framework to request other kinds of things that do not
fit in the "dead tuples / recently inserted tuples" logic that autovac
currently uses to decide to vacuum/analyze tables.
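
In this patch, filing such a request boils down to a single call from
brininsert():

    AutoVacuumRequestWork(AVW_BRINSummarizeRange, RelationGetRelid(idxRel));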

However, it seems I have not quite gotten the hang of DSA just yet,
because after a couple of iterations, crashes occur. I think the reason
has to do with either a resource owner clearing the DSA at an unwelcome
time, or perhaps there's a mistake in my handling of DSA "relative
pointers" stuff.

This patch was initially written by Simon Riggs, who envisioned that
brininsert itself would invoke the summarization. However, this doesn't
work because summarization requires having ShareUpdateExclusive lock,
which brininsert doesn't have. So I modified things to instead use the
DSA stuff. (He also set things up so that brininsert would only
summarize the just-filled range, but I didn't preserve that idea in the
autovacuum-based implementation; some changed lines there can probably
be removed.)

--
Álvaro Herrera    PostgreSQL Expert, https://www.2ndQuadrant.com/

Attachments:

brin-autosummarize.patch (text/plain; charset=us-ascii)
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 6448b18..480895b 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -74,9 +74,13 @@
    tuple; those tuples remain unsummarized until a summarization run is
    invoked later, creating initial summaries.
    This process can be invoked manually using the
-   <function>brin_summarize_new_values(regclass)</function> function,
-   or automatically when <command>VACUUM</command> processes the table.
+   <function>brin_summarize_new_values(regclass)</function> function;
+   automatically when <command>VACUUM</command> processes the table;
+   or by automatic summarization executed by autovacuum, as insertions
+   occur.  (This last trigger is disabled by default and is enabled with
+   the parameter <literal>autosummarize</literal>.)
   </para>
+
  </sect2>
 </sect1>
 
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index fcb7a60..80d9c39 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -382,7 +382,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </variablelist>
 
    <para>
-    <acronym>BRIN</> indexes accept a different parameter:
+    <acronym>BRIN</> indexes accept different parameters:
    </para>
 
    <variablelist>
@@ -396,6 +396,16 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term><literal>autosummarize</></term>
+    <listitem>
+    <para>
+     Defines whether a summarization run is invoked for the previous page
+     range whenever an insertion is detected on the next one.
+    </para>
+    </listitem>
+   </varlistentry>
    </variablelist>
   </refsect2>
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b22563b..01586ff 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_am.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "utils/builtins.h"
@@ -60,10 +61,12 @@ typedef struct BrinOpaque
 	BrinDesc   *bo_bdesc;
 } BrinOpaque;
 
+#define BRIN_ALL_BLOCKRANGES	InvalidBlockNumber
+
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 						   BrinRevmap *revmap, BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
-static void brinsummarize(Relation index, Relation heapRel,
+static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 			  double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
@@ -126,8 +129,11 @@ brinhandler(PG_FUNCTION_ARGS)
  * with those of the new tuple.  If the tuple values are not consistent with
  * the summary tuple, we need to update the index tuple.
  *
+ * If autosummarization is enabled, check if we need to summarize the previous
+ * page range.
+ *
  * If the range is not currently summarized (i.e. the revmap returns NULL for
- * it), there's nothing to do.
+ * it), there's nothing to do for this tuple.
  */
 bool
 brininsert(Relation idxRel, Datum *values, bool *nulls,
@@ -141,6 +147,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	Buffer		buf = InvalidBuffer;
 	MemoryContext tupcxt = NULL;
 	MemoryContext oldcxt = CurrentMemoryContext;
+	bool		autosummarize = BrinGetAutoSummarize(idxRel);
 
 	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
 
@@ -148,18 +155,41 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	{
 		bool		need_insert = false;
 		OffsetNumber off;
-		BrinTuple  *brtup;
+		BrinTuple  *brtup = NULL;
 		BrinMemTuple *dtup;
 		BlockNumber heapBlk;
+		BlockNumber heapBlk0;
 		int			keyno;
 
 		CHECK_FOR_INTERRUPTS();
 
 		heapBlk = ItemPointerGetBlockNumber(heaptid);
 		/* normalize the block number to be the first block in the range */
-		heapBlk = (heapBlk / pagesPerRange) * pagesPerRange;
-		brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-										 BUFFER_LOCK_SHARE, NULL);
+		heapBlk0 = (heapBlk / pagesPerRange) * pagesPerRange;
+
+		/*
+		 * If auto-summarization is enabled and we just inserted the first
+		 * tuple into the first block of a new page range, request a
+		 * summarization run.
+		 */
+		if (autosummarize &&
+			heapBlk == heapBlk0 &&
+			ItemPointerGetOffsetNumber(heaptid) == FirstOffsetNumber)
+		{
+			BlockNumber lastPageRange = heapBlk0;
+
+			if (heapBlk0 >= pagesPerRange)
+				lastPageRange -= pagesPerRange;
+			brtup = brinGetTupleForHeapBlock(revmap, lastPageRange, &buf, &off, NULL,
+											 BUFFER_LOCK_SHARE, NULL);
+			if (!brtup)
+				AutoVacuumRequestWork(AVW_BRINSummarizeRange,
+									  RelationGetRelid(idxRel));
+		}
+
+		if (!brtup)
+			brtup = brinGetTupleForHeapBlock(revmap, heapBlk0, &buf, &off,
+											 NULL, BUFFER_LOCK_SHARE, NULL);
 
 		/* if range is unsummarized, there's nothing to do */
 		if (!brtup)
@@ -747,7 +777,7 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 
 	brin_vacuum_scan(info->index, info->strategy);
 
-	brinsummarize(info->index, heapRel,
+	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
 
 	heap_close(heapRel, AccessShareLock);
@@ -765,7 +795,8 @@ brinoptions(Datum reloptions, bool validate)
 	BrinOptions *rdopts;
 	int			numoptions;
 	static const relopt_parse_elt tab[] = {
-		{"pages_per_range", RELOPT_TYPE_INT, offsetof(BrinOptions, pagesPerRange)}
+		{"pages_per_range", RELOPT_TYPE_INT, offsetof(BrinOptions, pagesPerRange)},
+		{"autosummarize", RELOPT_TYPE_BOOL, offsetof(BrinOptions, autosummarize)}
 	};
 
 	options = parseRelOptions(reloptions, validate, RELOPT_KIND_BRIN,
@@ -837,7 +868,7 @@ brin_summarize_new_values(PG_FUNCTION_ARGS)
 						RelationGetRelationName(indexRel))));
 
 	/* OK, do it */
-	brinsummarize(indexRel, heapRel, &numSummarized, NULL);
+	brinsummarize(indexRel, heapRel, BRIN_ALL_BLOCKRANGES, &numSummarized, NULL);
 
 	relation_close(indexRel, ShareUpdateExclusiveLock);
 	relation_close(heapRel, ShareUpdateExclusiveLock);
@@ -1063,17 +1094,17 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 }
 
 /*
- * Scan a complete BRIN index, and summarize each page range that's not already
- * summarized.  The index and heap must have been locked by caller in at
- * least ShareUpdateExclusiveLock mode.
+ * Scan a portion of a BRIN index, and summarize each page range that's not
+ * already summarized.  The index and heap must have been locked by caller in
+ * at least ShareUpdateExclusiveLock mode.
  *
  * For each new index tuple inserted, *numSummarized (if not NULL) is
  * incremented; for each existing tuple, *numExisting (if not NULL) is
  * incremented.
  */
 static void
-brinsummarize(Relation index, Relation heapRel, double *numSummarized,
-			  double *numExisting)
+brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
+			  double *numSummarized, double *numExisting)
 {
 	BrinRevmap *revmap;
 	BrinBuildState *state = NULL;
@@ -1082,6 +1113,8 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 	BlockNumber heapBlk;
 	BlockNumber pagesPerRange;
 	Buffer		buf;
+	BlockNumber startBlk;
+	BlockNumber endBlk;
 
 	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 
@@ -1090,7 +1123,20 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 	 */
 	buf = InvalidBuffer;
 	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
-	for (heapBlk = 0; heapBlk < heapNumBlocks; heapBlk += pagesPerRange)
+	if (pageRange == BRIN_ALL_BLOCKRANGES ||
+		pageRange > heapNumBlocks)
+	{
+		startBlk = 0;
+		endBlk = heapNumBlocks;
+	}
+	else
+	{
+		startBlk = pageRange;
+		endBlk = startBlk + pagesPerRange;
+		if (endBlk > heapNumBlocks)
+			endBlk = heapNumBlocks - 1;
+	}
+	for (heapBlk = startBlk; heapBlk < endBlk; heapBlk += pagesPerRange)
 	{
 		BrinTuple  *tup;
 		OffsetNumber off;
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 0de6999..5d45b48 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -205,7 +205,11 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 	/* normalize the heap block number to be the first page in the range */
 	heapBlk = (heapBlk / revmap->rm_pagesPerRange) * revmap->rm_pagesPerRange;
 
-	/* Compute the revmap page number we need */
+	/*
+	 * Compute the revmap page number we need.  If Invalid is returned (i.e.,
+	 * the revmap page hasn't been created yet), the requested page range is
+	 * not summarized.
+	 */
 	mapBlk = revmap_get_blkno(revmap, heapBlk);
 	if (mapBlk == InvalidBlockNumber)
 	{
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 42b4ea4..66b493c 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -58,6 +58,15 @@ static relopt_bool boolRelOpts[] =
 {
 	{
 		{
+			"autosummarize",
+			"Enables automatic summarization on this BRIN index",
+			RELOPT_KIND_BRIN,
+			AccessExclusiveLock
+		},
+		false
+	},
+	{
+		{
 			"autovacuum_enabled",
 			"Enables autovacuum in this relation",
 			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index e8de9a3..c9853a9 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -92,7 +92,9 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "tcop/tcopprot.h"
+#include "utils/dsa.h"
 #include "utils/fmgroids.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -252,9 +254,10 @@ typedef enum
  * av_runningWorkers the WorkerInfo non-free queue
  * av_startingWorker pointer to WorkerInfo currently being started (cleared by
  *					the worker itself as soon as it's up and running)
+ * av_dsa_handle	handle for allocatable shared memory
  *
  * This struct is protected by AutovacuumLock, except for av_signal and parts
- * of the worker list (see above).
+ * of the worker list (see above).  av_dsa_handle is readable unlocked.
  *-------------
  */
 typedef struct
@@ -264,6 +267,8 @@ typedef struct
 	dlist_head	av_freeWorkers;
 	dlist_head	av_runningWorkers;
 	WorkerInfo	av_startingWorker;
+	dsa_handle	av_dsa_handle;
+	dsa_pointer	av_workitems;
 } AutoVacuumShmemStruct;
 
 static AutoVacuumShmemStruct *AutoVacuumShmem;
@@ -278,6 +283,30 @@ static MemoryContext DatabaseListCxt = NULL;
 /* Pointer to my own WorkerInfo, valid on each worker */
 static WorkerInfo MyWorkerInfo = NULL;
 
+/*
+ * Autovacuum workitem array, stored in AutoVacuumShmem->av_workitems.  This
+ * list is mostly protected by AutovacuumLock, except that an autovacuum
+ * worker may "claim" an item (by marking it active), and then no other process
+ * is allowed to touch it.
+ */
+typedef struct AutoVacuumWorkItem
+{
+	AutoVacuumWorkItemType avw_type;
+	Oid			avw_database;
+	Oid			avw_relation;
+	bool		avw_active;
+	dsa_pointer	avw_next;
+} AutoVacuumWorkItem;
+
+#define NUM_WORKITEMS	256
+typedef struct
+{
+	dsa_pointer		avs_usedItems;
+	dsa_pointer		avs_freeItems;
+} AutovacWorkItems;
+
+static dsa_area	*AutoVacuumDSA = NULL;
+
 /* PID of launcher, valid only in worker while shutting down */
 int			AutovacuumLauncherPid = 0;
 
@@ -316,6 +345,7 @@ static AutoVacOpts *extract_autovac_opts(HeapTuple tup,
 static PgStat_StatTabEntry *get_pgstat_tabentry_relid(Oid relid, bool isshared,
 						  PgStat_StatDBEntry *shared,
 						  PgStat_StatDBEntry *dbentry);
+static void perform_work_item(AutoVacuumWorkItem *workitem);
 static void autovac_report_activity(autovac_table *tab);
 static void av_sighup_handler(SIGNAL_ARGS);
 static void avl_sigusr2_handler(SIGNAL_ARGS);
@@ -574,6 +604,22 @@ AutoVacLauncherMain(int argc, char *argv[])
 	 */
 	rebuild_database_list(InvalidOid);
 
+	/*
+	 * Set up our DSA so that backends can install work-item requests.  It may
+	 * already exist as created by a previous launcher.
+	 */
+	if (!AutoVacuumShmem->av_dsa_handle)
+	{
+		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+		AutoVacuumDSA = dsa_create(LWTRANCHE_AUTOVACUUM);
+		AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
+		/* delay array allocation until first request */
+		AutoVacuumShmem->av_workitems = InvalidDsaPointer;
+		LWLockRelease(AutovacuumLock);
+	}
+	else
+		AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+
 	/* loop until shutdown request */
 	while (!got_SIGTERM)
 	{
@@ -1617,6 +1663,13 @@ AutoVacWorkerMain(int argc, char *argv[])
 	{
 		char		dbname[NAMEDATALEN];
 
+		if (AutoVacuumShmem->av_dsa_handle)
+		{
+			/* First use of DSA in this worker, so attach to it */
+			Assert(!AutoVacuumDSA);
+			AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+		}
+
 		/*
 		 * Report autovac startup to the stats collector.  We deliberately do
 		 * this before InitPostgres, so that the last_autovac_time will get
@@ -2467,6 +2520,69 @@ deleted:
 	}
 
 	/*
+	 * Perform additional work items, as requested by backends.
+	 */
+	if (AutoVacuumShmem->av_workitems)
+	{
+		dsa_pointer		nextitem;
+		AutovacWorkItems *workitems;
+
+		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+
+		/*
+		 * Scan the list of pending items, and process the inactive ones in our
+		 * database.
+		 */
+		workitems = (AutovacWorkItems *)
+			dsa_get_address(AutoVacuumDSA, AutoVacuumShmem->av_workitems);
+		nextitem = workitems->avs_usedItems;
+
+		while (nextitem != InvalidDsaPointer)
+		{
+			AutoVacuumWorkItem	*workitem;
+
+			workitem = (AutoVacuumWorkItem *)
+				dsa_get_address(AutoVacuumDSA, nextitem);
+
+			if (workitem->avw_database == MyDatabaseId && !workitem->avw_active)
+			{
+				/* claim this one, and release lock while we process it */
+				workitem->avw_active = true;
+
+				LWLockRelease(AutovacuumLock);
+				perform_work_item(workitem);
+
+				/*
+				 * Check for config changes before acquiring lock for further
+				 * jobs.
+				 */
+				CHECK_FOR_INTERRUPTS();
+				if (got_SIGHUP)
+				{
+					got_SIGHUP = false;
+					ProcessConfigFile(PGC_SIGHUP);
+				}
+
+				LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+
+				/*
+				 * Remove the job we just completed from the used list and put
+				 * the array item back on the free list.
+				 */
+				workitems->avs_usedItems = workitem->avw_next;
+				workitem->avw_next = workitems->avs_freeItems;
+				workitems->avs_freeItems = nextitem;
+			}
+
+			/* prepare for next iteration */
+			nextitem = workitems->avs_usedItems;
+		}
+
+		/* all done */
+		LWLockRelease(AutovacuumLock);
+	}
+
+	/*
 	 * We leak table_toast_map here (among other things), but since we're
 	 * going away soon, it's not a problem.
 	 */
@@ -2498,6 +2614,101 @@ deleted:
 	CommitTransactionCommand();
 }
 
+static void
+perform_work_item(AutoVacuumWorkItem *workitem)
+{
+	char	   *cur_datname = NULL;
+	char	   *cur_nspname = NULL;
+	char	   *cur_relname = NULL;
+
+	elog(LOG, "performing work on relation %u", workitem->avw_relation);
+
+	/*
+	 * Note we do not store table info in MyWorkerInfo, since this is not
+	 * vacuuming proper.
+	 */
+
+	/*
+	 * Save the relation name for a possible error message, to avoid a
+	 * catalog lookup in case of an error.  If any of these return NULL,
+	 * then the relation has been dropped since last we checked; skip it.
+	 * Note: they must live in a long-lived memory context because we call
+	 * vacuum and analyze in different transactions.
+	 */
+
+	cur_relname = get_rel_name(workitem->avw_relation);
+	cur_nspname = get_namespace_name(get_rel_namespace(workitem->avw_relation));
+	cur_datname = get_database_name(MyDatabaseId);
+	if (!cur_relname || !cur_nspname || !cur_datname)
+		goto deleted2;
+
+	/*
+	 * We will abort the current work item if something errors out, and
+	 * continue with the next one; in particular, this happens if we are
+	 * interrupted with SIGINT.  XXX but the work item was already deleted
+	 * from the work list.  Maybe instead of this we should set a "being
+	 * processed" flag in the work item, move it to the back of the list,
+	 * and only delete if we're successful.
+	 */
+	PG_TRY();
+	{
+		/* have at it */
+		MemoryContextSwitchTo(TopTransactionContext);
+
+		switch (workitem->avw_type)
+		{
+			case AVW_BRINSummarizeRange:
+				DirectFunctionCall1(brin_summarize_new_values,
+									ObjectIdGetDatum(workitem->avw_relation));
+				break;
+			default:
+				elog(WARNING, "unrecognized work item found: type %d",
+					 workitem->avw_type);
+				break;
+		}
+
+		/*
+		 * Clear a possible query-cancel signal, to avoid a late reaction
+		 * to an automatically-sent signal because of vacuuming the
+		 * current table (we're done with it, so it would make no sense to
+		 * cancel at this point.)
+		 */
+		QueryCancelPending = false;
+	}
+	PG_CATCH();
+	{
+		/*
+		 * Abort the transaction, start a new one, and proceed with the
+		 * next table in our list.
+		 */
+		HOLD_INTERRUPTS();
+		errcontext("processing work entry for relation \"%s.%s.%s\"",
+				   cur_datname, cur_nspname, cur_relname);
+		EmitErrorReport();
+
+		/* this resets the PGXACT flags too */
+		AbortOutOfAnyTransaction();
+		FlushErrorState();
+		MemoryContextResetAndDeleteChildren(PortalContext);
+
+		/* restart our transaction for the following operations */
+		StartTransactionCommand();
+		RESUME_INTERRUPTS();
+	}
+	PG_END_TRY();
+
+	/* We intentionally do not set did_vacuum here */
+
+	/* be tidy */
+deleted2:
+	if (cur_datname)
+		pfree(cur_datname);
+	if (cur_nspname)
+		pfree(cur_nspname);
+	if (cur_relname)
+		pfree(cur_relname);
+}
+
 /*
  * extract_autovac_opts
  *
@@ -2959,6 +3170,119 @@ AutoVacuumingActive(void)
 }
 
 /*
+ * Request one work item to the next autovacuum run processing our database.
+ */
+void
+AutoVacuumRequestWork(AutoVacuumWorkItemType type, Oid relationId)
+{
+	AutovacWorkItems *workitems;
+	dsa_pointer		wi_ptr;
+	AutoVacuumWorkItem *workitem;
+
+	elog(LOG, "requesting work on relation %u", relationId);
+
+	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+
+	/*
+	 * It may be useful to deduplicate the list upon insertion.  For the only
+	 * currently existing caller, this is not necessary.
+	 */
+
+	/* First use in this process?  Initialize DSA */
+	if (!AutoVacuumDSA)
+	{
+		if (!AutoVacuumShmem->av_dsa_handle)
+		{
+			/* autovacuum launcher not started; nothing can be done */
+			LWLockRelease(AutovacuumLock);
+			return;
+		}
+		AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+
+		if (!AutoVacuumDSA)
+		{
+			/* cannot attach?  disregard request */
+			LWLockRelease(AutovacuumLock);
+			return;
+		}
+	}
+
+	/* First use overall?  Allocate work items array */
+	if (AutoVacuumShmem->av_workitems == InvalidDsaPointer)
+	{
+		int		i;
+		AutovacWorkItems *workitems;
+
+		AutoVacuumShmem->av_workitems =
+			dsa_allocate_extended(AutoVacuumDSA,
+								  sizeof(AutovacWorkItems) +
+								  NUM_WORKITEMS * sizeof(AutoVacuumWorkItem),
+								  DSA_ALLOC_NO_OOM);
+		/* if out of memory, silently disregard the request */
+		if (AutoVacuumShmem->av_workitems == InvalidDsaPointer)
+		{
+			dsa_detach(AutoVacuumDSA);
+			AutoVacuumDSA = NULL;
+			LWLockRelease(AutovacuumLock);
+			return;
+		}
+
+		/* Initialize each array entry as a member of the free list */
+		workitems = dsa_get_address(AutoVacuumDSA, AutoVacuumShmem->av_workitems);
+
+		workitems->avs_usedItems = InvalidDsaPointer;
+		workitems->avs_freeItems = InvalidDsaPointer;
+		for (i = 0; i < NUM_WORKITEMS; i++)
+		{
+			/* XXX surely there is a simpler way to do this */
+			wi_ptr = AutoVacuumShmem->av_workitems + sizeof(AutovacWorkItems) +
+				sizeof(AutoVacuumWorkItem) * i;
+			workitem = (AutoVacuumWorkItem *) dsa_get_address(AutoVacuumDSA, wi_ptr);
+
+			workitem->avw_type = 0;
+			workitem->avw_database = InvalidOid;
+			workitem->avw_relation = InvalidOid;
+			workitem->avw_active = false;
+
+			/* put this item in the free list */
+			workitem->avw_next = workitems->avs_freeItems;
+			workitems->avs_freeItems = wi_ptr;
+		}
+	}
+
+	workitems = (AutovacWorkItems *)
+		dsa_get_address(AutoVacuumDSA, AutoVacuumShmem->av_workitems);
+
+	/* If array is full, disregard the request */
+	if (workitems->avs_freeItems == InvalidDsaPointer)
+	{
+		LWLockRelease(AutovacuumLock);
+		dsa_detach(AutoVacuumDSA);
+		AutoVacuumDSA = NULL;
+		return;
+	}
+
+	/* remove workitem struct from free list ... */
+	wi_ptr = workitems->avs_freeItems;
+	workitem = dsa_get_address(AutoVacuumDSA, wi_ptr);
+	workitems->avs_freeItems = workitem->avw_next;
+
+	/* ... initialize it ... */
+	workitem->avw_type = type;
+	workitem->avw_database = MyDatabaseId;
+	workitem->avw_relation = relationId;
+	workitem->avw_active = false;
+	workitem->avw_next = workitems->avs_usedItems;
+
+	/* ... and put it on autovacuum's to-do list */
+	workitems->avs_usedItems = wi_ptr;
+
+	LWLockRelease(AutovacuumLock);
+	dsa_detach(AutoVacuumDSA);
+	AutoVacuumDSA = NULL;
+}
+
+/*
  * autovac_init
  *		This is called at postmaster initialization.
  *
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 54378bc..b8c96db 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -1095,7 +1095,8 @@ dsm_create_descriptor(void)
 {
 	dsm_segment *seg;
 
-	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	if (CurrentResourceOwner)
+		ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
 
 	seg = MemoryContextAlloc(TopMemoryContext, sizeof(dsm_segment));
 	dlist_push_head(&dsm_segment_list, &seg->node);
@@ -1106,8 +1107,11 @@ dsm_create_descriptor(void)
 	seg->mapped_address = NULL;
 	seg->mapped_size = 0;
 
-	seg->resowner = CurrentResourceOwner;
-	ResourceOwnerRememberDSM(CurrentResourceOwner, seg);
+	if (CurrentResourceOwner)
+	{
+		seg->resowner = CurrentResourceOwner;
+		ResourceOwnerRememberDSM(CurrentResourceOwner, seg);
+	}
 
 	slist_init(&seg->on_detach);
 
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index 49e68b4..6d5d12a 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -498,7 +498,7 @@ dsa_get_handle(dsa_area *area)
 
 /*
  * Attach to an area given a handle generated (possibly in another process) by
- * dsa_get_area_handle.  The area must have been created with dsa_create (not
+ * dsa_get_handle.  The area must have been created with dsa_create (not
  * dsa_create_in_place).
  */
 dsa_area *
diff --git a/src/include/access/brin.h b/src/include/access/brin.h
index 896824a..3f4c29b 100644
--- a/src/include/access/brin.h
+++ b/src/include/access/brin.h
@@ -22,6 +22,7 @@ typedef struct BrinOptions
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	BlockNumber pagesPerRange;
+	bool		autosummarize;
 } BrinOptions;
 
 #define BRIN_DEFAULT_PAGES_PER_RANGE	128
@@ -29,5 +30,9 @@ typedef struct BrinOptions
 	((relation)->rd_options ? \
 	 ((BrinOptions *) (relation)->rd_options)->pagesPerRange : \
 	  BRIN_DEFAULT_PAGES_PER_RANGE)
+#define BrinGetAutoSummarize(relation) \
+	((relation)->rd_options ? \
+	 ((BrinOptions *) (relation)->rd_options)->autosummarize : \
+	  false)
 
 #endif   /* BRIN_H */
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index 99d7f09..a871508 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -14,6 +14,15 @@
 #ifndef AUTOVACUUM_H
 #define AUTOVACUUM_H
 
+/*
+ * Other processes can request specific work from autovacuum, identified by
+ * AutoVacuumWorkItem elements.
+ */
+typedef enum
+{
+	AVW_BRINSummarizeRange
+} AutoVacuumWorkItemType;
+
 
 /* GUC variables */
 extern bool autovacuum_start_daemon;
@@ -60,6 +69,9 @@ extern void AutovacuumWorkerIAm(void);
 extern void AutovacuumLauncherIAm(void);
 #endif
 
+extern void AutoVacuumRequestWork(AutoVacuumWorkItemType type,
+					  Oid relationId);
+
 /* shared memory stuff */
 extern Size AutoVacuumShmemSize(void);
 extern void AutoVacuumShmemInit(void);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 8bd93c3..df27aca 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -211,6 +211,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_BUFFER_MAPPING,
 	LWTRANCHE_LOCK_MANAGER,
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
+	LWTRANCHE_AUTOVACUUM,
 	LWTRANCHE_PARALLEL_QUERY_DSA,
 	LWTRANCHE_FIRST_USER_DEFINED
 }	BuiltinTrancheIds;
#2 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Alvaro Herrera (#1)
Re: brin autosummarization -- autovacuum "work items"

On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I think one of the most serious issues with BRIN indexes is how they
don't get updated automatically as the table is filled. This patch
attempts to improve on that. During brininsert() time, we check whether
we're inserting the first item on the first page in a range. If we are,
request autovacuum to do a summarization run on that table. This is
dependent on a new reloption for BRIN called "autosummarize", default
off.

Nice.

The way the request works is that autovacuum maintains a DSA which can
be filled by backends with "work items". Currently, work items can
specify a BRIN summarization of some specific index; in the future we
could use this framework to request other kinds of things that do not
fit in the "dead tuples / recently inserted tuples" logic that autovac
currently uses to decide to vacuum/analyze tables.

However, it seems I have not quite gotten the hang of DSA just yet,
because after a couple of iterations, crashes occur. I think the reason
has to do with either a resource owner clearing the DSA at an unwelcome
time, or perhaps there's a mistake in my handling of DSA "relative
pointers" stuff.

Ok, I'll take a look. It's set up for ease of use in short lifespan
situations like parallel query, and there are a few extra hoops to
jump through for longer lived DSA areas.

--
Thomas Munro
http://www.enterprisedb.com


#3 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#2)
Re: brin autosummarization -- autovacuum "work items"

On Wed, Mar 1, 2017 at 6:06 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I think one of the most serious issues with BRIN indexes is how they
don't get updated automatically as the table is filled. This patch
attempts to improve on that. During brininsert() time, we check whether
we're inserting the first item on the first page in a range. If we are,
request autovacuum to do a summarization run on that table. This is
dependent on a new reloption for BRIN called "autosummarize", default
off.

Nice.

The way the request works is that autovacuum maintains a DSA which can
be filled by backends with "work items". Currently, work items can
specify a BRIN summarization of some specific index; in the future we
could use this framework to request other kinds of things that do not
fit in the "dead tuples / recently inserted tuples" logic that autovac
currently uses to decide to vacuum/analyze tables.

However, it seems I have not quite gotten the hang of DSA just yet,
because after a couple of iterations, crashes occur. I think the reason
has to do with either a resource owner clearing the DSA at an unwelcome
time, or perhaps there's a mistake in my handling of DSA "relative
pointers" stuff.

Ok, I'll take a look. It's set up for ease of use in short lifespan
situations like parallel query, and there are a few extra hoops to
jump through for longer lived DSA areas.

I haven't tested this, but here is some initial feedback after reading
it through once:

 /*
  * Attach to an area given a handle generated (possibly in another process) by
- * dsa_get_area_handle.  The area must have been created with dsa_create (not
+ * dsa_get_handle.  The area must have been created with dsa_create (not
  * dsa_create_in_place).
  */

This is an independent slam-dunk typo fix.

+    /*
+     * Set up our DSA so that backends can install work-item requests.  It may
+     * already exist as created by a previous launcher.
+     */
+    if (!AutoVacuumShmem->av_dsa_handle)
+    {
+        LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+        AutoVacuumDSA = dsa_create(LWTRANCHE_AUTOVACUUM);
+        AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
+        /* delay array allocation until first request */
+        AutoVacuumShmem->av_workitems = InvalidDsaPointer;
+        LWLockRelease(AutovacuumLock);
+    }
+    else
+        AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);

I haven't looked into the autovacuum launcher lifecycle, but if it can
be restarted as implied by the above then I see no reason to believe
that the DSA area still exists at the point where you call
dsa_attach() here. DSA areas are reference counted, so if there is ever
a scenario where no backend is currently attached, then it will be
destroyed and this call will fail. If you want to create a DSA area
that lasts until cluster shutdown even while all backends are
detached, you need to call dsa_pin() after creating it.
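
Something like this after creating the area (just a sketch):

    AutoVacuumDSA = dsa_create(LWTRANCHE_AUTOVACUUM);
    dsa_pin(AutoVacuumDSA);    /* keep area alive with zero attachments */
    AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);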

In AutoVacuumRequestWork:

+        AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+
+        if (!AutoVacuumDSA)
+        {
+            /* cannot attach?  disregard request */
+            LWLockRelease(AutovacuumLock);
+            return;
+        }

dsa_attach either succeeds or throws, so that conditional code is unreachable.

+            /* XXX surely there is a simpler way to do this */
+            wi_ptr = AutoVacuumShmem->av_workitems + sizeof(AutovacWorkItems) +
+                sizeof(AutoVacuumWorkItem) * i;
+            workitem = (AutoVacuumWorkItem *)
+                dsa_get_address(AutoVacuumDSA, wi_ptr);

It'd probably be simpler to keep hold of the backend-local address of
the base of the workitems array and then use regular C language
facilities like array notation to work with it: workitem =
&workitems[i], and then:

+    /* ... and put it on autovacuum's to-do list */
+    workitems->avs_usedItems = wi_ptr;

Considering that i is really an index into the contiguous workitems
array, maybe you should really just store the index i here, instead of
dealing in dsa_pointers. The idea with dsa_pointers is that they're
useful for complex data structures that might point into different DSM
segments, like a hash table or binary tree that has internal pointers
that could point to arbitrary other objects in a data structure because
it's allocated in incremental pieces. Here, you are
dealing with objects in a contiguous memory space of fixed size. This
leads to a bigger question about this design, which I'll ask at the
end.
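
For example (a sketch only, assuming the item array follows the
AutovacWorkItems header as in this patch):

    /* resolve the base address once, then use plain C array indexing */
    char       *base = (char *) dsa_get_address(AutoVacuumDSA,
                                        AutoVacuumShmem->av_workitems);
    AutoVacuumWorkItem *items = (AutoVacuumWorkItem *)
        (base + sizeof(AutovacWorkItems));
    AutoVacuumWorkItem *workitem = &items[i];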

Then at the bottom of AutoVacuumRequestWork:

+    LWLockRelease(AutovacuumLock);
+    dsa_detach(AutoVacuumDSA);
+    AutoVacuumDSA = NULL;
+}

I'm guessing that you intended to remain attached, rather than
detaching at the end like this? Otherwise every backend that is
inserting lots of new data attaches and detaches repeatedly, which
seems unnecessary. If you do that, you'll need to run
dsa_pin_mapping() after attaching, or else the DSA area will be
unmapped at end of transaction and future attempts to access it will
segfault.
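
That is, something like (sketch):

    /* first use in this backend: attach once, keep the mapping for good */
    AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
    dsa_pin_mapping(AutoVacuumDSA);    /* don't unmap at end of transaction */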

In dsm.c:

-    ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+    if (CurrentResourceOwner)
+        ResourceOwnerEnlargeDSMs(CurrentResourceOwner);

... and then:

-    seg->resowner = CurrentResourceOwner;
-    ResourceOwnerRememberDSM(CurrentResourceOwner, seg);
+    if (CurrentResourceOwner)
+    {
+        seg->resowner = CurrentResourceOwner;
+        ResourceOwnerRememberDSM(CurrentResourceOwner, seg);
+    }

This makes sense. It allows DSMs (and therefore also DSA areas) to be
created when you don't have a resource owner. In fact dsm.c
contradicts itself slightly in this area: dsm_create() clearly believes
that seg->segment can be NULL after dsm_create_descriptor() returns
(see code near "too many dynamic shared memory segments" error), but
dsm_create_descriptor() doesn't believe that to be the case without
your patch, so perhaps this should be a separate commit to fix that
rough edge. However, I think dsm_create_descriptor() still needs to
assign seg->resowner even when it's NULL, otherwise it's
uninitialised.

My solution to this problem when I wrote a couple of different things
that used long lifetime DSA areas (experimental things not posted on
this list) was to define a CurrentResourceOwner with a name like "Foo
Top Level", and then after creating/attaching the segment I'd call
dsa_pin_mapping() which in turn calls dsm_pin_mapping() on all
segments. Your solution starts out in the pinned mapping state
instead (= disconnected from resource owner machinery), which is
better.

In AutoVacWorkerMain:

+            if (workitem->avw_database == MyDatabaseId && !workitem->avw_active)

Stepping over already-active items in the list seems OK because the
number of such items is bounded by the number of workers. Stepping
over all items for other databases sounds quite expensive if it
happens very often, because these are not so bounded. Ah, there can't
be more than NUM_WORKITEMS, which is small.

+                /*
+                 * Remove the job we just completed from the used list and put
+                 * the array item back on the free list.
+                 */
+                workitems->avs_usedItems = workitem->avw_next;

Isn't this corrupting the avs_usedItems list if avw_next points
to an item that some other worker has removed from the list while we
were working on our item?

Stepping back from the code a bit:

What is your motivation for using DSA? It seems you are creating an
area and then using it to make exactly one allocation of a constant
size known up front to hold your fixed size workitems array. You
don't do any dynamic allocation at runtime, apart from the detail that
it happens to be allocated on demand. Perhaps it would make sense if you
had a separate space per database or something like that, so that the
shared memory need would be dynamic?

It looks like outstanding autosummarisation work will be forgotten if
you restart before it is processed. Over in another thread[1] we
exchanged emails on another way to recognise that summarisation work
needs to be done, if we are only interested in unsummarised ranges at
the end of the heap. I haven't studied BRIN enough to know if that is
insufficient: can you finish up with unsummarised ranges not in a
contiguous range at the end of the heap? If so, perhaps the BRIN
index itself should also have a way to record that certain non-final
ranges are unsummarised but should be summarised asynchronously? Then
the system could be made to behave exactly the same way no matter when
reboots occur, which seems like a good property.

[1]: /messages/by-id/20170130191640.2johoyume5v2dbbq@alvherre.pgsql

--
Thomas Munro
http://www.enterprisedb.com


#4 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#2)
Re: brin autosummarization -- autovacuum "work items"

On Wed, Mar 1, 2017 at 6:06 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Mar 1, 2017 at 5:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I think one of the most serious issues with BRIN indexes is how they
don't get updated automatically as the table is filled. This patch
attempts to improve on that. During brininsert() time, we check whether
we're inserting the first item on the first page in a range. If we are,
request autovacuum to do a summarization run on that table. This is
dependent on a new reloption for BRIN called "autosummarize", default
off.

Nice.

Another thought about this design: Why autovacuum?

Obviously we don't want to get to the point where you start up
PostgreSQL and see 25 lines like BRIN Summarization Launcher started,
Foo Launcher started, Bar Launcher started, ... but perhaps there is a
middle ground between piling all the background work into the
autovacuum framework, and making new dedicated launchers and workers
for each thing.

Is there some way we could turn this kind of maintenance work into a
'task' (insert better word) that can be scheduled to run
asynchronously by magic workers, so that you don't have to supply a
whole worker and main loop and possibly launcher OR jam new
non-vacuum-related work into the vacuum machinery, for each thing like
this that someone decides to invent?

--
Thomas Munro
http://www.enterprisedb.com


#5 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#4)
Re: brin autosummarization -- autovacuum "work items"

Thomas Munro wrote:

Another thought about this design: Why autovacuum?

One reason is that autovacuum is already there, so it's convenient to
give it the responsibility for this kind of task. Another reason is
that autovacuum is already doing this, via vacuum. I don't see the
need to have a completely different process set for tasks that belong to
the system's cleanup process.

With this infrastructure, we could have other types of individual tasks
that could be run by autovacuum. GIN pending list cleanup, for
instance, or VM bit setting. Both of those are currently being done
whenever VACUUM fires, but only because at the time they were written
there was no other convenient place to hook them onto.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#6 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#3)
Re: brin autosummarization -- autovacuum "work items"

Thomas Munro wrote:

What is your motivation for using DSA? It seems you are creating an
area and then using it to make exactly one allocation of a constant
size known up front to hold your fixed size workitems array. You
don't do any dynamic allocation at runtime, apart from the detail that
it happens to be allocated on demand. Perhaps it would make sense if you
had a separate space per database or something like that, so that the
shared memory need would be dynamic?

Well, the number of work items is currently fixed; but if you have many
BRIN indexes, then you'd overflow (lose requests). By using DSA I am
making it easy to patch this afterwards so that an arbitrary number of
requests can be recorded.

It looks like outstanding autosummarisation work will be forgotten if
you restart before it is processed.

That's true. However, it would be easy to make index scans also request
work items if they find a full page range that should have been
summarized, so if they are lost, it's not a big deal.

Over in another thread[1] we
exchanged emails on another way to recognise that summarisation work
needs to be done, if we are only interested in unsummarised ranges at
the end of the heap. I haven't studied BRIN enough to know if that is
insufficient: can you finish up with unsummarised ranges not in a
contiguous range at the end of the heap?

If we include my other patch to remove the index tuple for a certain
range, then yes, it can happen. (That proposed patch requires manual
action, but range invalidation could also be invoked automatically when,
say, a certain number of tuples are removed from a page range.)

If so, perhaps the BRIN
index itself should also have a way to record that certain non-final
ranges are unsummarised but should be summarised asynchronously?

I think this is unnecessary, and would lead to higher operating
overhead. With this patch, it's very cheap to file a work item.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#6)
1 attachment(s)
Re: brin autosummarization -- autovacuum "work items"

Here's a version of this patch which I consider final.

Thanks for your tips on using DSA. No crashes now.

I am confused about not needing dsa_attach the second time around. If I
do that, the dsa handle has been 0x7f'd, which I don't understand since
it is supposed to be allocated in TopMemoryContext. I didn't dig too
deep to try and find what is causing that behavior. Once we do, it's
easy to remove the dsa_detach/dsa_attach calls.

I added a new SQL-callable function to invoke summarization of an
individual page range. That is what I wanted to do in vacuum (rather
than a scan of the complete index), and it seems independently useful.
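
For example (a sketch; the index name is hypothetical, the signature is as
in the attached patch):

    -- summarize only the page range covering heap block 12345, if needed
    SELECT brin_summarize_range('my_brin_idx'::regclass, 12345);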

I also removed the behavior that on index creation the final partial
block range is always summarized. It's pointless.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

brin-autosummarize-2.patch (text/plain; charset=us-ascii)
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 5bf11dc..5140a38 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -74,9 +74,14 @@
    tuple; those tuples remain unsummarized until a summarization run is
    invoked later, creating initial summaries.
    This process can be invoked manually using the
-   <function>brin_summarize_new_values(regclass)</function> function,
-   or automatically when <command>VACUUM</command> processes the table.
+   <function>brin_summarize_range(regclass, bigint)</function> or
+   <function>brin_summarize_new_values(regclass)</function> functions;
+   automatically when <command>VACUUM</command> processes the table;
+   or by automatic summarization executed by autovacuum, as insertions
+   occur.  (This last trigger is disabled by default and can be enabled
+   with the <literal>autosummarize</literal> parameter.)
   </para>
+
  </sect2>
 </sect1>
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6887eab..25c18d1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19685,6 +19685,13 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
       </row>
       <row>
        <entry>
+        <literal><function>brin_summarize_range(<parameter>index</> <type>regclass</>, <parameter>blockNumber</> <type>bigint</type>)</function></literal>
+       </entry>
+       <entry><type>integer</type></entry>
+       <entry>summarize the page range covering the given block, if not already summarized</entry>
+      </row>
+      <row>
+       <entry>
         <literal><function>gin_clean_pending_list(<parameter>index</> <type>regclass</>)</function></literal>
        </entry>
        <entry><type>bigint</type></entry>
@@ -19700,7 +19707,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
     that are not currently summarized by the index; for any such range
     it creates a new summary index tuple by scanning the table pages.
     It returns the number of new page range summaries that were inserted
-    into the index.
+    into the index.  <function>brin_summarize_range</> does the same, except
+    it only summarizes the range that covers the given block number.
    </para>
 
    <para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 7163b03..83ee7d3 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -382,7 +382,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </variablelist>
 
    <para>
-    <acronym>BRIN</> indexes accept a different parameter:
+    <acronym>BRIN</> indexes accept different parameters:
    </para>
 
    <variablelist>
@@ -396,6 +396,16 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term><literal>autosummarize</></term>
+    <listitem>
+    <para>
+     Defines whether a summarization run is invoked for the previous page
+     range whenever an insertion is detected on the next one.
+    </para>
+    </listitem>
+   </varlistentry>
    </variablelist>
   </refsect2>
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b22563b..707d04e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_am.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "utils/builtins.h"
@@ -60,10 +61,12 @@ typedef struct BrinOpaque
 	BrinDesc   *bo_bdesc;
 } BrinOpaque;
 
+#define BRIN_ALL_BLOCKRANGES	InvalidBlockNumber
+
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
 						   BrinRevmap *revmap, BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
-static void brinsummarize(Relation index, Relation heapRel,
+static void brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
 			  double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
@@ -126,8 +129,11 @@ brinhandler(PG_FUNCTION_ARGS)
  * with those of the new tuple.  If the tuple values are not consistent with
  * the summary tuple, we need to update the index tuple.
  *
+ * If autosummarization is enabled, check if we need to summarize the previous
+ * page range.
+ *
  * If the range is not currently summarized (i.e. the revmap returns NULL for
- * it), there's nothing to do.
+ * it), there's nothing to do for this tuple.
  */
 bool
 brininsert(Relation idxRel, Datum *values, bool *nulls,
@@ -136,30 +142,57 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		   IndexInfo *indexInfo)
 {
 	BlockNumber pagesPerRange;
+	BlockNumber	origHeapBlk;
+	BlockNumber	heapBlk;
 	BrinDesc   *bdesc = (BrinDesc *) indexInfo->ii_AmCache;
 	BrinRevmap *revmap;
 	Buffer		buf = InvalidBuffer;
 	MemoryContext tupcxt = NULL;
 	MemoryContext oldcxt = CurrentMemoryContext;
+	bool		autosummarize = BrinGetAutoSummarize(idxRel);
 
 	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
 
+	/*
+	 * origHeapBlk is the block number where the insertion occurred.  heapBlk
+	 * is the first block in the corresponding page range.
+	 */
+	origHeapBlk = ItemPointerGetBlockNumber(heaptid);
+	heapBlk = (origHeapBlk / pagesPerRange) * pagesPerRange;
+
 	for (;;)
 	{
 		bool		need_insert = false;
 		OffsetNumber off;
-		BrinTuple  *brtup;
+		BrinTuple  *brtup = NULL;
 		BrinMemTuple *dtup;
-		BlockNumber heapBlk;
 		int			keyno;
 
 		CHECK_FOR_INTERRUPTS();
 
-		heapBlk = ItemPointerGetBlockNumber(heaptid);
-		/* normalize the block number to be the first block in the range */
-		heapBlk = (heapBlk / pagesPerRange) * pagesPerRange;
-		brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-										 BUFFER_LOCK_SHARE, NULL);
+		/*
+		 * If auto-summarization is enabled and we just inserted the first
+		 * tuple into the first block of a new non-first page range, request a
+		 * summarization run of the previous range.
+		 */
+		if (autosummarize &&
+			heapBlk > 0 &&
+			heapBlk == origHeapBlk &&
+			ItemPointerGetOffsetNumber(heaptid) == FirstOffsetNumber)
+		{
+			BlockNumber lastPageRange = heapBlk - 1;
+
+			brtup = brinGetTupleForHeapBlock(revmap, lastPageRange, &buf, &off, NULL,
+											 BUFFER_LOCK_SHARE, NULL);
+			if (!brtup)
+				AutoVacuumRequestWork(AVW_BRINSummarizeRange,
+									  RelationGetRelid(idxRel),
+									  lastPageRange);
+		}
+
+		if (!brtup)
+			brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off,
+											 NULL, BUFFER_LOCK_SHARE, NULL);
 
 		/* if range is unsummarized, there's nothing to do */
 		if (!brtup)
@@ -664,9 +697,6 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	reltuples = IndexBuildHeapScan(heap, index, indexInfo, false,
 								   brinbuildCallback, (void *) state);
 
-	/* process the final batch */
-	form_and_insert_tuple(state);
-
 	/* release resources */
 	idxtuples = state->bs_numtuples;
 	brinRevmapTerminate(state->bs_rmAccess);
@@ -747,7 +777,7 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 
 	brin_vacuum_scan(info->index, info->strategy);
 
-	brinsummarize(info->index, heapRel,
+	brinsummarize(info->index, heapRel, BRIN_ALL_BLOCKRANGES,
 				  &stats->num_index_tuples, &stats->num_index_tuples);
 
 	heap_close(heapRel, AccessShareLock);
@@ -765,7 +795,8 @@ brinoptions(Datum reloptions, bool validate)
 	BrinOptions *rdopts;
 	int			numoptions;
 	static const relopt_parse_elt tab[] = {
-		{"pages_per_range", RELOPT_TYPE_INT, offsetof(BrinOptions, pagesPerRange)}
+		{"pages_per_range", RELOPT_TYPE_INT, offsetof(BrinOptions, pagesPerRange)},
+		{"autosummarize", RELOPT_TYPE_BOOL, offsetof(BrinOptions, autosummarize)}
 	};
 
 	options = parseRelOptions(reloptions, validate, RELOPT_KIND_BRIN,
@@ -792,12 +823,35 @@ brinoptions(Datum reloptions, bool validate)
 Datum
 brin_summarize_new_values(PG_FUNCTION_ARGS)
 {
+	Datum	relation = PG_GETARG_DATUM(0);
+
+	return DirectFunctionCall2(brin_summarize_range,
+							   relation,
+							   Int64GetDatum((int64) BRIN_ALL_BLOCKRANGES));
+}
+
+/*
+ * SQL-callable function to summarize the indicated page range, if not already
+ * summarized.  If the second argument is BRIN_ALL_BLOCKRANGES, all
+ * unsummarized ranges are summarized.
+ */
+Datum
+brin_summarize_range(PG_FUNCTION_ARGS)
+{
 	Oid			indexoid = PG_GETARG_OID(0);
+	int64		heapBlk64 = PG_GETARG_INT64(1);
+	BlockNumber	heapBlk;
 	Oid			heapoid;
 	Relation	indexRel;
 	Relation	heapRel;
 	double		numSummarized = 0;
 
+	if (heapBlk64 > BRIN_ALL_BLOCKRANGES || heapBlk64 < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("invalid block number " INT64_FORMAT, heapBlk64)));
+	heapBlk = (BlockNumber) heapBlk64;
+
 	/*
 	 * We must lock table before index to avoid deadlocks.  However, if the
 	 * passed indexoid isn't an index then IndexGetRelation() will fail.
@@ -837,7 +891,7 @@ brin_summarize_new_values(PG_FUNCTION_ARGS)
 						RelationGetRelationName(indexRel))));
 
 	/* OK, do it */
-	brinsummarize(indexRel, heapRel, &numSummarized, NULL);
+	brinsummarize(indexRel, heapRel, heapBlk, &numSummarized, NULL);
 
 	relation_close(indexRel, ShareUpdateExclusiveLock);
 	relation_close(heapRel, ShareUpdateExclusiveLock);
@@ -1063,17 +1117,17 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 }
 
 /*
- * Scan a complete BRIN index, and summarize each page range that's not already
- * summarized.  The index and heap must have been locked by caller in at
- * least ShareUpdateExclusiveLock mode.
+ * Summarize page ranges that are not already summarized.  If pageRange is
+ * BRIN_ALL_BLOCKRANGES then the whole table is scanned; otherwise, only the
+ * page range containing the given heap page number is scanned.
  *
  * For each new index tuple inserted, *numSummarized (if not NULL) is
  * incremented; for each existing tuple, *numExisting (if not NULL) is
  * incremented.
  */
 static void
-brinsummarize(Relation index, Relation heapRel, double *numSummarized,
-			  double *numExisting)
+brinsummarize(Relation index, Relation heapRel, BlockNumber pageRange,
+			  double *numSummarized, double *numExisting)
 {
 	BrinRevmap *revmap;
 	BrinBuildState *state = NULL;
@@ -1082,15 +1136,40 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 	BlockNumber heapBlk;
 	BlockNumber pagesPerRange;
 	Buffer		buf;
+	BlockNumber startBlk;
+	BlockNumber endBlk;
+
+	/* determine range of pages to process; nothing to do for an empty table */
+	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
+	if (heapNumBlocks == 0)
+		return;
 
 	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 
+	if (pageRange == BRIN_ALL_BLOCKRANGES)
+	{
+		startBlk = 0;
+		endBlk = heapNumBlocks;
+	}
+	else
+	{
+		startBlk = (pageRange / pagesPerRange) * pagesPerRange;
+		/* Nothing to do if start point is beyond end of table */
+		if (startBlk > heapNumBlocks)
+		{
+			brinRevmapTerminate(revmap);
+			return;
+		}
+		endBlk = startBlk + pagesPerRange;
+		if (endBlk > heapNumBlocks)
+			endBlk = heapNumBlocks;
+	}
+
 	/*
 	 * Scan the revmap to find unsummarized items.
 	 */
 	buf = InvalidBuffer;
-	heapNumBlocks = RelationGetNumberOfBlocks(heapRel);
-	for (heapBlk = 0; heapBlk < heapNumBlocks; heapBlk += pagesPerRange)
+	for (heapBlk = startBlk; heapBlk < endBlk; heapBlk += pagesPerRange)
 	{
 		BrinTuple  *tup;
 		OffsetNumber off;
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 0de6999..3937ffd 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -205,7 +205,11 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 	/* normalize the heap block number to be the first page in the range */
 	heapBlk = (heapBlk / revmap->rm_pagesPerRange) * revmap->rm_pagesPerRange;
 
-	/* Compute the revmap page number we need */
+	/*
+	 * Compute the revmap page number we need.  If Invalid is returned (i.e.,
+	 * the revmap page hasn't been created yet), the requested page range is
+	 * not summarized.
+	 */
 	mapBlk = revmap_get_blkno(revmap, heapBlk);
 	if (mapBlk == InvalidBlockNumber)
 	{
@@ -281,13 +285,13 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 			{
 				tup = (BrinTuple *) PageGetItem(page, lp);
 
-				if (tup->bt_blkno == heapBlk)
-				{
-					if (size)
-						*size = ItemIdGetLength(lp);
-					/* found it! */
-					return tup;
-				}
+				if (tup->bt_blkno != heapBlk)
+					elog(ERROR, "expected blkno %u, got %u", heapBlk, tup->bt_blkno);
+
+				if (size)
+					*size = ItemIdGetLength(lp);
+				/* found it! */
+				return tup;
 			}
 		}
 
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 72e1253..9da287d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -94,6 +94,15 @@ static relopt_bool boolRelOpts[] =
 {
 	{
 		{
+			"autosummarize",
+			"Enables automatic summarization on this BRIN index",
+			RELOPT_KIND_BRIN,
+			AccessExclusiveLock
+		},
+		false
+	},
+	{
+		{
 			"autovacuum_enabled",
 			"Enables autovacuum in this relation",
 			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 33ca749..6f4b6e8 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -92,7 +92,9 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "tcop/tcopprot.h"
+#include "utils/dsa.h"
 #include "utils/fmgroids.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -252,9 +254,10 @@ typedef enum
  * av_runningWorkers the WorkerInfo non-free queue
  * av_startingWorker pointer to WorkerInfo currently being started (cleared by
  *					the worker itself as soon as it's up and running)
+ * av_dsa_handle	handle for allocatable shared memory
  *
  * This struct is protected by AutovacuumLock, except for av_signal and parts
- * of the worker list (see above).
+ * of the worker list (see above).  av_dsa_handle is readable unlocked.
  *-------------
  */
 typedef struct
@@ -264,6 +267,8 @@ typedef struct
 	dlist_head	av_freeWorkers;
 	dlist_head	av_runningWorkers;
 	WorkerInfo	av_startingWorker;
+	dsa_handle	av_dsa_handle;
+	dsa_pointer	av_workitems;
 } AutoVacuumShmemStruct;
 
 static AutoVacuumShmemStruct *AutoVacuumShmem;
@@ -278,6 +283,32 @@ static MemoryContext DatabaseListCxt = NULL;
 /* Pointer to my own WorkerInfo, valid on each worker */
 static WorkerInfo MyWorkerInfo = NULL;
 
+/*
+ * Autovacuum workitem array, stored in AutoVacuumShmem->av_workitems.  This
+ * list is mostly protected by AutovacuumLock, except that once an item is
+ * marked 'active', other processes must not modify its work-identifying
+ * members, though changing its list pointers is still okay.
+ */
+typedef struct AutoVacuumWorkItem
+{
+	AutoVacuumWorkItemType avw_type;
+	Oid			avw_database;
+	Oid			avw_relation;
+	BlockNumber	avw_blockNumber;
+	bool		avw_active;
+	dsa_pointer	avw_next;	/* doubly linked list pointers */
+	dsa_pointer	avw_prev;
+} AutoVacuumWorkItem;
+
+#define NUM_WORKITEMS	256
+typedef struct
+{
+	dsa_pointer		avs_usedItems;
+	dsa_pointer		avs_freeItems;
+} AutovacWorkItems;
+
+static dsa_area	*AutoVacuumDSA = NULL;
+
 /* PID of launcher, valid only in worker while shutting down */
 int			AutovacuumLauncherPid = 0;
 
@@ -316,11 +347,16 @@ static AutoVacOpts *extract_autovac_opts(HeapTuple tup,
 static PgStat_StatTabEntry *get_pgstat_tabentry_relid(Oid relid, bool isshared,
 						  PgStat_StatDBEntry *shared,
 						  PgStat_StatDBEntry *dbentry);
+static void perform_work_item(AutoVacuumWorkItem *workitem);
 static void autovac_report_activity(autovac_table *tab);
+static void autovac_report_workitem(AutoVacuumWorkItem *workitem,
+						const char *nspname, const char *relname);
 static void av_sighup_handler(SIGNAL_ARGS);
 static void avl_sigusr2_handler(SIGNAL_ARGS);
 static void avl_sigterm_handler(SIGNAL_ARGS);
 static void autovac_refresh_stats(void);
+static void remove_wi_from_list(dsa_pointer *list, dsa_pointer wi_ptr);
+static void add_wi_to_list(dsa_pointer *list, dsa_pointer wi_ptr);
 
 
 
@@ -574,6 +610,28 @@ AutoVacLauncherMain(int argc, char *argv[])
 	 */
 	rebuild_database_list(InvalidOid);
 
+	/*
+	 * Set up our DSA so that backends can install work-item requests.  It may
+	 * already exist as created by a previous launcher.
+	 */
+	if (!AutoVacuumShmem->av_dsa_handle)
+	{
+		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+		AutoVacuumDSA = dsa_create(LWTRANCHE_AUTOVACUUM);
+		/* make sure it doesn't go away even if we do */
+		dsa_pin(AutoVacuumDSA);
+		dsa_pin_mapping(AutoVacuumDSA);
+		AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
+		/* delay array allocation until first request */
+		AutoVacuumShmem->av_workitems = InvalidDsaPointer;
+		LWLockRelease(AutovacuumLock);
+	}
+	else
+	{
+		AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+		dsa_pin_mapping(AutoVacuumDSA);
+	}
+
 	/* loop until shutdown request */
 	while (!got_SIGTERM)
 	{
@@ -1617,6 +1675,14 @@ AutoVacWorkerMain(int argc, char *argv[])
 	{
 		char		dbname[NAMEDATALEN];
 
+		if (AutoVacuumShmem->av_dsa_handle)
+		{
+			/* First use of DSA in this worker, so attach to it */
+			Assert(!AutoVacuumDSA);
+			AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+			dsa_pin_mapping(AutoVacuumDSA);
+		}
+
 		/*
 		 * Report autovac startup to the stats collector.  We deliberately do
 		 * this before InitPostgres, so that the last_autovac_time will get
@@ -2467,6 +2533,69 @@ deleted:
 	}
 
 	/*
+	 * Perform additional work items, as requested by backends.
+	 */
+	if (AutoVacuumShmem->av_workitems)
+	{
+		dsa_pointer		wi_ptr;
+		AutovacWorkItems *workitems;
+
+		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+
+		/*
+		 * Scan the list of pending items, and process the inactive ones in our
+		 * database.
+		 */
+		workitems = (AutovacWorkItems *)
+			dsa_get_address(AutoVacuumDSA, AutoVacuumShmem->av_workitems);
+		wi_ptr = workitems->avs_usedItems;
+
+		while (wi_ptr != InvalidDsaPointer)
+		{
+			AutoVacuumWorkItem	*workitem;
+
+			workitem = (AutoVacuumWorkItem *)
+				dsa_get_address(AutoVacuumDSA, wi_ptr);
+
+			if (workitem->avw_database == MyDatabaseId && !workitem->avw_active)
+			{
+				dsa_pointer		next_ptr;
+
+				/* claim this one */
+				workitem->avw_active = true;
+
+				LWLockRelease(AutovacuumLock);
+
+				perform_work_item(workitem);
+
+				/*
+				 * Check for config changes before acquiring lock for further
+				 * jobs.
+				 */
+				CHECK_FOR_INTERRUPTS();
+				if (got_SIGHUP)
+				{
+					got_SIGHUP = false;
+					ProcessConfigFile(PGC_SIGHUP);
+				}
+
+				LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+
+				/* Put the array item back for the next user */
+				next_ptr = workitem->avw_next;
+				remove_wi_from_list(&workitems->avs_usedItems, wi_ptr);
+				add_wi_to_list(&workitems->avs_freeItems, wi_ptr);
+				wi_ptr = next_ptr;
+			}
+			else
+				wi_ptr = workitem->avw_next;
+		}
+
+		/* all done */
+		LWLockRelease(AutovacuumLock);
+	}
+
+	/*
 	 * We leak table_toast_map here (among other things), but since we're
 	 * going away soon, it's not a problem.
 	 */
@@ -2499,6 +2628,103 @@ deleted:
 }
 
 /*
+ * Execute a previously registered work item.
+ */
+static void
+perform_work_item(AutoVacuumWorkItem *workitem)
+{
+	char	   *cur_datname = NULL;
+	char	   *cur_nspname = NULL;
+	char	   *cur_relname = NULL;
+
+	/*
+	 * Note we do not store table info in MyWorkerInfo, since this is not
+	 * vacuuming proper.
+	 */
+
+	/*
+	 * Save the relation name for a possible error message, to avoid a
+	 * catalog lookup in case of an error.  If any of these return NULL,
+	 * then the relation has been dropped since last we checked; skip it.
+	 * Note: they must live in a long-lived memory context because we call
+	 * vacuum and analyze in different transactions.
+	 */
+
+	cur_relname = get_rel_name(workitem->avw_relation);
+	cur_nspname = get_namespace_name(get_rel_namespace(workitem->avw_relation));
+	cur_datname = get_database_name(MyDatabaseId);
+	if (!cur_relname || !cur_nspname || !cur_datname)
+		goto deleted2;
+
+	autovac_report_workitem(workitem, cur_nspname, cur_relname);
+
+	/*
+	 * We will abort the current work item if something errors out, and
+	 * continue with the next one; in particular, this happens if we are
+	 * interrupted with SIGINT.  Note that this means that the work item list
+	 * can be lossy.
+	 */
+	PG_TRY();
+	{
+		/* have at it */
+		MemoryContextSwitchTo(TopTransactionContext);
+
+		switch (workitem->avw_type)
+		{
+			case AVW_BRINSummarizeRange:
+				DirectFunctionCall2(brin_summarize_range,
+									ObjectIdGetDatum(workitem->avw_relation),
+									Int64GetDatum((int64) workitem->avw_blockNumber));
+				break;
+			default:
+				elog(WARNING, "unrecognized work item found: type %d",
+					 workitem->avw_type);
+				break;
+		}
+
+		/*
+		 * Clear a possible query-cancel signal, to avoid a late reaction
+		 * to an automatically-sent signal because of vacuuming the
+		 * current table (we're done with it, so it would make no sense to
+		 * cancel at this point.)
+		 */
+		QueryCancelPending = false;
+	}
+	PG_CATCH();
+	{
+		/*
+		 * Abort the transaction, start a new one, and proceed with the
+		 * next table in our list.
+		 */
+		HOLD_INTERRUPTS();
+		errcontext("processing work entry for relation \"%s.%s.%s\"",
+				   cur_datname, cur_nspname, cur_relname);
+		EmitErrorReport();
+
+		/* this resets the PGXACT flags too */
+		AbortOutOfAnyTransaction();
+		FlushErrorState();
+		MemoryContextResetAndDeleteChildren(PortalContext);
+
+		/* restart our transaction for the following operations */
+		StartTransactionCommand();
+		RESUME_INTERRUPTS();
+	}
+	PG_END_TRY();
+
+	/* We intentionally do not set did_vacuum here */
+
+	/* be tidy */
+deleted2:
+	if (cur_datname)
+		pfree(cur_datname);
+	if (cur_nspname)
+		pfree(cur_nspname);
+	if (cur_relname)
+		pfree(cur_relname);
+}
+
+/*
  * extract_autovac_opts
  *
  * Given a relation's pg_class tuple, return the AutoVacOpts portion of
@@ -2946,6 +3172,45 @@ autovac_report_activity(autovac_table *tab)
 }
 
 /*
+ * autovac_report_workitem
+ *		Report to pgstat that autovacuum is processing a work item
+ */
+static void
+autovac_report_workitem(AutoVacuumWorkItem *workitem,
+						const char *nspname, const char *relname)
+{
+	char	activity[MAX_AUTOVAC_ACTIV_LEN + 12 + 2];
+	char	blk[12 + 2];
+	int		len;
+
+	switch (workitem->avw_type)
+	{
+		case AVW_BRINSummarizeRange:
+			snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
+					 "autovacuum: BRIN summarize");
+			break;
+	}
+
+	/*
+	 * Report the qualified name of the relation, and the block number if any
+	 */
+	len = strlen(activity);
+
+	if (BlockNumberIsValid(workitem->avw_blockNumber))
+		snprintf(blk, sizeof(blk), " %u", workitem->avw_blockNumber);
+	else
+		blk[0] = '\0';
+
+	snprintf(activity + len, MAX_AUTOVAC_ACTIV_LEN - len,
+			 " %s.%s%s", nspname, relname, blk);
+
+	/* Set statement_timestamp() to current time for pg_stat_activity */
+	SetCurrentStatementStartTimestamp();
+
+	pgstat_report_activity(STATE_RUNNING, activity);
+}
+
+/*
  * AutoVacuumingActive
  *		Check GUC vars and report whether the autovacuum process should be
  *		running.
@@ -2959,6 +3224,113 @@ AutoVacuumingActive(void)
 }
 
 /*
+ * Register one work item for the next autovacuum run on our database.
+ */
+void
+AutoVacuumRequestWork(AutoVacuumWorkItemType type, Oid relationId,
+					  BlockNumber blkno)
+{
+	AutovacWorkItems *workitems;
+	dsa_pointer		wi_ptr;
+	AutoVacuumWorkItem *workitem;
+
+	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+
+	/*
+	 * It may be useful to de-duplicate the list upon insertion.  For the only
+	 * currently existing caller, this is not necessary.
+	 */
+
+	/* First use in this process?  Set up DSA */
+	if (!AutoVacuumDSA)
+	{
+		if (!AutoVacuumShmem->av_dsa_handle)
+		{
+			/* autovacuum launcher not started; nothing can be done */
+			LWLockRelease(AutovacuumLock);
+			return;
+		}
+		AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+		dsa_pin_mapping(AutoVacuumDSA);
+	}
+
+	/* First use overall?  Allocate work items array */
+	if (AutoVacuumShmem->av_workitems == InvalidDsaPointer)
+	{
+		int		i;
+		AutovacWorkItems *workitems;
+
+		AutoVacuumShmem->av_workitems =
+			dsa_allocate_extended(AutoVacuumDSA,
+								  sizeof(AutovacWorkItems) +
+								  NUM_WORKITEMS * sizeof(AutoVacuumWorkItem),
+								  DSA_ALLOC_NO_OOM);
+		/* if out of memory, silently disregard the request */
+		if (AutoVacuumShmem->av_workitems == InvalidDsaPointer)
+		{
+			LWLockRelease(AutovacuumLock);
+			dsa_detach(AutoVacuumDSA);
+			AutoVacuumDSA = NULL;
+			return;
+		}
+
+		/* Initialize each array entry as a member of the free list */
+		workitems = dsa_get_address(AutoVacuumDSA, AutoVacuumShmem->av_workitems);
+
+		workitems->avs_usedItems = InvalidDsaPointer;
+		workitems->avs_freeItems = InvalidDsaPointer;
+		for (i = 0; i < NUM_WORKITEMS; i++)
+		{
+			/* XXX surely there is a simpler way to do this */
+			wi_ptr = AutoVacuumShmem->av_workitems + sizeof(AutovacWorkItems) +
+				sizeof(AutoVacuumWorkItem) * i;
+			workitem = (AutoVacuumWorkItem *) dsa_get_address(AutoVacuumDSA, wi_ptr);
+
+			workitem->avw_type = 0;
+			workitem->avw_database = InvalidOid;
+			workitem->avw_relation = InvalidOid;
+			workitem->avw_blockNumber = InvalidBlockNumber;
+			workitem->avw_active = false;
+
+			/*
+			 * Put this item in the free list.  Initialize avw_prev too:
+			 * remove_wi_from_list reads it, and freshly allocated DSA
+			 * memory is not guaranteed to be zeroed.
+			 */
+			workitem->avw_prev = InvalidDsaPointer;
+			workitem->avw_next = workitems->avs_freeItems;
+			workitems->avs_freeItems = wi_ptr;
+		}
+	}
+
+	workitems = (AutovacWorkItems *)
+		dsa_get_address(AutoVacuumDSA, AutoVacuumShmem->av_workitems);
+
+	/* If array is full, disregard the request */
+	if (workitems->avs_freeItems == InvalidDsaPointer)
+	{
+		LWLockRelease(AutovacuumLock);
+		dsa_detach(AutoVacuumDSA);
+		AutoVacuumDSA = NULL;
+		return;
+	}
+
+	/* remove workitem struct from free list ... */
+	wi_ptr = workitems->avs_freeItems;
+	remove_wi_from_list(&workitems->avs_freeItems, wi_ptr);
+
+	/* ... initialize it ... */
+	workitem = dsa_get_address(AutoVacuumDSA, wi_ptr);
+	workitem->avw_type = type;
+	workitem->avw_database = MyDatabaseId;
+	workitem->avw_relation = relationId;
+	workitem->avw_blockNumber = blkno;
+	workitem->avw_active = false;
+
+	/* ... and put it on autovacuum's to-do list */
+	add_wi_to_list(&workitems->avs_usedItems, wi_ptr);
+
+	LWLockRelease(AutovacuumLock);
+
+	dsa_detach(AutoVacuumDSA);
+	AutoVacuumDSA = NULL;
+}
+
+/*
  * autovac_init
  *		This is called at postmaster initialization.
  *
@@ -3079,3 +3451,59 @@ autovac_refresh_stats(void)
 
 	pgstat_clear_snapshot();
 }
+
+/*
+ * Simplistic open-coded list implementation for objects stored in DSA.
+ * Each item is doubly linked, but there is no tail pointer, and the "prev"
+ * pointer of the first item is InvalidDsaPointer rather than a link back to
+ * the list header.
+ */
+
+/*
+ * Remove a work item from the given list.
+ */
+static void
+remove_wi_from_list(dsa_pointer *list, dsa_pointer wi_ptr)
+{
+	AutoVacuumWorkItem *workitem = dsa_get_address(AutoVacuumDSA, wi_ptr);
+	dsa_pointer		next = workitem->avw_next;
+	dsa_pointer		prev = workitem->avw_prev;
+
+	workitem->avw_next = workitem->avw_prev = InvalidDsaPointer;
+
+	if (next != InvalidDsaPointer)
+	{
+		workitem = dsa_get_address(AutoVacuumDSA, next);
+		workitem->avw_prev = prev;
+	}
+
+	if (prev != InvalidDsaPointer)
+	{
+		workitem = dsa_get_address(AutoVacuumDSA, prev);
+		workitem->avw_next = next;
+	}
+	else
+		*list = next;
+}
+
+/*
+ * Add a workitem to the given list
+ */
+static void
+add_wi_to_list(dsa_pointer *list, dsa_pointer wi_ptr)
+{
+	if (*list == InvalidDsaPointer)
+	{
+		/* list is empty; item is now singleton */
+		*list = wi_ptr;
+	}
+	else
+	{
+		AutoVacuumWorkItem *workitem = dsa_get_address(AutoVacuumDSA, wi_ptr);
+		AutoVacuumWorkItem *old = dsa_get_address(AutoVacuumDSA, *list);
+
+		/* Put item at head of list */
+		workitem->avw_next = *list;
+		old->avw_prev = wi_ptr;
+		*list = wi_ptr;
+	}
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 3e13394..c4313a5 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -494,7 +494,7 @@ RegisterLWLockTranches(void)
 
 	if (LWLockTrancheArray == NULL)
 	{
-		LWLockTranchesAllocated = 64;
+		LWLockTranchesAllocated = 72;
 		LWLockTrancheArray = (char **)
 			MemoryContextAllocZero(TopMemoryContext,
 						  LWLockTranchesAllocated * sizeof(char *));
diff --git a/src/include/access/brin.h b/src/include/access/brin.h
index 896824a..3f4c29b 100644
--- a/src/include/access/brin.h
+++ b/src/include/access/brin.h
@@ -22,6 +22,7 @@ typedef struct BrinOptions
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	BlockNumber pagesPerRange;
+	bool		autosummarize;
 } BrinOptions;
 
 #define BRIN_DEFAULT_PAGES_PER_RANGE	128
@@ -29,5 +30,9 @@ typedef struct BrinOptions
 	((relation)->rd_options ? \
 	 ((BrinOptions *) (relation)->rd_options)->pagesPerRange : \
 	  BRIN_DEFAULT_PAGES_PER_RANGE)
+#define BrinGetAutoSummarize(relation) \
+	((relation)->rd_options ? \
+	 ((BrinOptions *) (relation)->rd_options)->autosummarize : \
+	  false)
 
 #endif   /* BRIN_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 1132a60..1b7ab2a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -564,6 +564,8 @@ DATA(insert OID = 335 (  brinhandler	PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0
 DESCR("brin index access method handler");
 DATA(insert OID = 3952 (  brin_summarize_new_values PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 23 "2205" _null_ _null_ _null_ _null_ _null_ brin_summarize_new_values _null_ _null_ _null_ ));
 DESCR("brin: standalone scan new table pages");
+DATA(insert OID = 3999 (  brin_summarize_range PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 23 "2205 20" _null_ _null_ _null_ _null_ _null_ brin_summarize_range _null_ _null_ _null_ ));
+DESCR("brin: standalone scan new table pages");
 
 DATA(insert OID = 338 (  amvalidate		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "26" _null_ _null_ _null_ _null_ _null_	amvalidate _null_ _null_ _null_ ));
 DESCR("validate an operator class");
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index 99d7f09..174e91a 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -14,6 +14,15 @@
 #ifndef AUTOVACUUM_H
 #define AUTOVACUUM_H
 
+/*
+ * Other processes can request specific work from autovacuum, identified by
+ * AutoVacuumWorkItem elements.
+ */
+typedef enum
+{
+	AVW_BRINSummarizeRange
+} AutoVacuumWorkItemType;
+
 
 /* GUC variables */
 extern bool autovacuum_start_daemon;
@@ -60,6 +69,9 @@ extern void AutovacuumWorkerIAm(void);
 extern void AutovacuumLauncherIAm(void);
 #endif
 
+extern void AutoVacuumRequestWork(AutoVacuumWorkItemType type,
+					  Oid relationId, BlockNumber blkno);
+
 /* shared memory stuff */
 extern Size AutoVacuumShmemSize(void);
 extern void AutoVacuumShmemInit(void);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 0cd45bb..9105f3d 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -211,6 +211,7 @@ typedef enum BuiltinTrancheIds
 	LWTRANCHE_BUFFER_MAPPING,
 	LWTRANCHE_LOCK_MANAGER,
 	LWTRANCHE_PREDICATE_LOCK_MANAGER,
+	LWTRANCHE_AUTOVACUUM,
 	LWTRANCHE_PARALLEL_QUERY_DSA,
 	LWTRANCHE_TBM,
 	LWTRANCHE_FIRST_USER_DEFINED
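
For quick reference, a minimal SQL sketch of the user-visible pieces the
patch adds (object names here are invented for illustration; the
"autosummarize" reloption and brin_summarize_range() come from the patch):

CREATE TABLE brin_demo (i integer);
CREATE INDEX brin_demo_idx ON brin_demo USING brin (i)
  WITH (pages_per_range = 2, autosummarize = on);
INSERT INTO brin_demo SELECT generate_series(1, 100000);

-- Summarize only the page range containing heap block 0; returns the
-- number of ranges newly summarized (0 or 1 for a single range):
SELECT brin_summarize_range('brin_demo_idx', 0);

-- Summarize every unsummarized range, as before:
SELECT brin_summarize_new_values('brin_demo_idx');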
#8Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#6)
Re: brin autosummarization -- autovacuum "work items"

On Tue, Mar 21, 2017 at 4:10 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

> Well, the number of work items is currently fixed; but if you have many
> BRIN indexes, then you'd overflow (lose requests). By using DSA I am
> making it easy to patch this afterwards so that an arbitrary number of
> requests can be recorded.

But that could also use an arbitrarily large amount of memory, and any
leaks will be cluster-lifespan.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#8)
Re: brin autosummarization -- autovacuum "work items"

Robert Haas wrote:

> On Tue, Mar 21, 2017 at 4:10 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Well, the number of work items is currently fixed; but if you have many
> > BRIN indexes, then you'd overflow (lose requests). By using DSA I am
> > making it easy to patch this afterwards so that an arbitrary number of
> > requests can be recorded.
>
> But that could also use an arbitrarily large amount of memory, and any
> leaks will be cluster-lifespan.

Good point -- probably not such a great idea as presented. The patch
only uses a fixed amount of memory currently, so it should be fine on
that front. I think this is a good enough start.
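
(For scale: with NUM_WORKITEMS = 256 and an AutoVacuumWorkItem of roughly
40 bytes on a typical 64-bit build, the fixed array comes to about 10 kB of
DSA memory, allocated once on first use.)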

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#7)
Re: brin autosummarization -- autovacuum "work items"

Alvaro Herrera wrote:

> I also removed the behavior that on index creation the final partial
> block range is always summarized. It's pointless.

I just pushed this, without this change, because it breaks
src/test/modules/brin. I still think it's pointless, but it'd require
more than one line to change.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#11Jeff Janes
jeff.janes@gmail.com
In reply to: Alvaro Herrera (#10)
1 attachment(s)
Re: brin autosummarization -- autovacuum "work items"

On Sat, Apr 1, 2017 at 10:09 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

> Alvaro Herrera wrote:
> > I also removed the behavior that on index creation the final partial
> > block range is always summarized. It's pointless.
>
> I just pushed this, without this change, because it breaks
> src/test/modules/brin. I still think it's pointless, but it'd require
> more than one line to change.

This is failing for me (and the entire build farm, it looks like).

Cheers,

Jeff

Attachments:

regression.diffsapplication/octet-stream; name=regression.diffsDownload
*** /home/jjanes/pgsql/git/src/test/regress/expected/brin.out	Sat Apr  1 11:53:51 2017
--- /home/jjanes/pgsql/git/src/test/regress/results/brin.out	Sat Apr  1 11:54:41 2017
***************
*** 425,431 ****
  SELECT brin_summarize_range('brin_summarize_idx', 0);
   brin_summarize_range 
  ----------------------
!                     1
  (1 row)
  
  -- nothing: already summarized
--- 425,431 ----
  SELECT brin_summarize_range('brin_summarize_idx', 0);
   brin_summarize_range 
  ----------------------
!                     0
  (1 row)
  
  -- nothing: already summarized

======================================================================