Proposal : For Auto-Prewarm.

Started by Mithun Cy, about 9 years ago, 108 messages

mithun.cy@enterprisedb.com

about 9 years ago

1 attachment(s)

# pg_autoprewarm.

This is a PostgreSQL contrib module that automatically dumps the block numbers
of all blocks present in the buffer pool at server shutdown (smart and fast
modes only; to be enhanced to dump at regular intervals) and loads those
blocks when the server restarts.

Design:
------
We have created a background worker, Auto Pre-warmer, which during shutdown
dumps the block numbers of all blocks in the buffer pool in sorted order.
The format of each entry is
<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.
Auto Pre-warmer is started as soon as the postmaster starts; we do not wait
for recovery to finish and the database to reach a consistent state. If there
is a "dump_file" to load, we load each block entry into the buffer pool for as
long as a free buffer is available. This way we do not replace any new blocks
that were loaded either by the recovery process or by querying clients. The
worker then waits until it receives SIGTERM, at which point it dumps the block
information in the buffer pool.
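
The dump format and the "load only while a free buffer exists" rule described above can be illustrated with a small standalone sketch. This is not the code from the patch: BlockEntry, write_dump, load_dump and the free_slots counter are hypothetical stand-ins (plain uint32_t fields instead of Oid/ForkNumber/BlockNumber, and a simple counter instead of the real free-buffer check), and the "load" step only copies entries into an array rather than reading pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct BlockEntry
{
	uint32_t	database;
	uint32_t	tablespace;
	uint32_t	filenode;
	uint32_t	forknum;
	uint32_t	blocknum;
} BlockEntry;

/* Compare field by field so blocks of one relation fork end up adjacent. */
static int
block_entry_cmp(const void *p, const void *q)
{
	const BlockEntry *a = p;
	const BlockEntry *b = q;

#define CMP_FIELD(f) \
	do { if (a->f != b->f) return (a->f < b->f) ? -1 : 1; } while (0)
	CMP_FIELD(database);
	CMP_FIELD(tablespace);
	CMP_FIELD(filenode);
	CMP_FIELD(forknum);
	CMP_FIELD(blocknum);
#undef CMP_FIELD
	return 0;
}

/*
 * Sort the entries, then write "<<count>>" followed by one
 * "db,tablespace,filenode,fork,block" line per entry.
 * Returns 0 on success, -1 on a write error.
 */
static int
write_dump(FILE *fp, BlockEntry *entries, uint32_t count)
{
	uint32_t	i;

	assert(fp != NULL);
	qsort(entries, count, sizeof(BlockEntry), block_entry_cmp);

	if (fprintf(fp, "<<%u>>\n", (unsigned) count) < 0)
		return -1;
	for (i = 0; i < count; i++)
	{
		if (fprintf(fp, "%u,%u,%u,%u,%u\n",
					(unsigned) entries[i].database,
					(unsigned) entries[i].tablespace,
					(unsigned) entries[i].filenode,
					(unsigned) entries[i].forknum,
					(unsigned) entries[i].blocknum) < 0)
			return -1;
	}
	return 0;
}

/*
 * Read entries back into out[], stopping at end of file, at a malformed
 * entry, or once free_slots (a stand-in for free buffers) is used up.
 * Returns the number of entries "loaded".
 */
static uint32_t
load_dump(FILE *fp, BlockEntry *out, uint32_t free_slots)
{
	unsigned	count = 0;
	uint32_t	loaded = 0;

	assert(fp != NULL);
	if (fscanf(fp, "<<%u>>", &count) != 1)
		return 0;

	while (loaded < count && loaded < free_slots)
	{
		unsigned	db, spc, rel, fork, blk;

		if (fscanf(fp, "%u,%u,%u,%u,%u", &db, &spc, &rel, &fork, &blk) != 5)
			break;				/* no more valid entries */
		out[loaded] = (BlockEntry) {db, spc, rel, fork, blk};
		loaded++;
	}
	return loaded;
}
```

Because the entries are sorted before they are written, a loader that stops early (for example because free buffers ran out) has still read whole runs of adjacent blocks, which is the sequential-read benefit the design aims for.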

HOW TO USE:
-----------
Build pg_autoprewarm and add it to shared_preload_libraries. The Auto
Pre-warmer process automatically dumps the buffer pool's block information
and loads those blocks when the server is restarted.
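
As a sketch of the step above, assuming the module has been built and installed under the name used in this patch, the corresponding postgresql.conf entry would look like this (changes to shared_preload_libraries only take effect after a server restart):

```
shared_preload_libraries = 'pg_autoprewarm'
```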

TO DO:
------
Add functionality to dump at regular intervals, based on a timer.
Some cleanups are also pending.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_autoprewarm_01.patch (application/octet-stream)

commit 52a1753ffbdaf6fa4d02ed0accded0feb634068c
Author: mithun <mithun@localhost.localdomain>
Date:   Thu Oct 27 16:58:53 2016 +0530

    Proposal pg_autoprewarm

diff --git a/contrib/pg_autoprewarm/Makefile b/contrib/pg_autoprewarm/Makefile
new file mode 100644
index 0000000..2e754b0
--- /dev/null
+++ b/contrib/pg_autoprewarm/Makefile
@@ -0,0 +1,15 @@
+# contrib/pg_autoprewarm/Makefile
+
+MODULE_big = pg_autoprewarm
+OBJS = pg_autoprewarm.o $(WIN32RES)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_autoprewarm
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_autoprewarm/README b/contrib/pg_autoprewarm/README
new file mode 100644
index 0000000..2fda8ea
--- /dev/null
+++ b/contrib/pg_autoprewarm/README
@@ -0,0 +1,27 @@
+# pg_autoprewarm.
+
+This is a PostgreSQL contrib module that automatically dumps the block numbers
+of all blocks present in the buffer pool at server shutdown (smart and fast
+modes only; to be enhanced to dump at regular intervals)
+and loads those blocks when the server restarts.
+
+Design:
+------
+We have created a background worker, Auto Pre-warmer, which during shutdown
+dumps the block numbers of all blocks in the buffer pool after sorting them.
+The format of each entry is <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.
+Auto Pre-warmer is started as soon as the postmaster starts; we do not wait
+for recovery to finish and the database to reach a consistent state. If there
+is a "dump_file" to load, we load each block entry into the buffer pool for as
+long as a free buffer is available. This way we do not replace any new blocks
+that were loaded either by the recovery process or by querying clients.
+
+HOW TO USE:
+-----------
+Build pg_autoprewarm and add it to shared_preload_libraries. The Auto
+Pre-warmer process automatically dumps the buffer pool's block information and
+loads those blocks when the server is restarted.
+
+TO DO:
+------
+Add functionality to dump based on timer at regular interval.
\ No newline at end of file
diff --git a/contrib/pg_autoprewarm/pg_autoprewarm.c b/contrib/pg_autoprewarm/pg_autoprewarm.c
new file mode 100644
index 0000000..f23a8db
--- /dev/null
+++ b/contrib/pg_autoprewarm/pg_autoprewarm.c
@@ -0,0 +1,465 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_autoprewarm.c
+ *	Automatically dumps and loads the buffer pool.
+ *
+ *	contrib/pg_autoprewarm/pg_autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "catalog/pg_class.h"
+#include <unistd.h>
+
+PG_MODULE_MAGIC;
+
+static void AutoPreWarmerMain (Datum main_arg);
+static bool
+load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum);
+
+/* Primary functions */
+void            _PG_init(void);
+
+/* Secondary/supporting functions */
+static void     sigtermHandler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	set our latch to wake it up.
+ */
+static void
+sigtermHandler(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* Meta-data of each persistent page buffer which is dumped and used to load. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;	/* database */
+	Oid			spcNode;	/* tablespace */
+	Oid			filenode;	/* relation */
+	ForkNumber	forknum;	/* fork number */
+	BlockNumber	blocknum;	/* block number */
+}BlockInfoRecord;
+
+/* Try loading only once during startup. If any error do not retry. */
+static bool avoid_loading = false;
+
+/*
+ * And avoid dumping if we receive sigterm while loading. Also do not re-try if
+ * dump has failed previously.
+ */
+static bool avoid_dumping = false;
+
+/* compare member elements to check if they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * sort_cmp_func - compare function used while qsorting BlockInfoRecord objects.
+ */
+static int
+sort_cmp_func(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+#define DEFAULT_DUMP_FILENAME "autoprewarm"
+
+/*
+ *	load_block -	Load a given block.
+ */
+bool
+load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum)
+{
+	Buffer      buffer;
+
+	/* Load the page only if there exist a free buffer. We do not want to
+	 * replace an existing buffer. */
+	if (have_free_buffer())
+	{
+		SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
+
+		/*
+		 * Check if fork exists first otherwise we will not be able to use one
+		 * free buffer for each non existing block.
+		 */
+		if  (smgrexists(smgr, forkNum))
+		{
+			buffer = ReadBufferForPrewarm(smgr, reltype,
+										  forkNum, blockNum,
+										  RBM_NORMAL, NULL);
+			if (!BufferIsValid(buffer))
+				elog(LOG, "\n Skipped the buff page. \n");
+			else
+				ReleaseBuffer(buffer);
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ *	load_now - Main routine which reads the dump file and loads each block.
+ *	We try to load each blocknum read from DEFAULT_DUMP_FILENAME for as long as
+ *	a free buffer is left and SIGTERM has not been received. If we fail to load
+ *	a block we ignore the ERROR and try to load the next blocknum, because the
+ *	corresponding block might have been deleted.
+ */
+static void load_now(void)
+{
+	static char dump_file_path[MAXPGPATH];
+	FILE *file = NULL;
+	uint32 i, num_buffers = 0;
+	bool have_free_buf = true;	/* kept across iterations so the loop can stop */
+
+	avoid_loading = true;
+
+	/* Check if file exists and open file in read mode. */
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s.save", DEFAULT_DUMP_FILENAME);
+	file = fopen(dump_file_path, PG_BINARY_R);
+
+	if (!file)
+		return;	/* No file to load. */
+
+	if (fscanf(file,"<<%u>>", &num_buffers) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("Error reading num of elements in \"%s\" for autoprewarm : %m", dump_file_path)));
+	}
+
+	elog(LOG, "Num buffers : %u \n", num_buffers);
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		RelFileNode	rnode;
+		uint32		forknum;
+		BlockNumber	blocknum;
+
+		if (got_sigterm)
+		{
+			/*
+			 * Received shutdown while we were still loading the buffers.
+			 * No need to dump at this stage.
+			 */
+			avoid_dumping = true;
+			break;
+		}
+
+		if(!have_free_buf)
+			break;
+
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &rnode.dbNode, &rnode.spcNode,
+							 &rnode.relNode, &forknum, &blocknum))
+			break;	/* No more valid entry hence stop processing. */
+
+		PG_TRY();
+		{
+			have_free_buf = load_block(rnode, RELPERSISTENCE_PERMANENT,
+									   (ForkNumber)forknum, blocknum);
+		}
+		PG_CATCH();
+		{
+			/* Any error handle it and then try to load next buffer. */
+
+			/* Prevent interrupts while cleaning up */
+			HOLD_INTERRUPTS();
+
+			/* Report the error to the server log */
+			EmitErrorReport();
+
+			LWLockReleaseAll();
+			AbortBufferIO();
+			UnlockBuffers();
+
+			/* buffer pins are released here. */
+			ResourceOwnerRelease(CurrentResourceOwner,
+								 RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, true);
+			FlushErrorState();
+
+			/* Now we can allow interrupts again */
+			RESUME_INTERRUPTS();
+		}
+		PG_END_TRY();
+	}
+
+	fclose(file);
+
+	elog(LOG, "loaded");
+	return;
+}
+
+
+/*
+ *	dump_now - Main routine which goes through each buffer header and dumps
+ *	its metadata in the format
+ *	<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. We sort this data
+ *	and then dump it. Sorting is necessary as it facilitates sequential reads
+ *	during load. Unlike load, if we encounter any error we abort the dump.
+ */
+static void dump_now(void)
+{
+	static char		dump_file_path[MAXPGPATH],
+					transient_dump_file_path[MAXPGPATH];
+	uint32			i;
+	int				ret;
+	uint32			num_buffers;
+	BlockInfoRecord	*block_info_array;
+	BufferDesc		*bufHdr;
+	FILE			*file = NULL;
+
+	avoid_dumping = true;
+	block_info_array = (BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	if (!block_info_array)
+		elog(ERROR, "Out of Memory!");
+
+	for (num_buffers = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32 buf_state;
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/* Only valid and persistent page buffers are dumped. */
+		if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID) &&
+			(buf_state & BM_PERMANENT))
+		{
+			block_info_array[num_buffers].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_buffers].spcNode  = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_buffers].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_buffers].forknum  = bufHdr->tag.forkNum;
+			block_info_array[num_buffers].blocknum = bufHdr->tag.blockNum;
+			++num_buffers;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/* Sorting now only to avoid sorting while loading. */
+	pg_qsort(block_info_array, num_buffers, sizeof(BlockInfoRecord), sort_cmp_func);
+
+	snprintf(transient_dump_file_path, sizeof(transient_dump_file_path),
+			 "%s.save.tmp", DEFAULT_DUMP_FILENAME);
+	file = fopen(transient_dump_file_path, "w");
+	if (file == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m", transient_dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s.save", DEFAULT_DUMP_FILENAME);
+
+	/* Write num_buffers first and then BlockMetaInfoRecords. */
+	ret = fprintf(file, "<<%u>>\n", num_buffers);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("error writing to \"%s\" : %m", transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+							block_info_array[i].database,
+							block_info_array[i].spcNode,
+							block_info_array[i].filenode,
+							(uint32)block_info_array[i].forknum,
+							block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					errmsg("error writing to \"%s\" : %m", transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = fclose(file);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("error closing \"%s\" : %m", transient_dump_file_path)));
+
+	ret = unlink(dump_file_path);
+	if (ret != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("unlink \"%s\" failed : %m", dump_file_path)));
+
+	ret = rename(transient_dump_file_path, dump_file_path);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("Failed to rename \"%s\" to \"%s\" : %m",
+					   transient_dump_file_path, dump_file_path)));
+
+	elog(LOG, "Buffer Dump: saved metadata of %u blocks", num_buffers);
+}
+
+/* Extension's entry point. */
+void _PG_init(void)
+{
+	BackgroundWorker auto_prewarm;
+
+	/* Register AutoPreWarmer. */
+	MemSet(&auto_prewarm, 0, sizeof(auto_prewarm));
+	auto_prewarm.bgw_main_arg = Int32GetDatum(0);
+	auto_prewarm.bgw_flags      = BGWORKER_SHMEM_ACCESS;
+
+	/* Register the Auto Pre-warmer background worker */
+	auto_prewarm.bgw_start_time = BgWorkerStart_PostmasterStart;
+	auto_prewarm.bgw_restart_time   = 0;  /* Keep the Auto Pre-warmer running */
+	auto_prewarm.bgw_main           = AutoPreWarmerMain;
+	snprintf(auto_prewarm.bgw_name, BGW_MAXLEN, "Auto Pre-warmer");
+	RegisterBackgroundWorker(&auto_prewarm);
+}
+
+/*
+ * AutoPreWarmerMain -- Main entry point of Auto-prewarmer process.
+ *						This is invoked as a background worker.
+ */
+static void AutoPreWarmerMain (Datum main_arg)
+{
+	MemoryContext	autoprewarmer_context;
+	sigjmp_buf		local_sigjmp_buf;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+
+	/*
+	 * Create a resource owner to keep track of our resources.
+	 */
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "AutoPreWarmer");
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	autoprewarmer_context = AllocSetContextCreate(TopMemoryContext,
+												 "AutoPreWarmer",
+												 ALLOCSET_DEFAULT_MINSIZE,
+												 ALLOCSET_DEFAULT_INITSIZE,
+												 ALLOCSET_DEFAULT_MAXSIZE);
+	MemoryContextSwitchTo(autoprewarmer_context);
+
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 * See notes in postgres.c about the design of this coding.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		LWLockReleaseAll();
+		AbortBufferIO();
+		UnlockBuffers();
+
+		/* buffer pins are released here. */
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+
+		MemoryContextSwitchTo(autoprewarmer_context);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(autoprewarmer_context);
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Close all open files after any error. */
+		smgrcloseall();
+
+		/* Error while dumping is treated as fatal hence do proc_exit */
+		if (avoid_dumping)
+			proc_exit(0);
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+	if (!avoid_loading)
+		load_now();
+	while (!got_sigterm)
+	{
+		int rc;
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						10 * 1000L, PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+
+	if (!avoid_dumping)
+		dump_now();
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 2b63cd3..f319953 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -693,6 +693,20 @@ ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
 							 mode, strategy, &hit);
 }
 
+/*
+ * ReadBufferForPrewarm -- This new interface is for pg_autoprewarm.
+ */
+Buffer
+ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+					 ForkNumber forkNum, BlockNumber blockNum,
+					 ReadBufferMode mode, BufferAccessStrategy strategy)
+{
+	bool        hit;
+
+	return ReadBuffer_common(smgr, relpersistence, forkNum, blockNum,
+							 mode, strategy, &hit);
+}
+
 
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 88b90dc..da66094 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,19 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- Checks whether there is a free buffer in the
+ * buffer pool. Used by the pg_autoprewarm module.
+ */
+bool
+have_free_buffer(void)
+{
+	if	(StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index e0dfb2f..a156bf2 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -316,6 +316,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 7b6ba96..2c70914 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -16,6 +16,7 @@
 
 #include "storage/block.h"
 #include "storage/buf.h"
+#include "storage/smgr.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
@@ -183,6 +184,10 @@ extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
+extern Buffer ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+								   ForkNumber forkNum, BlockNumber blockNum,
+								   ReadBufferMode mode,
+								   BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern void MarkBufferDirty(Buffer buffer);

Mithun Cy

mithun.cy@enterprisedb.com

about 9 years ago

In reply to: Mithun Cy (#1)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Thu, Oct 27, 2016 at 5:09 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> This is a PostgreSQL contrib module that automatically dumps the block numbers
> of all blocks present in the buffer pool at server shutdown (smart and fast
> modes only; to be enhanced to dump at regular intervals) and loads those
> blocks when the server restarts.

Sorry, I forgot to add the warm-up time performance measurements done with
this patch. Adding them now.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_autoprewarm_perf_results.ods (application/vnd.oasis.opendocument.spreadsheet)

��}��K��\�����)S�u��|��V�ZM�8��e���n�z��S�N
�N�������(�����%k[�)B�����Q8�.�U����~����'����D���kw�+��+�4��EX�t��C�Si
������#�y{xx80f�����z��1��EEE
2��6m��5B����{������M���1����	� H�A���u��-h�"0������a����oG>
%�.����T�x��A�=\���T�c>�
9:\YmZZ	�$*y~�7��)�@����)G�k��y��U�l�����������f2�����(S�L4J����111�m�����������*��0�\h��9�+%�J�a��/������������/��r1�y�1�w��e?�]�f��>��occ_��kJ���{��H����FFY'QV���R�i�>��g�uDGG������t��={��\���_

z��1�I�&HA��k��!~���f��q�����-������yoa�7G|}�	u)G&>����9��;/:��x�Z�r���#:U-����:��g���#}Y��c��fF�*�������Kdd$����,�:t(&M6l��y���#�"��=�3������g�5
(��Q#�F�!�;��O�������`�����N�7t��a����_/�gB�:mj~z������Y���T����=���7�b ����A�A�����%�4,�',,�B$+��@����W��o^r	��
R$�i�6�f���as�����������=�I�P�r���:��sl��R��%���Juh�QefU��������*��?Y�K��@����P�ry�����c,�<P���D����CE�Z��7~R��?Y�P��(�-y6R\Q����D������Y�-��:}�������I�������lt�hZU�U�F����i��s(6����~8g9Gfh��n>p]��$�(��;��fG/?~�9:v��J����d4�uI�!���?|c��]!�_9�z����G�{Y�:R���������Q�R)b�f�R�$����,�77�����]}2y��]'n��q���0/G���?�O�@�������] {�\~�����M�����{e��k$&�l�|K�~T�@@�)Y\�V�������o��NI��X����	��3(�Wm��N�����_&��0M�(�0Z��R��w$���	�R�z�������G�_���^��{WC6��`���K�\�.���^GG��!q��9}��\mr�������~��r��-����T�
�3�1���������^�x�������-Z���#!!������sww����z�j��%1c�JMO5�'1�OL�\\����
�&j��4~;>n�.b�[�(���F���HH4=|�?���k!_�}�4���0����[�2���d��r������������Ld;�d�P�P�(P���o�3g�`Z����-Z���'{��K�.<x�?��?�����o��T�`��K�,�:u���7�{�����'S�����,��\�xf?���*��3w�-�������
��&%�)����lt,�����1HH0=���x�;��z�U���C�c�ds
9W6$*)���U�X%n#�2nU3d�{�z�����=X�l��_�^�V��?�Z2g��~��ar�i��Y��Y�&�C���cG��]�k�y��`�8�gN&w���-��x���WQ=�m%��n����(����0��x9;��x4��q�#�1�[g[2PvJ���J�Js������������~��`ZNWk`�r��	5j��=�_�v�s���g�����W|��R��_�������x��plD$gJ�
���-�m'm{��u���;�
}�	���02{��@,�\Yn��p�5-�U;%�/
RC ��&�H���&���r+k�*�Zv��+�M�N.���?���0&O���������������O��x�2eBBB��������n�2�L��T�09��<��nv�����F��I���l�b��s'�d����	�r���e:���nP��T�2'��Eo>$�3~�������B��]��A�P���U�v��
�t��]z8�� �D�����4iB�TP���[���^p���@
�q0Bi���x���%�.<��
h�?.���m���CC��7@�_~����7=4)88���&�F��O�Y���1�)�]]]�]�pt�ew���@���/'H�&F��7I�K�����Q��UjW���zmb�	���OYm���Vc4�)�a����P#�G�
h"T���:4���5 U�+�@^>!!�8���8.�d�h��[�B�g����8����&v��p��V�Z�2�4��z�'��?�,~Ywb;�"�T���3����q���'ng���iB;���2�uu��ZJ?��t�*M��:�V	�$*yf��?O����1��=������+��!f�M>�3���a������k���T$��_{�����z��c8�YZE��B�l/�t�������h��&7�s����s�o���r���0�}��@QiD�&-�-��J�e#A��6����d{�j��`��~�j2Y��'�>"h�J@7��;;d���������q�>o��.�d�u�������48�D|Xt�+G��]*��Fq�?�;"�S�Fi3����F�(j�W-[�<�����3g�*U�C������i��������s����v��m����[�����O��Y��T���;u�$;o���:����������0���T���j�������2�Q����0i{[d�i"h�S�t�Kk�j���*cR�'V�� �HE���
����G��o�~��1,8|�p�.]|||&M�4q��g����sg��)�~���'O���;y�d��1W�^�:�r����_'?{�����#���=��SF���Z��Y��R�M�U�lv;e�N�"!
��S�>&��F������_����;_�|��w�1cF@@@�Z��
6|�p��s����-���+w��y������WbbbBB������{_
]x�+H��Ku�eD�
Er���SfL�`n�SD��T��JI�Nq����'*��������j�K�Y���,%�Hll,J%����p�A�����9��)��K��nN���5���%�(�F5��^�W�M�����?�[^5<������x�|�����{��������������W�����F �����Ev�B�Z��]|�}�F�re�)���=�2�/�����%��C�����U�V���.\���n���-[6o�<o��k��m��m��Ms�����={�-[������^����M���I\��R�b�S�|J�����xX,N�����Q�F��k���&��h��h���8�Mo�08r���b@���8��D}�4Q�	�kZZ	�$*y~�7��OK�-Az���}^������\�{=�B���
�J��+�������p��6�������OW��4�j��U�����m�%���jf�\���&��k	MG���l�d8��{R��z�Im�TI����S�������TZ�����>���GA��M�����:w���'*W�\�bE�=�R;�C�*o���g��)+v��V���t^1w�>"37�y�Tin��9e��������/��x���m��J����o�>T�j�#G�:t��V�.<�S0�z��W��HU�f�d��K��
��#2d�YA^{�F��VP���Zu����CR�se�U�9'�C3��~1�Z�����u!�Hdd$}���[����6�a�������*��4�Mbb"��j�t�
`_4�Yd!..MH��0P������Uoh:(@...,=�{�������d��K�.���v�������F��zh48<<�������|�����m��d������7�
�������D�_VD��t�q��5o��1~�x��u��a{��4��6t��K7�0W����a()h����a<�ohDF��u�D�����O���"nnn6�Ue���U��j��2�zK��U6�P���`5@�OReX'�db�bR?�+)���7�������8~6����>3�AjiA4X�rC��AA�
gy
�*SeY�<�-��H[�O��|��w�(�G�sb{J<0����v��-�I)�m���6��&r�[���P��#X�r�XDy���pv\�T�may5�,)=%CG �W�k"~xJ�amT���/�����@u1�i�]|��
��v!�h�����>������"��J�*�f���|�����pz�1�����+i_"��k�<x�������{�n�B�-���-��Li����q�+W���C^t���2��y��!��Q����~e�r�l�r���E����A����P�����T?����#��9=zt����U�z{{C&p%�������dm�$.88�R�Jh���@��u1�=z�(/����|J+'�������5j���_��5Kz��VAb[�l���?�b4�G��-B|�����gx������I����S�>D
���z���p(>M�>6���q��qx9���w��o�^:�������~��;w�,U�z]���,TA���M�6�=��-[v��a�71b�������Q��7��,�m�n���#f��V�:v�4~������{��Z����o�9��y*T�[�n@�.]���*H,�XA�
4@W���{������'Dk1P���6�:F�W���1�N�8�x�b����c�~�d�y���e�s���1cP*O�<3f���u+0o��q:t ��_`���K��JN��P�]�vu��]j�dd����m��F����,@j�A+���S����� �I4�|�
��!��U����a���*HT��U��/V��a����9�I��f���sh��3g�������xt3���U�
�M��8������?}�FJ���i��:&-	 ?��O?��R�3g6�&DFF�4��z��C��eCBB�HW�C�<q�'��y�Bw�p}�����,B�;�����;7�������n�S	h�B(������O���w��u�<������V�$���o���T���s�m��q����IB�5o��^R���s8[��aM�NQ�y��1@��:
��H�7��8�b(R�#L-@�l@ ��*43k���_�;v,����+WB���wzC1�Y�f�6t����_�}�N�����k�<yI��3�]�v�:���/Y��~��������`������T�^��2@�+��v�ecbb��8����i~����O�8��Q~��E��3OE:�������t�|���;�l�,�"x<�&M�( �=�tx�g��-Cq��@\`?,tG��=as����?�'���.*+�>`}�C,S�k������]���Z�����l��`uN����0�0�LoaQU��5�1�j��"��+z�AJAU�0P��i��|@�K�-�VKo�T��P�c�����g��MqWkr@����������
�6�[dC����bY�=�r�;�d3F����@��$��9��x���lt��A!�u��(�P}��'%Q�����2��w���"B;����^u��`Y:u��vo&
C�����@Y%o(H�s	s	2F�DI��%���k�FZ�Tv��rW���U�(�,+�l�1�������'c����	DK8�e��K}��jc�2g�d��MP�,���1:VE���
���*���0(qH*t[
*q%�tTaI��eUHK������U�����N-:R�T����{���V�JJ5b��4�E�v[�f���V�Kk�j���l�>�G����?�w���"eN-�F�$��B6H�vY���E8l�+/���1UR�B�Re���`e�����W6S�7�����S��l�OS�EM��U����	Z����C���X^���t��tMax:!"�^����������j}���4����RV������<��L�W������#G^�~�Hlll�Z�r��)-K�Pm��paaa�^�*Z��tE�O�}�""]��,�			�N�B��U>}���4n�o��@_�����h4�*U
?�?~������z�B�
qqq{����*T�X�'<<������aC�N����K����Q����������A�g���F���� �q��...7o�|��A�%�������������Ok��)}=��`�Oe�|��A�r���xz�wpp0�_��R�JW�^��,[��u���~��(Py>|���|��i�Wddd�N��]�6g���M�^�x1((h��E(8��:u�@-Pj�����|�������z��8q���r�����CM�%NNN����}�64	:�'O��k-!kI��g�_���'::��QG�yzzBV�Z��s�5k��y�:$���gk���)�!q��M����^�z����/����L�2�;w�]�v�>~����te���/_��R�
�,��
8Z�x�j�Pld���y?�m��Mi���=�Us���d���Z�U���i��&YP'"""**�����{����9r�/�k,����� ����P
Zj��/��jhA"Tm�����"D��
<�����]��
S������Bq�8��+[d�I���|?T��$���-���I�l��"�6�.�z{{���J(Bf(�����ZK:�lC�����m��'O��@c��������uk84�)8=�h��n���K���a�@/���~�]��l��o4�Ow[Tk�� K��??{Do���������e�*�����~S��)���b��-Z�t��+V�qzC
\(����O�>}�0":m�oXL�Z���}OUW�v����j�����jQ�!��:�������l3�����'Y��{:��Z@�rf�\���`���� g9����+[��v����S�%(�5�v
�fJY�ZB����d�7�D�#���`-�(�v���=�=����
o�de���U��U���(�F��K�I�&�Q���:���eQ�^j�TV������<�n�uJYe�
:!}MK+A�D%���K�����o��F�%27J5.�)��t~*��W�JJ5b��4�E�v[�f�Ei��Kk�j���l�>������Z�-�M�~�a/����ZB�� ���=�/���h�j}��a�������U�@5�o�!=�����u&��7v;�E�n����ot��Z�-�Cx��[�����:kU�cJ"6�����m!���A���&��k	�v\'��b��y�z��}�=dx���=�'������o��M�����ob�O�_���|{-o�F�V��Z�-U3��4���5Q5]Kh�tv;e�	�u?���:�Z2���7���`�o���:�v��2>���$�20����9eU�XU����Z���,����p��s��R��U��U���(�F��K�I����������R��	d)�0�Z��JkWVduYH�����g�*d�dyd�����(� cU*j���$���Eu�I�seoJ3����7��F��DN��:��}nunl��Pg��EM�����U6l��������������O���Y�U6l�������:�U�`�M"v�������F�FI+}C}V���veEV�m�J:2�U���������V�d���U��U���HR���/����7���vJ���N��);���n�[�F��]K�m���H��Dl$k�-���BP2��{iMTM����N���!}���B�-���BP5���m���D�t-���},�����7��oT+P�XL�B��`cl:��EM��U���-tB���V�V^5���������&��$�l�BA'��ii%hu����7���`�O��S�L����n�t���������N[Tk���o�x�Y�*�	�Q�J�t���g_�S�OY:}[�jzl7RVm�-��cJ����j}��a�������U�@5��Ni��)=;���x��^;��2����.��n���������7-���L[�����H��)��H�j[Tk������:��5Q5]Kh����7:D�xc�;������F�}{��`�{HO��7�D�������7*)v����`�{HO���=�'���S�����������C���k�
[(���5-��NDT3���=�'���������E�v[�/]�������%4��W��/������U9�*->�z�_�����m����_�fj���d��E�v[�f�Ei��Kk�j���l�>������Z�-U3�����HTM��-���o�x����xc������H���`��{M"V��l�BA'��ii%h��T�`�o���:�v�F����!�o����M%m���IEND�B`�PKU][Isettings.xml�ZKw�:��_���=)��i�$��W�������	V#K>��!��2�6��!`�vqY�����fF3c�?�Cr4.0�Z��������B
;��O�������3/��cR�)�H-���
_h1�
�
�B
�5Xt���|vcI�]�L/�@��Q�$I�.9y���R������z�����J��~N��A�.��Y��t������VB>SMMk�������� �8��T7G���h��l�0$?��m[�r��v	��E�zP."5�����y�W�7_��,y�}l�>�O�8�xl��Q?�����S��orA�}��k�}��.C��7��+���AT�$i
�!�)��&��%v �;X�s��
X*����=w�%��6����{��'F%"vD��f>l�?`�.�W���k����/��$���Bt�����%�Z�������)����-&%�0J���:����Y�����B[�~�'�����f�v#����<���[.>?-���a}W�Z��%����3B\�s���im�����������r��i.�go�����f�-�o7�]�L���������0�MX�C���I��~��������J�8�Kz�|U��B��$*����_c��9�����a��-���(2		�]�ge�}��IJc�@�������{F�M��I���v��x�E���o~_����9��zg?��KV���	��-��B�Q��o�=\1�[*�0JeKhN�V�JM�D���W� ;@<����#zSO�hK5U�Ed1�?#�����R����,�8�tc
��nT��z-�x�r������/�����[�}��������e�R�xS��|�t�3$-�y}��0����}�
#��O_��a[��~os������'w��Us���et�h�z!����0��w��������i�<j-��D7�����=�s3�^�Gnh�:O~�J\z[�6����!�4���|��J�=�����~r�Nm2N�j<�t'���It_����C�m2���n��`����������:q������a����^�����n9���3��C����V�qZ��<�I� ��9�Q(p��f�G3�|$��UaD������%3��r���~G�z���_���������d�!�I����&���IJ�O�-B�0}�\9��{�_����Q����C:�������:r��4F���W�����S���`��9�*��t�j���-�b��.�nw@hS���-7v���{e7g�i)�Z��\����;���[W~y
_��B�;PK���e'� PKU][Icontent.xml�]k�����_ax��� �/Q����E��Y��l�~�H����$�c�^J�L{���$R�$�dL�C^^^������<����N��f���L�IZ�o����O����7�j��f���.7E��e���3@��{z3�U�����^Qn�e/��)z���^�mu%u�������n�C�[�#lt�o�5v�I�c�����W%�Pg�������IO�x�����|�4��bqO�)����a�h����vWe�U/Lflc����ms�DX����R��oM�M�DOFu[�L������\����hv��/�9�D�g��1UD���H\l5���?����O���rl[��(Tq�n����]|Y����M��]N�\t���g����1�c?kGY<D���
��,<sg)?L"���/���q�\���?��oL��O{iQ7Qq�Le�bO�Ee�e��Y��F��m�<�,�io�����)�# #0�����w����!\�F��>`tam�)	�}�j=�O�rW$�t�0[S��Q����Q
.���3���iN
GB�����C��VS�^^���v�����*�Ug�^&��O&}\��97������<�l���o�I��m�tZP/����*����8����4~(�u���7��),mxg�fqo�����������vb��gGU[{om
�E}�����6mb���J[�-�w��������rl0.=�������L����Q��F����=����*c.�y��,.�m_���4��z�?���K64���U'��]^�{�[�maV��IM=[����D�[*�M�5��������p_�y����s���|���/�ML��P�e���|X���K|��������{d7&]o@�)�>���wxW��6ie��n�����D���sH�M�m���z]E��9����ng���6����k� ����L��������4�M��O�Qe����g����`�F/��u�������U��Wy����2<���QNh;ZW�vsB�C��E�CqU���2��c���`.	����e,+��&Iv���������/�S[qY%v/I	U�fV��f����7	�%������j]=�}m�i<Da����&�PYU~�*�����E�4��
������U�[i3�S����l��*OaK��V�h�_����������~1�F������������!�Bg�zcL��M��v�������Mc��C���n���vt��j�]�?����]��t�>��zm���V����o�#Ilb���C-o��;�Em�!�����'a���g��,$�����`>0�<v,��T��
���V�M�z� F�-l�`�\yUT�M�w����e������?������`����<���7��,�I�S��x�Me`�O���f}y�����i�mw�mY���+&�5I_��:v3/��e���m{���m�gKr�?�z�.���\q������M��|l����������-�N{;�='�t���u�_�@��^P���>C>x���I���������}z\N�s���S��s�-^
��v�\�Z~�1�LG_������L������M�>��G[g�A����m���e��=m�KZ�j�ER�����z3���j3�#8�wX%g���GU��?���31[ee�Gv��c��!�_6���C�+���,@���9��Yb���Q�"HM�Q�J�'�D�p�1.�$R�@�8G����|8U(q�:�W��F'ul$�cJJc~�p�8�Ii���Bq������$��v|$���rM��8n0]����<��P�3�Q���W���;N��Hj���8��
��9�&�]-�T���.`Z�]���C������'�b4T" �q\��O+�RNX����.�����*�RD!CI��I��0�"������p�L�t�%N�X�����`,H&:����@CR
��:��<D���r$E�2I���%�.`ZJ��0D�S:��<D�A���Ig���\��Wq����+A|���	�xe"����C�RD�$#��3��.`��Y�6���#:�W&"aq��F�D���`��L#�0-���P_�<�q�LD0G5�&J_���>���0��L�O|�)2�p&������8�
����1A*����V�DH�/����u�)�H��
����������B� :��v��
�h�����Cwj�#���%��*$�J�*�)�:��|�d$MQZ��4��#r�f����CS�;wsWA��*�H���a��
����x�����@Q�MT���
NU��T")���G�n]��'B�f����\�UP�Q����tE)�)$R2��=0�hA|�)l^���,���t$e	�~$?��!p��E3-$����\Y�W���3��P��t7�8qwS��('>��oL8�+!RY���P�����9�/B�_�n��� /���n����p��Fn���d����B��!.�J�������L���28��[.`�E�E|*)�C.�:���������mR�;u���I�4��]/����,c��V��\iI0��\�QT�{ w\Y����X��U��{���e�L�,\��a$Y����,c�q�CM��+�{��L|�"|E���pdA�Cfc]DV���.���L�,*�R��\	Y�_'��C��|x�	F8����v{�9$�!e��}.`3����`'��8v~�}���o=c���[�Wj�I<�`���u���e�G_��v_���:��P��PKauO�
KePKU][Imeta.xml���n�0D��
A��"��	Y�(�H�u����q���AQ����^��zQ@
�p��Ru:��l+�^�4"a�!�~>m��2��?U��Er`������&�V��iivV3���e�9@�g�z��k��A�rRR�Y���;2�����$2v��j�����
~���U#%8CB�iD��
����^W2���|*=����xz_��B���	�
��W	��0���:�8���j��X�[h�'�7C�#���lh����E�eeRZ����g+%(.6�`��,�RZf�?E�lJ!��x$:;�U������{J~�)�����+h��=��
�ll� w��q%Q�wRw��s�o�4��Gk~w8M���N*��9���p�H[�;�N�`�]�S�����N"�m����
Q�~wG���P�PK�H�I�MPKU][I
styles.xml�Z[o�6~��0��h�������y�Pl@[����(�%
$'��;$EZ�.Q����	D~���;��������Iyq,�Q0#E�Zd���O��yp�����)��:�q��B!���3.��N^�(�K*���\�x�KR8�u�6K��l��7��SS�5�%����l�M�D��Ta�N��)�*|'J9�y^bE��c����J��0��v�����,\�V���z�c�++�*�C��^L���"t��(<�>�m�TT�5���
w�Z
"��q9MQS�_������h��XL�3n��Q2=T���l��v����[�4�����������-�bA������<����l�s�Qt��z7
�	��h��Qx�Y��yi�[��@�V��C��A�'� %��N/v�����V�l8U���f"Iz�`�QiI�n)�=k��q�W�5����"
5����/�"��A��"�ig� w%TOaf��-
��a�*�3�����R�>�>}����ZK��[w�����%$frsaK���gm�e��B�3��>��
������_�q����f-��2R�n!���J�B�T�Pn������i�������Y
����"��l�&f��k��5�o�m�b����!�5��jA8m������	Iq����i��2��b�X��%8���R@nE�e�S�-�D	�
�{��?��=-:I������O���"�p��01��`�����
��!j0N��|���Z����]W�>��W�A[z��O�������)1�����~nq
�%��7���ezvt|z����.���2�g���4N
�;P�xi���#�\��-N�A�H�l"��[�&��d8%bt8�JqYb.4!�B1+��qXVE�*�*H�������kA0���s+7��z��Q�P_��Z��EB���oF����RR�$�,���{���j<�+I���+�x�%*b���CH')�s\�V���_t�.K59�Q�;�"
C�5~z
�u�����s��G������W����J_[��E�hV?���`T$��fTI0�,�����+!�b}���b�x�]�rQ�o��a=
��k��5�Hp�F4l~����@4������w�f3��ri
B�{�JI0��&�mIL���Z�N�� ��y�3h	�q�C�H�����U�����zu���%��D��������7�0�����g<�)����u�c�~|��4u`n~_e�X�7	�L��A�<��6�3��D�;���X�ww����\�����p������5�{ [���&�M���F ������3pGh��{$^���,�N�Xs�>��}�v���`���q�j��
$y%tw��;�\@��5�Pw
�����(�+�b$b�jp����]J_�<��5��GQ��1�R��kD��6u��h����c�����1|�+����2_=��Q��	���>����]Gk,�\w�m���h[;!����;��"�FR=�5�=z����������%�a�jS���p#�1�mN����-�G��3�\$���r~�*a�����E��J��O��y�8wRp0���H�u5���g����(#�8���������:�U�D��W!�}����:�f��8k���|W�����\]]]����Hy@�����aM%������r�W��R?h�m��Y��c����&���C�����������
�J�g/,NQ�����CDk���I�����"��i��2+Z��E��g�&�B�E���e��V�ANu�Z/����|uv�<>?>?�K����t_�������,l��u�Z5�v��n�_-�����
PK����$m PKU][Imanifest.rdf���n�0��<�e��@/r(��j��5�X/������VQ�������F3�����a�����T4c)%�Hh��+:�.���:���+��j���*�wn*9_��-7l���(x��<O�"��8qH���	�Bi��|9��	fWQt���y� =��:���
a�R��� ��@�	L��t��NK�3��Q9�����`����<`�+�������^����\��|�hz�czu����#�`�2�O��;y���.�����vDl@��g�����UG�PK��h��PKU][IConfigurations2/popupmenu/PKU][I'Configurations2/accelerator/current.xmlPKPKU][IConfigurations2/floater/PKU][IConfigurations2/images/Bitmaps/PKU][IConfigurations2/toolbar/PKU][IConfigurations2/menubar/PKU][IConfigurations2/statusbar/PKU][IConfigurations2/toolpanel/PKU][IConfigurations2/progressbar/PKU][IObject 1/content.xml�\[o��~���-���v��n��(��[@�����Z���=C���L��f�d���r�g��}���>[|J�2��KF�r��<N���~��,�{��]���Q�����O���
�����r���/O�a��eZ��>)�U�����]����5�)��l��z2^]%����v�`m�8�r=����i�b;���o�����������1��.������rWU��j���D�����cV�h�p��;����G�$K,�r�[�s�IN����,N�������
_Y�������WT��b�o�����t��������M��O0X������TZv�@UQ�'������<�X�������U��~�9��H��@�����0�:���1��<��^���i��V��|�w�����������]��������PV���La�pUR�*�c^T�b6�&X�w���}v=��h;u[���T`G� �!��Oi��� ����'
B��
FW�$�5�Z`�t�A�_l�Sh�� �\g�%���H�P�������g�gly>���t�&Y�/:�F��so_��@���5Z=<�����lh���r��4����������gGH�gJ�(����OU�-��:c�wM�?��]4���z��']^LZ�?��C�����6� ��r�p�������5�h��g`����B���%�"<���#�&)�4)��`�"�
68���&�(�"�o��?ab��b5��2�j�V�)i�)���9/<lm9B-7���	�k��]�+��a"��c��=H(����=(���<���;�����[�������&��AW���T.g��r���y�����I������D��Jo)tt�W(}�B�g)�N_�<�F�G�f�P�����(;���K�89xQ�e�46aVvSF�
5��pU$a�%�c����ny�M(���j<8������=U#z\#g�>��o=�4o*+NA3�����I6�'��!�`�}]��HlVO �I���<^7FV��#h���u����V����.>�%R=��}�+th��I�����c���}"�E6_������M/�J��/����c�y���;k�E���}Jc��RB�h?z�Ak�=vl�#�2'�6�
[u�7N��_c��+���>���t���G�4�y�"�+�,��\56��7J�w����������r���<~�>�<=�5�Y��8{S��~�c��VJ�v	������zW$��%!��������u�(����u�U}�t��b�v{-7UZY�apA��-#/���?��a;���9>�
��8�SX�?���!{��1,�xq�>&�h��[�W�������x��0�����5vD���U����E���}<m6P�����vz��_7aT��%��.�"�R{)F^��r�r�A�}��:������q��� F��s�G����1<4w;���h��_3p�r��A���gh.�m	�P$@���"�n��.I*F>0�>������M���&������eS&�FE���I�@Br�D�-[��$���[�Z����|�y)2� [&[�IKTI�!�*�HvICa�:����+���Y������E���y�z<r����\^sU�w�_��E
�&Q~��k����-���0/���F��5a$��~���"�G71)l�//���Z�2)P0Lk���S������8�(�u��7�^-�B��%����2���}OD~��w�:��9��]�y�.;�!�0��F!S������A����F2�m�|�xOa���{�m2H%����K�������g�5+�������vp�%�-N�C9k@K�}+F����"�<�C�Z^��L��� d��mL����@��n�V��J,(w����F�(C|����u
���rW����H�B1�����_�\�����\�^Yo��F
=��S�d���a���A�SPo�n�7��MG�3Li((g��F�9�fT0�H�R���:�?���{��:�����qD��@)_D
��������6�E�M��)	��S�h�@P�S�F����T1B�����9y�{>[X�"�;�H.(�����B1��n�}*	��qt���9�(Z8�H���T���h��^�n���	|m�{���t��E�p�Z��Cy"q@v��x��r��<rV��aF�����U<
i$�
�R`;���xT���	P�`tC:J>��t�Z��|�
�PW�� u���8�v�ftT�K��Q<
J�$ZR�r��{�������/#p3�j�
G����JD��	P��QG$�ZE!P<btK����wP�"R3��R�����J�AP����'$Bg�����QLjG1)��E%p&G�#Kr�5�J�{��bIG�F���4*�4��.v0�*�R��������Y,����QL��b��|��FklI�:�M��$`7�������������X�E��*@
��+;R&H`��Sk�bGW��(W��1`1P�Q��udGhC8��2������������C�������u�I�iP�q+���X�Q�ePLG1	q@��|��Z1��.�4�����!���X����A1i\�$hM�@
�;���%U��R�3�K���t��BFoQ���h��x�*�B�O�bzP� t[:�a?��*d�� BC��u�PGt�	�"�.Bg����;��;�nw�1�����\�Qg,��F���u����^���8�\�����p	�@W"�s(|���Y,�HZ���0Wg���/	���A�+[��|�
����,�t��
?�a���L*
������e����\h����Fg���:��W<��3����$�W�=��� ��� �#t[:��a�)s��Gp(lK���'0�GB�$�'��Yl�HZ�_�0W�y��E@S�s,F��%�D@��e��bKW�����M�2eDr�%�����R(M�b���s���I�����]��*�����F���6�-T\��:�-]����/�8;/�`�s���'A�+��Cue(��0�N[^"�zh����-�?�q���#����|��PK���:YPKU][IObject 1/styles.xml���r� ���)2��Cog&��>@��c�)F���}�.�&)�K[�/	I��i���cJs��$[o��
.�C���9}JN�?{(KN.��5�&��C0��b��h<$����KR3�
��09�pHcj�����=�
L���-y����P](���kk�K�Z�%����-�Ap�~H��4����}�U�l��!o��3��Jx���	��i��34������>1�������jW�8=�_/[��-��E�T�&:�H�z�������o7��h���.�+n�
pz�D��
P_�e�)�\�'Z�C����k@�9�2����l��=�Z�[g��J�U���#;�v�����aq����<����l�<�����#��|�*U����V�#�|)��3�ex�!�1�py�������:7������-uK��������&����t���/��PK�����yPKU][IObject 1/meta.xml��=o�0���
�f����`�:t�����E���`#cB~~�)����}��}�s�=7�wB�I�r��x��RU9y}y�7d[�e�p����oPY�A[z�V������(��NvL�
v�r�[Ts[��M;�Z����m�0t��6�i������W��M�(�k:i3;�Wjd�JZ�k��O�..
���LWF���.l����I�p?w�ci��!a"���-F�b��(^dN�B����;�n�����>�h���?��6��:���5���$V����e���~����P|PKTnw"JPKU][IObjectReplacements/Object 1���M�������y���5{�{f�r��(	�RBfBI�K!�=J*�:���:n%E��Ft?!%�"���R����[�iX������5>k�����~��}��^��f�������QZ�_QZ�V���M�\��ghZ��CNM��G��c�n�O�R/���jg?#�M�Oi�):�At�Y����~}
���������c����R(�cZ�$�A�k��*y��d_�%�:��5�_e�Q��v��Y	%����W�V��/��Gi3���u��������/j����%=[�9�]�&9
}����Z�$MK��1�8�p�Z�8�����*:���~��7E�\:��Z�����pB>%_lLTE2�9�*�th}���+�O]yCy�}�H�r%��M*���9i&��E�e)<���$����`��a�t&��z�%[��3���yX.a��/�I$?��r	�|�L")���K��KgI�wS,�0l���$��oV���-���DR���K��Kg)�Ky��
�l&v��N�r	�|�L��# ��\,�0l����>���g��a�l&v�|Z�%[�e3�������\��_6;��}:�%[�e3�������X.a��/��|?��%[�e3������0I�J~����fb�����_6�H	�Q���+�����GH�a��|�L��#$����P6;�	>�wf���>B��m)e3������~ON�L��#$����T6;�	>�wc���>B���he3������~^�L��# �����fR�����1�_��~�[���� �F��i��x�<��q���L��?�gU�yy�A�-�Xn{�/���E���Y�M�`p6��'����k���C�z�'�	�����
M�
��$��kZl���`:X\��c0l>�����&`7p���.AMLk�����f���c'�V����X��	6?��5�s�x��1�=���������G�������|�2w?GL=�����p�b�Nk���������"p3�
|	||
�����������UP�U�|L7�u�=`3�(�t"�.`�<��\�������m�����:�y\-�>�
�>�������e�A�>*p��f�S�\p�������`�8
<��O��3�����G�����D�ep$�8���������J��@78Lok���M���-�,�{�^����9YS���v�'�k)�&��|
�B?��d_���y��`+0}��p��g�J_����-��.��2�.x�����`x�������j�6>�t��\+7�Mp-�����\Q�� .+��zt�+�d��j��^��o����K�=���a���3t���������n����/�������������G�`U�]A�Q����lf��@�<�r�������=��7���w�3���1�`��|ln������z�p�����L?k�;�F�v���cxGpx����og�����z��c`:�4�P�n�5�.��^�����}�=>�c����g�.���>��B�ix9X��S�����\�����2���jp��	nw���8m�pW�������|N�p�\�,�bp4x����j?���`*x��3�����/����~�m/�������6?��{�r=��r=`�]`%0��2��y`�����3�9������)�P�p8|\l�:{�\~n2�.������?�.b0�^��9'f��������������+������������1��|~�y�����vp�>�-�6���zP_�`
�1�Q�����yU�`p8,G���I�������s���X�u�
�������A}�`�VI����q[������~s�����&{�;�!�.�:p78-���\�}L�x�<*�*�rlL�Z�}�
~&��$p\?�������9?��`�D���W�����'����L�v��\3=���
||	||\~�����<b>�������vLI���o�����w�m�&� ��������%q\�)����$��?�8�-I���l�q}�oS���$��A�#��$�������d�_��k��������|��y^?p8|������Sp^2�u����`%�gm2��&�~f��'�_��S8O9R��I)��j��=��P���&{��kR����-��`S�v�x
�����
T��)\�������u�~p�\)����2~o*���\����R��o���]�N�O*�C�;���r]4�||'8|�<�q������S9o�����P*�Z���u`��
ns��[W�k�;��?��2*��mb���� ��{���k�O�
���l�w����A���`'�{�X)�������:��u��� ��c!8,J��P��yt8
����{8|�
\�����4���M�2�?h��38�M��������`�t�8&����|~8�
,G����������f�>���Z��o���mK�<��~����B���Y��\��2��|��y�%`�J�!�?��?&��|J��;28.��`9gp>|&���:�������������/�U��z}�0�:��z��Gl�^V���/�����o����������������o_7��
�wW�:��8�����&���F��/�2�:���W���v�6��~c&��,�.�����39~?
�__��wd��������`�,����x����n�l��o�t)��Z
,�����~���~����x0���eY\���b�����g��<�:���/Y�U��:?>��Bf6�=9�\���//��xtu6��������qlF6�������1����l��^��>�-p���9��5�q$���X��t��~/����j�z���ut�)�s
�COp?8���X0���v����p
�,x�\n_??����15���lf����`�8px!8
�
���Ig0|E��2���q�&?�Qb'���l�eK/[z����^q�����-�l�eK/[z����^�����-�l�eK����*�9�S��6�����6�����C��=�>�;�<Y���2����7������%��UH�';jz�oLO����>6=���'{��d�LO����V��l���2=��z2Y��'�`���'��bP�ca���n���y��5n��:u\3e����cZ�~��������x�)�9����[O?v<��Lsm?>�c��x��+���d!~,/�~,+�~L�����x����6�~l�A?���[`���i����H��h�~��A?v�A?���ka��52��j�ci�X�A?&{5�c��XE�~���~�G?���������}����O?��O?����.?���~L�o�3�|�b��b�~������������I9���/&�/e��I���U1��$�b��s�b�^�o��%�a�}X�A�i��e�aR��j�au
���}X�>,���5��������\j�'��A&�#��A&�4��a����Z<X���x��I���*�`R?������I;����������{�O&{}�`+��`K��`�����������M�������
���]�������O��O��O��O���K���y��`�~z0�Ov�Gv�G��G���l���M=�:=�=�3>z�'}�`K|�`}�`�f����������������������G&�G<�(=�P=�@=X=X=X=X=�8%�`�}�`�}�`����e���>z0������`^z����`���`[��`��`/x������=����K6�Kv����K��,�%^z�Vf�s��`�^z0�����u���l�����7z��V{���x���{��n���M�������
�������u��������y���x������z�8=X���;7=�>7=�'nz�-nz�7��6�7L��V�����,���vk#����f�9����5;�M6�M��Mv������\7=X}7=Xu7��x7=X��Ls��r���q���5*�-=�K.z��.z�E.z��������nv���� l��s�5.���.�/[��r�������������.�/��������'��'���u���8��^w�����Z���z�I�u���k����:�����_=��_�\K����������\�������A������}��+�/������\Mw�]���*r��w�5v��_1�/�]���m:����-������_t���u��N:�WS��K�l�_�:��!E��M��Z@��RE�5O�MR�_���u���j���r�W����5��/��������/����6������b���q�_�/��>M����������y��d_'���8�/Yk���-�K����d�%������)��,E�����.T�_���5T�MT�_s��E��R������*�����w��K~*�K���:�W�N�UK��j���>A������������Z��_����.�K��%��%�'���N�����`���`?+z��=���������(z�e�L����f*z�	�l������:+z���L�Y<XE&�-�`��L�E<��k����L��C����������E&kg�`���������)z�s=X+Ev��������}�����WwE�%�n���.�����*P�_�F�u���:_��^�WsE�u���j���(z������������*z/����8�{I��$?�^�*z/i�^���XE�u����
>>����������Z���d� ��#E��[��^A���'������:���+��������K�����(������jE�D���>[<�E6^��
V�`r�������S�`RO����&�K<���`�����g&��`���K����dE�%�H�W������u�������F(�/���d��X��U�_���^e�/E����?�����u,L�V��.�����`cuz��tz��:=�|��1�l�N��N�V���D<�z��5�L��F�[���-������d'l�N6F��yH<��:=X�N&��x0����x(L�&��x0���6+z�7=��S��^T�`R���^V�`=���x�]��[E������`y:=Xg��X��=�x�:=�K:=�N���N�s���q���;���:��d�)L�����;��&9��&;��nt��I�B�|�`�8���t��ut����6��j;��d,�����l�N&{i�`����$.�`�~������]���
�����_=u��:��:��<��K��������
5��h�M0��4��a��{M����_�M�����6�����B�������e�-�l�e�,[f�2��Y���e�-�l�e�,[f)[f�2��Y���eV�dV���%'C�n�m�����]���Zvjs�.����Z��sB��������0�8XPT<�'Ea�sL>�������5�cB�j��(�&����6*�g�	�$�!1��)tr�C/��V�~�*�|�Z�Y�6�K�����I�J'����'T�_��p5��e�09�a���C?K�
��G
o��GY&�u��u�NsOZ��v���:-����'���v���:8^�y'��<�NG����Y��������+��������VNu�m�tfT{�������^���_��3KW{�)��<g�3��{�,]��OQ��9)���>�N�j�;E����yFW��<����I��t�Ua�����N_N�:���J�gJ���p�WARJ��~�_�:��e���x=��x
O����9������9����/rO����rO��r������q���f�5�/ro�����4���x��x�(6^��l�&v�EN���^�Iy��xyv�EN���a�5=��kj7�l����xWF���B��&�k���FjCP���p��6Z+�q���"������ m��g

��i���Zo�\�����b�z���B�����������
e�_Y�n����jZ<���XM;������
*�]���;��:5m����eVE��4��^�����op�4x��|�pw�h=o$����5�.4��TM�"�N���iR�0�����uR~%����IU���Z'�X'�Z'U�N���:In��J����:I��JrZ'�����I�$�u��:�o�dX��uR~�uR�:)�:)�:)�:)�:)�:)�:)h]�I�i�I��I�I���2��������jX�l�T`�������e�G��W��,�4�%u0����O;���6sE����(�*���gF��&��Z+�v��>8oRz"��d��s���\}p���,4��_5
�<I���r�E	�f��(��~kP���l��s6������+��)����=�o7��]�����dF��������
�2g������1Wx�D��r/&.Pz�gq���^j������c�|�"wu��
=<]���l���P��4�w�l��Z�����6�c{��o��]�{��O\��.�kZW�uR~-�����j['��.�uR~]��z�oh��_�:���Z'�7�N��N��Njd���:��uR�u\�I��<��]�����wC��C��\R+����9i&��������(x��e)����h�x��3���[�����E=�
��X9JY�G�X���������+����i�
�rR���Z��+{B�Q�f%%5�T=�G��)��u"�c8Vb����u���*�w�v���l������:�p%�C����� X�1��nX�x_mD�Dr���uI��qM�:�z8�����8nD7CS����?F��?�awr����7��j�����-^�������)�J��}���P��_~�~��"�����6?�~7ZK}/)E����c�����/-��PK�^ ���PKU][IMETA-INF/manifest.xml�UMO!��+6����Y��AS���=dg�>V�����nkLMW�3<�{�&K��98/�)��^����4MI^������F�Y��b3��:���$���r/}a�_�(l��"h0X|�k�.�10$��l�WKy\�V[t��[����}$���J�W-������c������0��I}��W~�����<��p�<������cEj�K��T< ���4�s3�Y��E,����c�j@~tR�+���&G]U�Au�[���Z6��)��q!@A�c"8���O��[��Ih�T�2�|{������:�uC������T�w=�/�'h�H<�9��2O�������d*L�^��j�_k�4����������CM#�+������d�	PK���-��PKU][I�l9�..mimetypePKU][I�2$�_�_TThumbnails/thumbnail.pngPKU][I���e'� N`settings.xmlPKU][IauO�
Ke�econtent.xmlPKU][I�H�I�M�pmeta.xmlPKU][I����$m 
�rstyles.xmlPKU][I��h��zmanifest.rdfPKU][I[{Configurations2/popupmenu/PKU][I'�{Configurations2/accelerator/current.xmlPKU][I�{Configurations2/floater/PKU][I |Configurations2/images/Bitmaps/PKU][I]|Configurations2/toolbar/PKU][I�|Configurations2/menubar/PKU][I�|Configurations2/statusbar/PKU][I}Configurations2/toolpanel/PKU][I9}Configurations2/progressbar/PKU][I���:Ys}Object 1/content.xmlPKU][I�����y��Object 1/styles.xmlPKU][ITnw"J��Object 1/meta.xmlPKU][I�^ ���	�ObjectReplacements/Object 1PKU][I���-��V�META-INF/manifest.xmlPK{�

Jim Nasby

Jim.Nasby@BlueTreble.com

about 9 years ago

In reply to: Mithun Cy (#1)

Re: Proposal : For Auto-Prewarm.

On 10/27/16 6:39 AM, Mithun Cy wrote:

# pg_autoprewarm.

IMO it would be better to add this functionality to pg_prewarm instead
of creating another contrib module. That would reduce confusion and
reduce the amount of code necessary.

+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);

Presumably the first 4 numbers will vary far less than blocknum, so it's
probably worth reversing the order here (or at least put blocknum first).
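
A minimal sketch of the comparison being discussed, assuming a hypothetical `BlockInfoRecord` struct whose field names follow the quoted hunk; the patch's actual struct and its `cmp_member_elem` macro are not shown in this message, so both are stand-ins. Note that comparing blocknum first would change the resulting sort order as well as the comparison cost, which matters if the loader expects blocks of one relation to be adjacent.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record layout; field names follow the quoted patch hunk. */
typedef struct BlockInfoRecord
{
	uint32_t	database;
	uint32_t	spcNode;
	uint32_t	filenode;
	uint32_t	forknum;
	uint32_t	blocknum;
} BlockInfoRecord;

/* Compare one member; return early if the two records differ in it. */
#define cmp_member_elem(fld) \
	do { \
		if (a->fld < b->fld) \
			return -1; \
		else if (a->fld > b->fld) \
			return 1; \
	} while (0)

/* qsort() comparator: database, tablespace, relation, fork, then block. */
int
sort_cmp_func(const void *p, const void *q)
{
	const BlockInfoRecord *a = (const BlockInfoRecord *) p;
	const BlockInfoRecord *b = (const BlockInfoRecord *) q;

	cmp_member_elem(database);
	cmp_member_elem(spcNode);
	cmp_member_elem(filenode);
	cmp_member_elem(forknum);
	cmp_member_elem(blocknum);

	return 0;					/* all five members are equal */
}
```

With this ordering, `qsort(recs, n, sizeof(recs[0]), sort_cmp_func)` groups all blocks of a relation fork together in ascending block order.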

I didn't look at the two load functions since presumably they'd go
away/change significantly if this was combined with pg_prewarm.

+	if (!block_info_array)
+		elog(ERROR, "Out of Memory!");
AFAICT this isn't necessary since palloc will error itself if it fails.

+	snprintf(transient_dump_file_path, sizeof(dump_file_path),
+			 "%s.save.tmp", DEFAULT_DUMP_FILENAME);
Since there's no method to change DEFAULT_DUMP_FILENAME, I would call it 
what it is: DUMP_FILENAME.

Also, maybe worth an assert to make sure there was enough room for the
complete filename. That'd need to be a real check if this was
configurable anyway.
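
One way such a check could look, as a sketch only: `build_transient_path` and the value of `DUMP_FILENAME` are hypothetical, and the check relies on standard `snprintf` returning the length it would have written. The size passed in is that of the destination buffer itself.

```c
#include <stdbool.h>
#include <stdio.h>

#define DUMP_FILENAME "autoprewarm"	/* hypothetical value for illustration */

/*
 * Build "<DUMP_FILENAME>.save.tmp" into buf.
 * Returns false if the complete name did not fit in buflen bytes.
 */
bool
build_transient_path(char *buf, size_t buflen)
{
	/* snprintf returns the number of bytes needed, excluding the NUL. */
	int			needed = snprintf(buf, buflen, "%s.save.tmp", DUMP_FILENAME);

	return needed >= 0 && (size_t) needed < buflen;
}
```

A caller can assert on the result while the name is a compile-time constant, and turn it into a real error report if the name ever becomes configurable.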

+	if (!avoid_dumping)
+		dump_now();
Perhaps that check should be moved into dump_now() itself...
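
A sketch of what that could look like, assuming `avoid_dumping` is a plain flag; the `dumps_done` counter is a hypothetical stand-in for the real dump work, which is not shown here.

```c
#include <stdbool.h>

bool		avoid_dumping = false;	/* flag name taken from the quoted hunk */
int			dumps_done = 0;			/* hypothetical stand-in for the dump work */

/* The guard lives inside dump_now(), so callers need no check of their own. */
void
dump_now(void)
{
	if (avoid_dumping)
		return;

	dumps_done++;				/* the real function would write the dump file */
}
```

Every call site then reduces to a bare `dump_now();`, and a future caller cannot forget the check.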
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

about 9 years ago

In reply to: Mithun Cy (#1)

Re: Proposal : For Auto-Prewarm.

From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Mithun Cy

# pg_autoprewarm.

This is a PostgreSQL contrib module which automatically dumps all of the
blocknums present in the buffer pool at the time of server shutdown (smart and
fast mode only, to be enhanced to dump at regular intervals) and loads these
blocks when the server restarts.

I welcome this feature! I remember pg_hibernate did this. I wonder what happened to pg_hibernate. Did you check it?

Regards
Takayuki Tsunakawa


Michael Paquier

michael.paquier@gmail.com

about 9 years ago

In reply to: Jim Nasby (#3)

Re: Proposal : For Auto-Prewarm.

On Fri, Oct 28, 2016 at 5:15 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/27/16 6:39 AM, Mithun Cy wrote:

# pg_autoprewarm.

IMO it would be better to add this functionality to pg_prewarm instead of
creating another contrib module. That would reduce confusion and reduce the
amount of code necessary.
+       cmp_member_elem(database);
+       cmp_member_elem(spcNode);
+       cmp_member_elem(filenode);
+       cmp_member_elem(forknum);
+       cmp_member_elem(blocknum);
Presumably the first 4 numbers will vary far less than blocknum, so it's
probably worth reversing the order here (or at least put blocknum first).

I didn't look at the two load functions since presumably they'd go
away/change significantly if this was combined with pg_prewarm.
+       if (!block_info_array)
+               elog(ERROR, "Out of Memory!");
AFAICT this isn't necessary since palloc will error itself if it fails.
+       snprintf(transient_dump_file_path, sizeof(dump_file_path),
+                        "%s.save.tmp", DEFAULT_DUMP_FILENAME);
Since there's no method to change DEFAULT_DUMP_FILENAME, I would call it
what it is: DUMP_FILENAME.
Also, maybe worth an assert to make sure there was enough room for the
complete filename. That'd need to be a real check if this was configurable
anyway.
+       if (!avoid_dumping)
+               dump_now();
Perhaps that check should be moved into dump_now() itself...

As this piqued my curiosity...

+/*
+ * ReadBufferForPrewarm -- This new interface is for pg_autoprewarm.
+ */
+Buffer
+ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+                     ForkNumber forkNum, BlockNumber blockNum,
+                     ReadBufferMode mode, BufferAccessStrategy strategy)
+{
+    bool        hit;
+
+    return ReadBuffer_common(smgr, relpersistence, forkNum, blockNum,
+                             mode, strategy, &hit);
+}
It may be better to just expose ReadBuffer_common or rename it.
--
Michael


Amit Kapila

amit.kapila16@gmail.com

about 9 years ago

In reply to: Jim Nasby (#3)

Re: Proposal : For Auto-Prewarm.

On Fri, Oct 28, 2016 at 1:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/27/16 6:39 AM, Mithun Cy wrote:

# pg_autoprewarm.

IMO it would be better to add this functionality to pg_prewarm instead of
creating another contrib module.

There is not much common functionality between the two. The main job
of this feature is to dump the contents of shared buffers and reload
them at startup, and it takes care not to override the existing
shared buffer contents while reloading the dump file. This is
somewhat different from what prewarm provides, which is to load the
relation blocks into memory or buffers; it has nothing to do with prior
shared buffer contents. So, I am not sure it is a good idea to add
it as functionality of the prewarm module. OTOH, if more people want
it that way, then there is no harm in doing so.

One point that seems to be worth discussing is when should the buffer
information be dumped to a file? Shall we dump at each checkpoint or
at regular intervals via auto prewarm worker or at some other time?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Amit Kapila (#6)

Re: Proposal : For Auto-Prewarm.

On 2016-10-28 12:59:58 +0530, Amit Kapila wrote:

On Fri, Oct 28, 2016 at 1:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 10/27/16 6:39 AM, Mithun Cy wrote:

# pg_autoprewarm.

IMO it would be better to add this functionality to pg_prewarm instead of
creating another contrib module.

There is not much common functionality between the two.

I don't really agree. For me manual and automated prewarming are pretty
closely related. Sure they have their independent usages, and not too
much code might be shared. But grouping them in the same extension seems
to make sense, it's confusing to have closely related but independent
extensions.

One point that seems to be worth discussing is when should the buffer
information be dumped to a file? Shall we dump at each checkpoint or
at regular intervals via auto prewarm worker or at some other time?

Should probably be at some regular interval - not sure if checkpoints
are the best time (or if it's even realistic to tie a bgworker to
checkpoints), since checkpoints have a significant impact on the state
of shared_buffers.

Greetings,

Andres Freund

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Andres Freund (#7)

Re: Proposal : For Auto-Prewarm.

On Fri, Oct 28, 2016 at 3:40 AM, Andres Freund <andres@anarazel.de> wrote:

There is not much common functionality between the two.

I don't really agree. For me manual and automated prewarming are pretty
closely related. Sure they have their independent usages, and not too
much code might be shared. But grouping them in the same extension seems
to make sense, it's confusing to have closely related but independent
extensions.

I agree that putting them together would be fine.

One point that seems to be worth discussing is when should the buffer
information be dumped to a file? Shall we dump at each checkpoint or
at regular intervals via auto prewarm worker or at some other time?

Should probably be at some regular interval - not sure if checkpoints
are the best time (or if it's even realistic to tie a bgworker to
checkpoints), since checkpoints have a significant impact on the state
of shared_buffers.

Checkpoints don't cause any buffer replacement, which I think is what
would be relevant here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Haribabu Kommi

kommi.haribabu@gmail.com

about 9 years ago

In reply to: Robert Haas (#8)

Re: Proposal : For Auto-Prewarm.

Hi Ashutosh,

This is a gentle reminder.

You are assigned as a reviewer of the current patch in the 11-2016 commitfest,
but you haven't shared your review yet. Please share your views about
the patch. This will help the commitfest run more smoothly.

Please ignore this if you have already shared your review.

Regards,
Hari Babu
Fujitsu Australia

#10

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Haribabu Kommi (#9)

Re: Proposal : For Auto-Prewarm.

You are assigned as a reviewer of the current patch in the 11-2016 commitfest,
but you haven't shared your review yet. Please share your views about
the patch. This will help the commitfest run more smoothly.

Thanks for the reminder.

Mithun has not provided a patch addressing the comments upthread. I am
waiting for his response to those comments.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#11

Mithun Cy

mithun.cy@enterprisedb.com

about 9 years ago

In reply to: Jim Nasby (#3)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Sorry that I took some time on this. Please find the latest patch, which
addresses the review comments. Apart from fixes for the comments, I have
introduced a new GUC variable for pg_autoprewarm, "buff_dump_interval", so
we now dump the buffer pool metadata every buff_dump_interval seconds.
Currently buff_dump_interval can be set only at startup time. I chose not to
do the dumping at checkpoint time, as the two things do not appear to be
closely related, and keeping them independent is nicer for usage. It also
avoids the overhead of any communication between them.

On Fri, Oct 28, 2016 at 1:45 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

IMO it would be better to add this functionality to pg_prewarm instead
of creating another contrib module. That would reduce confusion and reduce
the amount of code necessary.

I have merged the pg_autoprewarm module into pg_prewarm. This is just a
directory merge; a functionality merge is not possible, because pg_prewarm
is just a utility function with a specific signature to load a specific
relation at runtime, whereas pg_autoprewarm is a bgworker which dumps the
current buffer pool and loads it at startup time.

Presumably the first 4 numbers will vary far less than blocknum, so it's
probably worth reversing the order here (or at least put blocknum first).

The function sort_cmp_func is for qsort, so an ordered comparison is needed
to say which element is bigger or smaller. If we put blocknum first, we
cannot decide that.
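
To illustrate the ordering point, here is a minimal standalone sketch (not the patch's code) of a field-by-field comparator. The names `BlockKey`, `CMP_FIELD`, `block_key_cmp` and `sort_block_keys` are hypothetical, and plain `qsort` stands in for `pg_qsort`. Because the most significant field is compared first, the sorted result groups blocks of the same relation together in ascending block order; comparing blocknum first would produce a different sort order, not just a faster comparison.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct BlockKey
{
    uint32_t database;
    uint32_t spcnode;
    uint32_t filenode;
    uint32_t forknum;
    uint32_t blocknum;
} BlockKey;

/* Return from the comparator as soon as one field differs. */
#define CMP_FIELD(fld) \
    do { \
        if (a->fld < b->fld) return -1; \
        if (a->fld > b->fld) return 1; \
    } while (0)

/* Lexicographic comparison: database, tablespace, relation, fork, block. */
static int
block_key_cmp(const void *p, const void *q)
{
    const BlockKey *a = p;
    const BlockKey *b = q;

    CMP_FIELD(database);
    CMP_FIELD(spcnode);
    CMP_FIELD(filenode);
    CMP_FIELD(forknum);
    CMP_FIELD(blocknum);
    return 0;
}

/* Sort keys so that blocks of one relation end up adjacent and ascending. */
static void
sort_block_keys(BlockKey *keys, size_t n)
{
    qsort(keys, n, sizeof(BlockKey), block_key_cmp);
}
```

With this ordering, two relations whose blocks are interleaved in the buffer pool come out as two contiguous runs, each in block order.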

AFAICT this isn't necessary since palloc will error itself if it fails.

Fixed.

Since there's no method to change DEFAULT_DUMP_FILENAME, I would call it
what it is: DUMP_FILENAME.

Fixed.

Also, maybe worth an assert to make sure there was enough room for the
complete filename. That'd need to be a real check if this was configurable
anyway.

If we make it configurable, I can add that check.
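
For reference, the "real check" discussed here can be as small as testing the return value of `snprintf`. The sketch below is standalone C under assumed names (`build_dump_path` is hypothetical and not part of the patch); it reports failure when the formatted path would not fit in the destination buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<base><suffix>" into dst.
 * snprintf returns the length the full string would have had, so a
 * result >= dstlen means the output was truncated.
 */
static bool
build_dump_path(char *dst, size_t dstlen, const char *base, const char *suffix)
{
    int n = snprintf(dst, dstlen, "%s%s", base, suffix);

    return n >= 0 && (size_t) n < dstlen;
}
```

A caller would raise an error (or trip an assertion) when the helper returns false instead of silently using a truncated filename.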

+ if (!avoid_dumping)
+               dump_now();
Perhaps that check should be moved into dump_now() itself...

Fixed.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_autoprewarm_02.patch (application/octet-stream)

commit 689fc9b602c1379dbb58a785bdda4dc1e2fdc2bb
Author: mithun <mithun@localhost.localdomain>
Date:   Tue Nov 29 10:43:01 2016 +0530

    pg_autoprewarm_patch_02

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..8ec0411 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,7 +1,7 @@
 # contrib/pg_prewarm/Makefile
 
-MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+MODULES = pg_prewarm pg_autoprewarm
+OBJS = pg_prewarm.o pg_autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
 DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
diff --git a/contrib/pg_prewarm/README b/contrib/pg_prewarm/README
new file mode 100644
index 0000000..047f436
--- /dev/null
+++ b/contrib/pg_prewarm/README
@@ -0,0 +1,29 @@
+# pg_autoprewarm.
+
+This a PostgreSQL contrib module which automatically dump all of the blocknums
+present in buffer pool at a regular interval and at the time of server shutdown
+(smart and fast mode only) and load these blocks when server restarts.
+
+Design:
+------
+We have created a BG worker Auto Pre-warmer which during shutdown dumps all the
+blocknum in buffer pool after sorting same.
+Format of each entry is <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.
+Auto Pre-warmer is started as soon as the postmaster is started we do not wait
+for recovery to finish and database to reach a consistent state. If there is a
+"dump_file" to load we start loading each block entry to buffer pool until
+there is a free buffer. This way we do not replace any new blocks which was
+loaded either by recovery process or querying clients.
+
+HOW TO USE:
+-----------
+Build and add the pg_autoprewarm to shared_preload_libraries. Auto Pre-warmer
+process automatically do dumping of buffer pool block info and load them when
+restarted.
+Set pg_autoprewarm.buff_dump_interval in seconds to specify minimum interval
+between two dumps. If pg_autoprewarm.buff_dump_interval is set to zero then
+dumping based on timer is disabled. We only dump while server shutdown.
+
+TO DO:
+------
+Add functionality to dump based on timer at regular interval.
diff --git a/contrib/pg_prewarm/pg_autoprewarm.c b/contrib/pg_prewarm/pg_autoprewarm.c
new file mode 100644
index 0000000..6796306
--- /dev/null
+++ b/contrib/pg_prewarm/pg_autoprewarm.c
@@ -0,0 +1,493 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_autoprewarm.c
+ *	Automatically dumps and load buffer pool.
+ *
+ *	contrib/pg_autoprewarm/pg_autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "catalog/pg_class.h"
+#include <unistd.h>
+#include "utils/guc.h"
+
+PG_MODULE_MAGIC;
+
+static void AutoPreWarmerMain (Datum main_arg);
+static bool
+load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum);
+
+/* Primary functions */
+void            _PG_init(void);
+
+/* Secondary/supporting functions */
+static void     sigtermHandler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	set our latch to wake it up.
+ */
+static void
+sigtermHandler(SIGNAL_ARGS)
+{
+	int save_errno = errno;
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* Meta-data of each persistent page buffer which is dumped and used to load. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;	/* datbase */
+	Oid			spcNode;	/* tablespace */
+	Oid			filenode;	/* relation */
+	ForkNumber	forknum;	/* fork number */
+	BlockNumber	blocknum;	/* block number */
+}BlockInfoRecord;
+
+/* Try loading only once during startup. If any error do not retry. */
+static bool avoid_loading = false;
+
+/*
+ * And avoid dumping if we receive sigterm while loading. Also do not re-try if
+ * dump has failed previously.
+ */
+static bool avoid_dumping = false;
+
+int	buff_dump_interval = 0;
+
+/* compare member elements to check if they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0);
+
+/*
+ * sort_cmp_func - compare function used while qsorting BlockInfoRecord objects.
+ */
+static int
+sort_cmp_func(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+#define DUMP_FILENAME "autoprewarm"
+
+/*
+ *	load_block -	Load a given block.
+ */
+bool
+load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum)
+{
+	Buffer      buffer;
+
+	/* Load the page only if there exist a free buffer. We do not want to
+	 * replace an existing buffer. */
+	if (have_free_buffer())
+	{
+		SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
+
+		/*
+		 * Check if fork exists first otherwise we will not be able to use one
+		 * free buffer for each non existing block.
+		 */
+		if  (smgrexists(smgr, forkNum))
+		{
+			buffer = ReadBufferForPrewarm(smgr, reltype,
+										  forkNum, blockNum,
+										  RBM_NORMAL, NULL);
+			if (!BufferIsValid(buffer))
+				elog(LOG, "\n Skipped the buff page. \n");
+			else
+				ReleaseBuffer(buffer);
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ *	load_now - Main routine which reads from dump file and load each block.
+ *	We try to load each blocknum read from DUMP_FILENAME until we have
+ *	any free buffer left or SIGTERM is received. If we fail to load a block we
+ *	ignore the ERROR and try to load next blocknum. This is because there is
+ *	possibility that corresponding blocknum might have been deleted.
+ */
+static void load_now(void)
+{
+	static char dump_file_path[MAXPGPATH];
+	FILE *file = NULL;
+	uint32 i, num_buffers = 0;
+
+	avoid_loading = true;
+
+	/* Check if file exists and open file in read mode. */
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s.save", DUMP_FILENAME);
+	file = fopen(dump_file_path, PG_BINARY_R);
+
+	if (!file)
+		return;	/* No file to load. */
+
+	if (fscanf(file,"<<%u>>", &num_buffers) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("Error reading num of elements in \"%s\" for autoprewarm : %m", dump_file_path)));
+	}
+
+	elog(LOG, "Num buffers : %d \n", num_buffers);
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		RelFileNode	rnode;
+		uint32		forknum;
+		BlockNumber	blocknum;
+		bool		have_free_buf = true;
+
+		if (got_sigterm)
+		{
+			/*
+			 * Received shutdown while we were still loading the buffers.
+			 * No need to dump at this stage.
+			 */
+			avoid_dumping = true;
+			break;
+		}
+
+		if(!have_free_buf)
+			break;
+
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &rnode.dbNode, &rnode.spcNode,
+							 &rnode.relNode, &forknum, &blocknum))
+			break;	/* No more valid entry hence stop processing. */
+
+		PG_TRY();
+		{
+			have_free_buf = load_block(rnode, RELPERSISTENCE_PERMANENT,
+									   (ForkNumber)forknum, blocknum);
+		}
+		PG_CATCH();
+		{
+			/* Any error handle it and then try to load next buffer. */
+
+			/* Prevent interrupts while cleaning up */
+			HOLD_INTERRUPTS();
+
+			/* Report the error to the server log */
+			EmitErrorReport();
+
+			LWLockReleaseAll();
+			AbortBufferIO();
+			UnlockBuffers();
+
+			/* buffer pins are released here. */
+			ResourceOwnerRelease(CurrentResourceOwner,
+								 RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, true);
+			FlushErrorState();
+
+			/* Now we can allow interrupts again */
+			RESUME_INTERRUPTS();
+		}
+		PG_END_TRY();
+	}
+
+	fclose(file);
+
+	elog(LOG, "loaded");
+	return;
+}
+
+
+/*
+ *	dump_now - Main routine which goes through each buffer header and dump
+ *	their metadata in the format.
+ *	<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. We Sort these data
+ *	and then dump them. Sorting is necessary as it facilitates sequential read
+ *	during load. Unlike load if we encounter any error we abort the dump.
+ */
+static void dump_now(void)
+{
+	static char		dump_file_path[MAXPGPATH],
+					transient_dump_file_path[MAXPGPATH];
+	uint32			i;
+	int				ret;
+	uint32			num_buffers;
+	BlockInfoRecord	*block_info_array;
+	BufferDesc		*bufHdr;
+	FILE			*file = NULL;
+
+	if (avoid_dumping)
+		return;
+
+	avoid_dumping = true;
+	block_info_array = (BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_buffers = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32 buf_state;
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/* Only valid and persistant page buffers are dumped. */
+		if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID) &&
+			(buf_state & BM_PERMANENT))
+		{
+			block_info_array[num_buffers].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_buffers].spcNode  = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_buffers].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_buffers].forknum  = bufHdr->tag.forkNum;
+			block_info_array[num_buffers].blocknum = bufHdr->tag.blockNum;
+			++num_buffers;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/* Sorting now only to avoid sorting while loading. */
+	pg_qsort(block_info_array, num_buffers, sizeof(BlockInfoRecord), sort_cmp_func);
+
+	snprintf(transient_dump_file_path, sizeof(dump_file_path),
+			 "%s.save.tmp", DUMP_FILENAME);
+	file = fopen(transient_dump_file_path, "w");
+	if (file == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m", dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s.save", DUMP_FILENAME);
+
+	/* Write num_buffers first and then BlockMetaInfoRecords. */
+	ret = fprintf(file, "<<%u>>\n", num_buffers);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("error writing to \"%s\" : %m", dump_file_path)));
+	}
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+							block_info_array[i].database,
+							block_info_array[i].spcNode,
+							block_info_array[i].filenode,
+							(uint32)block_info_array[i].forknum,
+							block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					errmsg("error writing to \"%s\" : %m", dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = fclose(file);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("error closing \"%s\" : %m", transient_dump_file_path)));
+
+	ret = unlink(dump_file_path);
+	if (ret != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("unlink \"%s\" failed : %m", dump_file_path)));
+
+	ret = rename(transient_dump_file_path, dump_file_path);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				errmsg("Failed to rename \"%s\" to \"%s\" : %m",
+					   transient_dump_file_path, dump_file_path)));
+
+	if (!got_sigterm)
+		avoid_dumping = false;
+
+	elog(LOG, "Buffer Dump: saved metadata of %d blocks", num_buffers);
+}
+
+/* Extension's entry point. */
+void _PG_init(void)
+{
+	BackgroundWorker auto_prewarm;
+
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_autoprewarm.buff_dump_interval",
+		"Sets the maximum time between two buffer pool dumps",
+		"If set to Zero, timer based dumping is diabled.",
+							&buff_dump_interval,
+							0,
+							0, INT_MAX / 1000,
+							PGC_POSTMASTER,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	/* Register AutoPreWarmer. */
+	MemSet(&auto_prewarm, 0, sizeof(auto_prewarm));
+	auto_prewarm.bgw_main_arg = Int32GetDatum(0);
+	auto_prewarm.bgw_flags      = BGWORKER_SHMEM_ACCESS;
+
+	/* Register the Auto Pre-warmer background worker */
+	auto_prewarm.bgw_start_time = BgWorkerStart_PostmasterStart;
+	auto_prewarm.bgw_restart_time   = 0;  /* Keep the Auto Pre-warmer running */
+	auto_prewarm.bgw_main           = AutoPreWarmerMain;
+	snprintf(auto_prewarm.bgw_name, BGW_MAXLEN, "Auto Pre-warmer");
+	RegisterBackgroundWorker(&auto_prewarm);
+}
+
+/*
+ * AutoPreWarmerMain -- Main entry point of Auto-prewarmer process.
+ *						This is invoked as a background worker.
+ */
+static void AutoPreWarmerMain (Datum main_arg)
+{
+	MemoryContext	autoprewarmer_context;
+	sigjmp_buf		local_sigjmp_buf;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+
+	/*
+	 * Create a resource owner to keep track of our resources.
+	 */
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "AutoPreWarmer");
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	autoprewarmer_context = AllocSetContextCreate(TopMemoryContext,
+												 "AutoPreWarmer",
+												 ALLOCSET_DEFAULT_MINSIZE,
+												 ALLOCSET_DEFAULT_INITSIZE,
+												 ALLOCSET_DEFAULT_MAXSIZE);
+	MemoryContextSwitchTo(autoprewarmer_context);
+
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 * See notes in postgres.c about the design of this coding.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		LWLockReleaseAll();
+		AbortBufferIO();
+		UnlockBuffers();
+
+		/* buffer pins are released here. */
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+
+		MemoryContextSwitchTo(autoprewarmer_context);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(autoprewarmer_context);
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Close all open files after any error. */
+		smgrcloseall();
+
+		/* Error while dumping is treated as fatal hence do proc_exit */
+		if (avoid_dumping)
+			proc_exit(0);
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+	if (!avoid_loading)
+		load_now();
+	while (!got_sigterm)
+	{
+		int rc;
+		int timeout = 10;
+
+		if (buff_dump_interval)
+			timeout = buff_dump_interval;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+						WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						timeout * 1000, PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/* If buff_dump_interval is set then dump the buff pool. */
+		if ((rc & WL_TIMEOUT) && buff_dump_interval)
+			dump_now();
+	}
+
+	/* One last buffer pool dump while postmaster shutdown. */
+	dump_now();
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 58b0a97..acce5d2 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -693,6 +693,20 @@ ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
 							 mode, strategy, &hit);
 }
 
+/*
+ * ReadBufferForPrewarm -- This new interface is for pg_autoprewarm.
+ */
+Buffer
+ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+					 ForkNumber forkNum, BlockNumber blockNum,
+					 ReadBufferMode mode, BufferAccessStrategy strategy)
+{
+	bool        hit;
+
+	return ReadBuffer_common(smgr, relpersistence, forkNum, blockNum,
+							 mode, strategy, &hit);
+}
+
 
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 88b90dc..8d267fa 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,19 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- This function check whether there is a free buffer in
+ * buffer pool. Used by pg_autoprewarm module.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index c7da9f6..59a5277 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 821bee5..495fa8e 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -16,6 +16,7 @@
 
 #include "storage/block.h"
 #include "storage/buf.h"
+#include "storage/smgr.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
@@ -183,6 +184,10 @@ extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
+extern Buffer ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+								   ForkNumber forkNum, BlockNumber blockNum,
+								   ReadBufferMode mode,
+								   BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern void MarkBufferDirty(Buffer buffer);

#12

Haribabu Kommi

kommi.haribabu@gmail.com

about 9 years ago

In reply to: Mithun Cy (#11)

Re: Proposal : For Auto-Prewarm.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

#13

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 9 years ago

In reply to: Haribabu Kommi (#12)

Re: Proposal : For Auto-Prewarm.

I took a look at this again, and it doesn't appear to be working for me. The library is being loaded during startup, but I don't see any further activity in the log, and I don't see an autoprewarm file in $PGDATA.

There needs to be some kind of documentation change as part of this patch.

I'm not sure the default GUC setting of 0 makes sense. If you've loaded the module, presumably you want it to be running. I think it'd be nice if the GUC had a -1 setting that meant to use checkpoint_timeout.
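
A minimal sketch of the suggested -1 semantics, in standalone C. The helper name `effective_dump_interval` is hypothetical, and the patch as posted uses a minimum of 0 for the GUC, so the range would have to be widened for this to apply.

```c
/*
 * Hypothetical helper: resolve the interval (in seconds) the worker
 * should actually use.
 *   -1  -> fall back to checkpoint_timeout
 *    0  -> timer-based dumping disabled (dump only at shutdown)
 *   >0  -> dump every dump_interval seconds
 */
static int
effective_dump_interval(int dump_interval, int checkpoint_timeout)
{
    if (dump_interval < 0)
        return checkpoint_timeout;
    return dump_interval;
}
```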

Having the GUC be restart-only is also pretty onerous. I don't think it'd be hard to make the worker respond to a reload... there's code in the autovacuum launcher you could use as an example.
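
The reload pattern referred to here is the usual one: the signal handler only sets a flag, and the main loop picks up the new setting on its next iteration. Below is a standalone sketch using plain POSIX signals; `current_interval` and `maybe_reload` are hypothetical names. In a real background worker the GUC would also have to be declared PGC_SIGHUP rather than PGC_POSTMASTER, and the loop would call `ProcessConfigFile(PGC_SIGHUP)` instead of taking the new value as an argument.

```c
#include <signal.h>
#include <stdbool.h>

/* Flag set by the signal handler, consumed by the main loop. */
static volatile sig_atomic_t got_sighup = 0;

/* Hypothetical copy of the setting the worker is currently using. */
static int current_interval = 0;

/* Async-signal-safe handler: do nothing except record the request. */
static void
sighup_handler(int signo)
{
    (void) signo;
    got_sighup = 1;
}

static void
install_sighup_handler(void)
{
    signal(SIGHUP, sighup_handler);
}

/*
 * Called once per main-loop iteration. If a reload was requested, adopt
 * the newly configured interval and report that a reload happened.
 */
static bool
maybe_reload(int configured_interval)
{
    if (!got_sighup)
        return false;
    got_sighup = 0;
    current_interval = configured_interval;
    return true;
}
```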

I'm also wondering if this really needs to be a permanently running process... perhaps the worker could simply be started as necessary? Though maybe that means it wouldn't run at shutdown. Also not sure if it could be relaunched when a reload happens.

I'm marking this as waiting on author for now, because it's not working for me and needs documentation.

The new status of this patch is: Waiting on Author

#14

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Jim Nasby (#13)

Re: Proposal : For Auto-Prewarm.

On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I took a look at this again, and it doesn't appear to be working for me. The library is being loaded during startup, but I don't see any further activity in the log, and I don't see an autoprewarm file in $PGDATA.

Hi Jim,
Thanks for looking into this patch. I just downloaded the patch and
applied it to the latest code. I can see the file "autoprewarm.save" in
$PGDATA, which is created and dumped at shutdown time, and the activity
is logged as below:
2017-01-24 13:22:25.012 IST [91755] LOG: Buffer Dump: saved metadata of 59 blocks.

In my code, by default, we only dump at shutdown time. If we want to
dump at a regular interval, then we need to set the GUC
pg_autoprewarm.buff_dump_interval to > 0. I think I am missing
something while trying to recreate the bug reported above. Can you
please let me know what exactly you mean by the library not
working?
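
For readers who want to inspect the file: per the patch above, the dump is a `<<count>>` header followed by one `database,tablespace,filenode,fork,block` line per buffer, written to a temporary file and then renamed. The following standalone sketch mirrors that format; `DumpEntry`, `write_dump` and `read_dump` are hypothetical names. It assumes POSIX `rename()`, which replaces an existing destination atomically, so it omits the separate `unlink()` step the patch performs.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one dumped buffer tag. */
typedef struct DumpEntry
{
    uint32_t database;
    uint32_t spcnode;
    uint32_t filenode;
    uint32_t forknum;
    uint32_t blocknum;
} DumpEntry;

/* Write all entries to tmp_path, then rename it over final_path. */
static int
write_dump(const char *final_path, const char *tmp_path,
           const DumpEntry *entries, uint32_t n)
{
    FILE *f = fopen(tmp_path, "w");
    uint32_t i;
    int ok;

    if (f == NULL)
        return -1;

    ok = fprintf(f, "<<%u>>\n", (unsigned) n) >= 0;
    for (i = 0; ok && i < n; i++)
        ok = fprintf(f, "%u,%u,%u,%u,%u\n",
                     (unsigned) entries[i].database,
                     (unsigned) entries[i].spcnode,
                     (unsigned) entries[i].filenode,
                     (unsigned) entries[i].forknum,
                     (unsigned) entries[i].blocknum) >= 0;

    if (fclose(f) != 0)
        ok = 0;
    if (ok && rename(tmp_path, final_path) != 0)
        ok = 0;
    if (!ok)
        remove(tmp_path);       /* do not leave a partial file behind */
    return ok ? 0 : -1;
}

/* Read up to max entries back; returns the number read, or -1 on error. */
static long
read_dump(const char *path, DumpEntry *out, uint32_t max)
{
    FILE *f = fopen(path, "r");
    unsigned n = 0, i = 0;

    if (f == NULL)
        return -1;
    if (fscanf(f, "<<%u>>", &n) != 1)
    {
        fclose(f);
        return -1;
    }
    while (i < n && i < max)
    {
        unsigned v[5];

        if (fscanf(f, "%u,%u,%u,%u,%u",
                   &v[0], &v[1], &v[2], &v[3], &v[4]) != 5)
            break;              /* stop at the first malformed entry */
        out[i].database = v[0];
        out[i].spcnode = v[1];
        out[i].filenode = v[2];
        out[i].forknum = v[3];
        out[i].blocknum = v[4];
        i++;
    }
    fclose(f);
    return (long) i;
}
```

Writing to a temporary name first means a crash in the middle of a dump leaves the previous complete file in place.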

There needs to be some kind of documentation change as part of this patch.

Thanks, I will add SGML documentation for this.

I will try to address the remaining suggestions in my next patch,
depending on their feasibility.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#15

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#14)

Re: Proposal : For Auto-Prewarm.

On Tue, Jan 24, 2017 at 1:56 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

I took a look at this again, and it doesn't appear to be working for me.
The library is being loaded during startup, but I don't see any further
activity in the log, and I don't see an autoprewarm file in $PGDATA.

Hi Jim,
Thanks for looking into this patch, I just downloaded the patch and
applied same to the latest code, I can see file " autoprewarm.save" in
$PGDATA which is created and dumped at shutdown time and an activity
is logged as below
2017-01-24 13:22:25.012 IST [91755] LOG: Buffer Dump: saved metadata
of 59 blocks.

In my code by default, we only dump at shutdown time. If we want to
dump at regular interval then we need to set the GUC
pg_autoprewarm.buff_dump_interval to > 0. I think I am missing
something while trying to recreate the bug reported above. Can you
please let me know what exactly you mean by the library is not
working.

There needs to be some kind of documentation change as part of this
patch.

Thanks, I will add a sgml for same.

For remaining suggestions, I will try to address in my next patch
based on its feasibility.

The patch works for me too.

Few initial comments:

1. I think the README was carried over as is from the 1st version and says
pg_autoprewarm is a contrib module. It should be rewritten to say that
pg_autoprewarm is part of the pg_prewarm module. The documentation should be
added to pgprewarm.sgml instead of the README.

2. buff_dump_interval could be renamed to just dump_interval or
buffer_dump_interval. Also, since it is now part of pg_prewarm, I think it
makes sense to have the conf parameter be pg_prewarm.xxx instead of
pg_autoprewarm.xxx.

3. During startup we get the following message:

2017-01-24 16:13:57.615 IST [90061] LOG: Num buffers : 272

It could be better written as “pg_prewarm: 272 blocks loaded from dump” or
something similar.

4. Also, the message while dumping says:

2017-01-24 16:15:17.712 IST [90061] LOG: Buffer Dump: saved metadata of
272 blocks

It would be better to write the module name in the message instead of "Buffer
Dump".

Thank you,

Beena Emerson

Have a Great Day!

#16

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Tsunakawa, Takayuki (#4)

Re: Proposal : For Auto-Prewarm.

On Fri, Oct 28, 2016 at 6:36 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:

I welcome this feature! I remember pg_hibernate did this. I wonder what happened to pg_hibernate. Did you check it?

Thanks. When I checked pg_hibernate, there were two things people
complained about. One, buffer loading starts only after recovery is
finished and the database has reached a consistent state. Two, it can
replace existing buffers which were loaded by recovery or by newly
connected clients. This solution tries to solve both.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#17

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 9 years ago

In reply to: Beena Emerson (#15)

Re: Proposal : For Auto-Prewarm.

On 1/24/17 4:56 AM, Beena Emerson wrote:

2. buff_dump_interval could be renamed to just dump_interval or
buffer_dump_interval. Also, since it is now part of pg_prewarm, I think
it makes sense to have the conf parameter be pg_prewarm.xxx instead of
pg_autoprewarm.xxx.

I'd really like to find terminology other than "buffer dump", because
that makes it sound like we're dumping the contents of the buffers
themselves.

Maybe block_map? Buffer_map?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

#18

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 9 years ago

In reply to: Mithun Cy (#14)

Re: Proposal : For Auto-Prewarm.

On 1/24/17 2:26 AM, Mithun Cy wrote:

Thanks for looking into this patch, I just downloaded the patch and
applied same to the latest code, I can see file " autoprewarm.save" in
$PGDATA which is created and dumped at shutdown time and an activity
is logged as below
2017-01-24 13:22:25.012 IST [91755] LOG: Buffer Dump: saved metadata
of 59 blocks.

Yeah, I wasn't getting that at all, though I did see the shared library
being loaded. If I get a chance I'll try it again.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

#19

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Jim Nasby (#18)

Re: Proposal : For Auto-Prewarm.

On Wed, Jan 25, 2017 at 10:36 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 1/24/17 2:26 AM, Mithun Cy wrote:

Thanks for looking into this patch, I just downloaded the patch and
applied same to the latest code, I can see file " autoprewarm.save" in
$PGDATA which is created and dumped at shutdown time and an activity
is logged as below
2017-01-24 13:22:25.012 IST [91755] LOG: Buffer Dump: saved metadata
of 59 blocks.

Yeah, I wasn't getting that at all, though I did see the shared library
being loaded. If I get a chance I'll try it again.

I hope you added the following to the conf file:

shared_preload_libraries = 'pg_autoprewarm' # (change requires restart)
pg_autoprewarm.buff_dump_interval=20

If you still could not see the message even after this, then that's strange.

--
Thank you,

Beena Emerson

Have a Great Day!

#20

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Jim Nasby (#13)

Re: Proposal : For Auto-Prewarm.

On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I took a look at this again, and it doesn't appear to be working for me. The library is being loaded during startup, but I don't see any further activity in the log, and I don't see an autoprewarm file in $PGDATA.

There needs to be some kind of documentation change as part of this patch.

I'm not sure the default GUC setting of 0 makes sense. If you've loaded the module, presumably you want it to be running. I think it'd be nice if the GUC had a -1 setting that meant to use checkpoint_timeout.

Having the GUC be restart-only is also pretty onerous. I don't think it'd be hard to make the worker respond to a reload... there's code in the autovacuum launcher you could use as an example.

+1. I don't think there should be any problem in making it PGC_SIGHUP.

I'm also wondering if this really needs to be a permanently running process... perhaps the worker could simply be started as necessary?

Do you want to invoke the worker after every buff_dump_interval? I think
that will be bad, both in terms of the cost of starting a new process and
of who will monitor when to start such a process. I think it is better to
keep it as a permanently running background process if loaded by the user.

Though maybe that means it wouldn't run at shutdown.

Yeah, that will be another drawback.

A few comments found while glancing at the patch.

1.
+TO DO:
+------
+Add functionality to dump based on timer at regular interval.

I think you need to remove above TO DO.

2.
+ /* Load the page only if there exist a free buffer. We do not want to
+ * replace an existing buffer. */

This is not a PG style multiline comment.
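
For reference, PostgreSQL's coding conventions put the opening and closing markers of a multi-line comment on lines of their own. A minimal sketch of the same comment in that style follows; the helper function and its parameter are hypothetical and exist only to give the comment somewhere to live, they are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, for illustration only; not part of the patch. */
static bool
should_load_block(int free_buffers)
{
	assert(free_buffers >= 0);

	/*
	 * Load the page only if there exists a free buffer.  We do not want to
	 * replace an existing buffer.
	 */
	return free_buffers > 0;
}
```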

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#21

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Jim Nasby (#13)

Re: Proposal : For Auto-Prewarm.

On Mon, Jan 23, 2017 at 6:37 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I'm not sure the default GUC setting of 0 makes sense. If you've loaded the module, presumably you want it to be running. I think it'd be nice if the GUC had a -1 setting that meant to use checkpoint_timeout.

Actually, I think we need to use -1 to mean "don't run the worker at
all". 0 means "run the worker, but don't do timed dumps". >0 means
"run the worker, and dump at that interval".

I have to admit that when I was first thinking about this feature, my
initial thought was "hey, let's dump once per checkpoint_timeout".
But I think that Mithun's approach is better. There's no intrinsic
connection between this and checkpointing, and letting the user pick
the interval is a lot more flexible. We could still have a magic
value that means "same as checkpoint_timeout" but it's not obvious to
me that there's any value in that; the user might as well just pick
the time interval that they want.
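
The three-way meaning proposed above for the interval setting (-1, 0, and >0) could be sketched roughly as follows. This is only an illustration under assumptions: the enum and function names are made up here, and the patch under discussion may represent these states differently.

```c
#include <assert.h>

/* Hypothetical names; the actual patch may use different ones. */
typedef enum DumpMode
{
	DUMP_WORKER_DISABLED,		/* -1: do not run the worker at all */
	DUMP_AT_SHUTDOWN_ONLY,		/* 0: run the worker, but no timed dumps */
	DUMP_PERIODIC				/* >0: run the worker, dump at that interval */
} DumpMode;

/* Map the raw setting (in seconds) to one of the three behaviors. */
static DumpMode
classify_dump_interval(int dump_interval)
{
	/* Assumption: the GUC's minimum value rejects anything below -1. */
	assert(dump_interval >= -1);

	if (dump_interval == -1)
		return DUMP_WORKER_DISABLED;
	if (dump_interval == 0)
		return DUMP_AT_SHUTDOWN_ONLY;
	return DUMP_PERIODIC;
}
```

With this mapping, a value such as 300 (the 5 minutes mentioned as a possible recommendation) would select periodic dumps.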

Actually, for busy systems, the interval is probably shorter than
checkpoint_timeout. Dumping the list of buffers isn't that expensive,
and if you are doing checkpoints every half hour or so, that's probably
longer than what you want for this. So I suggest that we just have the
documentation recommend a suitable value (e.g. 5 minutes) and not worry
about checkpoint_timeout.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 9 years ago

In reply to: Beena Emerson (#19)

Re: Proposal : For Auto-Prewarm.

On 1/24/17 11:13 PM, Beena Emerson wrote:

On Wed, Jan 25, 2017 at 10:36 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 1/24/17 2:26 AM, Mithun Cy wrote:

Thanks for looking into this patch, I just downloaded the patch and
applied same to the latest code, I can see file " autoprewarm.save" in
$PGDATA which is created and dumped at shutdown time and an activity
is logged as below
2017-01-24 13:22:25.012 IST [91755] LOG: Buffer Dump: saved metadata
of 59 blocks.

Yeah, I wasn't getting that at all, though I did see the shared
library being loaded. If I get a chance I'll try it again.

I hope you added the following to the conf file:

shared_preload_libraries = 'pg_autoprewarm' # (change requires restart)

Therein lay my problem: I was preloading pg_prewarm, not pg_autoprewarm.

I think the two need to be integrated much better than they are right
now. They should certainly be in the same .so, and as others have
mentioned the docs need to be fixed. For consistency, I think the name
should just be pg_prewarm, as well as the prefix for the GUC.

Based on that and other feedback I'm going to mark this as returned with
feedback, though if you're able to get a revised patch in the next few
days please do.

FYI (and this is just a suggestion), for testing purposes, it might also
be handy to allow manual dump and load via functions, with the load
function giving you the option to forcibly load (instead of doing
nothing if there are no buffers on the free list). It would also be
handy if those functions accepted a different filename. That way you
could reset shared_buffers to a known condition before running a test.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

#23

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 9 years ago

In reply to: Jim Nasby (#22)

Re: Proposal : For Auto-Prewarm.

On 1/25/17 1:46 PM, Jim Nasby wrote:

Based on that and other feedback I'm going to mark this as returned with
feedback, though if you're able to get a revised patch in the next few
days please do.

Actually, based on the message that popped up when I went to do that I
guess it's better not to do that, so I marked it as Waiting on Author.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

#24

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Jim Nasby (#22)

Re: Proposal : For Auto-Prewarm.

On Wed, Jan 25, 2017 at 2:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I think the two need to be integrated much better than they are right now.
They should certainly be in the same .so, and as others have mentioned the
docs need to be fixed. For consistency, I think the name should just be
pg_prewarm, as well as the prefix for the GUC.

Yikes. +1, definitely.

It would also be handy if those functions
accepted a different filename. That way you could reset shared_buffers to a
known condition before running a test.

That would have some pretty unpleasant security implications unless it
is awfully carefully thought out. I doubt this has enough utility to
make it worth thinking that hard about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: Mithun Cy (#14)

Re: Proposal : For Auto-Prewarm.

On 1/24/17 3:26 AM, Mithun Cy wrote:

In my code by default, we only dump at shutdown time. If we want to
dump at regular interval then we need to set the GUC
pg_autoprewarm.buff_dump_interval to > 0.

Just a thought with an additional use case: If I want to set up a
standby for offloading queries, could I take the dump file from the
primary or another existing standby, copy it to the new standby, and
have it be warmed up to the state of the other instance from that?

In my experience, that kind of use is just as interesting as preserving
the buffers across a restart.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Peter Eisentraut (#25)

Re: Proposal : For Auto-Prewarm.

On Thu, Jan 26, 2017 at 8:45 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 1/24/17 3:26 AM, Mithun Cy wrote:

In my code by default, we only dump at shutdown time. If we want to
dump at regular interval then we need to set the GUC
pg_autoprewarm.buff_dump_interval to > 0.

Just a thought with an additional use case: If I want to set up a
standby for offloading queries, could I take the dump file from the
primary or another existing standby, copy it to the new standby, and
have it be warmed up to the state of the other instance from that?

In my experience, that kind of use is just as interesting as preserving
the buffers across a restart.

An interesting use case. I am not sure if somebody has tried that way
but it appears to me that the current proposed patch should work for
this use case.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#27

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Amit Kapila (#26)

Re: Proposal : For Auto-Prewarm.

On Fri, Jan 27, 2017 at 8:14 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, Jan 26, 2017 at 8:45 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 1/24/17 3:26 AM, Mithun Cy wrote:

In my code by default, we only dump at shutdown time. If we want to
dump at regular interval then we need to set the GUC
pg_autoprewarm.buff_dump_interval to > 0.

Just a thought with an additional use case: If I want to set up a
standby for offloading queries, could I take the dump file from the
primary or another existing standby, copy it to the new standby, and
have it be warmed up to the state of the other instance from that?

In my experience, that kind of use is just as interesting as preserving
the buffers across a restart.

An interesting use case. I am not sure if somebody has tried that way
but it appears to me that the current proposed patch should work for
this use case.

I also feel this should work.
In that case, we could add a file location parameter. By default the file
would be stored in the cluster directory; otherwise it would be stored in
the location provided. We can update this parameter on the standby so that
it can access the file.
Thoughts?

--
Thank you,

Beena Emerson

Have a Great Day!

#28

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Peter Eisentraut (#25)

Re: Proposal : For Auto-Prewarm.

On Thu, Jan 26, 2017 at 8:45 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

Just a thought with an additional use case: If I want to set up a
standby for offloading queries, could I take the dump file from the
primary or another existing standby, copy it to the new standby, and
have it be warmed up to the state of the other instance from that?

In my experience, that kind of use is just as interesting as preserving
the buffers across a restart.

Initially, I did not think about this; thanks for asking. For now, we
dump the buffer pool info in the format
<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. If a BlockNum on
the new standby corresponds to the same set of rows as it did on the
server where the dump was produced, I think we can directly use the
dump file on the new standby. All we need to do is drop the ".save"
file in the data directory and preload the library. The buffer pool will
be warmed with the blocks mentioned in ".save".

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#29

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: Beena Emerson (#27)

Re: Proposal : For Auto-Prewarm.

On 1/26/17 11:11 PM, Beena Emerson wrote:

In that case, we could add the file location parameter. By default it
would store in the cluster directory else in the location provided. We
can update this parameter in standby for it to access the file.

I don't see the need for that.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#30

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Peter Eisentraut (#29)

Re: Proposal : For Auto-Prewarm.

On Fri, Jan 27, 2017 at 3:18 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 1/26/17 11:11 PM, Beena Emerson wrote:

In that case, we could add the file location parameter. By default it
would store in the cluster directory else in the location provided. We
can update this parameter in standby for it to access the file.

I don't see the need for that.

+1. That seems like over-engineering this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#31

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 9 years ago

In reply to: Beena Emerson (#27)

Re: Proposal : For Auto-Prewarm.

On 1/26/17 10:11 PM, Beena Emerson wrote:

In that case, we could add the file location parameter. By default it
would store in the cluster directory else in the location provided. We
can update this parameter in standby for it to access the file.

I don't see file location being as useful in this case. What would be
more useful is being able to forcibly load blocks into shared buffers so
that you didn't need to restart.

Hmm, it occurs to me that could be accomplished by providing an SRF that
returned the contents of the current save file.

In any case, I strongly suggest focusing on the issues that have already
been identified before trying to add more features.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

#32

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Jim Nasby (#31)

Re: Proposal : For Auto-Prewarm.

On Fri, Jan 27, 2017 at 5:34 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 1/26/17 10:11 PM, Beena Emerson wrote:

In that case, we could add the file location parameter. By default it
would store in the cluster directory else in the location provided. We
can update this parameter in standby for it to access the file.

I don't see file location being as useful in this case. What would be more
useful is being able to forcibly load blocks into shared buffers so that you
didn't need to restart.

Of course, you can already do that with the existing pg_prewarm and
pg_buffercache functionality. Any time you want, you can use
pg_buffercache to dump out a list of everything in shared_buffers, and
pg_prewarm to suck that same stuff in on the same node or a separate
node. All this patch is trying to do is provide a convenient,
automated way to make that work.

(An argument could be made that this ought to be in core and the
default behavior, because who really wants to start with an ice-cold
buffer cache on a production system? It's possible that
repopulating shared_buffers in the background after a restart could
cause enough I/O to interfere with foreground activity that
regrettably ends up needing none of the prewarmed buffers, but I think
prewarming a few GB of data should be quite fast under normal
circumstances, and any well-intentioned system can go wrong under some
set of obscure circumstances. Still, the patch takes the conservative
course of making this an opt-in behavior, and that's probably for the
best.)

In any case, I strongly suggest focusing on the issues that have already
been identified before trying to add more features.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33

Michael Paquier

michael.paquier@gmail.com

almost 9 years ago

In reply to: Robert Haas (#32)

Re: Proposal : For Auto-Prewarm.

On Tue, Jan 31, 2017 at 3:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 27, 2017 at 5:34 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 1/26/17 10:11 PM, Beena Emerson wrote:

In that case, we could add the file location parameter. By default it
would store in the cluster directory else in the location provided. We
can update this parameter in standby for it to access the file.

I don't see file location being as useful in this case. What would be more
useful is being able to forcibly load blocks into shared buffers so that you
didn't need to restart.

Of course, you can already do that with the existing pg_prewarm and
pg_buffercache functionality. Any time you want, you can use
pg_buffercache to dump out a list of everything in shared_buffers, and
pg_prewarm to suck that same stuff in on the same node or a separate
node. All this patch is trying to do is provide a convenient,
automated way to make that work.

(An argument could be made that this ought to be in core and the
default behavior, because who really wants to start with an ice-cold
buffer cache on a production system? It's possible that
repopulating shared_buffers in the background after a restart could
cause enough I/O to interfere with foreground activity that
regrettably ends up needing none of the prewarmed buffers, but I think
prewarming a few GB of data should be quite fast under normal
circumstances, and any well-intentioned system can go wrong under some
set of obscure circumstances. Still, the patch takes the conservative
course of making this an opt-in behavior, and that's probably for the
best.)

I partially agree with this paragraph; at least there are advantages
to doing so for cases where the data fits in shared buffers. Even for
data sets fitting in RAM that can be an advantage, as the buffers would
get evicted from Postgres' cache but not from the OS cache. Now, for cases
where there are many load patterns on a given database (I have some here),
it's hard to make this on by default.

This patch needs to be visibly reshaped anyway, so I am marking it as
returned with feedback.
--
Michael

#34

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Michael Paquier (#33)

Re: Proposal : For Auto-Prewarm.

On Tue, Jan 31, 2017 at 1:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I partially agree with this paragraph; at least there are advantages
to doing so for cases where the data fits in shared buffers. Even for
data sets fitting in RAM that can be an advantage, as the buffers would
get evicted from Postgres' cache but not from the OS cache. Now, for cases
where there are many load patterns on a given database (I have some here),
it's hard to make this on by default.

Well, the question even for that case is whether it really costs
anything. My bet is that it is nearly free when it doesn't help, but
that could be wrong. My experience running pgbench tests is that
prewarming all of pgbench_accounts on a scale factor that fits in
shared_buffers using "dd" took just a few seconds, but when accessing
the blocks in random order the cache took many minutes to heat up.
Now, I assume that this patch sorts the I/O (although I haven't
checked that) and therefore I expect that the prewarm completes really
fast. If that's not the case, then that's bad. But if it is the
case, then it's not really hurting you even if the workload changes
completely.

Again, I'm not really arguing for enable-by-default, but I think if
this is well-implemented the chances of it actually hurting anything
are very low, so you'll either win or it'll make no difference.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#35

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#34)

Re: Proposal : For Auto-Prewarm.

On Tue, Jan 31, 2017 at 9:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Now, I assume that this patch sorts the I/O (although I haven't
checked that) and therefore I expect that the prewarm completes really
fast. If that's not the case, then that's bad. But if it is the
case, then it's not really hurting you even if the workload changes
completely.

JFYI: yes, in the patch we load the blocks sorted by
<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#36

Michael Paquier

michael.paquier@gmail.com

almost 9 years ago

In reply to: Robert Haas (#34)

Re: Proposal : For Auto-Prewarm.

On Wed, Feb 1, 2017 at 1:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Well, the question even for that case is whether it really costs
anything. My bet is that it is nearly free when it doesn't help, but
that could be wrong. My experience running pgbench tests is that
prewarming all of pgbench_accounts on a scale factor that fits in
shared_buffers using "dd" took just a few seconds, but when accessing
the blocks in random order the cache took many minutes to heat up.

And benchmarks like dbt-1 have a pre-warming period added in the test
itself, where the user can specify a number of seconds over which to
linearly increase the load from 0% to 100%, just for filling in the OS
and PG's cache... This feature would be helpful.

Now, I assume that this patch sorts the I/O (although I haven't
checked that) and therefore I expect that the prewarm completes really
fast. If that's not the case, then that's bad. But if it is the
case, then it's not really hurting you even if the workload changes
completely.

Having that working fast would be really nice.
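
Since the patch is said to load blocks in sorted <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> order, here is a rough sketch of what such an ordering could look like with the standard qsort(). The struct layout, field types, and function names are assumptions made for illustration; the patch's own entry type and comparator may well differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one dump-file entry. */
typedef struct BlockInfo
{
	uint32_t	database;
	uint32_t	tablespace;
	uint32_t	relation;
	int32_t		forknum;
	uint32_t	blocknum;
} BlockInfo;

/* Return early from the comparator as soon as one field differs. */
#define CMP_FIELD(a, b, f) \
	do { \
		if ((a)->f != (b)->f) \
			return ((a)->f < (b)->f) ? -1 : 1; \
	} while (0)

/* Lexicographic comparison over the five fields, most significant first. */
static int
blockinfo_cmp(const void *pa, const void *pb)
{
	const BlockInfo *a = pa;
	const BlockInfo *b = pb;

	CMP_FIELD(a, b, database);
	CMP_FIELD(a, b, tablespace);
	CMP_FIELD(a, b, relation);
	CMP_FIELD(a, b, forknum);
	CMP_FIELD(a, b, blocknum);
	return 0;
}

/* Sort entries so that blocks of the same relation fork end up adjacent. */
static void
sort_block_info(BlockInfo *blocks, size_t n)
{
	assert(blocks != NULL || n == 0);

	if (n > 1)
		qsort(blocks, n, sizeof(BlockInfo), blockinfo_cmp);
}
```

Sorting this way places blocks of the same relation fork next to each other in ascending block order, which is presumably what makes the reads closer to sequential.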
--
Michael

#37

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Michael Paquier (#36)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Hi all,
Here is the new patch, which addresses all of the above comments. I have
changed the design a bit, as described below.

What is it?
===========
A pair of bgworkers: one which automatically dumps the buffer pool's block
info at a given interval, and another which loads those blocks into the
buffer pool when the server restarts.

How does it work?
=================
When the shared library pg_prewarm is preloaded during server startup,
a bgworker "auto pg_prewarm load" is launched immediately after the
server is started. The bgworker loads the blocks listed in the block
info entries <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> in
$PGDATA/AUTO_PG_PREWARM_FILE for as long as there is a free buffer in
the buffer pool. This way we do not replace any new blocks which were
loaded either by the recovery process or the querying clients.
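
The stopping rule described above (keep loading only while a free buffer exists) can be modeled with a toy loop. This is a simplified sketch, not the patch's code: the buffer pool is reduced to a counter of free buffers, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the load worker: the pool is represented only by a count of
 * free buffers.  Returns how many of the n_saved entries were loaded.
 */
static size_t
prewarm_while_free(size_t n_saved, size_t *free_buffers)
{
	size_t		loaded = 0;

	assert(free_buffers != NULL);

	while (loaded < n_saved && *free_buffers > 0)
	{
		/* A real worker would read the block into a free buffer here. */
		(*free_buffers)--;
		loaded++;
	}

	/* Stopping here means no already-resident block is ever evicted. */
	return loaded;
}
```

In this model, a dump with more entries than there are free buffers is only partially loaded, which matches the stated goal of not replacing blocks brought in by recovery or by clients.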

Once the "auto pg_prewarm load" bgworker has completed its job, it
registers a dynamic bgworker "auto pg_prewarm dump", which is launched
when the server reaches a consistent state. The new bgworker
periodically scans the buffer pool and dumps the meta info of the
blocks which are currently in the buffer pool. The GUC
pg_prewarm.dump_interval, if set > 0, indicates the minimum time
interval between two dumps. If pg_prewarm.dump_interval is set to
AT_PWARM_DUMP_AT_SHUTDOWN_ONLY, the bgworker will only dump at the time
of server shutdown. If it is set to AT_PWARM_LOAD_ONLY, we do not want
the bgworker to dump anymore, so it stops there.

To relaunch a stopped "auto pg_prewarm dump" bgworker we can use the
utility function "launch_pg_prewarm_dump".

==================
One problem I have left open is that multiple "auto pg_prewarm dump"
workers can be launched by calling launch_pg_prewarm_dump even if a
dump/load worker is already running. I can avoid this by dropping a
lock-file before starting the bgworkers. But if there is another,
simpler method to avoid launching a duplicate bgworker, I can do that
instead. Any suggestion on this will be very helpful.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_auto_prewarm_03.patch (application/octet-stream)

commit 6c397b937027e2f96c372138410e0ea1a52e253b
Author: mithun <mithun@localhost.localdomain>
Date:   Tue Feb 7 10:23:28 2017 +0530

    commit 1: auto_pg_prewarm

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..706b0da 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o auto_pg_prewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1.sql pg_prewarm--1.1--1.2.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/auto_pg_prewarm.c b/contrib/pg_prewarm/auto_pg_prewarm.c
new file mode 100644
index 0000000..fd514c0
--- /dev/null
+++ b/contrib/pg_prewarm/auto_pg_prewarm.c
@@ -0,0 +1,760 @@
+/*-------------------------------------------------------------------------
+ *
+ * auto_pg_prewarm.c
+ *	Automatically dumps the buffer pool's block info and later loads those
+ *	blocks back into the buffer pool.
+ *
+ *	Copyright (c) 2013-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/auto_pg_prewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/guc.h"
+#include "catalog/pg_class.h"
+
+/*
+ * auto pg_prewarm :
+ *
+ * What is it?
+ * ===========
+ * A pair of bgworkers: one automatically dumps the buffer pool's block info
+ * at a given interval, and the other loads those blocks into the buffer pool
+ * when the server restarts.
+ *
+ * How does it work?
+ * =================
+ * When the shared library pg_prewarm is preloaded during server startup, a
+ * bgworker "auto pg_prewarm load" is launched immediately after the server
+ * is started. The bgworker loads the blocks listed as block info entries
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> in
+ * $PGDATA/AUTO_PG_PREWARM_FILE, while there is a free buffer in the buffer
+ * pool. This way we do not replace any new blocks which were loaded either by
+ * the recovery process or the querying clients.
+ *
+ * Once the "auto pg_prewarm load" bgworker has completed its job, it will
+ * register a dynamic bgworker "auto pg_prewarm dump" which has to be launched
+ * when the server reaches a consistent state. The new bgworker will
+ * periodically scan the buffer pool and then dump the meta info of blocks
+ * which are currently in the buffer pool. The GUC pg_prewarm.dump_interval if
+ * set > 0 indicates the minimum time interval between two dumps. If
+ * pg_prewarm.dump_interval is set to AT_PWARM_DUMP_AT_SHUTDOWN_ONLY the
+ * bgworker will only dump at the time of server shutdown. If it is set to
+ * AT_PWARM_LOAD_ONLY we do not want the bgworker to dump anymore, so it stops
+ * there.
+ *
+ * To relaunch a stopped "auto pg_prewarm dump" bgworker we can use the utility
+ * function launch_pg_prewarm_dump.
+ */
+
+PG_FUNCTION_INFO_V1(launch_pg_prewarm_dump);
+
+#define AT_PWARM_LOAD_ONLY -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+/* Primary functions */
+void		_PG_init(void);
+static void auto_pgprewarm_main(Datum main_arg);
+static bool load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum);
+static void register_auto_pgprewarm();
+void		auto_pgprewarm_dump_main(void);
+
+/* Secondary/supporting functions */
+static void sigtermHandler(SIGNAL_ARGS);
+static void sighupHandler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake
+ *	it up.
+ */
+static void
+sigtermHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the main loop to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+sighupHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* Meta-data of each persistent page buffer which is dumped and used to load. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcNode;		/* tablespace */
+	Oid			filenode;		/* relation */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+}	BlockInfoRecord;
+
+/* Try loading only once during startup. If any error occurs, do not retry. */
+static bool avoid_loading = false;
+
+/*
+ * And avoid dumping if we receive SIGTERM while loading. Also, do not retry if
+ * dump has failed previously.
+ */
+static bool avoid_dumping = false;
+
+int			dump_interval = 0;
+
+/* compare member elements to check if they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0);
+
+/*
+ * sort_cmp_func - compare function used while qsorting BlockInfoRecord objects.
+ */
+static int
+sort_cmp_func(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+#define AUTO_PG_PREWARM_FILE "autopgprewarm"
+
+/*
+ *	load_block -	Load a given block.
+ */
+static bool
+load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum)
+{
+	Buffer		buffer;
+
+	/*
+	 * Load the page only if there exists a free buffer. We do not want to
+	 * replace an existing buffer.
+	 */
+	if (have_free_buffer())
+	{
+		SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
+
+		/*
+		 * Check if the fork exists first; otherwise we would waste one free
+		 * buffer for each nonexistent block.
+		 */
+		if (smgrexists(smgr, forkNum))
+		{
+			buffer = ReadBufferForPrewarm(smgr, reltype,
+										  forkNum, blockNum,
+										  RBM_NORMAL, NULL);
+			if (BufferIsValid(buffer))
+				ReleaseBuffer(buffer);
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ *	load_now - The main routine which reads from the dump file and loads each
+ *	block. We try to load each blocknum read from AUTO_PG_PREWARM_FILE as long
+ *	as a free buffer is left and SIGTERM has not been received. If we fail to
+ *	load a block we ignore the ERROR and try to load the next blocknum. This is
+ *	because the corresponding block might have been deleted in the
+ *	meantime.
+ */
+static void
+load_now(void)
+{
+	static char dump_file_path[MAXPGPATH];
+	FILE	   *file = NULL;
+	uint32		i,
+				num_buffers = 0;
+	bool		have_free_buf = true;
+
+	if (avoid_loading)
+		return;
+
+	avoid_loading = true;
+
+	/* Check if file exists and open file in read mode. */
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s.save",
+			 AUTO_PG_PREWARM_FILE);
+	file = fopen(dump_file_path, PG_BINARY_R);
+
+	if (!file)
+		return;					/* No file to load. */
+
+	if (fscanf(file, "<<%u>>", &num_buffers) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm load : error reading num of elements"
+						" in \"%s\" : %m", dump_file_path)));
+	}
+
+	elog(LOG, "auto pg_prewarm load: number of buffers to load %u",
+		 num_buffers);
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		RelFileNode rnode;
+		uint32		forknum;
+		BlockNumber blocknum;
+
+		if (got_sigterm)
+		{
+			/*
+			 * Received shutdown while we were still loading the buffers. No
+			 * need to dump at this stage.
+			 */
+			avoid_dumping = true;
+			break;
+		}
+
+		if (!have_free_buf)
+			break;
+
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &rnode.dbNode, &rnode.spcNode,
+						&rnode.relNode, &forknum, &blocknum))
+			break;				/* No more valid entry hence stop processing. */
+
+		PG_TRY();
+		{
+			have_free_buf = load_block(rnode, RELPERSISTENCE_PERMANENT,
+									   (ForkNumber) forknum, blocknum);
+		}
+		PG_CATCH();
+		{
+			/* Handle any error, then try to load the next buffer. */
+
+			/* Prevent interrupts while cleaning up */
+			HOLD_INTERRUPTS();
+
+			/* Report the error to the server log */
+			EmitErrorReport();
+
+			LWLockReleaseAll();
+			AbortBufferIO();
+			UnlockBuffers();
+
+			/* buffer pins are released here. */
+			ResourceOwnerRelease(CurrentResourceOwner,
+								 RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, true);
+			FlushErrorState();
+
+			/* Now we can allow interrupts again */
+			RESUME_INTERRUPTS();
+		}
+		PG_END_TRY();
+	}
+
+	fclose(file);
+
+	elog(LOG,
+		 "auto pg_prewarm load : number of buffers actually tried to load %u",
+		 i);
+	return;
+}
+
+/*
+ *	dump_now - The main routine which goes through each buffer header and
+ *	dumps their metadata in the format
+ *	<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. We sort this data
+ *	and then dump it. Sorting is necessary as it facilitates sequential reads
+ *	during load. Unlike load, if we encounter any error we abort the dump.
+ */
+static void
+dump_now(void)
+{
+	static char dump_file_path[MAXPGPATH],
+				transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret;
+	uint32		num_buffers;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file = NULL;
+
+	if (avoid_dumping)
+		return;
+
+	avoid_dumping = true;
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_buffers = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/* Only valid and persistent page buffers are dumped. */
+		if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID) &&
+			(buf_state & BM_PERMANENT))
+		{
+			block_info_array[num_buffers].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_buffers].spcNode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_buffers].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_buffers].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_buffers].blocknum = bufHdr->tag.blockNum;
+			++num_buffers;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/* Sorting now only to avoid sorting while loading. */
+	pg_qsort(block_info_array, num_buffers, sizeof(BlockInfoRecord),
+			 sort_cmp_func);
+
+	snprintf(transient_dump_file_path, sizeof(transient_dump_file_path),
+			 "%s.save.tmp", AUTO_PG_PREWARM_FILE);
+	file = fopen(transient_dump_file_path, "w");
+	if (file == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s.save", AUTO_PG_PREWARM_FILE);
+
+	/* Write num_buffers first and then the BlockInfoRecords. */
+	ret = fprintf(file, "<<%u>>\n", num_buffers);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : error writing to \"%s\" : %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].spcNode,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("auto pg_prewarm dump : error writing to"
+							" \"%s\" : %m", transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = fclose(file);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : error closing \"%s\" : %m",
+						transient_dump_file_path)));
+
+	ret = unlink(dump_file_path);
+	if (ret != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : unlink \"%s\" failed : %m",
+						dump_file_path)));
+
+	ret = rename(transient_dump_file_path, dump_file_path);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : failed to rename \"%s\" to"
+						" \"%s\" : %m",
+						transient_dump_file_path, dump_file_path)));
+
+	if (!got_sigterm)
+		avoid_dumping = false;
+
+	elog(LOG, "auto pg_prewarm dump : saved metadata info of %u blocks",
+		 num_buffers);
+}
+
+static void
+register_auto_pgprewarm()
+{
+	BackgroundWorker auto_pg_prewarm;
+
+	MemSet(&auto_pg_prewarm, 0, sizeof(auto_pg_prewarm));
+	auto_pg_prewarm.bgw_main_arg = Int32GetDatum(0);
+	auto_pg_prewarm.bgw_flags = BGWORKER_SHMEM_ACCESS;
+
+	/* Register the auto pg_prewarm background worker */
+	auto_pg_prewarm.bgw_start_time = BgWorkerStart_PostmasterStart;
+	auto_pg_prewarm.bgw_restart_time = 0;		/* Keep the auto pg_prewarm
+												 * running */
+	auto_pg_prewarm.bgw_main = auto_pgprewarm_main;
+	snprintf(auto_pg_prewarm.bgw_name, BGW_MAXLEN, "auto pg_prewarm load");
+	RegisterBackgroundWorker(&auto_pg_prewarm);
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+							" If set to -1 we never dump.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_LOAD_ONLY, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	/*
+	 * auto pg_prewarm load should be started from postmaster as a preloaded
+	 * library.
+	 */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Register auto pg_prewarm. */
+	register_auto_pgprewarm();
+}
+
+/*
+ * auto_pgprewarm_main -- The main entry point of the auto pg_prewarm load
+ * process. This is invoked as a background worker.
+ */
+static void
+auto_pgprewarm_main(Datum main_arg)
+{
+	MemoryContext autoprewarmer_context;
+	sigjmp_buf	local_sigjmp_buf;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+	pqsignal(SIGHUP, sighupHandler);
+
+	/*
+	 * Create a resource owner to keep track of our resources.
+	 */
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "autoprewarmer");
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	autoprewarmer_context = AllocSetContextCreate(TopMemoryContext,
+												  "autoprewarmer",
+												  ALLOCSET_DEFAULT_MINSIZE,
+												  ALLOCSET_DEFAULT_INITSIZE,
+												  ALLOCSET_DEFAULT_MAXSIZE);
+	MemoryContextSwitchTo(autoprewarmer_context);
+
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		LWLockReleaseAll();
+		AbortBufferIO();
+		UnlockBuffers();
+
+		/* buffer pins are released here. */
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+
+		MemoryContextSwitchTo(autoprewarmer_context);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(autoprewarmer_context);
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Close all open files after any error. */
+		smgrcloseall();
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+	load_now();
+
+	/*
+	 * In case of a SIGHUP, just reload the configuration.
+	 */
+	if (got_sighup)
+	{
+		got_sighup = false;
+		ProcessConfigFile(PGC_SIGHUP);
+	}
+
+	/* Loading is done; launch the dump worker unless dumping is disabled. */
+	if (!avoid_dumping &&
+		dump_interval >= AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		launch_pg_prewarm_dump(0);
+}
+
+/*
+ * auto_pgprewarm_dump_main -- The main entry point of the auto pg_prewarm
+ * dump process. This is invoked as a background worker.
+ */
+void
+auto_pgprewarm_dump_main(void)
+{
+	MemoryContext autoprewarmer_context;
+	sigjmp_buf	local_sigjmp_buf;
+	int			timeout = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+	pqsignal(SIGHUP, sighupHandler);
+
+	/*
+	 * Create a resource owner to keep track of our resources.
+	 */
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "autoprewarmer");
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	autoprewarmer_context = AllocSetContextCreate(TopMemoryContext,
+												  "autoprewarmer",
+												  ALLOCSET_DEFAULT_MINSIZE,
+												  ALLOCSET_DEFAULT_INITSIZE,
+												  ALLOCSET_DEFAULT_MAXSIZE);
+	MemoryContextSwitchTo(autoprewarmer_context);
+
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		LWLockReleaseAll();
+		AbortBufferIO();
+		UnlockBuffers();
+
+		/* buffer pins are released here. */
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+
+		MemoryContextSwitchTo(autoprewarmer_context);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(autoprewarmer_context);
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Close all open files after any error. */
+		smgrcloseall();
+
+		/* Error while dumping is treated as fatal hence do proc_exit */
+		if (avoid_dumping)
+			proc_exit(1);
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	/*
+	 * In case of a SIGHUP, just reload the configuration.
+	 */
+	if (got_sighup)
+	{
+		got_sighup = false;
+		ProcessConfigFile(PGC_SIGHUP);
+	}
+
+	/* Has been set not to dump. nothing more to do. */
+	if (dump_interval == AT_PWARM_LOAD_ONLY)
+		return;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+			timeout = dump_interval;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   timeout * 1000, PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Has been set not to dump. nothing more to do. */
+		if (dump_interval == AT_PWARM_LOAD_ONLY)
+			return;
+
+		/* If dump_interval is set then dump the buff pool. */
+		if ((rc & WL_TIMEOUT) &&
+			(dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY))
+			dump_now();
+	}
+
+	/* One last block meta info dump at postmaster shutdown. */
+	if (dump_interval >= AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		dump_now();
+}
+
+/*
+ * Dynamically launch an auto pg_prewarm dump worker.
+ */
+Datum
+launch_pg_prewarm_dump(PG_FUNCTION_ARGS)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "pg_prewarm");
+	sprintf(worker.bgw_function_name, "auto_pgprewarm_dump_main");
+	snprintf(worker.bgw_name, BGW_MAXLEN, "auto pg_prewarm dump");
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+		PG_RETURN_NULL();
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+
+	if (status == BGWH_STOPPED)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start background process"),
+			   errhint("More details may be available in the server log.")));
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start background processes without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	Assert(status == BGWH_STARTED);
+
+	PG_RETURN_INT32(pid);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..86b219d
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,9 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_pg_prewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_pg_prewarm_dump'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..b559141 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -58,6 +58,46 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
  </sect2>
 
  <sect2>
+  <title>auto pg_prewarm bgworker</title>
+
+  <para>
+  If we preload the pg_prewarm shared library, we start a pair of bgworkers
+  which automatically dump all of the buffer pool block info at a regular
+  interval and at the time of server shutdown (smart and fast mode only),
+  and then load these blocks when the server restarts.
+  </para>
+
+  <para>
+  If shared_preload_libraries includes pg_prewarm, a bgworker
+  <literal>auto pg_prewarm load</literal> is started by the postmaster.
+  The postmaster does not wait for recovery to finish and the database to
+  reach a consistent state. If there is a dump file
+  <literal>autopgprewarm.save</literal> to load, the bgworker loads each block
+  entry in it into the buffer pool for as long as a free buffer is available.
+  This way we do not replace any new blocks which were loaded either by the
+  recovery process or the querying clients.
+  Once <literal>auto pg_prewarm load</literal> has finished its job of
+  prewarming buffer pool, it launches a dynamic bgworker
+  <literal>auto pg_prewarm dump</literal> which periodically dumps the meta
+  info of blocks present in the buffer pool.
+  </para>
+
+  <para>
+  Set pg_prewarm.dump_interval in seconds to specify the minimum interval
+  between two dumps. If it is set to zero, timer-based dumping is disabled
+  and we dump only at server shutdown. If it is set to -1, dumping itself is
+  disabled and the <literal>auto pg_prewarm dump</literal> worker just stops.
+  By default, it is set to 300 seconds.
+  </para>
+
+  <para>
+  To relaunch a stopped "auto pg_prewarm dump" bgworker without restarting the
+  server, we can use the utility function
+  <literal>launch_pg_prewarm_dump() RETURNS int4</literal>.
+  </para>
+ </sect2>
+
+ <sect2>
   <title>Author</title>
 
   <para>
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3cb5120..82d1464 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -693,6 +693,20 @@ ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
 							 mode, strategy, &hit);
 }
 
+/*
+ * ReadBufferForPrewarm -- This new interface is for auto pg_prewarm.
+ */
+Buffer
+ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+					 ForkNumber forkNum, BlockNumber blockNum,
+					 ReadBufferMode mode, BufferAccessStrategy strategy)
+{
+	bool        hit;
+
+	return ReadBuffer_common(smgr, relpersistence, forkNum, blockNum,
+							 mode, strategy, &hit);
+}
+
 
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..4606a32 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,19 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- This function checks whether there is a free buffer in
+ * the buffer pool. Used by the auto pg_prewarm module.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index d117b66..58d4871 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 4c697e2..8cd55a7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -16,6 +16,7 @@
 
 #include "storage/block.h"
 #include "storage/buf.h"
+#include "storage/smgr.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
@@ -172,6 +173,10 @@ extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
+extern Buffer ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+								   ForkNumber forkNum, BlockNumber blockNum,
+								   ReadBufferMode mode,
+								   BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern void MarkBufferDirty(Buffer buffer);

#38

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#37)

Re: Proposal : For Auto-Prewarm.

Hello,

Thank you for the updated patch.

On Tue, Feb 7, 2017 at 10:44 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

Hi all,
Here is the new patch which fixes all of above comments, I changed the
design a bit now as below

What is it?
===========
A pair of bgworkers one which automatically dumps buffer pool's block
info at a given interval and another which loads those block into
buffer pool when the server restarts.

Are 2 workers required? This would reduce the number of workers to be
launched by other applications. Also with max_worker_processes = 2 and
restart, the system crashes when the 2nd worker is not launched:
2017-02-07 11:36:39.132 IST [20573] LOG: auto pg_prewarm load : number of
buffers actually tried to load 64
2017-02-07 11:36:39.143 IST [18014] LOG: worker process: auto pg_prewarm
load (PID 20573) was terminated by signal 11: Segmentation fault

I think the document should also mention that an appropriate
max_worker_processes should be set else the dump worker will not be
launched at all.

--
Thank you,

Beena Emerson

Have a Great Day!

#39

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Beena Emerson (#38)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemerson@gmail.com> wrote:

Hello,

Thank you for the updated patch.

On Tue, Feb 7, 2017 at 10:44 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

Hi all,
Here is the new patch which fixes all of above comments, I changed the
design a bit now as below

What is it?
===========
A pair of bgworkers one which automatically dumps buffer pool's block
info at a given interval and another which loads those block into
buffer pool when the server restarts.

Are 2 workers required?

I think in the new design there is a provision of launching the worker
dynamically to dump the buffers, so there seems to be a need of
separate workers for loading and dumping the buffers. However, there
is no explanation in the patch or otherwise when and why this needs a
pair of workers. Also, if the dump interval is greater than zero,
then do we really need to separately register a dynamic worker?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#37)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 10:44 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

==================
One problem I have left open is that multiple "auto pg_prewarm dump"
workers can be launched by calling launch_pg_prewarm_dump, even if a
dump/load worker is already running. I can avoid this by dropping a
lock-file before starting the bgworkers. But if there is another,
simpler method to avoid launching a duplicate bgworker, I can do that
instead.

How about keeping a variable in PROC_HDR structure to indicate if
already one dump worker is running, then don't allow to start a new
one?
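
A minimal sketch of this kind of guard, written as standalone C11 rather than against the real PROC_HDR structure: a shared slot holds the pid of the running dump worker, and a compare-and-swap decides who may start. The struct DumpWorkerSlot and both function names are hypothetical; inside the server the field would live in shared memory and would be protected by whatever locking or atomics that structure already uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a field that would live in shared memory. */
typedef struct DumpWorkerSlot
{
	atomic_int	worker_pid;		/* 0 means no dump worker is running */
} DumpWorkerSlot;

/* Returns true only for the one caller that finds the slot empty. */
bool
claim_dump_worker(DumpWorkerSlot *slot, int my_pid)
{
	int			expected = 0;

	assert(my_pid > 0);
	return atomic_compare_exchange_strong(&slot->worker_pid,
										  &expected, my_pid);
}

/* Clears the slot, but only if it is still owned by my_pid. */
void
release_dump_worker(DumpWorkerSlot *slot, int my_pid)
{
	int			expected = my_pid;

	atomic_compare_exchange_strong(&slot->worker_pid, &expected, 0);
}
```

Storing the pid rather than a plain boolean also lets a later caller check whether the recorded worker is still alive, which matters if the worker exits without releasing the slot.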

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#41

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Beena Emerson (#38)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemerson@gmail.com> wrote:

launched by other applications. Also with max_worker_processes = 2 and
restart, the system crashes when the 2nd worker is not launched:
2017-02-07 11:36:39.132 IST [20573] LOG: auto pg_prewarm load : number of
buffers actually tried to load 64
2017-02-07 11:36:39.143 IST [18014] LOG: worker process: auto pg_prewarm
load (PID 20573) was terminated by signal 11: Segmentation fault

The segfault was a coding mistake on my side: I called the C-language
function directly without initializing the FunctionCallInfo. Thanks for
raising it. The patch below fixes it.
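
To make the failure mode concrete, here is a simplified standalone model of it, not the code from either patch. In PostgreSQL, PG_RETURN_NULL() writes fcinfo->isnull before returning, so a SQL-callable function invoked as f(0) dereferences a null pointer as soon as it takes the NULL-return path. That appears consistent with the crash showing up only when the dump worker could not be registered. One common way to avoid this class of bug is to keep the real work in a plain C helper and let only the SQL wrapper touch the call info. All names below (CallInfoSketch, launch_dump_worker_internal, launch_dump_worker_sql, the placeholder pid) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for PostgreSQL's call-info struct (illustration). */
typedef struct CallInfoSketch
{
	bool		isnull;			/* set when the function returns SQL NULL */
} CallInfoSketch;

/* Hypothetical helper: the real work, with no dependency on call info. */
bool
launch_dump_worker_internal(bool slot_available, int *pid_out)
{
	if (!slot_available)
		return false;			/* registration failed: no free worker slot */
	*pid_out = 4242;			/* placeholder pid for the sketch */
	return true;
}

/* SQL-callable wrapper: the only place that touches the call info. */
int
launch_dump_worker_sql(CallInfoSketch *fcinfo, bool slot_available)
{
	int			pid = 0;

	assert(fcinfo != NULL);		/* a null call info is what crashed */
	if (!launch_dump_worker_internal(slot_available, &pid))
	{
		fcinfo->isnull = true;	/* models what PG_RETURN_NULL() does */
		return 0;
	}
	return pid;
}
```

In this arrangement the load worker would call the internal helper directly, and the SQL function would remain a thin wrapper around it.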

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_auto_prewarm_04.patch (application/octet-stream)

commit ec7af33d6f36a24b1ae1c68661277188f45030b3
Author: mithun <mithun@localhost.localdomain>
Date:   Tue Feb 7 14:57:35 2017 +0530

    commit 1: auto_pg_prewarm

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..706b0da 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o auto_pg_prewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1.sql pg_prewarm--1.1--1.2.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/auto_pg_prewarm.c b/contrib/pg_prewarm/auto_pg_prewarm.c
new file mode 100644
index 0000000..f4ba4b5
--- /dev/null
+++ b/contrib/pg_prewarm/auto_pg_prewarm.c
@@ -0,0 +1,785 @@
+/*-------------------------------------------------------------------------
+ *
+ * auto_pg_prewarm.c
+ *	Automatically dumps buffer pool's block info and then load blocks into
+ *	buffer pool.
+ *
+ *	Copyright (c) 2013-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/auto_pg_prewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/guc.h"
+#include "catalog/pg_class.h"
+
+/*
+ * auto pg_prewarm :
+ *
+ * What is it?
+ * ===========
+ * A pair of bgworkers: one automatically dumps the buffer pool's block info at
+ * a given interval, and another loads those blocks into the buffer pool when
+ * the server restarts.
+ *
+ * How does it work?
+ * =================
+ * When the shared library pg_prewarm is preloaded during server startup, a
+ * bgworker "auto pg_prewarm load" is launched immediately after the server
+ * is started. The bgworker loads the blocks listed as block info entries
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> in
+ * $PGDATA/AUTO_PG_PREWARM_FILE, as long as there is a free buffer in the
+ * buffer pool. This way we do not replace any new blocks which were loaded
+ * either by the recovery process or the querying clients.
+ *
+ * Once the "auto pg_prewarm load" bgworker has completed its job, it will
+ * register a dynamic bgworker "auto pg_prewarm dump" which has to be launched
+ * when the server reaches a consistent state. The new bgworker will
+ * periodically scan the buffer pool and then dump the meta info of blocks
+ * which are currently in the buffer pool. The GUC pg_prewarm.dump_interval if
+ * set > 0 indicates the minimum time interval between two dumps. If
+ * pg_prewarm.dump_interval is set to AT_PWARM_DUMP_AT_SHUTDOWN_ONLY the
+ * bgworker will only dump at the time of server shutdown. If it is set to
+ * AT_PWARM_LOAD_ONLY we do not want the bgworker to dump anymore, so it stops
+ * there.
+ *
+ * To relaunch a stopped "auto pg_prewarm dump" bgworker we can use the utility
+ * function launch_pg_prewarm_dump.
+ */
+
+PG_FUNCTION_INFO_V1(launch_pg_prewarm_dump);
+
+#define AT_PWARM_LOAD_ONLY -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+/* Primary functions */
+void		_PG_init(void);
+static void auto_pgprewarm_main(Datum main_arg);
+static bool load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum);
+static void register_auto_pgprewarm(void);
+void		auto_pgprewarm_dump_main(void);
+pid_t		auto_pg_prewarm_dump_launcher(void);
+
+/* Secondary/supporting functions */
+static void sigtermHandler(SIGNAL_ARGS);
+static void sighupHandler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake it
+ *	up.
+ */
+static void
+sigtermHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the main loop to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+sighupHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* Meta-data of each persistent page buffer which is dumped and used to load. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcNode;		/* tablespace */
+	Oid			filenode;		/* relation */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+}	BlockInfoRecord;
+
+/* Try loading only once during startup. If any error do not retry. */
+static bool avoid_loading = false;
+
+/*
+ * And avoid dumping if we receive SIGTERM while loading. Also, do not retry if
+ * dump has failed previously.
+ */
+static bool avoid_dumping = false;
+
+int			dump_interval = 0;
+
+/* Compare one member of a and b; return from the caller if they differ. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * sort_cmp_func - compare function used while qsorting BlockInfoRecord objects.
+ */
+static int
+sort_cmp_func(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+#define AUTO_PG_PREWARM_FILE "autopgprewarm"
+
+/*
+ *	load_block -	Load a given block.
+ */
+static bool
+load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum)
+{
+	Buffer		buffer;
+
+	/*
+	 * Load the page only if there exists a free buffer. We do not want to
+	 * replace an existing buffer.
+	 */
+	if (have_free_buffer())
+	{
+		SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
+
+		/*
+		 * Check that the fork exists first; otherwise we would use up one
+		 * free buffer for each nonexistent block.
+		 */
+		if (smgrexists(smgr, forkNum))
+		{
+			buffer = ReadBufferForPrewarm(smgr, reltype,
+										  forkNum, blockNum,
+										  RBM_NORMAL, NULL);
+			if (BufferIsValid(buffer))
+				ReleaseBuffer(buffer);
+		}
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ *	load_now - The main routine which reads from the dump file and loads each
+ *	block. We try to load each blocknum read from AUTO_PG_PREWARM_FILE as long
+ *	as there is a free buffer left and SIGTERM has not been received. If we
+ *	fail to load a block we ignore the ERROR and try to load the next blocknum.
+ *	This is because there is a possibility that the corresponding block might
+ *	have been deleted.
+ */
+static void
+load_now(void)
+{
+	static char dump_file_path[MAXPGPATH];
+	FILE	   *file = NULL;
+	uint32		i,
+				num_buffers = 0;
+
+	if (avoid_loading)
+		return;
+
+	avoid_loading = true;
+
+	/* Check if file exists and open file in read mode. */
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s.save",
+			 AUTO_PG_PREWARM_FILE);
+	file = fopen(dump_file_path, PG_BINARY_R);
+
+	if (!file)
+		return;					/* No file to load. */
+
+	if (fscanf(file, "<<%u>>", &num_buffers) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm load : error reading num of elements"
+						" in \"%s\" : %m", dump_file_path)));
+	}
+
+	elog(LOG, "auto pg_prewarm load : number of buffers to load %u",
+		 num_buffers);
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		RelFileNode rnode;
+		uint32		forknum;
+		BlockNumber blocknum;
+		bool		have_free_buf = true;
+
+		if (got_sigterm)
+		{
+			/*
+			 * Received shutdown while we were still loading the buffers. No
+			 * need to dump at this stage.
+			 */
+			avoid_dumping = true;
+			break;
+		}
+
+		if (!have_free_buf)
+			break;
+
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &rnode.dbNode, &rnode.spcNode,
+						&rnode.relNode, &forknum, &blocknum))
+			break;				/* No more valid entry hence stop processing. */
+
+		PG_TRY();
+		{
+			have_free_buf = load_block(rnode, RELPERSISTENCE_PERMANENT,
+									   (ForkNumber) forknum, blocknum);
+		}
+		PG_CATCH();
+		{
+			/* Any error handle it and then try to load next buffer. */
+
+			/* Prevent interrupts while cleaning up */
+			HOLD_INTERRUPTS();
+
+			/* Report the error to the server log */
+			EmitErrorReport();
+
+			LWLockReleaseAll();
+			AbortBufferIO();
+			UnlockBuffers();
+
+			/* buffer pins are released here. */
+			ResourceOwnerRelease(CurrentResourceOwner,
+								 RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, true);
+			FlushErrorState();
+
+			/* Now we can allow interrupts again */
+			RESUME_INTERRUPTS();
+		}
+		PG_END_TRY();
+	}
+
+	fclose(file);
+
+	elog(LOG,
+		 "auto pg_prewarm load : number of buffers actually tried to load %u",
+		 i);
+	return;
+}
+
+/*
+ *	dump_now - The main routine which goes through each buffer header and
+ *	dumps their metadata in the format
+ *	<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. We Sort these data
+ *	and then dump them. Sorting is necessary as it facilitates sequential read
+ *	during load. Unlike load, if we encounter any error we abort the dump.
+ */
+static void
+dump_now(void)
+{
+	static char dump_file_path[MAXPGPATH],
+				transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret;
+	uint32		num_buffers;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file = NULL;
+
+	if (avoid_dumping)
+		return;
+
+	avoid_dumping = true;
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_buffers = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/* Only valid and persistent page buffers are dumped. */
+		if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID) &&
+			(buf_state & BM_PERMANENT))
+		{
+			block_info_array[num_buffers].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_buffers].spcNode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_buffers].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_buffers].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_buffers].blocknum = bufHdr->tag.blockNum;
+			++num_buffers;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/* Sorting now only to avoid sorting while loading. */
+	pg_qsort(block_info_array, num_buffers, sizeof(BlockInfoRecord),
+			 sort_cmp_func);
+
+	snprintf(transient_dump_file_path, sizeof(transient_dump_file_path),
+			 "%s.save.tmp", AUTO_PG_PREWARM_FILE);
+	file = fopen(transient_dump_file_path, "w");
+	if (file == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s.save", AUTO_PG_PREWARM_FILE);
+
+	/* Write num_buffers first and then BlockMetaInfoRecords. */
+	ret = fprintf(file, "<<%u>>\n", num_buffers);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : error writing to \"%s\" : %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_buffers; i++)
+	{
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].spcNode,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("auto pg_prewarm dump : error writing to"
+							" \"%s\" : %m", transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = fclose(file);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : error closing \"%s\" : %m",
+						transient_dump_file_path)));
+
+	ret = unlink(dump_file_path);
+	if (ret != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : unlink \"%s\" failed : %m",
+						dump_file_path)));
+
+	ret = rename(transient_dump_file_path, dump_file_path);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : failed to rename \"%s\" to"
+						" \"%s\" : %m",
+						transient_dump_file_path, dump_file_path)));
+
+	if (!got_sigterm)
+		avoid_dumping = false;
+
+	elog(LOG, "auto pg_prewarm dump : saved metadata info of %u blocks",
+		 num_buffers);
+}
+
+/* Register auto pg_prewarm load bgworker. */
+static void
+register_auto_pgprewarm()
+{
+	BackgroundWorker auto_pg_prewarm;
+
+	MemSet(&auto_pg_prewarm, 0, sizeof(auto_pg_prewarm));
+	auto_pg_prewarm.bgw_main_arg = Int32GetDatum(0);
+	auto_pg_prewarm.bgw_flags = BGWORKER_SHMEM_ACCESS;
+
+	/* Register the auto pg_prewarm background worker */
+	auto_pg_prewarm.bgw_start_time = BgWorkerStart_PostmasterStart;
+	auto_pg_prewarm.bgw_restart_time = BGW_NEVER_RESTART;
+	auto_pg_prewarm.bgw_main = auto_pgprewarm_main;
+	snprintf(auto_pg_prewarm.bgw_name, BGW_MAXLEN, "auto pg_prewarm load");
+	RegisterBackgroundWorker(&auto_pg_prewarm);
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the minimum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+							" If set to -1 we never dump.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_LOAD_ONLY, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	/*
+	 * auto pg_prewarm load should be started from postmaster as a preloaded
+	 * library.
+	 */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Register auto pg_prewarm load. */
+	register_auto_pgprewarm();
+}
+
+/*
+ * auto_pgprewarm_main -- The main entry point of the auto pg_prewarm load
+ * process. This is invoked as a background worker.
+ */
+static void
+auto_pgprewarm_main(Datum main_arg)
+{
+	MemoryContext autoprewarmer_context;
+	sigjmp_buf	local_sigjmp_buf;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+	pqsignal(SIGHUP, sighupHandler);
+
+	/*
+	 * Create a resource owner to keep track of our resources.
+	 */
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "autoprewarmer");
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	autoprewarmer_context = AllocSetContextCreate(TopMemoryContext,
+												  "autoprewarmer",
+												  ALLOCSET_DEFAULT_MINSIZE,
+												  ALLOCSET_DEFAULT_INITSIZE,
+												  ALLOCSET_DEFAULT_MAXSIZE);
+	MemoryContextSwitchTo(autoprewarmer_context);
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		LWLockReleaseAll();
+		AbortBufferIO();
+		UnlockBuffers();
+
+		/* buffer pins are released here. */
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+
+		MemoryContextSwitchTo(autoprewarmer_context);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(autoprewarmer_context);
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Close all open files after any error. */
+		smgrcloseall();
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+	load_now();
+
+	/*
+	 * In case of a SIGHUP, just reload the configuration.
+	 */
+	if (got_sighup)
+	{
+		got_sighup = false;
+		ProcessConfigFile(PGC_SIGHUP);
+	}
+
+	/* launch auto pg_prewarm dump bgworker. */
+	if (!avoid_dumping &&
+		dump_interval != AT_PWARM_LOAD_ONLY)
+		auto_pg_prewarm_dump_launcher();
+}
+
+/*
+ * auto_pgprewarm_dump_main -- The main entry point of the auto pg_prewarm dump
+ * process. This is invoked as a background worker.
+ */
+void
+auto_pgprewarm_dump_main(void)
+{
+	MemoryContext autoprewarmer_context;
+	sigjmp_buf	local_sigjmp_buf;
+	int			timeout = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+	pqsignal(SIGHUP, sighupHandler);
+
+	/*
+	 * Create a resource owner to keep track of our resources.
+	 */
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "autoprewarmer");
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	autoprewarmer_context = AllocSetContextCreate(TopMemoryContext,
+												  "autoprewarmer",
+												  ALLOCSET_DEFAULT_MINSIZE,
+												  ALLOCSET_DEFAULT_INITSIZE,
+												  ALLOCSET_DEFAULT_MAXSIZE);
+	MemoryContextSwitchTo(autoprewarmer_context);
+
+
+	/*
+	 * If an exception is encountered, processing resumes here.
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		LWLockReleaseAll();
+		AbortBufferIO();
+		UnlockBuffers();
+
+		/* buffer pins are released here. */
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+
+		MemoryContextSwitchTo(autoprewarmer_context);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(autoprewarmer_context);
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Close all open files after any error. */
+		smgrcloseall();
+
+		/* Error while dumping is treated as fatal hence do proc_exit */
+		if (avoid_dumping)
+			proc_exit(1);
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	/*
+	 * In case of a SIGHUP, just reload the configuration.
+	 */
+	if (got_sighup)
+	{
+		got_sighup = false;
+		ProcessConfigFile(PGC_SIGHUP);
+	}
+
+	/* Has been set not to dump. nothing more to do. */
+	if (dump_interval == AT_PWARM_LOAD_ONLY)
+		return;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+			timeout = dump_interval;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   timeout * 1000, PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Has been set not to dump. nothing more to do. */
+		if (dump_interval == AT_PWARM_LOAD_ONLY)
+			return;
+
+		/* If dump_interval is set then dump the buff pool. */
+		if ((rc & WL_TIMEOUT) &&
+			(dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY))
+			dump_now();
+	}
+
+	/* One last block meta info dump while postmaster shutdown. */
+	if (dump_interval != AT_PWARM_LOAD_ONLY)
+		dump_now();
+}
+
+/*
+ * Dynamically launch an auto pg_prewarm dump worker.
+ */
+pid_t
+auto_pg_prewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	worker.bgw_main = NULL;		/* new worker might not have library loaded */
+	sprintf(worker.bgw_library_name, "pg_prewarm");
+	sprintf(worker.bgw_function_name, "auto_pgprewarm_dump_main");
+	snprintf(worker.bgw_name, BGW_MAXLEN, "auto pg_prewarm dump");
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		avoid_dumping = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker \"auto pg_prewarm dump\" failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+
+	if (status == BGWH_STOPPED)
+	{
+		avoid_dumping = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start auto pg_prewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		avoid_dumping = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+		  errmsg("cannot start bgworker auto pg_prewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * The C-Language entry function to launch auto pg_prewarm dump.
+ */
+Datum
+launch_pg_prewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	pid = auto_pg_prewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..86b219d
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,9 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_pg_prewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_pg_prewarm_dump'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..b559141 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -58,6 +58,46 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
  </sect2>
 
  <sect2>
+  <title>auto pg_prewarm bgworker</title>
+
+  <para>
+  If we preload the pg_prewarm shared library, we start a pair of bgworkers
+  which automatically dump all of the buffer pool block info at a regular
+  interval and at the time of server shutdown (smart and fast mode only),
+  and then load these blocks when the server restarts.
+  </para>
+
+  <para>
+  If shared_preload_libraries includes pg_prewarm, a bgworker
+  <literal>auto pg_prewarm load</literal> is started by the postmaster.
+  The postmaster does not wait for recovery to finish and the database to
+  reach a consistent state. If there is a dump file
+  <literal>autopgprewarm.save</literal> to load, the bgworker loads each
+  block entry in it into the buffer pool as long as a free buffer is available.
+  This way we do not replace any new blocks which were loaded either by the
+  recovery process or the querying clients.
+  Once <literal>auto pg_prewarm load</literal> has finished its job of
+  prewarming buffer pool, it launches a dynamic bgworker
+  <literal>auto pg_prewarm dump</literal> which periodically dumps the meta
+  info of blocks present in the buffer pool.
+  </para>
+
+  <para>
+  Set pg_prewarm.dump_interval in seconds to specify the minimum interval
+  between two dumps. If it is set to zero, timer-based dumping is disabled
+  and we dump only at server shutdown. If it is set to -1, dumping itself is
+  disabled and the <literal>auto pg_prewarm dump</literal> worker exits.
+  By default, it is set to 300 seconds.
+  </para>
+
+  <para>
+  To relaunch a stopped "auto pg_prewarm dump" bgworker without restarting the
+  server, we can use the utility function
+  <literal>launch_pg_prewarm_dump() RETURNS int4</literal>.
+  </para>
+ </sect2>
+
+ <sect2>
   <title>Author</title>
 
   <para>
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3cb5120..82d1464 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -693,6 +693,20 @@ ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
 							 mode, strategy, &hit);
 }
 
+/*
+ * ReadBufferForPrewarm -- This new interface is for auto pg_prewarm.
+ */
+Buffer
+ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+					 ForkNumber forkNum, BlockNumber blockNum,
+					 ReadBufferMode mode, BufferAccessStrategy strategy)
+{
+	bool        hit;
+
+	return ReadBuffer_common(smgr, relpersistence, forkNum, blockNum,
+							 mode, strategy, &hit);
+}
+
 
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..4606a32 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,19 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- This function checks whether there is a free buffer in
+ * the buffer pool. Used by the auto pg_prewarm module.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index d117b66..58d4871 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 4c697e2..8cd55a7 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -16,6 +16,7 @@
 
 #include "storage/block.h"
 #include "storage/buf.h"
+#include "storage/smgr.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
@@ -172,6 +173,10 @@ extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
+extern Buffer ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+								   ForkNumber forkNum, BlockNumber blockNum,
+								   ReadBufferMode mode,
+								   BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern void MarkBufferDirty(Buffer buffer);
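
dump_now() in the patch above writes to a temporary file, unlinks the old dump, and then renames the temporary file into place. A minimal standalone sketch of that write-then-rename pattern follows. It is a simplification under stated assumptions: the record is reduced to one number per line, write_dump_atomically is a hypothetical name, and the unlink step is omitted on the assumption of POSIX rename() semantics, where an existing target is replaced in one step (this does not hold on every platform). It also makes no attempt at fsync-level durability.

```c
#include <stdio.h>

/*
 * Write "<<count>>" followed by one line per entry to tmp_path, then rename
 * it over final_path. Returns 0 on success, -1 on any failure.
 */
static int
write_dump_atomically(const char *tmp_path, const char *final_path,
					  const unsigned int *blocks, unsigned int count)
{
	FILE	   *fp = fopen(tmp_path, "w");
	unsigned int i;

	if (fp == NULL)
		return -1;

	if (fprintf(fp, "<<%u>>\n", count) < 0)
		goto fail;

	for (i = 0; i < count; i++)
	{
		if (fprintf(fp, "%u\n", blocks[i]) < 0)
			goto fail;
	}

	/* fclose flushes buffered data; a failure here means the file is bad. */
	if (fclose(fp) != 0)
	{
		remove(tmp_path);
		return -1;
	}

	/* Assumes POSIX: rename() replaces an existing final_path in one step. */
	if (rename(tmp_path, final_path) != 0)
	{
		remove(tmp_path);
		return -1;
	}
	return 0;

fail:
	fclose(fp);
	remove(tmp_path);
	return -1;
}
```

With this ordering a reader sees either the old dump file or the complete new one, and a failed write never leaves a partial file under the final name.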

#42

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#41)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 3:01 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemerson@gmail.com>
wrote:

launched by other applications. Also with max_worker_processes = 2 and
restart, the system crashes when the 2nd worker is not launched:
2017-02-07 11:36:39.132 IST [20573] LOG: auto pg_prewarm load : number of
buffers actually tried to load 64
2017-02-07 11:36:39.143 IST [18014] LOG: worker process: auto pg_prewarm
load (PID 20573) was terminated by signal 11: Segmentation fault

SEGFAULT was the coding mistake I have called the C-language function
directly without initializing the functioncallinfo. Thanks for
raising. Below patch fixes same.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

A few more comments:

= Background worker messages:

- Workers, when launched, show messages like "logical replication launcher
started" and "autovacuum launcher started". We should probably have a similar
message to show that the pg_prewarm load and dump bgworkers have started.

- "auto pg_prewarm load: number of buffers to load x": other messages show a
space before and after ":"; we should keep it consistent throughout.

= Action of -1.
I thought we decided that an interval value of -1 would mean that the auto
prewarm worker will not be run at all. With the current code, -1 is explained
to mean it will not dump. I noticed that reloading with the option set to -1
stops both workers, but restarting loads the data and then quits. Why
does it allow loading with -1? Please explain this better in the documentation.
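
For reference, the behavior of the 04 patch for each value of pg_prewarm.dump_interval, as read from auto_pgprewarm_main and auto_pgprewarm_dump_main, can be summarized in a small sketch. The two macros are copied from the patch; PrewarmBehavior and behavior_for_interval are hypothetical names used only for illustration.

```c
#include <stdbool.h>

#define AT_PWARM_LOAD_ONLY -1
#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0

/* What the workers do for a given dump_interval setting. */
typedef struct PrewarmBehavior
{
	bool		load_at_startup;
	bool		dump_periodically;
	bool		dump_at_shutdown;
} PrewarmBehavior;

static PrewarmBehavior
behavior_for_interval(int dump_interval)
{
	PrewarmBehavior b;

	/* The load worker is registered and runs regardless of the setting. */
	b.load_at_startup = true;
	/* Timer-based dumps happen only for positive intervals. */
	b.dump_periodically = (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY);
	/* The final dump at shutdown is skipped only for -1. */
	b.dump_at_shutdown = (dump_interval != AT_PWARM_LOAD_ONLY);
	return b;
}
```

So -1 currently means "load only", not "do nothing", which is the mismatch with the earlier discussion.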

= launch_pg_prewarm_dump()
With dump_interval=-1, though the function returns a pid, the worker is not
running in the 04 patch. In the 03 version it was launched. No dumping is
done now.

=# SELECT launch_pg_prewarm_dump();
launch_pg_prewarm_dump
------------------------
53552
(1 row)

$ ps -ef | grep 53552
b_emers+ 53555 4391 0 16:21 pts/1 00:00:00 grep --color=auto 53552

= Function names
- load_now could be better named as load_file, load_dumpfile or similar.
- dump_now -> dump_buffer or better?

= Corrupt file
If the dump file is corrupted, the system crashes and the prewarm bgworkers
are not restarted. This needs to be handled better.

WARNING: terminating connection because of crash of another server process
2017-02-07 16:36:58.680 IST [54252] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory
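
One possible way to make the loader tolerant of a damaged file is to validate the header and every entry before acting on it. The sketch below is standalone and makes assumptions: BlockEntry, parse_dump_header and parse_dump_entry are hypothetical names, the formats "<<N>>" and "db,spc,rel,fork,block" are taken from the patch, and SKETCH_MAX_FORKNUM assumes four fork types numbered 0 to 3. It does not claim to identify the actual cause of the crash reported here.

```c
#include <stdio.h>

#define SKETCH_MAX_FORKNUM 3	/* assumption: four fork types, numbered 0..3 */

typedef struct BlockEntry
{
	unsigned int database;
	unsigned int spcnode;
	unsigned int filenode;
	unsigned int forknum;
	unsigned int blocknum;
} BlockEntry;

/* Returns 1 and sets *count only if line is "<<N>>" plus optional newline. */
static int
parse_dump_header(const char *line, unsigned int *count)
{
	int			consumed = 0;

	/* %n is stored only if the closing ">>" was matched as well. */
	if (sscanf(line, "<<%u>>%n", count, &consumed) != 1 || consumed == 0)
		return 0;
	return line[consumed] == '\0' || line[consumed] == '\n';
}

/* Returns 1 only for a complete five-field entry with a plausible fork. */
static int
parse_dump_entry(const char *line, BlockEntry *e)
{
	int			consumed = 0;

	if (sscanf(line, "%u,%u,%u,%u,%u%n", &e->database, &e->spcnode,
			   &e->filenode, &e->forknum, &e->blocknum, &consumed) != 5 ||
		consumed == 0)
		return 0;
	/* Reject trailing junk after the fifth field. */
	if (line[consumed] != '\0' && line[consumed] != '\n')
		return 0;
	return e->forknum <= SKETCH_MAX_FORKNUM;
}
```

A loader built on these checks could log a warning and stop reading at the first malformed line instead of passing unchecked values on to the storage layer.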

= Documentation

I feel the documentation needs to be improved greatly.

- The first para in pg_prewarm should mention the autoload feature too.

- The new section should be named "The pg_prewarm autoload" or something
better. "auto pg_prewarm bgworker" does not seem appropriate. The
configuration parameter should also be listed clearly, as in the
auth-delay page. The new function launch_pg_prewarm_dump() should be listed
under Functions.

--
Thank you,

Beena Emerson

Have a Great Day!

#43

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Amit Kapila (#39)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 12:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemerson@gmail.com> wrote:

Are 2 workers required?

I think in the new design there is a provision of launching the worker
dynamically to dump the buffers, so there seems to be a need of
separate workers for loading and dumping the buffers. However, there
is no explanation in the patch or otherwise when and why this needs a
pair of workers. Also, if the dump interval is greater than zero,
then do we really need to separately register a dynamic worker?

We have introduced a new value, -1, for pg_prewarm.dump_interval, which
means we will not dump at all. In that state, I thought the auto
pg_prewarm process need not run at all, so I coded it to exit
immediately. But in case the user decides to start auto pg_prewarm for
dumping only, without restarting the server, I have introduced a
launcher function "launch_pg_prewarm_dump" to restart auto pg_prewarm
only to dump. Since we can now launch a worker only to dump, I thought
we could redistribute the code between two workers: one which only
prewarms (load only) and another which dumps periodically. This helped
me to modularize and reuse the code. So once the load worker has
finished its job, it registers a dump worker and then exits. But if
max_worker_processes is not enough to launch the "auto pg_prewarm dump"
bgworker, we throw an error:
2017-02-07 14:51:59.789 IST [50481] ERROR: registering dynamic
bgworker "auto pg_prewarm dump" failed c
2017-02-07 14:51:59.789 IST [50481] HINT: Consider increasing
configuration parameter "max_worker_processes".

Now, thinking again: instead of raising such an error and then correcting
it by explicitly launching the auto pg_prewarm dump bgworker through
launch_pg_prewarm_dump(), I can go back to the original design, where
there is one worker which loads and then dumps periodically, and
launch_pg_prewarm_dump relaunches only the dump activity of that
worker. Does this sound good?

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Beena Emerson (#42)

Re: Proposal : For Auto-Prewarm.

Thanks Beena,

On Tue, Feb 7, 2017 at 4:46 PM, Beena Emerson <memissemerson@gmail.com> wrote:

Few more comments:

= Background worker messages:

- Workers when launched, show messages like: "logical replication launcher
started”, "autovacuum launcher started”. We should probably have a similar
message to show that the pg_prewarm load and dump bgworker has started.

-- Thanks, I will add startup and shutdown messages.

- "auto pg_prewarm load: number of buffers to load x”, other messages show
space before and after “:”, we should keep it consistent through out.

-- I think you are testing patch 03. The latest patch_04 has
corrected this. Can you please re-test it?

= Action of -1.
I thought we decided that interval value of -1 would mean that the auto
prewarm worker will not be run at all. With current code, -1 is explained to
mean it will not dump. I noticed that reloading with new option as -1 stops
both the workers but restarting loads the data and then quits. Why does it
allow loading with -1? Please explain this better in the documents.

-- '-1' means we do not want to dump at all, so we decide not to
continue with the launched bgworker and it exits. As per your first
comment, if I add the startup and shutdown messages for auto
pg_prewarm, I think it will look better. I will try to explain it
better in the documentation. The "auto pg_prewarm load" worker is not
affected by the dump_interval value. It will always start, load (prewarm)
and then exit.

= launch_pg_prewarm_dump()

=# SELECT launch_pg_prewarm_dump();
launch_pg_prewarm_dump
------------------------
53552
(1 row)

$ ps -ef | grep 53552
b_emers+ 53555 4391 0 16:21 pts/1 00:00:00 grep --color=auto 53552

-- If dump_interval = -1 "auto pg_prewarm dump" will exit immediately.

= Function names
- load_now could be better named as load_file, load_dumpfile or similar.
- dump_now -> dump_buffer or better?

I did choose load_now and dump_now to indicate we are doing it
immediately as invoking them was based on a timer/state. Probably we
can improve that but dump_buffer, load_file may not be the right
replacement.

= Corrupt file
if the dump file is corrupted, the system crashes and the prewarm bgworkers
are not restarted. This needs to be handled better.

WARNING: terminating connection because of crash of another server process
2017-02-07 16:36:58.680 IST [54252] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory

--- Can you please paste your autopgprewarm.save file? I edited the
file manually to contain some illegal entries but did not see any crash.
We only failed to load because the block numbers were invalid. Please
share your tests so that I can reproduce this.

= Documentation

I feel the documentation needs to be improved greatly.

- The first para in pg_prewarm should mention the autoload feature too.

- The new section should be named “The pg_prewarm autoload” or something
better. "auto pg_prewarm bgworker” does not seem appropriate. The
configuration parameter should also be listed out clearly like in auth-delay
page. The new function launch_pg_prewarm_dump() should be listed under
Functions.

-- Thanks, I will try to improve the documentation and put more details there.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#45

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#44)

Re: Proposal : For Auto-Prewarm.

Hello,

On Tue, Feb 7, 2017 at 5:52 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

Thanks Beena,

On Tue, Feb 7, 2017 at 4:46 PM, Beena Emerson <memissemerson@gmail.com>
wrote:

Few more comments:

= Background worker messages:

- Workers when launched, show messages like: "logical replication launcher
started", "autovacuum launcher started". We should probably have a similar
message to show that the pg_prewarm load and dump bgworker has started.

-- Thanks, I will add startup and shutdown message.

- "auto pg_prewarm load: number of buffers to load x", other messages show
space before and after ":", we should keep it consistent through out.

-- I think you are testing patch 03. The latest patch_04 have
corrected same. Can you please re-test it.

I had initially written comments for 03, then tested again with 04 and
retained the comments still valid for it. I guess I forgot to remove this
one. Sorry.

= Action of -1.
I thought we decided that interval value of -1 would mean that the auto
prewarm worker will not be run at all. With current code, -1 is explained to
mean it will not dump. I noticed that reloading with new option as -1 stops
both the workers but restarting loads the data and then quits. Why does it
allow loading with -1? Please explain this better in the documents.

-- '-1' means we do not want to dump at all. So we decide not to
continue with launched bgworker and it exits. As per your first
comment, if I register the startup and shutdown messages for auto
pg_prewarm I think it will look better. Will try to explain it in a
better way in documentation. The "auto pg_prewarm load" will not be
affected with dump_interval value. It will always start, load(prewarm)
and then exit.

= launch_pg_prewarm_dump()

=# SELECT launch_pg_prewarm_dump();
launch_pg_prewarm_dump
------------------------
53552
(1 row)

$ ps -ef | grep 53552
b_emers+ 53555 4391 0 16:21 pts/1 00:00:00 grep --color=auto 53552

-- If dump_interval = -1 "auto pg_prewarm dump" will exit immediately.

= Function names
- load_now could be better named as load_file, load_dumpfile or similar.
- dump_now -> dump_buffer or better?

I did choose load_now and dump_now to indicate we are doing it
immediately as invoking them was based on a timer/state. Probably we
can improve that but dump_buffer, load_file may not be the right
replacement.

= Corrupt file
if the dump file is corrupted, the system crashes and the prewarm bgworkers
are not restarted. This needs to be handled better.

WARNING: terminating connection because of crash of another server process
2017-02-07 16:36:58.680 IST [54252] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory

--- Can you please paste you autopgprewarm.save file, I edited the
file manually to some illegal entry but did not see any crash.  Only
we failed to load as block number were invalid. Please share your
tests so that I can reproduce same.

I only changed the fork number from 0 to 10 in one of the entries.
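
A corrupted entry such as this one (fork number 10) can be rejected before any buffer is touched. The following is only a minimal, self-contained sketch of that idea, not code from the patch: the `Sketch*` names and the constant values are assumptions made for illustration, standing in for PostgreSQL's own ForkNumber, InvalidForkNumber and MAX_FORKNUM definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for InvalidForkNumber and MAX_FORKNUM (assumed values). */
#define SKETCH_INVALID_FORKNUM (-1)
#define SKETCH_MAX_FORKNUM 3

/* Simplified copy of one dump-file entry; field names are illustrative. */
typedef struct SketchBlockInfo
{
	uint32_t	database;
	uint32_t	spcnode;
	uint32_t	filenode;
	int			forknum;
	uint32_t	blocknum;
} SketchBlockInfo;

/* Return true only if the fork number lies in the known range. */
static bool
sketch_record_is_loadable(const SketchBlockInfo *rec)
{
	return rec->forknum > SKETCH_INVALID_FORKNUM &&
		rec->forknum <= SKETCH_MAX_FORKNUM;
}

/* Count the entries that pass validation; bad entries are skipped, not fatal. */
static int
sketch_count_loadable(const SketchBlockInfo *recs, int nrecs)
{
	int			ok = 0;

	for (int i = 0; i < nrecs; i++)
	{
		if (sketch_record_is_loadable(&recs[i]))
			ok++;
	}
	return ok;
}
```

With a check of this kind, an entry whose fork number was edited from 0 to 10 would simply be skipped during loading instead of reaching the storage layer.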

= Documentation

I feel the documentation needs to be improved greatly.

- The first para in pg_prewarm should mention the autoload feature too.

- The new section should be named "The pg_prewarm autoload" or something
better. "auto pg_prewarm bgworker" does not seem appropriate. The
configuration parameter should also be listed out clearly like in auth-delay
page. The new function launch_pg_prewarm_dump() should be listed under
Functions.

-- Thanks I will try to improve the documentation. And, put more details
there.

--
Thank you,

Beena Emerson

Have a Great Day!

#46

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#43)

Re: Proposal : For Auto-Prewarm.

Hello,

On Tue, Feb 7, 2017 at 5:14 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

On Tue, Feb 7, 2017 at 12:24 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemerson@gmail.com>

wrote:

Are 2 workers required?

I think in the new design there is a provision of launching the worker
dynamically to dump the buffers, so there seems to be a need of
separate workers for loading and dumping the buffers. However, there
is no explanation in the patch or otherwise when and why this needs a
pair of workers. Also, if the dump interval is greater than zero,
then do we really need to separately register a dynamic worker?

We have introduced a new value -1 for pg_prewarm.dump_interval this
means we will not dump at all, At this state, I thought auto
pg_prewarm process need not run at all, so I coded to exit the auto
pg_prewarm immediately. But If the user decides to start the auto
pg_prewarm to dump only without restarting the server, I have
introduced a launcher function "launch_pg_prewarm_dump" to restart the
auto pg_prewarm only to dump. Since now we can launch worker only to
dump, I thought we can redistribute the code between two workers, one
which only does prewarm (load only) and another dumps periodically.
This helped me to modularize and reuse the code. So once load worker
has finished its job, it registers a dump worker and then exits.
But if max_worker_processes is not enough to launch the "auto
pg_prewarm dump" bgworker
We throw an error
2017-02-07 14:51:59.789 IST [50481] ERROR: registering dynamic
bgworker "auto pg_prewarm dump" failed c
2017-02-07 14:51:59.789 IST [50481] HINT: Consider increasing
configuration parameter "max_worker_processes".

Now thinking again instead of such error and then correcting same by
explicitly launching the auto pg_prewarm dump bgworker through
launch_pg_prewarm_dump(), I can go back to original design where there
will be one worker which loads and then dumps periodically. And
launch_pg_prewarm_dump will relaunch dump only activity of that
worker. Does this sound good?

Yes, it would be better to have only one pg_prewarm worker, as the loader is
idle for the entire server run time after the initial load activity of a few
seconds.

--
Thank you,

Beena Emerson

Have a Great Day!

#47

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Beena Emerson (#46)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 6:11 PM, Beena Emerson <memissemerson@gmail.com> wrote:

Yes it would be better to have only one pg_prewarm worker as the loader is
idle for the entire server run time after the initial load activity of few
secs.

Sorry, that is again a bug in the code. The code to handle SIGUSR1
somehow got deleted before I submitted patch_03, and I failed to notice
it.
As the code stands, the loader bgworker waits on the latch to learn the
status of the dump bgworker. Actually, the loader bgworker should exit
right after launching the dump bgworker. I will try to fix this and
address the other comments given by you in my next patch.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#48

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#43)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 5:14 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, Feb 7, 2017 at 12:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemerson@gmail.com> wrote:

Are 2 workers required?

I think in the new design there is a provision of launching the worker
dynamically to dump the buffers, so there seems to be a need of
separate workers for loading and dumping the buffers. However, there
is no explanation in the patch or otherwise when and why this needs a
pair of workers. Also, if the dump interval is greater than zero,
then do we really need to separately register a dynamic worker?

We have introduced a new value -1 for pg_prewarm.dump_interval this
means we will not dump at all, At this state, I thought auto
pg_prewarm process need not run at all, so I coded to exit the auto
pg_prewarm immediately. But If the user decides to start the auto
pg_prewarm to dump only without restarting the server, I have
introduced a launcher function "launch_pg_prewarm_dump" to restart the
auto pg_prewarm only to dump. Since now we can launch worker only to
dump, I thought we can redistribute the code between two workers, one
which only does prewarm (load only) and another dumps periodically.
This helped me to modularize and reuse the code. So once load worker
has finished its job, it registers a dump worker and then exits.
But if max_worker_processes is not enough to launch the "auto
pg_prewarm dump" bgworker
We throw an error
2017-02-07 14:51:59.789 IST [50481] ERROR: registering dynamic
bgworker "auto pg_prewarm dump" failed c
2017-02-07 14:51:59.789 IST [50481] HINT: Consider increasing
configuration parameter "max_worker_processes".

Now thinking again instead of such error and then correcting same by
explicitly launching the auto pg_prewarm dump bgworker through
launch_pg_prewarm_dump(), I can go back to original design where there
will be one worker which loads and then dumps periodically. And
launch_pg_prewarm_dump will relaunch dump only activity of that
worker. Does this sound good?

Won't it be simpler if you treat -1 as a value that just loads the library?
With *_interval = -1, it will neither load nor dump.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#49

Beena Emerson

memissemerson@gmail.com

almost 9 years ago

In reply to: Amit Kapila (#48)

Re: Proposal : For Auto-Prewarm.

Hello,

On Wed, Feb 8, 2017 at 3:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Feb 7, 2017 at 5:14 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

On Tue, Feb 7, 2017 at 12:24 PM, Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson <memissemerson@gmail.com>

wrote:

Are 2 workers required?

I think in the new design there is a provision of launching the worker
dynamically to dump the buffers, so there seems to be a need of
separate workers for loading and dumping the buffers. However, there
is no explanation in the patch or otherwise when and why this needs a
pair of workers. Also, if the dump interval is greater than zero,
then do we really need to separately register a dynamic worker?

We have introduced a new value -1 for pg_prewarm.dump_interval this
means we will not dump at all, At this state, I thought auto
pg_prewarm process need not run at all, so I coded to exit the auto
pg_prewarm immediately. But If the user decides to start the auto
pg_prewarm to dump only without restarting the server, I have
introduced a launcher function "launch_pg_prewarm_dump" to restart the
auto pg_prewarm only to dump. Since now we can launch worker only to
dump, I thought we can redistribute the code between two workers, one
which only does prewarm (load only) and another dumps periodically.
This helped me to modularize and reuse the code. So once load worker
has finished its job, it registers a dump worker and then exits.
But if max_worker_processes is not enough to launch the "auto
pg_prewarm dump" bgworker
We throw an error
2017-02-07 14:51:59.789 IST [50481] ERROR: registering dynamic
bgworker "auto pg_prewarm dump" failed c
2017-02-07 14:51:59.789 IST [50481] HINT: Consider increasing
configuration parameter "max_worker_processes".

Now thinking again instead of such error and then correcting same by
explicitly launching the auto pg_prewarm dump bgworker through
launch_pg_prewarm_dump(), I can go back to original design where there
will be one worker which loads and then dumps periodically. And
launch_pg_prewarm_dump will relaunch dump only activity of that
worker. Does this sound good?

Won't it be simple if you consider -1 as a value to just load library?
For *_interval = -1, it will neither load nor dump.

+1
That is what I thought was the behaviour we decided upon for -1.

--
Thank you,

Beena Emerson

Have a Great Day!

#50

Peter Geoghegan

pg@bowt.ie

almost 9 years ago

In reply to: Mithun Cy (#41)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 1:31 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

SEGFAULT was the coding mistake I have called the C-language function
directly without initializing the functioncallinfo. Thanks for
raising. Below patch fixes same.

It would be nice if this had an option to preload only internal B-tree
pages into shared_buffers. They're usually less than 1% of the total
pages in a B-Tree, and are by far the most frequently accessed. It's
reasonable to suppose that much of the random I/O incurred when
warming the cache occurs there. Now, prewarming those will incur
random I/O, often completely random I/O, but by and large it would be
a matter of swallowing that cost sooner, through using your tool,
rather than later, during the execution of queries.

--
Peter Geoghegan

#51

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Peter Geoghegan (#50)

Re: Proposal : For Auto-Prewarm.

On Thu, Feb 9, 2017 at 12:36 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Feb 7, 2017 at 1:31 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

SEGFAULT was the coding mistake I have called the C-language function
directly without initializing the functioncallinfo. Thanks for
raising. Below patch fixes same.

It would be nice if this had an option to preload only internal B-tree
pages into shared_buffers.

Sure, but I think it won't directly fit into the current functionality
of the patch, which loads the blocks that were dumped from shared buffers
before the server was stopped. One way to extend it could be that,
while dumping, it dumps just the btree internal pages, or something like
that.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#52

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Amit Kapila (#40)

Re: Proposal : For Auto-Prewarm.

On Tue, Feb 7, 2017 at 2:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Feb 7, 2017 at 10:44 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

==================
One problem now I have kept it open is multiple "auto pg_prewarm dump"
can be launched even if already a dump/load worker is running by
calling launch_pg_prewarm_dump. I can avoid this by dropping a
lock-file before starting the bgworkers. But, if there is an another
method to avoid launching bgworker on a simple method I can do same.

How about keeping a variable in PROC_HDR structure to indicate if
already one dump worker is running, then don't allow to start a new
one?

A contrib module shouldn't change core (and shouldn't need to). It
can register its own shared memory area if it wants.
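
As a rough illustration of that point, the "only one dump worker" guard could be kept in a small state struct owned by the module itself. The sketch below is an assumption-laden stand-in, not the patch's code: it uses plain C11 atomics and hypothetical `Sketch*` names. In an actual extension the struct would presumably live in a shared memory area requested through RequestAddinShmemSpace() and set up with ShmemInitStruct(), and the patch author may well prefer a lock over atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical module-owned state; worker_pid == 0 means no worker is running. */
typedef struct SketchPrewarmShared
{
	atomic_int	worker_pid;
} SketchPrewarmShared;

/* Try to become the single worker; fails if another pid is already recorded. */
static bool
sketch_try_claim(SketchPrewarmShared *state, int pid)
{
	int			expected = 0;

	return atomic_compare_exchange_strong(&state->worker_pid, &expected, pid);
}

/* Clear the slot, but only if it is still owned by the given pid. */
static void
sketch_release(SketchPrewarmShared *state, int pid)
{
	int			expected = pid;

	atomic_compare_exchange_strong(&state->worker_pid, &expected, 0);
}
```

A launcher function would call something like sketch_try_claim() first and refuse to start a second worker when it returns false; the worker would call sketch_release() on exit.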

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#53

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Beena Emerson (#49)

Re: Proposal : For Auto-Prewarm.

On Thu, Feb 9, 2017 at 1:47 AM, Beena Emerson <memissemerson@gmail.com> wrote:

Won't it be simple if you consider -1 as a value to just load library?
For *_interval = -1, it will neither load nor dump.

+1
That is what I thought was the behaviour we decided upon for -1.

Right. I can't see why you'd want to be able to separately control
those two things. If you're not dumping, you don't want to load; if
you're not loading, you don't want to dump.

I think what should happen is that there should be only one worker.
If the GUC is -1, it never gets registered. Otherwise, it starts at
database startup time and runs until shutdown. At startup time, it
loads buffers until we run out of free buffers or until all saved
buffers are loaded, whichever happens first. Buffers should be sorted
by relfilenode (in any order) and block number (in increasing order).
Once it finishes loading buffers, it repeatedly sleeps for the amount
of time indicated by the GUC (or indefinitely if the GUC is 0),
dumping after each sleep and at shutdown.
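
The sort order suggested above (group by relfilenode, then ascending block number) can be sketched with a plain qsort() comparator. This is an illustration only: `SketchBlock` and the function names are hypothetical, and the record is reduced to the two fields named in the paragraph rather than the patch's full BlockInfoRecord.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced, illustrative record: only the two sort keys mentioned above. */
typedef struct SketchBlock
{
	uint32_t	filenode;
	uint32_t	blocknum;
} SketchBlock;

/* Order by filenode first, then by increasing block number. */
static int
sketch_block_cmp(const void *p, const void *q)
{
	const SketchBlock *a = p;
	const SketchBlock *b = q;

	if (a->filenode != b->filenode)
		return (a->filenode < b->filenode) ? -1 : 1;
	if (a->blocknum != b->blocknum)
		return (a->blocknum < b->blocknum) ? -1 : 1;
	return 0;
}

/* Sort an array of records so blocks of one relation are read in sequence. */
static void
sketch_sort_blocks(SketchBlock *blocks, size_t nblocks)
{
	qsort(blocks, nblocks, sizeof(SketchBlock), sketch_block_cmp);
}
```

Reading blocks of the same file in increasing block order tends to turn the prewarm pass into mostly sequential I/O, which is presumably the motivation for the suggestion.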

Shutting down one worker to start up another doesn't seem to make
sense. If for some reason you want the code for those in separate
functions, you can call one function and then call the other. Putting
them in completely separate processes doesn't buy anything.
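
The single-worker lifecycle described in this message can be modelled as two functions called one after the other. The following is a simplified simulation under stated assumptions, not the worker's real main loop: all `sketch_*` names are hypothetical, buffers are reduced to counters, and time is reduced to a number of seconds of uptime.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_OFF (-1)			/* GUC value: worker is never registered */

typedef struct SketchRun
{
	bool		registered;		/* was the worker registered at all? */
	int			loaded;			/* blocks loaded at startup */
	int			dumps;			/* dumps written, including the one at shutdown */
} SketchRun;

/* Load saved blocks until they run out or no free buffer is left. */
static int
sketch_load_phase(int nsaved, int nfree)
{
	int			loaded = 0;

	while (loaded < nsaved && nfree > 0)
	{
		loaded++;
		nfree--;
	}
	return loaded;
}

/* One dump per elapsed interval (none if interval is 0), plus one at shutdown. */
static int
sketch_dump_phase(int dump_interval, int uptime_secs)
{
	int			periodic = 0;

	if (dump_interval > 0)
		periodic = uptime_secs / dump_interval;
	return periodic + 1;
}

/* The whole lifecycle: one worker, one function called after the other. */
static SketchRun
sketch_worker_run(int dump_interval, int nsaved, int nfree, int uptime_secs)
{
	SketchRun	run = {false, 0, 0};

	if (dump_interval == SKETCH_OFF)
		return run;
	run.registered = true;
	run.loaded = sketch_load_phase(nsaved, nfree);
	run.dumps = sketch_dump_phase(dump_interval, uptime_secs);
	return run;
}
```

In this model an interval of -1 does nothing at all, 0 loads and then dumps once at shutdown, and a positive value adds periodic dumps in between.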

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#54

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: Robert Haas (#53)

Re: Proposal : For Auto-Prewarm.

On 2/10/17 15:12, Robert Haas wrote:

Right. I can't see why you'd want to be able to separately control
those two things. If you're not dumping, you don't want to load; if
you're not loading, you don't want to dump.

What about the case where you want to prewarm a standby from the info
from the primary (or another standby)?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#55

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Peter Eisentraut (#54)

Re: Proposal : For Auto-Prewarm.

On Wed, Feb 22, 2017 at 4:06 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2/10/17 15:12, Robert Haas wrote:

Right. I can't see why you'd want to be able to separately control
those two things. If you're not dumping, you don't want to load; if
you're not loading, you don't want to dump.

What about the case where you want to prewarm a standby from the info
from the primary (or another standby)?

I think it's OK to treat that as something of a corner case. There's
nothing to keep you from doing that today: just use pg_buffercache to
dump a list of blocks on one server, and then pass those blocks to
pg_prewarm on another server. The point of this patch, AIUI, is to
automate a particularly common case of that, which is to dump before
shutdown and load upon startup. It doesn't preclude other things that
people want to do.

I suppose we could have an SQL-callable function that does an
immediate dump (without starting a worker). That wouldn't hurt
anything, and might be useful in a case like the one you mention.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#56

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#55)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Hi all, thanks.
I have tried to address all of the comments given above, along with some
more code cleanups.

On Wed, Feb 22, 2017 at 6:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think it's OK to treat that as something of a corner case. There's
nothing to keep you from doing that today: just use pg_buffercache to
dump a list of blocks on one server, and then pass those blocks to
pg_prewarm on another server. The point of this patch, AIUI, is to
automate a particularly common case of that, which is to dump before
shutdown and load upon startup. It doesn't preclude other things that
people want to do.

I suppose we could have an SQL-callable function that does an
immediate dump (without starting a worker). That wouldn't hurt
anything, and might be useful in a case like the one you mention.

In the latest patch, I have moved things back to the old way: there
will be one bgworker, "auto pg_prewarm", which automatically records
information about the blocks that were present in the buffer pool before
server shutdown and then prewarms the buffer pool with those blocks upon
server restart. I have reverted the code which helped us to launch the
stopped "auto pg_prewarm" bgworker. The reason I introduced a launcher
SQL utility function was that the running auto pg_prewarm can be
stopped by the user by setting dump_interval to -1. So if the user
wants to restart the stopped auto pg_prewarm (this time to dump only, so
as to prewarm on the next restart), he can use that utility; the user can
thus launch auto pg_prewarm to dump periodically while the server is
still running. If that was not the concern, I think I misunderstood the
comments and overdid the design. So as a first patch I will keep
things simple. Also, using separate processes for the prewarm and dump
activities was a bad design, hence I have reverted that as well. The
auto pg_prewarm worker can only be launched by preloading the library.
I can add additional utilities once we can formalize what is really
needed from it.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_auto_prewarm_05.patch (application/octet-stream)

commit 82a8b7f8fc3d40ddfcb3c5a92ae07e43a0235d27
Author: mithun <mithun@localhost.localdomain>
Date:   Wed Feb 22 23:26:17 2017 +0530

    Feature : auto pg_prewarm
    
    Author : Mithun C Y

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..badd0c0 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,7 +1,7 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o auto_pg_prewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
 DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
diff --git a/contrib/pg_prewarm/auto_pg_prewarm.c b/contrib/pg_prewarm/auto_pg_prewarm.c
new file mode 100644
index 0000000..f25c8da
--- /dev/null
+++ b/contrib/pg_prewarm/auto_pg_prewarm.c
@@ -0,0 +1,695 @@
+/*-------------------------------------------------------------------------
+ *
+ * auto_pg_prewarm.c
+ *
+ * -- Automatically prewarm the shared buffer pool when server restarts.
+ *
+ *	Copyright (c) 2013-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/auto_pg_prewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/guc.h"
+#include "catalog/pg_class.h"
+
+/*
+ * auto pg_prewarm :
+ *
+ * What is it?
+ * ===========
+ * A bgworker which automatically records information about blocks which were
+ * present in buffer pool before server shutdown and then prewarms the buffer
+ * pool upon server restart with those blocks.
+ *
+ * How does it work?
+ * =================
+ * When the shared library "pg_prewarm" is preloaded, a
+ * bgworker "auto pg_prewarm" is launched immediately after the server is
+ * started.  The bgworker will start loading blocks recorded in the format
+ * BlockInfoRecord <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * $PGDATA/AUTO_PG_PREWARM_FILE, as long as there is a free buffer left in the
+ * buffer pool. This way we do not replace any new blocks which were loaded
+ * either by the recovery process or the querying clients.
+ *
+ * Once the "auto pg_prewarm" bgworker has completed its prewarm task, it will
+ * start a new task to periodically dump the BlockInfoRecords related to blocks
+ * which are currently in shared buffer pool. Upon next server restart, the
+ * bgworker will prewarm the buffer pool by loading those blocks. The GUC
+ * pg_prewarm.dump_interval will control the dumping activity of the bgworker.
+ */
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+/* Primary functions */
+void		_PG_init(void);
+static void auto_pgprewarm_main(Datum main_arg);
+static bool load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum);
+static void register_auto_pgprewarm(void);
+static void dump_block_info_periodically(void);
+static void auto_prewarm_tasks(void);
+
+/*
+ * ============================================================================
+ * ===========================	 SIGNAL HANDLERS	===========================
+ * ============================================================================
+ */
+
+static void sigtermHandler(SIGNAL_ARGS);
+static void sighupHandler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake it
+ *	up.
+ */
+static void
+sigtermHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the main loop to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+sighupHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	types and variables used by auto pg_prewarm	  ============
+ * ============================================================================
+ */
+
+/*
+ * Meta-data of each persistent block which is dumped and used to load.
+ */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcNode;		/* tablespace */
+	Oid			filenode;		/* relation */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+}	BlockInfoRecord;
+
+/*
+ * state which indicates the activity of auto pg_prewarm.
+ */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO,	/* dump the buffer pool block info. */
+	TASK_END					/* no more tasks to do. */
+}	auto_pg_prewarm_task;
+
+auto_pg_prewarm_task next_task = TASK_END;
+
+/* GUC variable which controls the dump activity of auto pg_prewarm. */
+int			dump_interval = 0;
+
+/* compare member elements to check if they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * sort_cmp_func - compare function used for qsort().
+ */
+static int
+sort_cmp_func(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+#define AUTO_PG_PREWARM_FILE "autopgprewarm"
+
+/* ============================================================================
+ * =====================	auto pg_prewarm utility functions	===============
+ * ============================================================================
+ */
+
+/*
+ *	load_block - Load a given block.
+ *
+ *	returns true if successfully loaded.
+ */
+static bool
+load_block(RelFileNode rnode, char reltype, ForkNumber forkNum,
+		   BlockNumber blockNum)
+{
+	Buffer		buffer;
+	SMgrRelation smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * First check if fork exists. Otherwise we would waste one free buffer
+	 * for each non-existent block.
+	 */
+	if (forkNum > InvalidForkNumber && forkNum <= MAX_FORKNUM &&
+		smgrexists(smgr, forkNum))
+	{
+		buffer = ReadBufferForPrewarm(smgr, reltype,
+									  forkNum, blockNum,
+									  RBM_NORMAL, NULL);
+		if (BufferIsValid(buffer))
+		{
+			ReleaseBuffer(buffer);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ *	prewarm_buffer_pool - the main routine which prewarms the buffer pool.
+ *	We try to load each blocknum read from $PGDATA/AUTO_PG_PREWARM_FILE as
+ *	long as a free buffer is left and SIGTERM has not been received. If we fail
+ *	to load a block we ignore the ERROR and try to load the next blocknum, since
+ *	the corresponding block might have been deleted in the meantime.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	static char dump_file_path[MAXPGPATH];
+	FILE	   *file = NULL;
+	uint32		i,
+				num_blocks = 0,
+				total_blocks_loaded = 0;
+
+	next_task = TASK_DUMP_BUFFERPOOL_INFO;
+
+	/* check if file exists and open file in read mode. */
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s.save",
+			 AUTO_PG_PREWARM_FILE);
+	file = fopen(dump_file_path, PG_BINARY_R);
+
+	if (!file)
+		return;					/* No file to load. */
+
+	if (fscanf(file, "<<%u>>", &num_blocks) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm load : error reading num of elements"
+						" in \"%s\" : %m", dump_file_path)));
+	}
+
+	elog(LOG, "auto pg_prewarm load : %u blocks to load", num_blocks);
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		RelFileNode rnode;
+		uint32		forknum;
+		BlockNumber blocknum;
+
+		if (got_sigterm)
+		{
+			/*
+			 * Received shutdown while we were still loading the blocks. No
+			 * need to dump at this stage.
+			 */
+			next_task = TASK_END;
+			break;
+		}
+
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+			if (dump_interval == AT_PWARM_OFF)
+			{
+				next_task = TASK_END;
+				break;
+			}
+
+			/*
+			 * SIGHUP did not turn auto pg_prewarm off, so the reload only
+			 * cost us time which could have been used to prewarm some more
+			 * buffers. That is unavoidable: there is a genuine case where
+			 * the user wants to stop a prewarm process which is taking a
+			 * long time and is no longer wanted.
+			 */
+		}
+
+		/*
+		 * Load the block only if there exists a free buffer. We do not want to
+		 * replace a block already in buffer pool.
+		 */
+		if (!have_free_buffer())
+			break;
+
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &rnode.dbNode, &rnode.spcNode,
+						&rnode.relNode, &forknum, &blocknum))
+			break;				/* No more valid entry hence stop processing. */
+
+		PG_TRY();
+		{
+			if (load_block(rnode, RELPERSISTENCE_PERMANENT,
+						   (ForkNumber) forknum, blocknum))
+				total_blocks_loaded++;
+		}
+		PG_CATCH();
+		{
+			/* any error handle it and then try to load next block. */
+
+			/* prevent interrupts while cleaning up */
+			HOLD_INTERRUPTS();
+
+			/* report the error to the server log */
+			EmitErrorReport();
+
+			LWLockReleaseAll();
+			AbortBufferIO();
+			UnlockBuffers();
+
+			/* buffer pins are released here. */
+			ResourceOwnerRelease(CurrentResourceOwner,
+								 RESOURCE_RELEASE_BEFORE_LOCKS,
+								 false, true);
+			FlushErrorState();
+
+			/* now we can allow interrupts again */
+			RESUME_INTERRUPTS();
+		}
+		PG_END_TRY();
+	}
+
+	fclose(file);
+
+	elog(LOG,
+		 "auto pg_prewarm load : %u blocks successfully loaded",
+		 total_blocks_loaded);
+	return;
+}
+
+/*
+ *	dump_now - the main routine which goes through each buffer header of
+ *	buffer pool and dumps their meta data in the format
+ *	<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. We Sort these data
+ *	and then dump them. Sorting is necessary as it facilitates sequential read
+ *	during load. Unlike load, if we encounter any error we abort the dump.
+ */
+static void
+dump_now(void)
+{
+	static char dump_file_path[MAXPGPATH],
+				transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file = NULL;
+
+	if (next_task == TASK_END)
+		return;
+
+	/*
+	 * set next_task to TASK_END, if dump failed we try to avoid another dump
+	 * activity.
+	 */
+	next_task = TASK_END;
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/* only valid and persistent blocks are dumped. */
+		if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID) &&
+			(buf_state & BM_PERMANENT))
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].spcNode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/* sorting now only to avoid sorting while loading. */
+	pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+			 sort_cmp_func);
+
+	snprintf(transient_dump_file_path, sizeof(transient_dump_file_path),
+			 "%s.save.tmp", AUTO_PG_PREWARM_FILE);
+	file = fopen(transient_dump_file_path, "w");
+	if (file == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s.save", AUTO_PG_PREWARM_FILE);
+
+	/* write num_blocks first and then BlockInfoRecords. */
+	ret = fprintf(file, "<<%u>>\n", num_blocks);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : error writing to \"%s\" : %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].spcNode,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("auto pg_prewarm dump : error writing to"
+							" \"%s\" : %m", transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = fclose(file);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : error closing \"%s\" : %m",
+						transient_dump_file_path)));
+
+	ret = unlink(dump_file_path);
+	if (ret != 0 && errno != ENOENT)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : unlink \"%s\" failed : %m",
+						dump_file_path)));
+
+	ret = rename(transient_dump_file_path, dump_file_path);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("auto pg_prewarm dump : failed to rename \"%s\" to"
+						" \"%s\" : %m",
+						transient_dump_file_path, dump_file_path)));
+
+	/* the dump was successful, let's do one more time! */
+	if (!got_sigterm)
+		next_task = TASK_DUMP_BUFFERPOOL_INFO;
+
+	elog(LOG, "auto pg_prewarm dump : saved metadata info of %u blocks",
+		 num_blocks);
+}
+
+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	int			timeout = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+
+	Assert(next_task == TASK_DUMP_BUFFERPOOL_INFO);
+
+	while (!got_sigterm)
+	{
+		int			rc;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+			timeout = dump_interval;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   timeout * 1000, PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			next_task = TASK_END;
+			return;
+		}
+
+		/* If dump_interval is set then dump the buffer pool. */
+		if ((rc & WL_TIMEOUT) &&
+			(dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY))
+			dump_now();
+	}
+
+	/* One last block meta info dump at postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now();
+
+	next_task = TASK_END;
+}
+
+/*
+ *	auto_prewarm_tasks -- perform next task of auto pg_prewarm.
+ */
+static void
+auto_prewarm_tasks(void)
+{
+	if (next_task == TASK_PREWARM_BUFFERPOOL)
+		prewarm_buffer_pool();
+
+	if (next_task == TASK_DUMP_BUFFERPOOL_INFO)
+		dump_block_info_periodically();
+}
+
+/*
+ * auto_pgprewarm_main -- the main entry point of auto pg_prewarm bgworker
+ *						  process.
+ */
+static void
+auto_pgprewarm_main(Datum main_arg)
+{
+	MemoryContext autoprewarmer_context;
+	sigjmp_buf	local_sigjmp_buf;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+	pqsignal(SIGHUP, sighupHandler);
+
+	next_task = DatumGetInt32(main_arg);
+
+	/*
+	 * Create a resource owner to keep track of our resources.
+	 */
+	CurrentResourceOwner = ResourceOwnerCreate(NULL, "autoprewarmer");
+
+	/*
+	 * Create a memory context that we will do all our work in.  We do this so
+	 * that we can reset the context during error recovery and thereby avoid
+	 * possible memory leaks.
+	 */
+	autoprewarmer_context = AllocSetContextCreate(TopMemoryContext,
+												  "autoprewarmer",
+												  ALLOCSET_DEFAULT_MINSIZE,
+												  ALLOCSET_DEFAULT_INITSIZE,
+												  ALLOCSET_DEFAULT_MAXSIZE);
+	MemoryContextSwitchTo(autoprewarmer_context);
+
+	elog(LOG, "auto pg_prewarm has started");
+
+	/*
+	 * **** establish error handling mechanism. ****
+	 */
+	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
+	{
+		/* Since not using PG_TRY, must reset error stack by hand */
+		error_context_stack = NULL;
+
+		/* Prevent interrupts while cleaning up */
+		HOLD_INTERRUPTS();
+
+		/* Report the error to the server log */
+		EmitErrorReport();
+
+		LWLockReleaseAll();
+		AbortBufferIO();
+		UnlockBuffers();
+
+		/* buffer pins are released here. */
+		ResourceOwnerRelease(CurrentResourceOwner,
+							 RESOURCE_RELEASE_BEFORE_LOCKS,
+							 false, true);
+		AtEOXact_Buffers(false);
+		AtEOXact_SMgr();
+
+		MemoryContextSwitchTo(autoprewarmer_context);
+		FlushErrorState();
+
+		/* Flush any leaked data in the top-level context */
+		MemoryContextResetAndDeleteChildren(autoprewarmer_context);
+
+		/* Now we can allow interrupts again */
+		RESUME_INTERRUPTS();
+
+		/* Close all open files after any error. */
+		smgrcloseall();
+
+		if (next_task == TASK_END)
+		{
+			elog(LOG, "auto pg_prewarm shutting down");
+			proc_exit(1);
+		}
+	}
+
+	/* We can now handle ereport(ERROR) */
+	PG_exception_stack = &local_sigjmp_buf;
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	/*
+	 * **** perform auto pg_prewarm's next task	****
+	 */
+	auto_prewarm_tasks();
+	elog(LOG, "auto pg_prewarm shutting down");
+}
+
+/* ============================================================================
+ * =============	extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/* Register auto pg_prewarm load bgworker. */
+static void
+register_auto_pgprewarm(void)
+{
+	BackgroundWorker auto_pg_prewarm;
+
+	MemSet(&auto_pg_prewarm, 0, sizeof(auto_pg_prewarm));
+	auto_pg_prewarm.bgw_main_arg = Int32GetDatum(0);
+	auto_pg_prewarm.bgw_flags = BGWORKER_SHMEM_ACCESS;
+
+	/* Register the auto pg_prewarm background worker */
+	auto_pg_prewarm.bgw_start_time = BgWorkerStart_PostmasterStart;
+	auto_pg_prewarm.bgw_restart_time = BGW_NEVER_RESTART;
+	auto_pg_prewarm.bgw_main = auto_pgprewarm_main;
+	snprintf(auto_pg_prewarm.bgw_name, BGW_MAXLEN, "auto pg_prewarm");
+	auto_pg_prewarm.bgw_main_arg = UInt32GetDatum(TASK_PREWARM_BUFFERPOOL);
+	RegisterBackgroundWorker(&auto_pg_prewarm);
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+						 " If set to -1, stops the running auto pg_prewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Has been set not to prewarm/dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+	{
+		next_task = TASK_END;
+		return;
+	}
+
+	/* Register auto pg_prewarm load. */
+	register_auto_pgprewarm();
+}
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..3b610be 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -58,6 +60,59 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
  </sect2>
 
  <sect2>
+  <title>auto pg_prewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in buffer pool before server shutdown and then prewarms the buffer
+  pool upon server restart with those blocks.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>auto pg_prewarm</literal> is launched immediately after
+  the server is started. The bgworker will load blocks recorded in
+  <literal>$PGDATA/autopgprewarm.save</literal> as long as there is a free
+  buffer left in the buffer pool. This way we do not replace any new blocks
+  which were loaded either by the recovery process or the querying clients.
+  </para>
+
+  <para>
+  Once the <literal>auto pg_prewarm</literal> bgworker has completed its
+  prewarm task, it will start a new task to periodically dump the information
+  about blocks which are currently in shared buffer pool. Upon next server
+  restart, the bgworker will prewarm the buffer pool by loading those blocks.
+  The GUC <literal>pg_prewarm.dump_interval</literal> will control the dumping
+  activity of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      The interval in seconds between two dumps of the buffer pool's block
+      information. The default is 300 seconds. It also takes special values.
+      If set to 0, timer-based dumping is disabled and the dump happens only
+      while the server is shutting down. If set to -1, the running
+      <literal>auto pg_prewarm</literal> will be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+
+ <sect2>
   <title>Author</title>
 
   <para>
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3cb5120..82d1464 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -693,6 +693,20 @@ ReadBufferWithoutRelcache(RelFileNode rnode, ForkNumber forkNum,
 							 mode, strategy, &hit);
 }
 
+/*
+ * ReadBufferForPrewarm -- This new interface is for auto pg_prewarm.
+ */
+Buffer
+ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+					 ForkNumber forkNum, BlockNumber blockNum,
+					 ReadBufferMode mode, BufferAccessStrategy strategy)
+{
+	bool        hit;
+
+	return ReadBuffer_common(smgr, relpersistence, forkNum, blockNum,
+							 mode, strategy, &hit);
+}
+
 
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..4606a32 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,19 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- This function checks whether there is a free buffer in
+ * buffer pool. Used by auto pg_prewarm module.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index d117b66..58d4871 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 07a32d6..dd98fde 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -16,6 +16,7 @@
 
 #include "storage/block.h"
 #include "storage/buf.h"
+#include "storage/smgr.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
@@ -172,6 +173,10 @@ extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
 extern Buffer ReadBufferWithoutRelcache(RelFileNode rnode,
 						  ForkNumber forkNum, BlockNumber blockNum,
 						  ReadBufferMode mode, BufferAccessStrategy strategy);
+extern Buffer ReadBufferForPrewarm(SMgrRelation smgr, char relpersistence,
+								   ForkNumber forkNum, BlockNumber blockNum,
+								   ReadBufferMode mode,
+								   BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern void MarkBufferDirty(Buffer buffer);

#57

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Mithun Cy (#56)

Re: Proposal : For Auto-Prewarm.

On Wed, Feb 22, 2017 at 11:49 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Hi all thanks,
I have tried to fix all of the comments given above with some more
code cleanups.

While reading this patch tonight, I realized a serious problem with
the entire approach, which is that this patch is supposing that we can
read relation blocks for every database from a single worker that's
not connected to any database. I realize that I suggested that
approach, but now I think it's broken, because the patch isn't taking
any locks on the relations whose pages it is reading, and that is
definitely going to break things. While autoprewarm is busy sucking
blocks into the shared buffer cache, somebody could be, for example,
dropping one of those relations. DropRelFileNodesAllBuffers and
friends expect that nobody is going to be concurrently reading blocks
back into the buffer cache because they hold AccessExclusiveLock, and
they assume that anybody else who is touching it will hold at least
AccessShareLock. But this violates that assumption, and probably some
others.

This is not easy to fix. The lock has to be taken based on the
relation OID, not the relfilenode, but we don't have the relation OID
in the dump file, and recording it there won't help, because the
relfilenode can change under us if the relation is rewritten with
CLUSTER or VACUUM FULL or relevant forms of ALTER TABLE. I don't see
a solution other than launching a separate worker for each database,
which seems like it could be extremely expensive if there are many
databases. Also, I am pretty sure it's no good to take locks before
recovery reaches a consistent state. I'm not sure off-hand whether
crash-recovery will notice conflicting locks, but even if it does,
blocking crash recovery in order to prewarm stuff is bad news.

Here are some other review comments (which may not matter unless we
can think up a solution to the problems above).

- I think auto_pg_prewarm.c is an unnecessarily long name. How about
autoprewarm.c?

- It looks like you ran pgindent over this without adding the new
typedefs to your typedefs.list, so things like the last line of each
typedef is formatted incorrectly.

- ReadBufferForPrewarm isn't a good name for this interface. You need
to give it a generic name (and header comment) that describes what it
does, rather than why you added it. Others might want to also use
this interface. Actually, an even better idea might be to adjust
ReadBufferWithoutRelcache() to serve your need here. That function's
header comment seems to contemplate that somebody might want to add a
relpersistence argument someday; perhaps that day has arrived.

- have_free_buffer's comment shouldn't mention autoprewarm, but it
should mention that this is a lockless test, so the results might be
slightly stale. See similar comments in various other backend
functions for an example of how to write this.

- next_task could be static, and with such a generic name, it really
MUST be static to avoid namespace conflicts.

- load_block() has a race condition. The relation could be dropped
after you check smgrexists() and before you access the relation.
Also, the fork number validity check looks useless; it should never
fail.

- I suggest renaming the file that stores the blocks to
autoprewarm.blocks or something like that. Calling it just
"autopgprewarm" doesn't seem very clear.

- I don't see any reason for the dump file to contain a header record
with an expected record count. When rereading the file, you can just
read until EOF; there's no real need to know the record count before
you start.

- You should test for multiple flags like this: if ((buf_state &
(BM_VALID|BM_TAG_VALID|BM_PERMANENT)) ==
(BM_VALID|BM_TAG_VALID|BM_PERMANENT)). However, I also think
the test is wrong. Even if the buffer isn't BM_VALID, that's not
really a reason not to include it in the dump file. Same with
BM_PERMANENT. I think the reason for the latter restriction is that
you're always calling load_block() with RELPERSISTENCE_PERMANENT, but
that's not a good idea either. If the relation were made unlogged
after you wrote the dump file, then on reload it you'd incorrectly set
BM_PERMANENT on the reloaded blocks.

- elog() should not be used for user-facing messages, but rather only
for internal messages that we don't expect to get generated. Also,
the messages you've picked don't conform to the project's message
style guidelines.

- The error handling loop around load_block() suggests that you're
expecting some reads to fail, which I guess is because you could be
trying to read blocks from a relation that's been rewritten under a
different relfilenode, or partially or entirely truncated. But I
don't think it's very reasonable to just let ReadBufferWhatever() spew
error messages into the log and hope users don't panic. People will
expect an automatic prewarm solution to handle any such cases quietly,
not bleat loudly in the log. I suspect that this error-trapping code
isn't entirely correct, but there's not much point in fixing it; what
we really need to do is get rid of it (somehow).

- dump_block_info_periodically() will sleep for too long - perhaps
forever - if WaitLatch() repeatedly returns due to WL_LATCH_SET, which
can probably happen if for any reason the process receives SIGUSR1
repeatedly. Every time the latch gets set, the timeout is reset, so
it may never expire. There are examples of how to write a loop like
this correctly in various places in the server; please check one of
those.

- I don't think you should need an error-catching loop in
auto_pgprewarm_main(), either. Just let the worker die if there's an
ERROR, and set the restart interval to something other than
BGW_NEVER_RESTART.

- Setting bgw_main isn't correct for extension code. Please read the
documentation on background workers, which explains how to do this
correctly in an extension.
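
To illustrate the dump_block_info_periodically() point above, here is a minimal, self-contained sketch of a deadline-driven wait loop. It is a simulation, not backend code: the wakeup times stand in for WaitLatch() returning (for either WL_LATCH_SET or WL_TIMEOUT), and the names remaining_ms, count_dumps_deadline and count_dumps_naive are hypothetical. The idea is to keep an absolute "next dump" time and recompute the remaining timeout on every wakeup, rather than restarting the full interval each time.

```c
#include <assert.h>
#include <stddef.h>

/* Milliseconds left until the next scheduled dump; never negative. */
long
remaining_ms(long now_ms, long next_dump_ms)
{
	return (next_dump_ms > now_ms) ? next_dump_ms - now_ms : 0;
}

/*
 * Simulate a worker woken at the given ascending times, whether by a latch
 * set or by a timeout.  The dump is driven by an absolute deadline, so
 * extra wakeups cannot postpone it.  Returns the number of dumps performed.
 */
int
count_dumps_deadline(const long *wakeups_ms, size_t n, long interval_ms)
{
	long		next_dump_ms = interval_ms;
	int			dumps = 0;
	size_t		i;

	assert(interval_ms > 0);

	for (i = 0; i < n; i++)
	{
		long		now_ms = wakeups_ms[i];

		if (remaining_ms(now_ms, next_dump_ms) == 0)
		{
			dumps++;			/* the real worker would dump here */
			next_dump_ms = now_ms + interval_ms;
		}
		/* a real loop would now wait for remaining_ms(now, next_dump) */
	}
	return dumps;
}

/*
 * The pattern criticised above: a dump happens only when one wait ran for
 * the full interval, i.e. when no wakeup arrived in between.
 */
int
count_dumps_naive(const long *wakeups_ms, size_t n, long interval_ms)
{
	long		last_ms = 0;
	int			dumps = 0;
	size_t		i;

	for (i = 0; i < n; i++)
	{
		if (wakeups_ms[i] - last_ms >= interval_ms)
			dumps++;
		last_ms = wakeups_ms[i];
	}
	return dumps;
}
```

With a wakeup every 100 ms and a 300 ms interval, the naive variant never dumps because no single wait lasts the full interval, while the deadline variant dumps each time the deadline passes.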

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#58

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: Robert Haas (#57)

Re: Proposal : For Auto-Prewarm.

On 2/26/17 11:46, Robert Haas wrote:

I don't see
a solution other than launching a separate worker for each database,
which seems like it could be extremely expensive if there are many
databases.

You don't have to start all these workers at once. Starting one and not
starting the next one until the first one is finished should be fine.
It will have the same serial behavior that the patch is proposing anyway.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#59

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Peter Eisentraut (#58)

Re: Proposal : For Auto-Prewarm.

On Mon, Feb 27, 2017 at 7:18 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2/26/17 11:46, Robert Haas wrote:

I don't see
a solution other than launching a separate worker for each database,
which seems like it could be extremely expensive if there are many
databases.

You don't have to start all these workers at once. Starting one and not
starting the next one until the first one is finished should be fine.
It will have the same serial behavior that the patch is proposing anyway.

Yeah, true. The constant factor is higher, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#60

Amit Kapila

amit.kapila16@gmail.com

almost 9 years ago

In reply to: Robert Haas (#57)

Re: Proposal : For Auto-Prewarm.

On Sun, Feb 26, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 22, 2017 at 11:49 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Hi all thanks,
I have tried to fix all of the comments given above with some more
code cleanups.

While reading this patch tonight, I realized a serious problem with
the entire approach, which is that this patch is supposing that we can
read relation blocks for every database from a single worker that's
not connected to any database. I realize that I suggested that
approach, but now I think it's broken, because the patch isn't taking
any locks on the relations whose pages it is reading, and that is
definitely going to break things. While autoprewarm is busy sucking
blocks into the shared buffer cache, somebody could be, for example,
dropping one of those relations. DropRelFileNodesAllBuffers and
friends expect that nobody is going to be concurrently reading blocks
back into the buffer cache because they hold AccessExclusiveLock, and
they assume that anybody else who is touching it will hold at least
AccessShareLock. But this violates that assumption, and probably some
others.

This is not easy to fix. The lock has to be taken based on the
relation OID, not the relfilenode, but we don't have the relation OID
in the dump file, and recording it there won't help, because the
relfilenode can change under us if the relation is rewritten with
CLUSTER or VACUUM FULL or relevant forms of ALTER TABLE. I don't see
a solution other than launching a separate worker for each database,
which seems like it could be extremely expensive if there are many
databases. Also, I am pretty sure it's no good to take locks before
recovery reaches a consistent state.

So we should defer this loading of blocks until recovery reaches a
consistent state, so that we can connect to a database. To allow the
worker to take a lock, we need to dump the relation OID as well. Is that
what you are envisioning to fix this problem?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#61

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Amit Kapila (#60)

Re: Proposal : For Auto-Prewarm.

On Thu, Mar 2, 2017 at 7:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

So we should defer this loading of blocks until recovery reaches a
consistent state, so that we can connect to a database. To allow the
worker to take a lock, we need to dump the relation OID as well. Is that
what you are envisioning to fix this problem?

No. The relation -> relfilenode mapping isn't constant, and the
dumping process has no way to get the relation OIDs from the
relfilenodes anyway. I think what you need to do is dump the
relfilenodes as at present, but then at restore time you need a worker
per database, and it connects to the database and then uses the
infrastructure added by f01d1ae3a104019d6d68aeff85c4816a275130b3 to
discover what relation OID, if any, currently corresponds to the
proposed relfilenode.
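
As a rough illustration of the restore-time lookup described above, here is a self-contained sketch that models the catalog as a small in-memory table. Everything in it (RelMapEntry, lookup_relid, block_is_loadable, the Oid stand-in) is hypothetical and only mimics the shape of the idea; in the backend the lookup would presumably go through the relfilenode-to-OID mapping from the commit mentioned above (RelidByRelfilenode(), as far as I recall), called from a worker connected to the right database.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/* Hypothetical view of one database's catalog: relation -> current storage. */
typedef struct RelMapEntry
{
	Oid			relid;			/* relation OID, the thing that can be locked */
	Oid			reltablespace;
	Oid			relfilenode;	/* changes on CLUSTER, VACUUM FULL, ... */
} RelMapEntry;

/*
 * Return the relation OID that currently owns (reltablespace, relfilenode),
 * or InvalidOid if no relation does.
 */
Oid
lookup_relid(const RelMapEntry *map, size_t nmap,
			 Oid reltablespace, Oid relfilenode)
{
	size_t		i;

	assert(map != NULL || nmap == 0);

	for (i = 0; i < nmap; i++)
	{
		if (map[i].reltablespace == reltablespace &&
			map[i].relfilenode == relfilenode)
			return map[i].relid;
	}
	return InvalidOid;
}

/*
 * Decide whether a dumped block may be prewarmed.  A real worker would lock
 * the returned relation (at least AccessShareLock) before reading anything.
 */
int
block_is_loadable(const RelMapEntry *map, size_t nmap,
				  Oid reltablespace, Oid relfilenode)
{
	return lookup_relid(map, nmap, reltablespace, relfilenode) != InvalidOid;
}
```

Dump entries whose relfilenode no longer maps to any relation, for example because the table was rewritten or dropped since the dump, are then skipped quietly instead of raising a read error.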

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#62

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#57)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Hi all, thanks for the feedback. Based on your recent comments I have
implemented a new patch, which is attached below.

On Sun, Feb 26, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

This is not easy to fix. The lock has to be taken based on the
relation OID, not the relfilenode, but we don't have the relation OID
in the dump file, and recording it there won't help, because the
relfilenode can change under us if the relation is rewritten with
CLUSTER or VACUUM FULL or relevant forms of ALTER TABLE. I don't see
a solution other than launching a separate worker for each database,
which seems like it could be extremely expensive if there are many
databases. Also, I am pretty sure it's no good to take locks before
recovery reaches a consistent state. I'm not sure off-hand whether
crash-recovery will notice conflicting locks, but even if it does,
blocking crash recovery in order to prewarm stuff is bad news.

On Mon, Feb 27, 2017 at 7:18 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

You don't have to start all these workers at once. Starting one and not
starting the next one until the first one is finished should be fine.
It will have the same serial behavior that the patch is proposing anyway.

I have implemented similar logic now. The prewarm bgworker will
launch one sub-worker per database in the dump file, and each
sub-worker will load the block info of its own database. Each
sub-worker is launched only after the previous one has finished. All
of this starts only once the database has reached a consistent state.
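
As a rough illustration of the one-at-a-time launch order described above, here is a minimal POSIX sketch that uses fork() and waitpid() in place of the bgworker API. It is only an analogy, not the patch's code: run_serially(), db_job and demo_job() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-database job; returns 0 on success. */
typedef int (*db_job) (unsigned int dboid);

/* Hypothetical job for the sketch: pretend database OID 0 cannot be loaded. */
int
demo_job(unsigned int dboid)
{
	return dboid == 0 ? 1 : 0;
}

/*
 * Run one child process per database OID, strictly one after another.
 * Returns the number of children that exited with status 0, or -1 if
 * fork() or waitpid() fails.
 */
int
run_serially(const unsigned int *dboids, int ndbs, db_job job)
{
	int			ok = 0;

	assert(job != NULL);

	for (int i = 0; i < ndbs; i++)
	{
		int			status;
		pid_t		pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			_exit(job(dboids[i]) == 0 ? 0 : 1);	/* child: work, then exit */

		/* parent: wait for this child before starting the next one */
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

Because the parent blocks in waitpid() before the next fork(), at most one worker exists at any time, which is the same serial behaviour as waiting for each sub-worker to shut down before registering the next.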

I have also introduced 2 more utility functions which were requested earlier.
A. launch_autoprewarm_dump() RETURNS int4
This is a SQL-callable function to launch the autoprewarm worker to
dump the buffer pool information at regular intervals. A server can
run only one autoprewarm worker, so if a worker sees another
existing worker it will exit immediately. The return value is the pid
of the worker which has been launched.

B. autoprewarm_dump_now() RETURNS int8
This is a SQL-callable function to dump the buffer pool information
immediately, once, from a backend. This can work in parallel with an
autoprewarm worker while it is dumping. The return value is the number
of block info records dumped.

I need some feedback on efficiently dividing the blocks among
sub-workers. The existing dump format will not be of much use for this.

I have chosen the following format.
Each entry of block info looks like this:
<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall
call it a BlockInfoRecord.

The contents of AUTOPREWARM_FILE have been formatted in such a way that
the BlockInfoRecords of each database can be given to different prewarm
workers.
Format of AUTOPREWARM_FILE
=======================================
[offset position of database map table]
[sorted BlockInfoRecords..............]
[database map table]
=======================================

The [database map table] is a sequence of file offsets, each of which
points to the first BlockInfoRecord of a database in the dump. The
prewarm worker reads these offsets one by one in sequence and asks
its sub-worker to seek to that position and then load the
BlockInfoRecords one by one until it sees a BlockInfoRecord of a
database other than the one it is connected to. NOTE: We store
off_t inside the file, so the dump file will not be portable
across systems where sizeof(off_t) differs.
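
To make the layout above concrete, here is a small standalone sketch that writes a file in this shape and reads the map table back. It is not the patch's code. The record and header formats are borrowed from the fscanf() calls in the patch. The names Rec, write_dump(), read_db_offsets(), db_at_offset() and SKETCH_MAX_DBS are hypothetical, and plain long with ftell()/fseek() stands in for off_t with fseeko() to keep the example in standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_DBS 64		/* hypothetical cap, for this sketch only */

typedef struct
{
	unsigned int database, spcnode, filenode, forknum, blocknum;
} Rec;

/*
 * Write recs[] (already sorted by database) as
 * [offset of map table][records][map table].
 * Returns the number of databases written, or -1 on error.
 */
int
write_dump(const char *path, const Rec *recs, size_t n)
{
	long		map[SKETCH_MAX_DBS];
	long		map_pos;
	int			ndb = 0;
	FILE	   *f;

	assert(recs != NULL || n == 0);
	f = fopen(path, "wb+");
	if (f == NULL)
		return -1;

	/* fixed-width placeholder header, overwritten once map_pos is known */
	fprintf(f, "%020jd\n", (intmax_t) 0);
	for (size_t i = 0; i < n; i++)
	{
		if (i == 0 || recs[i].database != recs[i - 1].database)
		{
			if (ndb == SKETCH_MAX_DBS)
			{
				fclose(f);
				return -1;
			}
			map[ndb++] = ftell(f);	/* first record of a new database */
		}
		fprintf(f, "%u,%u,%u,%u,%u\n", recs[i].database, recs[i].spcnode,
				recs[i].filenode, recs[i].forknum, recs[i].blocknum);
	}
	map_pos = ftell(f);
	for (int i = 0; i < ndb; i++)
		fprintf(f, "|%jd", (intmax_t) map[i]);
	rewind(f);
	fprintf(f, "%020jd\n", (intmax_t) map_pos);
	return fclose(f) == 0 ? ndb : -1;
}

/* Read the map table into offs[]; returns the entry count, or -1 on error. */
int
read_db_offsets(const char *path, long *offs, int max)
{
	intmax_t	v;
	int			ndb = 0;
	FILE	   *f = fopen(path, "rb");

	if (f == NULL)
		return -1;
	if (fscanf(f, "%jd", &v) != 1 || fseek(f, (long) v, SEEK_SET) != 0)
	{
		fclose(f);
		return -1;
	}
	while (ndb < max && fscanf(f, "|%jd", &v) == 1)
		offs[ndb++] = (long) v;
	fclose(f);
	return ndb;
}

/* Return the database id of the record starting at off, or 0 on error. */
unsigned int
db_at_offset(const char *path, long off)
{
	unsigned int db = 0;
	FILE	   *f = fopen(path, "rb");

	if (f == NULL)
		return 0;
	if (fseek(f, off, SEEK_SET) != 0 || fscanf(f, "%u,", &db) != 1)
		db = 0;
	fclose(f);
	return db;
}
```

A sub-worker in this scheme would be handed one entry of offs[], seek there, and keep reading records until the database id changes or the record format no longer matches, which is what happens when it runs into the map table at the end of the file.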

I also thought of having one dump file per database. The problems I
see with that approach are that there could be too many dump files
(including stale files of databases which are no longer in the buffer
pool). Also, a dump file could get lost due to concurrent dumps by the
bgworker and the autoprewarm_dump_now() SQL utility.
For example, while implementing this, dump file names were chosen as
the number sequence 0,1,2,3......number_of_db. (This helps to avoid
stale files being chosen before active database dump files.) If 2
processes dump concurrently, there is a race condition: if a page of
a new database is loaded into the buffer pool at that point, a recent
dump of some database's blocks could be overwritten, and it would go
missing from the dumps even though that database is still active in
the buffer pool. Any suggestions on how to fix this would be very
helpful.

On Sun, Feb 26, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Here are some other review comments (which may not matter unless we
can think up a solution to the problems above).

- I think auto_pg_prewarm.c is an unnecessarily long name. How about
autoprewarm.c?

Fixed; I am also trying to replace the term "auto pg_prewarm" with
"autoprewarm" in all relevant places.

- It looks like you ran pgindent over this without adding the new
typedefs to your typedefs.list, so things like the last line of each
typedef is formatted incorrectly.

Fixed.

- ReadBufferForPrewarm isn't a good name for this interface. You need
to give it a generic name (and header comment) that describes what it
does, rather than why you added it. Others might want to also use
this interface. Actually, an even better idea might be to adjust
ReadBufferWithoutRelcache() to serve your need here. That function's
header comment seems to contemplate that somebody might want to add a
relpersistence argument someday; perhaps that day has arrived.

-- I think it is not needed now, as we have a relation descriptor and
hence use ReadBufferExtended.

- have_free_buffer's comment shouldn't mention autoprewarm, but it
should mention that this is a lockless test, so the results might be
slightly stale. See similar comments in various other backend
functions for an example of how to write this.

-- Fixed

- next_task could be static, and with such a generic name, it really
MUST be static to avoid namespace conflicts.

-- Fixed. With the new code that variable is no longer needed.

- load_block() has a race condition. The relation could be dropped
after you check smgrexists() and before you access the relation.
Also, the fork number validity check looks useless; it should never
fail.

-- Now we hold a relation lock, so this should not be an issue. But the
extra check for forknum is required before calling smgrexists;
otherwise it crashes if an invalid forknum is given, hence the check
is necessary.

Crash call stack after manually setting one of the forknums to 10:

#0 0x00007f333bb27694 in vfprintf () from /lib64/libc.so.6
#1 0x00007f333bb53179 in vsnprintf () from /lib64/libc.so.6
#2 0x00000000009ec683 in pvsnprintf (buf=0x139e8e0 "global/1260_",
'\177' <repeats 115 times>, len=128, fmt=0xbebffe "global/%u_%s",
args=0x7ffeaea65750)
at psprintf.c:121
#3 0x00000000009ec5d1 in psprintf (fmt=0xbebffe "global/%u_%s") at
psprintf.c:64
#4 0x00000000009eca0f in GetRelationPath (dbNode=0, spcNode=1664,
relNode=1260, backendId=-1, forkNumber=10) at relpath.c:150
#5 0x000000000082cd95 in mdopen (reln=0x1359c78, forknum=10,
behavior=2) at md.c:583
#6 0x000000000082c58d in mdexists (reln=0x1359c78, forkNum=10) at md.c:284
#7 0x000000000082f4cf in smgrexists (reln=0x1359c78, forknum=10) at smgr.c:289
#8 0x00007f33353b0294 in load_one_database (main_arg=21) at autoprewarm.c:546
#9 0x0000000000795232 in StartBackgroundWorker () at bgworker.c:757
#10 0x00000000007a707d in do_start_bgworker (rw=0x12fc3d0) at postmaster.c:5570
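
The backtrace ends in the code that formats the relation path with a "%s" for the fork name, so presumably the out-of-range fork number indexes past a table of fork names. A minimal standalone sketch of the range check argued for above follows. The SKETCH_* constants and both function names are hypothetical and only mirror what I assume about PostgreSQL's ForkNumber enum (main, fsm, vm, init).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, chosen to mirror PostgreSQL's ForkNumber enum. */
enum
{
	SKETCH_INVALID_FORKNUM = -1,
	SKETCH_MAX_FORKNUM = 3		/* main = 0, fsm = 1, vm = 2, init = 3 */
};

static const char *const fork_names[] = {"main", "fsm", "vm", "init"};

/* True only if forknum can safely be used as an index into fork_names[]. */
bool
forknum_is_valid(int forknum)
{
	return forknum > SKETCH_INVALID_FORKNUM && forknum <= SKETCH_MAX_FORKNUM;
}

/* Returns the fork name, or NULL instead of reading past the array. */
const char *
fork_name_or_null(int forknum)
{
	/* sanity check: the table must cover every valid fork number */
	assert(sizeof(fork_names) / sizeof(fork_names[0]) ==
		   SKETCH_MAX_FORKNUM + 1);

	if (!forknum_is_valid(forknum))
		return NULL;
	return fork_names[forknum];
}
```

With this kind of check in front of the lookup, a corrupted record carrying forknum 10 is skipped rather than dereferenced.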

- I suggest renaming the file that stores the blocks to
autoprewarm.blocks or something like that. Calling it just
"autopgprewarm" doesn't seem very clear.

-- Fixed. The file name was "autopgprewarm.save"; I have changed it to
autoprewarm.blocks.

- I don't see any reason for the dump file to contain a header record
with an expected record count. When rereading the file, you can just
read until EOF; there's no real need to know the record count before
you start.

-- Fixed. Now removed.

- You should test for multiple flags like this: if ((buf_state &
(BM_VALID|BM_TAG_VALID|BM_PERMANENT)) != 0). However, I also think
the test is wrong. Even if the buffer isn't BM_VALID, that's not
really a reason not to include it in the dump file. Same with
BM_PERMANENT. I think the reason for the latter restriction is that
you're always calling load_block() with RELPERSISTENCE_PERMANENT, but
that's not a good idea either. If the relation were made unlogged
after you wrote the dump file, then on reload it you'd incorrectly set
BM_PERMANENT on the reloaded blocks.

-- Fixed. I have removed BM_PERMANENT and BM_VALID and now use only
BM_TAG_VALID, so if the hash table has this buffer we dump it.
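
For readers following the flag discussion, here is a small sketch of the two ways to test several bits at once, plus the single-flag rule the reply settles on. The SK_* bit values and function names are hypothetical; the real BM_* flags are defined in storage/buf_internals.h and have different values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define SK_VALID		(1U << 0)
#define SK_TAG_VALID	(1U << 1)
#define SK_PERMANENT	(1U << 2)

/* True if at least one bit of mask is set in state. */
bool
any_flag_set(uint32_t state, uint32_t mask)
{
	return (state & mask) != 0;
}

/* True only if every bit of mask is set in state. */
bool
all_flags_set(uint32_t state, uint32_t mask)
{
	assert(mask != 0);			/* an empty mask would be trivially true */
	return (state & mask) == mask;
}

/* The rule described in the reply: dump any buffer whose tag is valid. */
bool
should_dump(uint32_t state)
{
	return any_flag_set(state, SK_TAG_VALID);
}
```

Note that comparing against 0 answers "is any of these set", while comparing against the mask answers "are all of these set"; which one is wanted depends on the check being written.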

- elog() should not be used for user-facing messages, but rather only
for internal messages that we don't expect to get generated. Also,
the messages you've picked don't conform to the project's message
style guidelines.

-- Fixed.

- The error handling loop around load_block() suggests that you're
expecting some reads to fail, which I guess is because you could be
trying to read blocks from a relation that's been rewritten under a
different relfilenode, or partially or entirely truncated. But I
don't think it's very reasonable to just let ReadBufferWhatever() spew
error messages into the log and hope users don't panic. People will
expect an automatic prewarm solution to handle any such cases quietly,
not bleat loudly in the log. I suspect that this error-trapping code
isn't entirely correct, but there's not much point in fixing it; what
we really need to do is get rid of it (somehow).

[Need Relook] -- Debug and check what happens if a block load fails.

- dump_block_info_periodically() will sleep for too long - perhaps
forever - if WaitLatch() repeatedly returns due to WL_LATCH_SET, which
can probably happen if for any reason the process receives SIGUSR1
repeatedly. Every time the latch gets set, the timeout is reset, so
it may never expire. There are examples of how to write a loop like
this correctly in various places in the server; please check one of
those.

-- Fixed. We now count the elapsed time since the previous dump and
dump once the interval has passed.
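
The idea behind this fix can be sketched independently of WaitLatch(): compute the remaining wait from the time of the last dump instead of restarting the full timeout on every wakeup. The function names below are hypothetical and times are plain millisecond counters, so this is a simplified model, not the patch's loop.

```c
#include <assert.h>

/*
 * Milliseconds left until the next dump is due, given the time of the last
 * dump, the current time and the dump interval (all in milliseconds).
 * Returns 0 when the dump is due or overdue.
 */
long
ms_until_next_dump(long last_dump_ms, long now_ms, long interval_ms)
{
	long		elapsed;

	assert(interval_ms >= 0);

	elapsed = now_ms - last_dump_ms;
	if (elapsed < 0)
		elapsed = 0;			/* clock went backwards: treat as just dumped */
	if (elapsed >= interval_ms)
		return 0;
	return interval_ms - elapsed;
}

/*
 * Simulate a worker that is woken at the given times, for example by
 * repeated latch sets. Returns the index of the first wakeup at which a
 * dump is due, or -1 if none of the wakeups is late enough.
 */
int
first_due_wakeup(const long *wake_ms, int n, long last_dump_ms,
				 long interval_ms)
{
	for (int i = 0; i < n; i++)
	{
		if (ms_until_next_dump(last_dump_ms, wake_ms[i], interval_ms) == 0)
			return i;
	}
	return -1;
}
```

Since the remaining time is always derived from the last dump, frequent early wakeups shorten the next sleep instead of postponing the dump.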

- I don't think you should need an error-catching loop in
auto_pgprewarm_main(), either. Just let the worker die if there's an
ERROR, and set the restart interval to something other than
BGW_NEVER_RESTART.

-- Fixed; we restart it now.

- Setting bgw_main isn't correct for extension code. Please read the
documentation on background workers, which explains how to do this
correctly in an extension.

-- Fixed.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_autoprewarm_06.patch (application/octet-stream)

commit a94095576809948e61b1778d4fb99010b28c51bb
Author: mithun <mithun@localhost.localdomain>
Date:   Mon Mar 13 18:32:15 2017 +0530

    Work :: AutoPrewarm
    Author :: Mithun

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..f4b34ca
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1137 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *
+ * -- Automatically prewarm the shared buffer pool when server restarts.
+ *
+ *	Copyright (c) 2013-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/guc.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "executor/spi.h"
+#include "access/xact.h"
+#include "utils/rel.h"
+#include "port/atomics.h"
+
+/*
+ * autoprewarm :
+ *
+ * What is it?
+ * ===========
+ * A bgworker which automatically records information about blocks which were
+ * present in the buffer pool before server shutdown and then prewarms the
+ * buffer pool upon server restart with those blocks.
+ *
+ * How does it work?
+ * =================
+ * When the shared library "pg_prewarm" is preloaded, a
+ * bgworker "autoprewarm" is launched immediately after the server has reached
+ * a consistent state. The bgworker will load the blocks recorded in the
+ * format BlockInfoRecord
+ * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * $PGDATA/AUTOPREWARM_FILE, as long as there is a free buffer left in the
+ * buffer pool. This way we do not replace any new blocks which were loaded
+ * either by the recovery process or by querying clients.
+ *
+ * Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ * start a new task to periodically dump the BlockInfoRecords related to blocks
+ * which are currently in shared buffer pool. Upon next server restart, the
+ * bgworker will prewarm the buffer pool by loading those blocks. The GUC
+ * pg_prewarm.dump_interval will control the dumping activity of the bgworker.
+ */
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * ============================================================================
+ * ===========================	 SIGNAL HANDLERS	===========================
+ * ============================================================================
+ */
+
+static void sigtermHandler(SIGNAL_ARGS);
+static void sighupHandler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake it
+ *	up.
+ */
+static void
+sigtermHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the process to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+sighupHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ */
+static void
+sigusr1Handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	types and variables used by autoprewarm	  =============
+ * ============================================================================
+ */
+
+/*
+ * Meta-data of each persistent block which is dumped and used to load.
+ */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcNode;		/* tablespace */
+	Oid			filenode;		/* relation's filenode. */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+} BlockInfoRecord;
+
+/*
+ * Tasks performed by autoprewarm workers.
+ */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO,	/* dump the buffer pool block info. */
+	TASK_DUMP_IMMEDIATE_ONCE,	/* dump the buffer pool block info immediately
+								 * once. */
+	TASK_END					/* no more tasks to do. */
+} AutoPrewarmTask;
+
+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+	pg_atomic_uint32 current_task;		/* current tasks performed by
+										 * autoprewarm workers. */
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/*
+ * Kind of BlockInfoRecord in AUTOPREWARM_FILE file.
+ */
+typedef enum
+{
+	BLKTYPE_NEW_DATABASE,		/* first BlockInfoRecord of new database. */
+	BLKTYPE_NEW_RELATION,		/* first BlockInfoRecord of new relation. */
+	BLKTYPE_NEW_FORK,			/* first BlockInfoRecord of new fork file. */
+	BLKTYPE_NEW_BLOCK,			/* any next BlockInfoRecord. */
+	BLKTYPE_END					/* No More BlockInfoRecords available in dump
+								 * file. */
+} BlkType;
+
+/* GUC variable which controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable which says to which database we have to connect when
+ * BlockInfoRecord belongs to global objects.
+ */
+static char *default_database;
+
+/* compare one member; return from the caller if the two values differ. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * sort_cmp_func - compare function used for qsort().
+ */
+static int
+sort_cmp_func(const void *p, const void *q)
+{
+	const BlockInfoRecord *a = (const BlockInfoRecord *) p;
+	const BlockInfoRecord *b = (const BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * get_autoprewarm_task - get the next task that the autoprewarm worker is
+ * allowed to perform.
+ *
+ * It works like this: if we are the first to allocate the shared memory, we
+ * can do whatever task we wanted to do. If TASK_PREWARM_BUFFERPOOL is running,
+ * nothing else can run in parallel. If TASK_DUMP_BUFFERPOOL_INFO is running,
+ * then only TASK_DUMP_IMMEDIATE_ONCE can go ahead.
+ */
+static AutoPrewarmTask
+get_autoprewarm_task(AutoPrewarmTask todo_task)
+{
+	bool		found;
+
+	state = NULL;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+		pg_atomic_write_u32(&(state->current_task), todo_task);
+
+	LWLockRelease(AddinShmemInitLock);
+
+	/* If found check if we can go ahead. */
+	if (found)
+	{
+		if (pg_atomic_read_u32(&(state->current_task)) ==
+			TASK_PREWARM_BUFFERPOOL)
+		{
+			if (todo_task == TASK_PREWARM_BUFFERPOOL)
+			{
+				/*
+				 * we were prewarming and we are back to do same, time to
+				 * abort prewarming and move to dumping.
+				 */
+				pg_atomic_write_u32(&(state->current_task),
+									TASK_DUMP_BUFFERPOOL_INFO);
+				return TASK_DUMP_BUFFERPOOL_INFO;
+			}
+			else
+				return TASK_END;	/* rest all cannot proceed further. */
+		}
+		else if (pg_atomic_read_u32(&(state->current_task)) ==
+				 TASK_DUMP_BUFFERPOOL_INFO)
+		{
+			/*
+			 * only thing that can be done now is TASK_DUMP_IMMEDIATE_ONCE.
+			 */
+			if (todo_task == TASK_DUMP_IMMEDIATE_ONCE)
+				return TASK_DUMP_IMMEDIATE_ONCE;
+			else
+				return TASK_END;
+		}
+		else if (pg_atomic_read_u32(&(state->current_task)) ==
+				 TASK_DUMP_IMMEDIATE_ONCE)
+		{
+			uint32		current_state = TASK_DUMP_IMMEDIATE_ONCE;
+
+			/* We cannot do a TASK_PREWARM_BUFFERPOOL but rest can go ahead */
+			if (todo_task == TASK_DUMP_IMMEDIATE_ONCE)
+				return TASK_DUMP_IMMEDIATE_ONCE;
+
+			if (todo_task == TASK_PREWARM_BUFFERPOOL)
+				todo_task = TASK_DUMP_BUFFERPOOL_INFO;	/* skip to do dump only */
+
+			/*
+			 * the first process to atomically set current_task gets the
+			 * opportunity to proceed further
+			 */
+			if (pg_atomic_compare_exchange_u32(&(state->current_task),
+											   &current_state,
+											   TASK_DUMP_BUFFERPOOL_INFO))
+			{
+				/* Wow! We won the race proceed with the task. */
+				return TASK_DUMP_BUFFERPOOL_INFO;
+			}
+			else
+				return TASK_END;
+		}
+
+		return TASK_END;
+	}
+
+	return todo_task;			/* we were first we can do what we wanted. */
+}
+
+/*
+ * getnextblockinfo -- given a BlkType get its next BlockInfoRecord from the
+ *					   dump file.
+ */
+static BlkType
+getnextblockinfo(FILE *file, BlockInfoRecord *currblkinfo, BlkType reqblock,
+				 BlockInfoRecord *newblkinfo)
+{
+	BlkType		nextblk;
+
+	while (true)
+	{
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &(newblkinfo->database),
+						&(newblkinfo->spcNode), &(newblkinfo->filenode),
+						(uint32 *) &(newblkinfo->forknum),
+						&(newblkinfo->blocknum)))
+			return BLKTYPE_END; /* No more valid entry hence stop processing. */
+
+		if (!currblkinfo || newblkinfo->database != currblkinfo->database)
+			nextblk = BLKTYPE_NEW_DATABASE;
+		else if (newblkinfo->filenode != currblkinfo->filenode)
+			nextblk = BLKTYPE_NEW_RELATION;
+		else if (newblkinfo->forknum != currblkinfo->forknum)
+			nextblk = BLKTYPE_NEW_FORK;
+		else
+			nextblk = BLKTYPE_NEW_BLOCK;
+
+		if (nextblk <= reqblock)
+			return nextblk;
+	}
+}
+
+/*
+ * get_reloid -- given a filenode, get its relation OID.
+ */
+static Oid
+get_reloid(Oid filenode)
+{
+	int			ret;
+	Oid			relationid;
+	bool		isnull;
+	Datum		value[1] = {ObjectIdGetDatum(filenode)};
+	StringInfoData buf;
+	Oid			ptype[1] = {OIDOID};
+
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+			"select oid from pg_class where pg_relation_filenode(oid) = $1");
+
+	ret = SPI_execute_with_args(buf.data, 1, ptype, value,
+								NULL, true, 1);
+
+	if (ret != SPI_OK_SELECT)
+		ereport(FATAL, (errmsg("SPI_execute failed: error code %d", ret)));
+
+	if (SPI_processed < 1)
+		return InvalidOid;
+
+	relationid = DatumGetObjectId(SPI_getbinval(SPI_tuptable->vals[0],
+												SPI_tuptable->tupdesc,
+												1, &isnull));
+	if (isnull)
+		return InvalidOid;
+
+	return relationid;
+}
+
+/*
+ * connect_to_db -- connect to the given dbid.
+ *
+ * For global objects the dbid will be InvalidOid, connect to user given
+ * default_database and try to load those blocks.
+ */
+static void
+connect_to_db(Oid dbid)
+{
+	if (!OidIsValid(dbid))
+		BackgroundWorkerInitializeConnection(default_database, NULL);
+	else
+		BackgroundWorkerInitializeConnectionByOid(dbid, InvalidOid);
+	SetCurrentStatementStartTimestamp();
+	StartTransactionCommand();
+	SPI_connect();
+	PushActiveSnapshot(GetTransactionSnapshot());
+}
+
+/*
+ * load_one_database -- start of prewarm sub-worker, this will try to load
+ * blocks of one database starting from block info position passed by main
+ * prewarm worker.
+ */
+void
+load_one_database(Datum main_arg)
+{
+	off_t		blockinfo_pos;
+	char		dump_file_path[MAXPGPATH];
+	FILE	   *file = NULL;
+	BlockInfoRecord prevblock,
+				toload_block;
+	Relation	rel = NULL;
+	bool		have_dbconnection = false;
+	BlkType		loadblocktype;
+	BlockNumber nblocks = 0;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+	pqsignal(SIGHUP, sighupHandler);
+	pqsignal(SIGUSR1, sigusr1Handler);
+
+	/*
+	 * We're now ready to receive signals
+	 */
+	BackgroundWorkerUnblockSignals();
+
+	blockinfo_pos = DatumGetInt64(main_arg);
+
+	/*
+	 * Seek to blockinfo_pos and get the ID of the database to which the
+	 * following block infos belong. Connect to that database and load the
+	 * blocks that follow until we reach the end of the block infos belonging
+	 * to the connected database.
+	 */
+
+	/* check if file exists and open file in read mode. */
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s", AUTOPREWARM_FILE);
+	file = fopen(dump_file_path, PG_BINARY_R);
+	if (!file)
+		return;					/* No file to load. */
+
+	if (fseeko(file, blockinfo_pos, SEEK_SET))
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error reading from \"%s\" : %m",
+						dump_file_path)));
+	}
+
+	loadblocktype = BLKTYPE_NEW_BLOCK;
+	loadblocktype = getnextblockinfo(file, NULL, loadblocktype, &toload_block);
+
+	/*
+	 * It should be a block info belonging to a new database. Otherwise the
+	 * dump file is corrupted, and it is better to stop loading blocks now.
+	 */
+	if (loadblocktype != BLKTYPE_NEW_DATABASE)
+		goto end_load;			/* should we raise a voice here? */
+
+	while (loadblocktype != BLKTYPE_END)
+	{
+		Buffer		buf;
+		Oid			reloid;
+
+		/*
+		 * Load the block only if there exists a free buffer. We do not want to
+		 * replace a block already in the buffer pool.
+		 */
+		if (!have_free_buffer())
+			goto end_load;
+
+		if (got_sigterm)
+			goto end_load;
+
+		switch (loadblocktype)
+		{
+			case BLKTYPE_NEW_DATABASE:
+
+				if (have_dbconnection)
+					goto end_load;		/* blocks belong to a new database,
+										 * lets end the loading process. */
+				loadblocktype = BLKTYPE_NEW_DATABASE;
+
+				/*
+				 * connect to the database.
+				 */
+				connect_to_db(toload_block.database);
+				have_dbconnection = true;
+
+			case BLKTYPE_NEW_RELATION:
+
+				/*
+				 * release lock on previous relation.
+				 */
+				if (rel)
+				{
+					relation_close(rel, AccessShareLock);
+					rel = NULL;
+				}
+
+				loadblocktype = BLKTYPE_NEW_RELATION;
+
+				/*
+				 * lock new relation.
+				 */
+				reloid = get_reloid(toload_block.filenode);
+
+				if (!OidIsValid(reloid))
+					break;
+
+				rel = try_relation_open(reloid, AccessShareLock);
+				if (!rel)
+					break;
+				RelationOpenSmgr(rel);
+
+			case BLKTYPE_NEW_FORK:
+
+				/*
+				 * check if fork exists and if block is within the range
+				 */
+				loadblocktype = BLKTYPE_NEW_FORK;
+				if (toload_block.forknum <= InvalidForkNumber ||
+					toload_block.forknum > MAX_FORKNUM ||
+					!smgrexists(rel->rd_smgr, toload_block.forknum))
+					break;
+				nblocks = RelationGetNumberOfBlocksInFork(rel,
+													   toload_block.forknum);
+			case BLKTYPE_NEW_BLOCK:
+
+				/* check if blocknum is valid and within fork file size. */
+				if (toload_block.blocknum >= nblocks)
+				{
+					/* move to next forknum. */
+					loadblocktype = BLKTYPE_NEW_FORK;
+					break;
+				}
+
+				buf = ReadBufferExtended(rel, toload_block.forknum,
+										 toload_block.blocknum, RBM_NORMAL,
+										 NULL);
+				if (BufferIsValid(buf))
+				{
+					ReleaseBuffer(buf);
+				}
+
+				loadblocktype = BLKTYPE_NEW_BLOCK;
+				break;
+
+			case BLKTYPE_END:
+				Assert(0);		/* Should not be here! */
+		}
+
+		memcpy(&prevblock, &toload_block, sizeof(BlockInfoRecord));
+		memset(&toload_block, 0, sizeof(BlockInfoRecord));
+		loadblocktype = getnextblockinfo(file, &prevblock, loadblocktype,
+										 &toload_block);
+	}
+
+end_load:
+
+	fclose(file);
+	/* release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		rel = NULL;
+	}
+
+	if (have_dbconnection)
+	{
+		SPI_finish();
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+	return;
+}
+
+/*
+ * launch_prewarm_subworker -- register a dynamic worker to load the blocks
+ * starting from next_db_pos. We wait until the worker has stopped.
+ */
+static void
+launch_prewarm_subworker(off_t next_db_pos)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  Int64GetDatum(next_db_pos), BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	if (status == BGWH_STOPPED)
+		return;
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(0);
+}
+
+/*
+ *	prewarm_buffer_pool - the main routine which prewarms the buffer pool.
+ *
+ *	The prewarm bgworker will first seek to the database map table in
+ *	$PGDATA/AUTOPREWARM_FILE. For each offset in the map table it launches a
+ *	sub-worker to load the block info from that offset position until the end
+ *	of that database's block info. Sub-workers are launched sequentially,
+ *	each one only after the previous sub-worker has finished its job.
+ *	We keep loading the blocks read from $PGDATA/AUTOPREWARM_FILE as long as
+ *	a free buffer is left and SIGTERM has not been received.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	char		dump_file_path[MAXPGPATH];
+	FILE	   *file = NULL;
+	off_t		database_map_table,
+				next_db_pos;
+
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s",
+			 AUTOPREWARM_FILE);
+
+	file = fopen(dump_file_path, PG_BINARY_R);
+	if (!file)
+		return;					/* No file to load. */
+
+	/* seek to start of database_map_table. */
+	if (1 != fscanf(file, "%020jd\n", (intmax_t *) & database_map_table))
+	{
+		fclose(file);			/* do not leak the file on a bad header */
+		return;
+	}
+	if (fseeko(file, database_map_table, SEEK_SET))
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error reading from \"%s\" : %m",
+						dump_file_path)));
+	}
+
+	/* get next database's first block info's position. */
+	while (!got_sigterm &&
+		   1 == fscanf(file, "|%jd", (intmax_t *) & next_db_pos))
+	{
+		/*
+		 * Register a sub-worker to load new database's block. Wait until the
+		 * sub-worker finish its job before launching next subworker.
+		 */
+		launch_prewarm_subworker(next_db_pos);
+	}
+
+	ereport(LOG, (errmsg("autoprewarm load task ended")));
+
+	fclose(file);
+
+	return;
+}
+
+/* ============================================================================
+ * =============	buffer pool info dump part of autoprewarm	===============
+ * ============================================================================
+ */
+
+/* This sub-module is for periodically dumping buffer pool's block info into
+ * a dump file AUTOPREWARM_FILE.
+ * Each entry of block info looks like this:
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it
+ * a BlockInfoRecord.
+ *
+ * The contents of AUTOPREWARM_FILE have been formatted in such a way that the
+ * BlockInfoRecords of each database can be given to different prewarm workers.
+ *
+ *	Format of AUTOPREWARM_FILE
+ *	=======================================
+ *	[offset position of database map table]
+ *	[sorted BlockInfoRecords..............]
+ *	[database map table]
+ *	=======================================
+ *
+ *	The [database map table] is a sequence of file offsets, each pointing to
+ *	the first BlockInfoRecord of one database in the dump. The prewarm worker
+ *	reads these offsets one by one in sequence and asks its subworker to seek
+ *	to that position and then load the BlockInfoRecords one by one until it
+ *	sees a BlockInfoRecord of a database other than the one it is connected
+ *	to.
+ *	NOTE : We store off_t inside the file, so the dump file is not portable
+ *	across systems where sizeof(off_t) differs.
+ */
+
+/*
+ *	dump_now - the main routine, which goes through each buffer header of the
+ *	buffer pool and collects its metadata. We sort this data and then dump it.
+ *	Sorting is necessary as it facilitates sequential reads during load.
+ */
+static uint32
+dump_now(void)
+{
+	static char dump_file_path[MAXPGPATH],
+				transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file = NULL;
+	off_t	   *database_map_table,
+				database_map_table_pos;
+	size_t		database_map_table_size;
+	uint32		num_db = 0;
+	Oid			prev_database;
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+	database_map_table_size = 64;
+	database_map_table =
+		(off_t *) palloc(sizeof(off_t) * database_map_table_size);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].spcNode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/* sorting now only to avoid sorting while loading. */
+	pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+			 sort_cmp_func);
+
+	snprintf(transient_dump_file_path, sizeof(transient_dump_file_path),
+			 "%s.%d", AUTOPREWARM_FILE, MyProcPid);
+	file = fopen(transient_dump_file_path, "w");
+	if (file == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s", AUTOPREWARM_FILE);
+	ret = fprintf(file, "%020jd\n", (intmax_t) 0);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error writing to \"%s\" : %m",
+						dump_file_path)));
+	}
+
+	database_map_table[num_db++] = ftello(file);
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		if (i > 0 && block_info_array[i].database != prev_database)
+		{
+			if (num_db == database_map_table_size)
+			{
+				database_map_table_size *= 2;	/* double and repalloc. */
+				database_map_table =
+					(off_t *) repalloc(database_map_table,
+									sizeof(off_t) * database_map_table_size);
+			}
+			fflush(file);
+			database_map_table[num_db++] = ftello(file);
+		}
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].spcNode,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("autoprewarm: error writing to \"%s\" : %m",
+							dump_file_path)));
+		}
+
+		prev_database = block_info_array[i].database;
+	}
+
+	pfree(block_info_array);
+	database_map_table_pos = ftello(file);
+
+	for (i = 0; i < num_db; i++)
+	{
+		ret = fprintf(file, "|%jd", (intmax_t) database_map_table[i]);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("autoprewarm: error writing to \"%s\" : %m",
+							dump_file_path)));
+		}
+	}
+
+	rewind(file);
+	ret = fprintf(file, "%020jd\n", (intmax_t) database_map_table_pos);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error writing to \"%s\" : %m",
+						dump_file_path)));
+	}
+
+	/*
+	 * rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = fclose(file);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error closing \"%s\" : %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, dump_file_path, LOG);
+
+	ereport(LOG, (errmsg("autoprewarm: saved metadata info of %u blocks",
+						 num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+void
+dump_block_info_periodically()
+{
+	pg_time_t	last_dump_time = (pg_time_t) time(NULL);
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		pg_time_t	now;
+		int			elapsed_secs = 0,
+					timeout = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			now = (pg_time_t) time(NULL);
+			elapsed_secs = now - last_dump_time;
+
+			if (elapsed_secs > dump_interval)
+			{
+				dump_now();
+				if (got_sigterm)
+					return;		/* got shutdown signal just after a dump. And,
+								 * I think better to return now. */
+				last_dump_time = (pg_time_t) time(NULL);
+				elapsed_secs = 0;
+			}
+
+			timeout = dump_interval - elapsed_secs;
+		}
+
+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   timeout * 1000, PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* One last block meta info dump while postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now();
+}
+
+/*
+ * autoprewarm_main -- the main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask next_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, sigtermHandler);
+	pqsignal(SIGHUP, sighupHandler);
+	pqsignal(SIGUSR1, sigusr1Handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	next_task = get_autoprewarm_task(DatumGetInt32(main_arg));
+
+	ereport(LOG, (errmsg("autoprewarm has started")));
+
+	/*
+	 * **** perform autoprewarm's next task	****
+	 */
+	if (next_task == TASK_PREWARM_BUFFERPOOL)
+	{
+		prewarm_buffer_pool();
+		/* prewarm is done; move on to TASK_DUMP_BUFFERPOOL_INFO. */
+		pg_atomic_write_u32(&(state->current_task),
+							TASK_DUMP_BUFFERPOOL_INFO);
+		next_task = TASK_DUMP_BUFFERPOOL_INFO;
+	}
+
+	if (next_task == TASK_DUMP_BUFFERPOOL_INFO)
+	{
+		dump_block_info_periodically();
+
+		/*
+		 * downgrade to TASK_DUMP_IMMEDIATE_ONCE so others can start
+		 * TASK_DUMP_BUFFERPOOL_INFO
+		 */
+		pg_atomic_write_u32(&(state->current_task), TASK_DUMP_IMMEDIATE_ONCE);
+	}
+
+	ereport(LOG, (errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/* Register autoprewarm load bgworker. */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	autoprewarm->bgw_main = NULL;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm;
+
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+							" If set to -1, stops the running autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	DefineCustomStringVariable("pg_prewarm.default_database",
+				"default database to connect if dump has not recorded same.",
+							   NULL,
+							   &default_database,
+							   "postgres",
+							   PGC_POSTMASTER,
+							   0,
+							   NULL,
+							   NULL,
+							   NULL);
+	/* Request additional shared resources */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* Has been set not to prewarm/dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&autoprewarm, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&autoprewarm);
+}
+
+/*
+ * Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* Has been set not to prewarm/dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	AutoPrewarmTask next_task;
+
+	/* dump only if prewarm is not in progress. */
+	next_task = get_autoprewarm_task(TASK_DUMP_IMMEDIATE_ONCE);
+	if (next_task == TASK_DUMP_IMMEDIATE_ONCE)
+		PG_RETURN_INT64(dump_now());
+	PG_RETURN_NULL();
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..1538446 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,102 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This is a SQL-callable function to launch the <literal>autoprewarm</literal>
+   worker to dump the buffer pool information at regular intervals. Only one
+   <literal>autoprewarm</literal> worker can run in a server, so if a worker
+   sees another existing worker it will exit immediately. The return value is
+   the pid of the worker which has been launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This is a SQL-callable function to dump buffer pool information immediately,
+   once, from the calling backend. This can work in parallel
+   with the <literal>autoprewarm</literal> worker while it is dumping.
+   The return value is the number of blocks whose information was dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in buffer pool before server shutdown and then prewarms the buffer
+  pool upon server restart with those blocks.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until there is no
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in shared buffer pool. Upon next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. Two values are special: if set to 0, timer-based
+      dumping is disabled and a dump happens only while the server shuts down;
+      if set to -1, the running <literal>autoprewarm</literal> will be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.default_database</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.default_database</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The blocks of
+      global objects will not have a database associated with them. The
+      <literal>default_database</literal> will be used to connect and preload
+      such blocks.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take the free
+ * buffers, so a caller that strictly needs a free buffer should not rely on
+ * this.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index d117b66..58d4871 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f876ae..4e7ea86 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -133,6 +133,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AuxProcType
@@ -206,10 +208,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
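
For illustration only (this is not part of the patch): the dump file layout described in the autoprewarm.c comment above can be sketched as standalone C. The demo_* names and DemoBlockInfo are hypothetical; ftell()/fseek() with long stand in for the patch's ftello()/fseeko() with off_t, and a fixed DEMO_MAX_DB limit replaces the patch's repalloc growth.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_MAX_DB 8			/* hypothetical cap, the patch grows its array */

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct DemoBlockInfo
{
	unsigned int database;
	unsigned int spcnode;
	unsigned int filenode;
	unsigned int forknum;
	unsigned int blocknum;
} DemoBlockInfo;

/*
 * Write records (already sorted by database) as: fixed-width header,
 * one text line per record, then the "|offset" database map table.
 * Returns the number of databases seen, or -1 on error.
 */
static int
demo_write_dump(FILE *f, const DemoBlockInfo *rec, int nrec)
{
	long		map[DEMO_MAX_DB];
	long		map_pos;
	int			ndb = 0;
	int			i;

	assert(f != NULL);

	/* Placeholder header, overwritten once the map position is known. */
	if (fprintf(f, "%020ld\n", 0L) < 0)
		return -1;

	for (i = 0; i < nrec; i++)
	{
		/* A change of database starts a new map table entry. */
		if (i == 0 || rec[i].database != rec[i - 1].database)
		{
			if (ndb == DEMO_MAX_DB)
				return -1;
			map[ndb++] = ftell(f);
		}
		if (fprintf(f, "%u,%u,%u,%u,%u\n", rec[i].database, rec[i].spcnode,
					rec[i].filenode, rec[i].forknum, rec[i].blocknum) < 0)
			return -1;
	}

	map_pos = ftell(f);
	for (i = 0; i < ndb; i++)
		if (fprintf(f, "|%ld", map[i]) < 0)
			return -1;

	/* Same width as the placeholder, so the records are not disturbed. */
	rewind(f);
	if (fprintf(f, "%020ld\n", map_pos) < 0)
		return -1;
	return ndb;
}

/*
 * Read the map table back. Returns the number of per-database offsets
 * stored into offsets[], or -1 on error.
 */
static int
demo_read_map(FILE *f, long *offsets, int max)
{
	long		map_pos;
	int			n = 0;

	assert(f != NULL);
	rewind(f);
	if (fscanf(f, "%20ld", &map_pos) != 1 ||
		fseek(f, map_pos, SEEK_SET) != 0)
		return -1;
	while (n < max && fscanf(f, "|%ld", &offsets[n]) == 1)
		n++;
	return n;
}
```

Given one offset from the map table, a sub-worker would seek to it and read records until the database id changes.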

#63

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Mithun Cy (#62)

Re: Proposal : For Auto-Prewarm.

- The error handling loop around load_block() suggests that you're
expecting some reads to fail, which I guess is because you could be
trying to read blocks from a relation that's been rewritten under a
different relfilenode, or partially or entirely truncated. But I
don't think it's very reasonable to just let ReadBufferWhatever() spew
error messages into the log and hope users don't panic. People will
expect an automatic prewarm solution to handle any such cases quietly,
not bleat loudly in the log. I suspect that this error-trapping code
isn't entirely correct, but there's not much point in fixing it; what
we really need to do is get rid of it (somehow).

[Need Relook] -- Debug and check what happens if a block load fails.

Oops, sorry, this was a note for my own reference. It is fixed now.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#64

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: Mithun Cy (#62)

Re: Proposal : For Auto-Prewarm.

On 3/13/17 09:15, Mithun Cy wrote:

A. launch_autoprewarm_dump() RETURNS int4
This is a SQL callable function to launch the autoprewarm worker to
dump the buffer pool information at regular interval. In a server, we
can only run one autoprewarm worker so if a worker sees another
existing worker it will exit immediately. The return value is pid of
the worker which has been launched.

Why do you need that?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#65

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Peter Eisentraut (#64)

Re: Proposal : For Auto-Prewarm.

On Sat, Mar 25, 2017 at 11:46 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 3/13/17 09:15, Mithun Cy wrote:

A. launch_autoprewarm_dump() RETURNS int4
This is a SQL callable function to launch the autoprewarm worker to
dump the buffer pool information at regular interval. In a server, we
can only run one autoprewarm worker so if a worker sees another
existing worker it will exit immediately. The return value is pid of
the worker which has been launched.

Why do you need that?

To launch an autoprewarm worker we have to preload the library, which
needs a server restart. If we want to start periodic dumping on an
already running server so that it can automatically prewarm on its
next restart, this API can be used to launch the autoprewarm worker.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


#66

Noname

andres@anarazel.de

almost 9 years ago

In reply to: Mithun Cy (#62)

Re: Proposal : For Auto-Prewarm.

On 2017-03-13 18:45:00 +0530, Mithun Cy wrote:

I have implemented a similar logic now. The prewarm bgworker will
launch a sub-worker per database in the dump file. And, each
sub-worker will load its database block info. The sub-workers will be
launched only after previous one is finished. All of this will only
start if the database has reached a consistent state.

Hm. For replay performance it'd possibly be good to start earlier,
before reaching consistency. Is there an issue starting earlier?

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..f4b34ca
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1137 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *
+ * -- Automatically prewarm the shared buffer pool when server restarts.

Don't think we usually use -- here.

+ * Copyright (c) 2013-2017, PostgreSQL Global Development Group

Hm, that's a bit of a weird date range.

+ *	IDENTIFICATION
+ *		contrib/pg_prewarm.c/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */

The pg_prewarm.c in there looks like some search & replace gone awry.

+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/guc.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "executor/spi.h"
+#include "access/xact.h"
+#include "utils/rel.h"
+#include "port/atomics.h"

I'd rather just sort these alphabetically.

I think this should rather be in the initial header.

+/*
+ * autoprewarm :
+ *
+ * What is it?
+ * ===========
+ * A bgworker which automatically records information about blocks which were
+ * present in buffer pool before server shutdown and then prewarm the buffer
+ * pool upon server restart with those blocks.
+ *
+ * How does it work?
+ * =================
+ * When the shared library "pg_prewarm" is preloaded, a
+ * bgworker "autoprewarm" is launched immediately after the server has reached
+ * consistent state. The bgworker will start loading blocks recorded in the
+ * format BlockInfoRecord
+ * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * $PGDATA/AUTOPREWARM_FILE, until there is a free buffer left in the buffer
+ * pool. This way we do not replace any new blocks which were loaded either by
+ * the recovery process or the querying clients.

s/until there is a/until there is no/?

+/*
+ * ============================================================================
+ * ===========================	 SIGNAL HANDLERS	===========================
+ * ============================================================================
+ */

Hm...

+static void sigtermHandler(SIGNAL_ARGS);
+static void sighupHandler(SIGNAL_ARGS);

I don't think that's a casing we commonly use. We mostly use CamelCase
or underscore_case.

+/*
+ *	Signal handler for SIGUSR1.
+ */
+static void
+sigusr1Handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}

Hm, what's this one for?

+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+	pg_atomic_uint32 current_task;		/* current tasks performed by
+										 * autoprewarm workers. */
+} AutoPrewarmSharedState;

Hm. Why do we need atomics here? I thought there's no concurrency?

+/*
+ * sort_cmp_func - compare function used for qsort().
+ */
+static int
+sort_cmp_func(const void *p, const void *q)
+{

rename to blockinfo_cmp?
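
A sketch of what such a comparator could look like, assuming the sort key follows the BlockInfoRecord field order (database, tablespace, filenode, fork, block). The body of the patch's function is not quoted here, so the ordering and all demo_* names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct DemoBlockInfo
{
	unsigned int database;
	unsigned int spcnode;
	unsigned int filenode;
	unsigned int forknum;
	unsigned int blocknum;
} DemoBlockInfo;

/* Three-way compare of two unsigned values, avoiding subtraction wraparound. */
static int
demo_cmp_u(unsigned int a, unsigned int b)
{
	return (a > b) - (a < b);
}

/* qsort() comparator: database, then tablespace, filenode, fork, block. */
static int
demo_blockinfo_cmp(const void *p, const void *q)
{
	const DemoBlockInfo *a = p;
	const DemoBlockInfo *b = q;
	int			c;

	assert(a != NULL && b != NULL);
	if ((c = demo_cmp_u(a->database, b->database)) != 0)
		return c;
	if ((c = demo_cmp_u(a->spcnode, b->spcnode)) != 0)
		return c;
	if ((c = demo_cmp_u(a->filenode, b->filenode)) != 0)
		return c;
	if ((c = demo_cmp_u(a->forknum, b->forknum)) != 0)
		return c;
	return demo_cmp_u(a->blocknum, b->blocknum);
}

/* Sort an array of records in place with the comparator above. */
static void
demo_sort(DemoBlockInfo *recs, size_t n)
{
	qsort(recs, n, sizeof(DemoBlockInfo), demo_blockinfo_cmp);
}
```

Sorting on database first is what lets the dump code detect a new database by comparing each record with the previous one.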

+static AutoPrewarmTask
+get_autoprewarm_task(AutoPrewarmTask todo_task)
+{
+	bool		found;
+
+	state = NULL;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+		pg_atomic_write_u32(&(state->current_task), todo_task);

Superfluous parens (repeated a lot).

+	LWLockRelease(AddinShmemInitLock);
+
+	/* If found check if we can go ahead. */
+	if (found)
+	{
+		if (pg_atomic_read_u32(&(state->current_task)) ==
+			TASK_PREWARM_BUFFERPOOL)

You repeat the read in every branch - why don't you store it in a
variable instead?

That aside, the use of an atomic doesn't seem to actually gain us
anything here. If we need control over concurrency it seems a lot
better to instead use a lwlock or spinlock. There's no contention here,
using lock-free stuff just increases complexity without a corresponding
benefit.
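
To make the lock-based alternative concrete, here is a standalone sketch (not PostgreSQL code). A pthread mutex stands in for the LWLock or spinlock, the shared value is read once into a local, and only the TASK_DUMP_IMMEDIATE_ONCE branch of get_autoprewarm_task() is modelled; every other state simply returns DEMO_TASK_END. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef enum DemoTask
{
	DEMO_TASK_PREWARM,
	DEMO_TASK_DUMP_ONCE,
	DEMO_TASK_DUMP_PERIODIC,
	DEMO_TASK_END
} DemoTask;

typedef struct DemoSharedState
{
	pthread_mutex_t lock;		/* stand-in for an LWLock or spinlock */
	DemoTask	current_task;	/* protected by lock */
} DemoSharedState;

/*
 * Decide what the caller may do. The state is inspected and updated while
 * holding the lock, so exactly one caller can take over periodic dumping.
 */
static DemoTask
demo_get_task(DemoSharedState *state, DemoTask todo)
{
	DemoTask	current;
	DemoTask	result = DEMO_TASK_END;

	assert(state != NULL);
	pthread_mutex_lock(&state->lock);
	current = state->current_task;	/* read once, reuse below */

	if (current == DEMO_TASK_DUMP_ONCE)
	{
		if (todo == DEMO_TASK_DUMP_ONCE)
			result = DEMO_TASK_DUMP_ONCE;	/* one-off dumps may coexist */
		else
		{
			/* First caller to get here becomes the periodic dumper. */
			state->current_task = DEMO_TASK_DUMP_PERIODIC;
			result = DEMO_TASK_DUMP_PERIODIC;
		}
	}

	pthread_mutex_unlock(&state->lock);
	return result;
}
```

With the lock held there is no "lost the race" path to reason about: a second caller simply observes DEMO_TASK_DUMP_PERIODIC and gets DEMO_TASK_END.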

+		{
+			if (todo_task == TASK_PREWARM_BUFFERPOOL)
+			{
+				/*
+				 * we were prewarming and we are back to do same, time to
+				 * abort prewarming and move to dumping.
+				 */

I'm not sure what "back to do same" should mean here - changing to a
different type of task surely is not the same.

+				pg_atomic_write_u32(&(state->current_task),
+									TASK_DUMP_BUFFERPOOL_INFO);
+				return TASK_DUMP_BUFFERPOOL_INFO;
+			}
+			else
+				return TASK_END;	/* rest all cannot proceed further. */

What does that comment mean?

+		}
+		else if (pg_atomic_read_u32(&(state->current_task)) ==
+				 TASK_DUMP_IMMEDIATE_ONCE)
+		{
+			uint32		current_state = TASK_DUMP_IMMEDIATE_ONCE;
+
+			/* We cannot do a TASK_PREWARM_BUFFERPOOL but rest can go ahead */
+			if (todo_task == TASK_DUMP_IMMEDIATE_ONCE)
+				return TASK_DUMP_IMMEDIATE_ONCE;
+
+			if (todo_task == TASK_PREWARM_BUFFERPOOL)
+				todo_task = TASK_DUMP_BUFFERPOOL_INFO;	/* skip to do dump only */
+
+			/*
+			 * first guy who can atomically set the current_task get the
+			 * opportunity to proceed further
+			 */
+			if (pg_atomic_compare_exchange_u32(&(state->current_task),
+											   &current_state,
+											   TASK_DUMP_BUFFERPOOL_INFO))
+			{
+				/* Wow! We won the race proceed with the task. */
+				return TASK_DUMP_BUFFERPOOL_INFO;
+			}
+			else
+				return TASK_END;

Note that it's not generally guaranteed that any
pg_atomic_compare_exchange_u32 actually wins, it could temporarily fail
for all.
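
As an illustration of that hazard, a standalone sketch using C11 atomics as a stand-in for the pg_atomic API: atomic_compare_exchange_weak() is allowed to fail spuriously, so a false return alone does not prove that another process won. The demo_* names are hypothetical, and whether a retry loop is the right fix for the patch is a separate question from the lock-based suggestion above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum
{
	DEMO_TASK_DUMP_ONCE = 1,
	DEMO_TASK_DUMP_PERIODIC = 2
};

/*
 * Try to move *task from DEMO_TASK_DUMP_ONCE to DEMO_TASK_DUMP_PERIODIC.
 * Returns true only if this caller performed the transition.
 */
static bool
demo_claim_periodic_dump(_Atomic uint32_t *task)
{
	uint32_t	expected = DEMO_TASK_DUMP_ONCE;

	assert(task != NULL);
	while (!atomic_compare_exchange_weak(task, &expected,
										 (uint32_t) DEMO_TASK_DUMP_PERIODIC))
	{
		/* On failure, expected holds the value that was actually observed. */
		if (expected != DEMO_TASK_DUMP_ONCE)
			return false;		/* the state really changed under us */
		/* Otherwise the failure was spurious, so try again. */
	}
	return true;
}
```

The key point is the check of the observed value after a failed exchange; returning TASK_END on any failure treats a spurious failure as a lost race.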

+/*
+ * getnextblockinfo -- given a BlkType get its next BlockInfoRecord from the
+ *					   dump file.
+ */
+static BlkType
+getnextblockinfo(FILE *file, BlockInfoRecord *currblkinfo, BlkType reqblock,
+				 BlockInfoRecord *newblkinfo)
+{
+	BlkType		nextblk;
+
+	while (true)
+	{
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &(newblkinfo->database),
+						&(newblkinfo->spcNode), &(newblkinfo->filenode),
+						(uint32 *) &(newblkinfo->forknum),
+						&(newblkinfo->blocknum)))
+			return BLKTYPE_END; /* No more valid entry hence stop processing. */

Hm. Is it actually helpful to store the file as text? That's commonly
going to increase the size of the file quite considerably, no?
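
A rough way to compare the two encodings, as a standalone sketch: the struct below is a hypothetical stand-in for BlockInfoRecord with five 32-bit fields, and the text size is measured with the same format string the patch uses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct DemoBlockInfo
{
	uint32_t	database;
	uint32_t	spcnode;
	uint32_t	filenode;
	uint32_t	forknum;
	uint32_t	blocknum;
} DemoBlockInfo;

/* Bytes one record occupies when written as "%u,%u,%u,%u,%u\n". */
static int
demo_text_record_size(const DemoBlockInfo *r)
{
	assert(r != NULL);
	return snprintf(NULL, 0,
					"%" PRIu32 ",%" PRIu32 ",%" PRIu32 ",%" PRIu32 ",%" PRIu32 "\n",
					r->database, r->spcnode, r->filenode,
					r->forknum, r->blocknum);
}

/* Bytes one record occupies when the struct is written as raw binary. */
static size_t
demo_binary_record_size(void)
{
	return sizeof(DemoBlockInfo);
}
```

For a record such as database 16384, tablespace 1663, filenode 16385, fork 0, block 123456, the text form takes 26 bytes against 20 for the raw struct, and the text form can reach 55 bytes when every field is at its maximum. Records with very small ids can be shorter as text, so the overhead depends on the data.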

+/*
+ * GetRelOid -- given a filenode get its relation oid.
+ */
+static Oid
+get_reloid(Oid filenode)
+{

Function and comment don't agree on naming.

But what is this actually used for? I thought Robert, in
http://archives.postgresql.org/message-id/CA%2BTgmoa%3DUqCL2mR%2B9WTq05tB3Up-z4Sv2wkzkDxDwBP7Mj_2_w%40mail.gmail.com
suggested storing the filenode in the dump, and then to use
RelidByRelfilenode to get the corresponding relation?

It seems a lot better to use relfilenodes, because otherwise table
rewrites will lead to reloading wrong things.

+	int			ret;
+	Oid			relationid;
+	bool		isnull;
+	Datum		value[1] = {ObjectIdGetDatum(filenode)};
+	StringInfoData buf;
+	Oid			ptype[1] = {OIDOID};
+
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+			"select oid from pg_class where pg_relation_filenode(oid) = $1");
+
+	ret = SPI_execute_with_args(buf.data, 1, (Oid *) &ptype, (Datum *) &value,
+								NULL, true, 1);
+
+	if (ret != SPI_OK_SELECT)
+		ereport(FATAL, (errmsg("SPI_execute failed: error code %d", ret)));
+
+	if (SPI_processed < 1)
+		return InvalidOid;
+
+	relationid = DatumGetObjectId(SPI_getbinval(SPI_tuptable->vals[0],
+												SPI_tuptable->tupdesc,
+												1, &isnull));
+	if (isnull)
+		return InvalidOid;
+
+	return relationid;
+}

Doing this via SPI doesn't strike me as a good idea - that's really
quite expensive. Why not call the underlying function directly?

+/*
+ * load_one_database -- start of prewarm sub-worker, this will try to load
+ * blocks of one database starting from block info position passed by main
+ * prewarm worker.
+ */
+void
+load_one_database(Datum main_arg)
+{

+	/* check if file exists and open file in read mode. */
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s", AUTOPREWARM_FILE);
+	file = fopen(dump_file_path, PG_BINARY_R);
+	if (!file)
+		return;					/* No file to load. */

Shouldn't this be an error case? In which case is it ok for the file to
be gone after we launched the worker?

+	/*
+	 * It should be a block info belonging to a new database. Or else dump
+	 * file is corrupted better to end the loading of bocks now.
+	 */
+	if (loadblocktype != BLKTYPE_NEW_DATABASE)
+		goto end_load;			/* should we raise a voice here? */

Yes, this should raise an error.

+			case BLKTYPE_NEW_RELATION:
+
+				/*
+				 * release lock on previous relation.
+				 */
+				if (rel)
+				{
+					relation_close(rel, AccessShareLock);
+					rel = NULL;
+				}
+
+				loadblocktype = BLKTYPE_NEW_RELATION;
+
+				/*
+				 * lock new relation.
+				 */
+				reloid = get_reloid(toload_block.filenode);
+
+				if (!OidIsValid(reloid))
+					break;
+
+				rel = try_relation_open(reloid, AccessShareLock);
+				if (!rel)
+					break;
+				RelationOpenSmgr(rel);

Now I'm confused. Your get_reloid used pg_relation_filenode() to map
from relation oid to filenode - and then you're using it to lock the
relation? Something's wrong.

+			case BLKTYPE_NEW_FORK:
+
+				/*
+				 * check if fork exists and if block is within the range
+				 */
+				loadblocktype = BLKTYPE_NEW_FORK;
+				if (			/* toload_block.forknum > InvalidForkNumber &&
+								 * toload_block.forknum <= MAX_FORKNUM && */
+					!smgrexists(rel->rd_smgr, toload_block.forknum))
+					break;

Huh? What's with that commented out section of code?

+			case BLKTYPE_NEW_BLOCK:
+
+				/* check if blocknum is valid and with in fork file size. */
+				if (toload_block.blocknum >= nblocks)
+				{
+					/* move to next forknum. */
+					loadblocktype = BLKTYPE_NEW_FORK;
+					break;
+				}

Hm. Why does the size of the underlying file allow us to skip to the
next fork? Don't we have to read all the pending dump records?

+				buf = ReadBufferExtended(rel, toload_block.forknum,
+										 toload_block.blocknum, RBM_NORMAL,
+										 NULL);
+				if (BufferIsValid(buf))
+				{
+					ReleaseBuffer(buf);
+				}
+
+				loadblocktype = BLKTYPE_NEW_BLOCK;
+				break;

Hm. RBM_NORMAL will error out in a bunch of cases, is that ok?

+	if (have_dbconnection)
+	{
+		SPI_finish();
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+	return;
+}

Are we really ok keeping open a transaction through all of this? That
could potentially be quite long, no? How about doing that on a per-file
basis, or even moving to session locks altogether?

+/* This sub-module is for periodically dumping buffer pool's block info into
+ * a dump file AUTOPREWARM_FILE.
+ * Each entry of block info looks like this:
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it
+ * as BlockInfoRecord.
+ *
+ * Contents of AUTOPREWARM_FILE has been formated such a way that
+ * blockInfoRecord of each database can be given to different prewarm workers.
+ *
+ *	format of AUTOPREWAM_FILE
+ *	=======================================
+ *	[offset position of database map table]
+ *	[sorted BlockInfoRecords..............]
+ *	[database map table]
+ *	=======================================

This doesn't mention storing things as ascii, instead of binary...

+ *	The [database map table] is sequence of offset in file which will point to
+ *	first BlockInfoRecords of each database in the dump. The prewarm worker
+ *	will read this offset one by one in sequence and ask its subworker to seek
+ *	to this position and then start loading the BlockInfoRecords one by one
+ *	until it see a BlockInfoRecords of a different database than it is actually
+ *	connected to.
+ *	NOTE : We store off_t inside file so the dump file will not be portable to
+ *	be used across systems where sizeof off_t is different from each other.
+ */

Why are we using off_t? Shouldn't this just be BlockNumber?

+static uint32
+dump_now(void)
+{
+	static char dump_file_path[MAXPGPATH],

+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].spcNode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);

+	}
+
+	/* sorting now only to avoid sorting while loading. */

"sorting while loading"? You mean random accesses?

+	pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+			 sort_cmp_func);

+	snprintf(transient_dump_file_path, sizeof(dump_file_path),
+			 "%s.%d", AUTOPREWARM_FILE, MyProcPid);
+	file = fopen(transient_dump_file_path, "w");
+	if (file == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: could not open \"%s\": %m",
+						dump_file_path)));

What if that file already exists? You're not truncating it. Also, what
if we error out in the middle of this? We'll leak an fd. I think this
needs to use OpenTransientFile etc.

+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s", AUTOPREWARM_FILE);
+	ret = fprintf(file, "%020jd\n", (intmax_t) 0);
+	if (ret < 0)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error writing to \"%s\" : %m",
+						dump_file_path)));
+	}
+
+	database_map_table[num_db++] = ftello(file);
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		if (i > 0 && block_info_array[i].database != prev_database)
+		{
+			if (num_db == database_map_table_size)
+			{
+				database_map_table_size *= 2;	/* double and repalloc. */
+				database_map_table =
+					(off_t *) repalloc(database_map_table,
+									sizeof(off_t) * database_map_table_size);
+			}
+			fflush(file);
+			database_map_table[num_db++] = ftello(file);
+		}
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].spcNode,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			fclose(file);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("autoprewarm: error writing to \"%s\" : %m",
+							dump_file_path)));
+		}
+
+		prev_database = block_info_array[i].database;
+	}

I think we should check for interrupts somewhere in that (and the
preceding) loop.

+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+void
+dump_block_info_periodically()
+{

Suggest adding void to the parameter list.

+	pg_time_t	last_dump_time = (pg_time_t) time(NULL);
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		pg_time_t	now;
+		int			elapsed_secs = 0,
+					timeout = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			now = (pg_time_t) time(NULL);
+			elapsed_secs = now - last_dump_time;
+
+			if (elapsed_secs > dump_interval)
+			{
+				dump_now();
+				if (got_sigterm)
+					return;		/* got shutdown signal just after a dump. And,
+								 * I think better to return now. */
+				last_dump_time = (pg_time_t) time(NULL);
+				elapsed_secs = 0;
+			}
+
+			timeout = dump_interval - elapsed_secs;
+		}

I suggest using GetCurrentTimestamp() and TimestampDifferenceExceeds()
instead.

+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   timeout * 1000, PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* One last block meta info dump while postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now();

Uh, afaics we'll also do this if somebody SIGTERMed the process
interactively?

+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm;
+
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+							" If set to -1, stops the running autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	DefineCustomStringVariable("pg_prewarm.default_database",
+				"default database to connect if dump has not recorded same.",
+							   NULL,
+							   &default_database,
+							   "postgres",
+							   PGC_POSTMASTER,
+							   0,
+							   NULL,
+							   NULL,
+							   NULL);

I don't think it's a good idea to make guc registration depending on
process_shared_preload_libraries_in_progress.

You should also use EmitWarningsOnPlaceholders() somewhere here.

I also wonder whether we don't need to use prefetch to actually make
this fast enough.

I think it's pretty clear that this needs a bit more work and thus won't
be ready for v10. Moved to the next CF.

- Andres


#67

Mithun Cy

mithun.cy@enterprisedb.com

almost 9 years ago

In reply to: Noname (#66)

Re: Proposal : For Auto-Prewarm.

On Thu, Apr 6, 2017 at 4:12 AM, Andres Freund <andres@anarazel.de> wrote:

On 2017-03-13 18:45:00 +0530, Mithun Cy wrote:

I have implemented a similar logic now. The prewarm bgworker will
launch a sub-worker per database in the dump file. And, each
sub-worker will load its database block info. The sub-workers will be
launched only after previous one is finished. All of this will only
start if the database has reached a consistent state.

Hm. For replay performance it'd possibly be good to start earlier,
before reaching consistency. Is there an issue starting earlier?

Thanks, Andres, for the detailed review. I will try to address the comments
in my next post, but I thought it important to reply to the comment above
first. Earlier patches used to start loading blocks before reaching a
consistent state. Then Robert, while reviewing, found a basic flaw in my
approach [1]. The function DropRelFileNodesAllBuffers does not expect others
to load blocks concurrently while it is getting rid of buffered blocks. So we
have to delay loading until the database reaches a consistent state, so that
we can connect to each database and take a relation lock before loading any
of its blocks.

[1]: cannot load blocks without holding relation lock </messages/by-id/CA+TgmoYNF_wfdwQ3z3713zKy2j0Z9C32WJdtKjvRWzeY7JOL4g@mail.gmail.com>

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#68

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Noname (#66)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Thanks, Andres,

I have tried to address all of your comments. One important change in
this patch: previously we read one block info structure at a time and
loaded it. Now we read all of them together, load them into a DSA, and
then distribute the block infos to the sub-workers, which load the
corresponding blocks. With this, the portability issue mentioned above
no longer exists, because we do not store any map tables within the
dump file.

On Thu, Apr 6, 2017 at 4:12 AM, Andres Freund <andres@anarazel.de> wrote:

On 2017-03-13 18:45:00 +0530, Mithun Cy wrote:

launched only after previous one is finished. All of this will only
start if the database has reached a consistent state.

Hm. For replay performance it'd possibly be good to start earlier,
before reaching consistency. Is there an issue starting earlier?

Earlier patches used to start loading blocks before reaching a
consistent state. Then Robert, while reviewing, found a basic flaw in my
approach. The function DropRelFileNodesAllBuffers does not expect
others to load blocks concurrently while it is getting rid of
buffered blocks. So we have to delay loading until the database reaches
a consistent state, so that we can connect to each database and take a
relation lock before loading any of its blocks.

+ * -- Automatically prewarm the shared buffer pool when server restarts.

Don't think we usually use -- here.

-- Fixed.

+ * Copyright (c) 2013-2017, PostgreSQL Global Development Group

Hm, that's a bit of a weird date range.

-- Changed it to 2016-2017; is that right?

The pg_prewarm.c in there looks like some search & replace gone awry.

-- Sorry, fixed.

+#include "utils/rel.h"
+#include "port/atomics.h"

I'd rather just sort these alphabetically.
I think this should rather be in the initial header.

-- Fixed as suggested and have moved everything into a header file pg_prewarm.h.

s/until there is a/until there is no/?

-- Fixed.

+/*
+ * ============================================================================
+ * ===========================        SIGNAL HANDLERS        ===========================
+ * ============================================================================
+ */

Hm...

-- I have reverted those cosmetic changes now.

+static void sigtermHandler(SIGNAL_ARGS);
+static void sighupHandler(SIGNAL_ARGS);
I don't think that's a casing we commonly use. We mostly use CamelCase
or underscore_case.

-- Fixed with CamelCase.

+ *   Signal handler for SIGUSR1.
+ */
+static void
+sigusr1Handler(SIGNAL_ARGS)

Hm, what's this one for?

-- The prewarm sub-workers will notify with SIGUSR1 on their
startup/shutdown. Updated the comments.

+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+     pg_atomic_uint32 current_task;          /* current tasks performed by
+                                                                              * autoprewarm workers. */
+} AutoPrewarmSharedState;

Hm. Why do we need atomics here? I thought there's no concurrency?

There are 3 ways autoprewarm can run:
A: a preloaded prewarm bgworker, which prewarms and then dumps periodically;
B: a bgworker launched by launch_autoprewarm_dump(), which only dumps periodically;
C: an immediate dump by a backend.

We do not want 2 bgworkers started and working concurrently, and we do
not want to dump while the prewarm task is running. As you suggested, I
rewrote this as simple logic using a boolean variable.

+sort_cmp_func(const void *p, const void *q)
+{
rename to blockinfo_cmp?

-- Fixed.

Superfluous parens (repeated a lot).

-- Fixed in all places.

Hm. Is it actually helpful to store the file as text? That's commonly
going to increase the size of the file quite considerably, no?

-- Having it in text could help readability, or help if we want to
modify/adjust it as needed. I shall do a small experiment to check how
much we could save and produce a patch on top of this one for that.
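
To make the size question concrete, here is a minimal standalone sketch that compares the bytes needed for one record in the "%u,%u,%u,%u,%u\n" text form against a fixed-width encoding of five 32-bit fields. The struct is a simplified stand-in for the patch's BlockInfoRecord, and the helper names text_record_size and binary_record_size are hypothetical; a binary layout of exactly five uint32 values is an assumption, not something the patch defines.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the patch's BlockInfoRecord (assumption). */
typedef struct BlockInfoRecord
{
    uint32_t    database;
    uint32_t    spcNode;
    uint32_t    filenode;
    uint32_t    forknum;
    uint32_t    blocknum;
} BlockInfoRecord;

/* Bytes needed for one record in the comma-separated text form. */
static size_t
text_record_size(const BlockInfoRecord *r)
{
    /* snprintf with a NULL buffer only computes the formatted length. */
    int         len = snprintf(NULL, 0,
                               "%" PRIu32 ",%" PRIu32 ",%" PRIu32
                               ",%" PRIu32 ",%" PRIu32 "\n",
                               r->database, r->spcNode, r->filenode,
                               r->forknum, r->blocknum);

    assert(len > 0);
    return (size_t) len;
}

/* Bytes needed if the five fields were written as raw 32-bit values. */
static size_t
binary_record_size(void)
{
    return 5 * sizeof(uint32_t);
}
```

With these assumptions the answer depends on the values: a record such as 16384,1663,16385,0,123456 takes 26 bytes as text against 20 bytes in binary, a record of very small numbers is shorter as text, and the worst case for text is 55 bytes.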

But what is this actually used for? I thought Robert, in
http://archives.postgresql.org/message-id/CA%2BTgmoa%3DUqCL2mR%2B9WTq05tB3Up-z4Sv2wkzkDxDwBP7Mj_2_w%40mail.gmail.com
suggested storing the filenode in the dump, and then to use
RelidByRelfilenode to get the corresponding relation?

-- Fixed as suggested directly calling RelidByRelfilenode now.

+load_one_database(Datum main_arg)
+{
+     /* check if file exists and open file in read mode. */
+     snprintf(dump_file_path, sizeof(dump_file_path), "%s", AUTOPREWARM_FILE);
+     file = fopen(dump_file_path, PG_BINARY_R);
+     if (!file)
+             return;                                 /* No file to load. */
Shouldn't this be an error case? In which case is it ok for the file to
be gone after we launched the worker?

-- Yes, sorry, a mistake. In the new patch I have replaced the file map
with a dsa to distribute block infos to the sub-workers, so this code
goes away.
Now the main worker loads all of the blockinfos into a dynamic
shared area (dsa), and each sub-worker reads from it the blocks that
belong to the database it is assigned to load.
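
The hand-off described here can be sketched in standalone C: because the array is sorted by database, each sub-worker's share is one contiguous range. This is only an illustration under that assumption; the plain array stands in for the dsa memory, and database_range_end and count_database_ranges are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the patch's BlockInfoRecord (assumption). */
typedef struct BlockInfoRecord
{
    uint32_t    database;
    uint32_t    spcNode;
    uint32_t    filenode;
    uint32_t    forknum;
    uint32_t    blocknum;
} BlockInfoRecord;

/*
 * Given records sorted by database, return the index one past the last
 * record that belongs to the same database as recs[start].
 */
static size_t
database_range_end(const BlockInfoRecord *recs, size_t n, size_t start)
{
    size_t      i;

    assert(start < n);
    i = start + 1;
    while (i < n && recs[i].database == recs[start].database)
        i++;
    return i;
}

/* Count the ranges, i.e. how many per-database sub-workers would be needed. */
static size_t
count_database_ranges(const BlockInfoRecord *recs, size_t n)
{
    size_t      start = 0;
    size_t      count = 0;

    while (start < n)
    {
        /* Each iteration consumes the whole range of one database. */
        start = database_range_end(recs, n, start);
        count++;
    }
    return count;
}
```

A main worker following this scheme would launch one sub-worker per range and wait for it to finish before moving to the next start index.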

+     /*
+      * It should be a block info belonging to a new database. Or else dump
+      * file is corrupted better to end the loading of bocks now.
+      */
+     if (loadblocktype != BLKTYPE_NEW_DATABASE)
+             goto end_load;                  /* should we raise a voice here? */

Yes, this should raise an error.

-- Got rid of this check with the new code.

+                     case BLKTYPE_NEW_RELATION:
+
+                             /*
+                              * release lock on previous relation.
+                              */
+                             if (rel)
+                             {
+                                     relation_close(rel, AccessShareLock);
+                                     rel = NULL;
+                             }
+
+                             loadblocktype = BLKTYPE_NEW_RELATION;
+
+                             /*
+                              * lock new relation.
+                              */
+                             reloid = get_reloid(toload_block.filenode);
+
+                             if (!OidIsValid(reloid))
+                                     break;
+
+                             rel = try_relation_open(reloid, AccessShareLock);
+                             if (!rel)
+                                     break;
+                             RelationOpenSmgr(rel);

Now I'm confused. Your get_reloid used pg_relation_filenode() to map
from relation oid to filenode - and then you're using it to lock the
relation? Something's wrong.

We take a shared lock on relation id so that while we load the blocks
of the relation, DropRelFileNodesAllBuffers is not called by another
process.

+                     case BLKTYPE_NEW_FORK:
+
+                             /*
+                              * check if fork exists and if block is within the range
+                              */
+                             loadblocktype = BLKTYPE_NEW_FORK;
+                             if (                    /* toload_block.forknum > InvalidForkNumber &&
+                                                              * toload_block.forknum <= MAX_FORKNUM && */
+                                     !smgrexists(rel->rd_smgr, toload_block.forknum))
+                                     break;

Huh? What's with that commented out section of code?

-- smgrexists is not safe: it crashes if we pass an illegal fork number,
hence the precheck. I accidentally forgot to uncomment it, sorry. Fixed now.

+                     case BLKTYPE_NEW_BLOCK:
+
+                             /* check if blocknum is valid and with in fork file size. */
+                             if (toload_block.blocknum >= nblocks)
+                             {
+                                     /* move to next forknum. */
+                                     loadblocktype = BLKTYPE_NEW_FORK;
+                                     break;
+                             }

Hm. Why does the size of the underlying file allow us to skip to the
next fork? Don't we have to read all the pending dump records?

-- Blocks beyond the file size do not exist. So I thought we could
move to the next fork to load its blocks.
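
A standalone sketch of that skipping logic, assuming the records are sorted by (database, tablespace, filenode, fork, block): once one block number is at or beyond the fork's size, every remaining record of that fork is too, so the rest of the fork can be consumed without loading. The fork-number bounds mirror PostgreSQL's InvalidForkNumber (-1) and MAX_FORKNUM (3 at the time). The callback type, demo_nblocks, and count_loadable_blocks are hypothetical stand-ins; a real worker would ask the storage manager for the fork size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID_FORK (-1)    /* mirrors InvalidForkNumber */
#define SKETCH_MAX_FORK     3       /* mirrors MAX_FORKNUM (INIT_FORKNUM) */

/* Simplified stand-in for the patch's BlockInfoRecord (assumption). */
typedef struct BlockInfoRecord
{
    uint32_t    database;
    uint32_t    spcNode;
    uint32_t    filenode;
    int32_t     forknum;
    uint32_t    blocknum;
} BlockInfoRecord;

/* Hypothetical callback: current number of blocks in the given fork. */
typedef uint32_t (*nblocks_fn) (uint32_t filenode, int32_t forknum);

/* Demo callback for experiments: pretend every fork has 10 blocks. */
static uint32_t
demo_nblocks(uint32_t filenode, int32_t forknum)
{
    (void) filenode;
    (void) forknum;
    return 10;
}

static int
same_fork(const BlockInfoRecord *a, const BlockInfoRecord *b)
{
    return a->database == b->database && a->spcNode == b->spcNode &&
        a->filenode == b->filenode && a->forknum == b->forknum;
}

/* Count the records that would actually be loaded from a sorted array. */
static size_t
count_loadable_blocks(const BlockInfoRecord *recs, size_t n, nblocks_fn get_nblocks)
{
    size_t      i = 0;
    size_t      loadable = 0;

    assert(get_nblocks != NULL);
    while (i < n)
    {
        const BlockInfoRecord *first = &recs[i];
        int         valid = first->forknum > SKETCH_INVALID_FORK &&
            first->forknum <= SKETCH_MAX_FORK;
        uint32_t    nblocks = valid ? get_nblocks(first->filenode, first->forknum) : 0;

        while (i < n && same_fork(first, &recs[i]))
        {
            if (!valid || recs[i].blocknum >= nblocks)
            {
                /* Sorted input: the rest of this fork is out of range too. */
                while (i < n && same_fork(first, &recs[i]))
                    i++;
                break;
            }
            loadable++;         /* a real worker would read the block here */
            i++;
        }
    }
    return loadable;
}
```

Note that every pending record of the skipped fork is still consumed from the array; only the read is skipped.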

+                             buf = ReadBufferExtended(rel, toload_block.forknum,
+                                                                              toload_block.blocknum, RBM_NORMAL,
+                                                                              NULL);
+                             if (BufferIsValid(buf))
+                             {
+                                     ReleaseBuffer(buf);
+                             }
+
+                             loadblocktype = BLKTYPE_NEW_BLOCK;
+                             break;

Hm. RBM_NORMAL will error out in a bunch of cases, is that ok?

For now, on error we restart the bgworker. Previously I used setjmp to
catch (process) the error and continue, but in response to a review
comment I have moved to the new restart logic.

+     if (have_dbconnection)
+     {
+             SPI_finish();
+             PopActiveSnapshot();
+             CommitTransactionCommand();
+     }
+     return;
+}
Are we really ok keeping open a transaction through all of this? That
could potentially be quite long, no? How about doing that on a per-file
basis, or even moving to session locks altogether?

-- We hold the transaction until we load all the dumped blocks of the
database or the buffer pool becomes full; yes, that can take long. Should I
end and start a transaction on every fork file?

This doesn't mention storing things as ascii, instead of binary...

-- Fixed same.

+ *   NOTE : We store off_t inside file so the dump file will not be portable to
+ *   be used across systems where sizeof off_t is different from each other.
+ */

Why are we using off_t? Shouldn't this just be BlockNumber?

-- Previously we maintained a special map to the block infos of each
database, but now everything has moved to dsa, so the off_t code is
removed.

+static uint32
+dump_now(void)
+{
+     static char dump_file_path[MAXPGPATH],

+
+     for (num_blocks = 0, i = 0; i < NBuffers; i++)
+     {
+             uint32          buf_state;
+
+             bufHdr = GetBufferDescriptor(i);
+
+             /* lock each buffer header before inspecting. */
+             buf_state = LockBufHdr(bufHdr);
+
+             if (buf_state & BM_TAG_VALID)
+             {
+                     block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+                     block_info_array[num_blocks].spcNode = bufHdr->tag.rnode.spcNode;
+                     block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+                     block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+                     block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+                     ++num_blocks;
+             }
+
+             UnlockBufHdr(bufHdr, buf_state);

+     }
+
+     /* sorting now only to avoid sorting while loading. */

"sorting while loading"? You mean random accesses?

-- Yes, fixed.
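
For illustration, a minimal standalone comparator of the kind discussed here: sorting at dump time puts the blocks of each fork next to each other in ascending order, so the loader reads mostly sequentially instead of randomly. The field order below (database, tablespace, filenode, fork, block) is an assumption about what the patch's comparator does, the struct is a simplified stand-in, and sort_block_infos is a hypothetical wrapper around the C library's qsort rather than pg_qsort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's BlockInfoRecord (assumption). */
typedef struct BlockInfoRecord
{
    uint32_t    database;
    uint32_t    spcNode;
    uint32_t    filenode;
    int32_t     forknum;
    uint32_t    blocknum;
} BlockInfoRecord;

/* Compare one field; return from the enclosing function if it differs. */
#define CMP_FIELD(a, b, f) \
    do { \
        if ((a)->f < (b)->f) return -1; \
        if ((a)->f > (b)->f) return 1; \
    } while (0)

/* Order by database, tablespace, filenode, fork, then block number. */
static int
blockinfo_cmp(const void *p, const void *q)
{
    const BlockInfoRecord *a = p;
    const BlockInfoRecord *b = q;

    CMP_FIELD(a, b, database);
    CMP_FIELD(a, b, spcNode);
    CMP_FIELD(a, b, filenode);
    CMP_FIELD(a, b, forknum);
    CMP_FIELD(a, b, blocknum);
    return 0;
}

/* Sort in place; an empty array is left alone. */
static void
sort_block_infos(BlockInfoRecord *recs, size_t n)
{
    if (n == 0)
        return;
    assert(recs != NULL);
    qsort(recs, n, sizeof(BlockInfoRecord), blockinfo_cmp);
}
```

Comparing field by field with explicit less-than and greater-than tests avoids the overflow that subtracting unsigned values would risk.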

+     pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+                      sort_cmp_func);

+     snprintf(transient_dump_file_path, sizeof(dump_file_path),
+                      "%s.%d", AUTOPREWARM_FILE, MyProcPid);
+     file = fopen(transient_dump_file_path, "w");
+     if (file == NULL)
+             ereport(ERROR,
+                             (errcode_for_file_access(),
+                              errmsg("autoprewarm: could not open \"%s\": %m",
+                                             dump_file_path)));

What if that file already exists? You're not truncating it. Also, what
if we error out in the middle of this? We'll leak an fd. I think this
needs to use OpenTransientFile etc.

Thanks, using OpenTransientFile now.
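
The two concerns raised above (a stale temporary file, and a leaked descriptor on error) can be sketched with plain stdio and POSIX rename. This is not the OpenTransientFile API, which ties cleanup to PostgreSQL's transaction machinery; it is only a standalone analogue in which every error path closes the stream and removes the temporary file. The ".tmp" suffix, write_dump_atomically, and count_dump_lines are hypothetical, and a durable version would also need to fsync before renaming.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the patch's BlockInfoRecord (assumption). */
typedef struct BlockInfoRecord
{
    uint32_t    database;
    uint32_t    spcNode;
    uint32_t    filenode;
    uint32_t    forknum;
    uint32_t    blocknum;
} BlockInfoRecord;

/*
 * Write all records to "<path>.tmp", then rename it over <path>.
 * Returns 0 on success and -1 on failure; no stream is left open.
 */
static int
write_dump_atomically(const char *path, const BlockInfoRecord *recs, size_t n)
{
    char        tmp[1024];
    FILE       *fp;
    size_t      i;
    int         len;

    assert(path != NULL);
    len = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (len < 0 || (size_t) len >= sizeof(tmp))
        return -1;

    fp = fopen(tmp, "w");       /* mode "w" truncates a leftover file */
    if (fp == NULL)
        return -1;

    for (i = 0; i < n; i++)
    {
        if (fprintf(fp, "%" PRIu32 ",%" PRIu32 ",%" PRIu32 ",%" PRIu32 ",%" PRIu32 "\n",
                    recs[i].database, recs[i].spcNode, recs[i].filenode,
                    recs[i].forknum, recs[i].blocknum) < 0)
        {
            fclose(fp);
            remove(tmp);
            return -1;
        }
    }

    /* fclose flushes buffered data, so its result must be checked too. */
    if (fclose(fp) != 0 || rename(tmp, path) != 0)
    {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Count newline-terminated records in a file; -1 if it cannot be opened. */
static long
count_dump_lines(const char *path)
{
    FILE       *fp = fopen(path, "r");
    long        lines = 0;
    int         c;

    if (fp == NULL)
        return -1;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

Because the final name only ever appears through rename, a reader on a POSIX system sees either the previous complete dump or the new complete dump, never a partial one.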

+     snprintf(dump_file_path, sizeof(dump_file_path),
+                      "%s", AUTOPREWARM_FILE);
+     ret = fprintf(file, "%020jd\n", (intmax_t) 0);
+     if (ret < 0)
+     {
+             fclose(file);
+             ereport(ERROR,
+                             (errcode_for_file_access(),
+                              errmsg("autoprewarm: error writing to \"%s\" : %m",
+                                             dump_file_path)));
+     }
+
+     database_map_table[num_db++] = ftello(file);
+
+     for (i = 0; i < num_blocks; i++)
+     {
+             if (i > 0 && block_info_array[i].database != prev_database)
+             {
+                     if (num_db == database_map_table_size)
+                     {
+                             database_map_table_size *= 2;   /* double and repalloc. */
+                             database_map_table =
+                                     (off_t *) repalloc(database_map_table,
+                                                                     sizeof(off_t) * database_map_table_size);
+                     }
+                     fflush(file);
+                     database_map_table[num_db++] = ftello(file);
+             }
+
+             ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+                                       block_info_array[i].database,
+                                       block_info_array[i].spcNode,
+                                       block_info_array[i].filenode,
+                                       (uint32) block_info_array[i].forknum,
+                                       block_info_array[i].blocknum);
+             if (ret < 0)
+             {
+                     fclose(file);
+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("autoprewarm: error writing to \"%s\" : %m",
+                                                     dump_file_path)));
+             }
+
+             prev_database = block_info_array[i].database;
+     }

I think we should check for interrupts somewhere in that (and the
preceding) loop.

-- Now checking for any sighup, and if it asks not to dump (with
pg_prewarm.dump_interval set to -1) I will terminate the loop.
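
The general pattern behind this exchange, a long loop that polls a flag set by a signal handler, can be sketched in standalone C. Inside PostgreSQL the usual tool is the CHECK_FOR_INTERRUPTS() macro together with the worker's own got_sigterm/got_sighup flags; the names stop_requested, request_stop, and sum_until_interrupted below are hypothetical, and the summing loop merely stands in for the per-buffer work.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set asynchronously by the signal handler, polled by the loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Signal handler: only records that a stop was requested. */
static void
request_stop(int signo)
{
    (void) signo;
    stop_requested = 1;
}

/*
 * Sum the items, checking the flag before each one.
 * Returns how many items were processed before a stop was seen.
 */
static size_t
sum_until_interrupted(const int *items, size_t n, long *sum)
{
    size_t      i;

    assert(sum != NULL);
    *sum = 0;
    for (i = 0; i < n; i++)
    {
        if (stop_requested)
            break;              /* leave early, caller decides what to do */
        *sum += items[i];
    }
    return i;
}
```

A caller would install the handler with signal(SIGTERM, request_stop) and compare the return value with n to tell a complete pass from an interrupted one, for example to discard a half-written dump.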

+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+void
+dump_block_info_periodically()
+{

Suggest adding void to the parameter list.

-- Fixed.

+                             last_dump_time = (pg_time_t) time(NULL);
+                             elapsed_secs = 0;
+                     }
+
+                     timeout = dump_interval - elapsed_secs;
+             }

I suggest using GetCurrentTimestamp() and TimestampDifferenceExceeds()
instead.

-- Fixed as suggested.
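
A standalone sketch of the timestamp arithmetic involved, using 64-bit microsecond counts as a stand-in for PostgreSQL's TimestampTz. The helper difference_exceeds is modeled loosely on TimestampDifferenceExceeds() and ms_until_next_dump shows how a latch timeout could be derived; both names and the exact rounding are assumptions, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds since some epoch; stand-in for TimestampTz (assumption). */
typedef int64_t TimestampUs;

/* True if at least msec milliseconds lie between start and stop. */
static bool
difference_exceeds(TimestampUs start, TimestampUs stop, int msec)
{
    assert(msec >= 0);
    return (stop - start) >= (int64_t) msec * 1000;
}

/*
 * Milliseconds to sleep before the next dump is due; 0 means due now.
 * Rounds up so that the worker never wakes before the deadline.
 */
static long
ms_until_next_dump(TimestampUs last_dump, TimestampUs now, int interval_secs)
{
    int64_t     due = last_dump + (int64_t) interval_secs * 1000000;

    if (now >= due)
        return 0;
    return (long) ((due - now + 999) / 1000);
}
```

Working in one 64-bit unit avoids the mixed time_t and int arithmetic of the earlier loop, where elapsed seconds were stored in an int.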

+             /*
+              * In case of a SIGHUP, just reload the configuration.
+              */
+             if (got_sighup)
+             {
+                     got_sighup = false;
+                     ProcessConfigFile(PGC_SIGHUP);
+             }
+     }
+
+     /* One last block meta info dump while postmaster shutdown. */
+     if (dump_interval != AT_PWARM_OFF)
+             dump_now();

Uh, afaics we'll also do this if somebody SIGTERMed the process
interactively?

-- Yes we dump and exit on a sigterm.

+     /* if not run as a preloaded library, nothing more to do here! */
+     if (!process_shared_preload_libraries_in_progress)
+             return;
+
+     DefineCustomStringVariable("pg_prewarm.default_database",
+                             "default database to connect if dump has not recorded same.",
+                                                        NULL,
+                                                        &default_database,
+                                                        "postgres",
+                                                        PGC_POSTMASTER,
+                                                        0,
+                                                        NULL,
+                                                        NULL,
+                                                        NULL);

I don't think it's a good idea to make guc registration depending on
process_shared_preload_libraries_in_progress.

-- Fixed now.

You should also use EmitWarningsOnPlaceholders() somewhere here.

-- Added same.

I also wonder whether we don't need to use prefetch to actually make
this fast enough.

-- I have not used prefetch, but I will reassess and update the patch.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_07.patchapplication/octet-stream; name=autoprewarm_07.patchDownload

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..eb99628
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1135 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *			Automatically prewarm the shared buffer pool when server restarts.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "pg_prewarm.h"
+
+/*
+ * autoprewarm :
+ *
+ * What is it?
+ * ===========
+ * A bgworker which automatically records information about blocks which were
+ * present in buffer pool before server shutdown and then prewarms the buffer
+ * pool upon server restart with those blocks.
+ *
+ * How does it work?
+ * =================
+ * When the shared library "pg_prewarm" is preloaded, a
+ * bgworker "autoprewarm" is launched immediately after the server has reached
+ * consistent state. The bgworker will start loading blocks recorded in the
+ * format BlockInfoRecord
+ * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * $PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the buffer
+ * pool. This way we do not replace any new blocks which were loaded either by
+ * the recovery process or the querying clients.
+ *
+ * Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ * start a new task to periodically dump the BlockInfoRecords related to blocks
+ * which are currently in shared buffer pool. Upon next server restart, the
+ * bgworker will prewarm the buffer pool by loading those blocks. The GUC
+ * pg_prewarm.dump_interval will control the dumping activity of the bgworker.
+ */
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void SigtermHandler(SIGNAL_ARGS);
+static void SighupHandler(SIGNAL_ARGS);
+static void Sigusr1Handler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to let the main loop to terminate, and set our latch to wake it
+ *	up.
+ */
+static void
+SigtermHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the process to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+SighupHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm sub-workers will notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+Sigusr1Handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	types and variables used by autoprewarm	  ============
+ * ============================================================================
+ */
+
+/*
+ * Meta-data of each persistent block which is dumped and used to load.
+ */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcNode;		/* tablespace */
+	Oid			filenode;		/* relation's filenode. */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+} BlockInfoRecord;
+
+/*
+ * Tasks performed by autoprewarm workers.
+ */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO,	/* dump the buffer pool block info. */
+	TASK_DUMP_IMMEDIATE_ONCE,	/* dump the buffer pool block info immediately
+								 * once. */
+	TASK_END					/* no more tasks to do. */
+} AutoPrewarmTask;
+
+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock	   *lock;			/* protects SharedState */
+	AutoPrewarmTask current_task;		/* current tasks performed by
+										 * autoprewarm workers. */
+	bool		is_bgworker_running;	/* if set can't start another worker. */
+	bool		can_do_prewarm; /* if not set can't do prewarm task. */
+
+	/* dsa used to distribute block_infos among subworkers of prewarm task. */
+	dsa_handle	apw_dsa_handle;
+	uint32		apw_num_block_infos;	/* total number of sorted block_infos
+										 * in dsa. */
+	dsa_pointer apw_block_infos;/* dsa memory where above block_infos are
+								 * stored. */
+} AutoPrewarmSharedState;
+
+static dsa_area *AutoPrewarmDSA = NULL;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/*
+ * Kind of BlockInfoRecord in AUTOPREWARM_FILE file.
+ */
+typedef enum
+{
+	BLKTYPE_NEW_DATABASE,		/* first BlockInfoRecord of new database. */
+	BLKTYPE_NEW_RELATION,		/* first BlockInfoRecord of new relation. */
+	BLKTYPE_NEW_FORK,			/* first BlockInfoRecord of new fork file. */
+	BLKTYPE_NEW_BLOCK,			/* any next BlockInfoRecord. */
+	BLKTYPE_END					/* No More BlockInfoRecords available in dump
+								 * file. */
+} BlkType;
+
+/* GUC variable which controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable which says to which database we have to connect when
+ * BlockInfoRecord belongs to global objects.
+ */
+static char *default_database;
+
+/* compare one member; return from the caller if the values differ. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp - compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	const BlockInfoRecord *a = (const BlockInfoRecord *) p;
+	const BlockInfoRecord *b = (const BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_shm_state - on_shmem_exit callback which resets the prewarm state.
+ */
+
+static void
+reset_shm_state(int code, Datum arg)
+{
+	state->is_bgworker_running = false;
+	state->current_task = TASK_END;
+	if (AutoPrewarmDSA)
+	{
+		if (state->apw_block_infos != InvalidDsaPointer)
+		{
+			dsa_free(AutoPrewarmDSA, state->apw_block_infos);
+			state->apw_block_infos = InvalidDsaPointer;
+			state->apw_num_block_infos = 0;
+		}
+
+		dsa_detach(AutoPrewarmDSA);
+		AutoPrewarmDSA = NULL;
+	}
+}
+
+/*
+ * get_autoprewarm_task - get next task allowed and to be performed by the
+ * autoprewarm worker.
+ */
+static AutoPrewarmTask
+get_autoprewarm_task(AutoPrewarmTask todo_task)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+	{
+		/* First time through ... */
+		state->lock = &(GetNamedLWLockTranche("pg_autoprewarm"))->lock;
+		state->current_task = TASK_END;
+		state->is_bgworker_running = false;
+		state->can_do_prewarm = true;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+	LWLockAcquire(state->lock, LW_EXCLUSIVE);
+
+	/*
+	 * If a bgworker is already running we cannot run another. But if the task
+	 * is just an immediate dump and no prewarm is in progress, we can go
+	 * further.
+	 */
+	if (state->is_bgworker_running &&
+		(todo_task != TASK_DUMP_IMMEDIATE_ONCE ||
+		 state->current_task == TASK_PREWARM_BUFFERPOOL))
+	{
+		LWLockRelease(state->lock);
+		return TASK_END;
+	}
+
+	/*
+	 * If asked to do prewarm, check whether we can do so. We avoid prewarm if
+	 * it was already done on startup.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL && !state->can_do_prewarm)
+		todo_task = TASK_DUMP_BUFFERPOOL_INFO;
+
+	/*
+	 * For now if there was a previous attempt to prewarm or dump any further
+	 * request to prewarm will not be entertained.
+	 */
+	state->can_do_prewarm = false;
+
+	if (todo_task != TASK_DUMP_IMMEDIATE_ONCE)
+	{
+		state->is_bgworker_running = true;
+		state->current_task = todo_task;
+		on_shmem_exit(reset_shm_state, 0);
+	}
+
+	LWLockRelease(state->lock);
+	return todo_task;
+}
+
+/*
+ * getnextblockinfo -- given a BlkType get its next BlockInfoRecord.
+ */
+static BlkType
+getnextblockinfo(BlockInfoRecord *toload_block, uint32 curr_blockinfo_pos,
+				 BlkType reqblock, uint32 *next_blockinfo_pos)
+{
+	BlkType		nextblk;
+
+	while (true)
+	{
+		/* get next block. */
+		if (curr_blockinfo_pos + 1 >= state->apw_num_block_infos)
+			return BLKTYPE_END; /* No more valid entry hence stop processing. */
+
+		if (toload_block[curr_blockinfo_pos].database !=
+			toload_block[curr_blockinfo_pos + 1].database)
+			nextblk = BLKTYPE_NEW_DATABASE;
+		else if (toload_block[curr_blockinfo_pos].filenode !=
+				 toload_block[curr_blockinfo_pos + 1].filenode)
+			nextblk = BLKTYPE_NEW_RELATION;
+		else if (toload_block[curr_blockinfo_pos].forknum !=
+				 toload_block[curr_blockinfo_pos + 1].forknum)
+			nextblk = BLKTYPE_NEW_FORK;
+		else
+			nextblk = BLKTYPE_NEW_BLOCK;
+
+		if (nextblk <= reqblock)
+		{
+			*next_blockinfo_pos = curr_blockinfo_pos + 1;
+			return nextblk;
+		}
+
+		curr_blockinfo_pos++;
+	}
+}
+
+/*
+ * connect_to_db -- connect to the given dbid.
+ *
+ * For global objects the dbid will be InvalidOid, connect to user given
+ * default_database and try to load those blocks.
+ */
+static void
+connect_to_db(Oid dbid)
+{
+	if (!OidIsValid(dbid))
+		BackgroundWorkerInitializeConnection(default_database, NULL);
+	else
+		BackgroundWorkerInitializeConnectionByOid(dbid, InvalidOid);
+	SetCurrentStatementStartTimestamp();
+	StartTransactionCommand();
+	SPI_connect();
+	PushActiveSnapshot(GetTransactionSnapshot());
+}
+
+/*
+ * load_one_database -- start of prewarm sub-worker, this will try to load
+ * blocks of one database starting from block info position passed by main
+ * prewarm worker.
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		curr_pos,
+				next_pos;
+	BlockInfoRecord *toload_block;
+	Relation	rel = NULL;
+	bool		have_dbconnection = false;
+	BlkType		loadblocktype;
+	BlockNumber nblocks = 0;
+	bool		found;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, SigtermHandler);
+	pqsignal(SIGHUP, SighupHandler);
+
+	/*
+	 * We're now ready to receive signals
+	 */
+	BackgroundWorkerUnblockSignals();
+
+	curr_pos = DatumGetUInt32(main_arg);
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	LWLockRelease(AddinShmemInitLock);
+
+	Assert(found);
+	Assert(state->apw_dsa_handle);
+
+	AutoPrewarmDSA = dsa_attach(state->apw_dsa_handle);
+	toload_block = (BlockInfoRecord *)
+		dsa_get_address(AutoPrewarmDSA, state->apw_block_infos);
+
+	loadblocktype = BLKTYPE_NEW_DATABASE;
+	next_pos = curr_pos;
+
+	while (loadblocktype != BLKTYPE_END)
+	{
+		Buffer		buf;
+		Oid			reloid;
+
+		/*
+		 * Load the block only if there exists a free buffer. We do not want to
+		 * replace a block already in buffer pool.
+		 */
+		if (!have_free_buffer())
+			goto end_load;
+
+		if (got_sigterm)
+			goto end_load;
+
+		switch (loadblocktype)
+		{
+			case BLKTYPE_NEW_DATABASE:
+
+				if (have_dbconnection)
+					goto end_load;		/* blocks belong to a new database,
+										 * lets end the loading process. */
+
+				/*
+				 * connect to the database.
+				 */
+				connect_to_db(toload_block[next_pos].database);
+				have_dbconnection = true;
+
+			case BLKTYPE_NEW_RELATION:
+
+				/*
+				 * release lock on previous relation.
+				 */
+				if (rel)
+				{
+					relation_close(rel, AccessShareLock);
+					rel = NULL;
+				}
+
+				loadblocktype = BLKTYPE_NEW_RELATION;
+
+				/*
+				 * lock new relation.
+				 */
+				reloid =
+					RelidByRelfilenode(toload_block[next_pos].spcNode,
+									   toload_block[next_pos].filenode);
+
+				if (!OidIsValid(reloid))
+					break;
+
+				rel = try_relation_open(reloid, AccessShareLock);
+				if (!rel)
+					break;
+				RelationOpenSmgr(rel);
+
+			case BLKTYPE_NEW_FORK:
+
+				/*
+				 * check if fork exists and if block is within the range
+				 */
+				loadblocktype = BLKTYPE_NEW_FORK;
+				if (toload_block[next_pos].forknum <= InvalidForkNumber ||
+					toload_block[next_pos].forknum > MAX_FORKNUM ||
+					!smgrexists(rel->rd_smgr, toload_block[next_pos].forknum))
+					break;
+				nblocks = RelationGetNumberOfBlocksInFork(rel,
+											 toload_block[next_pos].forknum);
+			case BLKTYPE_NEW_BLOCK:
+
+				/* check if blocknum is valid and within fork file size. */
+				if (toload_block[next_pos].blocknum >= nblocks)
+				{
+					/* move to next forknum. */
+					loadblocktype = BLKTYPE_NEW_FORK;
+					break;
+				}
+
+				buf = ReadBufferExtended(rel, toload_block[next_pos].forknum,
+								 toload_block[next_pos].blocknum, RBM_NORMAL,
+										 NULL);
+				if (BufferIsValid(buf))
+				{
+					ReleaseBuffer(buf);
+				}
+
+				loadblocktype = BLKTYPE_NEW_BLOCK;
+				break;
+
+			case BLKTYPE_END:
+				Assert(0);		/* Should not be here! */
+		}
+
+		curr_pos = next_pos;
+		loadblocktype = getnextblockinfo(toload_block, curr_pos, loadblocktype,
+										 &next_pos);
+	}
+
+end_load:
+
+	dsa_detach(AutoPrewarmDSA);
+	/* release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		rel = NULL;
+	}
+
+	if (have_dbconnection)
+	{
+		SPI_finish();
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+	return;
+}
+
+/*
+ * launch_prewarm_subworker -- register a dynamic worker to load the blocks
+ * starting from next_db_pos. We wait until the worker has stopped.
+ */
+static void
+launch_prewarm_subworker(uint32 next_db_pos)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  UInt32GetDatum(next_db_pos), BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	if (status == BGWH_STOPPED)
+		return;
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(0);
+}
+
+/*
+ *	prewarm_buffer_pool - the main routine which prewarms the buffer pool.
+ *
+ *	The prewarm bgworker first loads all of the BlockInfoRecords in
+ *	$PGDATA/AUTOPREWARM_FILE into a dsa. Those BlockInfoRecords are then
+ *	grouped by database, and for each group a sub-worker is launched to
+ *	load the corresponding blocks. The sub-workers are launched
+ *	sequentially: each one starts only after the previous sub-worker has
+ *	finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	static char dump_file_path[MAXPGPATH];
+	FILE	   *file = NULL;
+	uint32	   *next_db_pos;
+	size_t		next_db_pos_size;
+	uint32		this_dbs_elements = 0,
+				num_elements,
+				num_db = 0,
+				i;
+	Oid			prev_database;
+	BlockInfoRecord *blkinfo;
+
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s",
+			 AUTOPREWARM_FILE);
+
+	file = fopen(dump_file_path, PG_BINARY_R);
+	if (!file)
+		return;					/* No file to load. */
+
+	if (!state->apw_dsa_handle)
+	{
+		AutoPrewarmDSA = dsa_create(state->lock->tranche);
+		state->apw_dsa_handle = dsa_get_handle(AutoPrewarmDSA);
+	}
+	else
+		AutoPrewarmDSA = dsa_attach(state->apw_dsa_handle);
+
+	if (fscanf(file, "<<%u>>", &num_elements) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("Error reading num of elements in \"%s\" for"
+						" autoprewarm : %m", dump_file_path)));
+	}
+
+	state->apw_block_infos =
+		dsa_allocate_extended(AutoPrewarmDSA,
+							  sizeof(BlockInfoRecord) * num_elements,
+							  DSA_ALLOC_NO_OOM);
+	if (state->apw_block_infos == InvalidDsaPointer)
+	{
+		fclose(file);
+		dsa_detach(AutoPrewarmDSA);
+		AutoPrewarmDSA = NULL;
+		return;
+	}
+
+	state->apw_num_block_infos = num_elements;
+
+	blkinfo = (BlockInfoRecord *)
+		dsa_get_address(AutoPrewarmDSA, state->apw_block_infos);
+
+	next_db_pos_size = 64;
+	next_db_pos = (uint32 *) palloc(sizeof(uint32) * next_db_pos_size);
+
+	/* read and fill block infos */
+	for (i = 0; i < num_elements; i++, blkinfo++)
+	{
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo->database,
+						&blkinfo->spcNode, &blkinfo->filenode,
+						(uint32 *) &blkinfo->forknum,
+						&blkinfo->blocknum))
+			break;				/* no more records found. */
+		if (i == 0)
+		{
+			next_db_pos[num_db++] = 0;
+			prev_database = blkinfo->database;
+		}
+		else if (prev_database != blkinfo->database)
+		{
+			if (num_db >= next_db_pos_size)
+			{
+				next_db_pos_size *= 2;
+				next_db_pos = (uint32 *) repalloc(next_db_pos,
+										  sizeof(uint32) * next_db_pos_size);
+			}
+
+			next_db_pos[num_db++] = i;
+			this_dbs_elements = 0;
+			prev_database = blkinfo->database;
+		}
+
+		this_dbs_elements++;
+	}
+
+	fclose(file);
+	state->apw_num_block_infos = i;		/* records actually read */
+	i = 0;
+	/* get next database's first block info's position. */
+	while (!got_sigterm && i < num_db)
+	{
+		/*
+		 * Register a sub-worker to load a new database's blocks. Wait until
+		 * the sub-worker finishes its job before launching the next one.
+		 */
+		launch_prewarm_subworker(next_db_pos[i++]);
+	}
+
+	pfree(next_db_pos);
+
+	if (AutoPrewarmDSA)
+	{
+		if (state->apw_block_infos != InvalidDsaPointer)
+		{
+			dsa_free(AutoPrewarmDSA, state->apw_block_infos);
+			state->apw_block_infos = InvalidDsaPointer;
+			state->apw_num_block_infos = 0;
+		}
+
+		dsa_detach(AutoPrewarmDSA);
+		AutoPrewarmDSA = NULL;
+	}
+
+	ereport(LOG, (errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/* ============================================================================
+ * =============	buffer pool info dump part of autoprewarm	===============
+ * ============================================================================
+ */
+
+/* This sub-module is for periodically dumping buffer pool's block info into
+ * a dump file AUTOPREWARM_FILE.
+ * Each entry of block info looks like this:
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it
+ * a BlockInfoRecord. Note we write in text form so that the dump
+ * information is readable and, if necessary, can be carefully edited.
+ *
+ * The prewarm task will read these BlockInfoRecords one by one in sequence and
+ * distribute them among its sub-workers to load the corresponding blocks.
+ */
+
+/*
+ *	dump_now - the main routine which goes through each buffer header of the
+ *	buffer pool and dumps its metadata. We sort the records before dumping
+ *	them, as sorting facilitates sequential reads during load.
+ */
+static uint32
+dump_now(void)
+{
+	static char dump_file_path[MAXPGPATH],
+				transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret,
+				buflen;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	int			fd;
+	char		buf[1024];
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].spcNode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/*
+	 * sort the block number to increase chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+			 blockinfo_cmp);
+
+	snprintf(transient_dump_file_path, sizeof(transient_dump_file_path),
+			 "%s.%d", AUTOPREWARM_FILE, MyProcPid);
+
+	fd = OpenTransientFile(transient_dump_file_path,
+						   O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s", AUTOPREWARM_FILE);
+	buflen = sprintf(buf, "<<%u>>\n", num_blocks);
+	if (write(fd, buf, buflen) < buflen)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error writing to \"%s\" : %m",
+						transient_dump_file_path)));
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			return 0;
+		}
+
+		buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+						 block_info_array[i].database,
+						 block_info_array[i].spcNode,
+						 block_info_array[i].filenode,
+						 (uint32) block_info_array[i].forknum,
+						 block_info_array[i].blocknum);
+
+		if (write(fd, buf, buflen) < buflen)
+		{
+			CloseTransientFile(fd);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("autoprewarm: error writing to \"%s\" : %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(fd);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error closing \"%s\" : %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, dump_file_path, LOG);
+
+	ereport(LOG, (errmsg("autoprewarm: saved metadata info of %u blocks",
+						 num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = GetCurrentTimestamp();
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now();
+				if (got_sigterm)
+					return;		/* got shutdown signal just after a dump;
+								 * no need to dump again, return now. */
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+		ResetLatch(&MyProc->procLatch);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* One last block meta info dump during postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now();
+}
+
+/*
+ * autoprewarm_main -- the main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask next_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, SigtermHandler);
+	pqsignal(SIGHUP, SighupHandler);
+	pqsignal(SIGUSR1, Sigusr1Handler);
+
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	next_task = get_autoprewarm_task(DatumGetInt32(main_arg));
+
+	ereport(LOG, (errmsg("autoprewarm has started")));
+
+	/*
+	 * **** perform autoprewarm's next task	****
+	 */
+	if (next_task == TASK_PREWARM_BUFFERPOOL)
+	{
+		prewarm_buffer_pool();
+		/* prewarm is done; move on to TASK_DUMP_BUFFERPOOL_INFO. */
+		state->current_task = TASK_DUMP_BUFFERPOOL_INFO;
+		next_task = TASK_DUMP_BUFFERPOOL_INFO;
+	}
+
+	if (next_task == TASK_DUMP_BUFFERPOOL_INFO)
+	{
+		dump_block_info_periodically();
+
+		/*
+		 * downgrade to TASK_DUMP_IMMEDIATE_ONCE so others can start
+		 * TASK_DUMP_BUFFERPOOL_INFO
+		 */
+		state->current_task = TASK_DUMP_IMMEDIATE_ONCE;
+	}
+
+	ereport(LOG, (errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/* Register autoprewarm load bgworker. */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm;
+
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the interval between two buffer pool dumps",
+							"If set to zero, timer-based dumping is disabled."
+							" If set to -1, stops the running autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	DefineCustomStringVariable("pg_prewarm.default_database",
+				"Default database to connect to when a dumped block has none.",
+							   NULL,
+							   &default_database,
+							   "postgres",
+							   PGC_POSTMASTER,
+							   0,
+							   NULL,
+							   NULL,
+							   NULL);
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Request additional shared resources */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+	RequestNamedLWLockTranche("pg_autoprewarm", 1);
+
+	/* Has been set not to prewarm/dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&autoprewarm, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&autoprewarm);
+}
+
+/*
+ * Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* Has been set not to prewarm/dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	AutoPrewarmTask next_task;
+
+	/* dump only if prewarm is not in progress. */
+	next_task = get_autoprewarm_task(TASK_DUMP_IMMEDIATE_ONCE);
+	if (next_task == TASK_DUMP_IMMEDIATE_ONCE)
+		PG_RETURN_INT64(dump_now());
+	PG_RETURN_NULL();
+}
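The text format that dump_now() writes and prewarm_buffer_pool() reads back can be tried outside the server. The following is a minimal standalone sketch, not part of the patch: `Rec`, `rec_cmp`, `format_dump` and `parse_dump` are hypothetical names, plain `unsigned` stands in for Oid, ForkNumber and BlockNumber, and an in-memory buffer stands in for the dump file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BlockInfoRecord. */
typedef struct Rec
{
	unsigned	database;
	unsigned	spcnode;
	unsigned	filenode;
	unsigned	forknum;
	unsigned	blocknum;
} Rec;

/* Order by database, tablespace, filenode, fork, block. */
static int
rec_cmp(const void *p, const void *q)
{
	const Rec  *a = (const Rec *) p;
	const Rec  *b = (const Rec *) q;

#define CMP(fld) \
	do { if (a->fld != b->fld) return (a->fld < b->fld) ? -1 : 1; } while (0)
	CMP(database);
	CMP(spcnode);
	CMP(filenode);
	CMP(forknum);
	CMP(blocknum);
#undef CMP
	return 0;
}

/*
 * Sort the records, then write a "<<n>>" header followed by one
 * comma-separated line per record. Returns the number of bytes written,
 * or -1 if buf is too small.
 */
static int
format_dump(Rec *recs, size_t n, char *buf, size_t buflen)
{
	size_t		used;
	size_t		i;
	int			len;

	qsort(recs, n, sizeof(Rec), rec_cmp);

	len = snprintf(buf, buflen, "<<%zu>>\n", n);
	if (len < 0 || (size_t) len >= buflen)
		return -1;
	used = (size_t) len;

	for (i = 0; i < n; i++)
	{
		len = snprintf(buf + used, buflen - used, "%u,%u,%u,%u,%u\n",
					   recs[i].database, recs[i].spcnode, recs[i].filenode,
					   recs[i].forknum, recs[i].blocknum);
		if (len < 0 || (size_t) len >= buflen - used)
			return -1;
		used += (size_t) len;
	}
	return (int) used;
}

/*
 * Parse a dump produced by format_dump. Returns the number of records
 * actually read (which may be fewer than the header claims), or -1 if the
 * header is malformed.
 */
static int
parse_dump(const char *buf, Rec *recs, size_t max)
{
	unsigned	n;
	int			consumed = 0;
	size_t		i;

	if (sscanf(buf, "<<%u>>%n", &n, &consumed) != 1 || consumed == 0)
		return -1;
	buf += consumed;

	for (i = 0; i < n && i < max; i++)
	{
		Rec		   *r = &recs[i];

		if (sscanf(buf, " %u,%u,%u,%u,%u%n", &r->database, &r->spcnode,
				   &r->filenode, &r->forknum, &r->blocknum, &consumed) != 5)
			break;				/* truncated or hand-edited file */
		buf += consumed;
	}
	return (int) i;
}
```

Because parse_dump returns the count it actually read, a caller can size its later work from that value rather than from the header, which matters when the file has been edited by hand.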
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/contrib/pg_prewarm/pg_prewarm.h b/contrib/pg_prewarm/pg_prewarm.h
new file mode 100644
index 0000000..9e2a6e6
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm.h
@@ -0,0 +1,43 @@
+/*
+ * contrib/pg_prewarm/pg_prewarm.h
+ */
+#ifndef __PG_PREWARM_H__
+#define __PG_PREWARM_H__
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/catalog.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "executor/spi.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "port/atomics.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+#endif   /* __PG_PREWARM_H__ */
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..1538446 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally,
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,102 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This is an SQL-callable function that launches the <literal>autoprewarm</literal>
+   worker to dump the buffer pool information at regular intervals. Only one
+   <literal>autoprewarm</literal> worker can run in a server, so a worker that
+   finds another one already running exits immediately. The return value is
+   the pid of the worker that has been launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This is an SQL-callable function that makes the calling backend dump the
+   buffer pool information once, immediately. It can run in parallel
+   with the <literal>autoprewarm</literal> worker while that worker is dumping.
+   The return value is the number of block records dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about the blocks that
+  were present in the buffer pool before server shutdown and then prewarms the
+  buffer pool with those blocks upon server restart.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until there is no
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in the shared buffer pool. Upon the next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. Two values are special: if set to 0, timer-based
+      dumping is disabled and a dump happens only while the server is shutting
+      down; if set to -1, the running <literal>autoprewarm</literal> is stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.default_database</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.default_database</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The blocks of
+      global objects will not have a database associated with them. The
+      <literal>default_database</literal> will be used to connect and preload
+      such blocks.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take the free
+ * buffers, so a caller that strictly needs a free buffer should not rely on
+ * this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 15c72f5..ad6fb9d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -133,6 +133,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AuxProcType
@@ -206,10 +208,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
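
The documentation text in this patch describes prewarming as loading recorded blocks only while the buffer pool still has a free buffer, so that blocks brought in by recovery or by clients are never evicted. A minimal standalone sketch of that stopping rule follows. It assumes a toy pool that merely counts used buffers; `ToyPool`, `toy_have_free_buffer` and `toy_prewarm` are hypothetical names for illustration and are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pool: a fixed number of buffers and a count of used ones. */
typedef struct ToyPool
{
	size_t		nbuffers;		/* total buffers in the pool */
	size_t		nused;			/* buffers already holding a block */
} ToyPool;

/* Stand-in for have_free_buffer(): an advisory check, not a reservation. */
bool
toy_have_free_buffer(const ToyPool *pool)
{
	return pool->nused < pool->nbuffers;
}

/*
 * Try to load nrecorded blocks, stopping as soon as the pool is full.
 * Returns the number of blocks actually loaded.
 */
size_t
toy_prewarm(ToyPool *pool, size_t nrecorded)
{
	size_t		loaded = 0;

	while (loaded < nrecorded && toy_have_free_buffer(pool))
	{
		pool->nused++;			/* stands in for reading one block */
		loaded++;
	}
	return loaded;
}
```

With a pool of 8 buffers of which 5 are already in use, asking this sketch to prewarm 10 recorded blocks loads only 3 and leaves the 5 existing entries untouched.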

#69

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Mithun Cy (#68)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Tue, May 23, 2017 at 7:06 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Thanks, Andres,

I have tried to fix all of your comments.

There was a typo in the previous patch (07): I had used != where == was
intended. There was also a mistake in a comment. I have corrected both now.
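
For readers who want the gist of the dump file without applying the patch: it is plain text, one `database,tablespace,filenode,fork,block` line per buffer, written in sorted order so that loading tends to read sequentially. The following is only a standalone sketch of that ordering and line format. It assumes plain unsigned integers in place of the PostgreSQL types, and `ToyBlockInfo`, `toy_cmp`, `toy_format` and `toy_parse` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockInfoRecord, using plain unsigned ints. */
typedef struct ToyBlockInfo
{
	unsigned int database;
	unsigned int spcnode;
	unsigned int filenode;
	unsigned int forknum;
	unsigned int blocknum;
} ToyBlockInfo;

/* Compare one field; return from the enclosing function if it differs. */
#define CMP_FIELD(fld) \
	do { \
		if (a->fld < b->fld) return -1; \
		if (a->fld > b->fld) return 1; \
	} while (0)

/* qsort() comparator: database, tablespace, filenode, fork, block. */
int
toy_cmp(const void *p, const void *q)
{
	const ToyBlockInfo *a = p;
	const ToyBlockInfo *b = q;

	CMP_FIELD(database);
	CMP_FIELD(spcnode);
	CMP_FIELD(filenode);
	CMP_FIELD(forknum);
	CMP_FIELD(blocknum);
	return 0;
}

/* Write one record as a text line; returns snprintf's result. */
int
toy_format(char *buf, size_t buflen, const ToyBlockInfo *r)
{
	return snprintf(buf, buflen, "%u,%u,%u,%u,%u\n",
					r->database, r->spcnode, r->filenode,
					r->forknum, r->blocknum);
}

/* Parse one text line back into a record; returns 1 on success, else 0. */
int
toy_parse(const char *line, ToyBlockInfo *r)
{
	return sscanf(line, "%u,%u,%u,%u,%u",
				  &r->database, &r->spcnode, &r->filenode,
				  &r->forknum, &r->blocknum) == 5;
}
```

Sorting an array with `qsort(recs, n, sizeof(ToyBlockInfo), toy_cmp)` groups records of the same database together, which is what allows one sub-worker per database to walk a contiguous range.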

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_08.patch (application/octet-stream)

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..3df477c
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1135 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *			Automatically prewarm the shared buffer pool when server restarts.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "pg_prewarm.h"
+
+/*
+ * autoprewarm :
+ *
+ * What is it?
+ * ===========
+ * A bgworker which automatically records information about the blocks that
+ * were present in the buffer pool before server shutdown and then prewarms
+ * the buffer pool with those blocks upon server restart.
+ *
+ * How does it work?
+ * =================
+ * When the shared library "pg_prewarm" is preloaded, a
+ * bgworker "autoprewarm" is launched immediately after the server has reached
+ * a consistent state. The bgworker will start loading blocks recorded in the
+ * format BlockInfoRecord
+ * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * $PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the buffer
+ * pool. This way we do not replace any new blocks which were loaded either by
+ * the recovery process or the querying clients.
+ *
+ * Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ * start a new task to periodically dump the BlockInfoRecords related to blocks
+ * which are currently in the shared buffer pool. Upon next server restart, the
+ * bgworker will prewarm the buffer pool by loading those blocks. The GUC
+ * pg_prewarm.dump_interval will control the dumping activity of the bgworker.
+ */
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void SigtermHandler(SIGNAL_ARGS);
+static void SighupHandler(SIGNAL_ARGS);
+static void Sigusr1Handler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake it
+ *	up.
+ */
+static void
+SigtermHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the process to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+SighupHandler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm sub-workers will notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+Sigusr1Handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	types and variables used by autoprewarm	  =============
+ * ============================================================================
+ */
+
+/*
+ * Meta-data of each persistent block which is dumped and used to load.
+ */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcNode;		/* tablespace */
+	Oid			filenode;		/* relation's filenode. */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+} BlockInfoRecord;
+
+/*
+ * Tasks performed by autoprewarm workers.
+ */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO,	/* dump the buffer pool block info. */
+	TASK_DUMP_IMMEDIATE_ONCE,	/* dump the buffer pool block info immediately
+								 * once. */
+	TASK_END					/* no more tasks to do. */
+} AutoPrewarmTask;
+
+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock	   *lock;			/* protects SharedState */
+	AutoPrewarmTask current_task;		/* current tasks performed by
+										 * autoprewarm workers. */
+	bool		is_bgworker_running;	/* if set can't start another worker. */
+	bool		can_do_prewarm; /* if not set can't do prewarm task. */
+
+	/* dsa used to distribute block_infos among subworkers of prewarm task. */
+	dsa_handle	apw_dsa_handle;
+	uint32		apw_num_block_infos;	/* total number of sorted block_infos
+										 * in dsa. */
+	dsa_pointer apw_block_infos;/* dsa memory where above block_infos are
+								 * stored. */
+} AutoPrewarmSharedState;
+
+static dsa_area *AutoPrewarmDSA = NULL;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/*
+ * Kind of BlockInfoRecord in AUTOPREWARM_FILE file.
+ */
+typedef enum
+{
+	BLKTYPE_NEW_DATABASE,		/* first BlockInfoRecord of new database. */
+	BLKTYPE_NEW_RELATION,		/* first BlockInfoRecord of new relation. */
+	BLKTYPE_NEW_FORK,			/* first BlockInfoRecord of new fork file. */
+	BLKTYPE_NEW_BLOCK,			/* any next BlockInfoRecord. */
+	BLKTYPE_END					/* No More BlockInfoRecords available in dump
+								 * file. */
+} BlkType;
+
+/* GUC variable which controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable which says to which database we have to connect when
+ * BlockInfoRecord belongs to global objects.
+ */
+static char *default_database;
+
+/* compare one member; return from the caller if the two values differ. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp - compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcNode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_shm_state - on_shm_exit reset the prewarm state.
+ */
+
+static void
+reset_shm_state(int code, Datum arg)
+{
+	state->is_bgworker_running = false;
+	state->current_task = TASK_END;
+	if (AutoPrewarmDSA)
+	{
+		if (state->apw_block_infos != InvalidDsaPointer)
+		{
+			dsa_free(AutoPrewarmDSA, state->apw_block_infos);
+			state->apw_block_infos = InvalidDsaPointer;
+			state->apw_num_block_infos = 0;
+		}
+
+		dsa_detach(AutoPrewarmDSA);
+		AutoPrewarmDSA = NULL;
+	}
+}
+
+/*
+ * get_autoprewarm_task - get next task allowed and to be performed by the
+ * autoprewarm worker.
+ */
+static AutoPrewarmTask
+get_autoprewarm_task(AutoPrewarmTask todo_task)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+	{
+		/* First time through ... */
+		state->lock = &(GetNamedLWLockTranche("pg_autoprewarm"))->lock;
+		state->current_task = TASK_END;
+		state->is_bgworker_running = false;
+		state->can_do_prewarm = true;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+	LWLockAcquire(state->lock, LW_EXCLUSIVE);
+
+	/*
+	 * If already a bgworker is running we cannot run another. But if task is
+	 * to just dump immediate and there is no prewarm happening we can go
+	 * further.
+	 */
+	if (state->is_bgworker_running &&
+		(todo_task != TASK_DUMP_IMMEDIATE_ONCE ||
+		 state->current_task == TASK_PREWARM_BUFFERPOOL))
+	{
+		LWLockRelease(state->lock);
+		return TASK_END;
+	}
+
+	/*
+	 * If asked to do prewarm, check whether we can do so. We avoid prewarm if
+	 * it's already done on startup.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL && !state->can_do_prewarm)
+		todo_task = TASK_DUMP_BUFFERPOOL_INFO;
+
+	/*
+	 * For now if there was a previous attempt to prewarm or dump any further
+	 * request to prewarm will not be entertained.
+	 */
+	state->can_do_prewarm = false;
+
+	if (todo_task != TASK_DUMP_IMMEDIATE_ONCE)
+	{
+		state->is_bgworker_running = true;
+		state->current_task = todo_task;
+		on_shmem_exit(reset_shm_state, 0);
+	}
+
+	LWLockRelease(state->lock);
+	return todo_task;
+}
+
+/*
+ * getnextblockinfo -- given a BlkType get its next BlockInfoRecord.
+ */
+static BlkType
+getnextblockinfo(BlockInfoRecord *toload_block, uint32 curr_blockinfo_pos,
+				 BlkType reqblock, uint32 *next_blockinfo_pos)
+{
+	BlkType		nextblk;
+
+	while (true)
+	{
+		/* get next block; stop if there is no entry after the current one. */
+		if (curr_blockinfo_pos + 1 >= state->apw_num_block_infos)
+			return BLKTYPE_END; /* No more valid entry hence stop processing. */
+
+		if (toload_block[curr_blockinfo_pos].database !=
+			toload_block[curr_blockinfo_pos + 1].database)
+			nextblk = BLKTYPE_NEW_DATABASE;
+		else if (toload_block[curr_blockinfo_pos].filenode !=
+				 toload_block[curr_blockinfo_pos + 1].filenode)
+			nextblk = BLKTYPE_NEW_RELATION;
+		else if (toload_block[curr_blockinfo_pos].forknum !=
+				 toload_block[curr_blockinfo_pos + 1].forknum)
+			nextblk = BLKTYPE_NEW_FORK;
+		else
+			nextblk = BLKTYPE_NEW_BLOCK;
+
+		if (nextblk <= reqblock)
+		{
+			*next_blockinfo_pos = curr_blockinfo_pos + 1;
+			return nextblk;
+		}
+
+		curr_blockinfo_pos++;
+	}
+}
+
+/*
+ * connect_to_db -- connect to the given dbid.
+ *
+ * For global objects the dbid will be InvalidOid, connect to user given
+ * default_database and try to load those blocks.
+ */
+static void
+connect_to_db(Oid dbid)
+{
+	if (!OidIsValid(dbid))
+		BackgroundWorkerInitializeConnection(default_database, NULL);
+	else
+		BackgroundWorkerInitializeConnectionByOid(dbid, InvalidOid);
+	SetCurrentStatementStartTimestamp();
+	StartTransactionCommand();
+	SPI_connect();
+	PushActiveSnapshot(GetTransactionSnapshot());
+}
+
+/*
+ * load_one_database -- start of prewarm sub-worker, this will try to load
+ * blocks of one database starting from block info position passed by main
+ * prewarm worker.
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		curr_pos,
+				next_pos;
+	BlockInfoRecord *toload_block;
+	Relation	rel = NULL;
+	bool		have_dbconnection = false;
+	BlkType		loadblocktype;
+	BlockNumber nblocks = 0;
+	bool		found;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, SigtermHandler);
+	pqsignal(SIGHUP, SighupHandler);
+
+	/*
+	 * We're now ready to receive signals
+	 */
+	BackgroundWorkerUnblockSignals();
+
+	curr_pos = DatumGetInt64(main_arg);
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	LWLockRelease(AddinShmemInitLock);
+
+	Assert(found);
+	Assert(state->apw_dsa_handle);
+
+	AutoPrewarmDSA = dsa_attach(state->apw_dsa_handle);
+	toload_block = (BlockInfoRecord *)
+		dsa_get_address(AutoPrewarmDSA, state->apw_block_infos);
+
+	loadblocktype = BLKTYPE_NEW_DATABASE;
+	next_pos = curr_pos;
+
+	while (loadblocktype != BLKTYPE_END)
+	{
+		Buffer		buf;
+		Oid			reloid;
+
+		/*
+		 * Load the block only if there exists a free buffer. We do not want to
+		 * replace a block already in buffer pool.
+		 */
+		if (!have_free_buffer())
+			goto end_load;
+
+		if (got_sigterm)
+			goto end_load;
+
+		switch (loadblocktype)
+		{
+			case BLKTYPE_NEW_DATABASE:
+
+				if (have_dbconnection)
+					goto end_load;		/* blocks belong to a new database,
+										 * lets end the loading process. */
+
+				/*
+				 * connect to the database.
+				 */
+				connect_to_db(toload_block[next_pos].database);
+				have_dbconnection = true;
+
+			case BLKTYPE_NEW_RELATION:
+
+				/*
+				 * release lock on previous relation.
+				 */
+				if (rel)
+				{
+					relation_close(rel, AccessShareLock);
+					rel = NULL;
+				}
+
+				loadblocktype = BLKTYPE_NEW_RELATION;
+
+				/*
+				 * lock new relation.
+				 */
+				reloid =
+					RelidByRelfilenode(toload_block[next_pos].spcNode,
+									   toload_block[next_pos].filenode);
+
+				if (!OidIsValid(reloid))
+					break;
+
+				rel = try_relation_open(reloid, AccessShareLock);
+				if (!rel)
+					break;
+				RelationOpenSmgr(rel);
+
+			case BLKTYPE_NEW_FORK:
+
+				/*
+				 * check if fork exists and if block is within the range
+				 */
+				loadblocktype = BLKTYPE_NEW_FORK;
+				if (toload_block[next_pos].forknum <= InvalidForkNumber ||
+					toload_block[next_pos].forknum > MAX_FORKNUM ||
+					!smgrexists(rel->rd_smgr, toload_block[next_pos].forknum))
+					break;
+				nblocks = RelationGetNumberOfBlocksInFork(rel,
+											 toload_block[next_pos].forknum);
+			case BLKTYPE_NEW_BLOCK:
+
+				/* check if blocknum is valid and within fork file size. */
+				if (toload_block[next_pos].blocknum >= nblocks)
+				{
+					/* move to next forknum. */
+					loadblocktype = BLKTYPE_NEW_FORK;
+					break;
+				}
+
+				buf = ReadBufferExtended(rel, toload_block[next_pos].forknum,
+								 toload_block[next_pos].blocknum, RBM_NORMAL,
+										 NULL);
+				if (BufferIsValid(buf))
+				{
+					ReleaseBuffer(buf);
+				}
+
+				loadblocktype = BLKTYPE_NEW_BLOCK;
+				break;
+
+			case BLKTYPE_END:
+				Assert(0);		/* Should not be here! */
+		}
+
+		curr_pos = next_pos;
+		loadblocktype = getnextblockinfo(toload_block, curr_pos, loadblocktype,
+										 &next_pos);
+	}
+
+end_load:
+
+	dsa_detach(AutoPrewarmDSA);
+	/* release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		rel = NULL;
+	}
+
+	if (have_dbconnection)
+	{
+		SPI_finish();
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+	return;
+}
+
+/*
+ * launch_prewarm_subworker -- register a dynamic worker to load the blocks
+ * starting from next_db_pos. We wait until the worker has stopped.
+ */
+static void
+launch_prewarm_subworker(uint32 next_db_pos)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  Int64GetDatum(next_db_pos), BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	if (status == BGWH_STOPPED)
+		return;
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(0);
+}
+
+/*
+ *	prewarm_buffer_pool - the main routine which prewarm the buffer pool.
+ *
+ *	The prewarm bgworker first loads all of the BlockInfoRecords in
+ *	$PGDATA/AUTOPREWARM_FILE into a dsa. Those BlockInfoRecords are then
+ *	grouped by database, and for each group a sub-worker is launched to
+ *	load the corresponding blocks. The sub-workers run sequentially: each
+ *	one is launched only after the previous sub-worker has finished its
+ *	job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	static char dump_file_path[MAXPGPATH];
+	FILE	   *file = NULL;
+	uint32	   *next_db_pos;
+	size_t		next_db_pos_size;
+	uint32		this_dbs_elements = 0,
+				num_elements,
+				num_db = 0,
+				i;
+	Oid			prev_database;
+	BlockInfoRecord *blkinfo;
+
+	snprintf(dump_file_path, sizeof(dump_file_path), "%s",
+			 AUTOPREWARM_FILE);
+
+	file = fopen(dump_file_path, PG_BINARY_R);
+	if (!file)
+		return;					/* No file to load. */
+
+	if (!state->apw_dsa_handle)
+	{
+		AutoPrewarmDSA = dsa_create(state->lock->tranche);
+		state->apw_dsa_handle = dsa_get_handle(AutoPrewarmDSA);
+	}
+	else
+		AutoPrewarmDSA = dsa_attach(state->apw_dsa_handle);
+
+	if (fscanf(file, "<<%u>>", &num_elements) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("Error reading num of elements in \"%s\" for"
+						" autoprewarm : %m", dump_file_path)));
+	}
+
+	state->apw_block_infos =
+		dsa_allocate_extended(AutoPrewarmDSA,
+							  sizeof(BlockInfoRecord) * num_elements,
+							  DSA_ALLOC_NO_OOM);
+	if (state->apw_block_infos == InvalidDsaPointer)
+	{
+		fclose(file);
+		dsa_detach(AutoPrewarmDSA);
+		AutoPrewarmDSA = NULL;
+		return;
+	}
+
+	state->apw_num_block_infos = num_elements;
+
+	blkinfo = (BlockInfoRecord *)
+		dsa_get_address(AutoPrewarmDSA, state->apw_block_infos);
+
+	next_db_pos_size = 64;
+	next_db_pos = (uint32 *) palloc(sizeof(uint32) * next_db_pos_size);
+
+	/* read and fill block infos */
+	for (i = 0; i < num_elements; i++, blkinfo++)
+	{
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo->database,
+						&blkinfo->spcNode, &blkinfo->filenode,
+						(uint32 *) &blkinfo->forknum,
+						&blkinfo->blocknum))
+			break;				/* no more records found. */
+		if (i == 0)
+		{
+			next_db_pos[num_db++] = 0;
+			prev_database = blkinfo->database;
+		}
+		else if (prev_database != blkinfo->database)
+		{
+			if (num_db >= next_db_pos_size)
+			{
+				next_db_pos_size *= 2;
+				next_db_pos = (uint32 *) repalloc(next_db_pos,
+										  sizeof(uint32) * next_db_pos_size);
+			}
+
+			next_db_pos[num_db++] = i;
+			this_dbs_elements = 0;
+			prev_database = blkinfo->database;
+		}
+
+		this_dbs_elements++;
+	}
+
+	fclose(file);
+	i = 0;
+
+	/* get next database's first block info's position. */
+	while (!got_sigterm && i < num_db)
+	{
+		/*
+		 * Register a sub-worker to load new database's block. Wait until the
+		 * sub-worker finishes its job before launching the next sub-worker.
+		 */
+		launch_prewarm_subworker(next_db_pos[i++]);
+	}
+
+	pfree(next_db_pos);
+
+	if (AutoPrewarmDSA)
+	{
+		if (state->apw_block_infos != InvalidDsaPointer)
+		{
+			dsa_free(AutoPrewarmDSA, state->apw_block_infos);
+			state->apw_block_infos = InvalidDsaPointer;
+			state->apw_num_block_infos = 0;
+		}
+
+		dsa_detach(AutoPrewarmDSA);
+		AutoPrewarmDSA = NULL;
+	}
+
+	ereport(LOG, (errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/* ============================================================================
+ * =============	buffer pool info dump part of autoprewarm	===============
+ * ============================================================================
+ */
+
+/* This sub-module periodically dumps the buffer pool's block info into
+ * the dump file AUTOPREWARM_FILE.
+ * Each block info entry looks like this:
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we call it a
+ * BlockInfoRecord. Note that we write in text form so that the dump
+ * information is readable and, if necessary, can be carefully edited.
+ *
+ * The prewarm task reads these BlockInfoRecords one by one in sequence and
+ * distributes them among its sub-workers to load the corresponding blocks.
+ */
+
+/*
+ *	dump_now - the main routine which goes through each buffer header of the
+ *	buffer pool and dumps its metadata. We sort this data before dumping it;
+ *	sorting is necessary as it facilitates sequential reads during load.
+ */
+static uint32
+dump_now(void)
+{
+	static char dump_file_path[MAXPGPATH],
+				transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret,
+				buflen;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	int			fd;
+	char		buf[1024];
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].spcNode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	/*
+	 * sort the block number to increase chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+			 blockinfo_cmp);
+
+	snprintf(transient_dump_file_path, sizeof(transient_dump_file_path),
+			 "%s.%d", AUTOPREWARM_FILE, MyProcPid);
+
+	fd = OpenTransientFile(transient_dump_file_path,
+						   O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	snprintf(dump_file_path, sizeof(dump_file_path),
+			 "%s", AUTOPREWARM_FILE);
+	buflen = sprintf(buf, "<<%u>>\n", num_blocks);
+	if (write(fd, buf, buflen) < buflen)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error writing to \"%s\" : %m",
+						dump_file_path)));
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			return 0;
+		}
+
+		buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+						 block_info_array[i].database,
+						 block_info_array[i].spcNode,
+						 block_info_array[i].filenode,
+						 (uint32) block_info_array[i].forknum,
+						 block_info_array[i].blocknum);
+
+		if (write(fd, buf, buflen) < buflen)
+		{
+			CloseTransientFile(fd);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("autoprewarm: error writing to \"%s\" : %m",
+							dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * rename transient_dump_file_path to dump_file_path to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(fd);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error closing \"%s\" : %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, dump_file_path, LOG);
+
+	ereport(LOG, (errmsg("autoprewarm: saved metadata info of %u blocks",
+						 num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = GetCurrentTimestamp();
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now();
+				if (got_sigterm)
+					return;		/* got shutdown signal just after a dump. And,
+								 * I think better to return now. */
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* One last block meta info dump during postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now();
+}
+
+/*
+ * autoprewarm_main -- the main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask next_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, SigtermHandler);
+	pqsignal(SIGHUP, SighupHandler);
+	pqsignal(SIGUSR1, Sigusr1Handler);
+
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	next_task = get_autoprewarm_task(DatumGetInt32(main_arg));
+
+	ereport(LOG, (errmsg("autoprewarm has started")));
+
+	/*
+	 * **** perform autoprewarm's next task	****
+	 */
+	if (next_task == TASK_PREWARM_BUFFERPOOL)
+	{
+		prewarm_buffer_pool();
+		/* prewarm is done, let's move to TASK_DUMP_BUFFERPOOL_INFO. */
+		state->current_task = TASK_DUMP_BUFFERPOOL_INFO;
+		next_task = TASK_DUMP_BUFFERPOOL_INFO;
+	}
+
+	if (next_task == TASK_DUMP_BUFFERPOOL_INFO)
+	{
+		dump_block_info_periodically();
+
+		/*
+		 * downgrade to TASK_DUMP_IMMEDIATE_ONCE so others can start
+		 * TASK_DUMP_BUFFERPOOL_INFO
+		 */
+		state->current_task = TASK_DUMP_IMMEDIATE_ONCE;
+	}
+
+	ereport(LOG, (errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/* Register autoprewarm load bgworker. */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm;
+
+	/* Define custom GUC variables. */
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+							" If set to -1, stops the running autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	DefineCustomStringVariable("pg_prewarm.default_database",
+				"default database to connect if dump has not recorded same.",
+							   NULL,
+							   &default_database,
+							   "postgres",
+							   PGC_POSTMASTER,
+							   0,
+							   NULL,
+							   NULL,
+							   NULL);
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Request additional shared resources */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+	RequestNamedLWLockTranche("pg_autoprewarm", 1);
+
+	/* Has been set not to prewarm/dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&autoprewarm, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&autoprewarm);
+}
+
+/*
+ * Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* Has been set not to prewarm/dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	AutoPrewarmTask next_task;
+
+	/* dump only if prewarm is not in progress. */
+	next_task = get_autoprewarm_task(TASK_DUMP_IMMEDIATE_ONCE);
+	if (next_task == TASK_DUMP_IMMEDIATE_ONCE)
+		PG_RETURN_INT64(dump_now());
+	PG_RETURN_NULL();
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/contrib/pg_prewarm/pg_prewarm.h b/contrib/pg_prewarm/pg_prewarm.h
new file mode 100644
index 0000000..9e2a6e6
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm.h
@@ -0,0 +1,43 @@
+/*
+ * contrib/pg_prewarm/pg_prewarm.h
+ */
+#ifndef __PG_PREWARM_H__
+#define __PG_PREWARM_H__
+#include "postgres.h"
+
+#include <sys/stat.h>
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/catalog.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "executor/spi.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "port/atomics.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/buf_internals.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+#endif   /* __PG_PREWARM_H__ */
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..1538446 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,102 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This is a SQL-callable function that launches the <literal>autoprewarm</literal>
+   worker to dump the buffer pool information at regular intervals. A server
+   can run only one <literal>autoprewarm</literal> worker, so if the worker sees
+   another existing worker it will exit immediately. The return value is the PID
+   of the worker which has been launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This is a SQL-callable function that makes the calling backend dump the
+   buffer pool information once, immediately. It can run in parallel
+   with the <literal>autoprewarm</literal> worker while it is dumping.
+   The return value is the number of block info records dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in the buffer pool before server shutdown and then prewarms the buffer
+  pool upon server restart with those blocks.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until there is no
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in shared buffer pool. Upon next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. It also takes special values. If set to 0, timer
+      based dumping is disabled and a dump happens only at server shutdown.
+      If set to -1, the running <literal>autoprewarm</literal> will be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.default_database</varname> (<type>string</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.default_database</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The blocks of
+      global objects will not have a database associated with them. The
+      <literal>default_database</literal> will be used to connect and preload
+      such blocks.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take the free
+ * buffers, so a caller that strictly needs a free buffer should not rely on
+ * this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 15c72f5..ad6fb9d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -133,6 +133,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AuxProcType
@@ -206,10 +208,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData

#70

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Mithun Cy (#69)

Re: Proposal : For Auto-Prewarm.

On Wed, May 24, 2017 at 6:28 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, May 23, 2017 at 7:06 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Thanks, Andres,

I have tried to fix all of your comments.

There was a typo issue in previous patch 07 where instead of == I have
used !=. And, a mistake in comments. I have corrected same now.

+/*
+ * autoprewarm :
+ *
+ * What is it?
+ * ===========
+ * A bgworker which automatically records information about blocks which were
+ * present in buffer pool before server shutdown and then prewarm the buffer
+ * pool upon server restart with those blocks.
+ *
+ * How does it work?
+ * =================
+ * When the shared library "pg_prewarm" is preloaded, a
+ * bgworker "autoprewarm" is launched immediately after the server has reached
+ * consistent state. The bgworker will start loading blocks recorded in the
+ * format BlockInfoRecord
+ * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * $PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the buffer
+ * pool. This way we do not replace any new blocks which were loaded either by
+ * the recovery process or the querying clients.
+ *
+ * Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ * start a new task to periodically dump the BlockInfoRecords related to blocks
+ * which are currently in shared buffer pool. Upon next server restart, the
+ * bgworker will prewarm the buffer pool by loading those blocks. The GUC
+ * pg_prewarm.dump_interval will control the dumping activity of the bgworker.
+ */

Make this part of the file header comment. Also, add an enabling GUC.
The default can be true, but it should be possible to preload the
library so that the SQL functions are available without a dynamic
library load without requiring you to get the auto-prewarm behavior.
I suggest pg_prewarm.autoprewarm = true / false.

Your SigtermHandler and SighupHandler routines are still capitalized
in a way that's not very consistent with what we do elsewhere. I
think all of our other signal handlers have names_like_this() not
NamesLikeThis().

+ * ============== types and variables used by autoprewam =============

Spelling.

+ * Meta-data of each persistent block which is dumped and used to load.

Metadata

+typedef struct BlockInfoRecord
+{
+    Oid            database;        /* database */
+    Oid            spcNode;        /* tablespace */
+    Oid            filenode;        /* relation's filenode. */
+    ForkNumber    forknum;        /* fork number */
+    BlockNumber blocknum;        /* block number */
+} BlockInfoRecord;

spcNode is capitalized differently from all of the other members.

+ LWLock *lock; /* protects SharedState */

Just declare this as LWLock lock, and initialize it using
LWLockInitialize. The way you're doing it is more complicated.

+static dsa_area *AutoPrewarmDSA = NULL;

DSA seems like more than you need here. There's only one allocation
being performed. I think it would be simpler and less error-prone to
use DSM directly. I don't even think you need a shm_toc. You could
just do:

seg = dsm_create(segsize, 0);
blkinfo = dsm_segment_address(seg);

Then pass dsm_segment_handle(seg) to the worker using bgw_main_arg or
bgw_extra, and have it call dsm_attach. An advantage of this approach
is that you only allocate the memory you actually need, whereas DSA
will allocate more, expecting that you might do further allocations.

+    pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+             blockinfo_cmp);

I think it would make more sense to sort the array on the load side,
in the autoprewarm worker, rather than when dumping. First, many
dumps will never be reloaded, so there's no real reason to waste time
sorting them. Second, the way you've got the code right now, it
relies heavily on the assumption that the dump file will be sorted,
but that doesn't seem like a necessary assumption. We don't really
expect users to hand-edit the dump files, but if they do, it'd be nice
if it didn't randomly break things unnecessarily.
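
A minimal standalone sketch of sorting on the load side, assuming the records have already been read from the dump file into an array. BlockRec, blockrec_cmp and sort_loaded_records are hypothetical names used only for illustration; BlockRec stands in for the patch's BlockInfoRecord, with plain integers replacing Oid, ForkNumber and BlockNumber so that it compiles without PostgreSQL headers. It uses the C library qsort() where the patch uses pg_qsort().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for BlockInfoRecord (no PostgreSQL headers needed). */
typedef struct BlockRec
{
	uint32_t	database;
	uint32_t	spcnode;
	uint32_t	filenode;
	int32_t		forknum;
	uint32_t	blocknum;
} BlockRec;

/* Compare one field; fall through to the next field when equal. */
#define CMP_FIELD(fld) \
	do { \
		if (a->fld < b->fld) \
			return -1; \
		if (a->fld > b->fld) \
			return 1; \
	} while (0)

/* Order by database, tablespace, filenode, fork, then block number. */
static int
blockrec_cmp(const void *p, const void *q)
{
	const BlockRec *a = (const BlockRec *) p;
	const BlockRec *b = (const BlockRec *) q;

	CMP_FIELD(database);
	CMP_FIELD(spcnode);
	CMP_FIELD(filenode);
	CMP_FIELD(forknum);
	CMP_FIELD(blocknum);
	return 0;
}

/*
 * Sort records after reading them, so the loader never depends on the
 * order in which the dump file happened to be written (or hand-edited).
 */
static void
sort_loaded_records(BlockRec *recs, size_t nrecs)
{
	qsort(recs, nrecs, sizeof(BlockRec), blockrec_cmp);
}
```

With this arrangement the dump path can write records in buffer order, and only a server that actually reloads the file pays for the sort.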

+                 errmsg("autoprewarm: could not open \"%s\": %m",
+                        dump_file_path)));

Use standard error message wordings! Don't create new translatable
messages by adding "autoprewarm" to error messages. There are
multiple instances of this throughout the patch.

+    snprintf(dump_file_path, sizeof(dump_file_path),
+             "%s", AUTOPREWARM_FILE);

This is completely pointless. You can get rid of the dump_file_path
variable and just use AUTOPREWARM_FILE wherever you would have used
dump_file_path. It's just making a copy of a string you already have.
Also, this code interrupts the flow of the surrounding logic in a
weird way; even if we needed it, it would make more sense to put it up
higher, where we construct the temporary path, or down lower, where
the value is actually needed.

+    SPI_connect();
+    PushActiveSnapshot(GetTransactionSnapshot());

It is really unclear why you need this, since the code does not use
SPI for anything, or do anything that needs a snapshot.

+ goto end_load;

I think you should try to rewrite this function so that it doesn't
need "goto". I also think in general that this function could be
written in a much more direct way by getting rid of the switch and the
BLKTYPE_* constants altogether. connect_to_db() can only happen once,
so do that at the beginning. After that, the logic can look roughly like
this:

BlockInfoRecord *old_blk = NULL;

while (!got_sigterm && pos < maxpos && have_free_buffer())
{
    BlockInfoRecord *blk = block_info[pos];

    /* Quit if we've reached records for another database. */
    if (old_blk != NULL && old_blk->dbOid != blk->dbOid)
        break;

    /*
     * When we reach a new relation, close the old one. Note, however,
     * that the previous try_relation_open may have failed, in which case
     * rel will be NULL.
     */
    if (old_blk != NULL && old_blk->relOid != blk->relOid && rel != NULL)
    {
        relation_close(rel, AccessShareLock);
        rel = NULL;
    }

    /*
     * Try to open each new relation, but only once, when we first
     * encounter it. If it's been dropped, skip the associated blocks.
     */
    if (old_blk == NULL || old_blk->relOid != blk->relOid)
    {
        Assert(rel == NULL);
        rel = try_relation_open(blk->relOid, AccessShareLock);
    }
    if (!rel)
    {
        ++pos;
        continue;
    }

    /* Once per fork, check for fork existence and size. */
    if (old_blk == NULL || old_blk->forkNumber != blk->forkNumber)
    {
        RelationOpenSmgr(rel);
        if (smgrexists(rel->rd_smgr, blk->forkNumber))
            nblocks = RelationGetNumberOfBlocksInFork(...);
        else
            nblocks = 0;
    }

    /* Prewarm buffer. */
    buf = ReadBufferExtended(...);
    ...
    ++pos;
}
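
The control flow of that loop can be tried outside the server with a small standalone simulation. This is only a sketch under stated assumptions: BlockRec, LoadStats, mock_relation_open and load_one_database_sketch are hypothetical names, the mock treats filenode 0 as a dropped relation, and the got_sigterm, have_free_buffer() and per-fork checks are left out. It also remembers the previous record before skipping, so a dropped relation is probed once rather than once per block; that is a choice made in this sketch and is not something the outline above spells out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for BlockInfoRecord. */
typedef struct BlockRec
{
	uint32_t	database;
	uint32_t	filenode;
	uint32_t	blocknum;
} BlockRec;

typedef struct LoadStats
{
	int			relations_opened;	/* open attempts, one per relation */
	int			blocks_loaded;		/* blocks that would be read */
} LoadStats;

/* Mock for try_relation_open(): filenode 0 plays a dropped relation. */
static bool
mock_relation_open(uint32_t filenode)
{
	return filenode != 0;
}

/*
 * Walk sorted records starting at pos, stop at the first record of another
 * database, and return the position where the next worker would start.
 */
static size_t
load_one_database_sketch(const BlockRec *recs, size_t pos, size_t maxpos,
						 LoadStats *stats)
{
	const BlockRec *old = NULL;
	bool		rel_open = false;

	while (pos < maxpos)
	{
		const BlockRec *blk = &recs[pos];

		/* Quit if we've reached records for another database. */
		if (old != NULL && old->database != blk->database)
			break;

		/* Try to open each new relation only once. */
		if (old == NULL || old->filenode != blk->filenode)
		{
			rel_open = mock_relation_open(blk->filenode);
			stats->relations_opened++;
		}
		old = blk;

		/* Skip blocks of relations that could not be opened. */
		if (rel_open)
			stats->blocks_loaded++;
		++pos;
	}
	return pos;
}
```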

+                 errhint("Kill all remaining database processes and restart"
+                         " the database.")));

Don't break strings across lines. Just put it all on one (long) line.

I don't think you should really need default_database. It seems like
it should be possible to jigger things so that those blocks are loaded
together with some other database (say, let the first worker do it).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#71

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#70)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Thanks Robert,

On Wed, May 24, 2017 at 5:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

+ *
+ * Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ * start a new task to periodically dump the BlockInfoRecords related to blocks
+ * which are currently in shared buffer pool. Upon next server restart, the
+ * bgworker will prewarm the buffer pool by loading those blocks. The GUC
+ * pg_prewarm.dump_interval will control the dumping activity of the bgworker.
+ */

Make this part of the file header comment.

-- Thanks, I have moved it into the file header description.

Also, add an enabling GUC.
The default can be true, but it should be possible to preload the
library so that the SQL functions are available without a dynamic
library load without requiring you to get the auto-prewarm behavior.
I suggest pg_prewarm.autoprewarm = true / false.

-- Thanks, I have added a boolean GUC pg_prewarm.autoprewarm with
default true. I have made it a PGC_POSTMASTER-level variable, since
the intention here is to avoid starting the autoprewarm worker. I need
help here: have I missed anything? Currently, the SQL-callable
functions autoprewarm_dump_now() and launch_autoprewarm_dump() are
available only after the CREATE EXTENSION pg_prewarm command; will that
change now?
There is another GUC, pg_prewarm.dump_interval; if it is set to -1 we
stop the running autoprewarm worker. I am unsure whether we should
combine these two settings into one that controls the state of the
autoprewarm worker.

Your SigtermHandler and SighupHandler routines are still capitalized
in a way that's not very consistent with what we do elsewhere. I
think all of our other signal handlers have names_like_this() not
NamesLikeThis().

-- The handler functions are renamed, for example apw_sigterm_handler,
similar to the naming in autovacuum.c.

+ * ============== types and variables used by autoprewam =============

Spelling.

-- Fixed, Sorry.

+ * Meta-data of each persistent block which is dumped and used to load.

Metadata

-- Fixed.

+typedef struct BlockInfoRecord
+{
+    Oid            database;        /* database */
+    Oid            spcNode;        /* tablespace */
+    Oid            filenode;        /* relation's filenode. */
+    ForkNumber    forknum;        /* fork number */
+    BlockNumber blocknum;        /* block number */
+} BlockInfoRecord;

spcNode is capitalized differently from all of the other members.

-- renamed from spcNode to spcnode.

+ LWLock *lock; /* protects SharedState */

Just declare this as LWLock lock, and initialize it using
LWLockInitialize. The way you're doing it is more complicated.

-- Fixed as suggested
LWLockInitialize(&state->lock, LWLockNewTrancheId());

+static dsa_area *AutoPrewarmDSA = NULL;

DSA seems like more than you need here. There's only one allocation
being performed. I think it would be simpler and less error-prone to
use DSM directly. I don't even think you need a shm_toc. You could
just do:

seg = dsm_create(segsize, 0);
blkinfo = dsm_segment_address(seg);
Then pass dsm_segment_handle(seg) to the worker using bgw_main_arg or
bgw_extra, and have it call dsm_attach. An advantage of this approach
is that you only allocate the memory you actually need, whereas DSA
will allocate more, expecting that you might do further allocations.

-- Thanks, fixed. We pass the following arguments to the sub-workers
through bgw_extra

/*
 * The block_infos allocated to each sub-worker to do prewarming.
 */
typedef struct prewarm_elem
{
    dsm_handle  block_info_handle;  /* handle to dsm seg of block_infos */
    Oid         database;           /* database to connect and load */
    uint32      start_pos;          /* start position within block_infos from
                                     * which sub-worker starts prewarming blocks. */
    uint32      end_of_blockinfos;  /* End of block_infos in dsm */
} prewarm_elem;

to distribute the prewarm work among sub-workers.
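
As a rough illustration of how sorted block_infos could be handed out per database, here is a standalone sketch. BlockRec, WorkRange and split_by_database are hypothetical names; WorkRange leaves out the dsm handle and, unlike end_of_blockinfos above, records the end of each database's own range, which is an assumption of this sketch and not necessarily what the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for BlockInfoRecord. */
typedef struct BlockRec
{
	uint32_t	database;
	uint32_t	blocknum;
} BlockRec;

/* Hypothetical per-worker assignment: the half-open range [start_pos, end_pos). */
typedef struct WorkRange
{
	uint32_t	database;
	uint32_t	start_pos;
	uint32_t	end_pos;
} WorkRange;

/*
 * Split records (already sorted by database) into one range per database.
 * Returns the number of ranges written, at most max_out.
 */
static size_t
split_by_database(const BlockRec *recs, size_t nrecs,
				  WorkRange *out, size_t max_out)
{
	size_t		nranges = 0;
	size_t		pos = 0;

	while (pos < nrecs && nranges < max_out)
	{
		size_t		start = pos;
		uint32_t	db = recs[pos].database;

		/* Advance to the first record of the next database. */
		while (pos < nrecs && recs[pos].database == db)
			pos++;

		out[nranges].database = db;
		out[nranges].start_pos = (uint32_t) start;
		out[nranges].end_pos = (uint32_t) pos;
		nranges++;
	}
	return nranges;
}
```

Each range could then be copied into bgw_extra for one sub-worker, which connects to the named database and loads only its own slice.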

+    pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+             blockinfo_cmp);
I think it would make more sense to sort the array on the load side,
in the autoprewarm worker, rather than when dumping.

-- Fixed. Now sorting block_infos just before loading the blocks.

+                 errmsg("autoprewarm: could not open \"%s\": %m",
+                        dump_file_path)));
Use standard error message wordings! Don't create new translatable
messages by adding "autoprewarm" to error messages. There are
multiple instances of this throughout the patch.

-- Removed the "autoprewarm" prefix from the error messages in all places.

+    snprintf(dump_file_path, sizeof(dump_file_path),
+             "%s", AUTOPREWARM_FILE);
This is completely pointless. You can get rid of the dump_file_path

-- dump_file_path is removed.

+    SPI_connect();
+    PushActiveSnapshot(GetTransactionSnapshot());
It is really unclear why you need this, since the code does not use
SPI for anything, or do anything that needs a snapshot.

-- Sorry, removed it now.

+ goto end_load;

I think you should try to rewrite this function so that it doesn't
need "goto". I also think in general that this function could be
written in a much more direct way by getting rid of the switch and the
BLKTYPE_* constants altogether. connect_to_db() can only happen once,
so do that the beginning. After that, the logic can look roughly like
this:

-- Fixed, using the exact code framework you suggested.

+                 errhint("Kill all remaining database processes and restart"
+                         " the database.")));
Don't break strings across lines. Just put it all on one (long) line.

-- Fixed. I had split the string because it went beyond 80 characters,
but I have now put it on one line as you suggested.

I don't think you should really need default_database. It seems like
it should be possible to jigger things so that those blocks are loaded
together with some other database (say, let the first worker do it).

-- Fixed. block_infos with database 0 will be combined with the next
database's block_info load. One problem which I have left open: what
if the dump file contains only block_infos with database 0? With
default_database we could have handled such a case. After removing it, I
am ignoring the block_infos of database 0 in such cases. Is that okay?
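
A small standalone sketch of that rule, assuming the records are sorted so that database 0 entries come first. BlockRec and first_worker_range are hypothetical names for illustration only. The function returns the end of the first worker's range, covering the database 0 records plus the next database's records, and returns 0 when the file holds nothing but database 0 records, which is the case being ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for BlockInfoRecord. */
typedef struct BlockRec
{
	uint32_t	database;
	uint32_t	blocknum;
} BlockRec;

/*
 * Decide what the first worker loads. Records for database 0 (global
 * objects) sort first, so they are folded into the range of the first real
 * database. On success *connect_db is set to that database and the end of
 * the range is returned. If there is no real database, 0 is returned and
 * *connect_db is left untouched.
 */
static size_t
first_worker_range(const BlockRec *recs, size_t nrecs, uint32_t *connect_db)
{
	size_t		pos = 0;

	/* Step over the global records. */
	while (pos < nrecs && recs[pos].database == 0)
		pos++;

	/* Only global records (or none at all): nothing to connect to. */
	if (pos == nrecs)
		return 0;

	*connect_db = recs[pos].database;

	/* Extend the range over the first real database. */
	while (pos < nrecs && recs[pos].database == *connect_db)
		pos++;

	return pos;
}
```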

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_09.patchapplication/octet-stream; name=autoprewarm_09.patchDownload

commit 601bd8a69c32d2d3a286e7895c7b4d05d713d479
Author: mithun <mithun@localhost.localdomain>
Date:   Tue May 30 10:06:02 2017 +0530

    autoprewarm_09.patch

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..ac83c08
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1030 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarm the shared buffer pool when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		It is a bgworker which automatically records information about blocks
+ *		which were present in buffer pool before server shutdown and then
+ *		prewarm the buffer pool upon server restart with those blocks.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached consistent state. The bgworker will start loading blocks
+ *		recorded in the format BlockInfoRecord
+ *		<<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ *		$PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the
+ *		buffer pool. This way we do not replace any new blocks which were
+ *		loaded either by the recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		blocks which are currently in shared buffer pool. Upon next server
+ *		restart, the bgworker will prewarm the buffer pool by loading those
+ *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
+ *		activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "autoprewarm.h"
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+static void apw_sigusr1_handler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake it
+ *	up.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the process to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm sub-workers will notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+apw_sigusr1_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/*
+ * Metadata of each persistent block which is dumped and used to load.
+ */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcnode;		/* tablespace */
+	Oid			filenode;		/* relation's filenode. */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+} BlockInfoRecord;
+
+/*
+ * Tasks performed by autoprewarm workers.
+ */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO,	/* dump the buffer pool block info. */
+	TASK_DUMP_IMMEDIATE_ONCE,	/* dump the buffer pool block info immediately
+								 * once. */
+	TASK_END					/* no more tasks to do. */
+} AutoPrewarmTask;
+
+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* protects SharedState */
+	AutoPrewarmTask current_task;		/* current tasks performed by
+										 * autoprewarm workers. */
+	bool		is_bgworker_running;	/* if set, can't start another worker. */
+	bool		can_do_prewarm; /* if set, the prewarm task may still be run. */
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/* dsm used during TASK_PREWARM_BUFFERPOOL to store read BlockInfoRecord's. */
+static dsm_segment *seg = NULL;
+
+/*
+ * The block_infos allocated to each sub-worker to do prewarming.
+ */
+typedef struct prewarm_elem
+{
+	dsm_handle	block_info_handle;		/* handle to dsm seg of block_infos */
+	Oid			database;		/* database to connect and load */
+	uint32		start_pos;		/* start position within block_infos from
+								 * which sub-worker starts prewarming blocks. */
+	uint32		end_of_blockinfos;		/* End of block_infos in dsm */
+} prewarm_elem;
+
+/* GUC variable which control the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable which say whether autoprewarm worker has to be started when
+ * preloaded.
+ */
+static bool autoprewarm = true;
+
+/* compare one member of a and b; fall through only if they are equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp - compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	const BlockInfoRecord *a = (const BlockInfoRecord *) p;
+	const BlockInfoRecord *b = (const BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcnode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_shm_state - on_shm_exit reset the prewarm state.
+ */
+
+static void
+reset_shm_state(int code, Datum arg)
+{
+	state->is_bgworker_running = false;
+	state->current_task = TASK_END;
+}
+
+/*
+ * detach_blkinfos - on_shm_exit detach the dsm allocated for blockinfos.
+ */
+static void
+detach_blkinfos(int code, Datum arg)
+{
+	if (seg != NULL)
+		dsm_detach(seg);
+}
+
+/*
+ * get_autoprewarm_task - get next task allowed and to be performed by the
+ * autoprewarm worker.
+ */
+static AutoPrewarmTask
+get_autoprewarm_task(AutoPrewarmTask todo_task)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&state->lock, LWLockNewTrancheId());
+		state->current_task = TASK_END;
+		state->is_bgworker_running = false;
+		state->can_do_prewarm = true;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+
+	/*
+	 * If a bgworker is already running, we cannot run another. The exception
+	 * is a request for an immediate one-time dump, which may proceed as long
+	 * as no prewarm is in progress.
+	 */
+	if (state->is_bgworker_running &&
+		(todo_task != TASK_DUMP_IMMEDIATE_ONCE ||
+		 state->current_task == TASK_PREWARM_BUFFERPOOL))
+	{
+		LWLockRelease(&state->lock);
+		return TASK_END;
+	}
+
+	/*
+	 * If asked to do prewarm, check whether we can do so. We avoid prewarm if
+	 * it has already been done on startup.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL && !state->can_do_prewarm)
+		todo_task = TASK_DUMP_BUFFERPOOL_INFO;
+
+	/*
+	 * For now, once there has been an attempt to prewarm or dump, any further
+	 * request to prewarm will not be entertained.
+	 */
+	state->can_do_prewarm = false;
+
+	if (todo_task != TASK_DUMP_IMMEDIATE_ONCE)
+	{
+		state->is_bgworker_running = true;
+		state->current_task = todo_task;
+		on_shmem_exit(reset_shm_state, 0);
+	}
+
+	LWLockRelease(&state->lock);
+	return todo_task;
+}
+
+/*
+ * load_one_database -- start of prewarm sub-worker, this will try to load
+ * blocks of one database starting from block info position passed by main
+ * prewarm worker.
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	prewarm_elem pelem;
+	BlockInfoRecord *old_blk;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+
+	/*
+	 * We're now ready to receive signals
+	 */
+	BackgroundWorkerUnblockSignals();
+
+	memcpy(&pelem, MyBgworkerEntry->bgw_extra, sizeof(prewarm_elem));
+
+	seg = dsm_attach(pelem.block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to map dynamic shared memory segment")));
+	on_shmem_exit(detach_blkinfos, 0);
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(pelem.database, InvalidOid);
+	SetCurrentStatementStartTimestamp();
+	StartTransactionCommand();
+
+	old_blk = NULL;
+	pos = pelem.start_pos;
+
+	while (!got_sigterm && pos < pelem.end_of_blockinfos && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos];
+		Buffer		buf;
+
+		/*
+		 * Quit if we've reached records for another database. Unless the
+		 * previous blocks were of global objects which were combined with
+		 * next database's block infos.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * When we reach a new relation, close the old one.  Note, however,
+		 * that the previous try_relation_open may have failed, in which case
+		 * rel will be NULL.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode && rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it.  If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			reloid = RelidByRelfilenode(blk->spcnode, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+		}
+		if (!rel)
+		{
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL || old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+			if (smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* check if blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* skip this record; the block is past the end of the fork. */
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+			ReleaseBuffer(buf);
+
+		old_blk = blk;
+		++pos;
+	}
+
+	dsm_detach(seg);
+	seg = NULL;
+
+	/* release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		rel = NULL;
+	}
+
+	CommitTransactionCommand();
+	return;
+}
+
+/*
+ * launch_prewarm_subworker -- register a dynamic worker to load the blocks
+ * starting from next_db_pos. We wait until the worker has stopped.
+ */
+static void
+launch_prewarm_subworker(prewarm_elem *pelem)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  (Datum) NULL, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+	memcpy(worker.bgw_extra, pelem, sizeof(prewarm_elem));
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	if (status == BGWH_STOPPED)
+		return;
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(0);
+}
+
+/*
+ *	prewarm_buffer_pool - the main routine which prewarms the buffer pool.
+ *
+ *	The prewarm bgworker first loads all of the BlockInfoRecords in
+ *	$PGDATA/AUTOPREWARM_FILE into a dsm segment. Those BlockInfoRecords are
+ *	then grouped by database, and for each group of BlockInfoRecords a
+ *	sub-worker is launched to load the corresponding blocks. Sub-workers are
+ *	launched sequentially: each one starts only after the previous sub-worker
+ *	has finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	FILE	   *file = NULL;
+	uint32	   *next_db_pos;
+	size_t		next_db_pos_size;
+	uint32		this_dbs_elements = 0,
+				num_elements,
+				num_db = 0,
+				i;
+	Oid			prev_database;
+	BlockInfoRecord *blkinfo;
+
+	file = fopen(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR, (errcode_for_file_access(),
+							errmsg("could not read file \"%s\": %m",
+								   AUTOPREWARM_FILE)));
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>", &num_elements) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("Error reading num of elements in \"%s\" for"
+						" autoprewarm : %m", AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+	on_shmem_exit(detach_blkinfos, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].spcnode, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	num_elements = i;
+
+	/*
+	 * sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+	next_db_pos_size = 64;
+	next_db_pos = (uint32 *) palloc(sizeof(uint32) * next_db_pos_size);
+
+	/* record the position at which each database's block infos start */
+	for (i = 0; i < num_elements; i++)
+	{
+		if (i == 0)
+		{
+			prev_database = blkinfo[i].database;
+			next_db_pos[num_db++] = 0;
+		}
+		else if (prev_database != blkinfo[i].database)
+		{
+			if (num_db >= next_db_pos_size)
+			{
+				next_db_pos_size *= 2;
+				next_db_pos = (uint32 *) repalloc(next_db_pos,
+										  sizeof(uint32) * next_db_pos_size);
+			}
+
+			/* the start position is the absolute index into blkinfo */
+			next_db_pos[num_db++] = i;
+			prev_database = blkinfo[i].database;
+		}
+
+		this_dbs_elements++;
+	}
+
+	fclose(file);
+	i = 0;
+
+	/* get next database's first block info's position. */
+	while (!got_sigterm && i < num_db)
+	{
+		prewarm_elem pelem;
+
+		pelem.start_pos = next_db_pos[i];
+
+		if (blkinfo[next_db_pos[i]].database == 0)
+		{
+			/*
+			 * Block infos of global objects have database 0. Combine them
+			 * with the next non-zero database's block infos to load. If there
+			 * are no block infos other than those of global objects, we
+			 * silently ignore them. Should I throw error?
+			 */
+			if ((i + 1) < num_db)
+			{
+				pelem.database = blkinfo[next_db_pos[i + 1]].database;
+				i++;
+			}
+			else
+				break;
+		}
+		else
+			pelem.database = blkinfo[next_db_pos[i]].database;
+		pelem.block_info_handle = dsm_segment_handle(seg);
+		pelem.end_of_blockinfos = num_elements;
+
+		/*
+		 * Register a sub-worker to load new database's block. Wait until the
+		 * sub-worker finishes its job before launching next sub-worker.
+		 */
+		launch_prewarm_subworker(&pelem);
+		i++;
+	}
+
+	pfree(next_db_pos);
+	dsm_detach(seg);
+	seg = NULL;
+	ereport(LOG, (errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/* ============================================================================
+ * =============	buffer pool info dump part of autoprewarm	===============
+ * ============================================================================
+ */
+
+/* This sub-module periodically dumps the buffer pool's block info into
+ * a dump file AUTOPREWARM_FILE.
+ * Each entry of block info looks like this:
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we call it a
+ * BlockInfoRecord. Note that we write in text form so that the dump
+ * information is readable and, if necessary, can be carefully edited.
+ *
+ * The prewarm task reads these BlockInfoRecords one by one in sequence and
+ * distributes them among its sub-workers to load the corresponding blocks.
+ */
+
+/*
+ *	dump_now - the main routine which goes through each buffer header of buffer
+ *	pool and dumps their metadata. The records are written in buffer order;
+ *	prewarm_buffer_pool sorts them at load time to favor sequential reads.
+ */
+static uint32
+dump_now(void)
+{
+	static char transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret,
+				buflen;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	int			fd;
+	char		buf[1024];
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].spcnode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.%d", AUTOPREWARM_FILE,
+			 MyProcPid);
+
+	fd = OpenTransientFile(transient_dump_file_path,
+						   O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m", transient_dump_file_path)));
+
+	buflen = sprintf(buf, "<<%u>>\n", num_blocks);
+	if (write(fd, buf, buflen) < buflen)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error writing to \"%s\" : %m",
+						transient_dump_file_path)));
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			return 0;
+		}
+
+		buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+						 block_info_array[i].database,
+						 block_info_array[i].spcnode,
+						 block_info_array[i].filenode,
+						 (uint32) block_info_array[i].forknum,
+						 block_info_array[i].blocknum);
+
+		if (write(fd, buf, buflen) < buflen)
+		{
+			CloseTransientFile(fd);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("error writing to \"%s\" : %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(fd);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("error closing \"%s\" : %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+
+	ereport(LOG, (errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = GetCurrentTimestamp();
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now();
+				if (got_sigterm)
+					return;		/* got shutdown signal during or right after a
+								 * dump; better to return now. */
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* One last block meta info dump during postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now();
+}
+
+/*
+ * autoprewarm_main -- the main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask next_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, apw_sigusr1_handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	next_task = get_autoprewarm_task(DatumGetInt32(main_arg));
+
+	ereport(LOG, (errmsg("autoprewarm has started")));
+
+	/*
+	 * **** perform autoprewarm's next task	****
+	 */
+	if (next_task == TASK_PREWARM_BUFFERPOOL)
+	{
+		prewarm_buffer_pool();
+
+		/* prewarm is done; move on to TASK_DUMP_BUFFERPOOL_INFO. */
+		state->current_task = TASK_DUMP_BUFFERPOOL_INFO;
+		next_task = TASK_DUMP_BUFFERPOOL_INFO;
+	}
+
+	if (next_task == TASK_DUMP_BUFFERPOOL_INFO)
+	{
+		dump_block_info_periodically();
+
+		/*
+		 * downgrade to TASK_DUMP_IMMEDIATE_ONCE so others can start
+		 * TASK_DUMP_BUFFERPOOL_INFO
+		 */
+		state->current_task = TASK_DUMP_IMMEDIATE_ONCE;
+	}
+
+	ereport(LOG, (errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/* Register autoprewarm load bgworker. */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	BackgroundWorker prewarm_worker;
+
+	/* Define custom GUC variables. */
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Enable/Disable auto-prewarm feature.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to zero, timer based dumping is disabled."
+							" If set to -1, stops the running autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Request additional shared resources */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+	RequestNamedLWLockTranche("pg_autoprewarm", 1);
+
+	/* Has been set not to start autoprewarm bgworker. Nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&prewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&prewarm_worker);
+}
+
+/*
+ * Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* Has been set not to dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	AutoPrewarmTask next_task;
+
+	/* dump only if prewarm is not in progress. */
+	next_task = get_autoprewarm_task(TASK_DUMP_IMMEDIATE_ONCE);
+	if (next_task == TASK_DUMP_IMMEDIATE_ONCE)
+		PG_RETURN_INT64(dump_now());
+	PG_RETURN_NULL();
+}
diff --git a/contrib/pg_prewarm/autoprewarm.h b/contrib/pg_prewarm/autoprewarm.h
new file mode 100644
index 0000000..4220fc2
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.h
@@ -0,0 +1,35 @@
+/*
+ * contrib/pg_prewarm/autoprewarm.h
+ */
+#ifndef __AUTOPREWARM_H__
+#define __AUTOPREWARM_H__
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+#endif   /* __AUTOPREWARM_H__ */
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..ab5bf42 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,102 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This is a SQL-callable function to launch the <literal>autoprewarm</literal>
+   worker to dump the buffer pool information at regular intervals. A server
+   can run only one <literal>autoprewarm</literal> worker, so if the worker sees
+   another existing worker it will exit immediately. The return value is the
+   pid of the worker which has been launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This is a SQL-callable function which makes the calling backend dump the
+   buffer pool information once, immediately. This can work in parallel
+   with the <literal>autoprewarm</literal> worker while it is dumping.
+   The return value is the number of blocks whose information was dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in the buffer pool before server shutdown and then prewarms the
+  buffer pool with those blocks upon server restart.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will keep loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> as long as there is
+  a free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in the shared buffer pool. Upon the next server
+  restart, the bgworker will prewarm the buffer pool by loading those blocks.
+  The GUC <literal>pg_prewarm.dump_interval</literal> controls the dumping
+  activity of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. An autoprewarm
+      worker will only be started if this variable is set to <literal>on</literal>.
+      The default value is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The interval, in
+      seconds, between two dumps of the buffer pool's block information. The
+      default is 300 seconds. It also takes special values. If set to 0, timer
+      based dump is disabled and it dumps only while the server is shutting
+      down. If set to -1, the running <literal>autoprewarm</literal> will be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take buffers off
+ * the free list, so a caller that strictly needs a free buffer should not
+ * rely on this.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaa6d32..c6fa86a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -214,10 +216,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
@@ -2869,6 +2873,7 @@ pos_trgm
 post_parse_analyze_hook_type
 pqbool
 pqsigfunc
+prewarm_elem
 printQueryOpt
 printTableContent
 printTableFooter

#72

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

over 8 years ago

In reply to: Mithun Cy (#1)

Re: Proposal : For Auto-Prewarm.

On 27.10.2016 14:39, Mithun Cy wrote:

# pg_autoprewarm.

This a PostgreSQL contrib module which automatically dump all of the
blocknums
present in buffer pool at the time of server shutdown(smart and fast
mode only,
to be enhanced to dump at regular interval.) and load these blocks
when server restarts.

Design:
------
We have created a BG Worker Auto Pre-warmer which during shutdown
dumps all the
blocknum in buffer pool in sorted order.
Format of each entry is
<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>.
Auto Pre-warmer is started as soon as the postmaster is started we do
not wait
for recovery to finish and database to reach a consistent state. If
there is a
"dump_file" to load we start loading each block entry to buffer pool until
there is a free buffer. This way we do not replace any new blocks
which was
loaded either by recovery process or querying clients. Then it waits
until it receives
SIGTERM to dump the block information in buffer pool.
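
As an illustration of the sorted dump described above, here is a minimal
sketch of ordering entries by
<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>. The DumpEntry struct
and the function names are hypothetical, plain uint32_t fields stand in for
PostgreSQL's own types, and this is not the module's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for one dump entry. The real module would use
 * PostgreSQL's Oid, ForkNumber and BlockNumber types instead.
 */
typedef struct DumpEntry
{
	uint32_t	database;
	uint32_t	tablespace;
	uint32_t	relation;
	uint32_t	forknum;
	uint32_t	blocknum;
} DumpEntry;

/* Three-way comparison of two unsigned values, safe against overflow. */
static int
cmp_u32(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}

/* qsort() comparator: database, tablespace, relation, fork, then block. */
static int
dump_entry_cmp(const void *p, const void *q)
{
	const DumpEntry *a = p;
	const DumpEntry *b = q;
	int			c;

	if ((c = cmp_u32(a->database, b->database)) != 0)
		return c;
	if ((c = cmp_u32(a->tablespace, b->tablespace)) != 0)
		return c;
	if ((c = cmp_u32(a->relation, b->relation)) != 0)
		return c;
	if ((c = cmp_u32(a->forknum, b->forknum)) != 0)
		return c;
	return cmp_u32(a->blocknum, b->blocknum);
}

/* Sort the entries in place so blocks of one relation fork are adjacent. */
static void
sort_dump_entries(DumpEntry *entries, size_t n)
{
	assert(entries != NULL || n == 0);
	if (n > 1)
		qsort(entries, n, sizeof(DumpEntry), dump_entry_cmp);
}
```

With the entries ordered this way, blocks of the same relation fork end up
next to each other in ascending block order, which is what makes sequential
reads during load more likely.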

HOW TO USE:
-----------
Build and add the pg_autoprewarm to shared_preload_libraries. Auto
Pre-warmer
process automatically do dumping of buffer pool's block info and load
them when
restarted.

TO DO:
------
Add functionality to dump based on timer at regular interval.
And some cleanups.

I wonder if you considered parallel prewarming of a table?
Right now, with either pg_prewarm or pg_autoprewarm, preloading a
table's data is performed by one backend.
It certainly makes sense if there is just one HDD and we want to
minimize the impact of pg_prewarm on normal DBMS activity.
But sometimes we need to load data into memory as soon as possible. And
modern systems have a larger number of CPU cores, and
RAID devices make it possible to efficiently load data in parallel.

I have asked this question in the context of my CFS (compressed file system)
for Postgres. The customer's complaint was that there are 64 cores in
his system, but when
he is building an index, decompression of heap data is performed by only
one core. This is why I thought about prewarm... (parallel index
construction is a separate story...)

pg_prewarm makes it possible to specify a range of blocks, so, in
principle, it is possible to manually preload a table in parallel by
spawning pg_prewarm
with different subranges in several backends. But it is definitely not
a user-friendly approach.
And as far as I understand, pg_autoprewarm has all the necessary
infrastructure to do a parallel load. We just need to spawn more than one
background worker and specify
a separate block range for each worker.
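
To make the "separate block range for each worker" idea concrete, here is a
minimal sketch of how a relation of nblocks blocks could be divided among
nworkers workers. The BlockRange type and worker_block_range() are
hypothetical names used only for illustration; nothing like this exists in
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open block range [first, last) given to one worker. */
typedef struct BlockRange
{
	uint32_t	first;
	uint32_t	last;
} BlockRange;

/*
 * Return the sub-range of [0, nblocks) for worker number "worker" out of
 * "nworkers". The first (nblocks % nworkers) workers get one extra block,
 * so range sizes differ by at most one and the ranges never overlap.
 */
static BlockRange
worker_block_range(uint32_t nblocks, uint32_t nworkers, uint32_t worker)
{
	BlockRange	r;
	uint32_t	base;
	uint32_t	extra;

	assert(nworkers > 0 && worker < nworkers);

	base = nblocks / nworkers;
	extra = nblocks % nworkers;

	r.first = worker * base + (worker < extra ? worker : extra);
	r.last = r.first + base + (worker < extra ? 1 : 0);
	return r;
}
```

For 10 blocks and 4 workers this gives [0,3), [3,6), [6,8) and [8,10). If the
ranges were fed to pg_prewarm by hand, note that its last-block argument is
inclusive, so last - 1 would be passed, and an empty range (first == last)
would simply be skipped.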

Do you think that such functionality (parallel autoprewarm) can be
useful and be easily added?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#73

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Mithun Cy (#71)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Tue, May 30, 2017 at 10:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Thanks Robert,

Sorry, there was one more mistake (a typo) in dump_now(): instead of
using pfree I used free. I have corrected this in the new patch v10.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_10.patch (application/octet-stream)

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..ac0f9e4
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1032 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarm the shared buffer pool when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		It is a bgworker which automatically records information about blocks
+ *		which were present in the buffer pool before server shutdown and then
+ *		prewarms the buffer pool with those blocks upon server restart.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached consistent state. The bgworker will start loading blocks
+ *		recorded in the format BlockInfoRecord
+ *		<<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ *		$PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the
+ *		buffer pool. This way we do not replace any new blocks which were
+ *		loaded either by the recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		blocks which are currently in shared buffer pool. Upon next server
+ *		restart, the bgworker will prewarm the buffer pool by loading those
+ *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
+ *		activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "autoprewarm.h"
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+static void apw_sigusr1_handler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake
+ *	it up.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the process to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm sub-workers will notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+apw_sigusr1_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/*
+ * Metadata of each persistent block which is dumped and used to load.
+ */
+typedef struct BlockInfoRecord
+{
+	Oid			database;		/* database */
+	Oid			spcnode;		/* tablespace */
+	Oid			filenode;		/* relation's filenode. */
+	ForkNumber	forknum;		/* fork number */
+	BlockNumber blocknum;		/* block number */
+} BlockInfoRecord;
+
+/*
+ * Tasks performed by autoprewarm workers.
+ */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO,	/* dump the buffer pool block info. */
+	TASK_DUMP_IMMEDIATE_ONCE,	/* dump the buffer pool block info immediately
+								 * once. */
+	TASK_END					/* no more tasks to do. */
+} AutoPrewarmTask;
+
+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* protects SharedState */
+	AutoPrewarmTask current_task;		/* current tasks performed by
+										 * autoprewarm workers. */
+	bool		is_bgworker_running;	/* if set can't start another worker. */
+	bool		can_do_prewarm; /* if not set can't do prewarm task. */
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/* dsm used during TASK_PREWARM_BUFFERPOOL to store read BlockInfoRecord's. */
+static dsm_segment *seg = NULL;
+
+/*
+ * The block_infos allocated to each sub-worker to do prewarming.
+ */
+typedef struct prewarm_elem
+{
+	dsm_handle	block_info_handle;		/* handle to dsm seg of block_infos */
+	Oid			database;		/* database to connect and load */
+	uint32		start_pos;		/* start position within block_infos from
+								 * which sub-worker starts prewarming blocks. */
+	uint32		end_of_blockinfos;		/* End of block_infos in dsm */
+} prewarm_elem;
+
+/* GUC variable which controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable which says whether autoprewarm worker has to be started when
+ * preloaded.
+ */
+static bool autoprewarm = true;
+
+/* compare member elements to check if they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp - compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(spcnode);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_shm_state - on_shm_exit reset the prewarm state.
+ */
+
+static void
+reset_shm_state(int code, Datum arg)
+{
+	state->is_bgworker_running = false;
+	state->current_task = TASK_END;
+}
+
+/*
+ * detach_blkinfos - on_shm_exit detach the dsm allocated for blockinfos.
+ */
+static void
+detach_blkinfos(int code, Datum arg)
+{
+	if (seg != NULL)
+		dsm_detach(seg);
+}
+
+/*
+ * get_autoprewarm_task - get next task allowed and to be performed by the
+ * autoprewarm worker.
+ */
+static AutoPrewarmTask
+get_autoprewarm_task(AutoPrewarmTask todo_task)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&state->lock, LWLockNewTrancheId());
+		state->current_task = TASK_END;
+		state->is_bgworker_running = false;
+		state->can_do_prewarm = true;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+
+	/*
+	 * If a bgworker is already running we cannot run another. But if the task
+	 * is just an immediate dump and no prewarm is in progress, we can go
+	 * further.
+	 */
+	if (state->is_bgworker_running &&
+		(todo_task != TASK_DUMP_IMMEDIATE_ONCE ||
+		 state->current_task == TASK_PREWARM_BUFFERPOOL))
+	{
+		LWLockRelease(&state->lock);
+		return TASK_END;
+	}
+
+	/*
+	 * If asked to do prewarm, check whether we can do so. We avoid prewarm if
+	 * it's already done on startup.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL && !state->can_do_prewarm)
+		todo_task = TASK_DUMP_BUFFERPOOL_INFO;
+
+	/*
+	 * For now, if there was a previous attempt to prewarm or dump, any further
+	 * request to prewarm will not be entertained.
+	 */
+	state->can_do_prewarm = false;
+
+	if (todo_task != TASK_DUMP_IMMEDIATE_ONCE)
+	{
+		state->is_bgworker_running = true;
+		state->current_task = todo_task;
+		on_shmem_exit(reset_shm_state, 0);
+	}
+
+	LWLockRelease(&state->lock);
+	return todo_task;
+}
+
+/*
+ * load_one_database -- start of prewarm sub-worker, this will try to load
+ * blocks of one database starting from block info position passed by main
+ * prewarm worker.
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	prewarm_elem pelem;
+	BlockInfoRecord *old_blk;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+
+	/*
+	 * We're now ready to receive signals
+	 */
+	BackgroundWorkerUnblockSignals();
+
+	memcpy(&pelem, MyBgworkerEntry->bgw_extra, sizeof(prewarm_elem));
+
+	seg = dsm_attach(pelem.block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to map dynamic shared memory segment")));
+	on_shmem_exit(detach_blkinfos, 0);
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(pelem.database, InvalidOid);
+	SetCurrentStatementStartTimestamp();
+	StartTransactionCommand();
+
+	old_blk = NULL;
+	pos = pelem.start_pos;
+
+	while (!got_sigterm && pos < pelem.end_of_blockinfos && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos];
+		Buffer		buf;
+
+		/*
+		 * Quit if we've reached records for another database. Unless the
+		 * previous blocks were of global objects which were combined with
+		 * next database's block infos.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * When we reach a new relation, close the old one.  Note, however,
+		 * that the previous try_relation_open may have failed, in which case
+		 * rel will be NULL.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode && rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it.  If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			reloid = RelidByRelfilenode(blk->spcnode, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+		}
+		if (!rel)
+		{
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL || old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+			if (smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* check if blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* move to next forknum. */
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+			ReleaseBuffer(buf);
+
+		old_blk = blk;
+		++pos;
+	}
+
+	dsm_detach(seg);
+	seg = NULL;
+
+	/* release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		rel = NULL;
+	}
+
+	CommitTransactionCommand();
+	return;
+}
+
+/*
+ * launch_prewarm_subworker -- register a dynamic worker to load the blocks
+ * starting from next_db_pos. We wait until the worker has stopped.
+ */
+static void
+launch_prewarm_subworker(prewarm_elem *pelem)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  (Datum) 0, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+	memcpy(worker.bgw_extra, pelem, sizeof(prewarm_elem));
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	if (status == BGWH_STOPPED)
+		return;
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart"
+						 " the database.")));
+	}
+
+	Assert(0);
+}
+
+/*
+ *	prewarm_buffer_pool - the main routine which prewarms the buffer pool.
+ *
+ *	The prewarm bgworker will first load all of the BlockInfoRecords in
+ *	$PGDATA/AUTOPREWARM_FILE to a dsm. Those BlockInfoRecords are then
+ *	separated based on their database, and for each group of BlockInfoRecords
+ *	a sub-worker will be launched to load the corresponding blocks. Each
+ *	sub-worker will be launched in sequential order only after the previous
+ *	sub-worker has finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	FILE	   *file = NULL;
+	uint32	   *next_db_pos;
+	size_t		next_db_pos_size;
+	uint32		this_dbs_elements = 0,
+				num_elements,
+				num_db = 0,
+				i;
+	Oid			prev_database;
+	BlockInfoRecord *blkinfo;
+
+	file = fopen(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR, (errcode_for_file_access(),
+							errmsg("could not read file \"%s\": %m",
+								   AUTOPREWARM_FILE)));
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>", &num_elements) != 1)
+	{
+		fclose(file);
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("Error reading num of elements in \"%s\" for"
+						" autoprewarm : %m", AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+	on_shmem_exit(detach_blkinfos, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].spcnode, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	num_elements = i;
+
+	/*
+	 * sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+	next_db_pos_size = 64;
+	next_db_pos = (uint32 *) palloc(sizeof(uint32) * next_db_pos_size);
+
+	/* read and fill block infos */
+	for (i = 0; i < num_elements; i++)
+	{
+		if (i == 0)
+		{
+			prev_database = blkinfo[i].database;
+			next_db_pos[num_db++] = 0;
+		}
+		else if (prev_database != blkinfo[i].database)
+		{
+			if (num_db >= next_db_pos_size)
+			{
+				next_db_pos_size *= 2;
+				next_db_pos = (uint32 *) repalloc(next_db_pos,
+										  sizeof(uint32) * next_db_pos_size);
+			}
+
+			next_db_pos[num_db++] = i;
+			this_dbs_elements = 0;
+			prev_database = blkinfo[i].database;
+		}
+
+		this_dbs_elements++;
+	}
+
+	fclose(file);
+	i = 0;
+
+	/* get next database's first block info's position. */
+	while (!got_sigterm && i < num_db)
+	{
+		prewarm_elem pelem;
+
+		pelem.start_pos = next_db_pos[i];
+
+		if (blkinfo[next_db_pos[i]].database == 0)
+		{
+			/*
+			 * For block info of a global object whose database will be 0 try
+			 * to combine them with next non-zero database's block infos to
+			 * load. If there are no other block infos than the global objects
+			 * we silently ignore them. Should I throw error?
+			 */
+			if ((i + 1) < num_db)
+			{
+				pelem.database = blkinfo[next_db_pos[i + 1]].database;
+				i++;
+			}
+			else
+				break;
+		}
+		else
+			pelem.database = blkinfo[next_db_pos[i]].database;
+		pelem.block_info_handle = dsm_segment_handle(seg);
+		pelem.end_of_blockinfos = num_elements;
+
+		/*
+		 * Register a sub-worker to load the new database's blocks. Wait until
+		 * the sub-worker finishes its job before launching the next sub-worker.
+		 */
+		launch_prewarm_subworker(&pelem);
+		i++;
+	}
+
+	pfree(next_db_pos);
+	dsm_detach(seg);
+	seg = NULL;
+	ereport(LOG, (errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/* ============================================================================
+ * =============	buffer pool info dump part of autoprewarm	===============
+ * ============================================================================
+ */
+
+/* This sub-module is for periodically dumping buffer pool's block info into
+ * a dump file AUTOPREWARM_FILE.
+ * Each entry of block info looks like this:
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it
+ * as BlockInfoRecord. Note we write in the text form so that the dump
+ * information is readable and if necessary can be carefully edited.
+ *
+ * The prewarm task will read these blockInfoRecord one by one in sequence and
+ * distribute it among its sub workers to load corresponding blocks.
+ */
+
+/*
+ *	dump_now - the main routine which goes through each buffer header of the
+ *	buffer pool and dumps its metadata. The entries are written unsorted; the
+ *	prewarm task sorts them on load to facilitate sequential reads.
+ */
+static uint32
+dump_now(void)
+{
+	static char transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret,
+				buflen;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	int			fd;
+	char		buf[1024];
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].spcnode = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.%d", AUTOPREWARM_FILE,
+			 MyProcPid);
+
+	fd = OpenTransientFile(transient_dump_file_path,
+						   O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m", transient_dump_file_path)));
+
+	buflen = sprintf(buf, "<<%u>>\n", num_blocks);
+	if (write(fd, buf, buflen) < buflen)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("autoprewarm: error writing to \"%s\" : %m",
+						AUTOPREWARM_FILE)));
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			return 0;
+		}
+
+		buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+						 block_info_array[i].database,
+						 block_info_array[i].spcnode,
+						 block_info_array[i].filenode,
+						 (uint32) block_info_array[i].forknum,
+						 block_info_array[i].blocknum);
+
+		if (write(fd, buf, buflen) < buflen)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("error writing to \"%s\" : %m",
+							AUTOPREWARM_FILE)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(fd);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("error closing \"%s\" : %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+
+	ereport(LOG, (errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically - at regular intervals, which is defined by GUC
+ * dump_interval, dump the info of blocks which are present in buffer pool.
+ */
+void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = GetCurrentTimestamp();
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now();
+				if (got_sigterm)
+					return;		/* got shutdown signal during or right after a
+								 * dump. And, I think better to return now. */
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* One last block meta info dump at postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now();
+}
+
+/*
+ * autoprewarm_main -- the main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask next_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, apw_sigusr1_handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	next_task = get_autoprewarm_task(DatumGetInt32(main_arg));
+
+	ereport(LOG, (errmsg("autoprewarm has started")));
+
+	/*
+	 * **** perform autoprewarm's next task	****
+	 */
+	if (next_task == TASK_PREWARM_BUFFERPOOL)
+	{
+		prewarm_buffer_pool();
+
+		/* prewarm is done lets move to TASK_DUMP_BUFFERPOOL_INFO. */
+		state->current_task = TASK_DUMP_BUFFERPOOL_INFO;
+		next_task = TASK_DUMP_BUFFERPOOL_INFO;
+	}
+
+	if (next_task == TASK_DUMP_BUFFERPOOL_INFO)
+	{
+		dump_block_info_periodically();
+
+		/*
+		 * downgrade to TASK_DUMP_IMMEDIATE_ONCE so others can start
+		 * TASK_DUMP_BUFFERPOOL_INFO
+		 */
+		state->current_task = TASK_DUMP_IMMEDIATE_ONCE;
+	}
+
+	ereport(LOG, (errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/* Register autoprewarm load bgworker. */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/* Extension's entry point. */
+void
+_PG_init(void)
+{
+	BackgroundWorker prewarm_worker;
+
+	/* Define custom GUC variables. */
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Enable/Disable auto-prewarm feature.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+							" If set to -1, stops the running autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Request additional shared resources */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+	RequestNamedLWLockTranche("pg_autoprewarm", 1);
+
+	/* Has been set not to start autoprewarm bgworker. Nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&prewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&prewarm_worker);
+}
+
+/*
+ * Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter "
+						 "\"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* Has been set not to dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	AutoPrewarmTask next_task;
+
+	/* dump only if prewarm is not in progress. */
+	next_task = get_autoprewarm_task(TASK_DUMP_IMMEDIATE_ONCE);
+	if (next_task == TASK_DUMP_IMMEDIATE_ONCE)
+		PG_RETURN_INT64(dump_now());
+	PG_RETURN_NULL();
+}
diff --git a/contrib/pg_prewarm/autoprewarm.h b/contrib/pg_prewarm/autoprewarm.h
new file mode 100644
index 0000000..4220fc2
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.h
@@ -0,0 +1,35 @@
+/*
+ * contrib/pg_prewarm/autoprewarm.h
+ */
+#ifndef __AUTOPREWARM_H__
+#define __AUTOPREWARM_H__
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+#endif   /* __AUTOPREWARM_H__ */
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..ab5bf42 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,102 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This SQL-callable function launches the <literal>autoprewarm</literal>
+   worker, which dumps buffer pool information at regular intervals. Only one
+   <literal>autoprewarm</literal> worker can run per server, so a worker that
+   finds another one already running exits immediately. The return value is
+   the PID of the launched worker.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This SQL-callable function makes the calling backend dump buffer pool
+   information once, immediately. It can run in parallel
+   with the <literal>autoprewarm</literal> worker while it is dumping.
+   The return value is the number of block-information records dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in the buffer pool before server shutdown and then prewarms the
+  buffer pool with those blocks upon server restart.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until no free
+  buffer is left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in shared buffer pool. Upon next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. An autoprewarm
+      worker will only be started if this variable is set <literal>on</literal>.
+      The default value is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. Two values are special. If set to 0, timer-based
+      dumping is disabled and dumps happen only while the server is shutting
+      down. If set to -1, the running <literal>autoprewarm</literal> is stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result becomes stale as soon as other operations take the free
+ * buffers, so a caller that strictly needs a free buffer should not rely on
+ * this.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaa6d32..c6fa86a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -214,10 +216,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
@@ -2869,6 +2873,7 @@ pos_trgm
 post_parse_analyze_hook_type
 pqbool
 pqsigfunc
+prewarm_elem
 printQueryOpt
 printTableContent
 printTableFooter

#74

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Konstantin Knizhnik (#72)

Re: Proposal : For Auto-Prewarm.

On Tue, May 30, 2017 at 12:36 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

On 27.10.2016 14:39, Mithun Cy wrote:
And as far as I understand pg_autoprewarm has all necessary infrastructure
to do parallel load. We just need to spawn more than one background worker
and specify
separate block range for each worker.

Do you think that such functionality (parallel autoprewarm) can be useful
and be easily added?

I have not paid any attention to making autoprewarm parallel. But
given the current state of the code, making the sub-workers load
the blocks in parallel should be possible, and I think it could be added
easily. We would probably need a configurable parameter to limit the maximum
number of parallel prewarm sub-workers. Since I have not thought much
about this, I might not be aware of all of the difficulties around it.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#75

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Konstantin Knizhnik (#72)

Re: Proposal : For Auto-Prewarm.

On Tue, May 30, 2017 at 12:36 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

On 27.10.2016 14:39, Mithun Cy wrote:

I wonder if you considered parallel prewarming of a table?
Right now, with either pg_prewarm or pg_autoprewarm, preloading
a table's data is performed by one backend.
That certainly makes sense if there is just one HDD and we want to minimize
the impact of pg_prewarm on normal DBMS activity.
But sometimes we need to load data into memory as soon as possible. Modern
systems have a larger number of CPU cores, and
RAID devices make it possible to efficiently load data in parallel.

I asked this question in the context of my CFS (compressed file system) for
Postgres. The customer's complaint was that his system has 64 cores, but
when he builds an index, decompression of heap data is performed by only one
core. This is why I thought about prewarm... (parallel index construction is
a separate story...)

pg_prewarm makes it possible to specify a range of blocks, so, in principle,
it is possible to manually preload a table in parallel by spawning
pg_prewarm with different subranges in several backends. But that is
definitely not a user-friendly approach.
And as far as I understand pg_autoprewarm has all necessary infrastructure
to do parallel load. We just need to spawn more than one background worker
and specify
separate block range for each worker.

Do you think that such functionality (parallel autoprewarm) can be useful
and be easily added?

I think parallel load functionality can be useful in a few cases, such as
when the system has multiple I/O channels. Doing it in parallel
might need some additional infrastructure to manage the workers,
depending on how we decide to parallelize: whether we let each worker
pick one block at a time and load it, or assign a range of blocks to
each worker. Each way has its own pros and cons. Even if we want to
add such an option to the *prewarm functionality, it should be added
as a separate patch, as it has its own set of problems that need to be
solved.
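
As an illustration of the "range of blocks for each worker" option, here is a minimal standalone C sketch of how a block range could be split evenly. The function name worker_block_range and its signature are hypothetical and not part of any patch in this thread; the inclusive first/last convention is assumed to mirror pg_prewarm's first_block/last_block arguments.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute the inclusive block range [*first, *last]
 * that worker number "worker" (0-based) should load when "nblocks" blocks
 * are divided among "nworkers" workers.
 *
 * Returns 1 if the worker has blocks to load, 0 if its share is empty.
 */
int
worker_block_range(unsigned int nblocks, unsigned int nworkers,
				   unsigned int worker,
				   unsigned int *first, unsigned int *last)
{
	unsigned int base;
	unsigned int extra;
	unsigned int count;
	unsigned int start;

	assert(nworkers > 0 && worker < nworkers);

	base = nblocks / nworkers;		/* blocks every worker gets */
	extra = nblocks % nworkers;		/* leftover blocks to spread out */

	/* The first "extra" workers take one additional block each. */
	count = base + (worker < extra ? 1 : 0);
	start = worker * base + (worker < extra ? worker : extra);

	if (count == 0)
		return 0;

	*first = start;
	*last = start + count - 1;
	return 1;
}
```

With 10 blocks and 4 workers this yields the ranges 0..2, 3..5, 6..7 and 8..9. The one-block-at-a-time alternative would instead need a shared counter that workers advance, which is part of the extra infrastructure mentioned above.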

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#76

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Mithun Cy (#73)

Re: Proposal : For Auto-Prewarm.

On Tue, May 30, 2017 at 8:15 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, May 30, 2017 at 10:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Thanks Robert,

Sorry, there was one more mistake (a typo) in dump_now(): instead of
pfree I used free. I have corrected this in the new patch v10.

+ * contrib/autoprewarm.c

Wrong.

+    Oid            database;        /* database */
+    Oid            spcnode;        /* tablespace */
+    Oid            filenode;        /* relation's filenode. */
+    ForkNumber    forknum;        /* fork number */
+    BlockNumber blocknum;        /* block number */

Why spcnode rather than tablespace? Also, the third line has a
period, but not the others. I think you could just drop these
comments; they basically just duplicate the structure names, except
for spcnode which doesn't but probably should.

+ bool can_do_prewarm; /* if set can't do prewarm task. */

The comment and the name of the variable imply opposite meanings.

+/*
+ * detach_blkinfos - on_shm_exit detach the dsm allocated for blockinfos.
+ */
+static void
+detach_blkinfos(int code, Datum arg)
+{
+    if (seg != NULL)
+        dsm_detach(seg);
+}

I assume you don't really need this. Presumably process exit is going
to detach the DSM anyway.

+    if (seg == NULL)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("unable to map dynamic shared memory segment")));

Let's use the wording from the message in parallel.c rather than this
wording. Actually, we should probably (as a separate patch) change
test_shm_mq's worker.c to use the parallel.c wording also.

+ SetCurrentStatementStartTimestamp();

Why do we need this?

+ StartTransactionCommand();

Do we need a transaction? If so, how about starting a new transaction
for each relation instead of using a single one for the entire
prewarm?

+    if (status == BGWH_STOPPED)
+        return;
+
+    if (status == BGWH_POSTMASTER_DIED)
+    {
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+              errmsg("cannot start bgworker autoprewarm without postmaster"),
+                 errhint("Kill all remaining database processes and restart"
+                         " the database.")));
+    }
+
+    Assert(0);

Instead of writing it this way, remove the first "if" block and change
the second one to Assert(status == BGWH_STOPPED). Also, I'd ditch the
curly braces in this case.

+ file = fopen(AUTOPREWARM_FILE, PG_BINARY_R);

Use AllocateFile() instead. See the comments for that function and at
the top of fd.c. Then you don't need to worry about leaking the file
handle on an error, so you can remove the fclose() before ereport()
stuff -- which is incomplete anyway, because you could fail e.g.
inside dsm_create().

+                 errmsg("Error reading num of elements in \"%s\" for"
+                        " autoprewarm : %m", AUTOPREWARM_FILE)));

As I said in a previous review, please do NOT split error messages
across lines like this. Also, this error message is nothing close to
PostgreSQL style. Please read
https://www.postgresql.org/docs/devel/static/error-style-guide.html
and learn to follow all those guidelines written therein. I see at
least 3 separate problems with this message.

+ num_elements = i;

I'd do something like if (i != num_elements) elog(ERROR, "autoprewarm
block dump has %d entries but expected %d", i, num_elements); It
seems OK for this to be elog() rather than ereport() because the file
should never be corrupt unless the user has cheated by hand-editing
it.

I think you can get rid of next_db_pos altogether, and this
prewarm_elem thing too. Design sketch:

1. Move all the stuff that's in prewarm_elem into
AutoPrewarmSharedState. Rename start_pos to prewarm_start_idx and
end_of_blockinfos to prewarm_stop_idx.

2. Instead of computing all of the database ranges first and then
launching workers, do it all in one loop. So figure out where the
records for the current database end and set prewarm_start_idx and
prewarm_stop_idx appropriately. Launch a worker. When the worker
terminates, set prewarm_start_idx = prewarm_stop_idx and advance
prewarm_stop_idx to the end of the next database's records.
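
A minimal sketch of the single-loop idea in step 2, written as standalone C rather than against the patch. The BlockInfo struct and both function names are hypothetical; the array is assumed to be sorted by database, and the worker launch is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the patch's block record. */
typedef struct BlockInfo
{
	unsigned int database;
	unsigned int blocknum;
} BlockInfo;

/*
 * Given an array sorted by database and the index where the current
 * database's records start, return the index one past its last record.
 */
size_t
next_database_range(const BlockInfo *blocks, size_t nblocks, size_t start_idx)
{
	size_t		stop_idx = start_idx;

	assert(start_idx <= nblocks);
	while (stop_idx < nblocks &&
		   blocks[stop_idx].database == blocks[start_idx].database)
		stop_idx++;
	return stop_idx;
}

/*
 * Walk the whole array one database at a time and return how many ranges
 * (and hence per-database workers) were needed.
 */
int
count_database_ranges(const BlockInfo *blocks, size_t nblocks)
{
	size_t		start_idx = 0;
	int			nranges = 0;

	while (start_idx < nblocks)
	{
		size_t		stop_idx = next_database_range(blocks, nblocks, start_idx);

		/*
		 * A real implementation would publish [start_idx, stop_idx) in
		 * shared memory, launch a worker, and wait for it here.
		 */
		nranges++;
		start_idx = stop_idx;
	}
	return nranges;
}
```

Only two indexes are live at any time, so no per-database position array has to be allocated up front.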

This saves having to allocate memory for the next_db_pos array, and it
also avoids this crock:

+ memcpy(&pelem, MyBgworkerEntry->bgw_extra, sizeof(prewarm_elem));

The reason that's bad is because it only works so long as bgw_extra is
large enough to hold prewarm_elem. If prewarm_elem grows or bgw_extra
shrinks, this turns into a buffer overrun.

I would use AUTOPREWARM_FILE ".tmp" rather than a name incorporating
the PID for the temporary file. Otherwise, you might leave many
temporary files behind under different names. If you use the same
name every time, you'll never have more than one, and the next
successful dump will end up getting rid of it along the way.
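
To make the fixed-name idea concrete, here is a minimal standalone C sketch using only the standard library. The function name write_file_atomically is hypothetical, and the sketch deliberately ignores durability (no fsync) and the backend's own file APIs; it only shows why a single ".tmp" name never accumulates leftovers.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "contents" to "<path>.tmp", then rename it
 * over "path". Returns 0 on success, -1 on failure.
 */
int
write_file_atomically(const char *path, const char *contents)
{
	char		tmp_path[1024];
	FILE	   *fp;
	int			n;

	assert(path != NULL && contents != NULL);

	n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
	if (n < 0 || (size_t) n >= sizeof(tmp_path))
		return -1;

	/* Opening with "w" truncates any stale file left by a failed dump. */
	fp = fopen(tmp_path, "w");
	if (fp == NULL)
		return -1;

	if (fputs(contents, fp) == EOF)
	{
		fclose(fp);
		remove(tmp_path);
		return -1;
	}
	if (fclose(fp) != 0)
	{
		remove(tmp_path);
		return -1;
	}

	/* On POSIX systems rename() replaces an existing target. */
	if (rename(tmp_path, path) != 0)
	{
		remove(tmp_path);
		return -1;
	}
	return 0;
}
```

Because the temporary name is always the same, a crash can leave at most one stray file, and the next successful call overwrites and renames it away.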

+            pfree(block_info_array);
+            CloseTransientFile(fd);
+            unlink(transient_dump_file_path);
+            ereport(ERROR,
+                    (errcode_for_file_access(),
+                     errmsg("error writing to \"%s\" : %m",
+                            AUTOPREWARM_FILE)));

Again, this is NOT a standard error message text. It occurs in zero
other places in the source tree. You are not the first person to need
an error message for a failed write to a file; please look at what the
previous authors did. Also, the pfree() before report is not needed;
isn't the whole process going to terminate? Also, you can't really use
errcode_for_file_access() here, because you've meanwhile done
CloseTransientFile() and unlink(), which will have clobbered errno.
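
The errno problem can be shown in isolation. The following standalone POSIX sketch uses plain close() and unlink() as stand-ins for the backend calls, and the helper name cleanup_preserving_errno is hypothetical; it captures errno before cleanup and restores it afterwards, so that errno-based reporting still describes the failed write.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: clean up a half-written temporary file after a
 * failed write without losing the original error. Expects errno to still
 * hold the write error on entry, and returns that value.
 */
int
cleanup_preserving_errno(int fd, const char *path)
{
	int			save_errno = errno;	/* capture before it can be overwritten */

	assert(path != NULL);

	close(fd);					/* may set errno on failure */
	unlink(path);				/* may set errno on failure */

	errno = save_errno;			/* restore for later error reporting */
	return save_errno;
}
```

In the backend the same pattern applies: save errno right after the failed write, do the cleanup, set errno back, and only then call ereport() with errcode_for_file_access() and %m.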

+ ereport(LOG, (errmsg("saved metadata info of %d blocks", num_blocks)));

Not project style for ereport(). Line break after the first comma.
Similarly elsewhere.

+ *    dump_now - the main routine which goes through each buffer header of buffer
+ *    pool and dumps their meta data. We Sort these data and then dump them.
+ *    Sorting is necessary as it facilitates sequential read during load.

This is no longer true, because you moved the sort to the load side.
It's also not capitalized properly.

Discussions of the format of the autoprewarm dump file involve
inexplicably varying number of < and > symbols:

+ *        <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it
+        buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",

+#ifndef __AUTOPREWARM_H__
+#define __AUTOPREWARM_H__

We don't use double-underscored names for header files. Please
consult existing header files for the appropriate style. Also, why
does this file exist at all, instead of just including them in the .c
file? The point of a header is for things that need to be included
by multiple .c files, but there's no such need here.

+             * load. If there are no other block infos than the global objects
+             * we silently ignore them. Should I throw error?

Silently ignoring them seems fine. Throwing an error doesn't seem
like it would improve things.

+        /*
+         * Register a sub-worker to load new database's block. Wait until the
+         * sub-worker finish its job before launching next sub-worker.
+         */
+        launch_prewarm_subworker(&pelem);

The function name implies that it launches the worker, but the comment
implies that it also waits for it to terminate. Functions should be
named in a way that matches what they do.

I feel like the get_autoprewarm_task() function is introducing fairly
baroque logic for something that really ought to be more simple. All
autoprewarm_main() really needs to do is:

if (!state->autoprewarm_done)
    autoprewarm();
dump_block_info_periodically();

The locking in autoprewarm_dump_now() is a bit tricky. There are two
trouble cases. One is that we try to rename() our new dump file on
top of the existing one while a background worker is still using it to
perform an autoprewarm. The other is that we try to write a new
temporary dump file while some other process is trying to do the same
thing. I think we should fix this by storing a PID in
AutoPrewarmSharedState; a process which wants to perform a dump or an
autoprewarm must change the PID from 0 to their own PID, and change it
back to 0 on successful completion or error exit. If a process goes to
perform an immediate dump and finds a non-zero value already, it just does
ereport(ERROR, ...), including the PID of the other process in the
message (e.g. "unable to perform block dump because dump file is being
used by PID %d"). In a background worker, if we go to dump and find
the file in use, log a message (e.g. "skipping block dump because it
is already being performed by PID %d", "skipping prewarm because block
dump file is being rewritten by PID %d").
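
The claim-and-release protocol can be sketched in standalone C. Note the substitution: the proposal above protects the PID with an LWLock, while this sketch uses a C11 atomic compare-and-swap so that it compiles on its own. The struct and function names are hypothetical, and the PID is kept as a plain int.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the relevant part of the shared state. */
typedef struct DumpFileOwner
{
	atomic_int	pid_using_dumpfile;	/* 0 means nobody is using the file */
} DumpFileOwner;

/*
 * Try to claim the dump file for process "me". Returns 0 on success,
 * otherwise the PID of the current owner so the caller can report it.
 */
int
claim_dump_file(DumpFileOwner *state, int me)
{
	int			expected = 0;

	assert(me != 0);
	if (atomic_compare_exchange_strong(&state->pid_using_dumpfile,
									   &expected, me))
		return 0;

	/* On failure the compare-and-swap stored the current owner here. */
	return expected;
}

/*
 * Release the claim. Must be reached on both successful completion and
 * error exit; does nothing if "me" is not the owner.
 */
void
release_dump_file(DumpFileOwner *state, int me)
{
	int			expected = me;

	atomic_compare_exchange_strong(&state->pid_using_dumpfile,
								   &expected, 0);
}
```

A caller doing an immediate dump would turn a non-zero return value into an error that names that PID, while a background worker would log a message and skip the dump.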

I also think we should change is_bgworker_running to a PID, so that if
we try to launch a new worker we can report something like
"autoprewarm worker is already running under PID %d".

So putting that all together, I suppose AutoPrewarmSharedState should
end up looking like this:

LWLock lock; /* mutual exclusion */
pid_t bgworker_pid; /* for main bgworker */
pid_t pid_using_dumpfile; /* for autoprewarm or block dump */

/* following items are for communication with per-database worker */
dsm_handle block_info_handle;
Oid database;
int prewarm_start_idx;
int prewarm_stop_idx;

I suggest going through and changing "subworker" to "per-database
worker" throughout.

BTW, have you tested how long this takes to run with, say, shared_buffers = 8GB?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#77

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#76)

2 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Wed, May 31, 2017 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

+ * contrib/autoprewarm.c

Wrong.

-- Oops Sorry fixed.

+    Oid            database;        /* database */
+    Oid            spcnode;        /* tablespace */
+    Oid            filenode;        /* relation's filenode. */
+    ForkNumber    forknum;        /* fork number */
+    BlockNumber blocknum;        /* block number */
Why spcnode rather than tablespace? Also, the third line has a
period, but not the others. I think you could just drop these
comments; they basically just duplicate the structure names, except
for spcnode which doesn't but probably should.

-- Dropped comments and changed spcnode to tablespace.

+ bool can_do_prewarm; /* if set can't do prewarm task. */

The comment and the name of the variable imply opposite meanings.

-- Sorry, a typo. This variable has now been removed and replaced with
the new variables in AutoPrewarmSharedState, as you suggested.

+/*
+ * detach_blkinfos - on_shm_exit detach the dsm allocated for blockinfos.
+ */
I assume you don't really need this.  Presumably process exit is going
to detach the DSM anyway.

-- Yes considering process exit will detach the dsm, I have removed
that function.

+    if (seg == NULL)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("unable to map dynamic shared memory segment")));
Let's use the wording from the message in parallel.c rather than this
wording. Actually, we should probably (as a separate patch) change
test_shm_mq's worker.c to use the parallel.c wording also.

-- I have corrected the message with "could not map dynamic shared
memory segment" as in parallel.c

+ SetCurrentStatementStartTimestamp();

Why do we need this?

-- Removed. Sorry, I forgot to remove it when I removed the SPI connection code.

+ StartTransactionCommand();

Do we need a transaction? If so, how about starting a new transaction
for each relation instead of using a single one for the entire
prewarm?

-- We call relation_open, hence we need a transaction. As suggested, we
now start a new transaction for each new relation.

+    if (status == BGWH_STOPPED)
+        return;
+
+    if (status == BGWH_POSTMASTER_DIED)
+    {
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+              errmsg("cannot start bgworker autoprewarm without postmaster"),
+                 errhint("Kill all remaining database processes and restart"
+                         " the database.")));
+    }
+
+    Assert(0);

Instead of writing it this way, remove the first "if" block and change
the second one to Assert(status == BGWH_STOPPED). Also, I'd ditch the
curly braces in this case.

-- Fixed as suggested.

+ file = fopen(AUTOPREWARM_FILE, PG_BINARY_R);

Use AllocateFile() instead. See the comments for that function and at
the top of fd.c. Then you don't need to worry about leaking the file
handle on an error, so you can remove the fclose() before ereport()
stuff -- which is incomplete anyway, because you could fail e.g.
inside dsm_create().

-- Using AllocateFile now.

+                 errmsg("Error reading num of elements in \"%s\" for"
+                        " autoprewarm : %m", AUTOPREWARM_FILE)));
As I said in a previous review, please do NOT split error messages
across lines like this. Also, this error message is nothing close to
PostgreSQL style. Please read
https://www.postgresql.org/docs/devel/static/error-style-guide.html
and learn to follow all those guidelines written therein. I see at
least 3 separate problems with this message.

-- Thanks, I have tried to fix it now.

+ num_elements = i;

I'd do something like if (i != num_elements) elog(ERROR, "autoprewarm
block dump has %d entries but expected %d", i, num_elements); It
seems OK for this to be elog() rather than ereport() because the file
should never be corrupt unless the user has cheated by hand-editing
it.

-- Fixed as suggested. It is now reported with elog(ERROR).

I think you can get rid of next_db_pos altogether, and this
prewarm_elem thing too. Design sketch:

1. Move all the stuff that's in prewarm_elem into
AutoPrewarmSharedState. Rename start_pos to prewarm_start_idx and
end_of_blockinfos to prewarm_stop_idx.

2. Instead of computing all of the database ranges first and then
launching workers, do it all in one loop. So figure out where the
records for the current database end and set prewarm_start_idx and
prewarm_stop_idx appropriately. Launch a worker. When the worker
terminates, set prewarm_start_idx = prewarm_stop_idx and advance
prewarm_stop_idx to the end of the next database's records.

This saves having to allocate memory for the next_db_pos array, and it
also avoids this crock:

+ memcpy(&pelem, MyBgworkerEntry->bgw_extra, sizeof(prewarm_elem));

-- Fixed as suggested.

The reason that's bad is because it only works so long as bgw_extra is
large enough to hold prewarm_elem. If prewarm_elem grows or bgw_extra
shrinks, this turns into a buffer overrun.

-- Passing the prewarm info through bgw_extra helped us restrict the
scope and lifetime of prewarm_elem to the prewarm task only. Moving it
to shared memory makes it global, even though it is not needed once the
prewarm task is finished. But as using bgw_extra has other
disadvantages, I have now implemented it as you suggested.

I would use AUTOPREWARM_FILE ".tmp" rather than a name incorporating
the PID for the temporary file. Otherwise, you might leave many
temporary files behind under different names. If you use the same
name every time, you'll never have more than one, and the next
successful dump will end up getting rid of it along the way.

-- Fixed as suggested. Previously the PID was used so that a dump by the
dump worker and an immediate dump could happen concurrently, as they
would write to two different files. With the new way of registering the
PID in shared memory before file access, I think that problem can be
addressed.

+            pfree(block_info_array);
+            CloseTransientFile(fd);
+            unlink(transient_dump_file_path);
+            ereport(ERROR,
+                    (errcode_for_file_access(),
+                     errmsg("error writing to \"%s\" : %m",
+                            AUTOPREWARM_FILE)));

Again, this is NOT a standard error message text. It occurs in zero
other places in the source tree. You are not the first person to need
an error message for a failed write to a file; please look at what the
previous authors did. Also, the pfree() before report is not needed;
isn't the whole process going to terminate? Also, you can't really use
errcode_for_file_access() here, because you've meanwhile done
CloseTransientFile() and unlink(), which will have clobbered errno.

-- Removed pfree, saved errno before CloseTransientFile() and unlink()

+ ereport(LOG, (errmsg("saved metadata info of %d blocks", num_blocks)));

Not project style for ereport(). Line break after the first comma.
Similarly elsewhere.

-- Tried to fix same

+ *    dump_now - the main routine which goes through each buffer header of buffer
+ *    pool and dumps their meta data. We Sort these data and then dump them.
+ *    Sorting is necessary as it facilitates sequential read during load.
This is no longer true, because you moved the sort to the load side.
It's also not capitalized properly.

-- Sorry removed now.

Discussions of the format of the autoprewarm dump file involve
inexplicably varying number of < and > symbols:
+ *        <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it
+        buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",

-- Sorry, fixed now. The block info format no longer uses the < and >
delimiters anywhere.

+#ifndef __AUTOPREWARM_H__
+#define __AUTOPREWARM_H__

We don't use double-underscored names for header files. Please
consult existing header files for the appropriate style. Also, why
does this file exist at all, instead of just including them in the .c
file? The point of a header is for things that need to be included
by multiple .c files, but there's no such need here.

-- This was done to fix one of the previous review comments. I have
moved them back to .c file.

+             * load. If there are no other block infos than the global objects
+             * we silently ignore them. Should I throw error?
Silently ignoring them seems fine. Throwing an error doesn't seem
like it would improve things.

-- Okay thanks.

+        /*
+         * Register a sub-worker to load new database's block. Wait until the
+         * sub-worker finish its job before launching next sub-worker.
+         */
+        launch_prewarm_subworker(&pelem);
The function name implies that it launches the worker, but the comment
implies that it also waits for it to terminate. Functions should be
named in a way that matches what they do.

-- Have renamed it to launch_and_wait_for_per_database_worker

I feel like the get_autoprewarm_task() function is introducing fairly
baroque logic for something that really ought to be more simple. All
autoprewarm_main() really needs to do is:

if (!state->autoprewarm_done)
    autoprewarm();
dump_block_info_periodically();

-- Have simplified things as suggested now. Function
get_autoprewarm_task has been removed.

The locking in autoprewarm_dump_now() is a bit tricky. There are two
trouble cases. One is that we try to rename() our new dump file on
top of the existing one while a background worker is still using it to
perform an autoprewarm. The other is that we try to write a new
temporary dump file while some other process is trying to do the same
thing. I think we should fix this by storing a PID in
AutoPrewarmSharedState; a process which wants to perform a dump or an
autoprewarm must change the PID from 0 to their own PID, and change it
back to 0 on successful completion or error exit. If a process that goes
to perform an immediate dump finds a non-zero value already there, it just
does ereport(ERROR, ...), including the PID of the other process in the
message (e.g. "unable to perform block dump because dump file is being
used by PID %d"). In a background worker, if we go to dump and find
the file in use, log a message (e.g. "skipping block dump because it
is already being performed by PID %d", "skipping prewarm because block
dump file is being rewritten by PID %d").

-- Fixed as suggested.
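
The claim-and-release protocol described above can be sketched in standalone C as follows. This is only an illustration: the patch guards the field with an LWLock, whereas this sketch swaps in a C11 atomic compare-and-exchange so that it compiles on its own, and DemoDumpState and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_NO_OWNER 0			/* value meaning "dump file is free" */

typedef struct DemoDumpState
{
	atomic_int	pid_using_dumpfile;
} DemoDumpState;

/*
 * Try to claim the dump file for my_pid.  Returns DEMO_NO_OWNER on success,
 * otherwise the PID of the process that currently holds it, which the
 * caller can put into its ERROR or LOG message.
 */
static int
demo_claim_dumpfile(DemoDumpState *st, int my_pid)
{
	int			expected = DEMO_NO_OWNER;

	assert(my_pid != DEMO_NO_OWNER);
	if (atomic_compare_exchange_strong(&st->pid_using_dumpfile,
									   &expected, my_pid))
		return DEMO_NO_OWNER;
	return expected;			/* the failed exchange stored the owner here */
}

/* Release the dump file, but only if my_pid is the current owner. */
static void
demo_release_dumpfile(DemoDumpState *st, int my_pid)
{
	int			expected = my_pid;

	atomic_compare_exchange_strong(&st->pid_using_dumpfile,
								   &expected, DEMO_NO_OWNER);
}
```

Releasing only when the stored PID matches the caller mirrors what an exit callback has to do, since it may run in a process that never obtained the claim.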

I also think we should change is_bgworker_running to a PID, so that if
we try to launch a new worker we can report something like
"autoprewarm worker is already running under PID %d".

-- Fixed. I could only LOG that another autoprewarm worker is already
running and then exit, because on ERROR the worker would be restarted,
and we do not want to restart such workers.

So putting that all together, I suppose AutoPrewarmSharedState should
end up looking like this:

LWLock lock; /* mutual exclusion */
pid_t bgworker_pid; /* for main bgworker */
pid_t pid_using_dumpfile; /* for autoprewarm or block dump */

-- I think one more member is required, which states whether prewarm
can be done when the worker restarts.

/* following items are for communication with per-database worker */
dsm_handle block_info_handle;
Oid database;
int prewarm_start_idx;
int prewarm_stop_idx;

-- Fixed as suggested

I suggest going through and changing "subworker" to "per-database
worker" throughout.

-- Fixed as suggested.

BTW, have you tested how long this takes to run with, say, shared_buffers = 8GB?

I have tried this on my local machine with an SSD as storage.

settings: shared_buffers = 8GB, data loaded with pgbench, scale_factor=1000.

Total blocks dumped:
autoprewarm_dump_now
----------------------
1048576

Logs from 5 different load runs:

1.
2017-06-04 11:30:26.460 IST [116253] LOG: autoprewarm has started
2017-06-04 11:30:43.443 IST [116253] LOG: autoprewarm load task ended
-- 17 secs

2.
2017-06-04 11:31:13.565 IST [116291] LOG: autoprewarm has started
2017-06-04 11:31:30.317 IST [116291] LOG: autoprewarm load task ended
-- 17 secs

3.
2017-06-04 11:32:12.995 IST [116329] LOG: autoprewarm has started
2017-06-04 11:32:29.982 IST [116329] LOG: autoprewarm load task ended
-- 17 secs

4.
2017-06-04 11:32:58.974 IST [116361] LOG: autoprewarm has started
2017-06-04 11:33:15.017 IST [116361] LOG: autoprewarm load task ended
-- 17 secs

5.
2017-06-04 12:15:49.772 IST [117936] LOG: autoprewarm has started
2017-06-04 12:16:11.012 IST [117936] LOG: autoprewarm load task ended
-- 22 secs.

So mostly from 17 to 22 secs.
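
To put these timings in perspective, the load rate can be estimated with a small calculation. This is a sketch under one assumption that the mail does not state: the server uses PostgreSQL's default 8192-byte block size, which is consistent with 1048576 blocks filling 8GB of shared_buffers. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BLCKSZ 8192		/* assumed: PostgreSQL's default block size */

/* Average prewarm rate in MiB per second for nblocks loaded in seconds. */
static double
demo_prewarm_mib_per_sec(uint64_t nblocks, double seconds)
{
	assert(seconds > 0.0);
	return ((double) nblocks * DEMO_BLCKSZ) / (1024.0 * 1024.0) / seconds;
}
```

With the numbers reported above, 1048576 blocks in 17 seconds works out to roughly 482 MiB/s, and in 22 seconds to roughly 372 MiB/s. The figure says nothing about how much of the data came from the operating system's page cache rather than from the SSD.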

But I think I need to run tests on a larger set of configurations and on
different storage types. I shall do that and upload the results later. I
have also uploaded the latest performance test results (from my local
machine's SSD drive).
configuration: shared_buffers = 8GB,
test setting: scale_factor=300 (data fits in shared_buffers), pgbench clients = 1

TEST
PGBENCH_RUN="./pgbench --no-vacuum --protocol=prepared --time=5 -j 1 -c 1 --select-only postgres"
START_TIME=$SECONDS
echo TIME, TPS
while true; do
    TPS=$($PGBENCH_RUN | grep excluding | cut -d ' ' -f 3)
    TIME=$((SECONDS-START_TIME))
    echo $TIME, $TPS
done

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_11.patch (application/octet-stream)

commit c07e8f64131cc0cbb933a3881596a74a4782e351
Author: mithun <mithun@localhost.localdomain>
Date:   Sun Jun 4 12:12:57 2017 +0530

    autoprewarm_11.patch

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..51acc09
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1055 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarm the shared buffer pool when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		It is a bgworker which automatically records information about blocks
+ *		which were present in the buffer pool before server shutdown and then
+ *		prewarms the buffer pool upon server restart with those blocks.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached consistent state. The bgworker will start loading blocks
+ *		recorded in the format BlockInfoRecord
+ *		database,tablespace,filenode,forknum,blocknum in
+ *		$PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the
+ *		buffer pool. This way we do not replace any new blocks which were
+ *		loaded either by the recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		blocks which are currently in shared buffer pool. Upon next server
+ *		restart, the bgworker will prewarm the buffer pool by loading those
+ *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
+ *		activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+static void apw_sigusr1_handler(SIGNAL_ARGS);
+
+/* flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to terminate, and set our latch to wake it
+ *	up.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to tell the process to reread the config file, and set our
+ *	latch to wake it up.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm per-database workers will notify with SIGUSR1 on their
+ *	startup/shutdown.
+ */
+static void
+apw_sigusr1_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/*
+ * Metadata of each persistent block which is dumped and used to load.
+ */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/*
+ * Tasks performed by autoprewarm workers.
+ */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the buffer pool block info. */
+} AutoPrewarmTask;
+
+/*
+ * Shared state information about the running autoprewarm bgworker.
+ */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile;		/* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;		/* if set true, prewarm task
+												 * will not be done */
+
+	/* following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/* GUC variable which controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable which says whether the autoprewarm worker has to be started
+ * when preloaded.
+ */
+static bool autoprewarm = true;
+
+/* Return from the enclosing comparator if a and b differ in member fld. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	const BlockInfoRecord *a = (const BlockInfoRecord *) p;
+	const BlockInfoRecord *b = (const BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_shm_state
+ *		on_shm_exit reset the prewarm state
+ */
+
+static void
+reset_shm_state(int code, Datum arg)
+{
+	if (state->pid_using_dumpfile == MyProcPid)
+		state->pid_using_dumpfile = InvalidPid;
+	if (state->bgworker_pid == MyProcPid)
+		state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_autoprewarm_state
+ *		Allocate and initialize autoprewarm related shared memory
+ */
+static void
+init_autoprewarm_state(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&state->lock, LWLockNewTrancheId());
+		state->bgworker_pid = InvalidPid;
+		state->pid_using_dumpfile = InvalidPid;
+		state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * load_one_database
+ *		Load the blocks of one database by connecting to it.
+ *
+ * Start of prewarm per-database worker. This will try to load blocks of one
+ * database starting from block info position state->prewarm_start_idx to
+ * state->prewarm_stop_idx.
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+
+	/*
+	 * We're now ready to receive signals
+	 */
+	BackgroundWorkerUnblockSignals();
+
+	init_autoprewarm_state();
+	seg = dsm_attach(state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(state->database, InvalidOid);
+	old_blk = NULL;
+	pos = state->prewarm_start_idx;
+
+	while (!got_sigterm && pos < state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos];
+		Buffer		buf;
+
+		/*
+		 * Quit if we've reached records for another database. Unless the
+		 * previous blocks were of global objects which were combined with
+		 * next database's block infos.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * When we reach a new relation, close the old one.  Note, however,
+		 * that the previous try_relation_open may have failed, in which case
+		 * rel will be NULL.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it.  If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, so test before
+			 * calling same.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check if blocknum is valid and within the fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* move to next forknum. */
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+			ReleaseBuffer(buf);
+
+		old_blk = blk;
+		++pos;
+	}
+
+	dsm_detach(seg);
+
+	/* release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * launch_and_wait_for_per_database_worker
+ *		Register a per-database dynamic worker to load.
+ */
+static void
+launch_and_wait_for_per_database_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  (Datum) 0, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * prewarm_buffer_pool
+ *		The main routine which prewarms the buffer pool
+ *
+ * The prewarm bgworker first loads all of the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE into a DSM segment. The records are then
+ * grouped by database, and for each group of BlockInfoRecords a
+ * per-database worker is launched to load the corresponding blocks. The
+ * workers are launched sequentially; each one starts only after the
+ * previous one has finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since at most one worker can perform a prewarm, there is no need to
+	 * take the lock before setting skip_prewarm_on_restart.
+	 */
+	state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->pid_using_dumpfile == InvalidPid)
+		state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block info records to increase the chance of sequential reads
+	 * during load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	state->block_info_handle = dsm_segment_handle(seg);
+	state->prewarm_start_idx = state->prewarm_stop_idx = 0;
+
+	/* get next database's first block info's position. */
+	while (state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * advance the prewarm_stop_idx to end of block infos of current
+		 * database.
+		 */
+		do
+		{
+			i++;
+			if (i < num_elements && current_db != blkinfo[i].database)
+			{
+				/*
+				 * For block info of a global object whose database will be 0
+				 * try to combine them with next non-zero database's block
+				 * infos to load.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+		} while (i < num_elements);
+
+		/*
+		 * If we are here with database as InvalidOid it means we only have
+		 * block_infos belonging to global objects. As we do not have a valid
+		 * database to connect we shall simply ignore them.
+		 */
+		if (current_db == 0)
+			break;
+
+		state->prewarm_stop_idx = i;
+		state->database = current_db;
+
+		Assert(state->prewarm_start_idx < state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load new database's block. And
+		 * wait until they finish their job to launch next one.
+		 */
+		launch_and_wait_for_per_database_worker();
+		state->prewarm_start_idx = state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	state->block_info_handle = DSM_HANDLE_INVALID;
+
+	state->pid_using_dumpfile = InvalidPid;
+	ereport(LOG,
+			(errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/* ============================================================================
+ * =============	buffer pool info dump part of autoprewarm	===============
+ * ============================================================================
+ */
+
+/* This sub-module is for periodically dumping buffer pool's block info into
+ * a dump file AUTOPREWARM_FILE.
+ * Each entry of block info looks like this:
+ * database,tablespace,filenode,forknum,blocknum and we shall call it as
+ * BlockInfoRecord. Note we write in the text form so that the dump information
+ * is readable and if necessary can be carefully edited.
+ */
+
+/*
+ * dump_now
+ *		Dumps block infos in buffer pool
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	static char transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret,
+				buflen;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	int			fd;
+	char		buf[1024];
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->pid_using_dumpfile == InvalidPid)
+		state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&state->lock);
+
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							state->pid_using_dumpfile)));
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+
+	fd = OpenTransientFile(transient_dump_file_path,
+						   O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	buflen = sprintf(buf, "<<%u>>\n", num_blocks);
+	if (write(fd, buf, buflen) < buflen)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\": %m",
+						transient_dump_file_path)));
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			return 0;
+		}
+
+		buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+						 block_info_array[i].database,
+						 block_info_array[i].tablespace,
+						 block_info_array[i].filenode,
+						 (uint32) block_info_array[i].forknum,
+						 block_info_array[i].blocknum);
+
+		if (write(fd, buf, buflen) < buflen)
+		{
+			int			save_errno = errno;
+
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(fd);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+
+	state->pid_using_dumpfile = InvalidPid;
+
+	ereport(LOG,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		Loop which periodically calls dump_now()
+ *
+ * At regular intervals, which is defined by GUC dump_interval, dump_now() will
+ * be called.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = GetCurrentTimestamp();
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Has been set not to dump. Nothing more to do. */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now(true);
+				if (got_sigterm)
+					return;		/* got shutdown signal during or right after a
+								 * dump, so there is no need to dump again. */
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/*
+		 * In case of a SIGHUP, just reload the configuration.
+		 */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* One last block info dump during postmaster shutdown. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, apw_sigusr1_handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_autoprewarm_state();
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						state->bgworker_pid)));
+		return;
+	}
+
+	state->bgworker_pid = MyProcPid;
+	LWLockRelease(&state->lock);
+
+	on_shmem_exit(reset_shm_state, 0);
+
+	ereport(LOG,
+			(errmsg("autoprewarm has started")));
+
+	/*
+	 * **** perform autoprewarm's task	****
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!state->skip_prewarm_on_restart)
+		prewarm_buffer_pool();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize a BackgroundWorker structure
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker prewarm_worker;
+
+	/* Define custom GUC variables. */
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Enable/Disable auto-prewarm feature.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to Zero, timer based dumping is disabled."
+							" If set to -1, stops the running autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* if not run as a preloaded library, nothing more to do here! */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Request additional shared resources */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* Has been set not to start autoprewarm bgworker. Nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&prewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&prewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * launch_autoprewarm_dump
+ *		The C-Language entry function to launch autoprewarm dump bgworker
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* Has been set not to dump. Nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_autoprewarm_state();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		if (state->pid_using_dumpfile == MyProcPid)
+			state->pid_using_dumpfile = InvalidPid;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..e8d0c2e 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,100 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This is an SQL-callable function that launches the <literal>autoprewarm</literal>
+   worker to dump the buffer pool information at regular intervals. Only one
+   <literal>autoprewarm</literal> worker can run in a server, so if a worker sees
+   another existing worker it will exit immediately. The return value is the PID
+   of the worker which has been launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This is a SQL callable function to dump buffer pool information immediately
+   once by a backend. The return value is the number of block infos dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in buffer pool before server shutdown and then prewarms the buffer
+  pool upon server restart with those blocks.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until there is no
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in shared buffer pool. Upon next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. An autoprewarm
+      worker will only be started if this variable is set <literal>on</literal>.
+      The default value is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. It also takes special values: if set to 0, the
+      timer-based dump is disabled and a dump happens only at server shutdown.
+      If set to -1, the running <literal>autoprewarm</literal> will be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take free buffers
+ * off the list, so a caller that strictly needs a free buffer should not rely
+ * on this.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaa6d32..c6fa86a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -214,10 +216,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
@@ -2869,6 +2873,7 @@ pos_trgm
 post_parse_analyze_hook_type
 pqbool
 pqsigfunc
+prewarm_elem
 printQueryOpt
 printTableContent
 printTableFooter

autoprewarm_performance_report.odsapplication/vnd.oasis.opendocument.spreadsheet; name=autoprewarm_performance_report.odsDownload

#78

Rafia Sabih

rafia.sabih@enterprisedb.com

over 8 years ago

In reply to: Mithun Cy (#77)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Sun, Jun 4, 2017 at 12:45 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Wed, May 31, 2017 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

+ * contrib/autoprewarm.c

Wrong.

-- Oops Sorry fixed.
+    Oid            database;        /* database */
+    Oid            spcnode;        /* tablespace */
+    Oid            filenode;        /* relation's filenode. */
+    ForkNumber    forknum;        /* fork number */
+    BlockNumber blocknum;        /* block number */
Why spcnode rather than tablespace? Also, the third line has a
period, but not the others. I think you could just drop these
comments; they basically just duplicate the structure names, except
for spcnode which doesn't but probably should.
-- Dropped comments and changed spcnode to tablespace.

+ bool can_do_prewarm; /* if set can't do prewarm task. */

The comment and the name of the variable imply opposite meanings.

-- Sorry, that was a typo. This variable has now been removed, as you
suggested, in favor of new variables in AutoPrewarmSharedState.
+/*
+ * detach_blkinfos - on_shm_exit detach the dsm allocated for blockinfos.
+ */
I assume you don't really need this.  Presumably process exit is going
to detach the DSM anyway.
-- Yes considering process exit will detach the dsm, I have removed
that function.
+    if (seg == NULL)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("unable to map dynamic shared memory segment")));
Let's use the wording from the message in parallel.c rather than this
wording. Actually, we should probably (as a separate patch) change
test_shm_mq's worker.c to use the parallel.c wording also.
-- I have corrected the message with "could not map dynamic shared
memory segment" as in parallel.c

+ SetCurrentStatementStartTimestamp();

Why do we need this?

-- Removed. Sorry, I forgot to remove it when I removed the SPI connection code.

+ StartTransactionCommand();

Do we need a transaction? If so, how about starting a new transaction
for each relation instead of using a single one for the entire
prewarm?

-- We do relation_open, hence we need a transaction. As suggested, we now
start a new transaction for every new relation.
+    if (status == BGWH_STOPPED)
+        return;
+
+    if (status == BGWH_POSTMASTER_DIED)
+    {
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+              errmsg("cannot start bgworker autoprewarm without postmaster"),
+                 errhint("Kill all remaining database processes and restart"
+                         " the database.")));
+    }
+
+    Assert(0);
Instead of writing it this way, remove the first "if" block and change
the second one to Assert(status == BGWH_STOPPED). Also, I'd ditch the
curly braces in this case.
-- Fixed as suggested.

+ file = fopen(AUTOPREWARM_FILE, PG_BINARY_R);

Use AllocateFile() instead. See the comments for that function and at
the top of fd.c. Then you don't need to worry about leaking the file
handle on an error, so you can remove the fclose() before ereport()
stuff -- which is incomplete anyway, because you could fail e.g.
inside dsm_create().

-- Using AllocateFile now.
+                 errmsg("Error reading num of elements in \"%s\" for"
+                        " autoprewarm : %m", AUTOPREWARM_FILE)));
As I said in a previous review, please do NOT split error messages
across lines like this. Also, this error message is nothing close to
PostgreSQL style. Please read
https://www.postgresql.org/docs/devel/static/error-style-guide.html
and learn to follow all those guidelines written therein. I see at
least 3 separate problems with this message.
-- Thanks, I have tried to fix it now.

+ num_elements = i;

I'd do something like if (i != num_elements) elog(ERROR, "autoprewarm
block dump has %d entries but expected %d", i, num_elements); It
seems OK for this to be elog() rather than ereport() because the file
should never be corrupt unless the user has cheated by hand-editing
it.

-- Fixed as suggested. It is now reported with elog(ERROR).
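
For illustration only, the entry-count check discussed above could look roughly like the following standalone sketch. It is not code from the patch: the struct is a plain-C stand-in for BlockInfoRecord, the function names are hypothetical, and fprintf() stands in for elog(ERROR). Only the "%u,%u,%u,%u,%u\n" line format and the message wording are taken from the thread.

```c
#include <stdio.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord (plain unsigned fields). */
typedef struct BlockInfoRecord
{
	unsigned int database;
	unsigned int tablespace;
	unsigned int filenode;
	unsigned int forknum;
	unsigned int blocknum;
} BlockInfoRecord;

/*
 * Parse "database,tablespace,filenode,forknum,blocknum" lines from text into
 * out[], stopping at the first malformed line or when capacity is reached.
 * Returns the number of records parsed.
 */
int
parse_block_infos(const char *text, BlockInfoRecord *out, int capacity)
{
	int			n = 0;

	while (n < capacity)
	{
		int			consumed = 0;

		/* %n records how many characters this entry used. */
		if (sscanf(text, "%u,%u,%u,%u,%u\n%n",
				   &out[n].database, &out[n].tablespace, &out[n].filenode,
				   &out[n].forknum, &out[n].blocknum, &consumed) != 5)
			break;
		text += consumed;
		n++;
	}
	return n;
}

/* Returns 1 when the parsed entry count matches the expected count, else 0. */
int
block_dump_count_matches(const char *text, int expected,
						 BlockInfoRecord *out, int capacity)
{
	int			found = parse_block_infos(text, out, capacity);

	if (found != expected)
	{
		/* In the extension this would be elog(ERROR, ...). */
		fprintf(stderr,
				"autoprewarm block dump has %d entries but expected %d\n",
				found, expected);
		return 0;
	}
	return 1;
}
```

A mismatch here only means the file was truncated or hand-edited, which is why a plain elog() was considered sufficient in the review.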

I think you can get rid of next_db_pos altogether, and this
prewarm_elem thing too. Design sketch:

1. Move all the stuff that's in prewarm_elem into
AutoPrewarmSharedState. Rename start_pos to prewarm_start_idx and
end_of_blockinfos to prewarm_stop_idx.

2. Instead of computing all of the database ranges first and then
launching workers, do it all in one loop. So figure out where the
records for the current database end and set prewarm_start_idx and
prewarm_end_idx appropriately. Launch a worker. When the worker
terminates, set prewarm_start_idx = prewarm_end_idx and advance
prewarm_end_idx to the end of the next database's records.

This saves having to allocate memory for the next_db_pos array, and it
also avoids this crock:

+ memcpy(&pelem, MyBgworkerEntry->bgw_extra, sizeof(prewarm_elem));

-- Fixed as suggested.

The reason that's bad is because it only works so long as bgw_extra is
large enough to hold prewarm_elem. If prewarm_elem grows or bgw_extra
shrinks, this turns into a buffer overrun.

-- passing prewarm info through bgw_extra helped us to restrict the
scope and lifetime of prewarm_elem only to prewarm task. Moving them
to shared memory made them global even though they are not needed once
prewarm task is finished. As there are other disadvantages of using
bgw_extra I have now implemented as you have suggested.
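
As a rough sketch of the single-loop design described above (again not the patch's code), the range for each database can be found on the fly from the sorted records. The struct is a simplified stand-in, the function names and the launch callback are hypothetical, and the special handling of global objects (database 0 being merged with the next database) is deliberately left out.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct BlockInfoRecord
{
	unsigned int database;
	unsigned int tablespace;
	unsigned int filenode;
	unsigned int forknum;
	unsigned int blocknum;
} BlockInfoRecord;

/*
 * Given records sorted by database, return the index one past the last
 * record that belongs to the same database as recs[start].
 */
int
next_database_boundary(const BlockInfoRecord *recs, int start, int num)
{
	unsigned int db = recs[start].database;
	int			i = start;

	while (i < num && recs[i].database == db)
		i++;
	return i;
}

/*
 * Walk the array one database at a time. For each range, the callback
 * (standing in for "set prewarm_start_idx/prewarm_stop_idx, launch a
 * per-database worker and wait for it") is invoked once.
 * Returns the number of ranges visited.
 */
int
for_each_database_range(const BlockInfoRecord *recs, int num,
						void (*launch) (unsigned int db, int start_idx,
										int stop_idx))
{
	int			start = 0;
	int			ranges = 0;

	while (start < num)
	{
		int			stop = next_database_boundary(recs, start, num);

		if (launch != NULL)
			launch(recs[start].database, start, stop);
		ranges++;
		start = stop;			/* next range begins where this one ended */
	}
	return ranges;
}
```

Because only the current start and stop index are live at any time, no per-database position array is needed.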

I would use AUTOPREWARM_FILE ".tmp" rather than a name incorporating
the PID for the temporary file. Otherwise, you might leave many
temporary files behind under different names. If you use the same
name every time, you'll never have more than one, and the next
successful dump will end up getting rid of it along the way.

-- Fixed as suggested. Previously the PID was used so that a concurrent dump
could happen between the dump worker and an immediate dump, as they would write
to two different files. With the new way of registering the PID in shared
memory before file access, I think that problem can be addressed.
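
The fixed-name temporary file idea can be sketched in plain C as below. This is an illustration under assumptions, not the patch: the function name is hypothetical, stdio is used instead of PostgreSQL's transient-file API, there is no fsync, and it relies on POSIX rename() replacing an existing target.

```c
#include <stdio.h>

/*
 * Write contents to "<final_path>.tmp" and then rename it over final_path.
 * Because the temporary name is always the same, a failed attempt leaves at
 * most one stray file, and the next successful dump replaces it.
 * Returns 0 on success, -1 on failure.
 */
int
write_dump_via_tmpfile(const char *final_path, const char *contents)
{
	char		tmp_path[1024];
	FILE	   *fp;
	int			len;

	len = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", final_path);
	if (len < 0 || (size_t) len >= sizeof(tmp_path))
		return -1;				/* path too long */

	fp = fopen(tmp_path, "w");
	if (fp == NULL)
		return -1;

	if (fputs(contents, fp) == EOF)
	{
		fclose(fp);
		remove(tmp_path);
		return -1;
	}

	if (fclose(fp) != 0)
	{
		remove(tmp_path);
		return -1;
	}

	/* Assumes POSIX semantics: rename() replaces an existing target. */
	if (rename(tmp_path, final_path) != 0)
	{
		remove(tmp_path);
		return -1;
	}
	return 0;
}
```

Two processes writing the same temporary name at once would still corrupt each other's output, which is why this scheme depends on the PID registration in shared memory mentioned in the reply.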
+            pfree(block_info_array);
+            CloseTransientFile(fd);
+            unlink(transient_dump_file_path);
+            ereport(ERROR,
+                    (errcode_for_file_access(),
+                     errmsg("error writing to \"%s\" : %m",
+                            AUTOPREWARM_FILE)));

Again, this is NOT a standard error message text. It occurs in zero
other places in the source tree. You are not the first person to need
an error message for a failed write to a file; please look at what the
previous authors did. Also, the pfree() before report is not needed;
isn't the whole process going to terminate? Also, you can't really use
errcode_for_file_access() here, because you've meanwhile done
CloseTransientFile() and unlink(), which will have clobbered errno.

-- Removed pfree, saved errno before CloseTransientFile() and unlink()
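
The errno point is easy to miss, so here is a minimal standalone sketch of it. The helper name is hypothetical, plain close()/unlink() stand in for CloseTransientFile() and unlink(), and fprintf() with strerror() stands in for ereport() with %m; the message text follows the usual "could not write to file" wording.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Clean up after a failed write and report the original cause.
 * Must be called right after the failing write, while errno still holds
 * the write error. Returns the saved error code.
 */
int
cleanup_after_failed_write(int fd, const char *tmp_path)
{
	int			save_errno = errno; /* capture before cleanup clobbers it */

	close(fd);					/* may set errno, e.g. EBADF */
	unlink(tmp_path);			/* may set errno, e.g. ENOENT */

	fprintf(stderr, "could not write to file \"%s\": %s\n",
			tmp_path, strerror(save_errno));

	errno = save_errno;			/* restore for any caller that inspects it */
	return save_errno;
}
```

Without the save and restore, the reported reason would be whatever close() or unlink() happened to leave in errno rather than the write failure.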

+ ereport(LOG, (errmsg("saved metadata info of %d blocks", num_blocks)));

Not project style for ereport(). Line break after the first comma.
Similarly elsewhere.

-- Tried to fix same
+ *    dump_now - the main routine which goes through each buffer header of buffer
+ *    pool and dumps their meta data. We Sort these data and then dump them.
+ *    Sorting is necessary as it facilitates sequential read during load.
This is no longer true, because you moved the sort to the load side.
It's also not capitalized properly.
-- Sorry removed now.
Discussions of the format of the autoprewarm dump file involve
inexplicably varying number of < and > symbols:
+ *        <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * <DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum> and we shall call it
+        buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
-- Sorry fixed now, in all of the places the block info formats will
not have such (</>) delimiter.

+#ifndef __AUTOPREWARM_H__
+#define __AUTOPREWARM_H__

We don't use double-underscored names for header files. Please
consult existing header files for the appropriate style. Also, why
does this file exist at all, instead of just including them in the .c
file? The point of a header is for things that need to be included
by multiple .c files, but there's no such need here.

-- This was done to fix one of the previous review comments. I have
moved them back to .c file.
+             * load. If there are no other block infos than the global objects
+             * we silently ignore them. Should I throw error?
Silently ignoring them seems fine. Throwing an error doesn't seem
like it would improve things.
-- Okay thanks.
+        /*
+         * Register a sub-worker to load new database's block. Wait until the
+         * sub-worker finish its job before launching next sub-worker.
+         */
+        launch_prewarm_subworker(&pelem);
The function name implies that it launches the worker, but the comment
implies that it also waits for it to terminate. Functions should be
named in a way that matches what they do.
-- Have renamed it to launch_and_wait_for_per_database_worker

I feel like the get_autoprewarm_task() function is introducing fairly
baroque logic for something that really ought to be more simple. All
autoprewarm_main() really needs to do is:

if (!state->autoprewarm_done)
autoprewarm();
dump_block_info_periodically();

-- Have simplified things as suggested now. Function
get_autoprewarm_task has been removed.

The locking in autoprewarm_dump_now() is a bit tricky. There are two
trouble cases. One is that we try to rename() our new dump file on
top of the existing one while a background worker is still using it to
perform an autoprewarm. The other is that we try to write a new
temporary dump file while some other process is trying to do the same
thing. I think we should fix this by storing a PID in
AutoPrewarmSharedState; a process which wants to perform a dump or an
autoprewarm must change the PID from 0 to their own PID, and change it
back to 0 on successful completion or error exit. If a process that goes to
perform an immediate dump finds a non-zero value already, it just does
ereport(ERROR, ...), including the PID of the other process in the
message (e.g. "unable to perform block dump because dump file is being
used by PID %d"). In a background worker, if we go to dump and find
the file in use, log a message (e.g. "skipping block dump because it
is already being performed by PID %d", "skipping prewarm because block
dump file is being rewritten by PID %d").

-- Fixed as suggested.
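
A single-process sketch of the PID handshake described above follows. It is only an illustration: the struct and function names are hypothetical, 0 is used as the "nobody" value as in the review text, and the real code would have to hold the LWLock in AutoPrewarmSharedState around both the check and the update.

```c
#include <sys/types.h>

#define NO_PID ((pid_t) 0)		/* "nobody is using the dump file" */

/* Hypothetical cut-down version of the shared state. */
typedef struct DumpFileState
{
	pid_t		pid_using_dumpfile;
} DumpFileState;

/*
 * Try to become the process that owns the dump file.
 * Returns NO_PID on success, otherwise the PID of the current owner so the
 * caller can put it in its ERROR or LOG message.
 * (Real code would take the lock exclusively around this function body.)
 */
pid_t
claim_dump_file(DumpFileState *st, pid_t me)
{
	pid_t		owner = st->pid_using_dumpfile;

	if (owner == NO_PID)
		st->pid_using_dumpfile = me;
	return owner;
}

/*
 * Give the dump file back, on success and on error exit alike.
 * Only the current owner may reset the field.
 */
void
release_dump_file(DumpFileState *st, pid_t me)
{
	if (st->pid_using_dumpfile == me)
		st->pid_using_dumpfile = NO_PID;
}
```

A backend doing an immediate dump would raise an error when the claim fails, while a background worker would log the owner's PID and skip the task.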

I also think we should change is_bgworker_running to a PID, so that if
we try to launch a new worker we can report something like
"autoprewarm worker is already running under PID %d".

-- Fixed. I could only "LOG" that another autoprewarm worker is already
running and then exit, because on ERROR the worker would be restarted, and we
do not want to restart such workers.

So putting that all together, I suppose AutoPrewarmSharedState should
end up looking like this:

LWLock lock; /* mutual exclusion */
pid_t bgworker_pid; /* for main bgworker */
pid_t pid_using_dumpfile; /* for autoprewarm or block dump */

-- I think one more member is required, which states whether prewarm can
be done when the worker restarts.

/* following items are for communication with per-database worker */
dsm_handle block_info_handle;
Oid database;
int prewarm_start_idx;
int prewarm_stop_idx;

-- Fixed as suggested

I suggest going through and changing "subworker" to "per-database
worker" throughout.

-- Fixed as suggested.

BTW, have you tested how long this takes to run with, say, shared_buffers = 8GB?

I have tried the same on my local machine with an SSD as storage.

Settings: shared_buffers = 8GB, data loaded with pgbench scale_factor=1000.

Total blocks got dumped
autoprewarm_dump_now
----------------------
1048576

5 different load time based logs

1.
2017-06-04 11:30:26.460 IST [116253] LOG: autoprewarm has started
2017-06-04 11:30:43.443 IST [116253] LOG: autoprewarm load task ended
-- 17 secs

2.
2017-06-04 11:31:13.565 IST [116291] LOG: autoprewarm has started
2017-06-04 11:31:30.317 IST [116291] LOG: autoprewarm load task ended
-- 17 secs

3.
2017-06-04 11:32:12.995 IST [116329] LOG: autoprewarm has started
2017-06-04 11:32:29.982 IST [116329] LOG: autoprewarm load task ended
-- 17 secs

4.
2017-06-04 11:32:58.974 IST [116361] LOG: autoprewarm has started
2017-06-04 11:33:15.017 IST [116361] LOG: autoprewarm load task ended
-- 17 secs

5.
2017-06-04 12:15:49.772 IST [117936] LOG: autoprewarm has started
2017-06-04 12:16:11.012 IST [117936] LOG: autoprewarm load task ended
-- 22 secs.

So mostly from 17 to 22 secs.
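
To put those timings in perspective, the arithmetic below converts a block count and a load time into a throughput. It is a back-of-the-envelope sketch, not a measurement: it assumes the default 8 kB block size and that every dumped block was actually read during the load, and the function name is hypothetical. Under those assumptions 1048576 blocks are 8192 MiB, so 17 to 22 seconds corresponds to roughly 370 to 480 MiB/s.

```c
#include <stdint.h>

/*
 * Approximate load throughput in MiB per second.
 * block_size is an assumption supplied by the caller (8192 by default
 * in PostgreSQL builds).
 */
double
prewarm_mib_per_sec(uint64_t blocks, uint32_t block_size, double seconds)
{
	double		total_mib = ((double) blocks * block_size) / (1024.0 * 1024.0);

	return total_mib / seconds;
}
```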

But I think I need to run tests on a larger set of configurations and on
different storage types. I shall do the same and upload the results later. I
have also uploaded the latest performance test results (on my local machine's
SSD drive).
configuration: shared_buffers = 8GB,
test setting: scale_factor=300 (data fits in shared_buffers), pgbench clients = 1

TEST
PGBENCH_RUN="./pgbench --no-vacuum --protocol=prepared --time=5 -j 1 -c 1 --select-only postgres"
START_TIME=$SECONDS
echo TIME, TPS
while true; do
    TPS=$($PGBENCH_RUN | grep excluding | cut -d ' ' -f 3)
    TIME=$((SECONDS-START_TIME))
    echo $TIME, $TPS
done

I had a look at the patch from a stylistic/formatting point of view.
Please find attached a patch with the suggested modifications.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

cosmetic_autoprewarm.patchapplication/octet-stream; name=cosmetic_autoprewarm.patchDownload

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 51acc09aa7..9845aeb22a 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -5,22 +5,22 @@
  *
  * DESCRIPTION
  *
- *		It is a bgworker which automatically records information about blocks
- *		which were present in buffer pool before server shutdown and then
- *		prewarm the buffer pool upon server restart with those blocks.
+ *		It is a bgworker process that automatically records information about
+ *		blocks which were present in buffer pool before server shutdown and then
+ *		prewarms the buffer pool upon server restart with those blocks.
  *
  *		How does it work? When the shared library "pg_prewarm" is preloaded, a
  *		bgworker "autoprewarm" is launched immediately after the server has
- *		reached consistent state. The bgworker will start loading blocks
- *		recorded in the format BlockInfoRecord
- *		database,tablespace,filenode,forknum,blocknum in
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded in the format BlockInfoRecord consisting of
+ *		database, tablespace, filenode, forknum, blocknum in
  *		$PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the
  *		buffer pool. This way we do not replace any new blocks which were
  *		loaded either by the recovery process or the querying clients.
  *
  *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
  *		start a new task to periodically dump the BlockInfoRecords related to
- *		blocks which are currently in shared buffer pool. Upon next server
+ *		the blocks which are currently in shared buffer pool. On next server
  *		restart, the bgworker will prewarm the buffer pool by loading those
  *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
  *		activity of the bgworker.
@@ -89,13 +89,13 @@ static void apw_sigterm_handler(SIGNAL_ARGS);
 static void apw_sighup_handler(SIGNAL_ARGS);
 static void apw_sigusr1_handler(SIGNAL_ARGS);
 
-/* flags set by signal handlers */
+/* Flags set by signal handlers */
 static volatile sig_atomic_t got_sigterm = false;
 static volatile sig_atomic_t got_sighup = false;
 
 /*
  *	Signal handler for SIGTERM
- *	Set a flag to let the main loop to terminate, and set our latch to wake it
+ *	Set a flag for the termination of main loop, and set our latch to wake it
  *	up.
  */
 static void
@@ -113,7 +113,7 @@ apw_sigterm_handler(SIGNAL_ARGS)
 
 /*
  *	Signal handler for SIGHUP
- *	Set a flag to tell the process to reread the config file, and set our
+ *	Set a flag to notify the process to reread the config file, and set our
  *	latch to wake it up.
  */
 static void
@@ -146,13 +146,11 @@ apw_sigusr1_handler(SIGNAL_ARGS)
 }
 
 /* ============================================================================
- * ==============	types and variables used by autoprewarm   =============
+ * ==============	Types and variables used by autoprewarm   =============
  * ============================================================================
  */
 
-/*
- * Metadata of each persistent block which is dumped and used to load.
- */
+/* Metadata of each persistent block which is dumped and used for loading. */
 typedef struct BlockInfoRecord
 {
 	Oid			database;
@@ -160,20 +158,16 @@ typedef struct BlockInfoRecord
 	Oid			filenode;
 	ForkNumber	forknum;
 	BlockNumber blocknum;
-} BlockInfoRecord;
+}	BlockInfoRecord;
 
-/*
- * Tasks performed by autoprewarm workers.
- */
+/* Tasks performed by autoprewarm workers. */
 typedef enum
 {
 	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
 	TASK_DUMP_BUFFERPOOL_INFO	/* dump the buffer pool block info. */
-} AutoPrewarmTask;
+}	AutoPrewarmTask;
 
-/*
- * Shared state information about the running autoprewarm bgworker.
- */
+/* Shared state information about the running autoprewarm bgworker. */
 typedef struct AutoPrewarmSharedState
 {
 	LWLock		lock;			/* mutual exclusion */
@@ -182,25 +176,25 @@ typedef struct AutoPrewarmSharedState
 	bool		skip_prewarm_on_restart;		/* if set true, prewarm task
 												 * will not be done */
 
-	/* following items are for communication with per-database worker */
+	/* Following items are for communication with per-database worker */
 	dsm_handle	block_info_handle;
 	Oid			database;
 	int			prewarm_start_idx;
 	int			prewarm_stop_idx;
-} AutoPrewarmSharedState;
+}	AutoPrewarmSharedState;
 
 static AutoPrewarmSharedState *state = NULL;
 
-/* GUC variable which control the dump activity of autoprewarm. */
+/* GUC variable that controls the dump activity of autoprewarm. */
 static int	dump_interval = 0;
 
 /*
- * GUC variable which say whether autoprewarm worker has to be started when
+ * GUC variable to decide if autoprewarm worker has to be started when
  * preloaded.
  */
 static bool autoprewarm = true;
 
-/* compare member elements to check if they are not equal. */
+/* Compare member elements to check if they are not equal. */
 #define cmp_member_elem(fld)	\
 do { \
 	if (a->fld < b->fld)		\
@@ -228,7 +222,7 @@ blockinfo_cmp(const void *p, const void *q)
 }
 
 /* ============================================================================
- * =====================	prewarm part of autoprewarm =======================
+ * =====================	Prewarm part of autoprewarm =======================
  * ============================================================================
  */
 
@@ -273,9 +267,9 @@ init_autoprewarm_state(void)
 
 /*
  * load_one_database
- *		Load block infos of one database by connecting to them.
+ *		Load block info of one database after connecting to it.
  *
- * Start of prewarm per-database worker. This will try to load blocks of one
+ * Start prewarm per-database worker, which will load blocks of one
  * database starting from block info position state->prewarm_start_idx to
  * state->prewarm_stop_idx.
  */
@@ -293,9 +287,7 @@ load_one_database(Datum main_arg)
 	pqsignal(SIGTERM, apw_sigterm_handler);
 	pqsignal(SIGHUP, apw_sighup_handler);
 
-	/*
-	 * We're now ready to receive signals
-	 */
+	/* We're now ready to receive signals */
 	BackgroundWorkerUnblockSignals();
 
 	init_autoprewarm_state();
@@ -317,18 +309,18 @@ load_one_database(Datum main_arg)
 		Buffer		buf;
 
 		/*
-		 * Quit if we've reached records for another database. Unless the
+		 * Quit if we've reached records of another database. Unless the
 		 * previous blocks were of global objects which were combined with
-		 * next database's block infos.
+		 * next database's block info.
 		 */
 		if (old_blk != NULL && old_blk->database != blk->database &&
 			old_blk->database != 0)
 			break;
 
 		/*
-		 * When we reach a new relation, close the old one.  Note, however,
-		 * that the previous try_relation_open may have failed, in which case
-		 * rel will be NULL.
+		 * On reaching a new relation, close the old one.  Note that the
+		 * previous try_relation_open may have failed, in which case rel will
+		 * be NULL.
 		 */
 		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
 			rel != NULL)
@@ -339,8 +331,8 @@ load_one_database(Datum main_arg)
 		}
 
 		/*
-		 * Try to open each new relation, but only once, when we first
-		 * encounter it.  If it's been dropped, skip the associated blocks.
+		 * Each relation is opened only once, at its first encounter. If it's
+		 * been dropped, skip the associated blocks.
 		 */
 		if (old_blk == NULL || old_blk->filenode != blk->filenode)
 		{
@@ -362,7 +354,7 @@ load_one_database(Datum main_arg)
 			continue;
 		}
 
-		/* Once per fork, check for fork existence and size. */
+		/* Check each fork for its existence and size. */
 		if (old_blk == NULL ||
 			old_blk->filenode != blk->filenode ||
 			old_blk->forknum != blk->forknum)
@@ -370,8 +362,8 @@ load_one_database(Datum main_arg)
 			RelationOpenSmgr(rel);
 
 			/*
-			 * smgrexists is not safe for illegal forknum, so test before
-			 * calling same.
+			 * smgrexists is not safe for illegal forknum, hence check if the
+			 * passed forknum is valid before using it in smgrexists.
 			 */
 			if (blk->forknum > InvalidForkNumber &&
 				blk->forknum <= MAX_FORKNUM &&
@@ -381,10 +373,10 @@ load_one_database(Datum main_arg)
 				nblocks = 0;
 		}
 
-		/* check if blocknum is valid and with in fork file size. */
+		/* Check if blocknum is valid and within fork file size. */
 		if (blk->blocknum >= nblocks)
 		{
-			/* move to next forknum. */
+			/* Move to next forknum. */
 			++pos;
 			old_blk = blk;
 			continue;
@@ -402,7 +394,7 @@ load_one_database(Datum main_arg)
 
 	dsm_detach(seg);
 
-	/* release lock on previous relation. */
+	/* Release lock on previous relation. */
 	if (rel)
 	{
 		relation_close(rel, AccessShareLock);
@@ -427,7 +419,7 @@ launch_and_wait_for_per_database_worker(void)
 					  (Datum) NULL, BGW_NEVER_RESTART,
 					  BGWORKER_BACKEND_DATABASE_CONNECTION);
 
-	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
 	worker.bgw_notify_pid = MyProcPid;
 
 	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
@@ -444,12 +436,12 @@ launch_and_wait_for_per_database_worker(void)
 
 /*
  * prewarm_buffer_pool
- *		The main routine which prewarm the buffer pool
+ *		The main routine that prewarms the buffer pool.
  *
- * The prewarm bgworker will first load all of the BlockInfoRecord's in
- * $PGDATA/AUTOPREWARM_FILE to a dsm. And those BlockInfoRecords are further
- * separated based on their database. And for each group of BlockInfoRecords a
- * per-database worker will be launched to load corresponding blocks. Each of
+ * The prewarm bgworker will first load all of the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a dsm. Further, those BlockInfoRecords are
+ * separated based on their database. Finally, for each group of BlockInfoRecords a
+ * per-database worker will be launched to load corresponding blocks. Now, each of
  * those workers will be launched in sequential order only after the previous
  * one has finished its job.
  */
@@ -463,8 +455,8 @@ prewarm_buffer_pool(void)
 	dsm_segment *seg;
 
 	/*
-	 * since there could be at max one worker who could do a prewarm no need
-	 * to take lock before setting skip_prewarm_on_restart.
+	 * Since there could be at max one worker who could do a prewarm, hence,
+	 * acquiring locks is not required before setting skip_prewarm_on_restart.
 	 */
 	state->skip_prewarm_on_restart = true;
 
@@ -509,7 +501,7 @@ prewarm_buffer_pool(void)
 
 	for (i = 0; i < num_elements; i++)
 	{
-		/* get next block. */
+		/* Get next block. */
 		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
 						&blkinfo[i].tablespace, &blkinfo[i].filenode,
 						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
@@ -523,7 +515,7 @@ prewarm_buffer_pool(void)
 			 i, num_elements);
 
 	/*
-	 * sort the block number to increase the chance of sequential reads during
+	 * Sort the block number to increase the chance of sequential reads during
 	 * load.
 	 */
 	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
@@ -531,14 +523,14 @@ prewarm_buffer_pool(void)
 	state->block_info_handle = dsm_segment_handle(seg);
 	state->prewarm_start_idx = state->prewarm_stop_idx = 0;
 
-	/* get next database's first block info's position. */
+	/* Get the info position of the first block of the next database. */
 	while (state->prewarm_start_idx < num_elements)
 	{
 		uint32		i = state->prewarm_start_idx;
 		Oid			current_db = blkinfo[i].database;
 
 		/*
-		 * advance the prewarm_stop_idx to end of block infos of current
+		 * Advance the prewarm_stop_idx to the end of block info of current
 		 * database.
 		 */
 		do
@@ -549,7 +541,7 @@ prewarm_buffer_pool(void)
 				/*
 				 * For block info of a global object whose database will be 0
 				 * try to combine them with next non-zero database's block
-				 * infos to load.
+				 * info to load.
 				 */
 				if (current_db != InvalidOid)
 					break;
@@ -559,8 +551,8 @@ prewarm_buffer_pool(void)
 
 		/*
 		 * If we are here with database as InvalidOid it means we only have
-		 * block_infos belonging to global objects. As we do not have a valid
-		 * database to connect we shall simply ignore them.
+		 * block_info belonging to global objects. As we do not have a valid
+		 * database to connect we shall ignore them.
 		 */
 		if (current_db == 0)
 			break;
@@ -571,8 +563,8 @@ prewarm_buffer_pool(void)
 		Assert(state->prewarm_start_idx < state->prewarm_stop_idx);
 
 		/*
-		 * Register a per-database worker to load new database's block. And
-		 * wait until they finish their job to launch next one.
+		 * Register a per-database worker to load new database's block. Wait
+		 * until they finish their job to launch next one.
 		 */
 		launch_and_wait_for_per_database_worker();
 		state->prewarm_start_idx = state->prewarm_stop_idx;
@@ -588,21 +580,21 @@ prewarm_buffer_pool(void)
 }
 
 /* ============================================================================
- * =============	buffer pool info dump part of autoprewarm	===============
+ * =============	Buffer pool info dump part of autoprewarm	===============
  * ============================================================================
  */
 
 /* This sub-module is for periodically dumping buffer pool's block info into
  * a dump file AUTOPREWARM_FILE.
- * Each entry of block info looks like this:
- * database,tablespace,filenode,forknum,blocknum and we shall call it as
- * BlockInfoRecord. Note we write in the text form so that the dump information
- * is readable and if necessary can be carefully edited.
+ * Each entry of block info consists of following,
+ * database, tablespace, filenode, forknum, blocknum, and we will refer it as
+ * BlockInfoRecord. Note that this is in the text form so that the dump information
+ * is readable and can be edited, if required.
  */
 
 /*
  * dump_now
- *		Dumps block infos in buffer pool
+ *		Dumps block info in buffer pool.
  */
 static uint32
 dump_now(bool is_bgworker)
@@ -643,9 +635,7 @@ dump_now(bool is_bgworker)
 	{
 		uint32		buf_state;
 
-		/*
-		 * In case of a SIGHUP, just reload the configuration.
-		 */
+		/* In case of a SIGHUP, just reload the configuration. */
 		if (got_sighup)
 		{
 			got_sighup = false;
@@ -661,7 +651,7 @@ dump_now(bool is_bgworker)
 
 		bufHdr = GetBufferDescriptor(i);
 
-		/* lock each buffer header before inspecting. */
+		/* Lock each buffer header before inspecting. */
 		buf_state = LockBufHdr(bufHdr);
 
 		if (buf_state & BM_TAG_VALID)
@@ -696,9 +686,7 @@ dump_now(bool is_bgworker)
 
 	for (i = 0; i < num_blocks; i++)
 	{
-		/*
-		 * In case of a SIGHUP, just reload the configuration.
-		 */
+		/* In case of a SIGHUP, just reload the configuration. */
 		if (got_sighup)
 		{
 			got_sighup = false;
@@ -738,7 +726,7 @@ dump_now(bool is_bgworker)
 	pfree(block_info_array);
 
 	/*
-	 * rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
 	 * permanent.
 	 */
 	ret = CloseTransientFile(fd);
@@ -758,10 +746,9 @@ dump_now(bool is_bgworker)
 
 /*
  * dump_block_info_periodically
- *		Loop which periodically calls dump_now()
+ *		Sub-routine to periodically call dump_now().
  *
- * At regular intervals, which is defined by GUC dump_interval, dump_now() will
- * be called.
+ * Call dum_now() at regular intervals defined by GUC variable --dump_interval.
  */
 void
 dump_block_info_periodically(void)
@@ -776,7 +763,7 @@ dump_block_info_periodically(void)
 		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
 		nap.tv_usec = 0;
 
-		/* Has been set not to dump. Nothing more to do. */
+		/* If prewarm is set to off then nothing more to do. */
 		if (dump_interval == AT_PWARM_OFF)
 			return;
 
@@ -790,8 +777,8 @@ dump_block_info_periodically(void)
 			{
 				dump_now(true);
 				if (got_sigterm)
-					return;		/* got shutdown signal during or right after a
-								 * dump. And, I think better to return now. */
+					return;		/* It is better to return when shutdown signal
+								 * is receive during or right after a dump. */
 				last_dump_time = GetCurrentTimestamp();
 				nap.tv_sec = dump_interval;
 				nap.tv_usec = 0;
@@ -817,9 +804,7 @@ dump_block_info_periodically(void)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		/*
-		 * In case of a SIGHUP, just reload the configuration.
-		 */
+		/* In case of a SIGHUP, just reload the configuration. */
 		if (got_sighup)
 		{
 			got_sighup = false;
@@ -834,7 +819,7 @@ dump_block_info_periodically(void)
 
 /*
  * autoprewarm_main
- *		The main entry point of autoprewarm bgworker process
+ *		The main entry point of autoprewarm bgworker process.
  */
 void
 autoprewarm_main(Datum main_arg)
@@ -846,7 +831,7 @@ autoprewarm_main(Datum main_arg)
 	pqsignal(SIGHUP, apw_sighup_handler);
 	pqsignal(SIGUSR1, apw_sigusr1_handler);
 
-	/* We're now ready to receive signals */
+	/* We're now ready to receive signals. */
 	BackgroundWorkerUnblockSignals();
 
 	todo_task = DatumGetInt32(main_arg);
@@ -859,7 +844,7 @@ autoprewarm_main(Datum main_arg)
 	{
 		LWLockRelease(&state->lock);
 		ereport(LOG,
-				(errmsg("could not continue autoprewarm worker is already running under PID %d",
+				(errmsg("autoprewarm worker is already running under PID %d",
 						state->bgworker_pid)));
 		return;
 	}
@@ -873,7 +858,7 @@ autoprewarm_main(Datum main_arg)
 			(errmsg("autoprewarm has started")));
 
 	/*
-	 * **** perform autoprewarm's task	****
+	 * **** Perform autoprewarm's task	****
 	 */
 	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
 		!state->skip_prewarm_on_restart)
@@ -886,13 +871,13 @@ autoprewarm_main(Datum main_arg)
 }
 
 /* ============================================================================
- * =============	extension's entry functions/utilities	===================
+ * =============	Extension's entry functions/utilities	===================
  * ============================================================================
  */
 
 /*
  * setup_autoprewarm
- *		A Common function to initialize BackgroundWorker structure
+ *		A Common function to initialize BackgroundWorker structure.
  */
 static void
 setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
@@ -913,7 +898,7 @@ setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
 
 /*
  * _PG_init
- *		Extension's entry point
+ *		Extension's entry point.
  */
 void
 _PG_init(void)
@@ -948,14 +933,14 @@ _PG_init(void)
 
 	EmitWarningsOnPlaceholders("pg_prewarm");
 
-	/* if not run as a preloaded library, nothing more to do here! */
+	/* If not run as a preloaded library, nothing more to do here. */
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
-	/* Request additional shared resources */
+	/* Request additional shared resources. */
 	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
 
-	/* Has been set not to start autoprewarm bgworker. Nothing more to do. */
+	/* If autoprewarm bgworker is disabled then nothing more to do. */
 	if (!autoprewarm)
 		return;
 
@@ -967,7 +952,7 @@ _PG_init(void)
 
 /*
  * autoprewarm_dump_launcher
- *		Dynamically launch an autoprewarm dump worker
+ *		Dynamically launch an autoprewarm dump worker.
  */
 static pid_t
 autoprewarm_dump_launcher(void)
@@ -980,7 +965,7 @@ autoprewarm_dump_launcher(void)
 	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
 					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
 
-	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
 	worker.bgw_notify_pid = MyProcPid;
 
 	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
@@ -1014,14 +999,14 @@ autoprewarm_dump_launcher(void)
 
 /*
  * launch_autoprewarm_dump
- *		The C-Language entry function to launch autoprewarm dump bgworker
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
  */
 Datum
 launch_autoprewarm_dump(PG_FUNCTION_ARGS)
 {
 	pid_t		pid;
 
-	/* Has been set not to dump. Nothing more to do. */
+	/* If dump_interval is disabled then nothing more to do. */
 	if (dump_interval == AT_PWARM_OFF)
 		PG_RETURN_NULL();
 
@@ -1031,7 +1016,7 @@ launch_autoprewarm_dump(PG_FUNCTION_ARGS)
 
 /*
  * autoprewarm_dump_now
- *		The C-Language entry function to dump immediately
+ *		The C-Language entry function to dump immediately.
  */
 Datum
 autoprewarm_dump_now(PG_FUNCTION_ARGS)
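
The grouping that the prewarm_buffer_pool comments in the patch above describe (sort the records, fold database-0 entries for global objects into the next database's batch, and skip a run that contains only global objects) can be sketched in standalone C. This is an illustration under simplifying assumptions, not the patch's code: BlockRec, block_rec_cmp and split_into_batches are hypothetical names, and the fields are plain integers rather than PostgreSQL's own types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's BlockInfoRecord (assumption). */
typedef struct BlockRec
{
	uint32_t	database;
	uint32_t	tablespace;
	uint32_t	filenode;
	uint32_t	forknum;
	uint32_t	blocknum;
} BlockRec;

/* Compare one field; return from the comparator if the values differ. */
#define CMP_FIELD(f) \
	do { \
		if (a->f < b->f) return -1; \
		if (a->f > b->f) return 1; \
	} while (0)

/* Order by database, tablespace, filenode, forknum, blocknum. */
static int
block_rec_cmp(const void *p, const void *q)
{
	const BlockRec *a = p;
	const BlockRec *b = q;

	CMP_FIELD(database);
	CMP_FIELD(tablespace);
	CMP_FIELD(filenode);
	CMP_FIELD(forknum);
	CMP_FIELD(blocknum);
	return 0;
}

/*
 * Sort recs, then write the exclusive end index of each per-database
 * batch into batch_end[] (which must have room for n entries).
 * Records with database 0 are merged into the batch of the next
 * non-zero database; a run holding only database 0 is dropped.
 * Returns the number of batches.
 */
static size_t
split_into_batches(BlockRec *recs, size_t n, size_t *batch_end)
{
	size_t		nbatches = 0;
	size_t		start = 0;

	assert(recs != NULL || n == 0);
	qsort(recs, n, sizeof(BlockRec), block_rec_cmp);

	while (start < n)
	{
		uint32_t	db = recs[start].database;
		size_t		stop = start;

		while (stop < n)
		{
			if (recs[stop].database != db)
			{
				if (db != 0)
					break;		/* next database starts here */
				db = recs[stop].database;	/* adopt first real database */
			}
			stop++;
		}

		if (db == 0)
			break;				/* only global objects: nothing to connect to */

		batch_end[nbatches++] = stop;
		start = stop;
	}
	return nbatches;
}
```

Each batch would then be handed to one per-database worker in turn; the sketch stops at computing the batch boundaries.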

#79

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Rafia Sabih (#78)

Re: Proposal : For Auto-Prewarm.

On Mon, Jun 5, 2017 at 7:58 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

I had a look at the patch from stylistic/formatting point of view,
please find the attached patch for the suggested modifications.

Many of these seem worse, like these ones:

-         * Quit if we've reached records for another database. Unless the
+         * Quit if we've reached records of another database. Unless the

-         * When we reach a new relation, close the old one.  Note, however,
-         * that the previous try_relation_open may have failed, in which case
-         * rel will be NULL.
+         * On reaching a new relation, close the old one.  Note, that the
+         * previous try_relation_open may have failed, in which case rel will
+         * be NULL.

-         * Try to open each new relation, but only once, when we first
-         * encounter it.  If it's been dropped, skip the associated blocks.
+         * Each relation is open only once at it's first encounter. If it's
+         * been dropped, skip the associated blocks.

Others are better, like these:

-                (errmsg("could not continue autoprewarm worker is already running under PID %d",
+                (errmsg("autoprewarm worker is already running under PID %d",

- * Start of prewarm per-database worker. This will try to load blocks of one
+ * Start prewarm per-database worker, which will load blocks of one

Others don't really seem better or worse, like:

-         * Register a per-database worker to load new database's block. And
-         * wait until they finish their job to launch next one.
+         * Register a per-database worker to load new database's block. Wait
+         * until they finish their job to launch next one.

IMHO, there's still a good bit of work needed here to make this sound
like American English. For example:

- *        It is a bgworker which automatically records information about blocks
- *        which were present in buffer pool before server shutdown and then
- *        prewarm the buffer pool upon server restart with those blocks.
+ *        It is a bgworker process that automatically records information about
+ *        blocks which were present in buffer pool before server shutdown and then
+ *        prewarms the buffer pool upon server restart with those blocks.

This construction "It is a..." without a clear referent seems to be
standard in Indian English, but it looks wrong to English speakers
from other parts of the world, or at least to me.

+     * Since there could be at max one worker who could do a prewarm, hence,
+     * acquiring locks is not required before setting skip_prewarm_on_restart.

To me, adding a comma before hence looks like a significant
improvement, but the word hence itself seems out-of-place. Also, I'd
change "at max" to "at most" and maybe reword the sentence a little.
There's a lot of little things like this which I have tended to be quite
strict about changing before commit; I occasionally wonder whether
it's really worth the effort. It's not really wrong, it just sounds
weird to me as an American.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#80

Neha Sharma

neha.sharma@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#79)

Re: Proposal : For Auto-Prewarm.

Hi,

I have been testing this feature for a while, and below are my observations
for a few scenarios.

*Observation:*
Scenario 1: If we set pg_prewarm.dump_interval = -1.0, we get an additional
warning message in the logfile, and instead of the auto-dump task ending, it
runs successfully.
[centos@test-machine bin]$ more logfile
2017-06-06 08:39:53.127 GMT [21905] WARNING: invalid value for parameter "pg_prewarm.dump_interval": "-1.0"
2017-06-06 08:39:53.127 GMT [21905] HINT: Valid units for this parameter are "ms", "s", "min", "h", and "d".
2017-06-06 08:39:53.127 GMT [21905] LOG: listening on IPv6 address "::1", port 5432
2017-06-06 08:39:53.127 GMT [21905] LOG: listening on IPv4 address "127.0.0.1", port 5432
2017-06-06 08:39:53.130 GMT [21905] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-06-06 08:39:53.143 GMT [21906] LOG: database system was shut down at 2017-06-06 08:38:20 GMT
2017-06-06 08:39:53.155 GMT [21905] LOG: database system is ready to accept connections
2017-06-06 08:39:53.155 GMT [21912] LOG: autoprewarm has started
[centos@test-machine bin]$ ps -ef | grep prewarm
centos 21912 21905 0 08:39 ? 00:00:00 postgres: bgworker: autoprewarm
[centos@test-machine bin]$ ./psql postgres
psql (10beta1)
Type "help" for help.

postgres=# show pg_prewarm.dump_interval;
pg_prewarm.dump_interval
--------------------------
5min
(1 row)

Scenario 2: If we set pg_prewarm.dump_interval = 0.0, we get an additional
warning message in the logfile. The log states that the task was started and
the worker process is also active, but the dump_interval is set to the
default of 5 min (300 sec) instead of 0.

[centos@test-machine bin]$ ps -ef | grep prewarm
centos 21980 21973 0 08:54 ? 00:00:00 postgres: bgworker: autoprewarm

[centos@test-machine bin]$ more logfile
2017-06-06 09:20:52.436 GMT [22223] WARNING: invalid value for parameter "pg_prewarm.dump_interval": "0.0"
2017-06-06 09:20:52.436 GMT [22223] HINT: Valid units for this parameter are "ms", "s", "min", "h", and "d".
2017-06-06 09:20:52.436 GMT [22223] LOG: listening on IPv6 address "::1", port 5432
2017-06-06 09:20:52.437 GMT [22223] LOG: listening on IPv4 address "127.0.0.1", port 5432
2017-06-06 09:20:52.439 GMT [22223] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-06-06 09:20:52.452 GMT [22224] LOG: database system was shut down at 2017-06-06 09:19:49 GMT
2017-06-06 09:20:52.455 GMT [22223] LOG: database system is ready to accept connections
2017-06-06 09:20:52.455 GMT [22230] LOG: autoprewarm has started

[centos@test-machine bin]$ ./psql postgres
psql (10beta1)
Type "help" for help.

postgres=# show pg_prewarm.dump_interval;
pg_prewarm.dump_interval
--------------------------
5min
(1 row)
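
One possible reading of both scenarios, sketched under the assumption that the setting is parsed as an integer followed by an optional unit (which the HINT lines suggest): the text after the integer part, ".0", is taken as a unit name, fails to match, and the default stays in effect. The helper names and the default constant below are hypothetical, and this is not PostgreSQL's actual parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed default, taken from the "5min" that SHOW reports above. */
#define DEFAULT_DUMP_INTERVAL_SECS 300

/*
 * Hypothetical helper: parse "<integer>[unit]" into seconds.
 * Returns false when the trailing text is not a known unit.
 */
static bool
parse_interval_secs(const char *value, long *secs)
{
	char	   *end;
	long		val;

	assert(value != NULL && secs != NULL);

	val = strtol(value, &end, 10);
	if (end == value)
		return false;			/* no integer at the start */

	/* allow whitespace between the integer and the unit */
	while (isspace((unsigned char) *end))
		end++;

	if (*end == '\0' || strcmp(end, "s") == 0)
		*secs = val;
	else if (strcmp(end, "ms") == 0)
		*secs = val / 1000;
	else if (strcmp(end, "min") == 0)
		*secs = val * 60;
	else if (strcmp(end, "h") == 0)
		*secs = val * 3600;
	else if (strcmp(end, "d") == 0)
		*secs = val * 86400;
	else
		return false;			/* e.g. ".0" is not a unit */

	return true;
}

/* Invalid input leaves the default in place, as in the logs above. */
static long
effective_dump_interval(const char *setting)
{
	long		secs;

	if (!parse_interval_secs(setting, &secs))
		return DEFAULT_DUMP_INTERVAL_SECS;
	return secs;
}
```

In this sketch, effective_dump_interval("-1.0") and effective_dump_interval("0.0") both fall back to 300 seconds, while "-1" and "0" are accepted as written.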

Regards,

Neha Sharma

#81

Rafia Sabih

rafia.sabih@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#79)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Mon, Jun 5, 2017 at 8:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Many of these seem worse, like these ones:

-         * Quit if we've reached records for another database. Unless the
+         * Quit if we've reached records of another database. Unless the

Why is it worse? I've always seen "records of database" used rather than
"records for database". Anyhow, I tried rewording the sentence
altogether.

-         * When we reach a new relation, close the old one.  Note, however,
-         * that the previous try_relation_open may have failed, in which case
-         * rel will be NULL.
+         * On reaching a new relation, close the old one.  Note, that the
+         * previous try_relation_open may have failed, in which case rel will
+         * be NULL.

Reframed the sentence.

-         * Try to open each new relation, but only once, when we first
-         * encounter it.  If it's been dropped, skip the associated blocks.
+         * Each relation is open only once at it's first encounter. If it's
+         * been dropped, skip the associated blocks.

Reframed.

IMHO, there's still a good bit of work needed here to make this sound
like American English. For example:
- *        It is a bgworker which automatically records information about blocks
- *        which were present in buffer pool before server shutdown and then
- *        prewarm the buffer pool upon server restart with those blocks.
+ *        It is a bgworker process that automatically records information about
+ *        blocks which were present in buffer pool before server shutdown and then
+ *        prewarms the buffer pool upon server restart with those blocks.
This construction "It is a..." without a clear referent seems to be
standard in Indian English, but it looks wrong to English speakers
from other parts of the world, or at least to me.

Agreed, tried reframing the sentence.

+     * Since there could be at max one worker who could do a prewarm, hence,
+     * acquiring locks is not required before setting skip_prewarm_on_restart.
To me, adding a comma before hence looks like a significant
improvement, but the word hence itself seems out-of-place. Also, I'd
change "at max" to "at most" and maybe reword the sentence a little.
There's a lot of little things like this which I have tended to be quite
strict about changing before commit; I occasionally wonder whether
it's really worth the effort. It's not really wrong, it just sounds
weird to me as an American.

Agree, sentence reframed.
I am attaching the patch with the modifications I made on a second look.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachments:

cosmetic_autoprewarm_v2.patchapplication/octet-stream; name=cosmetic_autoprewarm_v2.patchDownload

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 51acc09aa7..f97094201b 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -1,26 +1,27 @@
 /*-------------------------------------------------------------------------
  *
  * autoprewarm.c
- *		Automatically prewarm the shared buffer pool when server restarts.
+ *		Automatically prewarms the shared buffer pool when server restarts.
  *
  * DESCRIPTION
  *
- *		It is a bgworker which automatically records information about blocks
- *		which were present in buffer pool before server shutdown and then
- *		prewarm the buffer pool upon server restart with those blocks.
+ *		Autoprewarm is a bgworker process that automatically records the
+ *		information about blocks which were present in buffer pool before
+ *		server shutdown. Then prewarms the buffer pool on server restart
+ *		with those blocks.
  *
  *		How does it work? When the shared library "pg_prewarm" is preloaded, a
  *		bgworker "autoprewarm" is launched immediately after the server has
- *		reached consistent state. The bgworker will start loading blocks
- *		recorded in the format BlockInfoRecord
- *		database,tablespace,filenode,forknum,blocknum in
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded in the format BlockInfoRecord consisting of
+ *		database, tablespace, filenode, forknum, blocknum in
  *		$PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the
  *		buffer pool. This way we do not replace any new blocks which were
  *		loaded either by the recovery process or the querying clients.
  *
  *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
  *		start a new task to periodically dump the BlockInfoRecords related to
- *		blocks which are currently in shared buffer pool. Upon next server
+ *		the blocks which are currently in shared buffer pool. On next server
  *		restart, the bgworker will prewarm the buffer pool by loading those
  *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
  *		activity of the bgworker.
@@ -89,14 +90,13 @@ static void apw_sigterm_handler(SIGNAL_ARGS);
 static void apw_sighup_handler(SIGNAL_ARGS);
 static void apw_sigusr1_handler(SIGNAL_ARGS);
 
-/* flags set by signal handlers */
+/* Flags set by signal handlers */
 static volatile sig_atomic_t got_sigterm = false;
 static volatile sig_atomic_t got_sighup = false;
 
 /*
- *	Signal handler for SIGTERM
- *	Set a flag to let the main loop to terminate, and set our latch to wake it
- *	up.
+ * 	Signal handler for SIGTERM
+ * 	Set a flag to handle.
  */
 static void
 apw_sigterm_handler(SIGNAL_ARGS)
@@ -113,8 +113,7 @@ apw_sigterm_handler(SIGNAL_ARGS)
 
 /*
  *	Signal handler for SIGHUP
- *	Set a flag to tell the process to reread the config file, and set our
- *	latch to wake it up.
+ *	Set a flag to reread the config file.
  */
 static void
 apw_sighup_handler(SIGNAL_ARGS)
@@ -131,8 +130,7 @@ apw_sighup_handler(SIGNAL_ARGS)
 
 /*
  *	Signal handler for SIGUSR1.
- *	The prewarm per-database workers will notify with SIGUSR1 on their
- *	startup/shutdown.
+ *	The prewarm workers notify with SIGUSR1 on their startup/shutdown.
  */
 static void
 apw_sigusr1_handler(SIGNAL_ARGS)
@@ -146,13 +144,11 @@ apw_sigusr1_handler(SIGNAL_ARGS)
 }
 
 /* ============================================================================
- * ==============	types and variables used by autoprewarm   =============
+ * ==============	Types and variables used by autoprewarm   =============
  * ============================================================================
  */
 
-/*
- * Metadata of each persistent block which is dumped and used to load.
- */
+/* Metadata of each persistent block which is dumped and used for loading. */
 typedef struct BlockInfoRecord
 {
 	Oid			database;
@@ -162,18 +158,14 @@ typedef struct BlockInfoRecord
 	BlockNumber blocknum;
 } BlockInfoRecord;
 
-/*
- * Tasks performed by autoprewarm workers.
- */
+/* Tasks performed by autoprewarm workers.*/
 typedef enum
 {
 	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
 	TASK_DUMP_BUFFERPOOL_INFO	/* dump the buffer pool block info. */
 } AutoPrewarmTask;
 
-/*
- * Shared state information about the running autoprewarm bgworker.
- */
+/* Shared state information for autoprewarm bgworker. */
 typedef struct AutoPrewarmSharedState
 {
 	LWLock		lock;			/* mutual exclusion */
@@ -182,7 +174,7 @@ typedef struct AutoPrewarmSharedState
 	bool		skip_prewarm_on_restart;		/* if set true, prewarm task
 												 * will not be done */
 
-	/* following items are for communication with per-database worker */
+	/* Following items are for communication with per-database worker */
 	dsm_handle	block_info_handle;
 	Oid			database;
 	int			prewarm_start_idx;
@@ -191,16 +183,16 @@ typedef struct AutoPrewarmSharedState
 
 static AutoPrewarmSharedState *state = NULL;
 
-/* GUC variable which control the dump activity of autoprewarm. */
+/* GUC variable that controls the dump activity of autoprewarm. */
 static int	dump_interval = 0;
 
 /*
- * GUC variable which say whether autoprewarm worker has to be started when
+ * GUC variable to decide if autoprewarm worker has to be started when
  * preloaded.
  */
 static bool autoprewarm = true;
 
-/* compare member elements to check if they are not equal. */
+/* Compare member elements to check if they are not equal. */
 #define cmp_member_elem(fld)	\
 do { \
 	if (a->fld < b->fld)		\
@@ -228,7 +220,7 @@ blockinfo_cmp(const void *p, const void *q)
 }
 
 /* ============================================================================
- * =====================	prewarm part of autoprewarm =======================
+ * =====================	Prewarm part of autoprewarm =======================
  * ============================================================================
  */
 
@@ -248,7 +240,7 @@ reset_shm_state(int code, Datum arg)
 
 /*
  * init_autoprewarm_state
- *		Allocate and initialize autoprewarm related shared memory
+ *		Allocate and initialize autoprewarm related shared memory.
  */
 static void
 init_autoprewarm_state(void)
@@ -273,11 +265,10 @@ init_autoprewarm_state(void)
 
 /*
  * load_one_database
- *		Load block infos of one database by connecting to them.
+ *		This sub-routine loads the BlockInfoRecords of the database set in AutoPrewarmSharedState.
  *
- * Start of prewarm per-database worker. This will try to load blocks of one
- * database starting from block info position state->prewarm_start_idx to
- * state->prewarm_stop_idx.
+ * Connect to the database and load the blocks of that database starting from
+ * the position state->prewarm_start_idx to state->prewarm_stop_idx.
  */
 void
 load_one_database(Datum main_arg)
@@ -293,9 +284,7 @@ load_one_database(Datum main_arg)
 	pqsignal(SIGTERM, apw_sigterm_handler);
 	pqsignal(SIGHUP, apw_sighup_handler);
 
-	/*
-	 * We're now ready to receive signals
-	 */
+	/* We're now ready to receive signals */
 	BackgroundWorkerUnblockSignals();
 
 	init_autoprewarm_state();
@@ -317,18 +306,17 @@ load_one_database(Datum main_arg)
 		Buffer		buf;
 
 		/*
-		 * Quit if we've reached records for another database. Unless the
-		 * previous blocks were of global objects which were combined with
-		 * next database's block infos.
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
 		 */
 		if (old_blk != NULL && old_blk->database != blk->database &&
 			old_blk->database != 0)
 			break;
 
 		/*
-		 * When we reach a new relation, close the old one.  Note, however,
-		 * that the previous try_relation_open may have failed, in which case
-		 * rel will be NULL.
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note, that rel will be NULL if try_relation_open failed
+		 * previously, in that case there is nothing to close.
 		 */
 		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
 			rel != NULL)
@@ -340,7 +328,7 @@ load_one_database(Datum main_arg)
 
 		/*
 		 * Try to open each new relation, but only once, when we first
-		 * encounter it.  If it's been dropped, skip the associated blocks.
+		 * encounter it. If it's been dropped, skip the associated blocks.
 		 */
 		if (old_blk == NULL || old_blk->filenode != blk->filenode)
 		{
@@ -370,8 +358,8 @@ load_one_database(Datum main_arg)
 			RelationOpenSmgr(rel);
 
 			/*
-			 * smgrexists is not safe for illegal forknum, so test before
-			 * calling same.
+			 * smgrexists is not safe for illegal forknum, hence check if the
+			 * passed forknum is valid before using it in smgrexists.
 			 */
 			if (blk->forknum > InvalidForkNumber &&
 				blk->forknum <= MAX_FORKNUM &&
@@ -381,10 +369,10 @@ load_one_database(Datum main_arg)
 				nblocks = 0;
 		}
 
-		/* check if blocknum is valid and with in fork file size. */
+		/* Check if blocknum is valid and within fork file size. */
 		if (blk->blocknum >= nblocks)
 		{
-			/* move to next forknum. */
+			/* Move to next forknum. */
 			++pos;
 			old_blk = blk;
 			continue;
@@ -402,7 +390,7 @@ load_one_database(Datum main_arg)
 
 	dsm_detach(seg);
 
-	/* release lock on previous relation. */
+	/* Release lock on previous relation. */
 	if (rel)
 	{
 		relation_close(rel, AccessShareLock);
@@ -427,7 +415,7 @@ launch_and_wait_for_per_database_worker(void)
 					  (Datum) NULL, BGW_NEVER_RESTART,
 					  BGWORKER_BACKEND_DATABASE_CONNECTION);
 
-	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
 	worker.bgw_notify_pid = MyProcPid;
 
 	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
@@ -444,14 +432,14 @@ launch_and_wait_for_per_database_worker(void)
 
 /*
  * prewarm_buffer_pool
- *		The main routine which prewarm the buffer pool
+ *		The main routine that prewarms the buffer pool.
  *
- * The prewarm bgworker will first load all of the BlockInfoRecord's in
- * $PGDATA/AUTOPREWARM_FILE to a dsm. And those BlockInfoRecords are further
- * separated based on their database. And for each group of BlockInfoRecords a
- * per-database worker will be launched to load corresponding blocks. Each of
- * those workers will be launched in sequential order only after the previous
- * one has finished its job.
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a dsm. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
  */
 static void
 prewarm_buffer_pool(void)
@@ -463,8 +451,8 @@ prewarm_buffer_pool(void)
 	dsm_segment *seg;
 
 	/*
-	 * since there could be at max one worker who could do a prewarm no need
-	 * to take lock before setting skip_prewarm_on_restart.
+	 * Since there could be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
 	 */
 	state->skip_prewarm_on_restart = true;
 
@@ -509,7 +497,7 @@ prewarm_buffer_pool(void)
 
 	for (i = 0; i < num_elements; i++)
 	{
-		/* get next block. */
+		/* Get next block. */
 		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
 						&blkinfo[i].tablespace, &blkinfo[i].filenode,
 						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
@@ -523,7 +511,7 @@ prewarm_buffer_pool(void)
 			 i, num_elements);
 
 	/*
-	 * sort the block number to increase the chance of sequential reads during
+	 * Sort the block number to increase the chance of sequential reads during
 	 * load.
 	 */
 	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
@@ -531,15 +519,15 @@ prewarm_buffer_pool(void)
 	state->block_info_handle = dsm_segment_handle(seg);
 	state->prewarm_start_idx = state->prewarm_stop_idx = 0;
 
-	/* get next database's first block info's position. */
+	/* Get the info position of the first block of the next database. */
 	while (state->prewarm_start_idx < num_elements)
 	{
 		uint32		i = state->prewarm_start_idx;
 		Oid			current_db = blkinfo[i].database;
 
 		/*
-		 * advance the prewarm_stop_idx to end of block infos of current
-		 * database.
+		 * Advance the prewarm_stop_idx to the first BlockRecordInfo that does
+		 * not belong to this database.
 		 */
 		do
 		{
@@ -547,9 +535,8 @@ prewarm_buffer_pool(void)
 			if (current_db != blkinfo[i].database)
 			{
 				/*
-				 * For block info of a global object whose database will be 0
-				 * try to combine them with next non-zero database's block
-				 * infos to load.
+				 * For the BlockRecordInfos of a global object, combine them
+				 * with BlockRecordInfos of the next non-global object.
 				 */
 				if (current_db != InvalidOid)
 					break;
@@ -558,9 +545,9 @@ prewarm_buffer_pool(void)
 		} while (i < num_elements);
 
 		/*
-		 * If we are here with database as InvalidOid it means we only have
-		 * block_infos belonging to global objects. As we do not have a valid
-		 * database to connect we shall simply ignore them.
+		 * If we are here with database having InvalidOid, then only
+		 * BlockRecordInfos belonging to global objects exist. Since, we can
+		 * not connect with InvalidOid skip prewarming for these objects.
 		 */
 		if (current_db == 0)
 			break;
@@ -571,8 +558,8 @@ prewarm_buffer_pool(void)
 		Assert(state->prewarm_start_idx < state->prewarm_stop_idx);
 
 		/*
-		 * Register a per-database worker to load new database's block. And
-		 * wait until they finish their job to launch next one.
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
 		 */
 		launch_and_wait_for_per_database_worker();
 		state->prewarm_start_idx = state->prewarm_stop_idx;
@@ -587,22 +574,23 @@ prewarm_buffer_pool(void)
 	return;
 }
 
-/* ============================================================================
- * =============	buffer pool info dump part of autoprewarm	===============
+/*
+ * ============================================================================
+ * ===================== Dump part of Autoprewarm =============================
  * ============================================================================
  */
 
-/* This sub-module is for periodically dumping buffer pool's block info into
- * a dump file AUTOPREWARM_FILE.
- * Each entry of block info looks like this:
- * database,tablespace,filenode,forknum,blocknum and we shall call it as
- * BlockInfoRecord. Note we write in the text form so that the dump information
- * is readable and if necessary can be carefully edited.
+/*
+ * This sub-module is for periodically dumping BlockRecordInfos in buffer pool
+ * into a dump file AUTOPREWARM_FILE.
+ * Each entry of BlockRecordInfo consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in the text form so that the dump
+ * information is readable and can be edited, if required.
  */
 
 /*
  * dump_now
- *		Dumps block infos in buffer pool
+ *		Dumps BlockRecordInfos in buffer pool.
  */
 static uint32
 dump_now(bool is_bgworker)
@@ -643,9 +631,7 @@ dump_now(bool is_bgworker)
 	{
 		uint32		buf_state;
 
-		/*
-		 * In case of a SIGHUP, just reload the configuration.
-		 */
+		/* In case of a SIGHUP, just reload the configuration. */
 		if (got_sighup)
 		{
 			got_sighup = false;
@@ -661,7 +647,7 @@ dump_now(bool is_bgworker)
 
 		bufHdr = GetBufferDescriptor(i);
 
-		/* lock each buffer header before inspecting. */
+		/* Lock each buffer header before inspecting. */
 		buf_state = LockBufHdr(bufHdr);
 
 		if (buf_state & BM_TAG_VALID)
@@ -696,9 +682,7 @@ dump_now(bool is_bgworker)
 
 	for (i = 0; i < num_blocks; i++)
 	{
-		/*
-		 * In case of a SIGHUP, just reload the configuration.
-		 */
+		/* In case of a SIGHUP, just reload the configuration. */
 		if (got_sighup)
 		{
 			got_sighup = false;
@@ -738,7 +722,7 @@ dump_now(bool is_bgworker)
 	pfree(block_info_array);
 
 	/*
-	 * rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
 	 * permanent.
 	 */
 	ret = CloseTransientFile(fd);
@@ -758,10 +742,9 @@ dump_now(bool is_bgworker)
 
 /*
  * dump_block_info_periodically
- *		Loop which periodically calls dump_now()
+ *		Sub-routine to periodically call dump_now().
  *
- * At regular intervals, which is defined by GUC dump_interval, dump_now() will
- * be called.
+ * Call dum_now() at regular intervals defined by GUC variable dump_interval.
  */
 void
 dump_block_info_periodically(void)
@@ -776,7 +759,7 @@ dump_block_info_periodically(void)
 		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
 		nap.tv_usec = 0;
 
-		/* Has been set not to dump. Nothing more to do. */
+		/* Have we been asked to stop dumping? */
 		if (dump_interval == AT_PWARM_OFF)
 			return;
 
@@ -789,9 +772,13 @@ dump_block_info_periodically(void)
 										   (dump_interval * 1000)))
 			{
 				dump_now(true);
+
+				/*
+				 * It is better to stop when shutdown signal is received
+				 * during or right after a dump.
+				 */
 				if (got_sigterm)
-					return;		/* got shutdown signal during or right after a
-								 * dump. And, I think better to return now. */
+					return;
 				last_dump_time = GetCurrentTimestamp();
 				nap.tv_sec = dump_interval;
 				nap.tv_usec = 0;
@@ -817,9 +804,7 @@ dump_block_info_periodically(void)
 		if (rc & WL_POSTMASTER_DEATH)
 			proc_exit(1);
 
-		/*
-		 * In case of a SIGHUP, just reload the configuration.
-		 */
+		/* In case of a SIGHUP, just reload the configuration. */
 		if (got_sighup)
 		{
 			got_sighup = false;
@@ -827,14 +812,14 @@ dump_block_info_periodically(void)
 		}
 	}
 
-	/* One last block info dump while postmaster shutdown. */
+	/* It's time for postmaster shutdown, let's dump for one last time. */
 	if (dump_interval != AT_PWARM_OFF)
 		dump_now(true);
 }
 
 /*
  * autoprewarm_main
- *		The main entry point of autoprewarm bgworker process
+ *		The main entry point of autoprewarm bgworker process.
  */
 void
 autoprewarm_main(Datum main_arg)
@@ -846,7 +831,7 @@ autoprewarm_main(Datum main_arg)
 	pqsignal(SIGHUP, apw_sighup_handler);
 	pqsignal(SIGUSR1, apw_sigusr1_handler);
 
-	/* We're now ready to receive signals */
+	/* We're now ready to receive signals. */
 	BackgroundWorkerUnblockSignals();
 
 	todo_task = DatumGetInt32(main_arg);
@@ -859,7 +844,7 @@ autoprewarm_main(Datum main_arg)
 	{
 		LWLockRelease(&state->lock);
 		ereport(LOG,
-				(errmsg("could not continue autoprewarm worker is already running under PID %d",
+				(errmsg("autoprewarm worker is already running under PID %d",
 						state->bgworker_pid)));
 		return;
 	}
@@ -872,9 +857,7 @@ autoprewarm_main(Datum main_arg)
 	ereport(LOG,
 			(errmsg("autoprewarm has started")));
 
-	/*
-	 * **** perform autoprewarm's task	****
-	 */
+	/* Perform autoprewarm's task. */
 	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
 		!state->skip_prewarm_on_restart)
 		prewarm_buffer_pool();
@@ -886,13 +869,13 @@ autoprewarm_main(Datum main_arg)
 }
 
 /* ============================================================================
- * =============	extension's entry functions/utilities	===================
+ * =============	Extension's entry functions/utilities	===================
  * ============================================================================
  */
 
 /*
  * setup_autoprewarm
- *		A Common function to initialize BackgroundWorker structure
+ *		A Common function to initialize BackgroundWorker structure.
  */
 static void
 setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
@@ -913,7 +896,7 @@ setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
 
 /*
  * _PG_init
- *		Extension's entry point
+ *		Extension's entry point.
  */
 void
 _PG_init(void)
@@ -935,8 +918,8 @@ _PG_init(void)
 
 	DefineCustomIntVariable("pg_prewarm.dump_interval",
 					   "Sets the maximum time between two buffer pool dumps",
-							"If set to Zero, timer based dumping is disabled."
-							" If set to -1, stops the running autoprewarm.",
+							"If set to zero, timer based dumping is disabled."
+							" If set to -1, stops autoprewarm.",
 							&dump_interval,
 							AT_PWARM_DEFAULT_DUMP_INTERVAL,
 							AT_PWARM_OFF, INT_MAX / 1000,
@@ -948,14 +931,14 @@ _PG_init(void)
 
 	EmitWarningsOnPlaceholders("pg_prewarm");
 
-	/* if not run as a preloaded library, nothing more to do here! */
+	/* If not run as a preloaded library, nothing more to do. */
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
-	/* Request additional shared resources */
+	/* Request additional shared resources. */
 	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
 
-	/* Has been set not to start autoprewarm bgworker. Nothing more to do. */
+	/* If autoprewarm bgworker is disabled then nothing more to do. */
 	if (!autoprewarm)
 		return;
 
@@ -967,7 +950,7 @@ _PG_init(void)
 
 /*
  * autoprewarm_dump_launcher
- *		Dynamically launch an autoprewarm dump worker
+ *		Dynamically launch an autoprewarm dump worker.
  */
 static pid_t
 autoprewarm_dump_launcher(void)
@@ -980,7 +963,7 @@ autoprewarm_dump_launcher(void)
 	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
 					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
 
-	/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
 	worker.bgw_notify_pid = MyProcPid;
 
 	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
@@ -1014,14 +997,14 @@ autoprewarm_dump_launcher(void)
 
 /*
  * launch_autoprewarm_dump
- *		The C-Language entry function to launch autoprewarm dump bgworker
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
  */
 Datum
 launch_autoprewarm_dump(PG_FUNCTION_ARGS)
 {
 	pid_t		pid;
 
-	/* Has been set not to dump. Nothing more to do. */
+	/* If dump_interval is disabled then nothing more to do. */
 	if (dump_interval == AT_PWARM_OFF)
 		PG_RETURN_NULL();
 
@@ -1031,7 +1014,7 @@ launch_autoprewarm_dump(PG_FUNCTION_ARGS)
 
 /*
  * autoprewarm_dump_now
- *		The C-Language entry function to dump immediately
+ *		The C-Language entry function to dump immediately.
  */
 Datum
 autoprewarm_dump_now(PG_FUNCTION_ARGS)

#82

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Rafia Sabih (#81)

Re: Proposal : For Auto-Prewarm.

On Tue, Jun 6, 2017 at 6:29 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:

On Mon, Jun 5, 2017 at 8:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Many of these seem worse, like these ones:
-         * Quit if we've reached records for another database. Unless the
+         * Quit if we've reached records of another database. Unless the
Why is it worse? I've always encountered using "records of database"
and not "records for database". Anyhow, I tried reframing the sentence
altogether.

My experience is the opposite. If I Google "from another database",
including the quotes, I get 516,000 hits; if I do the same using "of
another database", I get 110,000 hits. I think that means the former
usage is more popular, although obviously to some degree it's a matter
of taste.

+ * database, tablespace, filenode, forknum, blocknum in

The file doesn't contain the spaces you added here, which is probably
why Mithun did it as he did, but actually we don't really need this
level of detail in the file header comment anyway. I think you could
drop the specific mention of how blocks are identified.
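
For readers following along, the on-disk entry format under discussion can be illustrated with a small standalone sketch. This is not the patch's code: DemoBlockInfo, demo_format_record and demo_parse_record are hypothetical names, and only the comma-separated format string mirrors the fscanf call in the patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct DemoBlockInfo
{
    unsigned int database;
    unsigned int tablespace;
    unsigned int filenode;
    unsigned int forknum;
    unsigned int blocknum;
} DemoBlockInfo;

/* Write one entry as "db,ts,filenode,fork,block" with no spaces. */
static int
demo_format_record(char *buf, size_t len, const DemoBlockInfo *rec)
{
    return snprintf(buf, len, "%u,%u,%u,%u,%u\n",
                    rec->database, rec->tablespace, rec->filenode,
                    rec->forknum, rec->blocknum);
}

/* Read one entry back; returns 1 only if all five fields were parsed. */
static int
demo_parse_record(const char *line, DemoBlockInfo *rec)
{
    return sscanf(line, "%u,%u,%u,%u,%u",
                  &rec->database, &rec->tablespace, &rec->filenode,
                  &rec->forknum, &rec->blocknum) == 5;
}
```

Because the entry is plain text, a truncated or hand-edited line simply fails the five-field check instead of being silently misread.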

+ * GUC variable to decide if autoprewarm worker has to be started when

if->whether? has to be->should be?

Actually this patch uses "if" in various places where I would use
"whether", but that's probably a matter of taste to some extent.

- * Start of prewarm per-database worker. This will try to load blocks of one
- * database starting from block info position state->prewarm_start_idx to
- * state->prewarm_stop_idx.
+ * Connect to the database and load the blocks of that database starting from
+ * the position state->prewarm_start_idx to state->prewarm_stop_idx.

That's a really good rephrasing. It's more compact and more
imperative. The only thing that seems a little off is that you say
"starting from" and then mention both the start and stop indexes.
Maybe say "between" instead.

-         * Quit if we've reached records for another database. Unless the
-         * previous blocks were of global objects which were combined with
-         * next database's block infos.
+         * Quit if we've reached records for another database. If previous
+         * blocks are of some global objects, then continue pre-warming.

Good.

-         * When we reach a new relation, close the old one.  Note, however,
-         * that the previous try_relation_open may have failed, in which case
-         * rel will be NULL.
+         * As soon as we encounter a block of a new relation, close the old
+         * relation. Note, that rel will be NULL if try_relation_open failed
+         * previously, in that case there is nothing to close.

I wrote the original comment here, so you may not be too surprised to
hear that I liked it as it was. Actually, your rewrite of the first
sentence seems like it might be better, but I think the second one is
not. By deleting "however", you've made the comma after "Note"
redundant, and by changing "in which case" to "in that case", you've
made a dependent clause into a comma splice. I hate comma splices.

https://en.wikipedia.org/wiki/Comma_splice
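
The control flow that this comment describes (open each relation once, close the old one on a change, and tolerate a failed open) may be easier to judge with a standalone sketch. Everything here is hypothetical: DemoRel, demo_try_open and demo_close stand in for the real relation APIs, and a filenode of 0 is used only to simulate a dropped relation.

```c
#include <stddef.h>

/* Hypothetical stand-in for an open relation. */
typedef struct DemoRel
{
    unsigned int filenode;
} DemoRel;

/* Counters so the open/close pairing can be observed. */
static int demo_opens = 0;
static int demo_closes = 0;

/* Simulated open: filenode 0 plays the role of a dropped relation. */
static DemoRel *
demo_try_open(unsigned int filenode, DemoRel *slot)
{
    if (filenode == 0)
        return NULL;
    slot->filenode = filenode;
    demo_opens++;
    return slot;
}

static void
demo_close(DemoRel *rel)
{
    (void) rel;
    demo_closes++;
}

/* Walk filenodes in sorted order; returns how many blocks would be loaded. */
static int
demo_walk(const unsigned int *filenodes, int n)
{
    DemoRel     slot;
    DemoRel    *rel = NULL;
    int         loaded = 0;

    for (int i = 0; i < n; i++)
    {
        int         new_rel = (i == 0 || filenodes[i] != filenodes[i - 1]);

        /* On reaching a new relation, close the old one if it was opened. */
        if (new_rel && rel != NULL)
        {
            demo_close(rel);
            rel = NULL;
        }

        /* Try to open each new relation only once. */
        if (new_rel)
            rel = demo_try_open(filenodes[i], &slot);

        /* The open may have failed, in which case the blocks are skipped. */
        if (rel == NULL)
            continue;

        loaded++;
    }

    /* Release the last relation, if any. */
    if (rel != NULL)
        demo_close(rel);

    return loaded;
}
```

The point of the NULL check before closing is that every successful open is matched by exactly one close, regardless of how many opens failed in between.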

+ * $PGDATA/AUTOPREWARM_FILE to a dsm. Further, these BlockInfoRecords are

I would probably capitalize DSM here. There's no data structure
called dsm (lower-case) and we have other examples where it is
capitalized in documentation and comment text (see, e.g.
custom-scan.sgml).

+ * Since there could be at most one worker for prewarm, locking is not

could -> can?

-                 * For block info of a global object whose database will be 0
-                 * try to combine them with next non-zero database's block
-                 * infos to load.
+                 * For the BlockRecordInfos of a global object, combine them
+                 * with BlockRecordInfos of the next non-global object.

Good. Or even "Combine BlockRecordInfos for a global object with the
next non-global object", which I think is even more compact.
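
As a rough illustration of the grouping rule being described, here is a standalone sketch that finds the end of one group in a sorted array of database OIDs, folding leading global entries (database 0) into the next real database. The function name demo_next_group is hypothetical, and the loop is written independently of the patch, so it should be read as an assumption about the intended behaviour rather than the patch's implementation.

```c
#include <stddef.h>

/*
 * dbs[] holds database OIDs sorted ascending, so global objects (OID 0)
 * come first. Returns stop such that [start, stop) forms one group to hand
 * to a per-database worker. Returns start when there is no group to launch,
 * either because start is past the end or because only global entries
 * remain and there is no database to connect to.
 */
static size_t
demo_next_group(const unsigned int *dbs, size_t n, size_t start)
{
    size_t       i;
    unsigned int current_db;

    if (start >= n)
        return start;

    current_db = dbs[start];
    for (i = start + 1; i < n; i++)
    {
        if (dbs[i] != current_db)
        {
            /* A real database ends here. */
            if (current_db != 0)
                break;
            /* Global entries are combined with the next database. */
            current_db = dbs[i];
        }
    }

    if (current_db == 0)
        return start;

    return i;
}
```

With input {0, 0, 5, 5, 9}, the first group covers the two global entries plus database 5, and the second group covers database 9 alone.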

+         * If we are here with database having InvalidOid, then only
+         * BlockRecordInfos belonging to global objects exist. Since, we can
+         * not connect with InvalidOid skip prewarming for these objects.

If we reach this point with current_db == InvalidOid, ...

+ * Sub-routine to periodically call dump_now().

Every existing use of the word subroutine in our code base spells it
with no hyphen.

[rhaas pgsql]$ git grep -- Subroutine | wc -l
47
[rhaas pgsql]$ git grep -- Sub-routine | wc -l
0

Personally, I find this change worse, because calling something a
subroutine is totally generic, like saying that you met a "person" or
that something was "nice". Calling it a loop gives at least a little
bit of specific information.

+ * Call dum_now() at regular intervals defined by GUC variable dump_interval.

This change introduces an obvious typographical error.

-    /* One last block info dump while postmaster shutdown. */
+    /* It's time for postmaster shutdown, let's dump for one last time. */

Comma splice.

+ /* Perform autoprewarm's task. */

Uninformative.

+ * A Common function to initialize BackgroundWorker structure.

Adding a period to the end is a good idea, but how about also fixing
the capitalization of "Common"? I think this is a common usage in
India, but not elsewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#83

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Neha Sharma (#80)

Re: Proposal : For Auto-Prewarm.

On Tue, Jun 6, 2017 at 3:48 PM, Neha Sharma
<neha.sharma@enterprisedb.com> wrote:

Hi,

I have been testing this feature for a while, and below are my observations
for a few scenarios.

Observation:
Scenario 1: If we set pg_prewarm.dump_interval = -1.0, we get an additional
warning message in the logfile, and instead of ending the auto-dump task, it
executes successfully.
[centos@test-machine bin]$ more logfile
2017-06-06 08:39:53.127 GMT [21905] WARNING: invalid value for parameter
"pg_prewarm.dump_interval": "-1.0"
2017-06-06 08:39:53.127 GMT [21905] HINT: Valid units for this parameter
are "ms", "s", "min", "h", and "d".

postgres=# show pg_prewarm.dump_interval;
pg_prewarm.dump_interval
--------------------------
5min
(1 row)

Scenario 2: If we set pg_prewarm.dump_interval = 0.0, we get an additional
warning message in the logfile. The log states that the task was started, and
the worker process is also active, but dump_interval is set to the default of
5 min (300 sec) instead of 0.

[centos@test-machine bin]$ ps -ef | grep prewarm
centos 21980 21973 0 08:54 ? 00:00:00 postgres: bgworker:
autoprewarm

[centos@test-machine bin]$ more logfile
2017-06-06 09:20:52.436 GMT [22223] WARNING: invalid value for parameter
"pg_prewarm.dump_interval": "0.0"
2017-06-06 09:20:52.436 GMT [22223] HINT: Valid units for this parameter
are "ms", "s", "min", "h", and "d".
postgres=# show pg_prewarm.dump_interval;
pg_prewarm.dump_interval
--------------------------
5min
(1 row)

dump_interval is an int-type custom GUC variable, so passing a
floating-point value is illegal here. The reason we only throw a warning
and fall back to the default of 5 minutes is the existing behavior in
define_custom_variable():

    /*
     * Assign the string value(s) stored in the placeholder to the real
     * variable. Essentially, we need to duplicate all the active and stacked
     * values, but with appropriate validation and datatype adjustment.
     *
     * If an assignment fails, we report a WARNING and keep going. We don't
     * want to throw ERROR for bad values, because it'd bollix the add-on
     * module that's presumably halfway through getting loaded. In such cases
     * the default or previous state will become active instead.
     */

    /* First, apply the reset value if any */
    if (pHolder->reset_val)
        (void) set_config_option(name, pHolder->reset_val,
                                 pHolder->gen.reset_scontext,
                                 pHolder->gen.reset_source,
                                 GUC_ACTION_SET, true, WARNING, false);

I think this should be fine as it is defined behavior for all of the
add-on modules. Please let me know if you have any suggestions.
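
To make the observed behaviour concrete, here is a minimal standalone sketch of "warn and keep the default" for an integer setting. It deliberately uses plain strtol and not PostgreSQL's GUC machinery, so it does not handle units such as "min"; demo_parse_interval and DEMO_DEFAULT_INTERVAL are hypothetical names, and the range check only mirrors the bounds passed to DefineCustomIntVariable in the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_DEFAULT_INTERVAL 300   /* seconds, i.e. the 5min default */

/*
 * Parse an integer interval. On any malformed or out-of-range input, emit
 * a warning and return the default instead of failing hard.
 */
static int
demo_parse_interval(const char *value)
{
    char       *end;
    long        parsed;

    errno = 0;
    parsed = strtol(value, &end, 10);

    /* Reject empty input, trailing characters such as ".0", and bad ranges. */
    if (errno != 0 || end == value || *end != '\0' ||
        parsed < -1 || parsed > INT_MAX / 1000)
    {
        fprintf(stderr, "WARNING: invalid value for interval: \"%s\"\n",
                value);
        return DEMO_DEFAULT_INTERVAL;
    }

    return (int) parsed;
}
```

Under this sketch, "-1.0" and "0.0" both fall back to 300, which matches what the two scenarios above report, while "-1" and "0" are accepted as given.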


#84

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#82)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

I have merged Rafia's patch for cosmetic changes. I have also fixed
some of the issues you pointed out on top of that, but kept a few as
they are, since Rafia's opinion is needed on those.

On Wed, Jun 7, 2017 at 5:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

My experience is the opposite. If I Google "from another database",
including the quotes, I get 516,000 hits; if I do the same using "of
another database", I get 110,000 hits. I think that means the former
usage is more popular, although obviously to some degree it's a matter
of taste.

-- Open.

+ * database, tablespace, filenode, forknum, blocknum in

The file doesn't contain the spaces you added here, which is probably
why Mithun did it as he did, but actually we don't really need this
level of detail in the file header comment anyway. I think you could
drop the specific mention of how blocks are identified.

-- Fixed. It has been removed now.

+ * GUC variable to decide if autoprewarm worker has to be started when

if->whether? has to be->should be?

Actually this patch uses "if" in various places where I would use
"whether", but that's probably a matter of taste to some extent.

-- Fixed. I have changed from if to whether.

- * Start of prewarm per-database worker. This will try to load blocks of one
- * database starting from block info position state->prewarm_start_idx to
- * state->prewarm_stop_idx.
+ * Connect to the database and load the blocks of that database starting from
+ * the position state->prewarm_start_idx to state->prewarm_stop_idx.
That's a really good rephrasing. It's more compact and more
imperative. The only thing that seems a little off is that you say
"starting from" and then mention both the start and stop indexes.
Maybe say "between" instead.

-- It is a half-open interval, so I rewrote the comment using the [,) notation.
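
For anyone unfamiliar with the notation, a half-open range includes the start index and excludes the stop index, so consecutive ranges chain without overlap when the next start equals the previous stop. A minimal sketch, with the hypothetical name demo_sum_range and made-up data:

```c
#include <stddef.h>

/* Sum the entries in the half-open range [start_idx, stop_idx). */
static unsigned long
demo_sum_range(const unsigned int *items, size_t start_idx, size_t stop_idx)
{
    unsigned long sum = 0;

    /* start_idx is included, stop_idx is excluded. */
    for (size_t pos = start_idx; pos < stop_idx; pos++)
        sum += items[pos];

    return sum;
}
```

An empty range is simply one where start and stop are equal, which needs no special case in the loop.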

-         * Quit if we've reached records for another database. Unless the
-         * previous blocks were of global objects which were combined with
-         * next database's block infos.
+         * Quit if we've reached records for another database. If previous
+         * blocks are of some global objects, then continue pre-warming.
Good.
-         * When we reach a new relation, close the old one.  Note, however,
-         * that the previous try_relation_open may have failed, in which case
-         * rel will be NULL.
+         * As soon as we encounter a block of a new relation, close the old
+         * relation. Note, that rel will be NULL if try_relation_open failed
+         * previously, in that case there is nothing to close.
I wrote the original comment here, so you may not be too surprised to
hear that I liked it as it was. Actually, your rewrite of the first
sentence seems like it might be better, but I think the second one is
not. By deleting "however", you've made the comma after "Note"
redundant, and by changing "in which case" to "in that case", you've
made a dependent clause into a comma splice. I hate comma splices.

https://en.wikipedia.org/wiki/Comma_splice

-- Open

+ * $PGDATA/AUTOPREWARM_FILE to a dsm. Further, these BlockInfoRecords are
I would probably capitalize DSM here. There's no data structure
called dsm (lower-case) and we have other examples where it is
capitalized in documentation and comment text (see, e.g.
custom-scan.sgml).

-- Fixed.

+ * Since there could be at most one worker for prewarm, locking is not

could -> can?

-- Fixed.

-                 * For block info of a global object whose database will be 0
-                 * try to combine them with next non-zero database's block
-                 * infos to load.
+                 * For the BlockRecordInfos of a global object, combine them
+                 * with BlockRecordInfos of the next non-global object.

Good. Or even "Combine BlockRecordInfos for a global object with the
next non-global object", which I think is even more compact.

-- Fixed.

+         * If we are here with database having InvalidOid, then only
+         * BlockRecordInfos belonging to global objects exist. Since, we can
+         * not connect with InvalidOid skip prewarming for these objects.

If we reach this point with current_db == InvalidOid, ...

--Fixed.

+ * Sub-routine to periodically call dump_now().

Every existing use of the word subroutine in our code base spells it
with no hyphen.

[rhaas pgsql]$ git grep -- Subroutine | wc -l
47
[rhaas pgsql]$ git grep -- Sub-routine | wc -l
0

Personally, I find this change worse, because calling something a
subroutine is totally generic, like saying that you met a "person" or
that something was "nice". Calling it a loop gives at least a little
bit of specific information.

-- Fixed. We call it a loop now.

+ * Call dum_now() at regular intervals defined by GUC variable dump_interval.

This change introduces an obvious typographical error.

-- Fixed.

-    /* One last block info dump while postmaster shutdown. */
+    /* It's time for postmaster shutdown, let's dump for one last time. */

Comma splice.

-- Open

+ /* Perform autoprewarm's task. */

Uninformative.

-- Fixed. I have removed it now.

+ * A Common function to initialize BackgroundWorker structure.

Adding a period to the end is a good idea, but how about also fixing
the capitalization of "Common"? I think this is a common usage in
India, but not elsewhere.

-- Fixed to common.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_12.patch (application/octet-stream)

commit b21ecfa28fe396aac3e960b759cdbe4da09113d4
Author: mithun <mithun@localhost.localdomain>
Date:   Fri Jun 9 09:42:30 2017 +0530

    patch 12

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..646d508
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1039 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffer pool when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records the
+ *		information about blocks which were present in buffer pool before
+ *		server shutdown. Then prewarms the buffer pool on server restart
+ *		with those blocks.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the buffer pool. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffer pool. On next server
+ *		restart, the bgworker will prewarm the buffer pool by loading those
+ *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
+ *		activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+static void apw_sigusr1_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to handle.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm workers notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+apw_sigusr1_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers.*/
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the buffer pool block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile;		/* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;		/* if set true, prewarm task
+												 * will not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable to decide whether autoprewarm worker should be started when
+ * preloaded.
+ */
+static bool autoprewarm = true;
+
+/* Compare member elements to check whether they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	Prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_shm_state
+ *		on_shmem_exit callback that resets the prewarm state
+ */
+
+static void
+reset_shm_state(int code, Datum arg)
+{
+	if (state->pid_using_dumpfile == MyProcPid)
+		state->pid_using_dumpfile = InvalidPid;
+	if (state->bgworker_pid == MyProcPid)
+		state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_autoprewarm_state
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_autoprewarm_state(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&state->lock, LWLockNewTrancheId());
+		state->bgworker_pid = InvalidPid;
+		state->pid_using_dumpfile = InvalidPid;
+		state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * load_one_database
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [state->prewarm_start_idx, state->prewarm_stop_idx).
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_autoprewarm_state();
+	seg = dsm_attach(state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(state->database, InvalidOid);
+	old_blk = NULL;
+	pos = state->prewarm_start_idx;
+
+	while (!got_sigterm && pos < state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos];
+		Buffer		buf;
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note that rel will be NULL if try_relation_open failed
+		 * previously; in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			++pos;
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+			ReleaseBuffer(buf);
+
+		old_blk = blk;
+		++pos;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * launch_and_wait_for_per_database_worker
+ *		Register a per-database dynamic worker and wait for it to finish.
+ */
+static void
+launch_and_wait_for_per_database_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  (Datum) 0, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * prewarm_buffer_pool
+ *		The main routine that prewarms the buffer pool.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->pid_using_dumpfile == InvalidPid)
+		state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	state->block_info_handle = dsm_segment_handle(seg);
+	state->prewarm_start_idx = state->prewarm_stop_idx = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global objects with those of
+				 * the next database.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we cannot
+		 * connect with InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		state->prewarm_stop_idx = i;
+		state->database = current_db;
+
+		Assert(state->prewarm_start_idx < state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		launch_and_wait_for_per_database_worker();
+		state->prewarm_start_idx = state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	state->block_info_handle = DSM_HANDLE_INVALID;
+
+	state->pid_using_dumpfile = InvalidPid;
+	ereport(LOG,
+			(errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/*
+ * ============================================================================
+ * ===================== Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule is for periodically dumping BlockInfoRecords in buffer pool
+ * into a dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in the text form so that the dump
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * dump_now
+ *		Dumps BlockInfoRecords in buffer pool.
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	static char transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret,
+				buflen;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	int			fd;
+	char		buf[1024];
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->pid_using_dumpfile == InvalidPid)
+		state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&state->lock);
+
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							state->pid_using_dumpfile)));
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+
+	fd = OpenTransientFile(transient_dump_file_path,
+						   O_CREAT | O_WRONLY | O_TRUNC, 0666);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	buflen = sprintf(buf, "<<%u>>\n", num_blocks);
+	if (write(fd, buf, buflen) < buflen)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\": %m",
+						transient_dump_file_path)));
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			return 0;
+		}
+
+		buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+						 block_info_array[i].database,
+						 block_info_array[i].tablespace,
+						 block_info_array[i].filenode,
+						 (uint32) block_info_array[i].forknum,
+						 block_info_array[i].blocknum);
+
+		if (write(fd, buf, buflen) < buflen)
+		{
+			int			save_errno = errno;
+
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(fd);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+
+	state->pid_using_dumpfile = InvalidPid;
+
+	ereport(LOG,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		 This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by GUC variable dump_interval.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = GetCurrentTimestamp();
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Have we been asked to stop dumping? */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now(true);
+
+				/*
+				 * It is better to stop when shutdown signal is received
+				 * during or right after a dump.
+				 */
+				if (got_sigterm)
+					return;
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, apw_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_autoprewarm_state();
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						state->bgworker_pid)));
+		return;
+	}
+
+	state->bgworker_pid = MyProcPid;
+	LWLockRelease(&state->lock);
+
+	on_shmem_exit(reset_shm_state, 0);
+
+	ereport(LOG,
+			(errmsg("autoprewarm has started")));
+
+	/* Perform autoprewarm's task. */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!state->skip_prewarm_on_restart)
+		prewarm_buffer_pool();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker prewarm_worker;
+
+	/* Define custom GUC variables. */
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Enable/Disable auto-prewarm feature.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to zero, timer based dumping is disabled."
+							" If set to -1, stops autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* If not run as a preloaded library, nothing more to do. */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm bgworker is disabled then nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&prewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&prewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * launch_autoprewarm_dump
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* If dump_interval is disabled then nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_autoprewarm_state();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		if (state->pid_using_dumpfile == MyProcPid)
+			state->pid_using_dumpfile = InvalidPid;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..e8d0c2e 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,100 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This SQL-callable function launches the <literal>autoprewarm</literal>
+   worker, which dumps the buffer pool information at regular intervals. Only
+   one <literal>autoprewarm</literal> worker can run in a server, so a worker
+   that finds another one already running exits immediately. The return value
+   is the PID of the worker that has been launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This SQL-callable function makes the calling backend dump the buffer pool
+   information once, immediately. It returns the number of block records dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker that automatically records information about the blocks that
+  were present in the buffer pool before server shutdown and then prewarms
+  the buffer pool with those blocks upon server restart.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until there is no
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in shared buffer pool. Upon next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. An autoprewarm
+      worker will only be started if this variable is set <literal>on</literal>.
+      The default value is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. Two values are special: if set to 0, timer-based
+      dumping is disabled and a dump happens only at server shutdown; if set
+      to -1, the running <literal>autoprewarm</literal> worker is stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take buffers off
+ * the free list, so callers that strictly need a free buffer should not rely
+ * on this.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaa6d32..c6fa86a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -214,10 +216,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
@@ -2869,6 +2873,7 @@ pos_trgm
 post_parse_analyze_hook_type
 pqbool
 pqsigfunc
+prewarm_elem
 printQueryOpt
 printTableContent
 printTableFooter
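
The special values of pg_prewarm.dump_interval described in the documentation hunk above can be summarized in a small sketch. This is only an illustration and not part of the patch: the names DumpMode, classify_dump_interval and dump_is_due are hypothetical, and only the numeric values (-1, 0, and a positive number of seconds) come from the documentation text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the values -1, 0 and > 0 come from the docs. */
typedef enum
{
	DUMP_MODE_STOPPED,			/* -1: the running autoprewarm is stopped */
	DUMP_MODE_SHUTDOWN_ONLY,	/* 0: no timer, dump only at shutdown */
	DUMP_MODE_PERIODIC			/* > 0: dump every dump_interval seconds */
} DumpMode;

/* Map a dump_interval setting to the behaviour the documentation describes. */
static DumpMode
classify_dump_interval(int dump_interval)
{
	assert(dump_interval >= -1);	/* -1 is the smallest documented value */

	if (dump_interval == -1)
		return DUMP_MODE_STOPPED;
	if (dump_interval == 0)
		return DUMP_MODE_SHUTDOWN_ONLY;
	return DUMP_MODE_PERIODIC;
}

/* A timer-based dump is due only in periodic mode, once the interval passed. */
static bool
dump_is_due(int dump_interval, long seconds_since_last_dump)
{
	return classify_dump_interval(dump_interval) == DUMP_MODE_PERIODIC &&
		seconds_since_last_dump >= dump_interval;
}
```

With the default of 300, dump_is_due(300, 299) is false and dump_is_due(300, 300) is true; with 0 or -1 it is never true.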

#85

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Mithun Cy (#84)

Re: Proposal : For Auto-Prewarm.

On Fri, Jun 9, 2017 at 10:01 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have merged Rafia's patch for cosmetic changes. I have also fixed
some of the changes you have recommended over that. But kept few as it
is since Rafia's opinion was needed on that.

Few comments on the latest patch:
1.
+ /* Check whether blocknum is valid and within fork file size. */
+ if (blk->blocknum >= nblocks)
+ {
+ /* Move to next forknum. */
+ ++pos;
+ old_blk = blk;
+ continue;
+ }
+
+ /* Prewarm buffer. */
+ buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+ NULL);
+ if (BufferIsValid(buf))
+ ReleaseBuffer(buf);
+
+ old_blk = blk;
+ ++pos;

You are incrementing position at different places in this loop. I
think you can do it once at the start of the loop.

2.
+dump_now(bool is_bgworker)
{
..
+ fd = OpenTransientFile(transient_dump_file_path,
+   O_CREAT | O_WRONLY | O_TRUNC, 0666);

+prewarm_buffer_pool(void)
{
..
+ file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);

During prewarm, you seem to be using binary mode to open a file
whereas during dump binary flag is not passed. Is there a reason
for such a difference?

3.
+ ereport(LOG,
+ (errmsg("saved metadata info of %d blocks", num_blocks)));

It doesn't seem like a good idea to log this info at each dump
interval. How about making this as a DEBUG1 message?
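
To illustrate comment 1, here is a minimal sketch (not taken from the patch) of a loop that advances its position exactly once, at the top of each iteration, so that every "continue" path is automatically safe. Rec and count_loadable are hypothetical names, and the counter merely stands in for the real buffer read.

```c
#include <stddef.h>

/* Hypothetical stand-in for a block record; only blocknum matters here. */
typedef struct Rec
{
	unsigned	blocknum;
} Rec;

/*
 * Count the records whose block number lies within a fork of nblocks
 * blocks. The position is incremented in exactly one place.
 */
static size_t
count_loadable(const Rec *recs, size_t n, unsigned nblocks)
{
	size_t		pos = 0;
	size_t		loaded = 0;

	while (pos < n)
	{
		const Rec  *blk = &recs[pos++];	/* the only increment */

		/* Block is beyond the end of the fork: skip it. */
		if (blk->blocknum >= nblocks)
			continue;			/* pos has already advanced */

		loaded++;				/* stands in for reading the buffer */
	}

	return loaded;
}
```

Because the increment happens when the record is fetched, no skip branch needs its own ++pos.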

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#86

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#85)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Thanks, Amit,

On Fri, Jun 9, 2017 at 8:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Few comments on the latest patch:
+
+ /* Prewarm buffer. */
+ buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+ NULL);
+ if (BufferIsValid(buf))
+ ReleaseBuffer(buf);
+
+ old_blk = blk;
+ ++pos;
You are incrementing position at different places in this loop. I
think you can do it once at the start of the loop.

Fixed; all of the ++pos increments are now in one place.

2.
+dump_now(bool is_bgworker)
{
..
+ fd = OpenTransientFile(transient_dump_file_path,
+   O_CREAT | O_WRONLY | O_TRUNC, 0666);
+prewarm_buffer_pool(void)
{
..
+ file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
During prewarm, you seem to be using binary mode to open a file
whereas during dump binary flag is not passed. Is there a reason
for such a difference?

-- Sorry, fixed now; both use binary mode.

3.
+ ereport(LOG,
+ (errmsg("saved metadata info of %d blocks", num_blocks)));
It doesn't seem like a good idea to log this info at each dump
interval. How about making this as a DEBUG1 message?

-- Fixed. It is now a DEBUG1 message, as is the other message,
"autoprewarm load task ended".
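
For reference on point 2, here is a small sketch of the dump file's text layout being written and read back with both sides in binary mode. It is an illustration only: it uses plain stdio ("wb"/"rb") instead of the patch's OpenTransientFile()/AllocateFile() calls, and DemoRecord, write_dump and read_dump are hypothetical names. Only the two format strings mirror the patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for BlockInfoRecord, using plain unsigned fields. */
typedef struct DemoRecord
{
	unsigned	database;
	unsigned	tablespace;
	unsigned	filenode;
	unsigned	forknum;
	unsigned	blocknum;
} DemoRecord;

/* Write a "<<count>>" header followed by one comma-separated line per record. */
static int
write_dump(const char *path, const DemoRecord *recs, unsigned n)
{
	FILE	   *f = fopen(path, "wb");	/* binary: no newline translation */
	unsigned	i;

	if (f == NULL)
		return -1;
	if (fprintf(f, "<<%u>>\n", n) < 0)
	{
		fclose(f);
		return -1;
	}
	for (i = 0; i < n; i++)
	{
		if (fprintf(f, "%u,%u,%u,%u,%u\n",
					recs[i].database, recs[i].tablespace, recs[i].filenode,
					recs[i].forknum, recs[i].blocknum) < 0)
		{
			fclose(f);
			return -1;
		}
	}
	return (fclose(f) == 0) ? 0 : -1;
}

/* Read the file back; fail if the header count and the records disagree. */
static int
read_dump(const char *path, DemoRecord *recs, unsigned max, unsigned *n_out)
{
	FILE	   *f = fopen(path, "rb");	/* same mode as the writer */
	unsigned	n,
				i;

	if (f == NULL)
		return -1;
	if (fscanf(f, "<<%u>>\n", &n) != 1 || n > max)
	{
		fclose(f);
		return -1;
	}
	for (i = 0; i < n; i++)
	{
		if (fscanf(f, "%u,%u,%u,%u,%u\n",
				   &recs[i].database, &recs[i].tablespace, &recs[i].filenode,
				   &recs[i].forknum, &recs[i].blocknum) != 5)
			break;
	}
	fclose(f);
	if (i != n)
		return -1;
	*n_out = n;
	return 0;
}
```

Opening both ends in the same mode matters mainly on Windows, where text mode translates line endings.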

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_13.patch (application/octet-stream)

commit 59c8f4999e647eb848267d1648a110a288d410da
Author: mithun <mithun@localhost.localdomain>
Date:   Mon Jun 12 06:18:48 2017 +0530

    patch 12

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..7e5c8ce
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1035 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffer pool when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records the
+ *		information about blocks which were present in buffer pool before
+ *		server shutdown. Then prewarms the buffer pool on server restart
+ *		with those blocks.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the buffer pool. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffer pool. On next server
+ *		restart, the bgworker will prewarm the buffer pool by loading those
+ *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
+ *		activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+static void apw_sigusr1_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to request shutdown and wake up the main loop.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm workers notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+apw_sigusr1_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers. */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the buffer pool block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile;		/* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;		/* if set true, prewarm task
+												 * will not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *state = NULL;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable to decide whether autoprewarm worker should be started when
+ * preloaded.
+ */
+static bool autoprewarm = true;
+
+/* Compare member elements to check whether they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	Prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_shm_state
+ *		on_shmem_exit callback that resets the prewarm state
+ */
+
+static void
+reset_shm_state(int code, Datum arg)
+{
+	if (state->pid_using_dumpfile == MyProcPid)
+		state->pid_using_dumpfile = InvalidPid;
+	if (state->bgworker_pid == MyProcPid)
+		state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_autoprewarm_state
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_autoprewarm_state(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	state = ShmemInitStruct("autoprewarm",
+							sizeof(AutoPrewarmSharedState),
+							&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&state->lock, LWLockNewTrancheId());
+		state->bgworker_pid = InvalidPid;
+		state->pid_using_dumpfile = InvalidPid;
+		state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * load_one_database
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [state->prewarm_start_idx, state->prewarm_stop_idx).
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_autoprewarm_state();
+	seg = dsm_attach(state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(state->database, InvalidOid);
+	old_blk = NULL;
+	pos = state->prewarm_start_idx;
+
+	while (!got_sigterm && pos < state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note, that rel will be NULL if try_relation_open failed
+		 * previously, in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+			ReleaseBuffer(buf);
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * launch_and_wait_for_per_database_worker
+ *		Register a per-database dynamic worker and wait for it to finish.
+ */
+static void
+launch_and_wait_for_per_database_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  (Datum) 0, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * prewarm_buffer_pool
+ *		The main routine that prewarms the buffer pool.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->pid_using_dumpfile == InvalidPid)
+		state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block records to increase the chance of sequential reads
+	 * during load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	state->block_info_handle = dsm_segment_handle(seg);
+	state->prewarm_start_idx = state->prewarm_stop_idx = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global objects with the next
+				 * non-global object.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we cannot
+		 * connect with InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		state->prewarm_stop_idx = i;
+		state->database = current_db;
+
+		Assert(state->prewarm_start_idx < state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		launch_and_wait_for_per_database_worker();
+		state->prewarm_start_idx = state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	state->block_info_handle = DSM_HANDLE_INVALID;
+
+	state->pid_using_dumpfile = InvalidPid;
+	ereport(DEBUG1,
+			(errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/*
+ * ============================================================================
+ * ===================== Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule is for periodically dumping BlockInfoRecords in buffer pool
+ * into a dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in the text form so that the dump
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * dump_now
+ *		Dumps BlockInfoRecords in buffer pool.
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	static char transient_dump_file_path[MAXPGPATH];
+	uint32		i;
+	int			ret,
+				buflen;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	int			fd;
+	char		buf[1024];
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->pid_using_dumpfile == InvalidPid)
+		state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&state->lock);
+
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							state->pid_using_dumpfile)));
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+
+	fd = OpenTransientFile(transient_dump_file_path,
+						   O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, 0666);
+	if (fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m",
+						transient_dump_file_path)));
+
+	buflen = sprintf(buf, "<<%u>>\n", num_blocks);
+	if (write(fd, buf, buflen) < buflen)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\": %m",
+						transient_dump_file_path)));
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			return 0;
+		}
+
+		buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+						 block_info_array[i].database,
+						 block_info_array[i].tablespace,
+						 block_info_array[i].filenode,
+						 (uint32) block_info_array[i].forknum,
+						 block_info_array[i].blocknum);
+
+		if (write(fd, buf, buflen) < buflen)
+		{
+			int			save_errno = errno;
+
+			CloseTransientFile(fd);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(fd);
+	if (ret != 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+
+	state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		 This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by GUC variable dump_interval.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = GetCurrentTimestamp();
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Have we been asked to stop dumping? */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now(true);
+
+				/*
+				 * It is better to stop when shutdown signal is received
+				 * during or right after a dump.
+				 */
+				if (got_sigterm)
+					return;
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, apw_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_autoprewarm_state();
+
+	LWLockAcquire(&state->lock, LW_EXCLUSIVE);
+	if (state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						state->bgworker_pid)));
+		return;
+	}
+
+	state->bgworker_pid = MyProcPid;
+	LWLockRelease(&state->lock);
+
+	on_shmem_exit(reset_shm_state, 0);
+
+	ereport(LOG,
+			(errmsg("autoprewarm has started")));
+
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!state->skip_prewarm_on_restart)
+		prewarm_buffer_pool();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm shutting down")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker prewarm_worker;
+
+	/* Define custom GUC variables. */
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Enable/Disable auto-prewarm feature.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to zero, timer based dumping is disabled."
+							" If set to -1, stops autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* If not run as a preloaded library, nothing more to do. */
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm bgworker is disabled then nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&prewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&prewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * launch_autoprewarm_dump
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* If dump_interval is disabled then nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_autoprewarm_state();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		if (state->pid_using_dumpfile == MyProcPid)
+			state->pid_using_dumpfile = InvalidPid;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..e8d0c2e 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,100 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This is a SQL callable function to launch the <literal>autoprewarm</literal>
+   worker to dump the buffer pool information at regular intervals. Only one
+   <literal>autoprewarm</literal> worker can run in a server, so if a worker
+   sees another existing worker it will exit immediately. The return value is
+   the PID of the worker which has been launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This is a SQL callable function that makes the calling backend dump the
+   buffer pool information once, immediately. The return value is the number of block records dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in the buffer pool before server shutdown and then prewarms the
+  buffer pool upon server restart with those blocks.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until there is no
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in shared buffer pool. Upon next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. An autoprewarm
+      worker will only be started if this variable is set <literal>on</literal>.
+      The default value is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. It also takes special values. If set to 0, timer
+      based dumping is disabled and a dump happens only at server shutdown.
+      If set to -1, the running <literal>autoprewarm</literal> will be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take buffers off
+ * the free list, so a caller that strictly needs a free buffer should not rely
+ * on this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaa6d32..c6fa86a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -214,10 +216,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
@@ -2869,6 +2873,7 @@ pos_trgm
 post_parse_analyze_hook_type
 pqbool
 pqsigfunc
+prewarm_elem
 printQueryOpt
 printTableContent
 printTableFooter

#87

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Mithun Cy (#86)

Re: Proposal : For Auto-Prewarm.

On Mon, Jun 12, 2017 at 6:31 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Thanks, Amit,

+ /* Perform autoprewarm's task. */
+ if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+ !state->skip_prewarm_on_restart)

Why have you removed the above comment in the new patch? I am not
pointing this out because the comment is meaningful; rather, changing
things between versions of the patch without informing the reviewer
can increase the time needed to review. I feel you can write a better
comment here.

1.
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.0--1.1.sql */

In comments, the SQL file name is wrong.

2.
+ /* Define custom GUC variables. */
+ if (process_shared_preload_libraries_in_progress)
+ DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+ "Enable/Disable auto-prewarm feature.",
+ NULL,
+ &autoprewarm,
+ true,
+ PGC_POSTMASTER,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
+ DefineCustomIntVariable("pg_prewarm.dump_interval",
+   "Sets the maximum time between two buffer pool dumps",
+ "If set to zero, timer based dumping is disabled."
+ " If set to -1, stops autoprewarm.",
+ &dump_interval,
+ AT_PWARM_DEFAULT_DUMP_INTERVAL,
+ AT_PWARM_OFF, INT_MAX / 1000,
+ PGC_SIGHUP,
+ GUC_UNIT_S,
+ NULL,
+ NULL,
+ NULL);
+
+ EmitWarningsOnPlaceholders("pg_prewarm");
+
+ /* If not run as a preloaded library, nothing more to do. */
+ if (!process_shared_preload_libraries_in_progress)
+ return;

a. You can easily restructure this code so that
process_shared_preload_libraries_in_progress is checked just once.
Define pg_prewarm.dump_interval first, and then do
if (!process_shared_preload_libraries_in_progress) return.

b. Variable name autoprewarm isn't self-explanatory, also if you have
to search the use of this variable in the code, it is difficult
because a lot of unrelated usages can pop-up. How about naming it as
start_prewarm_worker or enable_autoprewarm or something like that?

3.
+static AutoPrewarmSharedState *state = NULL;

Again, the naming of this variable (state) is not meaningful. How
about SharedPrewarmState or something like that?

4.
+ ereport(LOG,
+ (errmsg("autoprewarm has started")));
..
+ ereport(LOG,
+ (errmsg("autoprewarm shutting down")));

How about changing messages as "autoprewarm worker started" and
"autoprewarm worker stopped" respectively?

5.
+void
+dump_block_info_periodically(void)
+{
+ TimestampTz last_dump_time = GetCurrentTimestamp();
..
+ if (TimestampDifferenceExceeds(last_dump_time,
+   current_time,
+   (dump_interval * 1000)))
+ {
+ dump_now(true);
..

With the above coding, it will not dump the very first time it tries
to dump the block information. I think it is better if it dumps the
first time and then dumps after every dump_interval. It should not be
difficult to rearrange the above code to do so, if you also think that
is the better behavior.
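
The behavior suggested in point 5 can be sketched in standalone C as below. This is only an illustration, not code from the patch: ts_ms, dump_is_due and simulate are hypothetical names, timestamps are plain milliseconds rather than TimestampTz, and a last-dump time of 0 is assumed to mean "never dumped".

```c
#include <stdbool.h>

/* Hypothetical stand-in for TimestampTz: plain milliseconds. */
typedef long long ts_ms;

/*
 * Decide whether the periodic loop should dump now.  A last_dump_time of 0
 * is treated as "never dumped", so the first pass always dumps.
 */
static bool
dump_is_due(ts_ms last_dump_time, ts_ms now, int dump_interval_s)
{
	if (last_dump_time == 0)
		return true;
	return (now - last_dump_time) >= (ts_ms) dump_interval_s * 1000;
}

/*
 * Walk over a list of wakeup times and count how many dumps would happen.
 * Wakeup times are assumed to be nonzero and increasing.
 */
static int
simulate(const ts_ms *wakeups, int n, int dump_interval_s)
{
	ts_ms		last = 0;		/* 0 = no dump has happened yet */
	int			dumps = 0;

	for (int i = 0; i < n; i++)
	{
		if (dump_is_due(last, wakeups[i], dump_interval_s))
		{
			dumps++;
			last = wakeups[i];
		}
	}
	return dumps;
}
```

Initializing the last-dump time to "never" instead of the current time is what makes the first wakeup dump immediately; every later dump then waits for a full interval.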

6.
+dump_now(bool is_bgworker)
{
..
+ if (write(fd, buf, buflen) < buflen)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to file \"%s\" : %m",
+ transient_dump_file_path)));
..
}

You seem to have forgotten to close and unlink the file in the above
code path. There are a lot of places in this function where you have to
free memory or close the file in case of an error condition. You can use
multiple labels to define error exit paths, something like we have
done in DecodeXLogRecord.
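
A rough standalone illustration of the multiple-label error exit style mentioned in point 6 follows. It is a sketch using plain stdio rather than the PostgreSQL file APIs; write_lines and the 256-byte line limit are assumptions made for the example, not part of the patch.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Write nlines lines to path.  Returns 0 on success, -1 on failure.
 * Each failure jumps to the label that undoes exactly what has been
 * acquired so far, so no path leaks memory or leaves a partial file.
 */
static int
write_lines(const char *path, const char *const *lines, int nlines)
{
	char	   *buf;
	FILE	   *fp;
	int			i;

	buf = malloc(256);
	if (buf == NULL)
		return -1;

	fp = fopen(path, "w");
	if (fp == NULL)
		goto err_free;

	for (i = 0; i < nlines; i++)
	{
		int			len = snprintf(buf, 256, "%s\n", lines[i]);

		/* Reject formatting errors and lines that do not fit. */
		if (len < 0 || len >= 256)
			goto err_unlink;
		if (fwrite(buf, 1, (size_t) len, fp) != (size_t) len)
			goto err_unlink;
	}

	if (fclose(fp) != 0)
	{
		fp = NULL;				/* already closed, do not close again */
		goto err_unlink;
	}
	free(buf);
	return 0;

err_unlink:
	if (fp != NULL)
		fclose(fp);
	remove(path);				/* do not leave a partial file behind */
err_free:
	free(buf);
	return -1;
}
```

The labels are ordered so that a later failure falls through the cleanup needed by earlier ones.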

7.
+ for (i = 0; i < num_blocks; i++)
+ {
+ /* In case of a SIGHUP, just reload the configuration. */
+ if (got_sighup)
+ {
+ got_sighup = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ }
+
+ /* Have we been asked to stop dump? */
+ if (dump_interval == AT_PWARM_OFF)
+ {
+ pfree(block_info_array);
+ CloseTransientFile(fd);
+ unlink(transient_dump_file_path);
+ return 0;
+ }
+
+ buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+ block_info_array[i].database,
+ block_info_array[i].tablespace,
+ block_info_array[i].filenode,
+ (uint32) block_info_array[i].forknum,
+ block_info_array[i].blocknum);
+
+ if (write(fd, buf, buflen) < buflen)
+ {
+ int save_errno = errno;
+
+ CloseTransientFile(fd);
+ unlink(transient_dump_file_path);
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to file \"%s\": %m",
+ transient_dump_file_path)));
+ }

It seems we can batch the above writes into 8K-sized writes instead of
issuing one write per block record.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#88

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#87)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Mon, Jun 12, 2017 at 7:34 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Mon, Jun 12, 2017 at 6:31 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

Thanks, Amit,
+ /* Perform autoprewarm's task. */
+ if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+ !state->skip_prewarm_on_restart)
Why have you removed the above comment in the new patch? I am not
pointing this out because the comment is meaningful; rather, changing
things between versions of the patch without informing the reviewer
can increase the time needed to review. I feel you can write a better
comment here.

That happened while fixing an earlier review comment. I think I removed it
in patch_12 itself and mentioned that in the mail. I felt this code was
simple, so there was no need for new comments. I have now tried to add a few
as suggested.

1.
new file mode 100644
index 0000000..6c35fb7
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.0--1.1.sql */

In comments, the SQL file name is wrong.

-- Sorry, Fixed.

2.
+ /* Define custom GUC variables. */
+ if (process_shared_preload_libraries_in_progress)
+ DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+ "Enable/Disable auto-prewarm feature.",
+ NULL,
+ &autoprewarm,
+ true,
+ PGC_POSTMASTER,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
+ DefineCustomIntVariable("pg_prewarm.dump_interval",
+   "Sets the maximum time between two buffer pool dumps",
+ "If set to zero, timer based dumping is disabled."
+ " If set to -1, stops autoprewarm.",
+ &dump_interval,
+ AT_PWARM_DEFAULT_DUMP_INTERVAL,
+ AT_PWARM_OFF, INT_MAX / 1000,
+ PGC_SIGHUP,
+ GUC_UNIT_S,
+ NULL,
+ NULL,
+ NULL);
+
+ EmitWarningsOnPlaceholders("pg_prewarm");
+
+ /* If not run as a preloaded library, nothing more to do. */
+ if (!process_shared_preload_libraries_in_progress)
+ return;

-- Thanks, I have fixed it as suggested. Previously I did it that way to avoid
calling EmitWarningsOnPlaceholders in two different places.

b. Variable name autoprewarm isn't self-explanatory, also if you have
to search the use of this variable in the code, it is difficult
because a lot of unrelated usages can pop-up. How about naming it as
start_prewarm_worker or enable_autoprewarm or something like that?

-- The name was chosen in response to earlier review comments. I think
enable_autoprewarm looks good, so I am renaming it. Please let me know if I
need to reconsider.

3.
+static AutoPrewarmSharedState *state = NULL;

Again, the naming of this variable (state) is not meaningful. How
about SharedPrewarmState or something like that?

-- state is used by both the prewarm and dump workers, and I would like to
keep the name simple and short. Elsewhere I have used "apw_sigterm_handler",
so I think "apw_state" could be a good compromise. I have renamed the
functions to reset_apw_state and init_apw_state along similar lines. Please
let me know if I need to reconsider.

4.
+ ereport(LOG,
+ (errmsg("autoprewarm has started")));
..
+ ereport(LOG,
+ (errmsg("autoprewarm shutting down")));
How about changing messages as "autoprewarm worker started" and
"autoprewarm worker stopped" respectively?

-- Thanks fixed as suggested.

5.
+void
+dump_block_info_periodically(void)
+{
+ TimestampTz last_dump_time = GetCurrentTimestamp();
..
+ if (TimestampDifferenceExceeds(last_dump_time,
+   current_time,
+   (dump_interval * 1000)))
+ {
+ dump_now(true);
..
With the above coding, it will not dump the very first time it tries
to dump the block information. I think it is better if it dumps the
first time and then dumps after every dump_interval. It should not be
difficult to rearrange the above code to do so, if you also think that
is the better behavior.

-- Thanks agree, fixed as suggested.

6.
+dump_now(bool is_bgworker)
{
..
+ if (write(fd, buf, buflen) < buflen)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to file \"%s\" : %m",
+ transient_dump_file_path)));
..
}
You seem to have forgotten to close and unlink the file in the above
code path. There are a lot of places in this function where you have to
free memory or close the file in case of an error condition. You can use
multiple labels to define error exit paths, something like we have
done in DecodeXLogRecord.

-- Fixed. I have moved those error messages to a new function,
buffer_file_flush, while fixing the comment below, so I think a goto as in
DecodeXLogRecord is not necessary now.

+ for (i = 0; i < num_blocks; i++)
+ {
+ /* In case of a SIGHUP, just reload the configuration. */
+ if (got_sighup)
+ {
+ got_sighup = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ }
+
+ /* Have we been asked to stop dump? */
+ if (dump_interval == AT_PWARM_OFF)
+ {
+ pfree(block_info_array);
+ CloseTransientFile(fd);
+ unlink(transient_dump_file_path);
+ return 0;
+ }
+
+ buflen = sprintf(buf, "%u,%u,%u,%u,%u\n",
+ block_info_array[i].database,
+ block_info_array[i].tablespace,
+ block_info_array[i].filenode,
+ (uint32) block_info_array[i].forknum,
+ block_info_array[i].blocknum);
+
+ if (write(fd, buf, buflen) < buflen)
+ {
+ int save_errno = errno;
+
+ CloseTransientFile(fd);
+ unlink(transient_dump_file_path);
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write to file \"%s\": %m",
+ transient_dump_file_path)));
+ }

It seems we can batch the above writes into 8K-sized writes instead of
issuing one write per block record.

-- I have tried to fix this, mostly inspired by BufFileWrite in buffile.c.
I have not used BufFileWrite itself because there was no interface suitable
for this use, and I am also not sure whether I can use it in our context.
Should I also use a buffered file when reading from the autoprewarm.blocks
file?

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_14.patchapplication/octet-stream; name=autoprewarm_14.patchDownload

commit 9f94121bd76c30c4fa22e129141bd69a47c6e325
Author: mithun <mithun@localhost.localdomain>
Date:   Thu Jun 15 09:55:27 2017 +0530

    patch 14

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..f84fa4a
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1109 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffer pool when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records
+ *		information about the blocks which were present in the buffer pool
+ *		before server shutdown, and then prewarms the buffer pool with those
+ *		blocks on server restart.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the buffer pool. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffer pool. On next server
+ *		restart, the bgworker will prewarm the buffer pool by loading those
+ *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
+ *		activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+static void apw_sigusr1_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to handle.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm workers notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+apw_sigusr1_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers.*/
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the buffer pool block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile;		/* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;		/* if set true, prewarm task
+												 * will not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/*
+ * This data structure represents a buffered file.
+ */
+typedef struct BufferFile
+{
+	char		transient_dump_file_path[MAXPGPATH];	/* actual file to be
+														 * written */
+	int			fd;				/* file descriptor to above file */
+	char		buf[BLCKSZ];	/* buffer used before writing to file */
+	int			pos;			/* next write position in buffer. */
+}	BufferFile;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable to decide whether autoprewarm worker should be started when
+ * preloaded.
+ */
+static bool enable_autoprewarm = true;
+
+/* Compare one member of a and b; return from the caller if they differ. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	Prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_apw_state
+ *		Exit callback that clears this process's PIDs from the shared state.
+ */
+
+static void
+reset_apw_state(int code, Datum arg)
+{
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_apw_state
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_apw_state(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+		apw_state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * load_one_database
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [apw_state->prewarm_start_idx, apw_state->prewarm_stop_idx).
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_apw_state();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	old_blk = NULL;
+	pos = apw_state->prewarm_start_idx;
+
+	while (!got_sigterm && pos < apw_state->prewarm_stop_idx &&
+		   have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note, that rel will be NULL if try_relation_open failed
+		 * previously, in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+			ReleaseBuffer(buf);
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * launch_and_wait_for_per_database_worker
+ *		Register a per-database dynamic worker to load.
+ */
+static void
+launch_and_wait_for_per_database_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  (Datum) NULL, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * prewarm_buffer_pool
+ *		The main routine that prewarms the buffer pool.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	apw_state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		apw_state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block records to increase the chance of sequential reads
+	 * during load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global objects with those of the
+				 * next non-global object.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we cannot
+		 * connect with InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		launch_and_wait_for_per_database_worker();
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+
+	apw_state->pid_using_dumpfile = InvalidPid;
+	ereport(DEBUG1,
+			(errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/*
+ * ============================================================================
+ * ===================== Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule periodically dumps the BlockInfoRecords of the buffer pool
+ * into the dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that the file is in text form so that the dump
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * buffer_file_flush
+ *		Unload the buffer contents to actual file.
+ *
+ */
+static void
+buffer_file_flush(BufferFile * file)
+{
+	ssize_t		w_size;
+	char	   *buf = file->buf;
+
+	while (file->pos)
+	{
+		/* write to file until an error */
+		w_size = write(file->fd, buf, file->pos);
+		if (w_size > 0)
+		{
+			file->pos -= w_size;
+			buf += w_size;
+		}
+		else
+		{
+			int			save_errno = errno;
+
+			CloseTransientFile(file->fd);
+			unlink(file->transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							file->transient_dump_file_path)));
+		}
+	}
+}
+
+/*
+ * buffer_file_write
+ *		First accumulate the contents in a BLCKSZ buffer then unload it to
+ *		actual file.
+ */
+static void
+buffer_file_write(BufferFile * file, char *block_info, int block_info_len)
+{
+	Assert(block_info_len <= BLCKSZ);
+
+	/* If we exceed the buffer size unload buffer to actual file. */
+	if ((file->pos + block_info_len) > BLCKSZ)
+		buffer_file_flush(file);
+
+	memcpy(file->buf + file->pos, block_info, block_info_len);
+	file->pos += block_info_len;
+}
+
+/*
+ * dump_now
+ *		Dumps the BlockInfoRecords of the buffer pool.
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	uint32		i;
+	int			ret,
+				block_info_len;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	BufferFile *file;
+	char		block_info[1024];
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							apw_state->pid_using_dumpfile)));
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	file = (BufferFile *) palloc(sizeof(BufferFile));
+	snprintf(file->transient_dump_file_path, MAXPGPATH, "%s.tmp",
+			 AUTOPREWARM_FILE);
+
+	file->fd = OpenTransientFile(file->transient_dump_file_path,
+							 O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, 0666);
+	if (file->fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m",
+						file->transient_dump_file_path)));
+	file->pos = 0;
+
+	block_info_len = sprintf(block_info, "<<%u>>\n", num_blocks);
+	buffer_file_write(file, block_info, block_info_len);
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(file->fd);
+			unlink(file->transient_dump_file_path);
+			pfree(file);
+			return 0;
+		}
+
+		block_info_len = sprintf(block_info, "%u,%u,%u,%u,%u\n",
+								 block_info_array[i].database,
+								 block_info_array[i].tablespace,
+								 block_info_array[i].filenode,
+								 (uint32) block_info_array[i].forknum,
+								 block_info_array[i].blocknum);
+
+		buffer_file_write(file, block_info, block_info_len);
+	}
+
+	pfree(block_info_array);
+
+	/* Write remaining buffer contents to actual file. */
+	buffer_file_flush(file);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(file->fd);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(file->transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						file->transient_dump_file_path)));
+	}
+
+	(void) durable_rename(file->transient_dump_file_path, AUTOPREWARM_FILE,
+						  ERROR);
+	pfree(file);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		 This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by GUC variable dump_interval.
+ */
+void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = 0;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Have we been asked to stop dumping? */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (last_dump_time == 0 ||
+				TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now(true);
+
+				/*
+				 * It is better to stop when shutdown signal is received
+				 * during or right after a dump.
+				 */
+				if (got_sigterm)
+					return;
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+		else
+			last_dump_time = 0;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, apw_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_apw_state();
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	on_shmem_exit(reset_apw_state, 0);
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker started")));
+
+	/*
+	 * We have finished initializing worker's state, let's start actual work.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!apw_state->skip_prewarm_on_restart)
+		prewarm_buffer_pool();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker stopped")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker prewarm_worker;
+
+	/* Define custom GUC variables. */
+
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to zero, timer based dumping is disabled."
+							" If set to -1, stops autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Enable/Disable auto-prewarm feature.",
+								 NULL,
+								 &enable_autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+	else
+	{
+		/* If not run as a preloaded library, nothing more to do. */
+		EmitWarningsOnPlaceholders("pg_prewarm");
+		return;
+	}
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm bgworker is disabled then nothing more to do. */
+	if (!enable_autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&prewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&prewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * launch_autoprewarm_dump
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* If dump_interval is disabled then nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_apw_state();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		if (apw_state->pid_using_dumpfile == MyProcPid)
+			apw_state->pid_using_dumpfile = InvalidPid;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..a2241c6
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..5346c94 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,100 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This is a SQL callable function to launch the <literal>autoprewarm</literal>
+   worker to dump the buffer pool information at regular intervals. Only one
+   <literal>autoprewarm</literal> worker can run in a server, so a worker that
+   sees another existing worker exits immediately. The return value is the PID
+   of the worker that was launched.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This is a SQL callable function to dump buffer pool information once,
+   immediately, from a backend. The return value is the number of block records dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  A bgworker which automatically records information about blocks which were
+  present in the buffer pool before server shutdown and then prewarms the
+  buffer pool upon server restart with those blocks.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  a bgworker <literal>autoprewarm</literal> is launched immediately after the
+  server has reached a consistent state. The bgworker will start loading blocks
+  recorded in <literal>$PGDATA/autoprewarm.blocks</literal> until there is a
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> bgworker has completed its prewarm
+  task, it will start a new task to periodically dump the information about
+  blocks which are currently in shared buffer pool. Upon next server restart,
+  the bgworker will prewarm the buffer pool by loading those blocks. The GUC
+  <literal>pg_prewarm.dump_interval</literal> will control the dumping activity
+  of the bgworker.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.enable_autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.enable_autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. An autoprewarm
+      worker will only be started if this variable is set <literal>on</literal>.
+      The default value is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is valid only for <literal>autoprewarm</literal>. The minimum number
+      of seconds between two dumps of the buffer pool's block information. The
+      default is 300 seconds. It also takes special values: if set to 0, timer
+      based dumping is disabled and a dump happens only at server shutdown.
+      If set to -1, the running <literal>autoprewarm</literal> will be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5d0a636..06a34a7 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take the free
+ * buffers, so a caller that strictly needs a free buffer should not rely on
+ * this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index ff99f6b..ab04bd9 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eaa6d32..c6fa86a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -214,10 +216,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
@@ -2869,6 +2873,7 @@ pos_trgm
 post_parse_analyze_hook_type
 pqbool
 pqsigfunc
+prewarm_elem
 printQueryOpt
 printTableContent
 printTableFooter

#89

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Mithun Cy (#88)

Re: Proposal : For Auto-Prewarm.

On Thu, Jun 15, 2017 at 12:35 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

[ new patch ]

I think this is looking better. I have some suggestions:

* I suggest renaming launch_autoprewarm_dump() to
autoprewarm_start_worker(). I think that will be clearer. Remember
that user-visible names, internal names, and the documentation should
all match.

* I think the GUC could be pg_prewarm.autoprewarm rather than
pg_prewarm.enable_autoprewarm. It's shorter and, I think, no less
clear.

* In the documentation, don't say "This is a SQL callable function
to....". This is a list of SQL-callable functions, so each thing in
the list is one. Just delete this from the beginning of each
sentence.

* The reason for the AT_PWARM_* naming is not very obvious. Does AT
mean "at" or "auto" or something else? How about
AUTOPREWARM_INTERVAL_DISABLED, AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY,
AUTOPREWARM_INTERVAL_DEFAULT?

* Instead of defining apw_sigusr1_handler, I think you could just use
procsignal_sigusr1_handler. Instead of defining apw_sigterm_handler,
perhaps you could just use die(). got_sigterm would go away, and
you'd just CHECK_FOR_INTERRUPTS().

* The PG_TRY()/PG_CATCH() block in autoprewarm_dump_now() could reuse
reset_apw_state(), which might be better named detach_apw_shmem().
Similarly, init_apw_state() could be init_apw_shmem().

* Instead of load_one_database(), I suggest
autoprewarm_database_main(). That is more parallel to
autoprewarm_main(), which you have elsewhere, and makes it more
obvious that it's the main entrypoint for some background worker.

* Instead of launch_and_wait_for_per_database_worker(), I suggest
autoprewarm_one_database(), and instead of prewarm_buffer_pool(), I
suggest autoprewarm_buffers(). The motivation for changing prewarm
to autoprewarm is that we want the names here to be clearly distinct
from the other parts of pg_prewarm that are not related to
autoprewarm. The motivation for changing buffer_pool to buffers is
just that it's a little shorter. Personally I also like the sound of
it better, but YMMV.

* prewarm_buffer_pool() ends with a useless return statement. I
suggest removing it.

* Instead of creating our own buffering system via buffer_file_write()
and buffer_file_flush(), why not just use the facilities provided by
the operating system? fopen() et al. provide buffering, and we have
AllocateFile() to provide a FILE *; it's just like
OpenTransientFile(), which you are using, but you'll get the buffering
stuff for free. Maybe there's some reason why this won't work out
nicely, but off-hand it seems like it might. It looks like you are
already using AllocateFile() to read the dump, so using it to write
the dump as well seems like it would be logical.

* I think that it would be cool if, when autoprewarm completed, it
printed a message at LOG rather than DEBUG1, and with a few more
details, like "autoprewarm successfully prewarmed %d of %d
previously-loaded blocks". This would require some additional
tracking that you haven't got right now; you'd have to keep track not
only of the number of blocks read from the file but how many of those
some worker actually loaded. You could do that with an extra counter
in the shared memory area that gets incremented by the per-database
workers.

* dump_block_info_periodically() calls ResetLatch() immediately before
WaitLatch; that's backwards. See the commit message for commit
887feefe87b9099eeeec2967ec31ce20df4dfa9b and the comments it added to
the top of latch.h for details on how to do this correctly.

* dump_block_info_periodically()'s main loop is a bit confusing. I
think that after calling dump_now(true) it should just "continue",
which will automatically retest got_sigterm. You could rightly object
to that plan on the grounds that we then wouldn't recheck got_sighup
promptly, but you can fix that by moving the got_sighup test to the
top of the loop, which is a good idea anyway for at least two other
reasons. First, you probably want to check for a pending SIGHUP on
initially entering this function, because something might have changed
during the prewarm phase, and second, see the previous comment about
using the "another valid coding pattern" from latch.h, which puts the
ResetLatch() at the bottom of the loop.

* I think that launch_autoprewarm_dump() should ereport(ERROR, ...)
rather than just return NULL if the feature is disabled. Maybe
something like ... ERROR: pg_prewarm.dump_interval must be
non-negative in order to launch worker

* Not sure about this one, but maybe we should consider just getting
rid of pg_prewarm.dump_interval = -1 altogether and make the minimum
value 0. If pg_prewarm.autoprewarm = on, then we start the worker and
dump according to the dump interval; if pg_prewarm.autoprewarm = off
then we don't start the worker automatically, but we still let you
start it manually. If you do, it respects the configured
dump_interval. With this design, we don't need the error suggested in
the previous item at all, and the code can be simplified in various
places --- all the checks for AT_PWARM_OFF go away. And I don't see
that we're really losing anything. There's not much sense in dumping
but not prewarming or prewarming but not dumping, so having
pg_prewarm.autoprewarm configure whether the worker is started
automatically rather than whether it prewarms (with a separate control
for whether it dumps) seems to make sense. The one time when you want
to do one without the other is when you first install the extension --
during the first server lifetime, you'll want to dump, so that after
the next restart you have something to preload. But this design would
allow that.
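
For the buffering suggestion above (stdio instead of
buffer_file_write() and buffer_file_flush()), a minimal standalone
sketch could look like the following. It uses plain fopen() so that it
compiles outside the server; in the patch this would presumably be
AllocateFile()/FreeFile() with ereport() and durable_rename().
DumpRecord, write_dump() and read_dump() are hypothetical names, not
patch code. Only the file format (a "<<count>>" header followed by
comma-separated lines) is taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct DumpRecord
{
	uint32_t	database;
	uint32_t	tablespace;
	uint32_t	filenode;
	uint32_t	forknum;
	uint32_t	blocknum;
} DumpRecord;

/*
 * Write n records to "<path>.tmp" through a buffered FILE *, then rename
 * it over path. Returns 0 on success, -1 on any failure.
 */
int
write_dump(const char *path, const DumpRecord *recs, uint32_t n)
{
	char		tmp[1024];
	FILE	   *fp;
	uint32_t	i;
	int			len;
	int			ok;

	assert(path != NULL && (recs != NULL || n == 0));

	len = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (len < 0 || len >= (int) sizeof(tmp))
		return -1;

	fp = fopen(tmp, "w");
	if (fp == NULL)
		return -1;

	/* Header line, same shape as in the patch: <<count>> */
	ok = fprintf(fp, "<<%" PRIu32 ">>\n", n) > 0;

	for (i = 0; ok && i < n; i++)
		ok = fprintf(fp,
					 "%" PRIu32 ",%" PRIu32 ",%" PRIu32 ",%" PRIu32 ",%" PRIu32 "\n",
					 recs[i].database, recs[i].tablespace, recs[i].filenode,
					 recs[i].forknum, recs[i].blocknum) > 0;

	/* fclose() flushes the stdio buffer, so a write error can surface here. */
	if (fclose(fp) != 0)
		ok = 0;

	/* rename() replaces an existing target on POSIX systems. */
	if (!ok || rename(tmp, path) != 0)
	{
		remove(tmp);
		return -1;
	}
	return 0;
}

/*
 * Read the dump back. Returns the number of records read, or -1 if the
 * file is missing, malformed, or holds more than max records.
 */
long
read_dump(const char *path, DumpRecord *recs, uint32_t max)
{
	FILE	   *fp;
	uint32_t	n;
	uint32_t	i;

	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;

	if (fscanf(fp, "<<%" SCNu32 ">>\n", &n) != 1 || n > max)
	{
		fclose(fp);
		return -1;
	}

	for (i = 0; i < n; i++)
	{
		if (fscanf(fp,
				   "%" SCNu32 ",%" SCNu32 ",%" SCNu32 ",%" SCNu32 ",%" SCNu32 "\n",
				   &recs[i].database, &recs[i].tablespace, &recs[i].filenode,
				   &recs[i].forknum, &recs[i].blocknum) != 5)
		{
			fclose(fp);
			return -1;
		}
	}

	fclose(fp);
	return (long) n;
}
```

The point of the sketch is only that the stdio layer already batches
small writes, so no hand-rolled BLCKSZ buffer is needed. Checking the
result of fclose() matters because buffered data may not reach the
file until then.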

That's all I have time for today - hope it helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#90

Thom Brown

thom@linux.com

over 8 years ago

In reply to: Robert Haas (#89)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On 22 June 2017 at 22:52, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 15, 2017 at 12:35 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

[ new patch ]

I think this is looking better. I have some suggestions:

* I suggest renaming launch_autoprewarm_dump() to
autoprewarm_start_worker(). I think that will be clearer. Remember
that user-visible names, internal names, and the documentation should
all match.

I like related functions and GUCs to be similarly named so that they
have the same prefix.

* I think the GUC could be pg_prewarm.autoprewarm rather than
pg_prewarm.enable_autoprewarm. It's shorter and, I think, no less
clear.

I also think pg_prewarm.dump_interval should be renamed to
pg_prewarm.autoprewarm_interval.

* In the documentation, don't say "This is a SQL callable function
to....". This is a list of SQL-callable functions, so each thing in
the list is one. Just delete this from the beginning of each
sentence.

I've made a pass at the documentation and ended up removing those
intros. I haven't made any GUC/function renaming changes, but I have
rewritten some paragraphs for clarity. Updated patch attached.

One thing I couldn't quite make sense of is:

"The autoprewarm process will start loading blocks recorded in
$PGDATA/autoprewarm.blocks until there is a free buffer left in the
buffer pool."

Is this saying "until there is a single free buffer remaining in
shared buffers"? I haven't corrected or clarified this as I don't
understand it.

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running. I'd want this to
return NULL or an error if called for a 2nd time.

* The reason for the AT_PWARM_* naming is not very obvious. Does AT
mean "at" or "auto" or something else? How about
AUTOPREWARM_INTERVAL_DISABLED, AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY,
AUTOPREWARM_INTERVAL_DEFAULT?

* Instead of defining apw_sigusr1_handler, I think you could just use
procsignal_sigusr1_handler. Instead of defining apw_sigterm_handler,
perhaps you could just use die(). got_sigterm would go away, and
you'd just CHECK_FOR_INTERRUPTS().

* The PG_TRY()/PG_CATCH() block in autoprewarm_dump_now() could reuse
reset_apw_state(), which might be better named detach_apw_shmem().
Similarly, init_apw_state() could be init_apw_shmem().

* Instead of load_one_database(), I suggest
autoprewarm_database_main(). That is more parallel to
autoprewarm_main(), which you have elsewhere, and makes it more
obvious that it's the main entrypoint for some background worker.

* Instead of launch_and_wait_for_per_database_worker(), I suggest
autoprewarm_one_database(), and instead of prewarm_buffer_pool(), I
suggest autoprewarm_buffers(). The motivation for changing prewarm
to autoprewarm is that we want the names here to be clearly distinct
from the other parts of pg_prewarm that are not related to
autoprewarm. The motivation for changing buffer_pool to buffers is
just that it's a little shorter. Personally I also like the sound of
it better, but YMMV.

* prewarm_buffer_pool() ends with a useless return statement. I
suggest removing it.

* Instead of creating our own buffering system via buffer_file_write()
and buffer_file_flush(), why not just use the facilities provided by
the operating system? fopen() et al. provide buffering, and we have
AllocateFile() to provide a FILE *; it's just like
OpenTransientFile(), which you are using, but you'll get the buffering
stuff for free. Maybe there's some reason why this won't work out
nicely, but off-hand it seems like it might. It looks like you are
already using AllocateFile() to read the dump, so using it to write
the dump as well seems like it would be logical.

* I think that it would be cool if, when autoprewarm completed, it
printed a message at LOG rather than DEBUG1, and with a few more
details, like "autoprewarm successfully prewarmed %d of %d
previously-loaded blocks". This would require some additional
tracking that you haven't got right now; you'd have to keep track not
only of the number of blocks read from the file but how many of those
some worker actually loaded. You could do that with an extra counter
in the shared memory area that gets incremented by the per-database
workers.

* dump_block_info_periodically() calls ResetLatch() immediately before
WaitLatch; that's backwards. See the commit message for commit
887feefe87b9099eeeec2967ec31ce20df4dfa9b and the comments it added to
the top of latch.h for details on how to do this correctly.

* dump_block_info_periodically()'s main loop is a bit confusing. I
think that after calling dump_now(true) it should just "continue",
which will automatically retest got_sigterm. You could rightly object
to that plan on the grounds that we then wouldn't recheck got_sighup
promptly, but you can fix that by moving the got_sighup test to the
top of the loop, which is a good idea anyway for at least two other
reasons. First, you probably want to check for a pending SIGHUP on
initially entering this function, because something might have changed
during the prewarm phase, and second, see the previous comment about
using the "another valid coding pattern" from latch.h, which puts the
ResetLatch() at the bottom of the loop.

* I think that launch_autoprewarm_dump() should ereport(ERROR, ...)
rather than just return NULL if the feature is disabled. Maybe
something like ... ERROR: pg_prewarm.dump_interval must be
non-negative in order to launch worker

* Not sure about this one, but maybe we should consider just getting
rid of pg_prewarm.dump_interval = -1 altogether and make the minimum
value 0. If pg_prewarm.autoprewarm = on, then we start the worker and
dump according to the dump interval; if pg_prewarm.autoprewarm = off
then we don't start the worker automatically, but we still let you
start it manually. If you do, it respects the configured
dump_interval. With this design, we don't need the error suggested in
the previous item at all, and the code can be simplified in various
places --- all the checks for AT_PWARM_OFF go away. And I don't see
that we're really losing anything. There's not much sense in dumping
but not prewarming or prewarming but not dumping, so having
pg_prewarm.autoprewarm configure whether the worker is started
automatically rather than whether it prewarms (with a separate control
for whether it dumps) seems to make sense. The one time when you want
to do one without the other is when you first install the extension --
during the first server lifetime, you'll want to dump, so that after
the next restart you have something to preload. But this design would
allow that.
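
As an illustration of the latch-ordering point in the list above (not part of the patch), the following standalone C mock walks through one unlucky interleaving. MockLatch, mock_signal() and mock_wait_would_sleep() are hypothetical stand-ins for PostgreSQL's latch API, and work_pending stands in for flags such as got_sigterm; the single-threaded "signal" is simply called at the point where a real signal might arrive.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock latch: a single flag, standing in for MyProc->procLatch. */
typedef struct MockLatch
{
	bool		is_set;
} MockLatch;

static bool work_pending;		/* stand-in for got_sighup / got_sigterm */

/* What a signal handler does: record the event, then set the latch. */
static void
mock_signal(MockLatch *latch)
{
	work_pending = true;
	latch->is_set = true;
}

/* A real WaitLatch() sleeps unless the latch is set; report which. */
static bool
mock_wait_would_sleep(const MockLatch *latch)
{
	return !latch->is_set;
}

/*
 * Ordering used by the patch: check, ResetLatch(), WaitLatch().
 * Returns true if the event is left unhandled while the loop sleeps.
 */
static bool
reset_before_wait_loses_wakeup(void)
{
	MockLatch	latch = {false};
	bool		handled;

	work_pending = false;
	handled = work_pending;		/* top-of-loop check sees nothing */
	mock_signal(&latch);		/* signal lands after the check */
	latch.is_set = false;		/* ResetLatch() wipes the wakeup */
	return !handled && work_pending && mock_wait_would_sleep(&latch);
}

/*
 * The alternative ordering: check, WaitLatch(), ResetLatch(), then loop
 * back to the check.
 */
static bool
reset_after_wait_loses_wakeup(void)
{
	MockLatch	latch = {false};
	bool		handled;

	work_pending = false;
	handled = work_pending;		/* top-of-loop check sees nothing */
	mock_signal(&latch);		/* signal lands after the check */
	if (mock_wait_would_sleep(&latch))
		return true;			/* would sleep with work pending */
	latch.is_set = false;		/* ResetLatch() after the wait */
	handled = work_pending;		/* next iteration's check sees the event */
	return !handled;
}
```

In this mock, the reset-before-wait ordering leaves the event unhandled until the timeout expires, while the reset-after-wait ordering picks it up on the next pass through the loop.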

--
Thom

Attachments:

autoprewarm_15.patch (text/x-patch; charset=US-ASCII)

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..f84fa4a
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1109 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffer pool when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records
+ *		information about the blocks that were present in the buffer pool
+ *		before server shutdown. It then prewarms the buffer pool with
+ *		those blocks on server restart.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the buffer pool. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffer pool. On next server
+ *		restart, the bgworker will prewarm the buffer pool by loading those
+ *		blocks. The GUC pg_prewarm.dump_interval will control the dumping
+ *		activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(launch_autoprewarm_dump);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AT_PWARM_OFF -1
+#define AT_PWARM_DUMP_AT_SHUTDOWN_ONLY 0
+#define AT_PWARM_DEFAULT_DUMP_INTERVAL 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		load_one_database(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+static void apw_sigusr1_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag so that the main loop can notice the shutdown request.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGUSR1.
+ *	The prewarm workers notify with SIGUSR1 on their startup/shutdown.
+ */
+static void
+apw_sigusr1_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers. */
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the buffer pool. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the buffer pool block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile;		/* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;		/* if set true, prewarm task
+												 * will not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/*
+ * This data structure represents a buffered file.
+ */
+typedef struct BufferFile
+{
+	char		transient_dump_file_path[MAXPGPATH];	/* actual file to be
+														 * written */
+	int			fd;				/* file descriptor to above file */
+	char		buf[BLCKSZ];	/* buffer used before writing to file */
+	int			pos;			/* next write position in buffer. */
+}	BufferFile;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	dump_interval = 0;
+
+/*
+ * GUC variable to decide whether autoprewarm worker should be started when
+ * preloaded.
+ */
+static bool enable_autoprewarm = true;
+
+/* Compare member elements to check whether they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =====================	Prewarm part of autoprewarm =======================
+ * ============================================================================
+ */
+
+/*
+ * reset_apw_state
+ *		on_shmem_exit callback: clear this process's PIDs from shared state.
+ */
+
+static void
+reset_apw_state(int code, Datum arg)
+{
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_apw_state
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_apw_state(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+		apw_state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * load_one_database
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [apw_state->prewarm_start_idx, apw_state->prewarm_stop_idx).
+ */
+void
+load_one_database(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_apw_state();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	old_blk = NULL;
+	pos = apw_state->prewarm_start_idx;
+
+	while (!got_sigterm && pos < apw_state->prewarm_stop_idx &&
+		   have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note that rel will be NULL if try_relation_open failed
+		 * previously; in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+			ReleaseBuffer(buf);
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * launch_and_wait_for_per_database_worker
+ *		Register a per-database dynamic worker and wait for it to finish.
+ */
+static void
+launch_and_wait_for_per_database_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "autoprewarm", "load_one_database",
+					  (Datum) NULL, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * prewarm_buffer_pool
+ *		The main routine that prewarms the buffer pool.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+prewarm_buffer_pool(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	apw_state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		apw_state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global objects with the next
+				 * non-global object.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we cannot
+		 * connect with InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		launch_and_wait_for_per_database_worker();
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+
+	apw_state->pid_using_dumpfile = InvalidPid;
+	ereport(DEBUG1,
+			(errmsg("autoprewarm load task ended")));
+	return;
+}
+
+/*
+ * ============================================================================
+ * ===================== Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule is for periodically dumping BlockInfoRecords in buffer pool
+ * into a dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord entry consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in the text form so that the dump
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * buffer_file_flush
+ *		Unload the buffer contents to actual file.
+ *
+ */
+static void
+buffer_file_flush(BufferFile * file)
+{
+	ssize_t		w_size;
+	char	   *buf = file->buf;
+
+	while (file->pos)
+	{
+		/* write to file until an error */
+		w_size = write(file->fd, buf, file->pos);
+		if (w_size > 0)
+		{
+			file->pos -= w_size;
+			buf += w_size;
+		}
+		else
+		{
+			int			save_errno = errno;
+
+			CloseTransientFile(file->fd);
+			unlink(file->transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							file->transient_dump_file_path)));
+		}
+	}
+}
+
+/*
+ * buffer_file_write
+ *		First accumulate the contents in a BLCKSZ buffer then unload it to
+ *		actual file.
+ */
+static void
+buffer_file_write(BufferFile * file, char *block_info, int block_info_len)
+{
+	Assert(block_info_len <= BLCKSZ);
+
+	/* If we exceed the buffer size unload buffer to actual file. */
+	if ((file->pos + block_info_len) > BLCKSZ)
+		buffer_file_flush(file);
+
+	memcpy(file->buf + file->pos, block_info, block_info_len);
+	file->pos += block_info_len;
+}
+
+/*
+ * dump_now
+ *		Dumps the BlockInfoRecords of the blocks in the buffer pool.
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	uint32		i;
+	int			ret,
+				block_info_len;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	BufferFile *file;
+	char		block_info[1024];
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							apw_state->pid_using_dumpfile)));
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			return 0;
+		}
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	file = (BufferFile *) palloc(sizeof(BufferFile));
+	snprintf(file->transient_dump_file_path, MAXPGPATH, "%s.tmp",
+			 AUTOPREWARM_FILE);
+
+	file->fd = OpenTransientFile(file->transient_dump_file_path,
+							 O_CREAT | O_WRONLY | O_TRUNC | PG_BINARY, 0666);
+	if (file->fd < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open \"%s\": %m",
+						file->transient_dump_file_path)));
+	file->pos = 0;
+
+	block_info_len = sprintf(block_info, "<<%u>>\n", num_blocks);
+	buffer_file_write(file, block_info, block_info_len);
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		/* Have we been asked to stop dump? */
+		if (dump_interval == AT_PWARM_OFF)
+		{
+			pfree(block_info_array);
+			CloseTransientFile(file->fd);
+			unlink(file->transient_dump_file_path);
+			pfree(file);
+			return 0;
+		}
+
+		block_info_len = sprintf(block_info, "%u,%u,%u,%u,%u\n",
+								 block_info_array[i].database,
+								 block_info_array[i].tablespace,
+								 block_info_array[i].filenode,
+								 (uint32) block_info_array[i].forknum,
+								 block_info_array[i].blocknum);
+
+		buffer_file_write(file, block_info, block_info_len);
+	}
+
+	pfree(block_info_array);
+
+	/* Write remaining buffer contents to actual file. */
+	buffer_file_flush(file);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = CloseTransientFile(file->fd);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(file->transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						file->transient_dump_file_path)));
+	}
+
+	(void) durable_rename(file->transient_dump_file_path, AUTOPREWARM_FILE,
+						  ERROR);
+	pfree(file);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		 This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by GUC variable dump_interval.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = 0;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AT_PWARM_DEFAULT_DUMP_INTERVAL;
+		nap.tv_usec = 0;
+
+		/* Have we been asked to stop dumping? */
+		if (dump_interval == AT_PWARM_OFF)
+			return;
+
+		if (dump_interval > AT_PWARM_DUMP_AT_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (last_dump_time == 0 ||
+				TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (dump_interval * 1000)))
+			{
+				dump_now(true);
+
+				/*
+				 * It is better to stop when shutdown signal is received
+				 * during or right after a dump.
+				 */
+				if (got_sigterm)
+					return;
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = dump_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = dump_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+		else
+			last_dump_time = 0;
+
+		ResetLatch(&MyProc->procLatch);
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	if (dump_interval != AT_PWARM_OFF)
+		dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, apw_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_apw_state();
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	on_shmem_exit(reset_apw_state, 0);
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker started")));
+
+	/*
+	 * We have finished initializing worker's state, let's start actual work.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!apw_state->skip_prewarm_on_restart)
+		prewarm_buffer_pool();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker stopped")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===================
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+			   const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker prewarm_worker;
+
+	/* Define custom GUC variables. */
+
+	DefineCustomIntVariable("pg_prewarm.dump_interval",
+					   "Sets the maximum time between two buffer pool dumps",
+							"If set to zero, timer based dumping is disabled."
+							" If set to -1, stops autoprewarm.",
+							&dump_interval,
+							AT_PWARM_DEFAULT_DUMP_INTERVAL,
+							AT_PWARM_OFF, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Enable/Disable auto-prewarm feature.",
+								 NULL,
+								 &enable_autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+	else
+	{
+		/* If not run as a preloaded library, nothing more to do. */
+		EmitWarningsOnPlaceholders("pg_prewarm");
+		return;
+	}
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm bgworker is disabled then nothing more to do. */
+	if (!enable_autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&prewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&prewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			   errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+			   errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+			  errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * launch_autoprewarm_dump
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+launch_autoprewarm_dump(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	/* If dump_interval is disabled then nothing more to do. */
+	if (dump_interval == AT_PWARM_OFF)
+		PG_RETURN_NULL();
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_apw_state();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		if (apw_state->pid_using_dumpfile == MyProcPid)
+			apw_state->pid_using_dumpfile = InvalidPid;
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..a2241c6
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION launch_autoprewarm_dump()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'launch_autoprewarm_dump'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..7f1972d 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,103 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+launch_autoprewarm_dump() RETURNS int4
+</synopsis>
+
+  <para>
+   This will launch the <literal>autoprewarm</literal> worker which will dump
+   shared buffers to disk at the interval specified by
+   <varname>pg_prewarm.dump_interval</varname>.  The return value is the
+   process ID of the autoprewarm worker.  As only one
+   <literal>autoprewarm</literal> worker can be run per cluster at a time,
+   additional invocations will return a process ID, but that process will
+   immediately exit.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This will immediately dump shared buffers to disk.  The return value is
+   the number of blocks dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  This is a background worker process which will automatically dump shared
+  buffers to disk before a shutdown and then prewarm shared buffers the
+  next time the server is started by loading blocks from disk back into
+  the buffer pool.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  an <literal>autoprewarm</literal> background worker is launched immediately after the
+  server has reached a consistent state. The autoprewarm process will start loading blocks
+  recorded in <filename>$PGDATA/autoprewarm.blocks</filename> until there is no
+  free buffer left in the buffer pool. This way we do not replace any new
+  blocks which were loaded either by the recovery process or the querying
+  clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> process has finished loading buffers
+  from disk, it will periodically dump shared buffers to disk at the interval
+  specified by <varname>pg_prewarm.dump_interval</varname>.  Upon the next
+  server restart, the autoprewarm process will prewarm shared buffers with the
+  blocks that were last dumped to disk.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.enable_autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.enable_autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      If set to <literal>on</literal>, an autoprewarm worker will be started
+      upon server start.  Setting this to <literal>off</literal> disables it.
+      The default value is <literal>on</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.dump_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.dump_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is the minimum number of seconds after which autoprewarm dumps
+      shared buffers to disk.  The default is 300 seconds.  If set to 0,
+      shared buffers will not be dumped at regular intervals, only when the
+      server is shut down.
+      If set to -1, the running <literal>autoprewarm</literal> process will
+      be stopped.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6a..f033323 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * If the result is true, it may become stale as soon as other operations
+ * take buffers off the free list, so a caller that strictly needs a free
+ * buffer should not rely on this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6f..300adfc 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23a4bbd..8785b3b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -214,10 +216,12 @@ BitmapOr
 BitmapOrPath
 BitmapOrState
 Bitmapset
+BlkType
 BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
@@ -2870,6 +2874,7 @@ pos_trgm
 post_parse_analyze_hook_type
 pqbool
 pqsigfunc
+prewarm_elem
 printQueryOpt
 printTableContent
 printTableFooter

#91

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#89)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

Thanks, Robert. I have tried to address all of your comments and have
merged the fixes suggested by Thom in patch 15.

On Fri, Jun 23, 2017 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

* I suggest renaming launch_autoprewarm_dump() to
autoprewarm_start_worker(). I think that will be clearer. Remember
that user-visible names, internal names, and the documentation should
all match.

-- Fixed as suggested.

* I think the GUC could be pg_prewarm.autoprewarm rather than
pg_prewarm.enable_autoprewarm. It's shorter and, I think, no less
clear.

-- I have changed the GUC name to autoprewarm.

* In the documentation, don't say "This is a SQL callable function
to....". This is a list of SQL-callable functions, so each thing in
the list is one. Just delete this from the beginning of each
sentence.

-- Fixed. Thom provided the fix and I have merged it into my patch.

* The reason for the AT_PWARM_* naming is not very obvious. Does AT
mean "at" or "auto" or something else? How about
AUTOPREWARM_INTERVAL_DISABLED, AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY,
AUTOPREWARM_INTERVAL_DEFAULT?

-- Fixed as suggested. AUTOPREWARM_INTERVAL_DISABLED has now been
removed, as suggested in the comments below.

* Instead of defining apw_sigusr1_handler, I think you could just use
procsignal_sigusr1_handler. Instead of defining apw_sigterm_handler,
perhaps you could just use die(). got_sigterm would go away, and
you'd just CHECK_FOR_INTERRUPTS().

-- I have registered procsignal_sigusr1_handler instead of
apw_sigusr1_handler. But I have some doubts about using die() instead of
apw_sigterm_handler in the main autoprewarm worker. On shutdown (SIGTERM)
we should dump and then exit, so if we rely on CHECK_FOR_INTERRUPTS() we
might miss dumping the buffer contents. I think I would need to modify
some server code in ProcessInterrupts to handle this; please let me know
if I am wrong about this.
For the per-database prewarm worker this seems right, so I am registering
die for SIGTERM and calling CHECK_FOR_INTERRUPTS(). The same applies to
autoprewarm_dump_now().

* The PG_TRY()/PG_CATCH() block in autoprewarm_dump_now() could reuse
reset_apw_state(), which might be better named detach_apw_shmem().
Similarly, init_apw_state() could be init_apw_shmem().

-- Fixed.

* Instead of load_one_database(), I suggest
autoprewarm_database_main(). That is more parallel to
autoprewarm_main(), which you have elsewhere, and makes it more
obvious that it's the main entrypoint for some background worker.

-- Fixed.

* Instead of launch_and_wait_for_per_database_worker(), I suggest
autoprewarm_one_database(), and instead of prewarm_buffer_pool(), I
suggest autoprewarm_buffers(). The motivation for changing prewarm
to autoprewarm is that we want the names here to be clearly distinct
from the other parts of pg_prewarm that are not related to
autoprewarm. The motivation for changing buffer_pool to buffers is
just that it's a little shorter. Personally I also like the sound of
it better, but YMMV.

-- Fixed; renamed as suggested.

* prewarm_buffer_pool() ends with a useless return statement. I
suggest removing it.

-- Sorry, fixed.

* Instead of creating our own buffering system via buffer_file_write()
and buffer_file_flush(), why not just use the facilities provided by
the operating system? fopen() et al. provide buffering, and we have
AllocateFile() to provide a FILE *; it's just like
OpenTransientFile(), which you are using, but you'll get the buffering
stuff for free. Maybe there's some reason why this won't work out
nicely, but off-hand it seems like it might. It looks like you are
already using AllocateFile() to read the dump, so using it to write
the dump as well seems like it would be logical.

-- Now using AllocateFile().
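
A minimal sketch of the idea behind this change: stdio's own buffering can
replace a hand-rolled buffer_file_write()/buffer_file_flush() layer. It uses
plain fprintf()/fscanf() on a FILE * as a stand-in for what AllocateFile()
returns, since AllocateFile() itself needs a running backend. ToyBlockInfo,
toy_dump(), toy_load(), the simplified "<<%u>>" header and the
comma-separated record layout are assumptions made for illustration, not
code taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct ToyBlockInfo
{
	unsigned	database;
	unsigned	tablespace;
	unsigned	filenode;
	unsigned	forknum;
	unsigned	blocknum;
} ToyBlockInfo;

/* Write a count header and one line per record; stdio does the buffering. */
static int
toy_dump(FILE *fp, const ToyBlockInfo *recs, unsigned n)
{
	unsigned	i;

	assert(fp != NULL && recs != NULL);

	if (fprintf(fp, "<<%u>>\n", n) < 0)
		return -1;
	for (i = 0; i < n; i++)
	{
		if (fprintf(fp, "%u,%u,%u,%u,%u\n",
					recs[i].database, recs[i].tablespace, recs[i].filenode,
					recs[i].forknum, recs[i].blocknum) < 0)
			return -1;
	}

	/* A single fflush() takes the place of a custom flush routine. */
	return fflush(fp) == 0 ? 0 : -1;
}

/* Read records back; returns the number read, or -1 on a malformed file. */
static int
toy_load(FILE *fp, ToyBlockInfo *recs, unsigned max)
{
	unsigned	n;
	unsigned	i;

	assert(fp != NULL && recs != NULL);

	rewind(fp);
	if (fscanf(fp, "<<%u>>\n", &n) != 1 || n > max)
		return -1;
	for (i = 0; i < n; i++)
	{
		if (fscanf(fp, "%u,%u,%u,%u,%u\n",
				   &recs[i].database, &recs[i].tablespace, &recs[i].filenode,
				   &recs[i].forknum, &recs[i].blocknum) != 5)
			return -1;
	}
	return (int) n;
}
```

Calling toy_dump() and then toy_load() on the same tmpfile() stream should
round-trip the records. In the backend, the stream would instead come from
AllocateFile() and be closed with FreeFile().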

* I think that it would be cool if, when autoprewarm completed, it
printed a message at LOG rather than DEBUG1, and with a few more
details, like "autoprewarm successfully prewarmed %d of %d
previously-loaded blocks". This would require some additional
tracking that you haven't got right now; you'd have to keep track not
only of the number of blocks read from the file but how many of those
some worker actually loaded. You could do that with an extra counter
in the shared memory area that gets incremented by the per-database
workers.

* dump_block_info_periodically() calls ResetLatch() immediately before
WaitLatch; that's backwards. See the commit message for commit
887feefe87b9099eeeec2967ec31ce20df4dfa9b and the comments it added to
the top of latch.h for details on how to do this correctly.

-- Sorry, fixed.
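
To illustrate why the ordering matters, here is a toy, single-threaded model
of the race; it is not PostgreSQL code. ToyLatch, ToyWorker and the helper
functions are hypothetical names, and the "signal" is delivered by an
ordinary function call placed in the window between the flag test and the
wait. The sketch only shows that resetting the latch immediately before
waiting can erase a wakeup that is still unprocessed, whereas resetting
after the wait (the "another valid coding pattern" from latch.h) does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a latch: just a flag. */
typedef struct ToyLatch
{
	bool		is_set;
} ToyLatch;

typedef struct ToyWorker
{
	ToyLatch	latch;
	bool		got_sigterm;
} ToyWorker;

/* What a signal handler does: record the event, then set the latch. */
static void
toy_sigterm(ToyWorker *w)
{
	w->got_sigterm = true;
	w->latch.is_set = true;
}

/* A real WaitLatch() returns at once if the latch is set, else sleeps. */
static bool
toy_wait_would_sleep(const ToyWorker *w)
{
	return !w->latch.is_set;
}

/*
 * Ordering criticized in the review: test flag, ResetLatch, WaitLatch.
 * Returns true if the worker would go to sleep with SIGTERM pending.
 */
static bool
reset_then_wait(ToyWorker *w)
{
	assert(!w->got_sigterm);	/* loop condition saw no SIGTERM yet */
	toy_sigterm(w);				/* signal arrives right after the test */
	w->latch.is_set = false;	/* "ResetLatch" wipes out the wakeup */
	return toy_wait_would_sleep(w);
}

/*
 * Alternative ordering: test flag, WaitLatch, ResetLatch at the bottom
 * of the loop. Same signal timing as above.
 */
static bool
wait_then_reset(ToyWorker *w)
{
	bool		would_sleep;

	assert(!w->got_sigterm);
	toy_sigterm(w);
	would_sleep = toy_wait_would_sleep(w);	/* latch still set: no sleep */
	w->latch.is_set = false;	/* reset, then loop back and retest flags */
	return would_sleep;
}
```

In this model reset_then_wait() reports that the worker would sleep through
the pending SIGTERM until the timeout expires, while wait_then_reset()
returns immediately and sees got_sigterm on the next pass through the loop.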

* dump_block_info_periodically()'s main loop is a bit confusing. I
think that after calling dump_now(true) it should just "continue",
which will automatically retest got_sigterm. You could rightly object
to that plan on the grounds that we then wouldn't recheck got_sighup
promptly, but you can fix that by moving the got_sighup test to the
top of the loop, which is a good idea anyway for at least two other
reasons. First, you probably want to check for a pending SIGHUP on
initially entering this function, because something might have changed
during the prewarm phase, and second, see the previous comment about
using the "another valid coding pattern" from latch.h, which puts the
ResetLatch() at the bottom of the loop.

-- Agreed. My idea was that if we receive SIGTERM while dumping, or
immediately after dumping, we need not dump again for shutdown. I think
I was wrong, so I have fixed it as you suggested.

* I think that launch_autoprewarm_dump() should ereport(ERROR, ...)
rather than just return NULL if the feature is disabled. Maybe
something like ... ERROR: pg_prewarm.dump_interval must be
non-negative in order to launch worker

-- I have removed the pg_prewarm.dump_interval = -1 case, as you
suggested below, so the error is no longer needed.

* Not sure about this one, but maybe we should consider just getting
rid of pg_prewarm.dump_interval = -1 altogether and make the minimum
value 0. If pg_prewarm.autoprewarm = on, then we start the worker and
dump according to the dump interval; if pg_prewarm.autoprewarm = off
then we don't start the worker automatically, but we still let you
start it manually. If you do, it respects the configured
dump_interval. With this design, we don't need the error suggested in
the previous item at all, and the code can be simplified in various
places --- all the checks for AT_PWARM_OFF go away. And I don't see
that we're really losing anything. There's not much sense in dumping
but not prewarming or prewarming but not dumping, so having
pg_prewarm.autoprewarm configure whether the worker is started
automatically rather than whether it prewarms (with a separate control
for whether it dumps) seems to make sense. The one time when you want
to do one without the other is when you first install the extension --
during the first server lifetime, you'll want to dump, so that after
the next restart you have something to preload. But this design would
allow that.

-- Agreed. Removed the pg_prewarm.dump_interval = -1 case. I had a
similar doubt, which I tried to raise previously:

On Tue, May 30, 2017 at 10:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

There is another GUC setting, pg_prewarm.dump_interval; if it is -1 we
stop the running autoprewarm worker. Should we combine these two
settings into one that controls the state of the autoprewarm worker?

Now I have one question: do we need a mechanism to stop a running
autoprewarm worker while keeping the server alive? Can I use
pg_prewarm.autoprewarm for that purpose?
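
With the -1 case gone, the dump interval has only two regimes: 0 means
"dump only at shutdown" and a positive value means "dump periodically". A
hedged sketch of how the wait time could be derived under that design
follows. toy_next_nap_ms() and TOY_NO_TIMEOUT are hypothetical names, and
using -1 to mean "wait with no timeout" is an assumption of this sketch,
not something taken from the patch.

```c
#include <assert.h>

/* Hypothetical marker: caller should wait without a timeout. */
#define TOY_NO_TIMEOUT (-1L)

/*
 * Milliseconds to sleep before the next periodic dump.
 *
 *   interval_s == 0          dump only at shutdown, so no timeout is needed
 *   elapsed_s >= interval_s  a dump is already due, so do not sleep
 *   otherwise                sleep for the remainder of the interval
 */
static long
toy_next_nap_ms(int interval_s, long elapsed_s)
{
	assert(interval_s >= 0 && elapsed_s >= 0);

	if (interval_s == 0)
		return TOY_NO_TIMEOUT;
	if (elapsed_s >= interval_s)
		return 0;
	return (interval_s - elapsed_s) * 1000L;
}
```

For example, with the default interval of 300 seconds and 100 seconds
elapsed since the last dump, the worker would sleep for the remaining 200
seconds; with an interval of 0 it would simply wait for a signal.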

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_16.patch (application/octet-stream)

commit 7de07cf8e9451a59bad07854aef1f08adb35cc7d
Author: mithun <mithun@localhost.localdomain>
Date:   Tue Jun 27 11:19:29 2017 +0530

    commit 16

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..5b83920
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1002 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffers when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records the
+ *		information about blocks which were present in shared buffers before
+ *		server shutdown. Then prewarms the shared buffers on server restart
+ *		with those blocks.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the shared buffers. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffers. On next server
+ *		restart, the bgworker will prewarm the shared buffers by loading those
+ *		blocks. The GUC pg_prewarm.autoprewarm_interval will control the
+ *		dumping activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(autoprewarm_start_worker);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY 0
+#define AUTOPREWARM_INTERVAL_DEFAULT 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static pid_t autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		autoprewarm_database_main(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to handle.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers.*/
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the shared buffers. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the shared buffer's block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile; /* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;	/* if set true, prewarm task will
+											 * not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+	uint32		prewarmed_blocks;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	autoprewarm_interval = 0;
+
+/*
+ * The GUC variable controls whether the server should run the autoprewarm
+ * worker.
+ */
+static bool autoprewarm = true;
+
+/* Compare member elements to check whether they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0);
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =================	Prewarm part of autoprewarm ========================
+ * ============================================================================
+ */
+
+/*
+ * detach_apw_shmem
+ *		Exit callback that resets the autoprewarm shared state.
+ */
+
+static void
+detach_apw_shmem(int code, Datum arg)
+{
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_apw_shmem
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_apw_shmem(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+		apw_state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * autoprewarm_database_main
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [apw_state->prewarm_start_idx, apw_state->prewarm_stop_idx).
+ */
+void
+autoprewarm_database_main(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, die);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_apw_shmem();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	old_blk = NULL;
+	pos = apw_state->prewarm_start_idx;
+
+	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note, that rel will be NULL if try_relation_open failed
+		 * previously, in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+		{
+			apw_state->prewarmed_blocks++;
+			ReleaseBuffer(buf);
+		}
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * autoprewarm_one_database
+ *		Register a per-database dynamic worker to load.
+ */
+static void
+autoprewarm_one_database(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_database_main",
+					  (Datum) NULL, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * autoprewarm_buffers
+ *		The main routine that prewarms the shared buffers.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+autoprewarm_buffers(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	apw_state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		apw_state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+	apw_state->prewarmed_blocks = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global objects with those of the
+				 * next database.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we cannot
+		 * connect to InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		autoprewarm_one_database();
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+
+	apw_state->pid_using_dumpfile = InvalidPid;
+	ereport(LOG,
+			(errmsg("autoprewarm successfully prewarmed %u of %u previously-loaded blocks",
+					apw_state->prewarmed_blocks, num_elements)));
+}
+
+/*
+ * ============================================================================
+ * ==============	Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule is for periodically dumping BlockInfoRecords in shared
+ * buffers into a dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in text form so that the dump
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * dump_now
+ *		Dumps BlockInfoRecords in shared buffers.
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	uint32		i;
+	int			ret;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file;
+	char		transient_dump_file_path[MAXPGPATH];
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							apw_state->pid_using_dumpfile)));
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		if (!is_bgworker)
+			CHECK_FOR_INTERRUPTS();
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+	file = AllocateFile(transient_dump_file_path, PG_BINARY_W);
+	if (!file)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m",
+						transient_dump_file_path)));
+
+	ret = fprintf(file, "<<%u>>\n", num_blocks);
+	if (ret < 0)
+	{
+		int			save_errno = errno;
+
+		FreeFile(file);
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		if (!is_bgworker)
+			CHECK_FOR_INTERRUPTS();
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].tablespace,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			int			save_errno = errno;
+
+			FreeFile(file);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = FreeFile(file);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by GUC variable
+ * autoprewarm_interval.
+ */
+void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = 0;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AUTOPREWARM_INTERVAL_DEFAULT;
+		nap.tv_usec = 0;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		if (autoprewarm_interval > AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (last_dump_time == 0 ||
+				TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (autoprewarm_interval * 1000)))
+			{
+				dump_now(true);
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = autoprewarm_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = autoprewarm_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+		else
+			last_dump_time = 0;
+
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+		ResetLatch(&MyProc->procLatch);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_apw_shmem();
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	on_shmem_exit(detach_apw_shmem, 0);
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker started")));
+
+	/*
+	 * We have finished initializing worker's state, let's start actual work.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!apw_state->skip_prewarm_on_restart)
+		autoprewarm_buffers();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker stopped")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===============
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+				  const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm_worker;
+
+	/* Define custom GUC variables. */
+
+	DefineCustomIntVariable("pg_prewarm.autoprewarm_interval",
+							"Sets the maximum time between two shared buffers dumps",
+							"If set to zero, timer based dumping is disabled.",
+							&autoprewarm_interval,
+							AUTOPREWARM_INTERVAL_DEFAULT,
+							AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Starts the autoprewarm worker.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+	else
+	{
+		/* If not run as a preloaded library, nothing more to do. */
+		EmitWarningsOnPlaceholders("pg_prewarm");
+		return;
+	}
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm is disabled then nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&autoprewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&autoprewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static pid_t
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+				 errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+	return pid;
+}
+
+/*
+ * autoprewarm_start_worker
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	pid = autoprewarm_dump_launcher();
+	PG_RETURN_INT32(pid);
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_apw_shmem();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		detach_apw_shmem(0, 0);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..c0000bc
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION autoprewarm_start_worker()
+RETURNS pg_catalog.int4 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_start_worker'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..9abf598 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,101 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+autoprewarm_start_worker() RETURNS int4
+</synopsis>
+
+  <para>
+   This will start the <literal>autoprewarm</literal> worker which will dump
+   shared buffers to disk at the interval specified by
+   <varname>pg_prewarm.autoprewarm_interval</varname>.  The return value is the
+   process ID of the autoprewarm worker.  As only one
+   <literal>autoprewarm</literal> worker can be run per cluster at a time,
+   additional invocations will return a process ID, but that process will
+   immediately exit.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This will immediately dump shared buffers to disk.  The return value is
+   the number of blocks dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  This is a background worker process which will automatically dump shared
+  buffers to disk before a shutdown and then prewarm shared buffers the
+  next time the server is started by loading blocks from disk back into
+  the buffer pool.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  an <literal>autoprewarm</literal> background worker is launched immediately
+  after the server has reached a consistent state. The autoprewarm process will
+  start loading blocks recorded in
+  <filename>$PGDATA/autoprewarm.blocks</filename> until there is no free buffer
+  left in the buffer pool. This way we do not replace any new blocks which were
+  loaded either by the recovery process or the querying clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> process has finished loading buffers
+  from disk, it will periodically dump shared buffers to disk at the interval
+  specified by <varname>pg_prewarm.autoprewarm_interval</varname>.  Upon the
+  next server restart, the autoprewarm process will prewarm shared buffers with
+  the blocks that were last dumped to disk.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Controls whether the server should run the autoprewarm worker. This is
+      on by default. This parameter can only be set in the postgresql.conf
+      file or on the server command line.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.autoprewarm_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is the minimum number of seconds after which autoprewarm dumps
+      shared buffers to disk. The default is 300 seconds. If set to 0,
+      shared buffers will not be dumped at regular intervals, but only when the
+      server shuts down.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6a..f033323 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations consume the
+ * free buffers, so a caller that strictly needs a free buffer should not
+ * rely on this check.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6f..300adfc 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23a4bbd..d8948cc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -218,6 +220,7 @@ BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
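
For readers following the dump file handling in the patch above: dump_now() writes a `<<count>>` header line followed by one `database,tablespace,filenode,forknum,blocknum` line per block, and autoprewarm_buffers() reads the same layout back. The following is only a minimal standalone sketch of that round trip. It uses plain stdio instead of AllocateFile()/FreeFile(), and the names `DemoBlockInfo`, `demo_write_dump`, and `demo_read_dump` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct DemoBlockInfo
{
	uint32_t	database;
	uint32_t	tablespace;
	uint32_t	filenode;
	uint32_t	forknum;
	uint32_t	blocknum;
} DemoBlockInfo;

/* Write "<<count>>" followed by one comma-separated line per record. */
static int
demo_write_dump(FILE *fp, const DemoBlockInfo *recs, uint32_t n)
{
	uint32_t	i;

	assert(fp != NULL);
	if (fprintf(fp, "<<%u>>\n", (unsigned) n) < 0)
		return -1;
	for (i = 0; i < n; i++)
	{
		if (fprintf(fp, "%u,%u,%u,%u,%u\n",
					(unsigned) recs[i].database,
					(unsigned) recs[i].tablespace,
					(unsigned) recs[i].filenode,
					(unsigned) recs[i].forknum,
					(unsigned) recs[i].blocknum) < 0)
			return -1;
	}
	return 0;
}

/*
 * Read a dump back.  Returns a malloc'd array (caller frees) and stores the
 * record count in *n, or returns NULL if the header or any record is
 * malformed or missing.
 */
static DemoBlockInfo *
demo_read_dump(FILE *fp, uint32_t *n)
{
	unsigned	count;
	unsigned	i;
	DemoBlockInfo *recs;

	assert(fp != NULL && n != NULL);
	if (fscanf(fp, "<<%u>>\n", &count) != 1)
		return NULL;

	/* Allocate at least one element so an empty dump is not seen as failure. */
	recs = malloc(sizeof(DemoBlockInfo) * (count > 0 ? count : 1));
	if (recs == NULL)
		return NULL;

	for (i = 0; i < count; i++)
	{
		unsigned	db, ts, fn, fk, bn;

		/* Fewer than five fields means a truncated or hand-edited file. */
		if (fscanf(fp, "%u,%u,%u,%u,%u\n", &db, &ts, &fn, &fk, &bn) != 5)
		{
			free(recs);
			return NULL;
		}
		recs[i].database = db;
		recs[i].tablespace = ts;
		recs[i].filenode = fn;
		recs[i].forknum = fk;
		recs[i].blocknum = bn;
	}
	*n = count;
	return recs;
}
```

Rejecting a file whose record count does not match its header corresponds to the "has %u entries but expected %u" error path in the patch; the sketch simply returns NULL in that case.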

#92

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Thom Brown (#90)

Re: Proposal : For Auto-Prewarm.

On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown <thom@linux.com> wrote:

I also think pg_prewarm.dump_interval should be renamed to
pg_prewarm.autoprewarm_interval.

Thanks, I have changed it to pg_prewarm.autoprewarm_interval.

* In the documentation, don't say "This is a SQL callable function
to....". This is a list of SQL-callable functions, so each thing in
the list is one. Just delete this from the beginning of each
sentence.

One thing I couldn't quite make sense of is:

"The autoprewarm process will start loading blocks recorded in
$PGDATA/autoprewarm.blocks until there is a free buffer left in the
buffer pool."

Is this saying "until there is a single free buffer remaining in
shared buffers"? I haven't corrected or clarified this as I don't
understand it.

Sorry, that was a typo; I meant to say "until there is no free buffer
left". Fixed in autoprewarm_16.patch.

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running. I'd want this to
return NULL or an error if called for a 2nd time.

We log instead of raising an error because the check happens only after
the worker has been launched, inside the worker itself. One solution,
similar to autoprewarm_dump_now(), is for autoprewarm_start_worker() to
initialize shared memory and check in the backend itself whether a worker
can be launched. I will try to fix this.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


#93

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Mithun Cy (#92)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Tue, Jun 27, 2017 at 11:41 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown <thom@linux.com> wrote:

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running. I'd want this to
return NULL or an error if called for a 2nd time.

We log instead of raising an error because the check happens only after
the worker has been launched, inside the worker itself. One solution,
similar to autoprewarm_dump_now(), is for autoprewarm_start_worker() to
initialize shared memory and check in the backend itself whether a worker
can be launched. I will try to fix this.

I have now fixed it as follows:

+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+   pid_t       pid;
+
+   init_apw_shmem();
+   pid = apw_state->bgworker_pid;
+   if (pid != InvalidPid)
+       ereport(ERROR,
+               (errmsg("autoprewarm worker is already running under PID %d",
+                       pid)));
+
+   autoprewarm_dump_launcher();
+   PG_RETURN_VOID();
+}

The backend itself now checks whether an autoprewarm worker is already
running, and launches a new worker only if none is. If this function is
executed concurrently while no worker is running (which I do not think
is normal usage), both calls will report that they launched the worker
successfully, even though only one of them can have done so (the other
worker will log a message and silently exit). I think that is okay, as
the objective was to get one worker up and running.

I have changed the return value to void. The worker can be restarted
after an error, in which case the returned PID would no longer match the
worker's PID. Also, I do not see any use for the PID. I have updated the
documentation to reflect these changes.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_17.patchapplication/octet-stream; name=autoprewarm_17.patchDownload

commit 9aa897298eaa781758443f7a0332c5153c12b3e7
Author: mithun <mithun@localhost.localdomain>
Date:   Sun Jul 2 22:14:35 2017 +0530

    commit 17

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..7e9fa94
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1007 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffers when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records the
+ *		information about blocks which were present in shared buffers before
+ *		server shutdown. It then prewarms the shared buffers on server restart
+ *		with those blocks.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the shared buffers. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffers. On next server
+ *		restart, the bgworker will prewarm the shared buffers by loading those
+ *		blocks. The GUC pg_prewarm.autoprewarm_interval will control the
+ *		dumping activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(autoprewarm_start_worker);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY 0
+#define AUTOPREWARM_INTERVAL_DEFAULT 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static void autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		autoprewarm_database_main(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to tell the main loop to exit.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers.*/
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the shared buffers. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the shared buffer's block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile; /* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;	/* if set true, prewarm task will
+											 * not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+	uint32		prewarmed_blocks;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	autoprewarm_interval = 0;
+
+/*
+ * The GUC variable controls whether the server should run the autoprewarm
+ * worker.
+ */
+static bool autoprewarm = true;
+
+/* Compare member elements to check whether they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =================	Prewarm part of autoprewarm ========================
+ * ============================================================================
+ */
+
+/*
+ * detach_apw_shmem
+ *		on_shmem_exit callback that resets the autoprewarm shared state.
+ */
+
+static void
+detach_apw_shmem(int code, Datum arg)
+{
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_apw_shmem
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_apw_shmem(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+		apw_state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * autoprewarm_database_main
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [apw_state->prewarm_start_idx, apw_state->prewarm_stop_idx).
+ */
+void
+autoprewarm_database_main(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, die);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_apw_shmem();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	old_blk = NULL;
+	pos = apw_state->prewarm_start_idx;
+
+	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note, that rel will be NULL if try_relation_open failed
+		 * previously, in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+		{
+			apw_state->prewarmed_blocks++;
+			ReleaseBuffer(buf);
+		}
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * autoprewarm_one_database
+ *		Register a per-database dynamic worker to load.
+ */
+static void
+autoprewarm_one_database(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_database_main",
+					  (Datum) NULL, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * autoprewarm_buffers
+ *		The main routine that prewarms the shared buffers.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+autoprewarm_buffers(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	apw_state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, PG_BINARY_R);
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		apw_state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+	apw_state->prewarmed_blocks = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global objects with those of
+				 * the next database.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we
+		 * cannot connect with InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		autoprewarm_one_database();
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+
+	apw_state->pid_using_dumpfile = InvalidPid;
+	ereport(LOG,
+			(errmsg("autoprewarm successfully prewarmed %u of %u previously-loaded blocks",
+					apw_state->prewarmed_blocks, num_elements)));
+}
+
+/*
+ * ============================================================================
+ * ==============	Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule is for periodically dumping BlockInfoRecords in shared
+ * buffers into a dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord entry consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in text form so that the dump
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * dump_now
+ *		Dumps BlockInfoRecords in shared buffers.
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	uint32		i;
+	int			ret;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file;
+	char		transient_dump_file_path[MAXPGPATH];
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							apw_state->pid_using_dumpfile)));
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		if (!is_bgworker)
+			CHECK_FOR_INTERRUPTS();
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+	file = AllocateFile(transient_dump_file_path, PG_BINARY_W);
+	if (!file)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m",
+						transient_dump_file_path)));
+
+	ret = fprintf(file, "<<%u>>\n", num_blocks);
+	if (ret < 0)
+	{
+		int			save_errno = errno;
+
+		FreeFile(file);
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		if (!is_bgworker)
+			CHECK_FOR_INTERRUPTS();
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].tablespace,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			int			save_errno = errno;
+
+			FreeFile(file);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = FreeFile(file);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by the GUC variable
+ * autoprewarm_interval.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = 0;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AUTOPREWARM_INTERVAL_DEFAULT;
+		nap.tv_usec = 0;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		if (autoprewarm_interval > AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (last_dump_time == 0 ||
+				TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (autoprewarm_interval * 1000)))
+			{
+				dump_now(true);
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = autoprewarm_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = autoprewarm_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+		else
+			last_dump_time = 0;
+
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+		ResetLatch(&MyProc->procLatch);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_apw_shmem();
+	on_shmem_exit(detach_apw_shmem, 0);
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker started")));
+
+	/*
+	 * We have finished initializing worker's state, let's start actual work.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!apw_state->skip_prewarm_on_restart)
+		autoprewarm_buffers();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker stopped")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===============
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+				  const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm_worker;
+
+	/* Define custom GUC variables. */
+
+	DefineCustomIntVariable("pg_prewarm.autoprewarm_interval",
+							"Sets the maximum time between two shared buffers dumps",
+							"If set to zero, timer based dumping is disabled.",
+							&autoprewarm_interval,
+							AUTOPREWARM_INTERVAL_DEFAULT,
+							AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Starts the autoprewarm worker.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+	else
+	{
+		/* If not run as a preloaded library, nothing more to do. */
+		EmitWarningsOnPlaceholders("pg_prewarm");
+		return;
+	}
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm is disabled then nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm load. */
+	setup_autoprewarm(&autoprewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&autoprewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static void
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+				 errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+}
+
+/*
+ * autoprewarm_start_worker
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	init_apw_shmem();
+	pid = apw_state->bgworker_pid;
+	if (pid != InvalidPid)
+		ereport(ERROR,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						pid)));
+
+	autoprewarm_dump_launcher();
+	PG_RETURN_VOID();
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_apw_shmem();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		detach_apw_shmem(0, 0);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..2381c06
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION autoprewarm_start_worker()
+RETURNS VOID STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_start_worker'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..45ed387 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. It can also
+  prewarm the server's shared buffers automatically whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,97 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+autoprewarm_start_worker() RETURNS void
+</synopsis>
+
+  <para>
+   This will start the <literal>autoprewarm</literal> worker which will dump
+   shared buffers to disk at the interval specified by
+   <varname>pg_prewarm.autoprewarm_interval</varname>.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This will immediately dump shared buffers to disk.  The return value is
+   the number of blocks dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  This is a background worker process which will automatically dump shared
+  buffers to disk before a shutdown and then prewarm shared buffers the
+  next time the server is started by loading blocks from disk back into
+  the buffer pool.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  an <literal>autoprewarm</literal> background worker is launched immediately
+  after the server has reached a consistent state. The autoprewarm process will
+  start loading blocks recorded in
+  <filename>$PGDATA/autoprewarm.blocks</filename> until there is no free buffer
+  left in the buffer pool. This way we do not replace any new blocks which were
+  loaded either by the recovery process or the querying clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> process has finished loading buffers
+  from disk, it will periodically dump shared buffers to disk at the interval
+  specified by <varname>pg_prewarm.autoprewarm_interval</varname>.  Upon the
+  next server restart, the autoprewarm process will prewarm shared buffers with
+  the blocks that were last dumped to disk.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Controls whether the server should run the autoprewarm worker. This is
+      on by default. This parameter can only be set in the postgresql.conf file
+      or on the server command line.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.autoprewarm_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is the minimum number of seconds after which autoprewarm dumps
+      shared buffers to disk. The default is 300 seconds. If set to 0,
+      shared buffers will not be dumped at regular intervals, but only when the
+      server is shut down.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6a..f033323 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take buffers off
+ * the free list, so callers that strictly need a free buffer should not rely
+ * on this.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6f..300adfc 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23a4bbd..d8948cc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -218,6 +220,7 @@ BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData

#94

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Mithun Cy (#93)

Re: Proposal : For Auto-Prewarm.

On Sun, Jul 2, 2017 at 10:32 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, Jun 27, 2017 at 11:41 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown <thom@linux.com> wrote:

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running. I'd want this to
return NULL or an error if called for a 2nd time.

We log instead of raising an error because the check happens only after
launching the worker, inside the worker itself. One solution, similar to
autoprewarm_dump_now(), is for autoprewarm_start_worker() to initialize
shared memory and check in the backend itself whether a worker can be
launched. I will try to fix this.

I have fixed it now as follows
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+   pid_t       pid;
+
+   init_apw_shmem();
+   pid = apw_state->bgworker_pid;
+   if (pid != InvalidPid)
+       ereport(ERROR,
+               (errmsg("autoprewarm worker is already running under PID %d",
+                       pid)));
+
+   autoprewarm_dump_launcher();
+   PG_RETURN_VOID();
+}
In the backend itself, we now check whether an autoprewarm worker is
already running and launch a new one only if it is not. If this function
is executed concurrently while no worker is running (which I think is not
normal usage), both calls may report that they successfully launched the
worker even though only one could have done so (the other worker will log
and silently die).

Why can't we close this remaining race condition? Basically, if we
just perform all of the actions in this function under the lock and
autoprewarm_dump_launcher waits till the autoprewarm worker has
initialized the bgworker_pid, then there won't be any remaining race
condition. I think if we don't close this race condition, it will be
unpredictable whether the user will get the error or there will be
only a server log for the same.

I think that
is okay as the objective was to get one worker up and running.

You are right that the objective will be met, but still, I feel the
behavior of this API will be unpredictable which in my opinion should
be fixed. If it is really not possible or extremely difficult to fix
this behavior, then I think we should update the docs.
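
To make the "check and claim under one lock" idea concrete, here is a
minimal standalone sketch. It is not the patch's code: a pthread mutex
stands in for the LWLock, DemoState and claim_worker_slot() are
hypothetical names, and it covers only the atomic check-and-set, not the
step where the launcher waits for the worker to start.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_INVALID_PID (-1)

/* Hypothetical stand-in for the module's shared state. */
typedef struct DemoState
{
    pthread_mutex_t lock;       /* stands in for the LWLock */
    int             worker_pid; /* DEMO_INVALID_PID when no worker runs */
} DemoState;

static DemoState demo_state = {PTHREAD_MUTEX_INITIALIZER, DEMO_INVALID_PID};

/*
 * Claim the worker slot for "pid". The check and the update happen under
 * a single lock acquisition, so two callers cannot both observe an empty
 * slot. Returns true if the slot was claimed, false if it was taken.
 */
static bool
claim_worker_slot(DemoState *state, int pid)
{
    bool claimed = false;

    assert(state != NULL);

    pthread_mutex_lock(&state->lock);
    if (state->worker_pid == DEMO_INVALID_PID)
    {
        state->worker_pid = pid;
        claimed = true;
    }
    pthread_mutex_unlock(&state->lock);

    return claimed;
}
```

With this shape, the second of two concurrent callers always sees a
taken slot and can report an error to the user instead of relying on a
message in the server log.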

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#95

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Mithun Cy (#93)

Re: Proposal : For Auto-Prewarm.

On Sun, Jul 2, 2017 at 10:32 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, Jun 27, 2017 at 11:41 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown <thom@linux.com> wrote:

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running. I'd want this to
return NULL or an error if called for a 2nd time.

We log instead of raising an error because the check happens only
after the worker has been launched, inside the worker itself. One
solution, similar to autoprewarm_dump_now(), is for
autoprewarm_start_worker() to initialize shared memory and check in the
backend itself whether a worker can be launched. I will try to fix this.

I have fixed it now as follows

Few comments on the latest patch:

1.
+ LWLockRelease(&apw_state->lock);
+ if (!is_bgworker)
+ ereport(ERROR,
+ (errmsg("could not perform block dump because dump file is being used by PID %d",
+ apw_state->pid_using_dumpfile)));
+ ereport(LOG,
+ (errmsg("skipping block dump because it is already being performed by PID %d",
+ apw_state->pid_using_dumpfile)));

The above code looks confusing, as both messages say the same thing in
different words. I think you should keep one message (probably the
first one) and decide the error level based on whether this is invoked
for the bgworker. Also, you can move the LWLockRelease after the error
message, because if there is an error, all lwlocks are released
automatically.
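
A rough standalone sketch of that suggestion: one message text, with
only the severity depending on the caller. DemoLevel and both function
names are hypothetical stand-ins, not PostgreSQL's ereport() machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the LOG and ERROR severity levels. */
typedef enum
{
    DEMO_LOG,
    DEMO_ERROR
} DemoLevel;

/* A background worker only logs; a backend caller gets an error. */
static DemoLevel
dump_busy_level(bool is_bgworker)
{
    return is_bgworker ? DEMO_LOG : DEMO_ERROR;
}

/*
 * Format the single shared message into buf and return the severity
 * that the caller should report it with.
 */
static DemoLevel
report_dump_busy(bool is_bgworker, int busy_pid, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    snprintf(buf, buflen,
             "could not perform block dump because dump file is being used by PID %d",
             busy_pid);
    return dump_busy_level(is_bgworker);
}
```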

2.
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+ uint32 num_blocks = 0;
+
..
+ PG_RETURN_INT64(num_blocks);
..
}

Is there any reason for using PG_RETURN_INT64 instead of PG_RETURN_UINT32?

3.
+dump_now(bool is_bgworker)
{
..
+ if (buf_state & BM_TAG_VALID)
+ {
+ block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+ block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+ block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+ block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+ block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+ ++num_blocks;
+ }
..
}

I think there is no use in writing unlogged buffers unless the dump is
for shutdown. You might want to use BufferIsPermanent to detect
this.

4.
+static uint32
+dump_now(bool is_bgworker)
{
..
+ for (num_blocks = 0, i = 0; i < NBuffers; i++)
+ {
+ uint32 buf_state;
+
+ if (!is_bgworker)
+ CHECK_FOR_INTERRUPTS();
..
}

Why is the interrupt check done only for the non-bgworker case?

5.
+ * Each entry of BlockRecordInfo consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in the text form so that the dump
+ * information is readable and can be edited, if required.
+ */

In the above comment, you say that the dump file is in text form,
whereas the code uses binary form. I think the code should match the
comments. Is there a reason for preferring binary over text or vice
versa?

6.
+dump_now(bool is_bgworker)
{
..
+ (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+ apw_state->pid_using_dumpfile = InvalidPid;
..
}

How will pid_using_dumpfile be set to InvalidPid in the case of error
for non-bgworker cases?

7.
+dump_now(bool is_bgworker)
{
..
+ (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);

..
}

How will transient_dump_file_path be unlinked if durable_rename
fails? I think you need to use PG_TRY..PG_CATCH to ensure
that?

8.
+ file = AllocateFile(transient_dump_file_path, PG_BINARY_W);
+ if (!file)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m",
+ transient_dump_file_path)));
+
+ ret = fprintf(file, "<<%u>>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ FreeFile(file);

I think you don't need to close the file in case of error; it will be
taken care of automatically (via the transaction abort
path).

9.
+ /* Register autoprewarm load. */
+ setup_autoprewarm(&autoprewarm_worker, "autoprewarm", "autoprewarm_main",
+  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);

What does "load" signify in the above comment? Did you mean to say "worker" instead?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#96

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Robert Haas (#89)

Re: Proposal : For Auto-Prewarm.

On Fri, Jun 23, 2017 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 15, 2017 at 12:35 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

* Instead of creating our own buffering system via buffer_file_write()
and buffer_file_flush(), why not just use the facilities provided by
the operating system? fopen() et. al. provide buffering, and we have
AllocateFile() to provide a FILE *; it's just like
OpenTransientFile(), which you are using, but you'll get the buffering
stuff for free. Maybe there's some reason why this won't work out
nicely, but off-hand it seems like it might. It looks like you are
already using AllocateFile() to read the dump, so using it to write
the dump as well seems like it would be logical.

One thing worth considering is that AllocateFile is recommended only
for short operations; refer to the comment atop AllocateFile(). If the
number of blocks to be dumped is large, the file can remain open
for a significant period of time.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#97

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#94)

Re: Proposal : For Auto-Prewarm.

On Mon, Jul 3, 2017 at 11:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Jul 2, 2017 at 10:32 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Tue, Jun 27, 2017 at 11:41 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown <thom@linux.com> wrote:

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running. I'd want this to
return NULL or an error if called for a 2nd time.

We log instead of raising an error because the check happens only
after the worker has been launched, inside the worker itself. One
solution, similar to autoprewarm_dump_now(), is for
autoprewarm_start_worker() to initialize shared memory and check in the
backend itself whether a worker can be launched. I will try to fix this.

I have fixed it now as follows
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+   pid_t       pid;
+
+   init_apw_shmem();
+   pid = apw_state->bgworker_pid;
+   if (pid != InvalidPid)
+       ereport(ERROR,
+               (errmsg("autoprewarm worker is already running under PID %d",
+                       pid)));
+
+   autoprewarm_dump_launcher();
+   PG_RETURN_VOID();
+}
In the backend itself, we now check whether an autoprewarm worker is
already running and launch the worker only if none is. If this function
is executed concurrently while no worker is running (which I think is
not normal usage), both calls will report that they launched the worker
successfully even though only one can actually have done so (the other
worker will log a message and silently exit).
Why can't we close this remaining race condition? Basically, if we
just perform all of the actions in this function under the lock and
autoprewarm_dump_launcher waits till the autoprewarm worker has
initialized the bgworker_pid, then there won't be any remaining race
condition. I think if we don't close this race condition, it will be
unpredictable whether the user will get the error or there will be
only a server log for the same.

Yes, I can make autoprewarm_dump_launcher wait until the launched
bgworker sets its pid, but this requires one more synchronization
variable between the launcher and the worker. More importantly,
ShmemInitStruct() and LWLockAcquire(), which need to be called before
setting the pid, can throw an ERROR (which restarts the worker). So I
thought it would not be harmful to let two concurrent calls launch
workers and just log the failure. Please let me know if I need to
rethink this.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


#98

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#96)

Re: Proposal : For Auto-Prewarm.

On Mon, Jul 3, 2017 at 3:55 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 23, 2017 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 15, 2017 at 12:35 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

* Instead of creating our own buffering system via buffer_file_write()
and buffer_file_flush(), why not just use the facilities provided by
the operating system? fopen() et. al. provide buffering, and we have
AllocateFile() to provide a FILE *; it's just like
OpenTransientFile(), which you are using, but you'll get the buffering
stuff for free. Maybe there's some reason why this won't work out
nicely, but off-hand it seems like it might. It looks like you are
already using AllocateFile() to read the dump, so using it to write
the dump as well seems like it would be logical.

One thing worth considering is that AllocateFile is recommended only
for short operations; refer to the comment atop AllocateFile(). If the
number of blocks to be dumped is large, the file can remain open
for a significant period of time.

-- Agreed. I think I need a suggestion on this: we will hold on to one
fd, but I am not sure how long a file must stay open to qualify as a
case against using AllocateFile().

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


#99

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#95)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Mon, Jul 3, 2017 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Few comments on the latest patch:
1.
+ LWLockRelease(&apw_state->lock);
+ if (!is_bgworker)
+ ereport(ERROR,
+ (errmsg("could not perform block dump because dump file is being used by PID %d",
+ apw_state->pid_using_dumpfile)));
+ ereport(LOG,
+ (errmsg("skipping block dump because it is already being performed by PID %d",
+ apw_state->pid_using_dumpfile)));
The above code looks confusing, as both messages say the same thing in
different words. I think you should keep one message (probably the
first one) and decide the error level based on whether this is invoked
for the bgworker. Also, you can move the LWLockRelease after the error
message, because if there is an error, all lwlocks are released
automatically.

ERROR is used for autoprewarm_dump_now, which is called from the
backend; LOG is used for the bgworker. The wording matches the context,
that is, whether failing to dump is acceptable or not. For the bgworker
it is acceptable: we are not particular about the start time of a dump,
only about the interval between dumps, so it is fine if somebody else is
already dumping. But whoever calls autoprewarm_dump_now might be
particular about the start time of the dump, so we throw an error and
make them retry.

The wording was suggested by Robert (quoted below) in one of his
previous comments, and I agree with it. If you think I should
reconsider this and I am missing something, I am open to suggestions.

On Wed, May 31, 2017 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
+If we go to perform
+an immediate dump process and finds a non-zero value already just does
+ereport(ERROR, ...), including the PID of the other process in the
+message (e.g. "unable to perform block dump because dump file is being
+used by PID %d").  In a background worker, if we go to dump and find
+the file in use, log a message (e.g. "skipping block dump because it
+is already being performed by PID %d", "skipping prewarm because block
+dump file is being rewritten by PID %d").

Thanks, I have moved the LWLockRelease after the ERROR call.

2.
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+ uint32 num_blocks = 0;
+
..
+ PG_RETURN_INT64(num_blocks);
..
}
Is there any reason for using PG_RETURN_INT64 instead of PG_RETURN_UINT32?

The return type of autoprewarm_dump_now() is pg_catalog.int8, to
accommodate uint32, so I used PG_RETURN_INT64. I think PG_RETURN_UINT32
can be used as well; I have replaced it now.
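
As a side illustration of why an SQL int8 (signed 64-bit) result can
accommodate every uint32 block count while a signed 32-bit one cannot,
here is a small standalone sketch. The function names are hypothetical
and nothing here uses the PostgreSQL fmgr macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Widen a 32-bit unsigned block count to the signed 64-bit range.
 * Every uint32 value is representable, so the result is never negative.
 */
static int64_t
block_count_as_int8(uint32_t num_blocks)
{
    int64_t widened = (int64_t) num_blocks;

    assert(widened >= 0);
    return widened;
}

/* Would the same count also fit in a signed 32-bit integer? */
static bool
fits_in_int4(uint32_t num_blocks)
{
    return num_blocks <= (uint32_t) INT32_MAX;
}
```

Counts above INT32_MAX fail fits_in_int4() but survive the widening
unchanged, which is the reason for declaring the function's result as
int8.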

3.
+dump_now(bool is_bgworker)
{
..
+ if (buf_state & BM_TAG_VALID)
+ {
+ block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+ block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+ block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+ block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+ block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+ ++num_blocks;
+ }
..
}
I think there is no use in writing unlogged buffers unless the dump is
for shutdown.  You might want to use BufferIsPermanent to detect this.

-- I do not think that is true: pages of unlogged tables are also
read into buffers for read-only purposes. So if we fail to dump them
at shutdown, the previous dump should be used.

4.
+static uint32
+dump_now(bool is_bgworker)
{
..
+ for (num_blocks = 0, i = 0; i < NBuffers; i++)
+ {
+ uint32 buf_state;
+
+ if (!is_bgworker)
+ CHECK_FOR_INTERRUPTS();
..
}

Why is the interrupt check done only for the non-bgworker case?

-- autoprewarm_dump_now is called directly from the backend. In that
case, dump_now() has to handle the signals registered for the backend.
For the bgworker, dump_block_info_periodically, the caller of
dump_now(), handles SIGTERM and SIGUSR1, which are the signals we are
interested in.
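
The flag-based style of signal handling referred to here can be sketched
in plain C as follows: the handler only records that the signal arrived,
and the loop polls the flag at a safe point. This is a standalone
illustration with hypothetical names (demo_got_sigterm,
install_sigterm_flag, scan_until_signalled), not the worker's actual
loop.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Set by the handler, polled by the loop. */
static volatile sig_atomic_t demo_got_sigterm = 0;

/* The handler does nothing except record that the signal arrived. */
static void
demo_sigterm_handler(int signo)
{
    (void) signo;
    demo_got_sigterm = 1;
}

/* Install the handler; returns false if signal() fails. */
static bool
install_sigterm_flag(void)
{
    demo_got_sigterm = 0;
    return signal(SIGTERM, demo_sigterm_handler) != SIG_ERR;
}

/*
 * Walk over nitems entries, checking the flag before each one.
 * For demonstration, SIGTERM is raised while processing entry raise_at.
 * Returns the number of entries processed before stopping.
 */
static int
scan_until_signalled(int nitems, int raise_at)
{
    int i;

    assert(nitems >= 0);

    for (i = 0; i < nitems; i++)
    {
        if (demo_got_sigterm)
            break;              /* stop at a safe point */
        if (i == raise_at)
            raise(SIGTERM);     /* simulate an incoming signal */
    }
    return i;
}
```

The entry being processed when the signal arrives is still completed;
the loop stops before starting the next one.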

5.
+ * Each entry of BlockRecordInfo consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in the text form so that the dump
+ * information is readable and can be edited, if required.
+ */
In the above comment, you say that the dump file is in text form,
whereas the code uses binary form. I think the code should match the
comments. Is there a reason for preferring binary over text or vice
versa?

-- Previously I used write() on a file descriptor. Sorry, I should
have changed the open mode to text mode when I moved the code to
use AllocateFile. Fixed now.
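
For illustration, one text-mode record in a comma-separated layout could
be written and parsed roughly like this. DemoBlockInfo, format_block()
and parse_block() are hypothetical names, and the five-unsigned-values
layout is assumed from the fscanf format in the attached patch rather
than copied from its dump code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for one dumped block entry. */
typedef struct DemoBlockInfo
{
    unsigned int database;
    unsigned int tablespace;
    unsigned int filenode;
    unsigned int forknum;
    unsigned int blocknum;
} DemoBlockInfo;

/* Write one record as a human-readable line; returns snprintf's result. */
static int
format_block(const DemoBlockInfo *blk, char *buf, size_t buflen)
{
    assert(blk != NULL && buf != NULL);

    return snprintf(buf, buflen, "%u,%u,%u,%u,%u\n",
                    blk->database, blk->tablespace, blk->filenode,
                    blk->forknum, blk->blocknum);
}

/* Parse one line back into a record; true only if all five fields match. */
static bool
parse_block(const char *line, DemoBlockInfo *blk)
{
    assert(line != NULL && blk != NULL);

    return sscanf(line, "%u,%u,%u,%u,%u",
                  &blk->database, &blk->tablespace, &blk->filenode,
                  &blk->forknum, &blk->blocknum) == 5;
}
```

A text layout like this keeps the dump readable and editable by hand, at
the cost of a larger file than a binary dump of the same records.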

6.
+dump_now(bool is_bgworker)
{
..
+ (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+ apw_state->pid_using_dumpfile = InvalidPid;
..
}
How will pid_using_dumpfile be set to InvalidPid in the case of error
for non-bgworker cases?

-- I have a PG_TRY..PG_CATCH block in autoprewarm_dump_now; I think that is okay.

7.
+dump_now(bool is_bgworker)
{
..
+ (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
..
}

How will transient_dump_file_path be unlinked if durable_rename
fails? I think you need to use PG_TRY..PG_CATCH to ensure
that?

-- If durable_rename fails, the basic functionality of autoprewarm is
failing, so I want it to be an ERROR. I do not want to remove the temp
file, as we always truncate it before reusing it. So I think there is
no need to catch every ERROR in dump_now() just to remove the temp
file.

8.
+ file = AllocateFile(transient_dump_file_path, PG_BINARY_W);
+ if (!file)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m",
+ transient_dump_file_path)));
+
+ ret = fprintf(file, "<<%u>>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ FreeFile(file);
I think you don't need to close the file in case of error; it will be
taken care of automatically (via the transaction abort
path).

-- I was trying to close the file before unlinking it. I agree it is
not necessary; I did it only as good practice. I have removed it
now.

9.
+ /* Register autoprewarm load. */
+ setup_autoprewarm(&autoprewarm_worker, "autoprewarm", "autoprewarm_main",
+  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
What does "load" signify in the above comment? Did you mean to say "worker" instead?

-- Sorry, fixed now.

In addition to the above, I have made one more change: the per-database
autoprewarm bgworker has been renamed from "autoprewarm" to
"per-database autoprewarm".

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_18.patch (application/octet-stream)

commit 9494d9978b1a1e22c68b9546607bc1b110a7a133
Author: mithun <mithun@localhost.localdomain>
Date:   Wed Jul 5 17:59:09 2017 +0530

    commit 18

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..7b3ce95
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1007 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffers when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records
+ *		information about the blocks that were present in shared buffers
+ *		before server shutdown, and then prewarms the shared buffers with
+ *		those blocks on server restart.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the shared buffers. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffers. On next server
+ *		restart, the bgworker will prewarm the shared buffers by loading those
+ *		blocks. The GUC pg_prewarm.autoprewarm_interval will control the
+ *		dumping activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(autoprewarm_start_worker);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY 0
+#define AUTOPREWARM_INTERVAL_DEFAULT 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static void autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		autoprewarm_database_main(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to handle.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers.*/
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the shared buffers. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the shared buffer's block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile; /* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;	/* if set true, prewarm task will
+											 * not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+	uint32		prewarmed_blocks;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	autoprewarm_interval = 0;
+
+/*
+ * The GUC variable controls whether the server should run the autoprewarm
+ * worker.
+ */
+static bool autoprewarm = true;
+
+/* Compare one member of a and b; return from the caller if they differ. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =================	Prewarm part of autoprewarm ========================
+ * ============================================================================
+ */
+
+/*
+ * detach_apw_shmem
+ *		on_apw_exit reset the prewarm shared state
+ */
+
+static void
+detach_apw_shmem(int code, Datum arg)
+{
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_apw_shmem
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_apw_shmem(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+		apw_state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * autoprewarm_database_main
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [apw_state->prewarm_start_idx, apw_state->prewarm_stop_idx).
+ */
+void
+autoprewarm_database_main(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, die);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_apw_shmem();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	old_blk = NULL;
+	pos = apw_state->prewarm_start_idx;
+
+	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note, that rel will be NULL if try_relation_open failed
+		 * previously, in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+		{
+			apw_state->prewarmed_blocks++;
+			ReleaseBuffer(buf);
+		}
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * autoprewarm_one_database
+ *		Register a per-database dynamic worker to load.
+ */
+static void
+autoprewarm_one_database(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "per-database autoprewarm",
+					  "autoprewarm_database_main",
+					  (Datum) NULL, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * autoprewarm_buffers
+ *		The main routine that prewarms the shared buffers.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+autoprewarm_buffers(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	apw_state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, "r");
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		apw_state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+	apw_state->prewarmed_blocks = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global object with the next
+				 * non-global object.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we cannot
+		 * connect with InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		autoprewarm_one_database();
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+
+	apw_state->pid_using_dumpfile = InvalidPid;
+	ereport(LOG,
+			(errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
+					apw_state->prewarmed_blocks, num_elements)));
+}
+
+/*
+ * ============================================================================
+ * ==============	Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule periodically dumps the BlockInfoRecords of shared buffers
+ * into the dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that the dump is in text form so that the
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * dump_now
+ *		Dumps the BlockInfoRecords of shared buffers.
+ */
+static uint32
+dump_now(bool is_bgworker)
+{
+	uint32		i;
+	int			ret;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file;
+	char		transient_dump_file_path[MAXPGPATH];
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							apw_state->pid_using_dumpfile)));
+
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		if (!is_bgworker)
+			CHECK_FOR_INTERRUPTS();
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		if (buf_state & BM_TAG_VALID)
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+	file = AllocateFile(transient_dump_file_path, "w");
+	if (!file)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m",
+						transient_dump_file_path)));
+
+	ret = fprintf(file, "<<%u>>\n", num_blocks);
+	if (ret < 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\" : %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		if (!is_bgworker)
+			CHECK_FOR_INTERRUPTS();
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].tablespace,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			int			save_errno = errno;
+
+			FreeFile(file);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\" : %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = FreeFile(file);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\" : %m",
+						transient_dump_file_path)));
+	}
+
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("saved metadata info of %d blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		 This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by the GUC variable
+ * autoprewarm_interval.
+ */
+void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = 0;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AUTOPREWARM_INTERVAL_DEFAULT;
+		nap.tv_usec = 0;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		if (autoprewarm_interval > AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (last_dump_time == 0 ||
+				TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (autoprewarm_interval * 1000)))
+			{
+				dump_now(true);
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = autoprewarm_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = autoprewarm_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+		else
+			last_dump_time = 0;
+
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+		ResetLatch(&MyProc->procLatch);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	dump_now(true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_apw_shmem();
+	on_shmem_exit(detach_apw_shmem, 0);
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker started")));
+
+	/*
+	 * We have finished initializing worker's state, let's start actual work.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!apw_state->skip_prewarm_on_restart)
+		autoprewarm_buffers();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker stopped")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===============
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+				  const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm_worker;
+
+	/* Define custom GUC variables. */
+
+	DefineCustomIntVariable("pg_prewarm.autoprewarm_interval",
+							"Sets the maximum time between two shared buffers dumps",
+							"If set to zero, timer based dumping is disabled.",
+							&autoprewarm_interval,
+							AUTOPREWARM_INTERVAL_DEFAULT,
+							AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Starts the autoprewarm worker.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+	else
+	{
+		/* If not run as a preloaded library, nothing more to do. */
+		EmitWarningsOnPlaceholders("pg_prewarm");
+		return;
+	}
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm is disabled then nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm worker. */
+	setup_autoprewarm(&autoprewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&autoprewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static void
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+				 errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+}
+
+/*
+ * autoprewarm_start_worker
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	init_apw_shmem();
+	pid = apw_state->bgworker_pid;
+	if (pid != InvalidPid)
+		ereport(ERROR,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						pid)));
+
+	autoprewarm_dump_launcher();
+	PG_RETURN_VOID();
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_apw_shmem();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false);
+	}
+	PG_CATCH();
+	{
+		detach_apw_shmem(0, 0);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_INT64((int64) num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..2381c06
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION autoprewarm_start_worker()
+RETURNS VOID STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_start_worker'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..45ed387 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,97 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+autoprewarm_start_worker() RETURNS void
+</synopsis>
+
+  <para>
+   This will start the <literal>autoprewarm</literal> worker which will dump
+   shared buffers to disk at the interval specified by
+   <varname>pg_prewarm.autoprewarm_interval</varname>.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This will immediately dump shared buffers to disk.  The return value is
+   the number of blocks dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  This is a background worker process which will automatically dump shared
+  buffers to disk before a shutdown and then prewarm shared buffers the
+  next time the server is started by loading blocks from disk back into
+  the buffer pool.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  an <literal>autoprewarm</literal> background worker is launched immediately
+  after the server has reached a consistent state. The autoprewarm process will
+  start loading blocks recorded in
+  <filename>$PGDATA/autoprewarm.blocks</filename> until there is no free buffer
+  left in the buffer pool. This way we do not replace any new blocks which were
+  loaded either by the recovery process or the querying clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> process has finished loading buffers
+  from disk, it will periodically dump shared buffers to disk at the interval
+  specified by <varname>pg_prewarm.autoprewarm_interval</varname>.  Upon the
+  next server restart, the autoprewarm process will prewarm shared buffers with
+  the blocks that were last dumped to disk.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Controls whether the server should run the autoprewarm worker. This is on
+      by default. This parameter can only be set in the postgresql.conf file or
+      on the server command line.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.autoprewarm_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is the minimum number of seconds after which autoprewarm dumps
+      shared buffers to disk. The default is 300 seconds. If set to 0,
+      shared buffers will not be dumped at regular intervals, but only when the
+      server shuts down.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6a..f033323 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations take the free
+ * buffers, so a caller that strictly needs a free buffer should not rely on
+ * this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6f..300adfc 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23a4bbd..d8948cc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -218,6 +220,7 @@ BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData

#100

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Mithun Cy (#99)

Re: Proposal : For Auto-Prewarm.

On Wed, Jul 5, 2017 at 6:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Mon, Jul 3, 2017 at 3:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments on the latest patch:
1.
+ LWLockRelease(&apw_state->lock);
+ if (!is_bgworker)
+ ereport(ERROR,
+ (errmsg("could not perform block dump because dump file is being
used by PID %d",
+ apw_state->pid_using_dumpfile)));
+ ereport(LOG,
+ (errmsg("skipping block dump because it is already being performed by PID %d",
+ apw_state->pid_using_dumpfile)));
The above code looks confusing, as both messages say the same thing in
different words. I think you could keep one message (probably the first
one) and decide the error level based on whether this is invoked for the
bgworker. Also, you can move LWLockRelease after the error message,
because if there is any error, it will automatically release all
lwlocks.
ERROR is used for autoprewarm_dump_now, which is called from the backend.
LOG is used for the bgworker.
The wordings are chosen to match the context: whether failing to dump is
acceptable or not. In the case of the bgworker it is acceptable, since we
are not particular about the start time of a dump, only about the interval
between dumps. So if somebody else is already doing it, that is fine. But
one who calls autoprewarm_dump_now might be particular about the start
time of the dump, so we throw an error, making them retry.

The wordings were suggested by Robert (snapshot below) in one of his
previous comments, and I agree with them. If you think I should
reconsider this and I am missing something, I am open to suggestions.

Not an issue, if you and Robert think having two different messages is
better, then let's leave it. One improvement we could do here is to
initialize a boolean global variable for AutoPrewarmWorker, then use
it wherever required.

3.
+dump_now(bool is_bgworker)
{
..
+ if (buf_state & BM_TAG_VALID)
+ {
+ block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+ block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+ block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+ block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+ block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+ ++num_blocks;
+ }
..
}
I think there is no use of writing Unlogged buffers unless the dump is
for the shutdown.  You might want to use BufferIsPermanent to detect the same.

-- I do not think that is true; pages of unlogged tables are also
read into buffers for read-only purposes. So if we fail to dump them
at shutdown, the previous dump should be used.

I am not able to understand what you want to say. Unlogged tables
should be empty in case of crash recovery. Also, we never flush
unlogged buffers except for shutdown checkpoint, refer BufferAlloc and
in particular below comment:

* Make sure BM_PERMANENT is set for buffers that must be written at every
* checkpoint. Unlogged buffers only need to be written at shutdown
* checkpoints, except for their "init" forks, which need to be treated
* just like permanent relations.

4.
+static uint32
+dump_now(bool is_bgworker)
{
..
+ for (num_blocks = 0, i = 0; i < NBuffers; i++)
+ {
+ uint32 buf_state;
+
+ if (!is_bgworker)
+ CHECK_FOR_INTERRUPTS();
..
}
Why is checking for interrupts done only for non-bgworker cases?
-- autoprewarm_dump_now is called directly from the backend. In that
case, we have to handle the signals registered for the backend in dump_now().
For the bgworker, dump_block_info_periodically, the caller of dump_now(),
handles SIGTERM and SIGUSR1, which are the ones we are interested in.

Okay, but what about signal handler for SIGUSR1
(procsignal_sigusr1_handler). Have you verified that it will never
set the InterruptPending flag?

6.
+dump_now(bool is_bgworker)
{
..
+ (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+ apw_state->pid_using_dumpfile = InvalidPid;
..
}
How will pid_using_dumpfile be set to InvalidPid in the case of error
for non-bgworker cases?
-- I have a try() - catch() in autoprewarm_dump_now I think that is okay.

Okay, then that will work.

7.
+dump_now(bool is_bgworker)
{
..
+ (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
..
}

How will transient_dump_file_path be unlinked in the case of error in
durable_rename? I think you need to use PG_TRY..PG_CATCH to ensure
same?
-- If durable_rename fails, the basic functionality of autoprewarm is
failing, so I want it to be an ERROR. I do not want to remove the temp
file, as we always truncate it before reusing it. So I think there is no
need to catch every ERROR in dump_now() just to remove the temp file.

I am not getting your argument here. Do you mean to say that if
writing to the transient file fails then we should remove the
transient file, but if the rename fails then there is no need to
remove it? It sounds strange to me, but maybe you have a reason to do
it like that.
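
For illustration only, here is a minimal standalone sketch of the uniform cleanup being asked about: every failure path, including a failed rename, unlinks the transient file and preserves errno. It uses plain POSIX calls rather than the patch's AllocateFile/durable_rename/ereport, the helper name write_file_atomically is hypothetical, and unlike durable_rename it does not fsync anything.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: write "text" to "<path>.tmp"
 * and rename it over "path". On any failure after the transient file has
 * been created, including a failed rename, the transient file is unlinked
 * and errno is preserved. Returns 0 on success, -1 on failure.
 */
static int
write_file_atomically(const char *path, const char *text)
{
	char		tmp_path[1024];
	FILE	   *fp;
	int			n;
	int			save_errno = 0;

	n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
	if (n < 0 || (size_t) n >= sizeof(tmp_path))
	{
		errno = ENAMETOOLONG;
		return -1;
	}

	fp = fopen(tmp_path, "w");
	if (fp == NULL)
		return -1;				/* nothing was created, nothing to clean up */

	if (fputs(text, fp) == EOF)
	{
		save_errno = errno;
		fclose(fp);
		goto fail;
	}

	if (fclose(fp) != 0)
	{
		save_errno = errno;
		goto fail;
	}

	if (rename(tmp_path, path) != 0)
	{
		save_errno = errno;
		goto fail;
	}

	return 0;

fail:
	/* Same cleanup for write, close and rename failures. */
	unlink(tmp_path);
	errno = save_errno;
	return -1;
}
```

Whether the extra cleanup is worth it in dump_now() is the open question above; the sketch only shows that treating the rename failure like the write failures costs one more branch.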

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#101

Amit Kapila

amit.kapila16@gmail.com

over 8 years ago

In reply to: Mithun Cy (#97)

Re: Proposal : For Auto-Prewarm.

On Wed, Jul 5, 2017 at 6:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Mon, Jul 3, 2017 at 11:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Jul 2, 2017 at 10:32 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Tue, Jun 27, 2017 at 11:41 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown <thom@linux.com> wrote:

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running. I'd want this to
return NULL or an error if called for a 2nd time.

We log instead of raising an error because we check only after launching
the worker, inside the worker. One solution, similar to
autoprewarm_dump_now(), is for autoprewarm_start_worker() to initialize
shared memory and check in the backend itself whether we can launch the
worker. I will try to fix this.

I have fixed it now as follows
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+   pid_t       pid;
+
+   init_apw_shmem();
+   pid = apw_state->bgworker_pid;
+   if (pid != InvalidPid)
+       ereport(ERROR,
+               (errmsg("autoprewarm worker is already running under PID %d",
+                       pid)));
+
+   autoprewarm_dump_launcher();
+   PG_RETURN_VOID();
+}
In the backend itself, we now check whether an autoprewarm worker is
already running before starting one. If this function is executed
concurrently when no worker is running (which I think is not normal
usage), both calls may report that they successfully launched the
worker even though only one could have done so (the other worker will
log and silently die).
Why can't we close this remaining race condition? Basically, if we
just perform all of the actions in this function under the lock and
autoprewarm_dump_launcher waits till the autoprewarm worker has
initialized the bgworker_pid, then there won't be any remaining race
condition. I think if we don't close this race condition, it will be
unpredictable whether the user will get the error or there will be
only a server log for the same.
Yes, I can make autoprewarm_dump_launcher wait until the launched
bgworker sets its pid, but this requires one more synchronization
variable between launcher and worker. More than that, I see that
ShmemInitStruct() and LWLockAcquire, which need to be called before
setting the pid, can throw ERROR (which restarts the worker). So I thought
it would not be harmful to let two concurrent calls launch workers and
just log the failures. Please let me know if I need to rethink this.

I don't know whether you need to rethink it, but as presented in the
patch, the specification of the API is unclear to me. As this function
is exposed to the user, I think its behavior should be well
defined. If you don't think changing the code has many advantages, then
at the very least update the docs to indicate the expectation and
behavior of the API. Also, I think it is better to add a few comments
in the code about the unpredictable behavior in case of a race
condition and the reason for it.
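
As a rough illustration of why the check and the claim have to be one indivisible step, here is a standalone sketch. It is not the patch's mechanism: the patch guards apw_state->bgworker_pid with an LWLock, whereas this sketch uses a C11 compare-and-swap as a stand-in, and the names worker_pid, claim_worker_slot and release_worker_slot are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_INVALID_PID (-1)	/* stand-in for the patch's InvalidPid */

/* Hypothetical stand-in for apw_state->bgworker_pid. */
static atomic_int worker_pid = SKETCH_INVALID_PID;

/*
 * Try to register "pid" as the single worker. The test for "no worker yet"
 * and the store of the new pid happen in one atomic step, so two concurrent
 * callers can never both succeed.
 */
static bool
claim_worker_slot(int pid)
{
	int			expected = SKETCH_INVALID_PID;

	return atomic_compare_exchange_strong(&worker_pid, &expected, pid);
}

/* Give the slot back, but only if "pid" is the current owner. */
static void
release_worker_slot(int pid)
{
	int			expected = pid;

	atomic_compare_exchange_strong(&worker_pid, &expected,
								   SKETCH_INVALID_PID);
}
```

With a separate read followed by a later write, as in the quoted autoprewarm_start_worker(), two callers can both observe an invalid pid before either worker registers itself, which is the unpredictable outcome described above.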

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#102

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#100)

Re: Proposal : For Auto-Prewarm.

On Thu, Jul 6, 2017 at 10:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not able to understand what you want to say. Unlogged tables
should be empty in case of crash recovery. Also, we never flush
unlogged buffers except for shutdown checkpoint, refer BufferAlloc and
in particular below comment:

-- Sorry I said that because of my lack of knowledge about unlogged
tables. Yes, what you say is right "an unlogged table is automatically
truncated after a crash or unclean shutdown". So it will be enough if
we just dump them during shutdown time.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#103

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#100)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Thu, Jul 6, 2017 at 10:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

3.

-- I do not think that is true; pages of unlogged tables are also
read into buffers for read-only purposes. So if we fail to dump them
at shutdown, the previous dump should be used.

I am not able to understand what you want to say. Unlogged tables
should be empty in case of crash recovery. Also, we never flush
unlogged buffers except for shutdown checkpoint, refer BufferAlloc and
in particular below comment:

* Make sure BM_PERMANENT is set for buffers that must be written at every
* checkpoint. Unlogged buffers only need to be written at shutdown
* checkpoints, except for their "init" forks, which need to be treated
* just like permanent relations.

+ if (buf_state & BM_TAG_VALID &&
+ ((buf_state & BM_PERMANENT) || dump_unlogged))
I have changed it now: we dump the block info of unlogged tables only
in the final call to dump_now during shutdown, or when it is called
through autoprewarm_dump_now().
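
The condition quoted above can be read as a small predicate. The sketch below is only an illustration: the flag values are made up (the real BM_TAG_VALID and BM_PERMANENT are defined in buf_internals.h), and should_dump_buffer is a hypothetical name that does not appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits only; the real values live in buf_internals.h. */
#define SKETCH_BM_TAG_VALID (1U << 0)
#define SKETCH_BM_PERMANENT (1U << 1)

/*
 * Decide whether a buffer's block info goes into the dump. Buffers without
 * a valid tag are always skipped. Buffers of unlogged relations (no
 * PERMANENT bit) are included only when dump_unlogged is set, i.e. for the
 * shutdown dump or an explicit autoprewarm_dump_now() call.
 */
static bool
should_dump_buffer(uint32_t buf_state, bool dump_unlogged)
{
	if ((buf_state & SKETCH_BM_TAG_VALID) == 0)
		return false;

	return (buf_state & SKETCH_BM_PERMANENT) != 0 || dump_unlogged;
}
```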

-- autoprewarm_dump_now is called directly from the backend. In that
case, we have to handle the signals registered for the backend in dump_now().
For the bgworker, dump_block_info_periodically, the caller of dump_now(),
handles SIGTERM and SIGUSR1, which are the ones we are interested in.

Okay, but what about signal handler for SIGUSR1
(procsignal_sigusr1_handler). Have you verified that it will never
set the InterruptPending flag?

Okay now CHECK_FOR_INTERRUPTS is called for both.

7.
+dump_now(bool is_bgworker)
{
..
+ (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
..
}

How will transient_dump_file_path be unlinked in the case of error in
durable_rename? I think you need to use PG_TRY..PG_CATCH to ensure
same?
-- If durable_rename fails, the basic functionality of autoprewarm is
failing, so I want it to be an ERROR. I do not want to remove the temp
file, as we always truncate it before reusing it. So I think there is no
need to catch every ERROR in dump_now() just to remove the temp file.
I am not getting your argument here. Do you mean to say that if
writing to the transient file fails then we should remove the
transient file, but if the rename fails then there is no need to
remove it? It sounds strange to me, but maybe you have a reason to do
it like that.

My intention is to unlink whenever possible, that is, whenever control
is within the function. I thought it was okay if we error out inside a
called function. If the temp file is left behind, that will not be a
problem, as it will be reused (truncated first) for the next dump. If you
think it is needed, I will add a PG_TRY()/PG_CATCH() block around the
call to catch any error and then remove the file.
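
As a point of comparison, here is a minimal sketch of the alternative
being discussed, where a failed rename is treated like any other failure
and the temporary file is removed. It is plain C with stdio and not the
patch code: write_dump_via_temp_file is a hypothetical name, plain
rename() stands in for durable_rename() (so there is no fsync here), and
errors are reported through the return value instead of ereport(ERROR).

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write nblocks block numbers to final_path via "<final_path>.tmp".
 * Returns 0 on success and -1 on failure. The temporary file is removed
 * on every failure path after it has been opened, including a failed
 * rename.
 */
int
write_dump_via_temp_file(const char *final_path,
                         const unsigned int *blocks, size_t nblocks)
{
    char        tmp_path[1024];
    FILE       *file;
    size_t      i;
    int         len;
    int         ok = 1;

    assert(final_path != NULL);

    len = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", final_path);
    if (len < 0 || (size_t) len >= sizeof(tmp_path))
        return -1;              /* path does not fit in the buffer */

    file = fopen(tmp_path, "w");
    if (file == NULL)
        return -1;              /* could not open the temporary file */

    /* Header line with the entry count, then one entry per line. */
    if (fprintf(file, "<<%zu>>\n", nblocks) < 0)
        ok = 0;
    for (i = 0; ok && i < nblocks; i++)
    {
        if (fprintf(file, "%u\n", blocks[i]) < 0)
            ok = 0;
    }

    /* Always close, even after a write error, before unlinking. */
    if (fclose(file) != 0)
        ok = 0;

    /* The rename is the last step that can fail; handle it the same way. */
    if (ok && rename(tmp_path, final_path) != 0)
        ok = 0;

    if (!ok)
    {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

With ereport(ERROR) the control flow is different, because the error
longjmps out of the function; that is why the cleanup would have to live
in a PG_CATCH() block there instead of a shared exit path like this one.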
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autoprewarm_19.patch (application/octet-stream)

commit d0d626ff55be0d88bba80d77b41b08ad71eb73ae
Author: mithun <mithun@localhost.localdomain>
Date:   Fri Jul 14 17:02:11 2017 +0530

    patch 19

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e..88580d1 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000..2e04745
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,1011 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Automatically prewarms the shared buffers when server restarts.
+ *
+ * DESCRIPTION
+ *
+ *		Autoprewarm is a bgworker process that automatically records
+ *		information about the blocks present in shared buffers before
+ *		server shutdown, and then prewarms the shared buffers with those
+ *		blocks on server restart.
+ *
+ *		How does it work? When the shared library "pg_prewarm" is preloaded, a
+ *		bgworker "autoprewarm" is launched immediately after the server has
+ *		reached a consistent state. The bgworker will start loading blocks
+ *		recorded until there is no free buffer left in the shared buffers. This
+ *		way we do not replace any new blocks which were loaded either by the
+ *		recovery process or the querying clients.
+ *
+ *		Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ *		start a new task to periodically dump the BlockInfoRecords related to
+ *		the blocks which are currently in shared buffers. On next server
+ *		restart, the bgworker will prewarm the shared buffers by loading those
+ *		blocks. The GUC pg_prewarm.autoprewarm_interval will control the
+ *		dumping activity of the bgworker.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+/* These are always necessary for a bgworker. */
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+
+/* These are necessary for prewarm utilities. */
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "pgstat.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/procsignal.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+PG_FUNCTION_INFO_V1(autoprewarm_start_worker);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+#define AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY 0
+#define AUTOPREWARM_INTERVAL_DEFAULT 300
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Primary functions */
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+static void dump_block_info_periodically(void);
+static void autoprewarm_dump_launcher(void);
+static void setup_autoprewarm(BackgroundWorker *autoprewarm,
+				  const char *worker_name,
+				  const char *worker_function,
+				  Datum main_arg, int restart_time,
+				  int extra_flags);
+void		autoprewarm_database_main(Datum main_arg);
+
+/*
+ * Signal Handlers.
+ */
+
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/*
+ *	Signal handler for SIGTERM
+ *	Set a flag to handle.
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ *	Signal handler for SIGHUP
+ *	Set a flag to reread the config file.
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/* ============================================================================
+ * ==============	Types and variables used by autoprewarm   =============
+ * ============================================================================
+ */
+
+/* Metadata of each persistent block which is dumped and used for loading. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Tasks performed by autoprewarm workers.*/
+typedef enum
+{
+	TASK_PREWARM_BUFFERPOOL,	/* prewarm the shared buffers. */
+	TASK_DUMP_BUFFERPOOL_INFO	/* dump the shared buffer's block info. */
+} AutoPrewarmTask;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile; /* for autoprewarm or block dump */
+	bool		skip_prewarm_on_restart;	/* if set true, prewarm task will
+											 * not be done */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int			prewarm_start_idx;
+	int			prewarm_stop_idx;
+	uint32		prewarmed_blocks;
+} AutoPrewarmSharedState;
+
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/* GUC variable that controls the dump activity of autoprewarm. */
+static int	autoprewarm_interval = 0;
+
+/*
+ * The GUC variable controls whether the server should run the autoprewarm
+ * worker.
+ */
+static bool autoprewarm = true;
+
+/* Compare member elements to check whether they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * blockinfo_cmp
+ *		Compare function used for qsort().
+ */
+static int
+blockinfo_cmp(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+	return 0;
+}
+
+/* ============================================================================
+ * =================	Prewarm part of autoprewarm ========================
+ * ============================================================================
+ */
+
+/*
+ * detach_apw_shmem
+ *		on_shmem_exit callback that resets the autoprewarm shared state.
+ */
+
+static void
+detach_apw_shmem(int code, Datum arg)
+{
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+}
+
+/*
+ * init_apw_shmem
+ *		Allocate and initialize autoprewarm related shared memory.
+ */
+static void
+init_apw_shmem(void)
+{
+	bool		found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+		apw_state->skip_prewarm_on_restart = false;
+	}
+
+	LWLockRelease(AddinShmemInitLock);
+}
+
+/*
+ * autoprewarm_database_main
+ *		This subroutine loads the BlockInfoRecords of the database set in
+ *		AutoPrewarmSharedState.
+ *
+ * Connect to the database and load the blocks of that database which are given
+ * by [apw_state->prewarm_start_idx, apw_state->prewarm_stop_idx).
+ */
+void
+autoprewarm_database_main(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk;
+	dsm_segment *seg;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, die);
+
+	/* We're now ready to receive signals */
+	BackgroundWorkerUnblockSignals();
+
+	init_apw_shmem();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	old_blk = NULL;
+	pos = apw_state->prewarm_start_idx;
+
+	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note, that rel will be NULL if try_relation_open failed
+		 * previously, in that case there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+		{
+			apw_state->prewarmed_blocks++;
+			ReleaseBuffer(buf);
+		}
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+
+	return;
+}
+
+/*
+ * autoprewarm_one_database
+ *		Register a per-database dynamic worker to load.
+ */
+static void
+autoprewarm_one_database(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle = NULL;
+	BgwHandleStatus status PG_USED_FOR_ASSERTS_ONLY;
+
+	setup_autoprewarm(&worker, "per-database autoprewarm",
+					  "autoprewarm_database_main",
+					  (Datum) NULL, BGW_NEVER_RESTART,
+					  BGWORKER_BACKEND_DATABASE_CONNECTION);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerShutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerShutdown(handle);
+	Assert(status == BGWH_STOPPED);
+}
+
+/*
+ * autoprewarm_buffers
+ *		The main routine that prewarms the shared buffers.
+ *
+ * The prewarm bgworker will first load all the BlockInfoRecords in
+ * $PGDATA/AUTOPREWARM_FILE to a DSM. Further, these BlockInfoRecords are
+ * separated based on their databases. Finally, for each group of
+ * BlockInfoRecords a per-database worker will be launched to load the
+ * corresponding blocks. Launch the next worker only after the previous one has
+ * finished its job.
+ */
+static void
+autoprewarm_buffers(void)
+{
+	FILE	   *file = NULL;
+	uint32		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Since there can be at most one worker for prewarm, locking is not
+	 * required for setting skip_prewarm_on_restart.
+	 */
+	apw_state->skip_prewarm_on_restart = true;
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	file = AllocateFile(AUTOPREWARM_FILE, "r");
+	if (!file)
+	{
+		if (errno != ENOENT)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not read file \"%s\": %m",
+							AUTOPREWARM_FILE)));
+
+		apw_state->pid_using_dumpfile = InvalidPid;
+		return;					/* No file to load. */
+	}
+
+	if (fscanf(file, "<<%u>>\n", &num_elements) != 1)
+	{
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	for (i = 0; i < num_elements; i++)
+	{
+		/* Get next block. */
+		if (5 != fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+						&blkinfo[i].tablespace, &blkinfo[i].filenode,
+						(uint32 *) &blkinfo[i].forknum, &blkinfo[i].blocknum))
+			break;
+	}
+
+	FreeFile(file);
+
+	if (num_elements != i)
+		elog(ERROR, "autoprewarm block dump has %u entries but expected %u",
+			 i, num_elements);
+
+	/*
+	 * Sort the block number to increase the chance of sequential reads during
+	 * load.
+	 */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord), blockinfo_cmp);
+
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+	apw_state->prewarmed_blocks = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords of global objects with the next
+				 * non-global object.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist. Since we
+		 * cannot connect with InvalidOid, skip prewarming for these objects.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/*
+		 * Register a per-database worker to load blocks of the database. Wait
+		 * until it has finished before starting the next worker.
+		 */
+		autoprewarm_one_database();
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	dsm_detach(seg);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+
+	apw_state->pid_using_dumpfile = InvalidPid;
+	ereport(LOG,
+			(errmsg("autoprewarm successfully prewarmed %u of %u previously-loaded blocks",
+					apw_state->prewarmed_blocks, num_elements)));
+}
+
+/*
+ * ============================================================================
+ * ==============	Dump part of Autoprewarm =============================
+ * ============================================================================
+ */
+
+/*
+ * This submodule is for periodically dumping BlockInfoRecords in shared
+ * buffers into a dump file AUTOPREWARM_FILE.
+ * Each BlockInfoRecord entry consists of database, tablespace, filenode,
+ * forknum, blocknum. Note that this is in the text form so that the dump
+ * information is readable and can be edited, if required.
+ */
+
+/*
+ * dump_now
+ *		Dumps BlockInfoRecords in shared buffers.
+ */
+static uint32
+dump_now(bool is_bgworker, bool dump_unlogged)
+{
+	uint32		i;
+	int			ret;
+	uint32		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file;
+	char		transient_dump_file_path[MAXPGPATH];
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							apw_state->pid_using_dumpfile)));
+
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		CHECK_FOR_INTERRUPTS();
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/*
+		 * The unlogged table will be automatically truncated after a crash or
+		 * unclean shutdown. In such cases we need not prewarm them. Dump
+		 * those BlockInfoRecords only if asked by the caller.
+		 */
+		if (buf_state & BM_TAG_VALID &&
+			((buf_state & BM_PERMANENT) || dump_unlogged))
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+	file = AllocateFile(transient_dump_file_path, "w");
+	if (!file)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m",
+						transient_dump_file_path)));
+
+	ret = fprintf(file, "<<%u>>\n", num_blocks);
+	if (ret < 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].tablespace,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			int			save_errno = errno;
+
+			FreeFile(file);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = FreeFile(file);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("saved metadata info of %u blocks", num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * dump_block_info_periodically
+ *		 This loop periodically calls dump_now().
+ *
+ * Call dump_now() at regular intervals defined by GUC variable
+ * autoprewarm_interval.
+ */
+static void
+dump_block_info_periodically(void)
+{
+	TimestampTz last_dump_time = 0;
+
+	while (!got_sigterm)
+	{
+		int			rc;
+		struct timeval nap;
+
+		nap.tv_sec = AUTOPREWARM_INTERVAL_DEFAULT;
+		nap.tv_usec = 0;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		if (autoprewarm_interval > AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY)
+		{
+			TimestampTz current_time = GetCurrentTimestamp();
+
+			if (last_dump_time == 0 ||
+				TimestampDifferenceExceeds(last_dump_time,
+										   current_time,
+										   (autoprewarm_interval * 1000)))
+			{
+				dump_now(true, false);
+				last_dump_time = GetCurrentTimestamp();
+				nap.tv_sec = autoprewarm_interval;
+				nap.tv_usec = 0;
+			}
+			else
+			{
+				long		secs;
+				int			usecs;
+
+				TimestampDifference(last_dump_time, current_time,
+									&secs, &usecs);
+				nap.tv_sec = autoprewarm_interval - secs;
+				nap.tv_usec = 0;
+			}
+		}
+		else
+			last_dump_time = 0;
+
+		rc = WaitLatch(&MyProc->procLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+					   PG_WAIT_EXTENSION);
+		ResetLatch(&MyProc->procLatch);
+
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+
+	/* It's time for postmaster shutdown, let's dump for one last time. */
+	dump_now(true, true);
+}
+
+/*
+ * autoprewarm_main
+ *		The main entry point of autoprewarm bgworker process.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	AutoPrewarmTask todo_task;
+
+	/* Establish signal handlers before unblocking signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+
+	/* We're now ready to receive signals. */
+	BackgroundWorkerUnblockSignals();
+
+	todo_task = DatumGetInt32(main_arg);
+	Assert(todo_task == TASK_PREWARM_BUFFERPOOL ||
+		   todo_task == TASK_DUMP_BUFFERPOOL_INFO);
+	init_apw_shmem();
+	on_shmem_exit(detach_apw_shmem, 0);
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker started")));
+
+	/*
+	 * We have finished initializing worker's state, let's start actual work.
+	 */
+	if (todo_task == TASK_PREWARM_BUFFERPOOL &&
+		!apw_state->skip_prewarm_on_restart)
+		autoprewarm_buffers();
+
+	dump_block_info_periodically();
+
+	ereport(LOG,
+			(errmsg("autoprewarm worker stopped")));
+}
+
+/* ============================================================================
+ * =============	Extension's entry functions/utilities	===============
+ * ============================================================================
+ */
+
+/*
+ * setup_autoprewarm
+ *		A common function to initialize BackgroundWorker structure.
+ */
+static void
+setup_autoprewarm(BackgroundWorker *autoprewarm, const char *worker_name,
+				  const char *worker_function, Datum main_arg, int restart_time,
+				  int extra_flags)
+{
+	MemSet(autoprewarm, 0, sizeof(BackgroundWorker));
+	autoprewarm->bgw_flags = BGWORKER_SHMEM_ACCESS | extra_flags;
+
+	/* Register the autoprewarm background worker */
+	autoprewarm->bgw_start_time = BgWorkerStart_ConsistentState;
+	autoprewarm->bgw_restart_time = restart_time;
+	strcpy(autoprewarm->bgw_library_name, "pg_prewarm");
+	strcpy(autoprewarm->bgw_function_name, worker_function);
+	strncpy(autoprewarm->bgw_name, worker_name, BGW_MAXLEN);
+	autoprewarm->bgw_main_arg = main_arg;
+}
+
+/*
+ * _PG_init
+ *		Extension's entry point.
+ */
+void
+_PG_init(void)
+{
+	BackgroundWorker autoprewarm_worker;
+
+	/* Define custom GUC variables. */
+
+	DefineCustomIntVariable("pg_prewarm.autoprewarm_interval",
+							"Sets the maximum time between two shared buffers dumps",
+							"If set to zero, timer based dumping is disabled.",
+							&autoprewarm_interval,
+							AUTOPREWARM_INTERVAL_DEFAULT,
+							AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (process_shared_preload_libraries_in_progress)
+		DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+								 "Starts the autoprewarm worker.",
+								 NULL,
+								 &autoprewarm,
+								 true,
+								 PGC_POSTMASTER,
+								 0,
+								 NULL,
+								 NULL,
+								 NULL);
+	else
+	{
+		/* If not run as a preloaded library, nothing more to do. */
+		EmitWarningsOnPlaceholders("pg_prewarm");
+		return;
+	}
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	/* Request additional shared resources. */
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* If autoprewarm is disabled then nothing more to do. */
+	if (!autoprewarm)
+		return;
+
+	/* Register autoprewarm worker. */
+	setup_autoprewarm(&autoprewarm_worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
+	RegisterBackgroundWorker(&autoprewarm_worker);
+}
+
+/*
+ * autoprewarm_dump_launcher
+ *		Dynamically launch an autoprewarm dump worker.
+ */
+static void
+autoprewarm_dump_launcher(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	setup_autoprewarm(&worker, "autoprewarm", "autoprewarm_main",
+					  Int32GetDatum(TASK_DUMP_BUFFERPOOL_INFO), 0, 0);
+
+	/* Set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker \"autoprewarm\" failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+	}
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start autoprewarm dump bgworker"),
+				 errhint("More details may be available in the server log.")));
+	}
+
+	if (status == BGWH_POSTMASTER_DIED)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("cannot start bgworker autoprewarm without postmaster"),
+				 errhint("Kill all remaining database processes and restart the database.")));
+	}
+
+	Assert(status == BGWH_STARTED);
+}
+
+/*
+ * autoprewarm_start_worker
+ *		The C-Language entry function to launch autoprewarm dump bgworker.
+ */
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	init_apw_shmem();
+	pid = apw_state->bgworker_pid;
+	if (pid != InvalidPid)
+		ereport(ERROR,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						pid)));
+
+	autoprewarm_dump_launcher();
+	PG_RETURN_VOID();
+}
+
+/*
+ * autoprewarm_dump_now
+ *		The C-Language entry function to dump immediately.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	uint32		num_blocks = 0;
+
+	init_apw_shmem();
+
+	PG_TRY();
+	{
+		num_blocks = dump_now(false, true);
+	}
+	PG_CATCH();
+	{
+		detach_apw_shmem(0, 0);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+	PG_RETURN_UINT32(num_blocks);
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000..2381c06
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION autoprewarm_start_worker()
+RETURNS VOID STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_start_worker'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92..40e3add 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401..5a3b532 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,9 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache. Additionally, an
+  automatic prewarming of the server buffers is supported whenever the server
+  restarts.
  </para>
 
  <sect2>
@@ -55,6 +57,102 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+autoprewarm_start_worker() RETURNS void
+</synopsis>
+
+  <para>
+   This will start the <literal>autoprewarm</literal> worker which will dump
+   shared buffers to disk at the interval specified by
+   <varname>pg_prewarm.autoprewarm_interval</varname>. As only one
+   <literal>autoprewarm</literal> worker can be run per cluster at a time,
+   additional invocations, if a worker is already running, will return an error.
+   In some corner cases, when this function is called concurrently to start
+   a worker, only one of the calls can actually start the worker, but both
+   return success as if a new worker had been started.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   This will immediately dump shared buffers to disk.  The return value is
+   the number of blocks dumped.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>autoprewarm</title>
+
+  <para>
+  This is a background worker process which will automatically dump shared
+  buffers to disk before a shutdown and then prewarm shared buffers the
+  next time the server is started by loading blocks from disk back into
+  the buffer pool.
+  </para>
+
+  <para>
+  When the shared library <literal>pg_prewarm</literal> is preloaded via
+  <xref linkend="guc-shared-preload-libraries"> in <filename>postgresql.conf</>,
+  an <literal>autoprewarm</literal> background worker is launched immediately
+  after the server has reached a consistent state. The autoprewarm process will
+  start loading blocks recorded in
+  <filename>$PGDATA/autoprewarm.blocks</filename> until there is no free buffer
+  left in the buffer pool. This way we do not replace any new blocks which were
+  loaded either by the recovery process or the querying clients.
+  </para>
+
+  <para>
+  Once the <literal>autoprewarm</literal> process has finished loading buffers
+  from disk, it will periodically dump shared buffers to disk at the interval
+  specified by <varname>pg_prewarm.autoprewarm_interval</varname>.  Upon the
+  next server restart, the autoprewarm process will prewarm shared buffers with
+  the blocks that were last dumped to disk.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Controls whether the server should run the autoprewarm worker. This is
+      on by default. This parameter can only be set in the postgresql.conf
+      file or on the server command line.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.autoprewarm_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is the minimum number of seconds after which autoprewarm dumps
+      shared buffers to disk. The default is 300 seconds. If set to 0,
+      shared buffers will not be dumped at regular intervals, but only when the
+      server is shut down.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6a..f033323 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations consume the
+ * free buffers, so a caller that strictly needs a free buffer should not
+ * rely on this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6f..300adfc 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 23a4bbd..d8948cc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,8 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
+AutoPrewarmTask
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -218,6 +220,7 @@ BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData

#104

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Mithun Cy (#103)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

[ new patch ]

I spent some time going over this patch. I initially thought it only
needed minor cosmetic tweaking but the more I poked at it the more
things I found that seemed like they should be changed, so the
attached version looks pretty significantly different from what was
last posted. It's not actually as different as it looks because a lot
of the changes are purely cosmetic.

Changes:

- Rewrote the documentation, many of the comments, and some of the
other messages significantly.
- Renamed private functions so they all start with apw_ instead of
having what seemed to be a mix of naming conventions.
- Reorganized the file so that the important functions are at the top.
- Added prototypes for the static functions that lacked them.
- Got rid of AutoPrewarmTask.
- Got rid of skip_prewarm_on_restart.
- Added LWLockAcquire/LWLockRelease calls in many places where they
were left out. This may make no difference but it seems safer.
- Refactored the worker-starting code into two separate functions, one
for the main worker and one for the per-database worker.
- Inlined some functions that were only called from one place.
- Rewrote the delay loop. Previously this used a struct timeval but
tv_usec was always 0 and the actual struct was never passed to any
system function, so I think this loop couldn't have been accurate to
more than the nearest second and depending unnecessarily on the
operating system structure seems pointless. I also changed it
to be more explicit about the autoprewarm_interval == 0 case and to
bump the last dump time before, rather than after, dumping.
Otherwise, the time between dumps will be increased by the amount of
time the dump itself takes, which is not what the user will expect.
- Used the correct PG_RETURN macro -- the return type of
autoprewarm_dump_now is int8, so PG_RETURN_INT64 must be used.
- Updated various other places to use int64 for consistency.
- Possibly a few other things I'm forgetting about right now.
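
To illustrate the delay-loop point above, here is a minimal standalone sketch.
It is not code from the patch: the helper names ms_until_next_dump and
simulate_dumps, the bump_before flag, and the use of plain int64_t
microsecond counters in place of TimestampTz are all assumptions made for
illustration. It shows that recording the last dump time before dumping keeps
dump starts spaced by the configured interval, while recording it afterwards
stretches each period by the duration of the dump.

```c
#include <assert.h>
#include <stdint.h>

#define USECS_PER_SEC 1000000LL

/* Hypothetical helper: milliseconds until the next dump; <= 0 means "dump now". */
static int64_t
ms_until_next_dump(int64_t last_dump_us, int interval_s, int64_t now_us)
{
	int64_t		next_dump_us = last_dump_us + interval_s * USECS_PER_SEC;

	return (next_dump_us - now_us) / 1000;
}

/*
 * Hypothetical simulation: run ndumps dumps, each costing dump_cost_us, and
 * return the time at which the last dump started.  bump_before selects
 * whether last_dump_us is updated before or after the dump itself.
 */
static int64_t
simulate_dumps(int interval_s, int64_t dump_cost_us, int ndumps, int bump_before)
{
	int64_t		now_us = 0;
	int64_t		last_dump_us = 0;
	int64_t		start_us = 0;
	int			i;

	assert(interval_s > 0 && dump_cost_us >= 0);

	for (i = 0; i < ndumps; i++)
	{
		int64_t		wait_ms = ms_until_next_dump(last_dump_us, interval_s, now_us);

		if (wait_ms > 0)
			now_us += wait_ms * 1000;	/* stands in for the latch wait */

		start_us = now_us;
		if (bump_before)
			last_dump_us = now_us;		/* bump before dumping */
		now_us += dump_cost_us;			/* the dump itself takes time */
		if (!bump_before)
			last_dump_us = now_us;		/* bump after dumping */
	}
	return start_us;
}
```

With a 300 second interval and a dump that takes 10 seconds, three dumps
start at 300, 600 and 900 seconds when the time is bumped first, but at 300,
610 and 920 seconds when it is bumped afterwards.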

It's quite possible that in making all of these changes I've
introduced some bugs, so I think this needs some testing and review.
It's also possible that some of the changes that I made are actually
not improvements and should be reverted, but it's always hard to tell
that about your own code. Anyway, please see the attached version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

autoprewarm-rmh.patch (application/octet-stream)

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e72b..88580d1118 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000000..59f4d2b0f1
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,927 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Periodically dump information about the blocks present in
+ *		shared_buffers, and reload them on server restart.
+ *
+ *		Due to locking considerations, we can't actually begin prewarming
+ *		until the server reaches a consistent state.  We need the catalogs
+ *		to be consistent so that we can figure out which relation to lock,
+ *		and we need to lock the relations so that we don't try to prewarm
+ *		pages from a relation that is in the process of being dropped.
+ *
+ *		While prewarming, autoprewarm will use two workers.  There's a
+ *		master worker that reads and sorts the list of blocks to be
+ *		prewarmed and then launches a per-database worker for each
+ *		relevant database in turn.  The former keeps running after the
+ *		initial prewarm is complete to update the dump file periodically.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/shmem.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Metadata for each block we dump. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile; /* for autoprewarm or block dump */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int64		prewarm_start_idx;
+	int64		prewarm_stop_idx;
+	int64		prewarmed_blocks;
+} AutoPrewarmSharedState;
+
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+void		autoprewarm_database_main(Datum main_arg);
+
+PG_FUNCTION_INFO_V1(autoprewarm_start_worker);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+static void apw_load_buffers(void);
+static int64 apw_dump_now(bool is_bgworker, bool dump_unlogged);
+static void apw_start_master_worker(void);
+static void apw_start_database_worker(void);
+static bool apw_init_shmem(void);
+static void apw_detach_shmem(int code, Datum arg);
+static int	apw_compare_blockinfo(const void *p, const void *q);
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/* Pointer to shared-memory state. */
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/* GUC variables. */
+static bool autoprewarm = true; /* start worker? */
+static int	autoprewarm_interval;	/* dump interval */
+
+/*
+ * Module load callback.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomIntVariable("pg_prewarm.autoprewarm_interval",
+							"Sets the interval between dumps of shared buffers",
+							"If set to zero, time-based dumping is disabled.",
+							&autoprewarm_interval,
+							300,
+							0, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* can't define PGC_POSTMASTER variable after startup */
+	DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+							 "Starts the autoprewarm worker.",
+							 NULL,
+							 &autoprewarm,
+							 true,
+							 PGC_POSTMASTER,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* Register autoprewarm worker, if enabled. */
+	if (autoprewarm)
+		apw_start_master_worker();
+}
+
+/*
+ * Main entry point for the master autoprewarm process.  Per-database workers
+ * have a separate entry point.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	bool		first_time = true;
+	TimestampTz last_dump_time = 0;
+
+	/* Establish signal handlers; once that's done, unblock signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	BackgroundWorkerUnblockSignals();
+
+	/* Create (if necessary) and attach to our shared memory area. */
+	if (apw_init_shmem())
+		first_time = false;
+
+	/* Set on-detach hook so that our PID will be cleared on exit. */
+	on_shmem_exit(apw_detach_shmem, 0);
+
+	/*
+	 * Store our PID in the shared memory area --- unless there's already
+	 * another worker running, in which case just exit.
+	 */
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	/*
+	 * Preload buffers from the dump file only if we just created the shared
+	 * memory region.  Otherwise, it's either already been done or shouldn't
+	 * be done - e.g. because the old dump file has been overwritten since the
+	 * server was started.
+	 *
+	 * There's not much point in performing a dump immediately after we finish
+	 * preloading; so, if we do end up preloading, consider the last dump time
+	 * to be equal to the current time.
+	 */
+	if (first_time)
+	{
+		apw_load_buffers();
+		last_dump_time = GetCurrentTimestamp();
+	}
+
+	/* Periodically dump buffers until terminated. */
+	while (!got_sigterm)
+	{
+		int			rc;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		if (autoprewarm_interval <= 0)
+		{
+			/* We're only dumping at shutdown, so just wait forever. */
+			rc = WaitLatch(&MyProc->procLatch,
+						   WL_LATCH_SET | WL_POSTMASTER_DEATH,
+						   -1L,
+						   PG_WAIT_EXTENSION);
+		}
+		else
+		{
+			long		delay_in_ms = 0;
+			TimestampTz next_dump_time = 0;
+			long		secs = 0;
+			int			usecs = 0;
+
+			/* Compute the next dump time. */
+			next_dump_time =
+				TimestampTzPlusMilliseconds(last_dump_time,
+											autoprewarm_interval * 1000);
+			TimestampDifference(GetCurrentTimestamp(), next_dump_time,
+								&secs, &usecs);
+			delay_in_ms = secs * 1000 + (usecs / 1000);
+
+			/* Perform a dump if it's time. */
+			if (delay_in_ms <= 0)
+			{
+				last_dump_time = GetCurrentTimestamp();
+				apw_dump_now(true, false);
+				continue;
+			}
+
+			/* Sleep until the next dump time. */
+			rc = WaitLatch(&MyProc->procLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   delay_in_ms,
+						   PG_WAIT_EXTENSION);
+		}
+
+		/* Reset the latch, bail out if postmaster died, otherwise loop. */
+		ResetLatch(&MyProc->procLatch);
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+
+	/*
+	 * Dump one last time.  We assume this is probably the result of a system
+	 * shutdown, although it's possible that we've merely been terminated.
+	 */
+	apw_dump_now(true, true);
+}
+
+/*
+ * Read the dump file and launch per-database workers one at a time to
+ * prewarm the buffers found there.
+ */
+static void
+apw_load_buffers(void)
+{
+	FILE	   *file = NULL;
+	int64		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Skip the prewarm if the dump file is in use; otherwise, prevent any
+	 * other process from writing it while we're using it.
+	 */
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+	LWLockRelease(&apw_state->lock);
+
+	/*
+	 * Open the block dump file.  Exit quietly if it doesn't exist, but report
+	 * any other error.
+	 */
+	file = AllocateFile(AUTOPREWARM_FILE, "r");
+	if (!file)
+	{
+		if (errno == ENOENT)
+		{
+			LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+			apw_state->pid_using_dumpfile = InvalidPid;
+			LWLockRelease(&apw_state->lock);
+			return;				/* No file to load. */
+		}
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	/* First line of the file is a record count. */
+	if (fscanf(file, "<<" INT64_FORMAT ">>\n", &num_elements) != 1)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+
+	/* Allocate a dynamic shared memory segment to store the record data. */
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	/* Read records, one per line. */
+	for (i = 0; i < num_elements; i++)
+	{
+		unsigned	forknum;
+
+		if (fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+				   &blkinfo[i].tablespace, &blkinfo[i].filenode,
+				   &forknum, &blkinfo[i].blocknum) != 5)
+			ereport(ERROR,
+					(errmsg("autoprewarm block dump file is corrupted at line " INT64_FORMAT,
+							i + 1)));
+		blkinfo[i].forknum = forknum;
+	}
+
+	FreeFile(file);
+
+	/* Sort the blocks to be loaded. */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord),
+			 apw_compare_blockinfo);
+
+	/* Populate shared memory state. */
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+	apw_state->prewarmed_blocks = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords for global objects with those of
+				 * the database.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist.  We can't
+		 * prewarm without a database connection, so just bail out.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		/* Configure stop point and database for next per-database worker. */
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/* If we've run out of free buffers, don't launch another worker. */
+		if (!have_free_buffer())
+			break;
+
+		/*
+		 * Start a per-database worker to load blocks for this database; this
+		 * function will return once the per-database worker exits.
+		 */
+		apw_start_database_worker();
+
+		/* Prepare for next database. */
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	/* Clean up. */
+	dsm_detach(seg);
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+	apw_state->pid_using_dumpfile = InvalidPid;
+	LWLockRelease(&apw_state->lock);
+
+	/* Report our success. */
+	ereport(LOG,
+			(errmsg("autoprewarm successfully prewarmed " INT64_FORMAT
+					" of " INT64_FORMAT " previously-loaded blocks",
+					apw_state->prewarmed_blocks, num_elements)));
+}
+
+/*
+ * Prewarm all blocks for one database (and possibly also global objects, if
+ * those got grouped with this database).
+ */
+void
+autoprewarm_database_main(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk = NULL;
+	dsm_segment *seg;
+
+	/* Establish signal handlers; once that's done, unblock signals. */
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to correct database and get block information. */
+	apw_init_shmem();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+	pos = apw_state->prewarm_start_idx;
+
+	/*
+	 * Loop until we run out of blocks to prewarm or until we run out of free
+	 * buffers.
+	 */
+	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note that rel will be NULL if try_relation_open failed
+		 * previously; in that case, there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+		{
+			apw_state->prewarmed_blocks++;
+			ReleaseBuffer(buf);
+		}
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+}
+
+/*
+ * Dump information on blocks in shared buffers.  We use a text format here
+ * so that it's easy to understand and even change the file contents if
+ * necessary.
+ */
+static int64
+apw_dump_now(bool is_bgworker, bool dump_unlogged)
+{
+	uint32		i;
+	int			ret;
+	int64		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file;
+	char		transient_dump_file_path[MAXPGPATH];
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							apw_state->pid_using_dumpfile)));
+
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return 0;
+	}
+
+	LWLockRelease(&apw_state->lock);
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		CHECK_FOR_INTERRUPTS();
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/*
+		 * Unlogged tables will be automatically truncated after a crash or
+		 * unclean shutdown. In such cases we need not prewarm them. Dump them
+		 * only if requested by caller.
+		 */
+		if (buf_state & BM_TAG_VALID &&
+			((buf_state & BM_PERMANENT) || dump_unlogged))
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+	file = AllocateFile(transient_dump_file_path, "w");
+	if (!file)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m",
+						transient_dump_file_path)));
+
+	ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+	if (ret < 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].tablespace,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			int			save_errno = errno;
+
+			FreeFile(file);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = FreeFile(file);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("wrote block details for " INT64_FORMAT " blocks",
+					num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * SQL-callable function to launch autoprewarm.
+ */
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	if (!autoprewarm)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("autoprewarm is disabled")));
+
+	apw_init_shmem();
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	pid = apw_state->bgworker_pid;
+	LWLockRelease(&apw_state->lock);
+
+	if (pid != InvalidPid)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("autoprewarm worker is already running under PID %d",
+						pid)));
+
+	apw_start_master_worker();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * SQL-callable function to perform an immediate block dump.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	int64		num_blocks;
+
+	apw_init_shmem();
+
+	PG_TRY();
+	{
+		num_blocks = apw_dump_now(false, true);
+	}
+	PG_CATCH();
+	{
+		apw_detach_shmem(0, 0);
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
+
+	PG_RETURN_INT64(num_blocks);
+}
+
+/*
+ * Allocate and initialize autoprewarm related shared memory, if not already
+ * done, and set up backend-local pointer to that state.  Returns true if an
+ * existing shared memory segment was found.
+ */
+static bool
+apw_init_shmem(void)
+{
+	bool		found;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+	}
+	LWLockRelease(AddinShmemInitLock);
+
+	return found;
+}
+
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+	LWLockRelease(&apw_state->lock);
+}
+
+/*
+ * Start autoprewarm master worker process.
+ */
+static void
+apw_start_master_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	MemSet(&worker, 0, sizeof(BackgroundWorker));
+	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	strcpy(worker.bgw_library_name, "pg_prewarm");
+	strcpy(worker.bgw_function_name, "autoprewarm_main");
+	strcpy(worker.bgw_name, "autoprewarm");
+
+	if (process_shared_preload_libraries_in_progress)
+	{
+		RegisterBackgroundWorker(&worker);
+		return;
+	}
+
+	/* must set notify PID to wait for startup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not register background process"),
+				 errhint("You may need to increase max_worker_processes.")));
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status != BGWH_STARTED)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start background process"),
+				 errhint("More details may be available in the server log.")));
+}
+
+/*
+ * Start autoprewarm per-database worker process.
+ */
+static void
+apw_start_database_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+
+	MemSet(&worker, 0, sizeof(BackgroundWorker));
+	worker.bgw_flags =
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	strcpy(worker.bgw_library_name, "pg_prewarm");
+	strcpy(worker.bgw_function_name, "autoprewarm_database_main");
+	strcpy(worker.bgw_name, "autoprewarm");
+
+	/* must set notify PID to wait for shutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+
+	/*
+	 * Ignore return value; if it fails, postmaster has died, but we have
+	 * checks for that elsewhere.
+	 */
+	WaitForBackgroundWorkerShutdown(handle);
+}
+
+/* Compare member elements to check whether they are not equal. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0);
+
+/*
+ * apw_compare_blockinfo
+ *
+ * We depend on all records for a particular database being consecutive
+ * in the dump file; each per-database worker will preload blocks until
+ * it sees a block for some other database.  Sorting by tablespace,
+ * filenode, forknum, and blocknum isn't critical for correctness, but
+ * helps us get a sequential I/O pattern.
+ */
+static int
+apw_compare_blockinfo(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+
+	return 0;
+}
+
+/*
+ * Signal handler for SIGTERM
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Signal handler for SIGHUP
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000000..2381c06eb9
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION autoprewarm_start_worker()
+RETURNS VOID STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_start_worker'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92bed..40e3add481 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401eca..c6b94a8b72 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,13 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache.  Prewarming
+  can be performed manually using the <function>pg_prewarm</> function,
+  or can be performed automatically by including <literal>pg_prewarm</> in
+  <xref linkend="guc-shared-preload-libraries">.  In the latter case, the
+  system will run a background worker which periodically records the contents
+  of shared buffers in a file called <filename>autoprewarm.blocks</> and
+  will, using 2 background workers, reload those same blocks after a restart.
  </para>
 
  <sect2>
@@ -55,6 +61,67 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+autoprewarm_start_worker() RETURNS void
+</synopsis>
+
+  <para>
+   Launch the main autoprewarm worker.  This will normally happen
+   automatically, but is useful if automatic prewarm was not configured at
+   server startup time and you wish to start up the worker at a later time.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   Update <filename>autoprewarm.blocks</> immediately.  This may be useful
+   if the autoprewarm worker is not running but you anticipate running it
+   after the next restart.  The return value is the number of records written
+   to <filename>autoprewarm.blocks</>.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Controls whether the server should run the autoprewarm worker. This is
+      on by default. This parameter can only be set at server start.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.autoprewarm_interval</varname> (<type>int</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is the interval between updates to <literal>autoprewarm.blocks</>.
+      The default is 300 seconds. If set to 0, the file will not be
+      dumped at regular intervals, but only when the server is shut down.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6ae8e..f033323cff 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * If the result is true, it can become stale as soon as other operations
+ * take free buffers, so callers that strictly need a free buffer should
+ * not rely on this.
+ */
+bool
+have_free_buffer()
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6fc96..300adfcf9e 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8166d86ca1..a4ace383fa 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,7 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -218,6 +219,7 @@ BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData

#105

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#104)

Re: Proposal : For Auto-Prewarm.

On Wed, Aug 16, 2017 at 2:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

[ new patch ]

It's quite possible that in making all of these changes I've
introduced some bugs, so I think this needs some testing and review.
It's also possible that some of the changes that I made are actually
not improvements and should be reverted, but it's always hard to tell
that about your own code. Anyway, please see the attached version.

Sorry, Robert, I was on vacation, so I could not pick this up immediately. I
have been testing and reviewing the patch, and I found the following
issues.

1. Hang in apw_detach_shmem.
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+   LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+   if (apw_state->pid_using_dumpfile == MyProcPid)
+       apw_state->pid_using_dumpfile = InvalidPid;
+   if (apw_state->bgworker_pid == MyProcPid)
+       apw_state->bgworker_pid = InvalidPid;
+   LWLockRelease(&apw_state->lock);
+}

The reason is that we might already hold apw_state->lock when we error
out and jump to apw_detach_shmem, so we should not try to take the lock
again. For example, in autoprewarm_dump_now(), apw_dump_now() will error
out while holding the lock if the bgworker is already using the dump
file.

=======
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+    int64 num_blocks;
+
+    apw_init_shmem();
+
+    PG_TRY();
+    {
+         num_blocks = apw_dump_now(false, true);
+    }
+    PG_CATCH();
+    {
+         apw_detach_shmem(0, 0);
+         PG_RE_THROW();
+    }
+    PG_END_TRY();
+
+    PG_RETURN_INT64(num_blocks);
+}

=======
+ LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+ if (apw_state->pid_using_dumpfile == InvalidPid)
+     apw_state->pid_using_dumpfile = MyProcPid;
+ else
+ {
+     if (!is_bgworker)
+         ereport(ERROR,
+                    (errmsg("could not perform block dump because dump file is being used by PID %d",
+                     apw_state->pid_using_dumpfile)));

This attempt to take the lock again hangs the autoprewarm module. I think
there is no need to take the lock while we reset those variables, as we
reset them only if we have set them ourselves.
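
To illustrate the hang described above, here is a minimal sketch in plain
C. It assumes a toy, non-reentrant lock: ToyLock and the toy_* functions
are hypothetical names for illustration, not the LWLock API. A second
acquire by the holder reports failure at the point where a real
non-reentrant lock would wait forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-reentrant lock; not the real LWLock API. */
typedef struct ToyLock
{
	bool		held;
} ToyLock;

/*
 * Returns true if the lock was taken.  Returns false in the situation
 * where a real non-reentrant lock would block forever: it is already held.
 */
static bool
toy_lock_acquire(ToyLock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void
toy_lock_release(ToyLock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Cleanup that takes the lock itself, like apw_detach_shmem in the patch. */
static bool
toy_cleanup(ToyLock *lock)
{
	if (!toy_lock_acquire(lock))
		return false;			/* a real lock would hang here */
	toy_lock_release(lock);
	return true;
}

/*
 * The problematic ordering: an error is raised while the lock is held, and
 * control goes straight to the cleanup routine.  Returns whether cleanup
 * was able to proceed.
 */
static bool
toy_error_under_lock(ToyLock *lock)
{
	toy_lock_acquire(lock);
	/* error raised here, before any release */
	return toy_cleanup(lock);
}
```

With a fresh lock, toy_error_under_lock() reports that cleanup could not
proceed, while toy_cleanup() called with the lock free succeeds.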

2. I also found one issue that was my own mistake in my previous patch 19.
In apw_dump_now() I missed calling FreeFile() on the first write error,
whereas in the other cases I already call it.
ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ unlink(transient_dump_file_path);
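
The intended ordering on a write failure (close the stream, remove the
transient file, then report with the original errno) can be sketched with
plain stdio. This is only an illustration under assumptions: write_header
and its simulate_failure hook are hypothetical, and fopen/fclose/remove
stand in for AllocateFile/FreeFile/unlink.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write the "<<count>>" header line to a transient file.  On a write
 * failure, close the stream first, then remove the file, and restore the
 * original errno so the caller can report it.
 *
 * simulate_failure is a hypothetical test hook, not part of the patch.
 */
static int
write_header(const char *path, long long count, int simulate_failure)
{
	FILE	   *file = fopen(path, "w");
	int			ret;

	if (file == NULL)
		return -1;

	if (simulate_failure)
	{
		errno = ENOSPC;
		ret = -1;
	}
	else
		ret = fprintf(file, "<<%lld>>\n", count);

	if (ret < 0)
	{
		int			save_errno = errno;

		fclose(file);			/* the step that was missing */
		remove(path);
		errno = save_errno;
		return -1;
	}

	if (fclose(file) != 0)
	{
		int			save_errno = errno;

		remove(path);
		errno = save_errno;
		return -1;
	}
	return 0;
}
```

Saving errno before the cleanup calls matters because fclose() and
remove() may overwrite it.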

Other than this, the patch works as it did previously. If you think my
proposed fix for the hang (not taking the lock) is right, I will produce
a patch for it.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


#106

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Mithun Cy (#105)

1 attachment(s)

Re: Proposal : For Auto-Prewarm.

On Fri, Aug 18, 2017 at 2:23 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

1. Hang in apw_detach_shmem.
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+   LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+   if (apw_state->pid_using_dumpfile == MyProcPid)
+       apw_state->pid_using_dumpfile = InvalidPid;
+   if (apw_state->bgworker_pid == MyProcPid)
+       apw_state->bgworker_pid = InvalidPid;
+   LWLockRelease(&apw_state->lock);
+}
The reason is that we might already hold apw_state->lock when we error
out and jump to apw_detach_shmem, so we should not try to take the lock
again. For example, in autoprewarm_dump_now(), apw_dump_now() will error
out while holding the lock if the bgworker is already using the dump
file.

Ah, good catch. While I agree that there is probably no great harm
from skipping the lock here, I think it would be better to just avoid
throwing an error while we hold the lock. I think apw_dump_now() is
the only place where that could happen, and in the attached version,
I've fixed it so it doesn't do that any more. Independent of the
correctness issue, I think the code is easier to read this way.
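
The shape of that change, as a rough sketch in plain C: decide under the
lock, release it, and only then report. ToyState, TOY_INVALID_PID and
toy_claim_dumpfile are hypothetical stand-ins for the shared state,
InvalidPid and the start of apw_dump_now(); a boolean field models the
lock.

```c
#include <stdbool.h>

#define TOY_INVALID_PID (-1)

typedef struct ToyState
{
	bool		lock_held;		/* stand-in for the LWLock */
	int			pid_using_dumpfile;
} ToyState;

/*
 * Try to claim the dump file for my_pid.  Returns TOY_INVALID_PID on
 * success, otherwise the PID of the current owner.  The lock is always
 * released before returning, so the caller can raise an error safely and
 * can report the owner it actually observed.
 */
static int
toy_claim_dumpfile(ToyState *state, int my_pid)
{
	int			owner;

	state->lock_held = true;	/* acquire */
	owner = state->pid_using_dumpfile;
	if (owner == TOY_INVALID_PID)
		state->pid_using_dumpfile = my_pid;
	state->lock_held = false;	/* release */

	return owner;
}
```

A caller would compare the result with TOY_INVALID_PID and raise its
error afterwards, using the returned owner in the message instead of
re-reading the shared field.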

I also realize that it's not formally sufficient to use
PG_TRY()/PG_CATCH() here, because a FATAL would leave us in a bad
state. Changed to PG_ENSURE_ERROR_CLEANUP().
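
A toy model of the difference, assuming only that a FATAL exits the
process through its exit-time callbacks without ever reaching a catch
block. All names here are hypothetical; this is not the PostgreSQL macro
implementation, and the single callback slot is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*toy_callback) (void *arg);

/* One-slot stand-in for the list of exit-time callbacks. */
static toy_callback exit_callback = NULL;
static void *exit_callback_arg = NULL;

static void
toy_register_exit_callback(toy_callback fn, void *arg)
{
	exit_callback = fn;
	exit_callback_arg = arg;
}

static void
toy_cancel_exit_callback(void)
{
	exit_callback = NULL;
	exit_callback_arg = NULL;
}

/* Toy FATAL: runs exit callbacks only; no catch block is ever entered. */
static void
toy_fatal_exit(void)
{
	if (exit_callback != NULL)
		exit_callback(exit_callback_arg);
}

/* Cleanup action: clear our PID from the (toy) shared state. */
static void
toy_clear_pid(void *arg)
{
	*(int *) arg = -1;
}

/* Returns the PID left behind in shared state after a toy FATAL. */
static int
toy_fatal_during_dump(bool use_exit_callback)
{
	int			shared_pid = 1234;

	if (use_exit_callback)
		toy_register_exit_callback(toy_clear_pid, &shared_pid);

	toy_fatal_exit();			/* a catch block here would never run */

	toy_cancel_exit_callback();
	return shared_pid;
}
```

Without the registered callback the stale PID survives the toy FATAL;
with it, the PID is cleared.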

2. I also found one issue that was my own mistake in my previous patch 19.
In apw_dump_now() I missed calling FreeFile() on the first write error,
whereas in the other cases I already call it.
ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ unlink(transient_dump_file_path);

Changed in the attached version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

autoprewarm-rmh-v2.patch (application/octet-stream)

diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
index 7ad941e72b..88580d1118 100644
--- a/contrib/pg_prewarm/Makefile
+++ b/contrib/pg_prewarm/Makefile
@@ -1,10 +1,10 @@
 # contrib/pg_prewarm/Makefile
 
 MODULE_big = pg_prewarm
-OBJS = pg_prewarm.o $(WIN32RES)
+OBJS = pg_prewarm.o autoprewarm.o $(WIN32RES)
 
 EXTENSION = pg_prewarm
-DATA = pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
+DATA = pg_prewarm--1.1--1.2.sql pg_prewarm--1.1.sql pg_prewarm--1.0--1.1.sql
 PGFILEDESC = "pg_prewarm - preload relation data into system buffer cache"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
new file mode 100644
index 0000000000..cc0350e6d6
--- /dev/null
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -0,0 +1,924 @@
+/*-------------------------------------------------------------------------
+ *
+ * autoprewarm.c
+ *		Periodically dump information about the blocks present in
+ *		shared_buffers, and reload them on server restart.
+ *
+ *		Due to locking considerations, we can't actually begin prewarming
+ *		until the server reaches a consistent state.  We need the catalogs
+ *		to be consistent so that we can figure out which relation to lock,
+ *		and we need to lock the relations so that we don't try to prewarm
+ *		pages from a relation that is in the process of being dropped.
+ *
+ *		While prewarming, autoprewarm will use two workers.  There's a
+ *		master worker that reads and sorts the list of blocks to be
+ *		prewarmed and then launches a per-database worker for each
+ *		relevant database in turn.  The former keeps running after the
+ *		initial prewarm is complete to update the dump file periodically.
+ *
+ *	Copyright (c) 2016-2017, PostgreSQL Global Development Group
+ *
+ *	IDENTIFICATION
+ *		contrib/pg_prewarm/autoprewarm.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+#include <unistd.h>
+
+#include "access/heapam.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "storage/buf_internals.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/latch.h"
+#include "storage/lwlock.h"
+#include "storage/proc.h"
+#include "storage/procsignal.h"
+#include "storage/shmem.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/acl.h"
+#include "utils/guc.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relfilenodemap.h"
+#include "utils/resowner.h"
+
+#define AUTOPREWARM_FILE "autoprewarm.blocks"
+
+/* Metadata for each block we dump. */
+typedef struct BlockInfoRecord
+{
+	Oid			database;
+	Oid			tablespace;
+	Oid			filenode;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+} BlockInfoRecord;
+
+/* Shared state information for autoprewarm bgworker. */
+typedef struct AutoPrewarmSharedState
+{
+	LWLock		lock;			/* mutual exclusion */
+	pid_t		bgworker_pid;	/* for main bgworker */
+	pid_t		pid_using_dumpfile; /* for autoprewarm or block dump */
+
+	/* Following items are for communication with per-database worker */
+	dsm_handle	block_info_handle;
+	Oid			database;
+	int64		prewarm_start_idx;
+	int64		prewarm_stop_idx;
+	int64		prewarmed_blocks;
+} AutoPrewarmSharedState;
+
+void		_PG_init(void);
+void		autoprewarm_main(Datum main_arg);
+void		autoprewarm_database_main(Datum main_arg);
+
+PG_FUNCTION_INFO_V1(autoprewarm_start_worker);
+PG_FUNCTION_INFO_V1(autoprewarm_dump_now);
+
+static void apw_load_buffers(void);
+static int64 apw_dump_now(bool is_bgworker, bool dump_unlogged);
+static void apw_start_master_worker(void);
+static void apw_start_database_worker(void);
+static bool apw_init_shmem(void);
+static void apw_detach_shmem(int code, Datum arg);
+static int	apw_compare_blockinfo(const void *p, const void *q);
+static void apw_sigterm_handler(SIGNAL_ARGS);
+static void apw_sighup_handler(SIGNAL_ARGS);
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_sigterm = false;
+static volatile sig_atomic_t got_sighup = false;
+
+/* Pointer to shared-memory state. */
+static AutoPrewarmSharedState *apw_state = NULL;
+
+/* GUC variables. */
+static bool autoprewarm = true; /* start worker? */
+static int	autoprewarm_interval;	/* dump interval */
+
+/*
+ * Module load callback.
+ */
+void
+_PG_init(void)
+{
+	DefineCustomIntVariable("pg_prewarm.autoprewarm_interval",
+							"Sets the interval between dumps of shared buffers",
+							"If set to zero, time-based dumping is disabled.",
+							&autoprewarm_interval,
+							300,
+							0, INT_MAX / 1000,
+							PGC_SIGHUP,
+							GUC_UNIT_S,
+							NULL,
+							NULL,
+							NULL);
+
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	/* can't define PGC_POSTMASTER variable after startup */
+	DefineCustomBoolVariable("pg_prewarm.autoprewarm",
+							 "Starts the autoprewarm worker.",
+							 NULL,
+							 &autoprewarm,
+							 true,
+							 PGC_POSTMASTER,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	EmitWarningsOnPlaceholders("pg_prewarm");
+
+	RequestAddinShmemSpace(MAXALIGN(sizeof(AutoPrewarmSharedState)));
+
+	/* Register autoprewarm worker, if enabled. */
+	if (autoprewarm)
+		apw_start_master_worker();
+}
+
+/*
+ * Main entry point for the master autoprewarm process.  Per-database workers
+ * have a separate entry point.
+ */
+void
+autoprewarm_main(Datum main_arg)
+{
+	bool		first_time = true;
+	TimestampTz last_dump_time = 0;
+
+	/* Establish signal handlers; once that's done, unblock signals. */
+	pqsignal(SIGTERM, apw_sigterm_handler);
+	pqsignal(SIGHUP, apw_sighup_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	BackgroundWorkerUnblockSignals();
+
+	/* Create (if necessary) and attach to our shared memory area. */
+	if (apw_init_shmem())
+		first_time = false;
+
+	/* Set on-detach hook so that our PID will be cleared on exit. */
+	on_shmem_exit(apw_detach_shmem, 0);
+
+	/*
+	 * Store our PID in the shared memory area --- unless there's already
+	 * another worker running, in which case just exit.
+	 */
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->bgworker_pid != InvalidPid)
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("autoprewarm worker is already running under PID %d",
+						apw_state->bgworker_pid)));
+		return;
+	}
+	apw_state->bgworker_pid = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	/*
+	 * Preload buffers from the dump file only if we just created the shared
+	 * memory region.  Otherwise, it's either already been done or shouldn't
+	 * be done - e.g. because the old dump file has been overwritten since the
+	 * server was started.
+	 *
+	 * There's not much point in performing a dump immediately after we finish
+	 * preloading; so, if we do end up preloading, consider the last dump time
+	 * to be equal to the current time.
+	 */
+	if (first_time)
+	{
+		apw_load_buffers();
+		last_dump_time = GetCurrentTimestamp();
+	}
+
+	/* Periodically dump buffers until terminated. */
+	while (!got_sigterm)
+	{
+		int			rc;
+
+		/* In case of a SIGHUP, just reload the configuration. */
+		if (got_sighup)
+		{
+			got_sighup = false;
+			ProcessConfigFile(PGC_SIGHUP);
+		}
+
+		if (autoprewarm_interval <= 0)
+		{
+			/* We're only dumping at shutdown, so just wait forever. */
+			rc = WaitLatch(&MyProc->procLatch,
+						   WL_LATCH_SET | WL_POSTMASTER_DEATH,
+						   -1L,
+						   PG_WAIT_EXTENSION);
+		}
+		else
+		{
+			long		delay_in_ms = 0;
+			TimestampTz next_dump_time = 0;
+			long		secs = 0;
+			int			usecs = 0;
+
+			/* Compute the next dump time. */
+			next_dump_time =
+				TimestampTzPlusMilliseconds(last_dump_time,
+											autoprewarm_interval * 1000);
+			TimestampDifference(GetCurrentTimestamp(), next_dump_time,
+								&secs, &usecs);
+			delay_in_ms = secs * 1000 + (usecs / 1000);
+
+			/* Perform a dump if it's time. */
+			if (delay_in_ms <= 0)
+			{
+				last_dump_time = GetCurrentTimestamp();
+				apw_dump_now(true, false);
+				continue;
+			}
+
+			/* Sleep until the next dump time. */
+			rc = WaitLatch(&MyProc->procLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   delay_in_ms,
+						   PG_WAIT_EXTENSION);
+		}
+
+		/* Reset the latch, bail out if postmaster died, otherwise loop. */
+		ResetLatch(&MyProc->procLatch);
+		if (rc & WL_POSTMASTER_DEATH)
+			proc_exit(1);
+	}
+
+	/*
+	 * Dump one last time.  We assume this is probably the result of a system
+	 * shutdown, although it's possible that we've merely been terminated.
+	 */
+	apw_dump_now(true, true);
+}
+
+/*
+ * Read the dump file and launch per-database workers one at a time to
+ * prewarm the buffers found there.
+ */
+static void
+apw_load_buffers(void)
+{
+	FILE	   *file = NULL;
+	int64		num_elements,
+				i;
+	BlockInfoRecord *blkinfo;
+	dsm_segment *seg;
+
+	/*
+	 * Skip the prewarm if the dump file is in use; otherwise, prevent any
+	 * other process from writing it while we're using it.
+	 */
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	else
+	{
+		LWLockRelease(&apw_state->lock);
+		ereport(LOG,
+				(errmsg("skipping prewarm because block dump file is being written by PID %d",
+						apw_state->pid_using_dumpfile)));
+		return;
+	}
+	LWLockRelease(&apw_state->lock);
+
+	/*
+	 * Open the block dump file.  Exit quietly if it doesn't exist, but report
+	 * any other error.
+	 */
+	file = AllocateFile(AUTOPREWARM_FILE, "r");
+	if (!file)
+	{
+		if (errno == ENOENT)
+		{
+			LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+			apw_state->pid_using_dumpfile = InvalidPid;
+			LWLockRelease(&apw_state->lock);
+			return;				/* No file to load. */
+		}
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+	}
+
+	/* First line of the file is a record count. */
+	if (fscanf(file, "<<" INT64_FORMAT ">>\n", &num_elements) != 1)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from file \"%s\": %m",
+						AUTOPREWARM_FILE)));
+
+	/* Allocate a dynamic shared memory segment to store the record data. */
+	seg = dsm_create(sizeof(BlockInfoRecord) * num_elements, 0);
+	blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
+
+	/* Read records, one per line. */
+	for (i = 0; i < num_elements; i++)
+	{
+		unsigned	forknum;
+
+		if (fscanf(file, "%u,%u,%u,%u,%u\n", &blkinfo[i].database,
+				   &blkinfo[i].tablespace, &blkinfo[i].filenode,
+				   &forknum, &blkinfo[i].blocknum) != 5)
+			ereport(ERROR,
+					(errmsg("autoprewarm block dump file is corrupted at line " INT64_FORMAT,
+							i + 1)));
+		blkinfo[i].forknum = forknum;
+	}
+
+	FreeFile(file);
+
+	/* Sort the blocks to be loaded. */
+	pg_qsort(blkinfo, num_elements, sizeof(BlockInfoRecord),
+			 apw_compare_blockinfo);
+
+	/* Populate shared memory state. */
+	apw_state->block_info_handle = dsm_segment_handle(seg);
+	apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx = 0;
+	apw_state->prewarmed_blocks = 0;
+
+	/* Get the info position of the first block of the next database. */
+	while (apw_state->prewarm_start_idx < num_elements)
+	{
+		uint32		i = apw_state->prewarm_start_idx;
+		Oid			current_db = blkinfo[i].database;
+
+		/*
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
+		 * not belong to this database.
+		 */
+		i++;
+		while (i < num_elements)
+		{
+			if (current_db != blkinfo[i].database)
+			{
+				/*
+				 * Combine BlockInfoRecords for global objects with those of
+				 * the database.
+				 */
+				if (current_db != InvalidOid)
+					break;
+				current_db = blkinfo[i].database;
+			}
+
+			i++;
+		}
+
+		/*
+		 * If we reach this point with current_db == InvalidOid, then only
+		 * BlockInfoRecords belonging to global objects exist.  We can't
+		 * prewarm without a database connection, so just bail out.
+		 */
+		if (current_db == InvalidOid)
+			break;
+
+		/* Configure stop point and database for next per-database worker. */
+		apw_state->prewarm_stop_idx = i;
+		apw_state->database = current_db;
+		Assert(apw_state->prewarm_start_idx < apw_state->prewarm_stop_idx);
+
+		/* If we've run out of free buffers, don't launch another worker. */
+		if (!have_free_buffer())
+			break;
+
+		/*
+		 * Start a per-database worker to load blocks for this database; this
+		 * function will return once the per-database worker exits.
+		 */
+		apw_start_database_worker();
+
+		/* Prepare for next database. */
+		apw_state->prewarm_start_idx = apw_state->prewarm_stop_idx;
+	}
+
+	/* Clean up. */
+	dsm_detach(seg);
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	apw_state->block_info_handle = DSM_HANDLE_INVALID;
+	apw_state->pid_using_dumpfile = InvalidPid;
+	LWLockRelease(&apw_state->lock);
+
+	/* Report our success. */
+	ereport(LOG,
+			(errmsg("autoprewarm successfully prewarmed " INT64_FORMAT
+					" of " INT64_FORMAT " previously-loaded blocks",
+					apw_state->prewarmed_blocks, num_elements)));
+}
+
+/*
+ * Prewarm all blocks for one database (and possibly also global objects, if
+ * those got grouped with this database).
+ */
+void
+autoprewarm_database_main(Datum main_arg)
+{
+	uint32		pos;
+	BlockInfoRecord *block_info;
+	Relation	rel = NULL;
+	BlockNumber nblocks = 0;
+	BlockInfoRecord *old_blk = NULL;
+	dsm_segment *seg;
+
+	/* Establish signal handlers; once that's done, unblock signals. */
+	pqsignal(SIGTERM, die);
+	BackgroundWorkerUnblockSignals();
+
+	/* Connect to correct database and get block information. */
+	apw_init_shmem();
+	seg = dsm_attach(apw_state->block_info_handle);
+	if (seg == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("could not map dynamic shared memory segment")));
+	BackgroundWorkerInitializeConnectionByOid(apw_state->database, InvalidOid);
+	block_info = (BlockInfoRecord *) dsm_segment_address(seg);
+	pos = apw_state->prewarm_start_idx;
+
+	/*
+	 * Loop until we run out of blocks to prewarm or until we run out of free
+	 * buffers.
+	 */
+	while (pos < apw_state->prewarm_stop_idx && have_free_buffer())
+	{
+		BlockInfoRecord *blk = &block_info[pos++];
+		Buffer		buf;
+
+		CHECK_FOR_INTERRUPTS();
+
+		/*
+		 * Quit if we've reached records for another database. If previous
+		 * blocks are of some global objects, then continue pre-warming.
+		 */
+		if (old_blk != NULL && old_blk->database != blk->database &&
+			old_blk->database != 0)
+			break;
+
+		/*
+		 * As soon as we encounter a block of a new relation, close the old
+		 * relation. Note that rel will be NULL if try_relation_open failed
+		 * previously; in that case, there is nothing to close.
+		 */
+		if (old_blk != NULL && old_blk->filenode != blk->filenode &&
+			rel != NULL)
+		{
+			relation_close(rel, AccessShareLock);
+			rel = NULL;
+			CommitTransactionCommand();
+		}
+
+		/*
+		 * Try to open each new relation, but only once, when we first
+		 * encounter it. If it's been dropped, skip the associated blocks.
+		 */
+		if (old_blk == NULL || old_blk->filenode != blk->filenode)
+		{
+			Oid			reloid;
+
+			Assert(rel == NULL);
+			StartTransactionCommand();
+			reloid = RelidByRelfilenode(blk->tablespace, blk->filenode);
+			if (OidIsValid(reloid))
+				rel = try_relation_open(reloid, AccessShareLock);
+
+			if (!rel)
+				CommitTransactionCommand();
+		}
+		if (!rel)
+		{
+			old_blk = blk;
+			continue;
+		}
+
+		/* Once per fork, check for fork existence and size. */
+		if (old_blk == NULL ||
+			old_blk->filenode != blk->filenode ||
+			old_blk->forknum != blk->forknum)
+		{
+			RelationOpenSmgr(rel);
+
+			/*
+			 * smgrexists is not safe for illegal forknum, hence check whether
+			 * the passed forknum is valid before using it in smgrexists.
+			 */
+			if (blk->forknum > InvalidForkNumber &&
+				blk->forknum <= MAX_FORKNUM &&
+				smgrexists(rel->rd_smgr, blk->forknum))
+				nblocks = RelationGetNumberOfBlocksInFork(rel, blk->forknum);
+			else
+				nblocks = 0;
+		}
+
+		/* Check whether blocknum is valid and within fork file size. */
+		if (blk->blocknum >= nblocks)
+		{
+			/* Move to next forknum. */
+			old_blk = blk;
+			continue;
+		}
+
+		/* Prewarm buffer. */
+		buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
+								 NULL);
+		if (BufferIsValid(buf))
+		{
+			apw_state->prewarmed_blocks++;
+			ReleaseBuffer(buf);
+		}
+
+		old_blk = blk;
+	}
+
+	dsm_detach(seg);
+
+	/* Release lock on previous relation. */
+	if (rel)
+	{
+		relation_close(rel, AccessShareLock);
+		CommitTransactionCommand();
+	}
+}
+
+/*
+ * Dump information on blocks in shared buffers.  We use a text format here
+ * so that it's easy to understand and even change the file contents if
+ * necessary.
+ */
+static int64
+apw_dump_now(bool is_bgworker, bool dump_unlogged)
+{
+	uint32		i;
+	int			ret;
+	int64		num_blocks;
+	BlockInfoRecord *block_info_array;
+	BufferDesc *bufHdr;
+	FILE	   *file;
+	char		transient_dump_file_path[MAXPGPATH];
+	pid_t		pid;
+
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	pid = apw_state->pid_using_dumpfile;
+	if (apw_state->pid_using_dumpfile == InvalidPid)
+		apw_state->pid_using_dumpfile = MyProcPid;
+	LWLockRelease(&apw_state->lock);
+
+	if (pid != InvalidPid)
+	{
+		if (!is_bgworker)
+			ereport(ERROR,
+					(errmsg("could not perform block dump because dump file is being used by PID %d",
+							(int) pid)));
+
+		ereport(LOG,
+				(errmsg("skipping block dump because it is already being performed by PID %d",
+						(int) pid)));
+		return 0;
+	}
+
+	block_info_array =
+		(BlockInfoRecord *) palloc(sizeof(BlockInfoRecord) * NBuffers);
+
+	for (num_blocks = 0, i = 0; i < NBuffers; i++)
+	{
+		uint32		buf_state;
+
+		CHECK_FOR_INTERRUPTS();
+
+		bufHdr = GetBufferDescriptor(i);
+
+		/* Lock each buffer header before inspecting. */
+		buf_state = LockBufHdr(bufHdr);
+
+		/*
+		 * Unlogged tables will be automatically truncated after a crash or
+		 * unclean shutdown. In such cases we need not prewarm them. Dump them
+		 * only if requested by caller.
+		 */
+		if (buf_state & BM_TAG_VALID &&
+			((buf_state & BM_PERMANENT) || dump_unlogged))
+		{
+			block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
+			block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
+			block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
+			block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
+			block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
+			++num_blocks;
+		}
+
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+
+	snprintf(transient_dump_file_path, MAXPGPATH, "%s.tmp", AUTOPREWARM_FILE);
+	file = AllocateFile(transient_dump_file_path, "w");
+	if (!file)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m",
+						transient_dump_file_path)));
+
+	ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+	if (ret < 0)
+	{
+		int			save_errno = errno;
+
+		FreeFile(file);
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not write to file \"%s\" : %m",
+						transient_dump_file_path)));
+	}
+
+	for (i = 0; i < num_blocks; i++)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		ret = fprintf(file, "%u,%u,%u,%u,%u\n",
+					  block_info_array[i].database,
+					  block_info_array[i].tablespace,
+					  block_info_array[i].filenode,
+					  (uint32) block_info_array[i].forknum,
+					  block_info_array[i].blocknum);
+		if (ret < 0)
+		{
+			int			save_errno = errno;
+
+			FreeFile(file);
+			unlink(transient_dump_file_path);
+			errno = save_errno;
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not write to file \"%s\": %m",
+							transient_dump_file_path)));
+		}
+	}
+
+	pfree(block_info_array);
+
+	/*
+	 * Rename transient_dump_file_path to AUTOPREWARM_FILE to make things
+	 * permanent.
+	 */
+	ret = FreeFile(file);
+	if (ret != 0)
+	{
+		int			save_errno = errno;
+
+		unlink(transient_dump_file_path);
+		errno = save_errno;
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						transient_dump_file_path)));
+	}
+
+	(void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
+	apw_state->pid_using_dumpfile = InvalidPid;
+
+	ereport(DEBUG1,
+			(errmsg("wrote block details for " INT64_FORMAT " blocks",
+					num_blocks)));
+	return num_blocks;
+}
+
+/*
+ * SQL-callable function to launch autoprewarm.
+ */
+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+	pid_t		pid;
+
+	if (!autoprewarm)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("autoprewarm is disabled")));
+
+	apw_init_shmem();
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	pid = apw_state->bgworker_pid;
+	LWLockRelease(&apw_state->lock);
+
+	if (pid != InvalidPid)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("autoprewarm worker is already running under PID %d",
+						pid)));
+
+	apw_start_master_worker();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * SQL-callable function to perform an immediate block dump.
+ */
+Datum
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+	int64		num_blocks;
+
+	apw_init_shmem();
+
+	PG_ENSURE_ERROR_CLEANUP(apw_detach_shmem, 0);
+	{
+		num_blocks = apw_dump_now(false, true);
+	}
+	PG_END_ENSURE_ERROR_CLEANUP(apw_detach_shmem, 0);
+
+	PG_RETURN_INT64(num_blocks);
+}
+
+/*
+ * Allocate and initialize autoprewarm related shared memory, if not already
+ * done, and set up backend-local pointer to that state.  Returns true if an
+ * existing shared memory segment was found.
+ */
+static bool
+apw_init_shmem(void)
+{
+	bool		found;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	apw_state = ShmemInitStruct("autoprewarm",
+								sizeof(AutoPrewarmSharedState),
+								&found);
+	if (!found)
+	{
+		/* First time through ... */
+		LWLockInitialize(&apw_state->lock, LWLockNewTrancheId());
+		apw_state->bgworker_pid = InvalidPid;
+		apw_state->pid_using_dumpfile = InvalidPid;
+	}
+	LWLockRelease(AddinShmemInitLock);
+
+	return found;
+}
+
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+	LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+	if (apw_state->pid_using_dumpfile == MyProcPid)
+		apw_state->pid_using_dumpfile = InvalidPid;
+	if (apw_state->bgworker_pid == MyProcPid)
+		apw_state->bgworker_pid = InvalidPid;
+	LWLockRelease(&apw_state->lock);
+}
+
+/*
+ * Start autoprewarm master worker process.
+ */
+static void
+apw_start_master_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	MemSet(&worker, 0, sizeof(BackgroundWorker));
+	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	strcpy(worker.bgw_library_name, "pg_prewarm");
+	strcpy(worker.bgw_function_name, "autoprewarm_main");
+	strcpy(worker.bgw_name, "autoprewarm");
+
+	if (process_shared_preload_libraries_in_progress)
+	{
+		RegisterBackgroundWorker(&worker);
+		return;
+	}
+
+	/* must set notify PID to wait for startup */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not register background process"),
+				 errhint("You may need to increase max_worker_processes.")));
+
+	status = WaitForBackgroundWorkerStartup(handle, &pid);
+	if (status != BGWH_STARTED)
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("could not start background process"),
+				 errhint("More details may be available in the server log.")));
+}
+
+/*
+ * Start autoprewarm per-database worker process.
+ */
+static void
+apw_start_database_worker(void)
+{
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *handle;
+
+	MemSet(&worker, 0, sizeof(BackgroundWorker));
+	worker.bgw_flags =
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	strcpy(worker.bgw_library_name, "pg_prewarm");
+	strcpy(worker.bgw_function_name, "autoprewarm_database_main");
+	strcpy(worker.bgw_name, "autoprewarm");
+
+	/* must set notify PID to wait for shutdown */
+	worker.bgw_notify_pid = MyProcPid;
+
+	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+				 errmsg("registering dynamic bgworker autoprewarm failed"),
+				 errhint("Consider increasing configuration parameter \"max_worker_processes\".")));
+
+	/*
+	 * Ignore return value; if it fails, postmaster has died, but we have
+	 * checks for that elsewhere.
+	 */
+	WaitForBackgroundWorkerShutdown(handle);
+}
+
+/* Compare one member of a and b; return early if the values differ. */
+#define cmp_member_elem(fld)	\
+do { \
+	if (a->fld < b->fld)		\
+		return -1;				\
+	else if (a->fld > b->fld)	\
+		return 1;				\
+} while(0)
+
+/*
+ * apw_compare_blockinfo
+ *
+ * We depend on all records for a particular database being consecutive
+ * in the dump file; each per-database worker will preload blocks until
+ * it sees a block for some other database.  Sorting by tablespace,
+ * filenode, forknum, and blocknum isn't critical for correctness, but
+ * helps us get a sequential I/O pattern.
+ */
+static int
+apw_compare_blockinfo(const void *p, const void *q)
+{
+	BlockInfoRecord *a = (BlockInfoRecord *) p;
+	BlockInfoRecord *b = (BlockInfoRecord *) q;
+
+	cmp_member_elem(database);
+	cmp_member_elem(tablespace);
+	cmp_member_elem(filenode);
+	cmp_member_elem(forknum);
+	cmp_member_elem(blocknum);
+
+	return 0;
+}
+
+/*
+ * Signal handler for SIGTERM
+ */
+static void
+apw_sigterm_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sigterm = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * Signal handler for SIGHUP
+ */
+static void
+apw_sighup_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	got_sighup = true;
+
+	if (MyProc)
+		SetLatch(&MyProc->procLatch);
+
+	errno = save_errno;
+}
diff --git a/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
new file mode 100644
index 0000000000..2381c06eb9
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_prewarm UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION autoprewarm_start_worker()
+RETURNS VOID STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_start_worker'
+LANGUAGE C;
+
+CREATE FUNCTION autoprewarm_dump_now()
+RETURNS pg_catalog.int8 STRICT
+AS 'MODULE_PATHNAME', 'autoprewarm_dump_now'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.control b/contrib/pg_prewarm/pg_prewarm.control
index cf2fb92bed..40e3add481 100644
--- a/contrib/pg_prewarm/pg_prewarm.control
+++ b/contrib/pg_prewarm/pg_prewarm.control
@@ -1,5 +1,5 @@
 # pg_prewarm extension
 comment = 'prewarm relation data'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_prewarm'
 relocatable = true
diff --git a/doc/src/sgml/pgprewarm.sgml b/doc/src/sgml/pgprewarm.sgml
index c090401eca..c6b94a8b72 100644
--- a/doc/src/sgml/pgprewarm.sgml
+++ b/doc/src/sgml/pgprewarm.sgml
@@ -10,7 +10,13 @@
  <para>
   The <filename>pg_prewarm</filename> module provides a convenient way
   to load relation data into either the operating system buffer cache
-  or the <productname>PostgreSQL</productname> buffer cache.
+  or the <productname>PostgreSQL</productname> buffer cache.  Prewarming
+  can be performed manually using the <function>pg_prewarm</> function,
+  or can be performed automatically by including <literal>pg_prewarm</> in
+  <xref linkend="guc-shared-preload-libraries">.  In the latter case, the
+  system will run a background worker which periodically records the contents
+  of shared buffers in a file called <filename>autoprewarm.blocks</> and
+  will, using 2 background workers, reload those same blocks after a restart.
  </para>
 
  <sect2>
@@ -55,6 +61,67 @@ pg_prewarm(regclass, mode text default 'buffer', fork text default 'main',
    cache. For these reasons, prewarming is typically most useful at startup,
    when caches are largely empty.
   </para>
+
+<synopsis>
+autoprewarm_start_worker() RETURNS void
+</synopsis>
+
+  <para>
+   Launch the main autoprewarm worker.  This will normally happen
+   automatically, but is useful if automatic prewarm was not configured at
+   server startup time and you wish to start up the worker at a later time.
+  </para>
+
+<synopsis>
+autoprewarm_dump_now() RETURNS int8
+</synopsis>
+
+  <para>
+   Update <filename>autoprewarm.blocks</> immediately.  This may be useful
+   if the autoprewarm worker is not running but you anticipate running it
+   after the next restart.  The return value is the number of records written
+   to <filename>autoprewarm.blocks</>.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Configuration Parameters</title>
+
+ <variablelist>
+   <varlistentry>
+    <term>
+     <varname>pg_prewarm.autoprewarm</varname> (<type>boolean</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      Controls whether the server should run the autoprewarm worker. This is
+      on by default. This parameter can only be set at server start.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry>
+   <term>
+     <varname>pg_prewarm.autoprewarm_interval</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>pg_prewarm.autoprewarm_interval</> configuration parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      This is the interval between updates to <filename>autoprewarm.blocks</>.
+      The default is 300 seconds. If set to 0, the file will not be
+      dumped at regular intervals, but only when the server is shut down.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
  </sect2>
 
  <sect2>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 9d8ae6ae8e..f033323cff 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -169,6 +169,23 @@ ClockSweepTick(void)
 }
 
 /*
+ * have_free_buffer -- a lockless check to see if there is a free buffer in
+ *					   buffer pool.
+ *
+ * A true result can become stale as soon as other operations consume the free
+ * buffers, so callers that strictly need a free buffer should not rely on
+ * this check.
+ */
+bool
+have_free_buffer(void)
+{
+	if (StrategyControl->firstFreeBuffer >= 0)
+		return true;
+	else
+		return false;
+}
+
+/*
  * StrategyGetBuffer
  *
  *	Called by the bufmgr to get the next candidate buffer to use in
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index b768b6fc96..300adfcf9e 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -317,6 +317,7 @@ extern void StrategyNotifyBgWriter(int bgwprocno);
 
 extern Size StrategyShmemSize(void);
 extern void StrategyInitialize(bool init);
+extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8166d86ca1..a4ace383fa 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -138,6 +138,7 @@ AttrDefault
 AttrNumber
 AttributeOpts
 AuthRequest
+AutoPrewarmSharedState
 AutoVacOpts
 AutoVacuumShmemStruct
 AutoVacuumWorkItem
@@ -218,6 +219,7 @@ BlobInfo
 Block
 BlockId
 BlockIdData
+BlockInfoRecord
 BlockNumber
 BlockSampler
 BlockSamplerData
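
As an aside on the ordering that apw_compare_blockinfo establishes in the patch above: the standalone C sketch below shows why comparing the database field first keeps each database's records consecutive, which is what the per-database workers rely on. It is an illustration only, not the patch code. BlockRec, CMP_FIELD, cmp_blockrec, sort_blockrecs and database_run_length are hypothetical names, and every field is assumed to be a plain uint32_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's BlockInfoRecord. */
typedef struct BlockRec
{
	uint32_t	database;
	uint32_t	tablespace;
	uint32_t	filenode;
	uint32_t	forknum;
	uint32_t	blocknum;
} BlockRec;

/* Return from the enclosing comparator as soon as one field differs. */
#define CMP_FIELD(fld) \
	do { \
		if (a->fld < b->fld) \
			return -1; \
		if (a->fld > b->fld) \
			return 1; \
	} while (0)

static int
cmp_blockrec(const void *p, const void *q)
{
	const BlockRec *a = (const BlockRec *) p;
	const BlockRec *b = (const BlockRec *) q;

	CMP_FIELD(database);	/* first key: groups records per database */
	CMP_FIELD(tablespace);	/* remaining keys only improve I/O locality */
	CMP_FIELD(filenode);
	CMP_FIELD(forknum);
	CMP_FIELD(blocknum);

	return 0;
}

/* Sort the records in place with the comparator above. */
static void
sort_blockrecs(BlockRec *recs, size_t n)
{
	qsort(recs, n, sizeof(BlockRec), cmp_blockrec);
}

/* Count the records, starting at 'start', that share one database. */
static size_t
database_run_length(const BlockRec *recs, size_t n, size_t start)
{
	size_t		i = start;

	while (i < n && recs[i].database == recs[start].database)
		i++;
	return i - start;
}
```

After sorting, a consumer can process recs[start .. start + database_run_length(...) - 1] for one database and stop at the first record whose database differs.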

#107

Mithun Cy

mithun.cy@enterprisedb.com

over 8 years ago

In reply to: Robert Haas (#106)

Re: Proposal : For Auto-Prewarm.

On Fri, Aug 18, 2017 at 9:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Aug 18, 2017 at 2:23 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Ah, good catch. While I agree that there is probably no great harm
from skipping the lock here, I think it would be better to just avoid
throwing an error while we hold the lock. I think apw_dump_now() is
the only place where that could happen, and in the attached version,
I've fixed it so it doesn't do that any more. Independent of the
correctness issue, I think the code is easier to read this way.

I also realize that it's not formally sufficient to use
PG_TRY()/PG_CATCH() here, because a FATAL would leave us in a bad
state. Changed to PG_ENSURE_ERROR_CLEANUP().
2) I also found one issue, which was my own mistake in my previous patch 19.
In "apw_dump_now" I missed calling FreeFile() on the first write error,
whereas in the other cases I am already calling it.
ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ unlink(transient_dump_file_path);
Changed in the attached version.

Thanks for the patch. I have tested the above fix, and it now works as
described. From my testing the patch looks good; I did not find any other
issues.
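
For reference, the cleanup pattern discussed above (on any write failure, close the file, remove the temporary file, and preserve errno, then rename into place only after a clean close) can be sketched in plain C. This is a standalone illustration using stdio and POSIX calls, not the patch code. write_dump_file, abandon_tmp and their parameters are made-up names, each record is reduced to a single number, and plain rename() stands in for durable_rename(), which additionally fsyncs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Close (if open) and remove a partly written temp file, keeping errno. */
static int
abandon_tmp(FILE *fp, const char *tmp_path)
{
	int			save_errno = errno;

	if (fp != NULL)
		fclose(fp);
	unlink(tmp_path);
	errno = save_errno;
	return -1;
}

/* Write "<<n>>" plus one line per block to path.tmp, then rename to path. */
static int
write_dump_file(const char *path, const unsigned *blocks, long nblocks)
{
	char		tmp_path[1024];
	FILE	   *fp;
	long		i;
	int			n;

	assert(path != NULL);

	n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
	if (n < 0 || (size_t) n >= sizeof(tmp_path))
	{
		errno = ENAMETOOLONG;
		return -1;
	}

	fp = fopen(tmp_path, "w");
	if (fp == NULL)
		return -1;

	/* The header write needs the same cleanup as every later write. */
	if (fprintf(fp, "<<%ld>>\n", nblocks) < 0)
		return abandon_tmp(fp, tmp_path);

	for (i = 0; i < nblocks; i++)
	{
		if (fprintf(fp, "%u\n", blocks[i]) < 0)
			return abandon_tmp(fp, tmp_path);
	}

	/* fclose() already released the stream, so only remove the file. */
	if (fclose(fp) != 0)
		return abandon_tmp(NULL, tmp_path);

	if (rename(tmp_path, path) != 0)
		return abandon_tmp(NULL, tmp_path);

	return 0;
}
```

Routing every failure through one helper makes it harder to miss the close on a single path, which is the kind of omission described above.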

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


#108

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Mithun Cy (#107)

Re: Proposal : For Auto-Prewarm.

On Mon, Aug 21, 2017 at 2:42 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Thanks for the patch. I have tested the above fix, and it now works as
described. From my testing the patch looks good; I did not find any other
issues.

Considering the totality of the circumstances, it seemed appropriate
to me to commit this. So I did.

Thanks for all your work on this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
