System load consideration before spawning parallel workers

Started by Haribabu Kommi · over 9 years ago · 17 messages
#1 Haribabu Kommi
kommi.haribabu@gmail.com

We observed that spawning the specified number of parallel workers for
every query that qualifies for parallelism sometimes leads to a
performance drop, rather than an improvement, during peak system load
with other processes. Adding more processes to the system leads to more
context switches, which reduces the performance of other SQL
operations.

In order to avoid this problem, how about taking some kind of system
load consideration into account before spawning the parallel workers?

This may not be a problem for some users, so instead of adding the
system load calculation and so on into the core, how about providing an
additional hook in the code? A user who wants to consider the system
load registers a function, and the hook provides the number of parallel
workers that can be started.

Comments?

Regards,
Hari Babu
Fujitsu Australia

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Haribabu Kommi (#1)
Re: System load consideration before spawning parallel workers

On Fri, Jul 29, 2016 at 11:26 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

We observed that spawning the specified number of parallel workers for
every query that qualifies for parallelism sometimes leads to a
performance drop, rather than an improvement, during peak system load
with other processes. Adding more processes to the system leads to more
context switches, which reduces the performance of other SQL
operations.

Have you considered tuning max_worker_processes? Basically, I think
even if you have kept a moderate value for
max_parallel_workers_per_gather, the number of processes might
increase if the total number allowed is much bigger.

Is the total number of parallel workers more than the number of
CPUs/cores in the system? If yes, I think that might be one reason
for seeing performance degradation.

In order to avoid this problem, how about taking some kind of system
load consideration into account before spawning the parallel workers?

A hook could be a possibility, but I am not sure how users are going to
decide the number of parallel workers; there might be other backends
as well which can consume resources. I think we might need some form
of throttling w.r.t. assignment of parallel workers to avoid system
overload.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#3 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Amit Kapila (#2)
Re: System load consideration before spawning parallel workers

On Fri, Jul 29, 2016 at 8:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 29, 2016 at 11:26 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

We observed that spawning the specified number of parallel workers for
every query that qualifies for parallelism sometimes leads to a
performance drop, rather than an improvement, during peak system load
with other processes. Adding more processes to the system leads to more
context switches, which reduces the performance of other SQL
operations.

Have you considered tuning max_worker_processes? Basically, I think
even if you have kept a moderate value for
max_parallel_workers_per_gather, the number of processes might
increase if the total number allowed is much bigger.

Is the total number of parallel workers more than the number of
CPUs/cores in the system? If yes, I think that might be one reason
for seeing performance degradation.

Tuning max_worker_processes may work. But the problem here is that,
during the peak load test, it was observed that enabling parallelism
led to a drop in performance.

The main point here is that even if the user sets all the
configurations properly to use only the free resources as part of
parallel query, a sudden load increase can still cause performance
problems.

In order to avoid this problem, how about taking some kind of system
load consideration into account before spawning the parallel workers?

A hook could be a possibility, but I am not sure how users are going to
decide the number of parallel workers; there might be other backends
as well which can consume resources. I think we might need some form
of throttling w.r.t. assignment of parallel workers to avoid system
overload.

There are some utilities and functions available to calculate the
current system load; based on the available resources and system load,
the module can decide the number of parallel workers that can start. In
my observation, adding this calculation will add some overhead for
simple queries. For this reason, I feel this can be a hook function,
loaded only by the users who want it.

Regards,
Hari Babu
Fujitsu Australia


#4 Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Haribabu Kommi (#3)
Re: System load consideration before spawning parallel workers

On 01/08/16 18:08, Haribabu Kommi wrote:

On Fri, Jul 29, 2016 at 8:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 29, 2016 at 11:26 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

We observed that spawning the specified number of parallel workers for
every query that qualifies for parallelism sometimes leads to a
performance drop, rather than an improvement, during peak system load
with other processes. Adding more processes to the system leads to more
context switches, which reduces the performance of other SQL
operations.

Have you considered tuning max_worker_processes? Basically, I think
even if you have kept a moderate value for
max_parallel_workers_per_gather, the number of processes might
increase if the total number allowed is much bigger.

Is the total number of parallel workers more than the number of
CPUs/cores in the system? If yes, I think that might be one reason
for seeing performance degradation.

Tuning max_worker_processes may work. But the problem here is that,
during the peak load test, it was observed that enabling parallelism
led to a drop in performance.

The main point here is that even if the user sets all the
configurations properly to use only the free resources as part of
parallel query, a sudden load increase can still cause performance
problems.

In order to avoid this problem, how about taking some kind of system
load consideration into account before spawning the parallel workers?

A hook could be a possibility, but I am not sure how users are going to
decide the number of parallel workers; there might be other backends
as well which can consume resources. I think we might need some form
of throttling w.r.t. assignment of parallel workers to avoid system
overload.

There are some utilities and functions available to calculate the
current system load; based on the available resources and system load,
the module can decide the number of parallel workers that can start. In
my observation, adding this calculation will add some overhead for
simple queries. For this reason, I feel this can be a hook function,
loaded only by the users who want it.

Regards,
Hari Babu
Fujitsu Australia

Possibly look at how make does it with the '-l' flag?

'-l 8' means don't start more processes when the load is 8 or greater;
it works on Linux at least...

Cheers,
Gavin


#5 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Haribabu Kommi (#3)
Re: System load consideration before spawning parallel workers

On 8/1/16 1:08 AM, Haribabu Kommi wrote:

There are some utilities and functions available to calculate the
current system load; based on the available resources and system load,
the module can decide the number of parallel workers that can start. In
my observation, adding this calculation will add some overhead for
simple queries. For this reason, I feel this can be a hook function,
loaded only by the users who want it.

I think we need to provide more tools to allow users to control system
behavior on a more dynamic basis. How many workers to launch is a good
example. There are more reasons than just CPU that parallel workers can
help with (IO being an obvious one, but possibly other things like GPU).
Another example is allowing users to alter the selection process used by
autovac workers.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


#6 Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Jim Nasby (#5)
2 attachment(s)
Re: System load consideration before spawning parallel workers

On Fri, Aug 5, 2016 at 9:46 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 8/1/16 1:08 AM, Haribabu Kommi wrote:

There are some utilities and functions that are available to calculate the
current system load, based on the available resources and system load,
the module can allow the number of parallel workers that can start. In my
observation, adding this calculation will add some overhead for simple
queries. Because of this reason, i feel this can be hook function, only
for
the users who want it, can be loaded.

I think we need to provide more tools to allow users to control system
behavior on a more dynamic basis. How many workers to launch is a good
example. There are more reasons than just CPU that parallel workers can
help with (IO being an obvious one, but possibly other things like GPU).
Another example is allowing users to alter the selection process used by
autovac workers.

Yes, we need to consider many parameters as part of system load, not
just the CPU. Here I have attached a POC patch that implements the CPU
load calculation and decides the number of workers based on the
available CPU load. The load calculation code is not optimized; there
are many ways to calculate the system load. This is just an example.

Regards,
Hari Babu
Fujitsu Australia

Attachments:

system_load_hook.patch (application/octet-stream)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index a47eba6..1f425d5 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -37,6 +37,12 @@
 
 
 /*
+ * Hook function to control the number of parallel workers that can
+ * be generated for a parallel query.
+ */
+number_of_parallel_workers_hook_type number_of_parallel_workers_hook;
+
+/*
  * We don't want to waste a lot of memory on an error queue which, most of
  * the time, will process only a handful of small messages.  However, it is
  * desirable to make it large enough that a typical ErrorResponse can be sent
@@ -436,9 +442,10 @@ LaunchParallelWorkers(ParallelContext *pcxt)
 	BackgroundWorker worker;
 	int			i;
 	bool		any_registrations_failed = false;
+	int 		nworkers = pcxt->nworkers;
 
 	/* Skip this if we have no workers. */
-	if (pcxt->nworkers == 0)
+	if (nworkers == 0)
 		return;
 
 	/* We need to be a lock group leader. */
@@ -462,6 +469,8 @@ LaunchParallelWorkers(ParallelContext *pcxt)
 	worker.bgw_notify_pid = MyProcPid;
 	memset(&worker.bgw_extra, 0, BGW_EXTRALEN);
 
+	if (number_of_parallel_workers_hook)
+		nworkers = Min(nworkers, number_of_parallel_workers_hook(nworkers));
 	/*
 	 * Start workers.
 	 *
@@ -470,7 +479,7 @@ LaunchParallelWorkers(ParallelContext *pcxt)
 	 * fails.  It wouldn't help much anyway, because registering the worker in
 	 * no way guarantees that it will start up and initialize successfully.
 	 */
-	for (i = 0; i < pcxt->nworkers; ++i)
+	for (i = 0; i < nworkers; ++i)
 	{
 		memcpy(worker.bgw_extra, &i, sizeof(int));
 		if (!any_registrations_failed &&
diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
index 2f8f36f..f587447 100644
--- a/src/include/access/parallel.h
+++ b/src/include/access/parallel.h
@@ -21,6 +21,9 @@
 #include "storage/shm_toc.h"
 
 typedef void (*parallel_worker_main_type) (dsm_segment *seg, shm_toc *toc);
+typedef int (*number_of_parallel_workers_hook_type)(int nworkers);
+
+extern PGDLLIMPORT number_of_parallel_workers_hook_type number_of_parallel_workers_hook;
 
 typedef struct ParallelWorkerInfo
 {
system_load_contrib.patch (application/octet-stream)
diff --git a/contrib/system_load/Makefile b/contrib/system_load/Makefile
new file mode 100644
index 0000000..2ba70eb
--- /dev/null
+++ b/contrib/system_load/Makefile
@@ -0,0 +1,16 @@
+# contrib/system_load/Makefile
+
+MODULE_big = system_load
+OBJS = system_load.o cpu.o $(WIN32RES)
+PGFILEDESC = "system_load - facility to consider system load while generating parallel workers"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/system_load
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/system_load/cpu.c b/contrib/system_load/cpu.c
new file mode 100644
index 0000000..0652eca
--- /dev/null
+++ b/contrib/system_load/cpu.c
@@ -0,0 +1,431 @@
+/*-------------------------------------------------------------------------
+ *
+ * cpu.c
+ *	  Get IDLE CPU information.
+ * 
+ * Copyright (c) 2008-2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/system_load/cpu.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#ifdef WIN32
+#include <windows.h>
+#endif
+
+#include "utils/builtins.h"
+
+#include "cpu.h"
+
+/* cgroup subsystems */
+#define CPU_SUBSYSTEM "cpu"
+#define CPUACCT_SUBSYSTEM "cpuacct"
+
+/*
+ * get_cpu_idle:
+ * Gather idle cpu time for a given cpu.
+ */
+#ifndef WIN32
+static int
+get_cpu_idle(void)
+{
+	int			skip_line = 0;
+	char		*cmd = "dstat -c -C total 1 1";
+	char		cmd_output[MAX_CMD_OUTPUT_SIZ];
+	FILE	   	*ptr;
+	int			cpu_idle = 100;
+
+	/* Running dstat command and store its output in cmd_output  */
+	if ((ptr = popen(cmd, "r")) != NULL)
+	{
+		while (fgets(cmd_output, MAX_CMD_SIZ, ptr) != NULL)
+		{
+			skip_line++;
+			if (skip_line > 2)
+			{
+				const char	s[] = " ";
+				char	   *token;
+				int			cpu_usr;
+				int			cpu_sys;
+
+				/* get the first token */
+				token = strtok(cmd_output, s);
+				cpu_usr = atoi(token);
+
+				/* get the second token */
+				token = strtok(NULL, s);
+				cpu_sys = atoi(token);
+				cpu_idle = 100 - (cpu_sys + cpu_usr);
+			}
+		}
+
+		pclose(ptr);
+	}
+
+	return cpu_idle;
+}
+
+/*
+ * get_num_of_cpus:
+ * Get the number of CPUs in the system
+ */
+static int
+get_number_of_cpus(void)
+{
+	char	   *cmd = "grep processor /proc/cpuinfo | wc -l";
+	FILE	   *fpipe;
+	char		buf[256];
+	int			num_of_cpus;
+
+	if ((fpipe = popen(cmd, "r")) != NULL)
+	{
+		fgets(buf, sizeof(buf), fpipe);
+		pclose(fpipe);
+		num_of_cpus = pg_atoi(buf, sizeof(int32), 0);
+		return num_of_cpus;
+	}
+	else
+	{
+		return 1;
+	}
+}
+
+static long
+get_cgroup_cmd_output(char *cmd)
+{
+	long cmd_result = -1;
+	char cmd_output[MAX_CMD_OUTPUT_SIZ];
+	FILE *fpipe;
+
+	if ((fpipe = popen(cmd, "r")) != NULL)
+	{
+		if(fgets(cmd_output, MAX_CMD_SIZ, fpipe))
+		{
+			cmd_result = atol(cmd_output);
+		}
+		pclose(fpipe);
+	}
+
+	return cmd_result;
+}
+
+/*
+ * Returns the mount point for specified cgroup subsystem.
+ * Returns NULL if the specified subsystem not already mounted.
+ */
+static char *
+get_mount_point(char *subsystem)
+{
+	char cmd[MAX_CMD_SIZ];
+	char cmd_output[MAX_CMD_OUTPUT_SIZ];
+	char *mount_point = NULL;
+	FILE *fpipe;
+
+	sprintf(cmd, "lssubsys -m %s", subsystem);
+
+	if ((fpipe = popen(cmd, "r")) != NULL)
+	{
+		if(fgets(cmd_output, MAX_CMD_SIZ, fpipe))
+		{
+			strtok(cmd_output," ");
+			mount_point = strtok(NULL, " ");
+		}
+
+		pclose(fpipe);
+	}
+
+	if (!mount_point)
+		return NULL;
+
+	/* Remove trailing newline */
+	if (strchr(mount_point, '\n') != NULL)
+		*strchr(mount_point, '\n') = '\0';
+
+	return pstrdup(mount_point);
+}
+
+/*
+ * Returns the cgroup, which the backend process belongs to.
+ * Returns NULL if backend does not belong to any cgroup.
+ */
+static char *
+get_cgroup_name(int pid, char *sub_system)
+{
+	char cmd[MAX_CMD_SIZ];
+	char cmd_output[MAX_CMD_OUTPUT_SIZ];
+	char *token;
+	char *lastTok = NULL;
+
+	FILE *fpipe;
+
+	sprintf(cmd, "cat /proc/%d/cgroup | grep -w %s", pid, sub_system);
+
+	if ((fpipe = popen(cmd, "r")) != NULL)
+	{
+		if(fgets(cmd_output, MAX_CMD_SIZ, fpipe))
+		{
+			token = strtok(cmd_output, ":");
+			while(token != NULL)
+			{
+				lastTok = token;
+				token = strtok(NULL, ":");
+			}
+		}
+		pclose(fpipe);
+	}
+
+	/* No associated cgroup*/
+	if (!lastTok)
+		return NULL;
+
+	/* Remove leading "/" */
+	if (strchr(lastTok, '/') != NULL)
+		lastTok = lastTok + 1;
+
+	/* Remove trailing newline */
+	if (strchr(lastTok, '\n') != NULL)
+		*strchr(lastTok, '\n') = '\0';
+
+	/* No associated cgroup*/
+	if (strcmp(lastTok, "") == 0)
+		return NULL;
+
+	return pstrdup(lastTok);
+}
+
+int
+get_cgroup_cpu_idle(void)
+{
+	char *cpu_mount_point;
+	char *cpuacct_mount_point;
+	char *cpu_cgroup;
+	char *cpuacct_cgroup;
+
+	char cmd[MAX_CMD_SIZ];
+
+	long cpuacct_usage_old = 0;
+	long cpuacct_usage_new = 0;
+	long cpuacct_usage_delta_us = 0;
+	long cpu_cfs_quota_us;
+	long cpu_cfs_period_us;
+	int	number_of_cpus = 0;
+	int idle_cpu = 0;
+
+	cpuacct_cgroup = get_cgroup_name(MyProcPid, CPUACCT_SUBSYSTEM);
+	cpu_cgroup = get_cgroup_name(MyProcPid, CPU_SUBSYSTEM);
+	elog(DEBUG1,"cpuacct groupname : %s\n", cpuacct_cgroup);
+	elog(DEBUG1,"cpu cgroup: %s\n", cpu_cgroup);
+
+	/* If control group configured, get the idle cpu using cgroup parameters */
+	if (cpuacct_cgroup && cpu_cgroup)
+	{
+		cpu_mount_point = get_mount_point(CPU_SUBSYSTEM);
+		cpuacct_mount_point = get_mount_point(CPUACCT_SUBSYSTEM);
+		elog(DEBUG1, "cpuacct mpt: %s\n",cpuacct_mount_point);
+		elog(DEBUG1, "cpu mpt %s\n", cpu_mount_point);
+
+		sprintf(cmd, "cat %s/%s/cpu.cfs_quota_us", cpu_mount_point, cpu_cgroup);
+		cpu_cfs_quota_us = get_cgroup_cmd_output(cmd);
+
+		if (cpu_cfs_quota_us != -1)
+		{
+			sprintf(cmd, "cat %s/%s/cpu.cfs_period_us", cpu_mount_point, cpu_cgroup);
+			cpu_cfs_period_us = get_cgroup_cmd_output(cmd);
+
+			sprintf(cmd, "cat %s/%s/cpuacct.usage", cpuacct_mount_point, cpuacct_cgroup);
+			cpuacct_usage_old = get_cgroup_cmd_output(cmd);
+
+			pg_usleep(cpu_cfs_period_us);
+
+			sprintf(cmd, "cat %s/%s/cpuacct.usage", cpuacct_mount_point, cpuacct_cgroup);
+			cpuacct_usage_new = get_cgroup_cmd_output(cmd);
+
+			cpuacct_usage_delta_us = (cpuacct_usage_new - cpuacct_usage_old)/1000;
+			idle_cpu = (cpu_cfs_quota_us - cpuacct_usage_delta_us) * 100 / cpu_cfs_period_us;
+
+			pfree(cpu_cgroup);
+			pfree(cpuacct_cgroup);
+			pfree(cpuacct_mount_point);
+			pfree(cpu_mount_point);
+
+			elog(DEBUG1, "BE: get_cgroup_cpu_idle(): Total Idle CPU in cgroup: %d ", idle_cpu);
+			return idle_cpu;
+		}
+	}
+
+	/*
+	 * TODO: Calculate the system load which will be platform dependent for
+	 * linux/solaris and Windows.
+	 */
+	number_of_cpus = get_number_of_cpus();
+	idle_cpu = get_cpu_idle() * number_of_cpus;
+
+	elog(DEBUG1, "BE: get_cgroup_cpu_idle(): Number of CPU's: %d, Total Idle CPU: %d ",
+	number_of_cpus, idle_cpu);
+	return idle_cpu;
+}
+#endif /*end #ifndef WIN32*/
+
+#ifdef WIN32
+typedef BOOL(WINAPI *LPFN_GLPI)(
+	PSYSTEM_LOGICAL_PROCESSOR_INFORMATION,
+	PDWORD);
+/* Helper function to count set bits in the processor mask. */
+unsigned long CountSetBits(ULONG_PTR bitMask)
+{
+	DWORD LSHIFT = sizeof(ULONG_PTR) * 8 - 1;
+	DWORD bitSetCount = 0;
+	ULONG_PTR bitTest = (ULONG_PTR)1 << LSHIFT;
+	DWORD i;
+
+	for (i = 0; i <= LSHIFT; ++i)
+	{
+		bitSetCount += ((bitMask & bitTest) ? 1 : 0);
+		bitTest /= 2;
+	}
+
+	return bitSetCount;
+}
+
+static float CalculateCPULoad(unsigned long long idleTicks, unsigned long long totalTicks)
+{
+	static unsigned long long _previousTotalTicks = 0;
+	static unsigned long long _previousIdleTicks = 0;
+
+	unsigned long long totalTicksSinceLastTime = totalTicks - _previousTotalTicks;
+	unsigned long long idleTicksSinceLastTime = idleTicks - _previousIdleTicks;
+
+	float ret = 1.0f - ((totalTicksSinceLastTime > 0) ? ((float)idleTicksSinceLastTime) / totalTicksSinceLastTime : 0);
+
+	_previousTotalTicks = totalTicks;
+	_previousIdleTicks = idleTicks;
+	return ret;
+}
+
+static unsigned long long FileTimeToInt64(const FILETIME *ft)
+{
+	return (((unsigned long long)(ft->dwHighDateTime)) << 32) | ((unsigned long long)ft->dwLowDateTime);
+}
+
+/***
+ * Returns 1.0f for "CPU fully pinned", 0.0f for "CPU idle", or somewhere in between
+ * You'll need to call this at regular intervals, since it measures the load between
+ * the previous call and the current one.  Returns 0 on error.
+ */
+float GetCPULoad()
+{
+	FILETIME idleTime, kernelTime, userTime;
+	float ret = 0.0f;
+
+	if (GetSystemTimes(&idleTime, &kernelTime, &userTime))
+		ret = CalculateCPULoad(FileTimeToInt64(&idleTime), FileTimeToInt64(&kernelTime) + FileTimeToInt64(&userTime));
+
+	return ret;
+}
+
+/*
+ * Get total number of cpus. 
+ * Return 1 if failed to get it.
+ */
+int GetNumberOfCPUs()
+{
+	LPFN_GLPI glpi;
+	BOOL done = FALSE;
+	PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buffer = NULL;
+	PSYSTEM_LOGICAL_PROCESSOR_INFORMATION ptr = NULL;
+	DWORD returnLength = 0;
+	DWORD logicalProcessorCount = 0;
+	DWORD byteOffset = 0;
+
+	glpi = (LPFN_GLPI)GetProcAddress(
+		GetModuleHandle(TEXT("kernel32")),
+		"GetLogicalProcessorInformation");
+	if (NULL == glpi)
+	{
+		elog(DEBUG1,"GetLogicalProcessorInformation is not supported.\n");
+		return (1);
+	}
+
+	while (!done)
+	{
+		DWORD rc = glpi(buffer, &returnLength);
+
+		if (FALSE == rc)
+		{
+			if (GetLastError() == ERROR_INSUFFICIENT_BUFFER)
+			{
+				if (buffer)
+					free(buffer);
+
+				buffer = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)malloc(
+					returnLength);
+
+				if (NULL == buffer)
+				{
+					elog(DEBUG1,"Error: Allocation failure\n");
+					return (1);
+				}
+			}
+			else
+			{
+				elog(DEBUG1,"Error %d\n", GetLastError());
+				return (1);
+			}
+		}
+		else
+		{
+			done = TRUE;
+		}
+	}
+
+	ptr = buffer;
+
+	while (byteOffset + sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION) <= returnLength)
+	{
+		switch (ptr->Relationship)
+		{
+		case RelationProcessorCore:
+			// A hyperthreaded core supplies more than one logical processor.
+			logicalProcessorCount += CountSetBits(ptr->ProcessorMask);
+			break;
+
+		default:			
+			break;
+		}
+		byteOffset += sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
+		ptr++;
+	}
+
+
+	free(buffer);
+
+	return logicalProcessorCount;
+}
+
+int
+get_cgroup_cpu_idle(void)
+{
+	int num = 0;
+	float cpuload = 0;
+	int totalIdleCPU = 0;
+
+	GetCPULoad();
+	/* 200 millisecond sleep delay */
+	Sleep(200);
+	cpuload = GetCPULoad() * 100;
+		
+	num = GetNumberOfCPUs();
+
+	totalIdleCPU = (100 - (int)cpuload) * num;
+
+	return totalIdleCPU;
+}
+
+#endif
diff --git a/contrib/system_load/cpu.h b/contrib/system_load/cpu.h
new file mode 100644
index 0000000..b89bed8
--- /dev/null
+++ b/contrib/system_load/cpu.h
@@ -0,0 +1,20 @@
+/*-------------------------------------------------------------------------
+ *
+ * cpu.h
+ *	  Declare for getting IDLE CPU information functions
+ *
+ * contrib/system_load/cpu.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CPU_H
+#define CPU_H
+
+#include "miscadmin.h"
+
+#define MAX_CMD_SIZ 100
+#define MAX_CMD_OUTPUT_SIZ 256
+
+extern int get_cgroup_cpu_idle(void);
+
+#endif   /* CPU_H */
diff --git a/contrib/system_load/system_load.c b/contrib/system_load/system_load.c
new file mode 100644
index 0000000..78683e5
--- /dev/null
+++ b/contrib/system_load/system_load.c
@@ -0,0 +1,100 @@
+/*-------------------------------------------------------------------------
+ *
+ * system_load.c
+ *
+ * Modules to find out the current system load. This can be used to
+ * identify number of parallel workers that can be started.
+ *
+ * Copyright (c) 2008-2016, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/system_load/system_load.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/parallel.h"
+#include "utils/guc.h"
+
+#include "cpu.h"
+
+
+PG_MODULE_MAGIC;
+
+/* GUC variable */
+static bool enable_system_load = false;
+
+/* Saved hook values in case of unload */
+static number_of_parallel_workers_hook_type prev_number_of_parallel_workers_hook = NULL;
+
+void		_PG_init(void);
+void		_PG_fini(void);
+
+static int system_load_parallel_workers(int nworkers);
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+	/*
+	 * we have to be loaded via shared_preload_libraries. If not, fall out
+	 * without hooking into any of the main system.
+	 */
+	if (!process_shared_preload_libraries_in_progress)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("system load extension must be loaded via shared_preload_libraries")));
+				 
+	/* Define custom GUC variables. */
+	DefineCustomBoolVariable("system_load.enable",
+							 "Use system load into consideration while generating parallel workers.",
+							 NULL,
+							 &enable_system_load,
+							 false,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
+	/* Install hooks. */
+	prev_number_of_parallel_workers_hook = number_of_parallel_workers_hook;
+	number_of_parallel_workers_hook = system_load_parallel_workers;
+}
+
+/*
+ * Module unload callback
+ */
+void
+_PG_fini(void)
+{
+	/* Uninstall hooks. */
+	number_of_parallel_workers_hook = prev_number_of_parallel_workers_hook;
+}
+
+/*
+ * Function to return the number of parallel workers that are possible
+ * under current system load.
+ */
+static int system_load_parallel_workers(int nworkers)
+{
+	if (enable_system_load)
+	{
+		int idle_cpu;
+		
+		idle_cpu = get_cgroup_cpu_idle();
+		return (idle_cpu / 100);
+	}
+	else
+	{
+		/*
+ 		 * In case system load calculation is disabled,
+		 * return the planned number of workers.
+		 */
+		return nworkers;
+	}
+}

#7 Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Gavin Flower (#4)
Re: System load consideration before spawning parallel workers

On 8/1/16 2:17 AM, Gavin Flower wrote:

Possibly look at how make does it with the '-l' flag?

'-l 8' means don't start more processes when the load is 8 or greater;
it works on Linux at least...

The problem with that approach is that it takes about a minute for the
load average figures to be updated, by which time you have already
thrashed your system.

You can try this out by building PostgreSQL this way. Please save your
work first, because you might have to hard-reboot your system.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#8 Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Haribabu Kommi (#6)
Re: System load consideration before spawning parallel workers

On 8/16/16 3:39 AM, Haribabu Kommi wrote:

Yes, we need to consider many parameters as part of system load, not
just the CPU. Here I have attached a POC patch that implements the CPU
load calculation and decides the number of workers based on the
available CPU load. The load calculation code is not optimized; there
are many ways to calculate the system load. This is just an example.

I see a number of discussion points here:

We don't yet have enough field experience with the parallel query
facilities to know what kind of use patterns there are and what systems
for load management we need. So I think building a highly specific
system like this seems premature. We have settings to limit process
numbers, which seems OK as a start, and those knobs have worked
reasonably well in other areas (e.g., max connections, autovacuum). We
might well want to enhance this area, but we'll need more experience and
information.

If we think that checking the CPU load is a useful way to manage process
resources, why not apply this to more kinds of processes? I could
imagine that limiting connections by load could be useful. Parallel
workers is only one specific niche of this problem.

As I just wrote in another message in this thread, I don't trust system
load metrics very much as a gatekeeper. They are reasonable for
long-term charting to discover trends, but there are numerous potential
problems for using them for this kind of resource control thing.

All of this seems very platform-specific, too. You have
Windows-specific code, but the rest seems very Linux-specific. I had
never heard of the dstat tool before. There is stuff with cgroups, and
I don't know how portable those are across different Linux
installations. Something about Solaris was mentioned. What about the
rest? How can we maintain this in the long term? How do we know that
these facilities actually work correctly and don't cause mysterious
problems?

There is a bunch of math in there that is not documented much. I can't
tell without reverse engineering the code what any of this is supposed
to do.

My suggestion is that we focus on refining the process control numbers
that we already have. We had extensive discussions about that during
9.6 beta. We have related patches in the commit fest right now. Many
ideas have been posted. System admins are generally able to count their
CPUs and match that to the number of sessions and jobs they need to run.
Everything beyond that could be great but seems premature before we
have the basics figured out.

Maybe a couple of hooks could be useful to allow people to experiment
with this. But the hooks should be more general, as described above.
But I think a few GUC settings that can be adjusted at run time could be
sufficient as well.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#8)
Re: System load consideration before spawning parallel workers

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

As I just wrote in another message in this thread, I don't trust system
load metrics very much as a gatekeeper. They are reasonable for
long-term charting to discover trends, but there are numerous potential
problems for using them for this kind of resource control thing.

As a note in support of that, sendmail has a "feature" to suppress service
if system load gets above X, which I have never found to do anything
except result in self-DOSing. The load spike might not have anything to
do with the service that is trying to un-spike things. Even if it does,
Peter is correct to note that the response delay is much too long to form
part of a useful feedback loop. It could be all right for scheduling
activities whose length is comparable to the load average measurement
interval, but not for short-term decisions.

regards, tom lane


#10Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Peter Eisentraut (#7)
Re: System load consideration before spawning parallel workers

On 02/09/16 04:44, Peter Eisentraut wrote:

On 8/1/16 2:17 AM, Gavin Flower wrote:

Possibly look how make does it with the '-l' flag?

'-l 8' doesn't start more processes when the load is 8 or greater; works on
Linux at least...

The problem with that approach is that it takes about a minute for the
load average figures to be updated, by which time you have already
thrashed your system.

You can try this out by building PostgreSQL this way. Please save your
work first, because you might have to hard-reboot your system.

Hmm... I've built several versions of pg this way, without any obvious
problems!

Looking at top, suggests that the load averages never go much above 8,
and are usually less.

This is the bash script I use:

#!/bin/bash
# postgresql-build.sh

VERSION='9.5.0'

TAR_FILE="postgresql-$VERSION.tar.bz2"
echo 'TAR_FILE['$TAR_FILE']'
tar xvf $TAR_FILE

PORT='--with-pgport=5433' ############################ std is 5432

BASE_DIR="postgresql-$VERSION"
echo 'BASE_DIR['$BASE_DIR']'
cd $BASE_DIR

PREFIX="--prefix=/usr/local/lib/postgres-$VERSION"
echo 'PREFIX['$PREFIX']'

LANGUAGES='--with-python'
echo 'LANGUAGES['$LANGUAGES']'

SECURITY='--with-openssl --with-pam --with-ldap'
echo 'SECURITY['$SECURITY']'

XML='--with-libxml --with-libxslt'
echo 'XML['$XML']'

TZDATA='--with-system-tzdata=/usr/share/zoneinfo'
echo 'TZDATA['$TZDATA']'

##DEBUG='--enable-debug'
##echo 'DEBUG['$DEBUG']'

./configure $PREFIX $PORT $LANGUAGES $SECURITY $XML $TZDATA $DEBUG

time make -j7 -l8 && time make -j7 -l8 check

Cheers,
Gavin


#11Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Peter Eisentraut (#8)
Re: System load consideration before spawning parallel workers

On 02/09/16 05:01, Peter Eisentraut wrote:

On 8/16/16 3:39 AM, Haribabu Kommi wrote:

[...]

All of this seems very platform specific, too. You have
Windows-specific code, but the rest seems very Linux-specific. The
dstat tool I had never heard of before. There is stuff with cgroups,
which I don't know how portable they are across different Linux
installations. Something about Solaris was mentioned. What about the
rest? How can we maintain this in the long term? How do we know that
these facilities actually work correctly and not cause mysterious problems?

[...]
I think that we should not hobble pg in Linux, because of limitations of
other O/S's like those from Microsoft!

On the safe side, if a feature has insufficient evidence of working in a
particular O/S, then it should not be default enabled for that O/S.

If a feature is useful in Linux, but not elsewhere: then pg should still
run in the other O/S's but the documentation should reflect that.

Cheers,
Gavin


#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Flower (#10)
Re: System load consideration before spawning parallel workers

Gavin Flower <GavinFlower@archidevsys.co.nz> writes:

On 02/09/16 04:44, Peter Eisentraut wrote:

You can try this out by building PostgreSQL this way. Please save your
work first, because you might have to hard-reboot your system.

Hmm... I've built several versions of pg this way, without any obvious
problems!

I'm a little skeptical of that too. However, I'd note that with a "make"
you're not likely to care, or possibly even notice, if the thing does
something like go completely to sleep for a little while, or if some
sub-jobs proceed well while others do not. The fact that "-l 8" works
okay for make doesn't necessarily translate to more-interactive use cases.

regards, tom lane


#13Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#8)
Re: System load consideration before spawning parallel workers

On Thu, Sep 1, 2016 at 01:01:35PM -0400, Peter Eisentraut wrote:

Maybe a couple of hooks could be useful to allow people to experiment
with this. But the hooks should be more general, as described above.
But I think a few GUC settings that can be adjusted at run time could be
sufficient as well.

Couldn't SQL sessions call a PL/Perl function that could query the OS
and set max_parallel_workers_per_gather appropriately?
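Bruce's suggestion needs no core changes at all. A rough sketch of what such a function could compute, written here in Python rather than PL/Perl purely for illustration (the threshold and default are invented; only max_parallel_workers_per_gather is a real setting):

```python
import os

def session_parallel_setting(max_per_gather=4, load_limit=8.0):
    """Derive a per-session max_parallel_workers_per_gather value from
    the current 1-minute load average, and return the SET command a
    session could run.  Both thresholds here are hypothetical."""
    load = os.getloadavg()[0]
    value = 0 if load >= load_limit else max_per_gather
    return "SET max_parallel_workers_per_gather = %d" % value
```

A PL/Perl body could do the equivalent by reading /proc/loadavg and issuing the SET via spi_exec_query, though it would inherit the same load-average staleness discussed earlier in the thread.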

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


#14Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#13)
Re: System load consideration before spawning parallel workers

On 9/2/16 4:07 PM, Bruce Momjian wrote:

Couldn't SQL sessions call a PL/Perl function that could query the OS
and set max_parallel_workers_per_gather appropriately?

I'd certainly like to see a greater ability to utilize "hooks" without
resorting to C. "hooks" in quotes because while some hooks need to be in
C to be of practical use, others (such as a parallelization limit or
controlling autovacuum) might not.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461


#15Peter Geoghegan
pg@heroku.com
In reply to: Peter Eisentraut (#8)
Re: System load consideration before spawning parallel workers

On Thu, Sep 1, 2016 at 10:01 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 8/16/16 3:39 AM, Haribabu Kommi wrote:

Yes, we need to consider many parameters as system load, not just the
CPU. Here I attached a POC patch that implements the CPU load
calculation and decides the number of workers based on the available CPU
load. The load calculation code is not optimized; there are many ways
to calculate the system load. This is just an example.

I see a number of discussion points here:

We don't yet have enough field experience with the parallel query
facilities to know what kind of use patterns there are and what systems
for load management we need. So I think building a highly specific
system like this seems premature. We have settings to limit process
numbers, which seems OK as a start, and those knobs have worked
reasonably well in other areas (e.g., max connections, autovacuum). We
might well want to enhance this area, but we'll need more experience and
information.

If we think that checking the CPU load is a useful way to manage process
resources, why not apply this to more kinds of processes? I could
imagine that limiting connections by load could be useful. Parallel
workers is only one specific niche of this problem.

+1 to all of this, particularly the point about parallel workers being
one niche aspect of an overall problem.

What I'd like to see in this area first is our moving away from the
work_mem model. I think it makes a lot of sense to manage memory
currently capped by the catch-all work_mem setting as a shared
resource, to be dynamically doled out among backends according to
availability, priority, and possibly other considerations. I see the
9.6 work on external sort as a building piece for that, as it removed
the one thing that was sensitive to work_mem in a surprising,
unpredictable way.

--
Peter Geoghegan


#16Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Peter Eisentraut (#8)
Re: System load consideration before spawning parallel workers

On Fri, Sep 2, 2016 at 3:01 AM, Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

On 8/16/16 3:39 AM, Haribabu Kommi wrote:

Yes, we need to consider many parameters as system load, not just the
CPU. Here I attached a POC patch that implements the CPU load
calculation and decides the number of workers based on the available CPU
load. The load calculation code is not optimized; there are many ways
to calculate the system load. This is just an example.

I see a number of discussion points here:

We don't yet have enough field experience with the parallel query
facilities to know what kind of use patterns there are and what systems
for load management we need. So I think building a highly specific
system like this seems premature. We have settings to limit process
numbers, which seems OK as a start, and those knobs have worked
reasonably well in other areas (e.g., max connections, autovacuum). We
might well want to enhance this area, but we'll need more experience and
information.

Yes, I agree that parallel query is a new feature and we cannot yet
judge its effects.

If we think that checking the CPU load is a useful way to manage process
resources, why not apply this to more kinds of processes? I could
imagine that limiting connections by load could be useful. Parallel
workers is only one specific niche of this problem.

Yes, I agree that parallel workers are only one part of the problem.

How about the postmaster calculates the CPU load (and other metrics) on
the system and updates it in a shared location that every backend can
access? Using that, we can decide which operations to control. The
postmaster would update the system load at a GUC-specified interval, so
this would not affect the performance of other backends.
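The shape of that idea, one sampler publishing and many readers consuming, can be sketched in a few lines of Python, with a thread standing in for the postmaster (all names are hypothetical, and real shared memory would replace the lock-protected variable):

```python
import os
import threading

class LoadSampler:
    """One writer samples the system load at a fixed interval and
    publishes it; readers ("backends") fetch the cached value and
    never pay the sampling cost themselves."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self._load = 0.0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._sample()            # publish an initial value immediately
        self._thread.start()

    def stop(self):
        self._stop.set()

    def _sample(self):
        load = os.getloadavg()[0]  # 1-minute load average
        with self._lock:
            self._load = load

    def _run(self):
        # Event.wait doubles as the sampling-interval sleep and the
        # shutdown check.
        while not self._stop.wait(self.interval):
            self._sample()

    def current_load(self):        # what a backend would call
        with self._lock:
            return self._load
```

This keeps the per-query cost at a single shared read, though the published value is still only as fresh as the sampling interval allows.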

As I just wrote in another message in this thread, I don't trust system
load metrics very much as a gatekeeper. They are reasonable for
long-term charting to discover trends, but there are numerous potential
problems for using them for this kind of resource control thing.

All of this seems very platform specific, too. You have
Windows-specific code, but the rest seems very Linux-specific. The
dstat tool I had never heard of before. There is stuff with cgroups,
which I don't know how portable they are across different Linux
installations. Something about Solaris was mentioned. What about the
rest? How can we maintain this in the long term? How do we know that
these facilities actually work correctly and not cause mysterious problems?

The CPU load calculation patch is a POC patch; I didn't evaluate its
behavior on all platforms.

Maybe a couple of hooks could be useful to allow people to experiment
with this. But the hooks should be more general, as described above.
But I think a few GUC settings that can be adjusted at run time could be
sufficient as well.

With the parallel GUC settings it is possible to control this behavior:
when there is very little load on the system, more parallel workers
improve performance, but when the load is high, using more parallel
workers can add overhead instead of an improvement over the current
behavior.

In such cases, the number of parallel workers needs to be reduced by
changing the GUC settings. Instead of that, I just thought, how about
we do the same automatically?

Regards,
Hari Babu
Fujitsu Australia

#17Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Haribabu Kommi (#16)
Re: System load consideration before spawning parallel workers

It seems clear that this patch design is not favored by the community,
so I'm setting the patch as rejected in the CF app.

I think there is interest in managing system resources better, but I
don't know what that would look like.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
