pgbench - extend initialization phase control

Started by Fabien COELHOalmost 7 years ago29 messages
#1Fabien COELHO
coelho@cri.ensmp.fr
1 attachment(s)

Hello devs,

the attached patch adds some more control on the initialization phase.
In particular, ( and ) allow to begin/commit explicitely, and G generates
the data server-side instead of client side, which might be a good idea
depending on the available bandwidth.

Together with the previously submitted patch about getting stats on the
initialization phase, the idea is to possibly improve this phase, or use
it as a benchmark tool in itself.

--
Fabien.

Attachments:

pgbench-init-extended-1.patchtext/x-diff; name=pgbench-init-extended-1.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ee2501be55..39a8efcdd1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -167,7 +167,7 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         <replaceable>init_steps</replaceable> specifies the
         initialization steps to be performed, using one character per step.
         Each step is invoked in the specified order.
-        The default is <literal>dtgvp</literal>.
+        The default is <literal>dt(g)vp</literal>.
         The available steps are:
 
         <variablelist>
@@ -193,12 +193,31 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
           </listitem>
          </varlistentry>
          <varlistentry>
-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>(</literal> (begin transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>BEGIN</command>.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>
           <listitem>
            <para>
             Generate data and load it into the standard tables,
             replacing any data already present.
            </para>
+           <para>
+            When data is generated server side, there is no progress output.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>)</literal> (commit transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>COMMIT</command>.
+           </para>
           </listitem>
          </varlistentry>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 99529de52a..3931895968 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -130,7 +130,13 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /********************************************************************
  * some configurable parameters */
 
-#define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
+#define DEFAULT_INIT_STEPS "dt(g)vp"	/* default -I setting */
+
+#define ALL_INIT_STEPS "dtgGvpf()"		/* all possible steps */
 
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
@@ -608,7 +614,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "                           run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
@@ -3698,10 +3704,23 @@ initCreateTables(PGconn *con)
 }
 
 /*
- * Fill the standard tables with some data
+ * truncate away any old data, in one command in case there are foreign keys
  */
 static void
-initGenerateData(PGconn *con)
+initTruncateTables(PGconn *con)
+{
+	executeStatement(con, "truncate table "
+					 "pgbench_accounts, "
+					 "pgbench_branches, "
+					 "pgbench_history, "
+					 "pgbench_tellers");
+}
+
+/*
+ * Fill the standard tables with some data, from the client side
+ */
+static void
+initGenerateDataClientSide(PGconn *con)
 {
 	char		sql[256];
 	PGresult   *res;
@@ -3715,23 +3734,9 @@ initGenerateData(PGconn *con)
 				remaining_sec;
 	int			log_interval = 1;
 
-	fprintf(stderr, "generating data...\n");
+	fprintf(stderr, "generating data client side...\n");
 
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/*
-	 * truncate away any old data, in one command in case there are foreign
-	 * keys
-	 */
-	executeStatement(con, "truncate table "
-					 "pgbench_accounts, "
-					 "pgbench_branches, "
-					 "pgbench_history, "
-					 "pgbench_tellers");
+	initTruncateTables(con);
 
 	/*
 	 * fill branches, tellers, accounts in that order in case foreign keys
@@ -3831,8 +3836,37 @@ initGenerateData(PGconn *con)
 		fprintf(stderr, "PQendcopy failed\n");
 		exit(1);
 	}
+}
 
-	executeStatement(con, "commit");
+/*
+ * Fill the standard tables with some data, from the server side
+ */
+static void
+initGenerateDataServerSide(PGconn *con)
+{
+	char		sql[256];
+
+	fprintf(stderr, "generating data server side...\n");
+
+	initTruncateTables(con);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_branches(bid,bbalance,filler) "
+			 "select bid, 0, '' "
+			 "from generate_series(1, %d) as bid", scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_tellers(tid,bid,tbalance,filler) "
+			 "select tid, (tid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as tid", ntellers, scale * ntellers);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+			 "select aid, (aid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as aid", naccounts, scale * naccounts);
+	executeStatement(con, sql);
 }
 
 /*
@@ -3916,6 +3950,7 @@ static void
 checkInitSteps(const char *initialize_steps)
 {
 	const char *step;
+	int begins = 0;
 
 	if (initialize_steps[0] == '\0')
 	{
@@ -3925,13 +3960,40 @@ checkInitSteps(const char *initialize_steps)
 
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr("dtgvpf ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
 		{
-			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+			fprintf(stderr,
+					"unrecognized initialization step \"%c\"\n"
+					"allowed step characters are: \"" ALL_INIT_STEPS "\"\n",
 					*step);
-			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
 			exit(1);
 		}
+
+		/* count BEGIN/COMMIT for matching */
+		if (*step == '(')
+			begins++;
+		else if (*step == ')')
+			begins--;
+
+		if (begins < 0)
+		{
+			fprintf(stderr, "COMMIT \")\" at char %ld of \"%s\" does not have a matching BEGIN \"(\"\n",
+							step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+		else if (begins >= 2)
+		{
+			fprintf(stderr, "BEGIN \"(\" within a BEGIN at char %ld of \"%s\", nested transactions are not supported\n",
+					step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+	}
+
+	if (begins > 0)
+	{
+		fprintf(stderr, "a BEGIN \"(\" in \"%s\" does not have a matching COMMIT \")\"\n",
+						initialize_steps);
+		exit(1);
 	}
 }
 
@@ -3957,8 +4019,17 @@ runInitSteps(const char *initialize_steps)
 			case 't':
 				initCreateTables(con);
 				break;
+			case '(':
+				executeStatement(con, "begin");
+				break;
 			case 'g':
-				initGenerateData(con);
+				initGenerateDataClientSide(con);
+				break;
+			case 'G':
+				initGenerateDataServerSide(con);
+				break;
+			case ')':
+				executeStatement(con, "commit");
 				break;
 			case 'v':
 				initVacuum(con);
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 62906d5e55..ae6af00099 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -107,7 +107,7 @@ pgbench(
 
 # Again, with all possible options
 pgbench(
-	'--initialize --init-steps=dtpvg --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=pg_default --index-tablespace=pg_default',
+	'--initialize --init-steps=dtpv(g) --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=pg_default --index-tablespace=pg_default',
 	0,
 	[qr{^$}i],
 	[
@@ -122,14 +122,14 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables',
+	'--initialize --init-steps=dtpv(G)vv --no-vacuum --foreign-keys --unlogged-tables',
 	0,
 	[qr{^$}],
 	[
 		qr{dropping old tables},
 		qr{creating tables},
 		qr{creating primary keys},
-		qr{.* of .* tuples \(.*\) done},
+		qr{generating data server side},
 		qr{creating foreign keys},
 		qr{done\.}
 	],
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 69a6d03bb3..e2cb4f8b12 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -147,7 +147,22 @@ my @options = (
 	[
 		'invalid init step',
 		'-i -I dta',
-		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
+		[ qr{unrecognized initialization step}, qr{allowed step characters are} ]
+	],
+	[
+		'invalid init step begin/begin',
+		'-i -I ((',
+		[ qr{nested transactions are not supported} ]
+	],
+	[
+		'invalid init step end',
+		'-i -I )',
+		[ qr{does not have a matching BEGIN} ]
+	],
+	[
+		'invalid init step begin',
+		'-i -I (dt',
+		[ qr{does not have a matching COMMIT} ]
 	],
 	[
 		'bad random seed',
#2Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Fabien COELHO (#1)
Re: pgbench - extend initialization phase control

Does both client/server side data generation in a single command make sense?

#3Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Ibrar Ahmed (#2)
Re: pgbench - extend initialization phase control

Hello Ibrar,

Does both client/server side data generation in a single command make
sense?

I think yes, especially with the other patch which adds timing measures to
the initialization phases. It really depends what you want to test.

With client-side generation you test the libpq COPY interface and network
protocol for bulk loading.

With server-side generation you are get the final result faster when
network bandwidth is low, and somehow you are testing a different kind of
small query which generates a lot of data.

--
Fabien.

#4Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Fabien COELHO (#3)
Re: pgbench - extend initialization phase control

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Other than that, the patch looks good to me.

The new status of this patch is: Ready for Committer

#5Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Ibrar Ahmed (#4)
1 attachment(s)
Re: pgbench - extend initialization phase control

Hello Ibrar,

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Other than that, the patch looks good to me.

The new status of this patch is: Ready for Committer

Thanks for the review.

Attached v2 is a rebase after ce8f9467.

--
Fabien.

Attachments:

pgbench-init-extended-2.patchtext/x-diff; name=pgbench-init-extended-2.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 816f9cc4c7..bcdac60ade 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -167,7 +167,7 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         <replaceable>init_steps</replaceable> specifies the
         initialization steps to be performed, using one character per step.
         Each step is invoked in the specified order.
-        The default is <literal>dtgvp</literal>.
+        The default is <literal>dt(g)vp</literal>.
         The available steps are:
 
         <variablelist>
@@ -193,12 +193,31 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
           </listitem>
          </varlistentry>
          <varlistentry>
-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>(</literal> (begin transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>BEGIN</command>.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>
           <listitem>
            <para>
             Generate data and load it into the standard tables,
             replacing any data already present.
            </para>
+           <para>
+            When data is generated server side, there is no progress output.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>)</literal> (commit transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>COMMIT</command>.
+           </para>
           </listitem>
          </varlistentry>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 570cf3306a..a990eb6f21 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -130,7 +130,13 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /********************************************************************
  * some configurable parameters */
 
-#define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
+#define DEFAULT_INIT_STEPS "dt(g)vp"	/* default -I setting */
+
+#define ALL_INIT_STEPS "dtgGvpf()"		/* all possible steps */
 
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
@@ -608,7 +614,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "                           run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
@@ -3689,10 +3695,23 @@ initCreateTables(PGconn *con)
 }
 
 /*
- * Fill the standard tables with some data
+ * truncate away any old data, in one command in case there are foreign keys
  */
 static void
-initGenerateData(PGconn *con)
+initTruncateTables(PGconn *con)
+{
+	executeStatement(con, "truncate table "
+					 "pgbench_accounts, "
+					 "pgbench_branches, "
+					 "pgbench_history, "
+					 "pgbench_tellers");
+}
+
+/*
+ * Fill the standard tables with some data, from the client side
+ */
+static void
+initGenerateDataClientSide(PGconn *con)
 {
 	char		sql[256];
 	PGresult   *res;
@@ -3706,23 +3725,9 @@ initGenerateData(PGconn *con)
 				remaining_sec;
 	int			log_interval = 1;
 
-	fprintf(stderr, "generating data...\n");
+	fprintf(stderr, "generating data client side...\n");
 
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/*
-	 * truncate away any old data, in one command in case there are foreign
-	 * keys
-	 */
-	executeStatement(con, "truncate table "
-					 "pgbench_accounts, "
-					 "pgbench_branches, "
-					 "pgbench_history, "
-					 "pgbench_tellers");
+	initTruncateTables(con);
 
 	/*
 	 * fill branches, tellers, accounts in that order in case foreign keys
@@ -3822,8 +3827,37 @@ initGenerateData(PGconn *con)
 		fprintf(stderr, "PQendcopy failed\n");
 		exit(1);
 	}
+}
 
-	executeStatement(con, "commit");
+/*
+ * Fill the standard tables with some data, from the server side
+ */
+static void
+initGenerateDataServerSide(PGconn *con)
+{
+	char		sql[256];
+
+	fprintf(stderr, "generating data server side...\n");
+
+	initTruncateTables(con);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_branches(bid,bbalance,filler) "
+			 "select bid, 0, '' "
+			 "from generate_series(1, %d) as bid", scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_tellers(tid,bid,tbalance,filler) "
+			 "select tid, (tid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as tid", ntellers, scale * ntellers);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+			 "select aid, (aid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as aid", naccounts, scale * naccounts);
+	executeStatement(con, sql);
 }
 
 /*
@@ -3907,6 +3941,7 @@ static void
 checkInitSteps(const char *initialize_steps)
 {
 	const char *step;
+	int begins = 0;
 
 	if (initialize_steps[0] == '\0')
 	{
@@ -3916,13 +3951,40 @@ checkInitSteps(const char *initialize_steps)
 
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr("dtgvpf ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
 		{
-			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+			fprintf(stderr,
+					"unrecognized initialization step \"%c\"\n"
+					"allowed step characters are: \"" ALL_INIT_STEPS "\"\n",
 					*step);
-			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
 			exit(1);
 		}
+
+		/* count BEGIN/COMMIT for matching */
+		if (*step == '(')
+			begins++;
+		else if (*step == ')')
+			begins--;
+
+		if (begins < 0)
+		{
+			fprintf(stderr, "COMMIT \")\" at char %ld of \"%s\" does not have a matching BEGIN \"(\"\n",
+							step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+		else if (begins >= 2)
+		{
+			fprintf(stderr, "BEGIN \"(\" within a BEGIN at char %ld of \"%s\", nested transactions are not supported\n",
+					step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+	}
+
+	if (begins > 0)
+	{
+		fprintf(stderr, "a BEGIN \"(\" in \"%s\" does not have a matching COMMIT \")\"\n",
+						initialize_steps);
+		exit(1);
 	}
 }
 
@@ -3960,9 +4022,20 @@ runInitSteps(const char *initialize_steps)
 				op = "create tables";
 				initCreateTables(con);
 				break;
+			case '(':
+				executeStatement(con, "begin");
+				break;
 			case 'g':
-				op = "generate";
-				initGenerateData(con);
+				op = "client generate";
+				initGenerateDataClientSide(con);
+				break;
+			case 'G':
+				op = "server generate";
+				initGenerateDataServerSide(con);
+				break;
+			case ')':
+				op = "commit";
+				executeStatement(con, "commit");
 				break;
 			case 'v':
 				op = "vacuum";
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 5a2fdb9acb..9cbbfb9245 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -100,7 +100,7 @@ pgbench(
 
 # Again, with all possible options
 pgbench(
-	'--initialize --init-steps=dtpvg --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=pg_default --index-tablespace=pg_default',
+	'--initialize --init-steps=dtpv(g) --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=pg_default --index-tablespace=pg_default',
 	0,
 	[qr{^$}i],
 	[
@@ -116,14 +116,14 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables',
+	'--initialize --init-steps=dtpv(G)vv --no-vacuum --foreign-keys --unlogged-tables',
 	0,
 	[qr{^$}],
 	[
 		qr{dropping old tables},
 		qr{creating tables},
 		qr{creating primary keys},
-		qr{.* of .* tuples \(.*\) done},
+		qr{generating data server side},
 		qr{creating foreign keys},
 		qr{(?!vacuuming)}, # no vacuum
 		qr{done in \d+\.\d\d s }
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index f7fa18418b..f3406e8600 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -147,7 +147,22 @@ my @options = (
 	[
 		'invalid init step',
 		'-i -I dta',
-		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
+		[ qr{unrecognized initialization step}, qr{allowed step characters are} ]
+	],
+	[
+		'invalid init step begin/begin',
+		'-i -I ((',
+		[ qr{nested transactions are not supported} ]
+	],
+	[
+		'invalid init step end',
+		'-i -I )',
+		[ qr{does not have a matching BEGIN} ]
+	],
+	[
+		'invalid init step begin',
+		'-i -I (dt',
+		[ qr{does not have a matching COMMIT} ]
 	],
 	[
 		'bad random seed',
#6btendouan
btendouan@oss.nttdata.com
In reply to: Fabien COELHO (#1)
Re: pgbench - extend initialization phase control

Hello Fabien,

---------- Forwarded message ---------
From: Fabien COELHO <coelho@cri.ensmp.fr>
Date: Tue, Jul 16, 2019 at 4:58 PM
Subject: Re: pgbench - extend initialization phase control
To: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Cc: PostgreSQL Developers <pgsql-hackers@lists.postgresql.org>

Hello Ibrar,

The following review has been posted through the commitfest
application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Other than that, the patch looks good to me.

The new status of this patch is: Ready for Committer

Thanks for the review.

Attached v2 is a rebase after ce8f9467.

Thanks for your new patch.

But I failed to apply it. Please rebase it against HEAD.

Regards,

---------
Anna

#7Fabien COELHO
coelho@cri.ensmp.fr
In reply to: btendouan (#6)
1 attachment(s)
Re: pgbench - extend initialization phase control

Attached v2 is a rebase after ce8f9467.

Here is rebase v3.

--
Fabien.

Attachments:

pgbench-init-extended-3.patchtext/x-diff; name=pgbench-init-extended-3.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e3a0abb4c7..e9f43f3b26 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -167,7 +167,7 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         <replaceable>init_steps</replaceable> specifies the
         initialization steps to be performed, using one character per step.
         Each step is invoked in the specified order.
-        The default is <literal>dtgvp</literal>.
+        The default is <literal>dt(g)vp</literal>.
         The available steps are:
 
         <variablelist>
@@ -193,12 +193,31 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
           </listitem>
          </varlistentry>
          <varlistentry>
-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>(</literal> (begin transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>BEGIN</command>.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>
           <listitem>
            <para>
             Generate data and load it into the standard tables,
             replacing any data already present.
            </para>
+           <para>
+            When data is generated server side, there is no progress output.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>)</literal> (commit transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>COMMIT</command>.
+           </para>
           </listitem>
          </varlistentry>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e72ad0036e..50b8139f50 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -130,7 +130,13 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /********************************************************************
  * some configurable parameters */
 
-#define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
+#define DEFAULT_INIT_STEPS "dt(g)vp"	/* default -I setting */
+
+#define ALL_INIT_STEPS "dtgGvpf()"		/* all possible steps */
 
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
@@ -626,7 +632,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "                           run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
@@ -3802,10 +3808,23 @@ append_fillfactor(char *opts, int len)
 }
 
 /*
- * Fill the standard tables with some data
+ * truncate away any old data, in one command in case there are foreign keys
  */
 static void
-initGenerateData(PGconn *con)
+initTruncateTables(PGconn *con)
+{
+	executeStatement(con, "truncate table "
+					 "pgbench_accounts, "
+					 "pgbench_branches, "
+					 "pgbench_history, "
+					 "pgbench_tellers");
+}
+
+/*
+ * Fill the standard tables with some data, from the client side
+ */
+static void
+initGenerateDataClientSide(PGconn *con)
 {
 	char		sql[256];
 	PGresult   *res;
@@ -3819,23 +3838,9 @@ initGenerateData(PGconn *con)
 				remaining_sec;
 	int			log_interval = 1;
 
-	fprintf(stderr, "generating data...\n");
+	fprintf(stderr, "generating data client side...\n");
 
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/*
-	 * truncate away any old data, in one command in case there are foreign
-	 * keys
-	 */
-	executeStatement(con, "truncate table "
-					 "pgbench_accounts, "
-					 "pgbench_branches, "
-					 "pgbench_history, "
-					 "pgbench_tellers");
+	initTruncateTables(con);
 
 	/*
 	 * fill branches, tellers, accounts in that order in case foreign keys
@@ -3935,8 +3940,37 @@ initGenerateData(PGconn *con)
 		fprintf(stderr, "PQendcopy failed\n");
 		exit(1);
 	}
+}
 
-	executeStatement(con, "commit");
+/*
+ * Fill the standard tables with some data, from the server side
+ */
+static void
+initGenerateDataServerSide(PGconn *con)
+{
+	char		sql[256];
+
+	fprintf(stderr, "generating data server side...\n");
+
+	initTruncateTables(con);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_branches(bid,bbalance,filler) "
+			 "select bid, 0, '' "
+			 "from generate_series(1, %d) as bid", scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_tellers(tid,bid,tbalance,filler) "
+			 "select tid, (tid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as tid", ntellers, scale * ntellers);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+			 "select aid, (aid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as aid", naccounts, scale * naccounts);
+	executeStatement(con, sql);
 }
 
 /*
@@ -4020,6 +4054,7 @@ static void
 checkInitSteps(const char *initialize_steps)
 {
 	const char *step;
+	int begins = 0;
 
 	if (initialize_steps[0] == '\0')
 	{
@@ -4029,13 +4064,40 @@ checkInitSteps(const char *initialize_steps)
 
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr("dtgvpf ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
 		{
-			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+			fprintf(stderr,
+					"unrecognized initialization step \"%c\"\n"
+					"Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
 					*step);
-			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
 			exit(1);
 		}
+
+		/* count BEGIN/COMMIT for matching */
+		if (*step == '(')
+			begins++;
+		else if (*step == ')')
+			begins--;
+
+		if (begins < 0)
+		{
+			fprintf(stderr, "COMMIT \")\" at char %ld of \"%s\" does not have a matching BEGIN \"(\"\n",
+							step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+		else if (begins >= 2)
+		{
+			fprintf(stderr, "BEGIN \"(\" within a BEGIN at char %ld of \"%s\", nested transactions are not supported\n",
+					step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+	}
+
+	if (begins > 0)
+	{
+		fprintf(stderr, "a BEGIN \"(\" in \"%s\" does not have a matching COMMIT \")\"\n",
+						initialize_steps);
+		exit(1);
 	}
 }
 
@@ -4073,9 +4135,20 @@ runInitSteps(const char *initialize_steps)
 				op = "create tables";
 				initCreateTables(con);
 				break;
+			case '(':
+				executeStatement(con, "begin");
+				break;
 			case 'g':
-				op = "generate";
-				initGenerateData(con);
+				op = "client generate";
+				initGenerateDataClientSide(con);
+				break;
+			case 'G':
+				op = "server generate";
+				initGenerateDataServerSide(con);
+				break;
+			case ')':
+				op = "commit";
+				executeStatement(con, "commit");
 				break;
 			case 'v':
 				op = "vacuum";
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c441626d7c..ddfdc79efa 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -113,7 +113,7 @@ pgbench(
 
 # Again, with all possible options
 pgbench(
-	'--initialize --init-steps=dtpvg --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=regress_pgbench_tap_1_ts --index-tablespace=regress_pgbench_tap_1_ts --partitions=2 --partition-method=hash',
+	'--initialize --init-steps=dtpv(g) --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=regress_pgbench_tap_1_ts --index-tablespace=regress_pgbench_tap_1_ts --partitions=2 --partition-method=hash',
 	0,
 	[qr{^$}i],
 	[
@@ -130,7 +130,7 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
+	'--initialize --init-steps=dtpv(G)vv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
 	0,
 	[qr{^$}],
 	[
@@ -138,7 +138,7 @@ pgbench(
 		qr{creating tables},
 		qr{creating 3 partitions},
 		qr{creating primary keys},
-		qr{.* of .* tuples \(.*\) done},
+		qr{generating data server side},
 		qr{creating foreign keys},
 		qr{(?!vacuuming)}, # no vacuum
 		qr{done in \d+\.\d\d s }
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 1e9542af3f..256c7104cd 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -147,7 +147,22 @@ my @options = (
 	[
 		'invalid init step',
 		'-i -I dta',
-		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
+		[ qr{unrecognized initialization step}, qr{allowed step characters are} ]
+	],
+	[
+		'invalid init step begin/begin',
+		'-i -I ((',
+		[ qr{nested transactions are not supported} ]
+	],
+	[
+		'invalid init step end',
+		'-i -I )',
+		[ qr{does not have a matching BEGIN} ]
+	],
+	[
+		'invalid init step begin',
+		'-i -I (dt',
+		[ qr{does not have a matching COMMIT} ]
 	],
 	[
 		'bad random seed',
#8btendouan
btendouan@oss.nttdata.com
In reply to: Fabien COELHO (#7)
Re: pgbench - extend initialization phase control

Here is rebase v3.

Hi,

Thanks for your new patch.

Failed regression test.
It's necessary to change the first a in “allowed step characters are” to
uppercase A in the regression test of 002_pgbench_no_server.pl.

The behavior of "g" is different between v12 and the patche, and
backward compatibility is lost.
In v12, BEGIN and COMMIT are specified only by choosing "g".
It's a problem that backward compatibility is lost.

When using ( and ) with the -I, the documentation should indicate that
double quotes are required,
and "v" not be able to enclose in ( and ).

Regards,

--
Anna

#9btendouan
btendouan@oss.nttdata.com
In reply to: btendouan (#8)
Re: pgbench - extend initialization phase control

Hi,

When g is specified, null is inserted in the filler column of
pgbentch_tellrs, acounts, branches.
But when G is specified, empty string is inserted.

Do you have any intention of this difference?

--
Anna

#10Fabien COELHO
coelho@cri.ensmp.fr
In reply to: btendouan (#8)
1 attachment(s)
Re: pgbench - extend initialization phase control

Hello,

Failed regression test. It's necessary to change the first a in “allowed
step characters are” to uppercase A in the regression test of
002_pgbench_no_server.pl.

Argh. I think I ran the test, then stupidly updated the message afterwards
to better match best practices, without rechecking:-(

The behavior of "g" is different between v12 and the patche, and
backward compatibility is lost. In v12, BEGIN and COMMIT are specified
only by choosing "g". It's a problem that backward compatibility is
lost.

Somehow yes, but I do not see this as an actual problem from a functional
point of view: it just means that if you use a 'dtgvp' with the newer
version and if the inserts were to fail, then they are not under an
explicit transaction, so previous inserts are not cleaned up. However,
this is a pretty unlikely case, and anyway the error is reported, so any
user would be expected not to go on after the initialization phase.

So basically I do not see the very small regression for an unlikely corner
case to induce any problem in practice.

The benefit of controlling where begin/end actually occur is that it may
have an impact on performance, and it allows to check that.

When using ( and ) with the -I, the documentation should indicate that double
quotes are required,

Or single quotes, or backslash, if launch from the command line. I added a
mention of escaping or protection in the doc in that case.

and "v" not be able to enclose in ( and ).

That is a postgresql limitation, which may evolve. There could be others.
I updated the doc to say that some commands may not work inside an
explicit transaction.

When g is specified, null is inserted in the filler column of
pgbentch_tellrs, acounts, branches. But when G is specified, empty
string is inserted.

Indeed there is a small diff. ISTM that the actual filling with the
initial client version is NULL for branches and tellers, and a
blank-padded string for accounts.

I fixed the patch so that the end-result is the same with both g and G.

Do you have any intention of this difference?

Yes and no.

I intended that tellers & branches filler are filled, but I did not really
notice that the client side was implicitely using NULL, although it says
so in a comment. Although I'm not happy with the fact because it cheats
with the benchmark design which requires the filler columns to be really
filled and stored as is, it is indeed the place to change this (bad)
behavior.

Attached a v4 with the updates described above.

--
Fabien.

Attachments:

pgbench-init-extended-4.patchtext/x-diff; name=pgbench-init-extended-4.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e3a0abb4c7..c5e4d17fd5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -167,7 +167,7 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         <replaceable>init_steps</replaceable> specifies the
         initialization steps to be performed, using one character per step.
         Each step is invoked in the specified order.
-        The default is <literal>dtgvp</literal>.
+        The default is <literal>dt(g)vp</literal>.
         The available steps are:
 
         <variablelist>
@@ -193,12 +193,40 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
           </listitem>
          </varlistentry>
          <varlistentry>
-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>(</literal> (begin transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>BEGIN</command>.
+           </para>
+           <para>
+            Beware that some steps may not work when called within an explicit transaction.
+           </para>
+           <para>
+            Beware that using <literal>(</literal> on the command line requires some protection or escaping.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>
           <listitem>
            <para>
             Generate data and load it into the standard tables,
             replacing any data already present.
            </para>
+           <para>
+            When data is generated server side, there is no progress output.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>)</literal> (commit transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>COMMIT</command>.
+           </para>
+           <para>
+            Beware that using <literal>)</literal> on the command line requires some protection or escaping.
+           </para>
           </listitem>
          </varlistentry>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e72ad0036e..597562248a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -130,7 +130,13 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /********************************************************************
  * some configurable parameters */
 
-#define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
+#define DEFAULT_INIT_STEPS "dt(g)vp"	/* default -I setting */
+
+#define ALL_INIT_STEPS "dtgGvpf()"		/* all possible steps */
 
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
@@ -626,7 +632,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "                           run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
@@ -3802,10 +3808,23 @@ append_fillfactor(char *opts, int len)
 }
 
 /*
- * Fill the standard tables with some data
+ * truncate away any old data, in one command in case there are foreign keys
  */
 static void
-initGenerateData(PGconn *con)
+initTruncateTables(PGconn *con)
+{
+	executeStatement(con, "truncate table "
+					 "pgbench_accounts, "
+					 "pgbench_branches, "
+					 "pgbench_history, "
+					 "pgbench_tellers");
+}
+
+/*
+ * Fill the standard tables with some data, from the client side
+ */
+static void
+initGenerateDataClientSide(PGconn *con)
 {
 	char		sql[256];
 	PGresult   *res;
@@ -3819,23 +3838,9 @@ initGenerateData(PGconn *con)
 				remaining_sec;
 	int			log_interval = 1;
 
-	fprintf(stderr, "generating data...\n");
+	fprintf(stderr, "generating data client side...\n");
 
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/*
-	 * truncate away any old data, in one command in case there are foreign
-	 * keys
-	 */
-	executeStatement(con, "truncate table "
-					 "pgbench_accounts, "
-					 "pgbench_branches, "
-					 "pgbench_history, "
-					 "pgbench_tellers");
+	initTruncateTables(con);
 
 	/*
 	 * fill branches, tellers, accounts in that order in case foreign keys
@@ -3935,8 +3940,41 @@ initGenerateData(PGconn *con)
 		fprintf(stderr, "PQendcopy failed\n");
 		exit(1);
 	}
+}
 
-	executeStatement(con, "commit");
+/*
+ * Fill the standard tables with some data, from the server side
+ *
+ * As already the case with the client-side filling, the filler
+ * column defaults to NULL in pgbench_branches and pgbench_tellers,
+ * and is a blank-padded string in pgbench_accounts.
+ */
+static void
+initGenerateDataServerSide(PGconn *con)
+{
+	char		sql[256];
+
+	fprintf(stderr, "generating data server side...\n");
+
+	initTruncateTables(con);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_branches(bid,bbalance) "
+			 "select bid, 0 "
+			 "from generate_series(1, %d) as bid", scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_tellers(tid,bid,tbalance) "
+			 "select tid, (tid - 1) / %d + 1, 0 "
+			 "from generate_series(1, %d) as tid", ntellers, scale * ntellers);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+			 "select aid, (aid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as aid", naccounts, scale * naccounts);
+	executeStatement(con, sql);
 }
 
 /*
@@ -4020,6 +4058,7 @@ static void
 checkInitSteps(const char *initialize_steps)
 {
 	const char *step;
+	int begins = 0;
 
 	if (initialize_steps[0] == '\0')
 	{
@@ -4029,13 +4068,40 @@ checkInitSteps(const char *initialize_steps)
 
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr("dtgvpf ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
 		{
-			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+			fprintf(stderr,
+					"unrecognized initialization step \"%c\"\n"
+					"Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
 					*step);
-			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
 			exit(1);
 		}
+
+		/* count BEGIN/COMMIT for matching */
+		if (*step == '(')
+			begins++;
+		else if (*step == ')')
+			begins--;
+
+		if (begins < 0)
+		{
+			fprintf(stderr, "COMMIT \")\" at char %ld of \"%s\" does not have a matching BEGIN \"(\"\n",
+							step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+		else if (begins >= 2)
+		{
+			fprintf(stderr, "BEGIN \"(\" within a BEGIN at char %ld of \"%s\", nested transactions are not supported\n",
+					step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+	}
+
+	if (begins > 0)
+	{
+		fprintf(stderr, "a BEGIN \"(\" in \"%s\" does not have a matching COMMIT \")\"\n",
+						initialize_steps);
+		exit(1);
 	}
 }
 
@@ -4073,9 +4139,20 @@ runInitSteps(const char *initialize_steps)
 				op = "create tables";
 				initCreateTables(con);
 				break;
+			case '(':
+				executeStatement(con, "begin");
+				break;
 			case 'g':
-				op = "generate";
-				initGenerateData(con);
+				op = "client generate";
+				initGenerateDataClientSide(con);
+				break;
+			case 'G':
+				op = "server generate";
+				initGenerateDataServerSide(con);
+				break;
+			case ')':
+				op = "commit";
+				executeStatement(con, "commit");
 				break;
 			case 'v':
 				op = "vacuum";
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c441626d7c..ddfdc79efa 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -113,7 +113,7 @@ pgbench(
 
 # Again, with all possible options
 pgbench(
-	'--initialize --init-steps=dtpvg --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=regress_pgbench_tap_1_ts --index-tablespace=regress_pgbench_tap_1_ts --partitions=2 --partition-method=hash',
+	'--initialize --init-steps=dtpv(g) --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=regress_pgbench_tap_1_ts --index-tablespace=regress_pgbench_tap_1_ts --partitions=2 --partition-method=hash',
 	0,
 	[qr{^$}i],
 	[
@@ -130,7 +130,7 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
+	'--initialize --init-steps=dtpv(G)vv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
 	0,
 	[qr{^$}],
 	[
@@ -138,7 +138,7 @@ pgbench(
 		qr{creating tables},
 		qr{creating 3 partitions},
 		qr{creating primary keys},
-		qr{.* of .* tuples \(.*\) done},
+		qr{generating data server side},
 		qr{creating foreign keys},
 		qr{(?!vacuuming)}, # no vacuum
 		qr{done in \d+\.\d\d s }
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 1e9542af3f..39badeb3aa 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -147,7 +147,22 @@ my @options = (
 	[
 		'invalid init step',
 		'-i -I dta',
-		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
+		[ qr{unrecognized initialization step}, qr{Allowed step characters are} ]
+	],
+	[
+		'invalid init step begin/begin',
+		'-i -I ((',
+		[ qr{nested transactions are not supported} ]
+	],
+	[
+		'invalid init step end',
+		'-i -I )',
+		[ qr{does not have a matching BEGIN} ]
+	],
+	[
+		'invalid init step begin',
+		'-i -I (dt',
+		[ qr{does not have a matching COMMIT} ]
 	],
 	[
 		'bad random seed',
#11btendouan
btendouan@oss.nttdata.com
In reply to: Fabien COELHO (#10)
Re: pgbench - extend initialization phase control

Attached a v4 with the updates described above.

Hi,

Thanks for updating the patch.
All tests are passed. There is no problem in operation.

--
Anna

#12Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#10)
Re: pgbench - extend initialization phase control

On Thu, Oct 17, 2019 at 8:09 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello,

Failed regression test. It's necessary to change the first a in “allowed
step characters are” to uppercase A in the regression test of
002_pgbench_no_server.pl.

Argh. I think I ran the test, then stupidly updated the message afterwards
to better match best practices, without rechecking:-(

The behavior of "g" is different between v12 and the patche, and
backward compatibility is lost. In v12, BEGIN and COMMIT are specified
only by choosing "g". It's a problem that backward compatibility is
lost.

Somehow yes, but I do not see this as an actual problem from a functional
point of view: it just means that if you use a 'dtgvp' with the newer
version and if the inserts were to fail, then they are not under an
explicit transaction, so previous inserts are not cleaned up. However,
this is a pretty unlikely case, and anyway the error is reported, so any
user would be expected not to go on after the initialization phase.

So basically I do not see the very small regression for an unlikely corner
case to induce any problem in practice.

The benefit of controlling where begin/end actually occur is that it may
have an impact on performance, and it allows to check that.

I still fail to understand the benefit of addition of () settings.
Could you clarify what case () settings are useful for? You are
thinking to execute all initialization SQL statements within
single transaction, e.g., -I (dtgp), for some reasons?

When using ( and ) with the -I, the documentation should indicate that double
quotes are required,

Or single quotes, or backslash, if launch from the command line. I added a
mention of escaping or protection in the doc in that case.

What about using, for example, b (BEGIN) and c (COMMIT) instead
to avoid such restriction?

and "v" not be able to enclose in ( and ).

That is a postgresql limitation, which may evolve. There could be others.
I updated the doc to say that some commands may not work inside an
explicit transaction.

I think that it's better to check whehter "v" is enclosed with () or not
at the beginning of pgbench, and report an error if it is. Otherwise,
if -I (dtgv) is specified, pgbench reports an error after time-consuming
data generation is performed, and of course that data generation is
rollbacked.

Regards,

--
Fujii Masao

#13Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fujii Masao (#12)
1 attachment(s)
Re: pgbench - extend initialization phase control

Hello Masao-san,

The benefit of controlling where begin/end actually occur is that it may
have an impact on performance, and it allows to check that.

I still fail to understand the benefit of addition of () settings.
Could you clarify what case () settings are useful for? You are
thinking to execute all initialization SQL statements within
single transaction, e.g., -I (dtgp), for some reasons?

Yep. Or anything else, including without (), to allow checking the
performance impact or non impact of transactions on the initialization
phase.

When using ( and ) with the -I, the documentation should indicate that double
quotes are required,

Or single quotes, or backslash, if launch from the command line. I added a
mention of escaping or protection in the doc in that case.

What about using, for example, b (BEGIN) and c (COMMIT) instead
to avoid such restriction?

It is indeed possible. Using a open/close symmetric character ( (), {},
[]) looks more pleasing and allows to see easily whether everything is
properly closed. I switched to {} which does not generate the same quoting
issue in shell.

I think that it's better to check whehter "v" is enclosed with () or not
at the beginning of pgbench, and report an error if it is.

Otherwise, if -I (dtgv) is specified, pgbench reports an error after
time-consuming data generation is performed, and of course that data
generation is rollbacked.

Patch v5 attached added a check for v inside (), although I'm not keen on
putting it there, and uses {} instead of ().

--
Fabien.

Attachments:

pgbench-init-extended-5.patchtext/x-diff; name=pgbench-init-extended-5.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e3a0abb4c7..e150209b4e 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -167,7 +167,7 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         <replaceable>init_steps</replaceable> specifies the
         initialization steps to be performed, using one character per step.
         Each step is invoked in the specified order.
-        The default is <literal>dtgvp</literal>.
+        The default is <literal>dt{g}vp</literal>.
         The available steps are:
 
         <variablelist>
@@ -193,12 +193,34 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
           </listitem>
          </varlistentry>
          <varlistentry>
-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>{</literal> (begin transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>BEGIN</command>.
+           </para>
+           <para>
+            Beware that some steps may not work when called within an explicit transaction.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>
           <listitem>
            <para>
             Generate data and load it into the standard tables,
             replacing any data already present.
            </para>
+           <para>
+            When data is generated server side, there is no progress output.
+           </para>
+          </listitem>
+         </varlistentry>
+         <varlistentry>
+          <term><literal>}</literal> (commit transaction)</term>
+          <listitem>
+           <para>
+            Emit a <command>COMMIT</command>.
+           </para>
           </listitem>
          </varlistentry>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e72ad0036e..be10630926 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -130,7 +130,13 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /********************************************************************
  * some configurable parameters */
 
-#define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
+#define DEFAULT_INIT_STEPS "dt{g}vp"	/* default -I setting */
+
+#define ALL_INIT_STEPS "dtgGvpf{}"		/* all possible steps */
 
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
@@ -626,7 +632,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "                           run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
@@ -3802,10 +3808,23 @@ append_fillfactor(char *opts, int len)
 }
 
 /*
- * Fill the standard tables with some data
+ * truncate away any old data, in one command in case there are foreign keys
  */
 static void
-initGenerateData(PGconn *con)
+initTruncateTables(PGconn *con)
+{
+	executeStatement(con, "truncate table "
+					 "pgbench_accounts, "
+					 "pgbench_branches, "
+					 "pgbench_history, "
+					 "pgbench_tellers");
+}
+
+/*
+ * Fill the standard tables with some data, from the client side
+ */
+static void
+initGenerateDataClientSide(PGconn *con)
 {
 	char		sql[256];
 	PGresult   *res;
@@ -3819,23 +3838,9 @@ initGenerateData(PGconn *con)
 				remaining_sec;
 	int			log_interval = 1;
 
-	fprintf(stderr, "generating data...\n");
+	fprintf(stderr, "generating data client side...\n");
 
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/*
-	 * truncate away any old data, in one command in case there are foreign
-	 * keys
-	 */
-	executeStatement(con, "truncate table "
-					 "pgbench_accounts, "
-					 "pgbench_branches, "
-					 "pgbench_history, "
-					 "pgbench_tellers");
+	initTruncateTables(con);
 
 	/*
 	 * fill branches, tellers, accounts in that order in case foreign keys
@@ -3935,8 +3940,41 @@ initGenerateData(PGconn *con)
 		fprintf(stderr, "PQendcopy failed\n");
 		exit(1);
 	}
+}
 
-	executeStatement(con, "commit");
+/*
+ * Fill the standard tables with some data, from the server side
+ *
+ * As already the case with the client-side filling, the filler
+ * column defaults to NULL in pgbench_branches and pgbench_tellers,
+ * and is a blank-padded string in pgbench_accounts.
+ */
+static void
+initGenerateDataServerSide(PGconn *con)
+{
+	char		sql[256];
+
+	fprintf(stderr, "generating data server side...\n");
+
+	initTruncateTables(con);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_branches(bid,bbalance) "
+			 "select bid, 0 "
+			 "from generate_series(1, %d) as bid", scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_tellers(tid,bid,tbalance) "
+			 "select tid, (tid - 1) / %d + 1, 0 "
+			 "from generate_series(1, %d) as tid", ntellers, scale * ntellers);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+			 "select aid, (aid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as aid", naccounts, scale * naccounts);
+	executeStatement(con, sql);
 }
 
 /*
@@ -4020,6 +4058,7 @@ static void
 checkInitSteps(const char *initialize_steps)
 {
 	const char *step;
+	int begins = 0;
 
 	if (initialize_steps[0] == '\0')
 	{
@@ -4029,13 +4068,46 @@ checkInitSteps(const char *initialize_steps)
 
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr("dtgvpf ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
 		{
-			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+			fprintf(stderr,
+					"unrecognized initialization step \"%c\"\n"
+					"Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
 					*step);
-			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
 			exit(1);
 		}
+
+		if (*step == 'v' && begins > 0)
+		{
+			fprintf(stderr, "VACUUM cannot run inside a transaction block\n");
+			exit(1);
+		}
+
+		/* count BEGIN/COMMIT for matching */
+		if (*step == '{')
+			begins++;
+		else if (*step == '}')
+			begins--;
+
+		if (begins < 0)
+		{
+			fprintf(stderr, "COMMIT \"}\" at char %ld of \"%s\" does not have a matching BEGIN \"{\"\n",
+							step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+		else if (begins >= 2)
+		{
+			fprintf(stderr, "BEGIN \"{\" within a BEGIN at char %ld of \"%s\", nested transactions are not supported\n",
+					step - initialize_steps, initialize_steps);
+			exit(1);
+		}
+	}
+
+	if (begins > 0)
+	{
+		fprintf(stderr, "a BEGIN \"{\" in \"%s\" does not have a matching COMMIT \"}\"\n",
+						initialize_steps);
+		exit(1);
 	}
 }
 
@@ -4073,9 +4145,20 @@ runInitSteps(const char *initialize_steps)
 				op = "create tables";
 				initCreateTables(con);
 				break;
+			case '{':
+				executeStatement(con, "begin");
+				break;
 			case 'g':
-				op = "generate";
-				initGenerateData(con);
+				op = "client generate";
+				initGenerateDataClientSide(con);
+				break;
+			case 'G':
+				op = "server generate";
+				initGenerateDataServerSide(con);
+				break;
+			case '}':
+				op = "commit";
+				executeStatement(con, "commit");
 				break;
 			case 'v':
 				op = "vacuum";
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c441626d7c..256b377e09 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -113,7 +113,7 @@ pgbench(
 
 # Again, with all possible options
 pgbench(
-	'--initialize --init-steps=dtpvg --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=regress_pgbench_tap_1_ts --index-tablespace=regress_pgbench_tap_1_ts --partitions=2 --partition-method=hash',
+	'--initialize --init-steps=dtpv{g} --scale=1 --unlogged-tables --fillfactor=98 --foreign-keys --quiet --tablespace=regress_pgbench_tap_1_ts --index-tablespace=regress_pgbench_tap_1_ts --partitions=2 --partition-method=hash',
 	0,
 	[qr{^$}i],
 	[
@@ -130,7 +130,7 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
+	'--initialize --init-steps=dtpv{G}vv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
 	0,
 	[qr{^$}],
 	[
@@ -138,7 +138,7 @@ pgbench(
 		qr{creating tables},
 		qr{creating 3 partitions},
 		qr{creating primary keys},
-		qr{.* of .* tuples \(.*\) done},
+		qr{generating data server side},
 		qr{creating foreign keys},
 		qr{(?!vacuuming)}, # no vacuum
 		qr{done in \d+\.\d\d s }
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 1e9542af3f..43b20ebcde 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -147,7 +147,27 @@ my @options = (
 	[
 		'invalid init step',
 		'-i -I dta',
-		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
+		[ qr{unrecognized initialization step}, qr{Allowed step characters are} ]
+	],
+	[
+		'invalid init step begin/begin',
+		'-i -I {{',
+		[ qr{nested transactions are not supported} ]
+	],
+	[
+		'invalid init step end',
+		'-i -I }',
+		[ qr{does not have a matching BEGIN} ]
+	],
+	[
+		'invalid init step begin',
+		'-i -I {dt',
+		[ qr{does not have a matching COMMIT} ]
+	],
+	[
+		'invalid vacuum inside transaction block',
+		'-i -I {v}',
+		[ qr{VACUUM cannot run inside a transaction block} ]
 	],
 	[
 		'bad random seed',
#14Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#13)
Re: pgbench - extend initialization phase control

On Thu, Oct 24, 2019 at 9:16 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Masao-san,

The benefit of controlling where begin/end actually occur is that it may
have an impact on performance, and it allows to check that.

I still fail to understand the benefit of addition of () settings.
Could you clarify what case () settings are useful for? You are
thinking to execute all initialization SQL statements within
single transaction, e.g., -I (dtgp), for some reasons?

Yep. Or anything else, including without (), to allow checking the
performance impact or non impact of transactions on the initialization
phase.

Is there actually such performance impact? AFAIR most time-consuming part in
initialization phase is the generation of pgbench_accounts data. This part is
performed within single transaction whether () are specified or not. No?
So I'm not sure how () are useful to check performance impact in init phase.
Maybe I'm missing something...

When using ( and ) with the -I, the documentation should indicate that double
quotes are required,

Or single quotes, or backslash, if launch from the command line. I added a
mention of escaping or protection in the doc in that case.

What about using, for example, b (BEGIN) and c (COMMIT) instead
to avoid such restriction?

It is indeed possible. Using a open/close symmetric character ( (), {},
[]) looks more pleasing and allows to see easily whether everything is
properly closed. I switched to {} which does not generate the same quoting
issue in shell.

I think that it's better to check whehter "v" is enclosed with () or not
at the beginning of pgbench, and report an error if it is.

Otherwise, if -I (dtgv) is specified, pgbench reports an error after
time-consuming data generation is performed, and of course that data
generation is rollbacked.

Patch v5 attached added a check for v inside (), although I'm not keen on
putting it there, and uses {} instead of ().

Thanks for updating the patch!

Regards,

--
Fujii Masao

#15Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fujii Masao (#14)
Re: pgbench - extend initialization phase control

Hello,

Yep. Or anything else, including without (), to allow checking the
performance impact or non impact of transactions on the initialization
phase.

Is there actually such performance impact? AFAIR most time-consuming part in
initialization phase is the generation of pgbench_accounts data.

Maybe. If you cannot check, you can only guess. Probably it should be
small, but the current version does not allow to check whether it is so.

--
Fabien.

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#15)
Re: pgbench - extend initialization phase control

On Fri, Oct 25, 2019 at 12:06 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello,

Yep. Or anything else, including without (), to allow checking the
performance impact or non impact of transactions on the initialization
phase.

Is there actually such performance impact? AFAIR most time-consuming part in
initialization phase is the generation of pgbench_accounts data.

Maybe. If you cannot check, you can only guess. Probably it should be
small, but the current version does not allow to check whether it is so.

Could you elaborate what you actually want to measure the performance
impact by adding explicit begin and commit? Currently pgbench -i issues
the following queries. The data generation part is already executed within
single transaction. You want to execute not only data generation but also
drop/creation of tables within single transaction, and measure how much
performance impact happens? I'm sure that would be negligible.
Or you want to execute data generate in multiple transactions, i.e.,
execute each statement for data generation (e.g., one INSERT) in single
transaction, and then want to measure the performance impact?
But the patch doesn't enable us to do such data generation yet.

So I'm thinking that it's maybe better to commit the addtion of "G" option
first separately. And then we can discuss how much "(" and ")" options
are useful later.

------------------------------------------
drop table if exists pgbench_accounts, pgbench_branches,
pgbench_history, pgbench_tellers
create table pgbench_history(tid int,bid int,aid int,delta
int,mtime timestamp,filler char(22))
create table pgbench_tellers(tid int not null,bid int,tbalance
int,filler char(84)) with (fillfactor=100)
create table pgbench_accounts(aid int not null,bid int,abalance
int,filler char(84)) with (fillfactor=100)
create table pgbench_branches(bid int not null,bbalance int,filler
char(88)) with (fillfactor=100)
begin
truncate table pgbench_accounts, pgbench_branches, pgbench_history,
pgbench_tellers
insert into pgbench_branches(bid,bbalance) values(1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (1,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (2,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (3,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (4,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (5,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (6,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (7,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (8,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (9,1,0)
insert into pgbench_tellers(tid,bid,tbalance) values (10,1,0)
copy pgbench_accounts from stdin
commit
vacuum analyze pgbench_branches
vacuum analyze pgbench_tellers
vacuum analyze pgbench_accounts
vacuum analyze pgbench_history
alter table pgbench_branches add primary key (bid)
alter table pgbench_tellers add primary key (tid)
alter table pgbench_accounts add primary key (aid)
------------------------------------------

Regards,

--
Fujii Masao

#17Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fujii Masao (#16)
1 attachment(s)
Re: pgbench - extend initialization phase control

Hello Masao-san,

Maybe. If you cannot check, you can only guess. Probably it should be
small, but the current version does not allow to check whether it is so.

Could you elaborate what you actually want to measure the performance
impact by adding explicit begin and commit? Currently pgbench -i issues
the following queries. The data generation part is already executed within
single transaction. You want to execute not only data generation but also
drop/creation of tables within single transaction, and measure how much
performance impact happens? I'm sure that would be negligible.
Or you want to execute data generate in multiple transactions, i.e.,
execute each statement for data generation (e.g., one INSERT) in single
transaction, and then want to measure the performance impact?
But the patch doesn't enable us to do such data generation yet.

Indeed, you cannot do this precise thing, but you can do others.

So I'm thinking that it's maybe better to commit the addtion of "G" option
first separately. And then we can discuss how much "(" and ")" options
are useful later.

Attached patch v6 only provides G - server side data generation.

--
Fabien.

Attachments:

pgbench-init-extended-6.patchtext/x-diff; name=pgbench-init-extended-6.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e3a0abb4c7..d3d483a380 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -193,12 +193,15 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
           </listitem>
          </varlistentry>
          <varlistentry>
-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>
           <listitem>
            <para>
             Generate data and load it into the standard tables,
             replacing any data already present.
            </para>
+           <para>
+            When data is generated server side, there is no progress output.
+           </para>
           </listitem>
          </varlistentry>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..e3c1e55bbb 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -131,8 +131,14 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /********************************************************************
  * some configurable parameters */
 
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
 #define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
 
+#define ALL_INIT_STEPS "dtgGvpf"	/* all possible steps */
+
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
@@ -627,7 +633,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "                           run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
@@ -3803,10 +3809,23 @@ append_fillfactor(char *opts, int len)
 }
 
 /*
- * Fill the standard tables with some data
+ * truncate away any old data, in one command in case there are foreign keys
  */
 static void
-initGenerateData(PGconn *con)
+initTruncateTables(PGconn *con)
+{
+	executeStatement(con, "truncate table "
+					 "pgbench_accounts, "
+					 "pgbench_branches, "
+					 "pgbench_history, "
+					 "pgbench_tellers");
+}
+
+/*
+ * Fill the standard tables with some data, from the client side
+ */
+static void
+initGenerateDataClientSide(PGconn *con)
 {
 	char		sql[256];
 	PGresult   *res;
@@ -3820,7 +3839,7 @@ initGenerateData(PGconn *con)
 				remaining_sec;
 	int			log_interval = 1;
 
-	fprintf(stderr, "generating data...\n");
+	fprintf(stderr, "generating data client side...\n");
 
 	/*
 	 * we do all of this in one transaction to enable the backend's
@@ -3828,15 +3847,7 @@ initGenerateData(PGconn *con)
 	 */
 	executeStatement(con, "begin");
 
-	/*
-	 * truncate away any old data, in one command in case there are foreign
-	 * keys
-	 */
-	executeStatement(con, "truncate table "
-					 "pgbench_accounts, "
-					 "pgbench_branches, "
-					 "pgbench_history, "
-					 "pgbench_tellers");
+	initTruncateTables(con);
 
 	/*
 	 * fill branches, tellers, accounts in that order in case foreign keys
@@ -3940,6 +3951,49 @@ initGenerateData(PGconn *con)
 	executeStatement(con, "commit");
 }
 
+/*
+ * Fill the standard tables with some data, from the server side
+ *
+ * As already the case with the client-side filling, the filler
+ * column defaults to NULL in pgbench_branches and pgbench_tellers,
+ * and is a blank-padded string in pgbench_accounts.
+ */
+static void
+initGenerateDataServerSide(PGconn *con)
+{
+	char		sql[256];
+
+	fprintf(stderr, "generating data server side...\n");
+
+	/*
+	 * we do all of this in one transaction to enable the backend's
+	 * data-loading optimizations
+	 */
+	executeStatement(con, "begin");
+
+	initTruncateTables(con);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_branches(bid,bbalance) "
+			 "select bid, 0 "
+			 "from generate_series(1, %d) as bid", scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_tellers(tid,bid,tbalance) "
+			 "select tid, (tid - 1) / %d + 1, 0 "
+			 "from generate_series(1, %d) as tid", ntellers, scale * ntellers);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+			 "select aid, (aid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, %d) as aid", naccounts, scale * naccounts);
+	executeStatement(con, sql);
+
+	executeStatement(con, "commit");
+}
+
 /*
  * Invoke vacuum on the standard tables
  */
@@ -4020,21 +4074,20 @@ initCreateFKeys(PGconn *con)
 static void
 checkInitSteps(const char *initialize_steps)
 {
-	const char *step;
-
 	if (initialize_steps[0] == '\0')
 	{
 		fprintf(stderr, "no initialization steps specified\n");
 		exit(1);
 	}
 
-	for (step = initialize_steps; *step != '\0'; step++)
+	for (const char *step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr("dtgvpf ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
 		{
-			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+			fprintf(stderr,
+					"unrecognized initialization step \"%c\"\n"
+					"Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
 					*step);
-			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
 			exit(1);
 		}
 	}
@@ -4075,8 +4128,12 @@ runInitSteps(const char *initialize_steps)
 				initCreateTables(con);
 				break;
 			case 'g':
-				op = "generate";
-				initGenerateData(con);
+				op = "client generate";
+				initGenerateDataClientSide(con);
+				break;
+			case 'G':
+				op = "server generate";
+				initGenerateDataServerSide(con);
 				break;
 			case 'v':
 				op = "vacuum";
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c441626d7c..4384d0072a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -130,7 +130,7 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
+	'--initialize --init-steps=dtpvGvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
 	0,
 	[qr{^$}],
 	[
@@ -138,7 +138,7 @@ pgbench(
 		qr{creating tables},
 		qr{creating 3 partitions},
 		qr{creating primary keys},
-		qr{.* of .* tuples \(.*\) done},
+		qr{generating data server side},
 		qr{creating foreign keys},
 		qr{(?!vacuuming)}, # no vacuum
 		qr{done in \d+\.\d\d s }
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 1e9542af3f..8b6d442812 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -147,7 +147,7 @@ my @options = (
 	[
 		'invalid init step',
 		'-i -I dta',
-		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
+		[ qr{unrecognized initialization step}, qr{Allowed step characters are} ]
 	],
 	[
 		'bad random seed',
#18Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#17)
Re: pgbench - extend initialization phase control

On Mon, Oct 28, 2019 at 10:36 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Masao-san,

Maybe. If you cannot check, you can only guess. Probably it should be
small, but the current version does not allow to check whether it is so.

Could you elaborate what you actually want to measure the performance
impact by adding explicit begin and commit? Currently pgbench -i issues
the following queries. The data generation part is already executed within
single transaction. You want to execute not only data generation but also
drop/creation of tables within single transaction, and measure how much
performance impact happens? I'm sure that would be negligible.
Or you want to execute data generate in multiple transactions, i.e.,
execute each statement for data generation (e.g., one INSERT) in single
transaction, and then want to measure the performance impact?
But the patch doesn't enable us to do such data generation yet.

Indeed, you cannot do this precise thing, but you can do others.

So I'm thinking that it's maybe better to commit the addtion of "G" option
first separately. And then we can discuss how much "(" and ")" options
are useful later.

Attached patch v6 only provides G - server side data generation.

Thanks for the patch!

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_branches(bid,bbalance) "
+ "select bid, 0 "
+ "from generate_series(1, %d) as bid", scale);

"scale" should be "nbranches * scale".

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_accounts(aid,bid,abalance,filler) "
+ "select aid, (aid - 1) / %d + 1, 0, '' "
+ "from generate_series(1, %d) as aid", naccounts, scale * naccounts);

Like client-side data generation, INT64_FORMAT should be used here
instead of %d?

If large scale factor is specified, the query for generating pgbench_accounts
data can take a very long time. While that query is running, operators may be
likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench
should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep
running to the end.

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

Per PostgreSQL basic coding style, ISTM that "const char *step"
should be declared separately from "for" loop, like the original.

- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+ fprintf(stderr,
+ "unrecognized initialization step \"%c\"\n"
+ "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
  *step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\",
\"p\", \"f\"\n");

The original message seems better to me. So what about just appending "G"
into the above latter message? That is,
"allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"

-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>g</literal> or <literal>G</literal>
(Generate data, client or server side)</term>

Isn't it better to explain a bit more what "client-side / server-side data
generation" is? For example, something like

When "g" (client-side data generation) is specified, data is generated
in pgbench client and sent to the server. When "G" (server-side data
generation) is specified, only queries are sent from pgbench client
and then data is generated in the server. If the network bandwidth is low
between pgbench and the server, using "G" might make the data
generation faster.

Regards,

--
Fujii Masao

#19Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fujii Masao (#18)
1 attachment(s)
Re: pgbench - extend initialization phase control

Hello Masao-san,

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_branches(bid,bbalance) "
+ "select bid, 0 "
+ "from generate_series(1, %d) as bid", scale);

"scale" should be "nbranches * scale".

Yep, even if nbranches is 1, it should be there.

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_accounts(aid,bid,abalance,filler) "
+ "select aid, (aid - 1) / %d + 1, 0, '' "
+ "from generate_series(1, %d) as aid", naccounts, scale * naccounts);

Like client-side data generation, INT64_FORMAT should be used here
instead of %d?

Indeed.

If large scale factor is specified, the query for generating pgbench_accounts
data can take a very long time. While that query is running, operators may be
likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench
should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep
running to the end.

Hmmm. Why not. Now the infra to do that seems to already exists twice,
once in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c".

I cannot say I'm thrilled to replicate this once more. I think that the
reasonable option is to share this in fe-utils and then to reuse it from
there. However, ISTM that such a restructuring patch which not belong to
this feature.

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

Per PostgreSQL basic coding style,

C99 (20 years ago) is new the norm, and this style is now allowed, there
are over a hundred instances of these already. I tend to use that where
appropriate.

- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+ fprintf(stderr,
+ "unrecognized initialization step \"%c\"\n"
+ "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
*step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\",
\"p\", \"f\"\n");

The original message seems better to me. So what about just appending "G"
into the above latter message? That is,
"allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"

I needed this list in several places, so it makes sense to share the
definition, and frankly the list of half a dozen comma-separated chars
does not strike me as much better than just giving the allowed chars
directly. So the simpler the better, from my point of view.

Isn't it better to explain a bit more what "client-side / server-side data
generation" is? For example, something like

Ok.

Attached v7 does most of the above, but the list of char message and the
signal handling. The first one does not look really better to me, and the
second one belongs to a restructuring patch that I'll try to submit.

--
Fabien.

Attachments:

pgbench-init-extended-7.patchtext/x-diff; name=pgbench-init-extended-7.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e3a0abb4c7..7be9c81c43 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -193,12 +193,25 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
           </listitem>
          </varlistentry>
          <varlistentry>
-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client-side or server-side)</term>
           <listitem>
            <para>
             Generate data and load it into the standard tables,
             replacing any data already present.
            </para>
+           <para>
+            With <literal>g</literal> (client-side data generation),
+            data is generated in <command>pgbench</command> client and then
+            sent to the server. This uses the client/server bandwidth
+            extensively through a <command>COPY</command>.
+           </para>
+           <para>
+            With <literal>G</literal> (server-side data generation),
+            only limited queries are sent from <command>pgbench</command>
+            client and then data is actually generated in the server.
+            No significant bandwidth is required for this variant, but
+            the server will do more work.
+           </para>
           </listitem>
          </varlistentry>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..7f5c1d00c8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -131,8 +131,14 @@ static int	pthread_join(pthread_t th, void **thread_return);
 /********************************************************************
  * some configurable parameters */
 
+/*
+ * we do all data generation in one transaction to enable the backend's
+ * data-loading optimizations
+ */
 #define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
 
+#define ALL_INIT_STEPS "dtgGvpf"	/* all possible steps */
+
 #define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
@@ -627,7 +633,7 @@ usage(void)
 		   "  %s [OPTION]... [DBNAME]\n"
 		   "\nInitialization options:\n"
 		   "  -i, --initialize         invokes initialization mode\n"
-		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
+		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
 		   "                           run selected initialization steps\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
@@ -3803,10 +3809,23 @@ append_fillfactor(char *opts, int len)
 }
 
 /*
- * Fill the standard tables with some data
+ * truncate away any old data, in one command in case there are foreign keys
  */
 static void
-initGenerateData(PGconn *con)
+initTruncateTables(PGconn *con)
+{
+	executeStatement(con, "truncate table "
+					 "pgbench_accounts, "
+					 "pgbench_branches, "
+					 "pgbench_history, "
+					 "pgbench_tellers");
+}
+
+/*
+ * Fill the standard tables with some data, from the client side
+ */
+static void
+initGenerateDataClientSide(PGconn *con)
 {
 	char		sql[256];
 	PGresult   *res;
@@ -3820,7 +3839,7 @@ initGenerateData(PGconn *con)
 				remaining_sec;
 	int			log_interval = 1;
 
-	fprintf(stderr, "generating data...\n");
+	fprintf(stderr, "generating data client side...\n");
 
 	/*
 	 * we do all of this in one transaction to enable the backend's
@@ -3828,15 +3847,7 @@ initGenerateData(PGconn *con)
 	 */
 	executeStatement(con, "begin");
 
-	/*
-	 * truncate away any old data, in one command in case there are foreign
-	 * keys
-	 */
-	executeStatement(con, "truncate table "
-					 "pgbench_accounts, "
-					 "pgbench_branches, "
-					 "pgbench_history, "
-					 "pgbench_tellers");
+	initTruncateTables(con);
 
 	/*
 	 * fill branches, tellers, accounts in that order in case foreign keys
@@ -3940,6 +3951,49 @@ initGenerateData(PGconn *con)
 	executeStatement(con, "commit");
 }
 
+/*
+ * Fill the standard tables with some data, from the server side
+ *
+ * As already the case with the client-side filling, the filler
+ * column defaults to NULL in pgbench_branches and pgbench_tellers,
+ * and is a blank-padded string in pgbench_accounts.
+ */
+static void
+initGenerateDataServerSide(PGconn *con)
+{
+	char		sql[256];
+
+	fprintf(stderr, "generating data server side...\n");
+
+	/*
+	 * we do all of this in one transaction to enable the backend's
+	 * data-loading optimizations
+	 */
+	executeStatement(con, "begin");
+
+	initTruncateTables(con);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_branches(bid,bbalance) "
+			 "select bid, 0 "
+			 "from generate_series(1, %d) as bid", nbranches * scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_tellers(tid,bid,tbalance) "
+			 "select tid, (tid - 1) / %d + 1, 0 "
+			 "from generate_series(1, %d) as tid", ntellers, ntellers * scale);
+	executeStatement(con, sql);
+
+	snprintf(sql, sizeof(sql),
+			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+			 "select aid, (aid - 1) / %d + 1, 0, '' "
+			 "from generate_series(1, "INT64_FORMAT") as aid", naccounts, (int64) naccounts * scale);
+	executeStatement(con, sql);
+
+	executeStatement(con, "commit");
+}
+
 /*
  * Invoke vacuum on the standard tables
  */
@@ -4020,21 +4074,20 @@ initCreateFKeys(PGconn *con)
 static void
 checkInitSteps(const char *initialize_steps)
 {
-	const char *step;
-
 	if (initialize_steps[0] == '\0')
 	{
 		fprintf(stderr, "no initialization steps specified\n");
 		exit(1);
 	}
 
-	for (step = initialize_steps; *step != '\0'; step++)
+	for (const char *step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr("dtgvpf ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
 		{
-			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+			fprintf(stderr,
+					"unrecognized initialization step \"%c\"\n"
+					"Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
 					*step);
-			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
 			exit(1);
 		}
 	}
@@ -4075,8 +4128,12 @@ runInitSteps(const char *initialize_steps)
 				initCreateTables(con);
 				break;
 			case 'g':
-				op = "generate";
-				initGenerateData(con);
+				op = "client generate";
+				initGenerateDataClientSide(con);
+				break;
+			case 'G':
+				op = "server generate";
+				initGenerateDataServerSide(con);
 				break;
 			case 'v':
 				op = "vacuum";
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c441626d7c..4384d0072a 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -130,7 +130,7 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
+	'--initialize --init-steps=dtpvGvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
 	0,
 	[qr{^$}],
 	[
@@ -138,7 +138,7 @@ pgbench(
 		qr{creating tables},
 		qr{creating 3 partitions},
 		qr{creating primary keys},
-		qr{.* of .* tuples \(.*\) done},
+		qr{generating data server side},
 		qr{creating foreign keys},
 		qr{(?!vacuuming)}, # no vacuum
 		qr{done in \d+\.\d\d s }
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 1e9542af3f..8b6d442812 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -147,7 +147,7 @@ my @options = (
 	[
 		'invalid init step',
 		'-i -I dta',
-		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
+		[ qr{unrecognized initialization step}, qr{Allowed step characters are} ]
 	],
 	[
 		'bad random seed',
#20Fabien COELHO
fabien.coelho@mines-paristech.fr
In reply to: Fabien COELHO (#19)
Re: pgbench - extend initialization phase control

Hello Masao-san,

If large scale factor is specified, the query for generating
pgbench_accounts data can take a very long time. While that query is
running, operators may be likely to do Ctrl-C to cancel the data
generation. In this case, IMO pgbench should cancel the query, i.e.,
call PQcancel(). Otherwise, the query will keep running to the end.

Hmmm. Why not. Now the infra to do that seems to already exists twice, once
in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c".

I cannot say I'm thrilled to replicate this once more. I think that the
reasonable option is to share this in fe-utils and then to reuse it from
there. However, ISTM that such a restructuring patch which not belong to this
feature. [...]

I just did a patch to share the code:

/messages/by-id/alpine.DEB.2.21.1910311939430.27369@lancre
https://commitfest.postgresql.org/25/2336/

--
Fabien.

#21Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#19)
1 attachment(s)
Re: pgbench - extend initialization phase control

On Thu, Oct 31, 2019 at 11:54 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Masao-san,

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_branches(bid,bbalance) "
+ "select bid, 0 "
+ "from generate_series(1, %d) as bid", scale);

"scale" should be "nbranches * scale".

Yep, even if nbranches is 1, it should be there.

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_accounts(aid,bid,abalance,filler) "
+ "select aid, (aid - 1) / %d + 1, 0, '' "
+ "from generate_series(1, %d) as aid", naccounts, scale * naccounts);

Like client-side data generation, INT64_FORMAT should be used here
instead of %d?

Indeed.

If large scale factor is specified, the query for generating pgbench_accounts
data can take a very long time. While that query is running, operators may be
likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench
should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep
running to the end.

Hmmm. Why not. Now the infra to do that seems to already exists twice,
once in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c".

I cannot say I'm thrilled to replicate this once more. I think that the
reasonable option is to share this in fe-utils and then to reuse it from
there. However, ISTM that such a restructuring patch which not belong to
this feature.

Understood. Ok, let's discuss this in other thread that you started.

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

Per PostgreSQL basic coding style,

C99 (20 years ago) is new the norm, and this style is now allowed, there
are over a hundred instances of these already. I tend to use that where
appropriate.

Yes, I understood there are several places using such style.
But I still wonder why we should apply such change here.
If there is the reason why this change is necessary here,
I'm OK with that. But if not, basically I'd like to avoid the change.
Otherwise it may make the back-patch a bit harder
when we change the surrounding code.

- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+ fprintf(stderr,
+ "unrecognized initialization step \"%c\"\n"
+ "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
*step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\",
\"p\", \"f\"\n");

The original message seems better to me. So what about just appending "G"
into the above latter message? That is,
"allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"

I needed this list in several places, so it makes sense to share the
definition, and frankly the list of half a dozen comma-separated chars
does not strike me as much better than just giving the allowed chars
directly. So the simpler the better, from my point of view.

OK.

Isn't it better to explain a bit more what "client-side / server-side data
generation" is? For example, something like

Ok.

Attached v7 does most of the above, but the list of char message and the
signal handling. The first one does not look really better to me, and the
second one belongs to a restructuring patch that I'll try to submit.

Thanks for updating the patch!
Attached is the slightly updated version of the patch. Based on your
patch, I added the descriptions about logging of "g" and "G" steps into
the doc, and did some cosmetic changes. Barrying any objections,
I'm thinking to commit this patch.

While reviewing the patch, I found that current code allows space
character to be specified in -I. That is, checkInitSteps() accepts
space character. Why should we do this? Probably I understand
why runInitSteps() needs to accept space character (because "v"
in the specified string with -I is replaced with a space character
when --no-vacuum option is given). But I'm not sure why that's also
necessary in checkInitSteps(). Instead, we should treat a space
character as invalid in checkInitSteps()?

Regards,

--
Fujii Masao

Attachments:

pgbench-init-extended-7_fujii.patchapplication/octet-stream; name=pgbench-init-extended-7_fujii.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index e3a0abb4c7..a6370f00dd 100644
*** a/doc/src/sgml/ref/pgbench.sgml
--- b/doc/src/sgml/ref/pgbench.sgml
***************
*** 193,204 **** pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
            </listitem>
           </varlistentry>
           <varlistentry>
!           <term><literal>g</literal> (Generate data)</term>
            <listitem>
             <para>
              Generate data and load it into the standard tables,
              replacing any data already present.
             </para>
            </listitem>
           </varlistentry>
           <varlistentry>
--- 193,226 ----
            </listitem>
           </varlistentry>
           <varlistentry>
!           <term><literal>g</literal> or <literal>G</literal> (Generate data, client-side or server-side)</term>
            <listitem>
             <para>
              Generate data and load it into the standard tables,
              replacing any data already present.
             </para>
+            <para>
+             With <literal>g</literal> (client-side data generation),
+             data is generated in <command>pgbench</command> client and then
+             sent to the server. This uses the client/server bandwidth
+             extensively through a <command>COPY</command>.
+             Using <literal>g</literal> causes logging to print one message
+             each 100,000 rows when generating data into
+             <structname>pgbench_accounts</structname> table.
+            </para>
+            <para>
+             With <literal>G</literal> (server-side data generation),
+             only limited queries are sent from <command>pgbench</command>
+             client and then data is actually generated in the server.
+             No significant bandwidth is required for this variant, but
+             the server will do more work.
+             Using <literal>G</literal> causes logging to print no progress
+             message when generating data into
+             <structname>pgbench_accounts</structname> table.
+            </para>
+            <para>
+             
+            </para>
            </listitem>
           </varlistentry>
           <varlistentry>
***************
*** 262,270 **** pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
        <listitem>
         <para>
          Switch logging to quiet mode, producing only one progress message per 5
!         seconds. The default logging prints one message each 100000 rows, which
          often outputs many lines per second (especially on good hardware).
         </para>
        </listitem>
       </varlistentry>
  
--- 284,296 ----
        <listitem>
         <para>
          Switch logging to quiet mode, producing only one progress message per 5
!         seconds. The default logging prints one message each 100,000 rows, which
          often outputs many lines per second (especially on good hardware).
         </para>
+        <para>
+         This setting has no effect if <literal>G</literal> is specified
+         in <option>-I</option>.
+        </para>
        </listitem>
       </varlistentry>
  
diff --git a/src/bin/pgbench/pgbenchindex 03bcd22996..14dbc4510c 100644
*** a/src/bin/pgbench/pgbench.c
--- b/src/bin/pgbench/pgbench.c
***************
*** 132,137 **** static int	pthread_join(pthread_t th, void **thread_return);
--- 132,138 ----
   * some configurable parameters */
  
  #define DEFAULT_INIT_STEPS "dtgvp"	/* default -I setting */
+ #define ALL_INIT_STEPS "dtgGvpf"	/* all possible steps */
  
  #define LOG_STEP_SECONDS	5	/* seconds between log messages */
  #define DEFAULT_NXACTS	10		/* default nxacts */
***************
*** 627,633 **** usage(void)
  		   "  %s [OPTION]... [DBNAME]\n"
  		   "\nInitialization options:\n"
  		   "  -i, --initialize         invokes initialization mode\n"
! 		   "  -I, --init-steps=[dtgvpf]+ (default \"dtgvp\")\n"
  		   "                           run selected initialization steps\n"
  		   "  -F, --fillfactor=NUM     set fill factor\n"
  		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
--- 628,634 ----
  		   "  %s [OPTION]... [DBNAME]\n"
  		   "\nInitialization options:\n"
  		   "  -i, --initialize         invokes initialization mode\n"
! 		   "  -I, --init-steps=[" ALL_INIT_STEPS "]+ (default \"" DEFAULT_INIT_STEPS "\")\n"
  		   "                           run selected initialization steps\n"
  		   "  -F, --fillfactor=NUM     set fill factor\n"
  		   "  -n, --no-vacuum          do not run VACUUM during initialization\n"
***************
*** 3803,3812 **** append_fillfactor(char *opts, int len)
  }
  
  /*
!  * Fill the standard tables with some data
   */
  static void
! initGenerateData(PGconn *con)
  {
  	char		sql[256];
  	PGresult   *res;
--- 3804,3826 ----
  }
  
  /*
!  * Truncate away any old data, in one command in case there are foreign keys
   */
  static void
! initTruncateTables(PGconn *con)
! {
! 	executeStatement(con, "truncate table "
! 					 "pgbench_accounts, "
! 					 "pgbench_branches, "
! 					 "pgbench_history, "
! 					 "pgbench_tellers");
! }
! 
! /*
!  * Fill the standard tables with some data generated and sent from the client
!  */
! static void
! initGenerateDataClientSide(PGconn *con)
  {
  	char		sql[256];
  	PGresult   *res;
***************
*** 3820,3826 **** initGenerateData(PGconn *con)
  				remaining_sec;
  	int			log_interval = 1;
  
! 	fprintf(stderr, "generating data...\n");
  
  	/*
  	 * we do all of this in one transaction to enable the backend's
--- 3834,3840 ----
  				remaining_sec;
  	int			log_interval = 1;
  
! 	fprintf(stderr, "generating data (client-side)...\n");
  
  	/*
  	 * we do all of this in one transaction to enable the backend's
***************
*** 3828,3842 **** initGenerateData(PGconn *con)
  	 */
  	executeStatement(con, "begin");
  
! 	/*
! 	 * truncate away any old data, in one command in case there are foreign
! 	 * keys
! 	 */
! 	executeStatement(con, "truncate table "
! 					 "pgbench_accounts, "
! 					 "pgbench_branches, "
! 					 "pgbench_history, "
! 					 "pgbench_tellers");
  
  	/*
  	 * fill branches, tellers, accounts in that order in case foreign keys
--- 3842,3849 ----
  	 */
  	executeStatement(con, "begin");
  
! 	/* truncate away any old data */
! 	initTruncateTables(con);
  
  	/*
  	 * fill branches, tellers, accounts in that order in case foreign keys
***************
*** 3940,3945 **** initGenerateData(PGconn *con)
--- 3947,3997 ----
  	executeStatement(con, "commit");
  }
  
+ /*
+  * Fill the standard tables with some data generated on the server
+  *
+  * As already the case with the client-side data generation, the filler
+  * column defaults to NULL in pgbench_branches and pgbench_tellers,
+  * and is a blank-padded string in pgbench_accounts.
+  */
+ static void
+ initGenerateDataServerSide(PGconn *con)
+ {
+ 	char		sql[256];
+ 
+ 	fprintf(stderr, "generating data (server-side)...\n");
+ 
+ 	/*
+ 	 * we do all of this in one transaction to enable the backend's
+ 	 * data-loading optimizations
+ 	 */
+ 	executeStatement(con, "begin");
+ 
+ 	/* truncate away any old data */
+ 	initTruncateTables(con);
+ 
+ 	snprintf(sql, sizeof(sql),
+ 			 "insert into pgbench_branches(bid,bbalance) "
+ 			 "select bid, 0 "
+ 			 "from generate_series(1, %d) as bid", nbranches * scale);
+ 	executeStatement(con, sql);
+ 
+ 	snprintf(sql, sizeof(sql),
+ 			 "insert into pgbench_tellers(tid,bid,tbalance) "
+ 			 "select tid, (tid - 1) / %d + 1, 0 "
+ 			 "from generate_series(1, %d) as tid", ntellers, ntellers * scale);
+ 	executeStatement(con, sql);
+ 
+ 	snprintf(sql, sizeof(sql),
+ 			 "insert into pgbench_accounts(aid,bid,abalance,filler) "
+ 			 "select aid, (aid - 1) / %d + 1, 0, '' "
+ 			 "from generate_series(1, "INT64_FORMAT") as aid",
+ 			 naccounts, (int64) naccounts * scale);
+ 	executeStatement(con, sql);
+ 
+ 	executeStatement(con, "commit");
+ }
+ 
  /*
   * Invoke vacuum on the standard tables
   */
***************
*** 4020,4040 **** initCreateFKeys(PGconn *con)
  static void
  checkInitSteps(const char *initialize_steps)
  {
- 	const char *step;
- 
  	if (initialize_steps[0] == '\0')
  	{
  		fprintf(stderr, "no initialization steps specified\n");
  		exit(1);
  	}
  
! 	for (step = initialize_steps; *step != '\0'; step++)
  	{
! 		if (strchr("dtgvpf ", *step) == NULL)
  		{
! 			fprintf(stderr, "unrecognized initialization step \"%c\"\n",
  					*step);
! 			fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");
  			exit(1);
  		}
  	}
--- 4072,4092 ----
  static void
  checkInitSteps(const char *initialize_steps)
  {
  	if (initialize_steps[0] == '\0')
  	{
  		fprintf(stderr, "no initialization steps specified\n");
  		exit(1);
  	}
  
! 	for (const char *step = initialize_steps; *step != '\0'; step++)
  	{
! 		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
  		{
! 			fprintf(stderr,
! 					"unrecognized initialization step \"%c\"\n",
  					*step);
! 			fprintf(stderr,
! 					"Allowed step characters are: \"" ALL_INIT_STEPS "\".\n");
  			exit(1);
  		}
  	}
***************
*** 4075,4082 **** runInitSteps(const char *initialize_steps)
  				initCreateTables(con);
  				break;
  			case 'g':
! 				op = "generate";
! 				initGenerateData(con);
  				break;
  			case 'v':
  				op = "vacuum";
--- 4127,4138 ----
  				initCreateTables(con);
  				break;
  			case 'g':
! 				op = "client-side generate";
! 				initGenerateDataClientSide(con);
! 				break;
! 			case 'G':
! 				op = "server-side generate";
! 				initGenerateDataServerSide(con);
  				break;
  			case 'v':
  				op = "vacuum";
diff --git a/src/bin/pgbench/t/0index c441626d7c..1845869016 100644
*** a/src/bin/pgbench/t/001_pgbench_with_server.pl
--- b/src/bin/pgbench/t/001_pgbench_with_server.pl
***************
*** 130,136 **** pgbench(
  
  # Test interaction of --init-steps with legacy step-selection options
  pgbench(
! 	'--initialize --init-steps=dtpvgvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
  	0,
  	[qr{^$}],
  	[
--- 130,136 ----
  
  # Test interaction of --init-steps with legacy step-selection options
  pgbench(
! 	'--initialize --init-steps=dtpvGvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
  	0,
  	[qr{^$}],
  	[
***************
*** 138,144 **** pgbench(
  		qr{creating tables},
  		qr{creating 3 partitions},
  		qr{creating primary keys},
! 		qr{.* of .* tuples \(.*\) done},
  		qr{creating foreign keys},
  		qr{(?!vacuuming)}, # no vacuum
  		qr{done in \d+\.\d\d s }
--- 138,144 ----
  		qr{creating tables},
  		qr{creating 3 partitions},
  		qr{creating primary keys},
! 		qr{generating data \(server-side\)},
  		qr{creating foreign keys},
  		qr{(?!vacuuming)}, # no vacuum
  		qr{done in \d+\.\d\d s }
diff --git a/src/bin/pgbench/t/002_pgbench_no_serveindex 1e9542af3f..8b6d442812 100644
*** a/src/bin/pgbench/t/002_pgbench_no_server.pl
--- b/src/bin/pgbench/t/002_pgbench_no_server.pl
***************
*** 147,153 **** my @options = (
  	[
  		'invalid init step',
  		'-i -I dta',
! 		[ qr{unrecognized initialization step}, qr{allowed steps are} ]
  	],
  	[
  		'bad random seed',
--- 147,153 ----
  	[
  		'invalid init step',
  		'-i -I dta',
! 		[ qr{unrecognized initialization step}, qr{Allowed step characters are} ]
  	],
  	[
  		'bad random seed',
#22Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fujii Masao (#21)
Re: pgbench - extend initialization phase control

Hello,

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

But I still wonder why we should apply such change here.

Because it removes one declaration and reduces the scope of one variable?

If there is the reason why this change is necessary here,

Nope, such changes are never necessary.

I'm OK with that. But if not, basically I'd like to avoid the change.
Otherwise it may make the back-patch a bit harder
when we change the surrounding code.

I think that this is small enough so that it can be managed, if any back
patch occurs on the surrounding code, which is anyway pretty unlikely.

Attached is the slightly updated version of the patch. Based on your
patch, I added the descriptions about logging of "g" and "G" steps into
the doc, and did some cosmetic changes. Barrying any objections,
I'm thinking to commit this patch.

I'd suggest:

"to print one message each ..." -> "to print one message every ..."

"to print no progress ..." -> "not to print any progress ..."

I would not call "fprintf(stderr" twice in a row if I can call it once.

While reviewing the patch, I found that current code allows space
character to be specified in -I. That is, checkInitSteps() accepts
space character. Why should we do this?

Probably I understand why runInitSteps() needs to accept space character
(because "v" in the specified string with -I is replaced with a space
character when --no-vacuum option is given).

Yes, that is the reason, otherwise the string would have to be shifted.

But I'm not sure why that's also necessary in checkInitSteps(). Instead,
we should treat a space character as invalid in checkInitSteps()?

I think that it may break --no-vacuum, and I thought that there may be
other option which remove things, eventually. Also, having a NO-OP looks
ok to me.

--
Fabien.

#23Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#22)
Re: pgbench - extend initialization phase control

On Wed, Nov 6, 2019 at 6:23 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello,

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

But I still wonder why we should apply such change here.

Because it removes one declaration and reduces the scope of one variable?

If there is the reason why this change is necessary here,

Nope, such changes are never necessary.

I'm OK with that. But if not, basically I'd like to avoid the change.
Otherwise it may make the back-patch a bit harder
when we change the surrounding code.

I think that this is small enough so that it can be managed, if any back
patch occurs on the surrounding code, which is anyway pretty unlikely.

Attached is the slightly updated version of the patch. Based on your
patch, I added the descriptions about logging of "g" and "G" steps into
the doc, and did some cosmetic changes. Barrying any objections,
I'm thinking to commit this patch.

I'd suggest:

"to print one message each ..." -> "to print one message every ..."

"to print no progress ..." -> "not to print any progress ..."

I would not call "fprintf(stderr" twice in a row if I can call it once.

Thanks for the suggestion!
I updated the patch in that way and committed it!

This commit doesn't include the change "for (const char ...)"
and "merge two fprintf into one" ones that we were discussing.
Because they are trivial but I'm not sure if they are improvements
or not, yet. If they are, probably it's better to apply such changes
to all the places having the similar issues. But that seems overkill.

While reviewing the patch, I found that current code allows space
character to be specified in -I. That is, checkInitSteps() accepts
space character. Why should we do this?

Probably I understand why runInitSteps() needs to accept space character
(because "v" in the specified string with -I is replaced with a space
character when --no-vacuum option is given).

Yes, that is the reason, otherwise the string would have to be shifted.

But I'm not sure why that's also necessary in checkInitSteps(). Instead,
we should treat a space character as invalid in checkInitSteps()?

I think that it may break --no-vacuum, and I thought that there may be
other option which remove things, eventually. Also, having a NO-OP looks
ok to me.

As far as I read the code, checkInitSteps() checks the initialization
steps that users specified. The initialization steps string that
"v" was replaced with blank character is not given to checkInitSteps().
So ISTM that dropping the handling of blank character from
checkInitSteps() doesn't break --no-vacuum.

Regards,

--
Fujii Masao

#24btkimurayuzk
btkimurayuzk@oss.nttdata.com
In reply to: Fujii Masao (#23)
1 attachment(s)
Re: pgbench - extend initialization phase control

2019-11-06 11:31 に Fujii Masao さんは書きました:

On Wed, Nov 6, 2019 at 6:23 AM Fabien COELHO <coelho@cri.ensmp.fr>
wrote:

Hello,

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

But I still wonder why we should apply such change here.

Because it removes one declaration and reduces the scope of one
variable?

If there is the reason why this change is necessary here,

Nope, such changes are never necessary.

I'm OK with that. But if not, basically I'd like to avoid the change.
Otherwise it may make the back-patch a bit harder
when we change the surrounding code.

I think that this is small enough so that it can be managed, if any
back
patch occurs on the surrounding code, which is anyway pretty unlikely.

Attached is the slightly updated version of the patch. Based on your
patch, I added the descriptions about logging of "g" and "G" steps into
the doc, and did some cosmetic changes. Barrying any objections,
I'm thinking to commit this patch.

I'd suggest:

"to print one message each ..." -> "to print one message every ..."

"to print no progress ..." -> "not to print any progress ..."

I would not call "fprintf(stderr" twice in a row if I can call it
once.

Thanks for the suggestion!
I updated the patch in that way and committed it!

This commit doesn't include the change "for (const char ...)"
and "merge two fprintf into one" ones that we were discussing.
Because they are trivial but I'm not sure if they are improvements
or not, yet. If they are, probably it's better to apply such changes
to all the places having the similar issues. But that seems overkill.

While reviewing the patch, I found that current code allows space
character to be specified in -I. That is, checkInitSteps() accepts
space character. Why should we do this?

Probably I understand why runInitSteps() needs to accept space character
(because "v" in the specified string with -I is replaced with a space
character when --no-vacuum option is given).

Yes, that is the reason, otherwise the string would have to be
shifted.

But I'm not sure why that's also necessary in checkInitSteps(). Instead,
we should treat a space character as invalid in checkInitSteps()?

I think that it may break --no-vacuum, and I thought that there may be
other option which remove things, eventually. Also, having a NO-OP
looks
ok to me.

As far as I read the code, checkInitSteps() checks the initialization
steps that users specified. The initialization steps string that
"v" was replaced with blank character is not given to checkInitSteps().
So ISTM that dropping the handling of blank character from
checkInitSteps() doesn't break --no-vacuum.

This is a patch which does not allow space character in -I options .

Regard,
Yu Kimura

Attachments:

pgbench_not_allow_space_as_option.patchtext/x-diff; name=pgbench_not_allow_space_as_option.patchDownload
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 14dbc4510c..95b23895ff 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4080,7 +4080,7 @@ checkInitSteps(const char *initialize_steps)
 
 	for (const char *step = initialize_steps; *step != '\0'; step++)
 	{
-		if (strchr(ALL_INIT_STEPS " ", *step) == NULL)
+		if (strchr(ALL_INIT_STEPS, *step) == NULL)
 		{
 			fprintf(stderr,
 					"unrecognized initialization step \"%c\"\n",
#25Fabien COELHO
coelho@cri.ensmp.fr
In reply to: btkimurayuzk (#24)
Re: pgbench - extend initialization phase control

I think that it may break --no-vacuum, and I thought that there may be
other option which remove things, eventually. Also, having a NO-OP looks
ok to me.

As far as I read the code, checkInitSteps() checks the initialization
steps that users specified. The initialization steps string that
"v" was replaced with blank character is not given to checkInitSteps().
So ISTM that dropping the handling of blank character from
checkInitSteps() doesn't break --no-vacuum.

This is a patch which does not allow space character in -I options .

I do not think that this is desirable. It would be a regression, and
allowing a no-op is not an issue in anyway.

--
Fabien Coelho - CRI, MINES ParisTech

#26Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#25)
Re: pgbench - extend initialization phase control

On Thu, Nov 7, 2019 at 5:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I think that it may break --no-vacuum, and I thought that there may be
other option which remove things, eventually. Also, having a NO-OP looks
ok to me.

As far as I read the code, checkInitSteps() checks the initialization
steps that users specified. The initialization steps string that
"v" was replaced with blank character is not given to checkInitSteps().
So ISTM that dropping the handling of blank character from
checkInitSteps() doesn't break --no-vacuum.

This is a patch which does not allow space character in -I options .

I do not think that this is desirable. It would be a regression, and
allowing a no-op is not an issue in anyway.

Why is that regression, you think? I think that's an oversight.
If I'm missing something and accepting a blank character as no-op in
also checkInitSteps() is really necessary for some reasons,
which should be documented. But, if so, another question is;
why should only blank character be treated as no-op, in checkInitSteps()?

Regards,

--
Fujii Masao

#27Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fujii Masao (#26)
1 attachment(s)
Re: pgbench - extend initialization phase control

Hello Masao-san,

I do not think that this is desirable. It would be a regression, and
allowing a no-op is not an issue in anyway.

Why is that regression, you think?

Because "pgbench -I ' d'" currently works and it would cease to work after
the patch.

I think that's an oversight. If I'm missing something and accepting a
blank character as no-op in also checkInitSteps() is really necessary
for some reasons, which should be documented. But, if so, another
question is; why should only blank character be treated as no-op, in
checkInitSteps()?

The idea is to have one character that can be substituted to remove any
operation.

On principle, allowing a no-op character, whatever the choice, is a good
idea, because it means that the caller can take advantage of that if need
be.

I think that the actual oversight is that the checkInitSteps should be
called at the beginning of processing initialization steps rather than
while processing -I, because currently other places modify the
initialization string (no-vacuum, foreign key) and thus are not checked.

I agree that it should be documented.

Attached patch adds a doc and moves the check where it should be, and
modifies a test with an explicit no-op space initialization step.

--
Fabien.

Attachments:

pgbench-check-init-1.patchtext/x-diff; name=pgbench-check-init-1.patchDownload
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 4c48a58ed2..5008377998 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -168,7 +168,8 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
         initialization steps to be performed, using one character per step.
         Each step is invoked in the specified order.
         The default is <literal>dtgvp</literal>.
-        The available steps are:
+        The space character is accepted as a no-op, and the other available
+        steps are:
 
         <variablelist>
          <varlistentry>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 14dbc4510c..4178127c21 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4104,6 +4104,9 @@ runInitSteps(const char *initialize_steps)
 	double		run_time = 0.0;
 	bool		first = true;
 
+	/* check that all steps are valid before executing them */
+	checkInitSteps(initialize_steps);
+
 	initPQExpBuffer(&stats);
 
 	if ((con = doConnect()) == NULL)
@@ -5501,7 +5504,6 @@ main(int argc, char **argv)
 				if (initialize_steps)
 					pg_free(initialize_steps);
 				initialize_steps = pg_strdup(optarg);
-				checkInitSteps(initialize_steps);
 				initialization_option_set = true;
 				break;
 			case 'h':
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 1845869016..4e3d3e464c 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -130,7 +130,7 @@ pgbench(
 
 # Test interaction of --init-steps with legacy step-selection options
 pgbench(
-	'--initialize --init-steps=dtpvGvv --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
+	'--initialize --no-vacuum --foreign-keys --unlogged-tables --partitions=3',
 	0,
 	[qr{^$}],
 	[
@@ -143,7 +143,9 @@ pgbench(
 		qr{(?!vacuuming)}, # no vacuum
 		qr{done in \d+\.\d\d s }
 	],
-	'pgbench --init-steps');
+	'pgbench --init-steps',
+	undef,
+	'--init-steps= dtpvGvv');
 
 # Run all builtin scripts, for a few transactions each
 pgbench(
#28Fujii Masao
masao.fujii@gmail.com
In reply to: Fabien COELHO (#27)
Re: pgbench - extend initialization phase control

On Thu, Nov 7, 2019 at 6:35 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Hello Masao-san,

I do not think that this is desirable. It would be a regression, and
allowing a no-op is not an issue in anyway.

Why is that regression, you think?

Because "pgbench -I ' d'" currently works and it would cease to work after
the patch.

If the behavior has been documented and visible to users,
I agree that it should not be dropped for compatibility basically.
But in this case, that was not.

I think that's an oversight. If I'm missing something and accepting a
blank character as no-op in also checkInitSteps() is really necessary
for some reasons, which should be documented. But, if so, another
question is; why should only blank character be treated as no-op, in
checkInitSteps()?

The idea is to have one character that can be substituted to remove any
operation.

Probably I understand that idea is necessary in the internal of pgbench
because pgbench internally may modify the initialization steps string.
But I'm not sure why it needs to be exposed, yet.

On principle, allowing a no-op character, whatever the choice, is a good
idea, because it means that the caller can take advantage of that if need
be.

I think that the actual oversight is that the checkInitSteps should be
called at the beginning of processing initialization steps rather than
while processing -I, because currently other places modify the
initialization string (no-vacuum, foreign key) and thus are not checked.

As far as I read the code, runInitSteps() does the check. If the initialization
steps string contains unrecognized character, runInitSteps() emits an error.

* (We could just leave it to runInitSteps() to fail if there are wrong
* characters, but since initialization can take awhile, it seems friendlier
* to check during option parsing.)

The above comment in checkInitSteps() seems to explain why
checkInitSteps() is called at the beginning of processing initialization
steps.

Regards,

--
Fujii Masao

#29Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Fujii Masao (#28)
Re: pgbench - extend initialization phase control

Hello,

I think that the actual oversight is that the checkInitSteps should be
called at the beginning of processing initialization steps rather than
while processing -I, because currently other places modify the
initialization string (no-vacuum, foreign key) and thus are not checked.

As far as I read the code, runInitSteps() does the check. If the initialization
steps string contains unrecognized character, runInitSteps() emits an error.

Sure, but the previous step have been executed and committed, the point of
the check is to detect the issue before starting the execution.

* (We could just leave it to runInitSteps() to fail if there are wrong
* characters, but since initialization can take awhile, it seems friendlier
* to check during option parsing.)

The above comment in checkInitSteps() seems to explain why
checkInitSteps() is called at the beginning of processing initialization
steps.

Yep, the comment is right in the motivation, but not accurate anymore wrt
the submitted patch. V2 attached updates this comment.

--
Fabien.