pgbench randomness initialization

Started by Andres Freund, almost 10 years ago, 34 messages

andres@anarazel.de

almost 10 years ago

Hi,

pondering
http://archives.postgresql.org/message-id/CA%2BTgmoZJdA6K7-17K4A48rVB0UPR98HVuaNcfNNLrGsdb1uChg%40mail.gmail.com
et al I was wondering why it's a good idea for pgbench to do
    INSTR_TIME_SET_CURRENT(start_time);
    srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
to initialize randomness and then
    for (i = 0; i < nthreads; i++)
    {
        thread->random_state[0] = random();
        thread->random_state[1] = random();
        thread->random_state[2] = random();
    }
to initialize the individual thread random state which is then used by
pg_erand48().

To me it seems better to instead initialize srandom() with a known value
(say, uh, 0). Or even better don't use random() at all, and fill a
global pg_erand48() with a known state; and use pg_erand48() to
initialize the thread states.
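
The second variant can be sketched roughly as follows. This is only an illustration under stated assumptions: POSIX nrand48() stands in for pg_erand48(), and ThreadState and init_thread_states() are hypothetical names, not pgbench's own.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* expose nrand48() under strict C modes */
#endif
#include <stdlib.h>

/* Hypothetical stand-in for pgbench's per-thread state. */
typedef struct
{
    unsigned short random_state[3];
} ThreadState;

/*
 * Derive every thread's 48-bit state from one global generator that is
 * itself seeded with a known value, so the states are reproducible.
 */
static void
init_thread_states(ThreadState *threads, int nthreads, unsigned int seed)
{
    unsigned short global_state[3];
    int         i,
                j;

    global_state[0] = (unsigned short) (seed & 0xFFFF);
    global_state[1] = (unsigned short) ((seed >> 16) & 0xFFFF);
    global_state[2] = 0x330E;   /* arbitrary fixed filler (assumption) */

    for (i = 0; i < nthreads; i++)
        for (j = 0; j < 3; j++)
            threads[i].random_state[j] =
                (unsigned short) (nrand48(global_state) & 0xFFFF);
}
```

Calling init_thread_states(threads, nthreads, 0) twice yields identical per-thread states, which is the property being asked for; it says nothing about how the threads interleave at run time.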

Obviously that doesn't make pgbench entirely reproducible, but it seems
a lot better than now. Individual threads would do work in a
reproducible order.

I see very little reason to have the current behaviour, or at the very
least not by default.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Fabien COELHO

coelho@cri.ensmp.fr

almost 10 years ago

In reply to: Andres Freund (#1)

Re: pgbench randomness initialization

Hello Andres,

et al I was wondering why it's a good idea for pgbench to do
    INSTR_TIME_SET_CURRENT(start_time);
    srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
to initialize randomness and then
    for (i = 0; i < nthreads; i++)
    {
        thread->random_state[0] = random();
        thread->random_state[1] = random();
        thread->random_state[2] = random();
    }
to initialize the individual thread random state which is then used by
pg_erand48().

To me it seems better to instead initialize srandom() with a known value
(say, uh, 0). Or even better don't use random() at all, and fill a
global pg_erand48() with a known state; and use pg_erand48() to
initialize the thread states.

Obviously that doesn't make pgbench entirely reproducible, but it seems
a lot better than now. Individual threads would do work in a
reproducible order.

I see very little reason to have the current behaviour, or at the very
least not by default.

I think that it depends on what you want, which may vary:

(1) "exactly" reproducible runs, but one run may hit a particular
steady state not representative of what happens in general.

(2) runs which really vary from one to the next, so as
to have an idea about how much it may vary, what is the
performance stability.

Currently pgbench focusses on (2), which may or may not be fine depending
on what you are doing. From a personal point of view I think that (2) is
more significant to collect performance data, even if the results are more
unstable: that simply reflects reality and its intrinsic variations, so
I'm fine with that as the default.

Now for those interested in (1) for some reason, I would suggest relying on a
PGBENCH_RANDOM_SEED environment variable or --random-seed option which
could be used to have an oxymoronic "deterministic randomness", if desired.
I do not think that it should be the default, though.
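
One possible shape for that lookup, as a hedged sketch rather than the eventual patch: the option value takes precedence over the environment value, and NULL or "time" falls back to a time-based seed. The helper names parse_seed() and resolve_seed() are hypothetical, and strict parsing with strtoul() is an assumption of this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse a strictly unsigned decimal integer; 0 on success, -1 on error. */
static int
parse_seed(const char *text, unsigned int *seed)
{
    char       *end;
    unsigned long value;

    if (!isdigit((unsigned char) text[0]))
        return -1;              /* rejects "", signs and leading spaces */
    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0' || value > UINT_MAX)
        return -1;              /* overflow or trailing garbage */
    *seed = (unsigned int) value;
    return 0;
}

/*
 * The option wins over the environment; NULL or "time" in the winning
 * source selects the time-based fallback.  0 on success, -1 on bad input.
 */
static int
resolve_seed(const char *option_value, const char *env_value,
             unsigned int time_based, unsigned int *seed)
{
    const char *text = (option_value != NULL) ? option_value : env_value;

    if (text == NULL || strcmp(text, "time") == 0)
    {
        *seed = time_based;
        return 0;
    }
    return parse_seed(text, seed);
}
```

A caller would pass getenv("PGBENCH_RANDOM_SEED") as env_value and the current time in microseconds as time_based, then hand the result to srandom().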

--
Fabien.

Andres Freund

andres@anarazel.de

almost 10 years ago

In reply to: Fabien COELHO (#2)

Re: pgbench randomness initialization

On 2016-04-07 11:56:12 +0200, Fabien COELHO wrote:

(2) runs which really vary from one to the next, so as
to have an idea about how much it may vary, what is the
performance stability.

I don't think this POV makes all that much sense. If you do something
non-comparable, then the results aren't, uh, comparable. Which also
means there's a lower chance to reproduce observed problems.

Currently pgbench focusses on (2), which may or may not be fine depending on
what you are doing. From a personal point of view I think that (2) is more
significant to collect performance data, even if the results are more
unstable: that simply reflects reality and its intrinsic variations, so I'm
fine with that as the default.

Uh, and what's the benefit of that variability? pgbench isn't a reality
simulation tool, it's a benchmarking tool. And benchmarks with intrinsic
variability are bad benchmarks.

Greetings,

Andres Freund

Fabien COELHO

coelho@cri.ensmp.fr

almost 10 years ago

In reply to: Andres Freund (#3)

Re: pgbench randomness initialization

(2) runs which really vary from one to the next, so as
to have an idea about how much it may vary, what is the
performance stability.

I don't think this POV makes all that much sense. If you do something
non-comparable, then the results aren't, uh, comparable. Which also
means there's a lower chance to reproduce observed problems.

That also means that you are likely not to hit them if you always do the
very same run...

Moreover, the Monte Carlo method requires randomness for its convergence
result.

Currently pgbench focusses on (2), which may or may not be fine depending on
what you are doing. From a personal point of view I think that (2) is more
significant to collect performance data, even if the results are more
unstable: that simply reflects reality and its intrinsic variations, so I'm
fine with that as the default.

Uh, and what's the benefit of that variability? pgbench isn't a reality
simulation tool, it's a benchmarking tool. And benchmarks with intrinsic
variability are bad benchmarks.

From a statistical perspective, one run does not mean anything. If you do
the exact same run over and over again, then all mathematical results
about (slow) convergence towards the average are lost. This is like trying
to survey a population by asking the questions to the same person over and
over: the result will be biased.
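
To make the point concrete, here is a minimal sketch of summarizing several independently seeded runs by their mean and sample variance. The function names are hypothetical, and the per-run tps values are assumed to be collected by the caller.

```c
#include <stddef.h>

/* Arithmetic mean of n per-run tps values (n must be > 0). */
static double
tps_mean(const double *tps, size_t n)
{
    double      sum = 0.0;
    size_t      i;

    for (i = 0; i < n; i++)
        sum += tps[i];
    return sum / (double) n;
}

/* Unbiased sample variance of n per-run tps values (n must be > 1). */
static double
tps_sample_variance(const double *tps, size_t n)
{
    double      mean = tps_mean(tps, n);
    double      squares = 0.0;
    size_t      i;

    for (i = 0; i < n; i++)
        squares += (tps[i] - mean) * (tps[i] - mean);
    return squares / (double) (n - 1);
}
```

If every run replays the same random sequence and lands on the same figure, the variance is zero by construction. That describes the one sequence sampled, not the spread over all possible workloads.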

Now when you develop, which is the use case you probably have in mind, you
want to compare two pg versions and check for the performance impact, so
having the exact same run seems like a proxy to quickly check for that.

However, from a statistical perspective this is just heresy: you may do a
change which improves one given run at the expense of all possible others
and you would not know it: Say for instance that there are two different
behaviors depending on something, then you will check against one of them
only.

So I have no mathematical doubt that changing the seed is the right
default setting, thus I think that the current behavior is fine. However
I'm okay if someone wants to control the randomness for some reason (maybe
having "less sure" results, but quickly), so it could be allowed somehow.

--
Fabien.

Andres Freund

andres@anarazel.de

almost 10 years ago

In reply to: Fabien COELHO (#4)

Re: pgbench randomness initialization

On 2016-04-07 12:25:58 +0200, Fabien COELHO wrote:

(2) runs which really vary from one to the next, so as
to have an idea about how much it may vary, what is the
performance stability.

I don't think this POV makes all that much sense. If you do something
non-comparable, then the results aren't, uh, comparable. Which also
means there's a lower chance to reproduce observed problems.

That also means that you are likely not to hit them if you always do the
very same run...

If you run the test for longer... Or explicitly iterate over IVs. At the
very least we need to make pgbench output the IV used, to have some
chance of repeating tests.

Uh, and what's the benefit of that variability? pgbench isn't a reality
simulation tool, it's a benchmarking tool. And benchmarks with intrinsic
variability are bad benchmarks.

From a statistical perspective, one run does not mean anything. If you do
the exact same run over and over again, then all mathematical results about
(slow) convergence towards the average are lost. This is like trying to
survey a population by asking the questions to the same person over and
over: the result will be biased.

That comparison pretty much invalidates any point you're making, it's
that bad.

Now when you develop, which is the use case you probably have in mind, you
want to compare two pg versions and check for the performance impact, so
having the exact same run seems like a proxy to quickly check for that.

It's not about "quickly" checking for something. If you look at the
results in the thread mentioned in the OP, the order of operations
drastically and *PERSISTENTLY* changes the observations. Causing *days*
of work lost.

However, from a statistical perspective this is just heresy: you may do a
change which improves one given run at the expense of all possible others
and you would not know it: Say for instance that there are two different
behaviors depending on something, then you will check against one of them
only.

Meh. That assumes that we're doing a huge number of pgbench runs; but
usually people do maybe a handful. Tops. If you're trying to defend
against scenarios like that you need to design your tests so that you'll
encounter such problems by running longer.

So I have no mathematical doubt that changing the seed is the right default
setting, thus I think that the current behavior is fine. However I'm okay if
someone wants to control the randomness for some reason (maybe having "less
sure" results, but quickly), so it could be allowed somehow.

There might be some statistics arguments, but I think they're pretty
much ignoring reality.

Greetings,

Andres Freund

Fabien COELHO

coelho@cri.ensmp.fr

almost 10 years ago

In reply to: Andres Freund (#5)

Re: pgbench randomness initialization

Hello Andres,

If you run the test for longer... Or explicitly iterate over IVs. At the
very least we need to make pgbench output the IV used, to have some
chance of repeating tests.

Note that I'm not against providing a way to repeat tests "exactly", and I
have suggested two means: environment variable and/or option.

[...] That comparison pretty much invalidates any point you're making,
it's that bad.

At least it is simple, if simplistic.

Here is another one: I knew a financial institution which needed to
evaluate the VAR of exotic financial products every night. They relied on
MC for that. Alas, it was not converging quickly enough, results were
unstable, so they took your advice: they froze the seed. Day after day the
results were mostly the same, the VAR was stable from one morning to the next,
the management was happy, the risks were under control... That was in the
mid 2000s:-)

However, from a statistical perspective this is just heresy: you may do a
change which improves one given run at the expense of all possible others
and you would not know it: Say for instance that there are two different
behaviors depending on something, then you will check against one of them
only.

Meh. That assumes that we're doing a huge number of pgbench runs;

A number of, not necessarily "huge". Or averaging a lot of intermediate
values and having a hard look at the distribution, not just the final tps
number.

but usually people do maybe a handful. Tops. If you're trying to defend
against scenarios like that you need to design your tests so that you'll
encounter such problems by running longer.

People usually do a lot of things; that does not mean that it is "right".

So I have no mathematical doubt that changing the seed is the right
default setting, thus I think that the current behavior is fine.
However I'm okay if someone wants to control the randomness for some
reason (maybe having "less sure" results, but quickly), so it could be
allowed somehow.

There might be some statistics arguments,

Yep, there are.

but I think they're pretty much ignoring reality.

Hmmm. If reality wants to ignore mathematics, usually it loses, so this
will not be with my blessing. Note that as a committer you do not need me
to freeze the seed. I'm just providing an opinion backed by mathematical
proofs.

--
Fabien.

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Fabien COELHO (#2)

Re: pgbench randomness initialization

On Thu, Apr 7, 2016 at 5:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I think that it depends on what you want, which may vary:

(1) "exactly" reproducible runs, but one run may hit a particular
steady state not representative of what happens in general.

(2) runs which really vary from one to the next, so as
to have an idea about how much it may vary, what is the
performance stability.

Currently pgbench focusses on (2), which may or may not be fine depending on
what you are doing. From a personal point of view I think that (2) is more
significant to collect performance data, even if the results are more
unstable: that simply reflects reality and its intrinsic variations, so I'm
fine with that as the default.

Now for those interested in (1) for some reason, I would suggest relying on a
PGBENCH_RANDOM_SEED environment variable or --random-seed option which could
be used to have an oxymoronic "deterministic randomness", if desired.
I do not think that it should be the default, though.

I agree entirely. If performance is erratic, that's actually
something you want to discover during benchmarking. If different
pgbench runs (of non-trivial length) are producing substantially
different results, then that's really a problem we need to fix, not
just adjust pgbench to cover it up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Andres Freund

andres@anarazel.de

almost 10 years ago

In reply to: Robert Haas (#7)

Re: pgbench randomness initialization

On 2016-04-07 08:58:16 -0400, Robert Haas wrote:

On Thu, Apr 7, 2016 at 5:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I think that it depends on what you want, which may vary:

(1) "exactly" reproducible runs, but one run may hit a particular
steady state not representative of what happens in general.

(2) runs which really vary from one to the next, so as
to have an idea about how much it may vary, what is the
performance stability.

Currently pgbench focusses on (2), which may or may not be fine depending on
what you are doing. From a personal point of view I think that (2) is more
significant to collect performance data, even if the results are more
unstable: that simply reflects reality and its intrinsic variations, so I'm
fine with that as the default.

Now for those interested in (1) for some reason, I would suggest relying on a
PGBENCH_RANDOM_SEED environment variable or --random-seed option which could
be used to have an oxymoronic "deterministic randomness", if desired.
I do not think that it should be the default, though.

I agree entirely. If performance is erratic, that's actually
something you want to discover during benchmarking. If different
pgbench runs (of non-trivial length) are producing substantially
different results, then that's really a problem we need to fix, not
just adjust pgbench to cover it up.

It's not about "covering it up"; it's about actually being able to take
action based on benchmark results, and about practically being able to
run benchmarks. The argument above means essentially that we need to run
a significant number of pgbench runs for *anything*, because running
them 3-5 times before/after just isn't meaningful enough.

It means that you can't separate between OS caused, and pgbench order
caused performance differences.

I agree that it's a horrid problem that we can get half the throughput
on large machines, depending on the ordering. But without
running queries in the same order before/after a patch there's no way to
validate whether $patch caused the problem. And no way to reliably
trigger problematic scenarios.

I also agree that it's important to be able to vary workloads. But if
you do so, you should do so in the same order, both pre/post a
patch. Afaics the prime use of pgbench is validation of the performance
effects of patches; therefore it should be usable for that, and it's
not.

Greetings,

Andres Freund

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Andres Freund (#8)

Re: pgbench randomness initialization

On Thu, Apr 7, 2016 at 9:15 AM, Andres Freund <andres@anarazel.de> wrote:

It's not about "covering it up"; it's about actually being able to take
action based on benchmark results, and about practically being able to
run benchmarks. The argument above means essentially that we need to run
a significant number of pgbench runs for *anything*, because running
them 3-5 times before/after just isn't meaningful enough.

It means that you can't separate between OS caused, and pgbench order
caused performance differences.

I'm not objecting to adding an option for this; but I think Fabien is
right that it shouldn't be the default.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 10 years ago

In reply to: Andres Freund (#5)

Re: pgbench randomness initialization

Andres Freund <andres@anarazel.de> writes:

On 2016-04-07 12:25:58 +0200, Fabien COELHO wrote:

So I have no mathematical doubt that changing the seed is the right default
setting, thus I think that the current behavior is fine. However I'm okay if
someone wants to control the randomness for some reason (maybe having "less
sure" results, but quickly), so it could be allowed somehow.

There might be some statistics arguments, but I think they're pretty
much ignoring reality.

Sorry, but I think Fabien is right and you are wrong. There is no
point in having randomness in there at all if the thing is constrained
to generate the same "random" sequence every time.

I don't object to having an option to force the initial seed, but
it should not be the default, and it most certainly should not be
the only behavior.

regards, tom lane

#11

Andres Freund

andres@anarazel.de

almost 10 years ago

In reply to: Tom Lane (#10)

Re: pgbench randomness initialization

On 2016-04-07 09:46:27 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-04-07 12:25:58 +0200, Fabien COELHO wrote:

So I have no mathematical doubt that changing the seed is the right default
setting, thus I think that the current behavior is fine. However I'm okay if
someone wants to control the randomness for some reason (maybe having "less
sure" results, but quickly), so it could be allowed somehow.

There might be some statistics arguments, but I think they're pretty
much ignoring reality.

Sorry, but I think Fabien is right and you are wrong.

Given that it's 3:1 so far, you might be right...

There is no point in having randomness in there at all if the thing is
constrained to generate the same "random" sequence every time.

but that argument seems pretty absurd. It's obviously different to query
for different rows over a run, rather than querying the same row again
and again, in all backends. The reason we use randomness is to avoid
easily discernible patterns in querying. Without randomness, how would
you do that?
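
As a small illustration of that distinction, the sketch below draws row ids from a fixed 48-bit state: the ids vary within a run, yet the same starting state replays the same sequence. POSIX erand48() stands in for pg_erand48() here, and draw_row_ids() is a hypothetical helper.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* expose erand48() under strict C modes */
#endif
#include <stdlib.h>

/*
 * Fill out[0..n-1] with row ids in [1, nrows], advancing the given
 * 48-bit generator state once per id.
 */
static void
draw_row_ids(unsigned short state[3], long nrows, long *out, int n)
{
    int         i;

    for (i = 0; i < n; i++)
    {
        /* erand48() returns a double in [0.0, 1.0) */
        out[i] = 1 + (long) (erand48(state) * (double) nrows);
    }
}
```

Two callers starting from identical state arrays get identical id sequences, so a known seed removes neither the spread of rows touched nor the need for a generator.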

Greetings,

Andres Freund

#12

Fabien COELHO

coelho@cri.ensmp.fr

almost 10 years ago

In reply to: Robert Haas (#9)

1 attachment(s)

Re: pgbench randomness initialization

It means that you can't separate between OS caused, and pgbench order
caused performance differences.

I'm not objecting to adding an option for this; but I think Fabien is
right that it shouldn't be the default.

Yep.

Andres, attached is a simple POC with an option & environment variable
(whereas I should rather have looked at the current checkpointer/vacuum
issue which I have reproduced:-().

While testing it I had a funny pattern, something like:

pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
1.0: 600 tps
2.0: 600 tps
3.0: 600 tps

First rerun just after:

pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
1.0: 1800 tps
2.0: 600 tps
3.0: 600 tps

The first rerun hits the same pages, so the first 1800 transactions are
run in one second, and then it is new pages which are loaded so the
performance goes down.

Second rerun just after:

pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
1.0: 1800 tps
2.0: 1400 tps
3.0: 600 tps

The second rerun hits the same 3000 transactions as the previous one in
about 1.7 seconds, then goes back to 600 tps for new pages...

After more iterations the performance is 1800 tps during the 3 seconds.

This clearly illustrates that it should be used with caution.
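
The figures above are consistent with a simple two-rate model, sketched below as an assumption rather than a measurement: transactions already seen by earlier runs execute at the warm rate (1800 tps), everything after them at the cold rate (600 tps). The function names are hypothetical.

```c
/* Transactions completed by time t (in seconds) under the two-rate model. */
static double
transactions_done(double t, double cached_tx, double warm_tps, double cold_tps)
{
    /* time needed to replay the transactions whose pages are already cached */
    double      warm_time = cached_tx / warm_tps;

    if (t <= warm_time)
        return warm_tps * t;
    return cached_tx + cold_tps * (t - warm_time);
}

/* Throughput reported for the one-second interval ending at `second`. */
static double
tps_in_second(int second, double cached_tx, double warm_tps, double cold_tps)
{
    return transactions_done((double) second, cached_tx, warm_tps, cold_tps)
        - transactions_done((double) (second - 1), cached_tx, warm_tps, cold_tps);
}
```

With 3000 cached transactions the warm phase lasts 3000 / 1800, about 1.67 seconds, so the second interval mixes 1200 warm and 200 cold transactions, matching the 1400 tps reported for the second rerun.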

--
Fabien.

Attachments:

pgbench-seed-1.patch (text/x-diff; charset=us-ascii)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 06cd5db..1908896 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -678,6 +678,33 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
        </para>
       </listitem>
      </varlistentry>
+
+     <varlistentry>
+      <term><option>--random-seed=</><replaceable>SEED</></term>
+      <listitem>
+       <para>
+        Set random generator seed.  This random generator is used to initialize
+        per-thread random generator states.
+        Expected values for <replaceable>SEED</> are: <literal>time</> (the default,
+        the seed is based on the current time) or any unsigned integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</> functions) or implicitly (for instance option
+        <option>--rate</> uses random to schedule transactions).
+        The random generator seed may also be provided through environment variable
+        <literal>PGBENCH_RANDOM_SEED</>.
+      </para>
+      <para>
+        Setting the seed explicitly allows reproducing a <command>pgbench</> run
+        exactly, as far as random numbers are concerned.
+        From a statistical viewpoint this is a bad idea because it can hide the
+        performance variability or improve performance unduly, e.g. by hitting
+        the same pages as a previous run.
+        However it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
     </variablelist>
    </para>
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 076fbd3..d6db19f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -434,6 +434,7 @@ usage(void)
 		   "  -U, --username=USERNAME  connect as specified database user\n"
 		 "  -V, --version            output version information, then exit\n"
 		   "  -?, --help               show this help, then exit\n"
+		   "  --random-seed=SEED       set random seed (\"time\", integer)\n"
 		   "\n"
 		   "Report bugs to <pgsql-bugs@postgresql.org>.\n",
 		   progname, progname);
@@ -3258,6 +3259,7 @@ main(int argc, char **argv)
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
 		{"progress-timestamp", no_argument, NULL, 6},
+		{"random-seed", required_argument, NULL, 7},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3292,6 +3294,7 @@ main(int argc, char **argv)
 	PGconn	   *con;
 	PGresult   *res;
 	char	   *env;
+	char	   *seed = NULL;
 
 	char		val[64];
 
@@ -3607,6 +3610,9 @@ main(int argc, char **argv)
 				progress_timestamp = true;
 				benchmarking_option_set = true;
 				break;
+			case 7:
+				seed = pg_strdup(optarg);
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -3845,7 +3851,25 @@ main(int argc, char **argv)
 
 	/* set random seed */
 	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+
+	if (seed == NULL)
+		seed = getenv("PGBENCH_RANDOM_SEED");
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+		srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
+	else
+	{
+		unsigned int s;
+		fprintf(stderr, "random seed set to '%s'\n", seed);
+		if (sscanf(seed, "%u", &s) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s', expecting an unsigned integer\n",
+					seed);
+			exit(1);
+		}
+		srandom(s);
+	}
 
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);

#13

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Fabien COELHO (#12)

Re: pgbench randomness initialization

Fabien COELHO wrote:

While testing it I had a funny pattern, something like:

pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
1.0: 600 tps
2.0: 600 tps
3.0: 600 tps

The output should include the random seed used, whether it was passed
with --random-seed, environment variable or randomly determined. That
way, the user that later wants to verify why a particular run caused
some particular misbehavior knows what seed to use to reproduce that
run.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14

Fabien COELHO

coelho@cri.ensmp.fr

about 8 years ago

In reply to: Alvaro Herrera (#13)

1 attachment(s)

Re: [HACKERS] pgbench randomness initialization

Hello Alvaro,

I am reviving this patch because controlling the seed is useful for TAP
testing pgbench.

The output should include the random seed used, whether it was passed
with --random-seed, environment variable or randomly determined. That
way, the user that later wants to verify why a particular run caused
some particular misbehavior knows what seed to use to reproduce that
run.

Yep.

Here is a new version which outputs the seed used when a seed is
explicitly set with an option or from the environment.

However, the default (current) behavior remains silent, so no visible
changes unless tinkering with it.

The patch also allows using a "strong" random source for seeding the PRNG,
thanks to pg_strong_random().

The tests assume that stdlib random/srandom behavior is standard and thus
deterministic across platforms.

--
Fabien.

Attachments:

pgbench-seed-2.patch (text/x-diff)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1519fe7..49dda81 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -761,6 +761,37 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
        </para>
       </listitem>
      </varlistentry>
+
+     <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  This random generator is used to initialize
+        per-thread random generator states.
+        Expected values for <replaceable>SEED</replaceable> are:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source if available),
+        or any unsigned integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses random to schedule transactions).
+        The random generator seed may also be provided through environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows reproducing a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        From a statistical viewpoint this is a bad idea because it can hide the
+        performance variability or improve performance unduly, e.g. by hitting
+        the same pages as a previous run.
+        However it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
     </variablelist>
    </para>
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e065f7b..fdd731d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -557,6 +557,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4010,6 +4011,49 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed)
+{
+	unsigned int iseed;
+
+	if (seed == NULL)
+		seed = getenv("PGBENCH_RANDOM_SEED");
+
+	if (seed == NULL || *seed == '\0' || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse seed value coming either from option or environment */
+		char garbage;
+		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s', expecting an unsigned integer\n",
+					seed);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL && *seed != '\0')
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+}
+
 
 int
 main(int argc, char **argv)
@@ -4052,6 +4096,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4120,6 +4165,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(NULL);
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4392,6 +4440,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg);
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -4698,10 +4750,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3dd080e..f50c8d2 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -210,11 +210,16 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 23 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
+	'--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+        qr{command=1.: int 28\b},
+	    qr{command=2.: int 7\b},
+	    qr{command=3.: int 47\b},
+	    qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -232,7 +237,7 @@ pgbench(
 		qr{command=19.: double 19\b},
 		qr{command=20.: double 20\b},
 		qr{command=21.: int 9223372036854775807\b},
-		qr{command=23.: int [1-9]\b},
+		qr{command=23.: int 1\b},
 		qr{command=24.: double -27\b},
 		qr{command=25.: double 1024\b},
 		qr{command=26.: double 1\b},

#15

Fabien COELHO

coelho@cri.ensmp.fr

about 8 years ago

In reply to: Fabien COELHO (#14)

1 attachment(s)

Re: [HACKERS] pgbench randomness initialization

Here is a new version which outputs the seed used when a seed is explicitly
set with an option or from the environment.

It is even better without xml typos, with simpler coding and the doc in
the right place... Sorry for the noise.

--
Fabien.

Attachments:

pgbench-seed-3.patch (text/x-diff)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1519fe7..ec07fa3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,37 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  This random generator is used to initialize
+        per-thread random generator states.
+        Expected values for <replaceable>SEED</replaceable> are:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source if available),
+        or any unsigned integer value.
+        The random generator is invoked explicitely from a pgbench script
+        (<literal>random...</literal> functions) or implicitely (for instance option
+        <option>--rate</option> uses random to schedule transactions).
+        The random generator seed may also be provided through environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitely allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        From a statistical viewpoint this is a bad idea because it can hide the
+        performance variability or improve performance unduly, e.g. by hitting
+        the same pages than a previous run.
+        However it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e065f7b..4eb82f7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -557,6 +557,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4010,6 +4011,46 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed)
+{
+	unsigned int iseed;
+
+	if (seed == NULL || *seed == '\0' || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse seed value coming either from option or environment */
+		char garbage;
+		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s', expecting an unsigned integer\n",
+					seed);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+}
+
 
 int
 main(int argc, char **argv)
@@ -4052,6 +4093,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4120,6 +4162,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"));
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4392,6 +4437,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg);
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -4698,10 +4747,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3dd080e..8050f0c 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -210,11 +210,16 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 23 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
+	'--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+        qr{command=1.: int 28\b}, # uniform random
+	    qr{command=2.: int 7\b}, # exponential random
+	    qr{command=3.: int 47\b}, # gaussian random
+	    qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -232,7 +237,7 @@ pgbench(
 		qr{command=19.: double 19\b},
 		qr{command=20.: double 20\b},
 		qr{command=21.: int 9223372036854775807\b},
-		qr{command=23.: int [1-9]\b},
+		qr{command=23.: int 1\b}, # zipfian random
 		qr{command=24.: double -27\b},
 		qr{command=25.: double 1024\b},
 		qr{command=26.: double 1\b},

#16

Chapman Flack

chap@anastigmatix.net

about 8 years ago

In reply to: Fabien COELHO (#15)

2 attachment(s)

Re: Re: [HACKERS] pgbench randomness initialization

On 01/02/18 05:57, Fabien COELHO wrote:

Here is a new version which outputs the seed used when a seed is
explicitly set with an option or from the environment.

This is a simple patch that does what it says on the tin. I ran into
trouble with the pgbench TAP test *even before applying the patch*, but
only because I was doing a VPATH build as a user without 'write'
on the source tree (001_pgbench_with_server.pl tried to make pgbench
create log files there). Bad me. Oddly, that was the only test in the
whole tree to have such an issue, so here I add a pre-patch to fix that.
Now my review needs a review. :)

With that taken care of, the added tests all pass for me. I wonder, though:

The tests assume that stdlib random/srandom behavior is standard and thus
deterministic across platforms.

Is the behavior of srandom() and the system generator really so precisely
specified that seed 5432 will produce the same values hardcoded in the
tests on all platforms? It does on mine, but could it produce spurious
test failures for others? I guess the (or a) spec would be here:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/initstate.html

It specifies a "non-linear additive feedback random-number generator
employing a default state array size of 31 long integers", but it does
not pin down parameters or claim only one candidate exists.

To have the test run pgbench twice with the same seed and compare the
results sounds safer.
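
A minimal sketch of that idea at the C level, assuming only POSIX srandom()/random(): seed twice with the same value and compare the two sequences. The helper names draw_sequence and same_seed_repeats are hypothetical and are not part of pgbench or its TAP tests. The check shows repeatability on one platform and makes no claim about which values another libc would produce.

```c
#define _XOPEN_SOURCE 600	/* for srandom()/random() on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill out[0..n-1] with the first n values produced after seeding. */
static void
draw_sequence(unsigned int seed, long *out, size_t n)
{
	assert(out != NULL);
	srandom(seed);
	for (size_t i = 0; i < n; i++)
		out[i] = random();
}

/*
 * Return true if two runs with the same seed give identical sequences.
 * This checks repeatability on the current platform only; it says nothing
 * about the actual values on another libc.
 */
static bool
same_seed_repeats(unsigned int seed)
{
	long		first[8];
	long		second[8];

	draw_sequence(seed, first, 8);
	draw_sequence(seed, second, 8);
	for (size_t i = 0; i < 8; i++)
		if (first[i] != second[i])
			return false;
	return true;
}
```

A test built this way compares a run against itself rather than against hardcoded numbers, so it would not depend on the generator's parameters.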

This revised pgbench-seed-4.patch changes the code not at all, and
the TAP test only in whitespace. I did some wordsmithing of the doc,
which I hope was not presumptuous of me, only as a conversation starter.
I expanded the second sentence because on my first reading I wasn't quite
sure of its meaning. Once I had looked at the code, I could see that the
sentence was economical and precise already, but I tried to find a version
that would also have been clear to the me-before-I-looked.

The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=)
will have the same effect as 'time', and indeed, I really think it should
be unset (defaulting to 'time'), or 'time', or 'rand', or an integer,
or an error. The error, right now, says only "expecting an unsigned
integer"; it should also mention time and rand. Should 'rand' be something
that conveys more about its meaning, 'strong' perhaps?

The documentation doesn't mention the allowed range of the unsigned
integer (which I get, because 'unsigned int' is exactly the signature
for srandom, but somebody might read "unsigned integer" in a more
generic sense). I'm not sure what would be a better way to say it.
The base (only decimal, as now in the code) isn't specified either.

Maybe the documentation should mention that the output now includes the
random seed being used, so that (even without planning ahead) a session
can be re-run with the same seed if necessary. It could just say "an
unsigned integer in decimal, as it is shown in pgbench's output" or
something like that.

Something more may need to be said in the doc about reproducibility. I think
the real story is that a run can be reproduced if the number of clients is
equal to the number of threads. Although each thread has its own generator
state, each client does not (if there is more than one per thread), and as
soon as real select() delays start happening in CSTATE_WAIT_RESULT, the
clients dealt out to a given thread may not be hitting that thread's
generator in a deterministic order.
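
The effect can be sketched in isolation, assuming libc's erand48() as a stand-in for pg_erand48() and two hypothetical clients sharing one thread's state. The function names below are illustrative only. The generator's sequence is the same in both runs, but which client receives which value depends on the order in which the clients are served.

```c
#define _XOPEN_SOURCE 600	/* for erand48() on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Two hypothetical clients share one thread's generator state. Draw one
 * value per client in the given order and record what each client got.
 */
static void
draw_in_order(const int order[2], double got[2])
{
	unsigned short state[3] = {1, 2, 3};	/* same starting state each time */

	assert(order[0] != order[1]);
	for (int i = 0; i < 2; i++)
		got[order[i]] = erand48(state);
}

/* True if client 0 receives a different value when it is served second. */
static bool
order_changes_values(void)
{
	const int	client0_first[2] = {0, 1};
	const int	client1_first[2] = {1, 0};
	double		run1[2];
	double		run2[2];

	draw_in_order(client0_first, run1);
	draw_in_order(client1_first, run2);
	/* the sequence is identical, only its assignment to clients differs */
	return run1[0] == run2[1] && run1[1] == run2[0] && run1[0] != run2[0];
}
```

With one client per thread there is only one possible service order, which is why that configuration is the reproducible one.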

-Chap

Attachments:

pgbench-TAP-VPATH-1.patch (text/x-patch)

From 4fe8f8563fedcb04b167db476adaf3bc398f879d Mon Sep 17 00:00:00 2001
From: Chapman Flack <chap@anastigmatix.net>
Date: Mon, 8 Jan 2018 23:15:01 -0500
Subject: [PATCH 1/2] Survive VPATH build without w on source tree.

A couple tests pass log options to pgbench, with a bare filename
for --log-prefix, so it would be created in the current directory.
In a VPATH build, inconveniently, the Makefile chdirs into the
original source tree before running this, and if the build user has
no write access there, the tests fail. Chdir back to a nice writable
$TESTDIR/tmp_check to avoid that.
---
 src/bin/pgbench/t/001_pgbench_with_server.pl | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3dd080e..8305db3 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -5,6 +5,11 @@ use PostgresNode;
 use TestLib;
 use Test::More;
 
+if ( exists $ENV{'TESTDIR'} ) {
+	chdir $ENV{'TESTDIR'};
+	-d 'tmp_check' && -w 'tmp_check' && chdir 'tmp_check';
+}
+
 # start a pgbench specific server
 my $node = get_new_node('main');
 $node->init;
-- 
2.7.3

pgbench-seed-4.patch (text/x-patch)

From 33d5ec2b863fe3e8198d1c89de91bc36b52db5cc Mon Sep 17 00:00:00 2001
From: Chapman Flack <chap@anastigmatix.net>
Date: Tue, 9 Jan 2018 01:29:43 -0500
Subject: [PATCH 2/2] New pgbench --random-seed option for reproducibility.

---
 doc/src/sgml/ref/pgbench.sgml                | 33 +++++++++++++++++
 src/bin/pgbench/pgbench.c                    | 53 +++++++++++++++++++++++++---
 src/bin/pgbench/t/001_pgbench_with_server.pl | 11 ++++--
 3 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1519fe7..13b1a0a 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,39 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or any unsigned integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        From a statistical viewpoint this is a bad idea because it can hide the
+        performance variability or improve performance unduly, e.g. by hitting
+        the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fc2c734..e1da3fd 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -557,6 +557,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4010,6 +4011,46 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed)
+{
+	unsigned int iseed;
+
+	if (seed == NULL || *seed == '\0' || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse seed value coming either from option or environment */
+		char garbage;
+		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s', expecting an unsigned integer\n",
+					seed);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+}
+
 
 int
 main(int argc, char **argv)
@@ -4052,6 +4093,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4120,6 +4162,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"));
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4392,6 +4437,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg);
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -4698,10 +4747,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 8305db3..9497818 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -215,11 +215,16 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 23 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
+	'--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		qr{command=1.: int 28\b}, # uniform random
+		qr{command=2.: int 7\b}, # exponential random
+		qr{command=3.: int 47\b}, # gaussian random
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -237,7 +242,7 @@ pgbench(
 		qr{command=19.: double 19\b},
 		qr{command=20.: double 20\b},
 		qr{command=21.: int 9223372036854775807\b},
-		qr{command=23.: int [1-9]\b},
+		qr{command=23.: int 1\b}, # zipfian random
 		qr{command=24.: double -27\b},
 		qr{command=25.: double 1024\b},
 		qr{command=26.: double 1\b},
-- 
2.7.3

#17

Chapman Flack

chap@anastigmatix.net

about 8 years ago

In reply to: Chapman Flack (#16)

Re: pgbench randomness initialization

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Initial review is the message this is in reply to.

The new status of this patch is: Waiting on Author

#18

Fabien COELHO

coelho@cri.ensmp.fr

about 8 years ago

In reply to: Chapman Flack (#16)

1 attachment(s)

Re: Re: [HACKERS] pgbench randomness initialization

Hello Chapman,

Thanks for the review,

The tests assume that stdlib random/srandom behavior is standard and thus
deterministic across platforms.

Is the behavior of srandom() and the system generator really so precisely
specified that seed 5432 will produce the same values hardcoded in the
tests on all platforms? [...]

Good question.

I'm hoping that in practice it would be the same, or that there would be
very few cases (e.g. Linux vs Windows vs macOS...). I was counting on the
buildfarm to tell me if I'm wrong, and to fix it if needed.

To have the test run pgbench twice with the same seed and compare the
results sounds safer.

Interesting idea. The test script would be more complicated to do that,
though. I would prefer to bet on "random" determinism (:-) and resort to
such a solution only if it is not the case. Or maybe just put back some
"\d+" to keep it simple.

This is a debatable strategy.

I did some wordsmithing of the doc, which I hope was not presumptuous of
me, only as a conversation starter. [...]

Thanks for the doc improvements.

The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=)
will have the same effect as 'time', and indeed, I really think it should
be unset (defaulting to 'time'), or 'time', or 'rand', or an integer,
or an error.

Ok, done.

The error, right now, says only "expecting an unsigned integer"; it
should also mention time and rand.

Ok, done.

Should 'rand' be something that conveys more about its meaning, 'strong'
perhaps?

Hmmm. "Random means random" :-). I have no opinion about whether it would
be better. ISTM that "strong" would require some explanation. I left it as
"rand" for now.

The documentation doesn't mention the allowed range of the unsigned
integer (which I get, because 'unsigned int' is exactly the signature
for srandom, but somebody might read "unsigned integer" in a more
generic sense).

Ok. I extended so that it works with octal, decimal and hexadecimal, and
updated the doc accordingly. I did not add range information though.
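
For comparison only, and not what the attached patch does: the same three bases can be accepted with strtoul() and base 0, which infers octal, decimal or hexadecimal from the prefix. The helper name parse_seed is hypothetical. The sketch also rejects values that do not fit in an unsigned int, which is one way the range question could be handled.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch: parse an unsigned int seed
 * written in octal (leading 0), decimal, or hexadecimal (leading 0x).
 * Returns false on empty input, trailing garbage, a sign, or overflow.
 */
static bool
parse_seed(const char *seed, unsigned int *result)
{
	char	   *end;
	unsigned long value;

	assert(seed != NULL && result != NULL);

	/* strtoul() skips whitespace and accepts a sign; reject both up front */
	if (!isdigit((unsigned char) seed[0]))
		return false;

	errno = 0;
	value = strtoul(seed, &end, 0);		/* base 0: infer base from prefix */
	if (errno != 0 || end == seed || *end != '\0' || value > UINT_MAX)
		return false;

	*result = (unsigned int) value;
	return true;
}
```

Under these assumptions "5432", "0x1538" and "012470" all yield the same seed, while "one", "12abc" and "-1" are rejected.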

I'm not sure what would be a better way to say it.
The base (only decimal, as now in the code) isn't specified either.

Sure.

Maybe the documentation should mention that the output now includes the
random seed being used, so that (even without planning ahead) [...]

It does so only if the seed is explicitly set, otherwise it keeps the
previous behavior of being silent. I added a sentence about that.

Something more may need to be said in the doc about reproducibility. I think
the real story is that a run can be reproduced if the number of clients is
equal to the number of threads.

Yes, good point. I tried to hide the issue under the "as far as random
numbers are concerned". I've tried to improve this point in the doc.

Although each thread has its own generator state, each client does not
(if there is more than one per thread), and as soon as real select()
delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a
given thread may not be hitting that thread's generator in a
deterministic order.

Yep. This may evolve, for instance the error handling patch needs to
restart transactions so it adds a state into the client.

--
Fabien.

Attachments:

pgbench-seed-5.patch (text/x-diff)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1519fe7..7e81a51 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,45 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or any unsigned octal (<literal>012470</literal>),
+        decimal (<literal>5432</literal>) or
+        hexadecimal (<literal>0x1538</literal>) integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fc2c734..d4ff4c7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -557,6 +557,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4010,6 +4011,55 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed)
+{
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse uint seed value */
+		char garbage;
+		if (!
+			/* hexa */
+			((seed[0] == '0' && (seed[1] == 'x' || seed[1] == 'X') &&
+			 sscanf(seed, "%x%c", &iseed, &garbage) == 1) ||
+			 /* octal */
+			(seed[0] == '0' && seed[1] != 'x' && seed[1] != 'X' &&
+			 sscanf(seed, "%o%c", &iseed, &garbage) == 1) ||
+			 /* decimal */
+			 (seed[0] != '0' &&
+			  sscanf(seed, "%u%c", &iseed, &garbage) == 1)))
+		{
+			fprintf(stderr,
+					"error while scanning '%s', expecting an unsigned integer, 'time' or 'rand'\n",
+					seed);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+}
+
 
 int
 main(int argc, char **argv)
@@ -4052,6 +4102,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4120,6 +4171,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"));
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4392,6 +4446,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg);
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -4698,10 +4756,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3dd080e..7fee671 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -210,11 +210,16 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 23 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
+	'--random-seed=0x1538 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		qr{command=1.: int 28\b}, # uniform random
+		qr{command=2.: int 7\b}, # exponential random
+		qr{command=3.: int 47\b}, # gaussian random
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -232,7 +237,7 @@ pgbench(
 		qr{command=19.: double 19\b},
 		qr{command=20.: double 20\b},
 		qr{command=21.: int 9223372036854775807\b},
-		qr{command=23.: int [1-9]\b},
+		qr{command=23.: int 1\b}, # zipfian random
 		qr{command=24.: double -27\b},
 		qr{command=25.: double 1024\b},
 		qr{command=26.: double 1\b},
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..6853650 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one', expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',

#19

Fabien COELHO

coelho@cri.ensmp.fr

about 8 years ago

In reply to: Chapman Flack (#16)

1 attachment(s)

Re: Re: [HACKERS] pgbench randomness initialization

This is a simple patch that does what it says on the tin. I ran into
trouble with the pgbench TAP test *even before applying the patch*, but
only because I was doing a VPATH build as a user without 'write'
on the source tree (001_pgbench_with_server.pl tried to make pgbench
create log files there). Bad me. Oddly, that was the only test in the
whole tree to have such an issue, so here I add a pre-patch to fix that.
Now my review needs a review. :)

Yep. I find the multiple chdir solution a little bit too extreme.

ISTM that it should rather add the correct path to --log-prefix by
prepending $node->basedir, like the pgbench function does for -f scripts.

See attached.
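
To illustrate why an absolute prefix helps: assuming pgbench forms each log file name by appending the process id (and a thread number for threads after the first) to the prefix, a relative prefix ends up in whatever the current working directory is. A minimal sketch, where build_log_path is a hypothetical helper and the naming scheme is an assumption, not something stated in this thread:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a per-thread log file name from a prefix.
 * Assumed naming scheme: "<prefix>.<pid>" for thread 0 and
 * "<prefix>.<pid>.<tid>" for the other threads.
 * Returns the snprintf() result, i.e. the length the full name needs.
 */
static int
build_log_path(char *buf, size_t size, const char *prefix, int pid, int tid)
{
	assert(buf != NULL && prefix != NULL);

	if (tid == 0)
		return snprintf(buf, size, "%s.%d", prefix, pid);
	return snprintf(buf, size, "%s.%d.%d", prefix, pid, tid);
}
```

With a prefix such as "001_pgbench_log_2" the resulting name is relative, so the file is created in the current directory; with "$bdir/001_pgbench_log_2" expanded to an absolute path it is created under the node's base directory regardless of where the test is run from.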

--
Fabien.

Attachments:

pgbench-vpath-2.patch (text/x-diff)

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index e579334..ba7e363 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -611,24 +611,26 @@ sub check_pgbench_logs
 	ok(unlink(@logs), "remove log files");
 }
 
+my $bdir = $node->basedir;
+
 # with sampling rate
 pgbench(
-'-n -S -t 50 -c 2 --log --log-prefix=001_pgbench_log_2 --sampling-rate=0.5',
+"-n -S -t 50 -c 2 --log --log-prefix=$bdir/001_pgbench_log_2 --sampling-rate=0.5",
 	0,
 	[ qr{select only}, qr{processed: 100/100} ],
 	[qr{^$}],
 	'pgbench logs');
 
-check_pgbench_logs('001_pgbench_log_2', 1, 8, 92,
+check_pgbench_logs("$bdir/001_pgbench_log_2", 1, 8, 92,
 	qr{^0 \d{1,2} \d+ \d \d+ \d+$});
 
 # check log file in some detail
 pgbench(
-	'-n -b se -t 10 -l --log-prefix=001_pgbench_log_3',
+	"-n -b se -t 10 -l --log-prefix=$bdir/001_pgbench_log_3",
 	0, [ qr{select only}, qr{processed: 10/10} ],
 	[qr{^$}], 'pgbench logs contents');
 
-check_pgbench_logs('001_pgbench_log_3', 1, 10, 10,
+check_pgbench_logs("$bdir/001_pgbench_log_3", 1, 10, 10,
 	qr{^\d \d{1,2} \d+ \d \d+ \d+$});
 
 # done

#20

Fabien COELHO

coelho@cri.ensmp.fr

about 8 years ago

In reply to: Fabien COELHO (#18)

1 attachment(s)

Re: Re: [HACKERS] pgbench randomness initialization

Here is a rebase, plus some more changes:

I have improved the error message to say where the value came from.

I have removed the test on the exact values produced by the expression
test run.

I have added a test which runs from the same seed value several times
and checks that the output values are the same.
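
As an illustration of the seed parsing and of an error message that names its origin, here is a minimal sketch. It is not the code from the attached patch, which uses sscanf(): this variant relies on strtoul() with base 0 instead, and parse_seed_value and format_seed_error are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse an unsigned integer seed written in octal
 * (leading 0), decimal, or hexadecimal (leading 0x or 0X).
 * Returns true and stores the value in *out on success.
 */
static bool
parse_seed_value(const char *text, unsigned int *out)
{
	char	   *end;
	unsigned long value;

	assert(out != NULL);

	/* strtoul() accepts a sign and leading blanks; reject them up front */
	if (text == NULL || text[0] < '0' || text[0] > '9')
		return false;

	errno = 0;
	value = strtoul(text, &end, 0);	/* base 0: the prefix selects the radix */

	/* fail on overflow, trailing garbage, or a value too wide for the seed */
	if (errno != 0 || *end != '\0' || value > UINT_MAX)
		return false;

	*out = (unsigned int) value;
	return true;
}

/* Build a diagnostic that says where the bad value came from. */
static int
format_seed_error(char *buf, size_t size, const char *text, const char *origin)
{
	return snprintf(buf, size,
					"error while scanning '%s' from %s, "
					"expecting an unsigned integer, 'time' or 'rand'",
					text, origin);
}
```

The strings "5432", "012470", "0x1538" and "0X1538" all denote the same value here, which is the property the new test relies on; the keywords "time" and "rand" would have to be handled before calling such a parser.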

--
Fabien.

Attachments:

pgbench-seed-6.patch (text/x-diff)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..22ebb51 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,45 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or any unsigned octal (<literal>012470</literal>),
+        decimal (<literal>5432</literal>) or
+        hexadecimal (<literal>0x1538</literal>) integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 31ea6ca..206dfd5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -560,6 +560,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4362,6 +4363,55 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse uint seed value */
+		char garbage;
+		if (!
+			/* hexa */
+			((seed[0] == '0' && (seed[1] == 'x' || seed[1] == 'X') &&
+			 sscanf(seed, "%x%c", &iseed, &garbage) == 1) ||
+			 /* octal */
+			(seed[0] == '0' && seed[1] != 'x' && seed[1] != 'X' &&
+			 sscanf(seed, "%o%c", &iseed, &garbage) == 1) ||
+			 /* decimal */
+			 (seed[0] != '0' &&
+			  sscanf(seed, "%u%c", &iseed, &garbage) == 1)))
+		{
+			fprintf(stderr,
+					"error while scanning '%s' from %s, expecting an unsigned integer, 'time' or 'rand'\n",
+					seed, origin);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+}
+
 
 int
 main(int argc, char **argv)
@@ -4404,6 +4454,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4472,6 +4523,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4744,6 +4798,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg, "--random-seed option");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5050,10 +5108,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index e579334..26ea535 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -210,14 +210,18 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 20 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
+	'--random-seed=0x1538 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=1.: int 1\d\b},
-	    qr{command=2.: int 1\d\d\b},
-	    qr{command=3.: int 1\d\d\d\b},
-	    qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		# After explicit seeding, the four * random checks (1-3,20) should be
+		# deterministic, but not necessarily portable.
+		qr{command=1.: int 1\d\b}, # uniform random: 12 on linux
+		qr{command=2.: int 1\d\d\b}, # exponential random: 106 on linux
+		qr{command=3.: int 1\d\d\d\b}, # gaussian random: 1462 on linux
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -230,7 +234,7 @@ pgbench(
 		qr{command=16.: double 16\b},
 		qr{command=17.: double 17\b},
 		qr{command=18.: int 9223372036854775807\b},
-		qr{command=20.: int [1-9]\b},
+		qr{command=20.: int \d\b}, # zipfian random: 1 on linux
 		qr{command=21.: double -27\b},
 		qr{command=22.: double 1024\b},
 		qr{command=23.: double 1\b},
@@ -340,6 +344,45 @@ pgbench(
 SELECT :v0, :v1, :v2, :v3;
 } });
 
+# random determinism when seeded
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE seeded_random(rand TEXT NOT NULL, val INTEGER NOT NULL);');
+
+# same value in decimal, octal and hexadecimal
+for my $seed ('5432', '012470', '0x1538', '0X1538')
+{
+    pgbench("--random-seed=$seed -t 1",
+	0,
+	[qr{processed: 1/1}],
+	[qr{setting random seed to 5432\b}],
+	"random seeded with $seed",
+	{ "001_pgbench_random_seed_$seed" => q{-- test random functions
+\set ur random(1000, 1999)
+\set er random_exponential(2000, 2999, 2.0)
+\set gr random_gaussian(3000, 3999, 3.0)
+\set zr random_zipfian(4000, 4999, 2.5)
+INSERT INTO seeded_random(rand, val) VALUES
+  ('uniform', :ur),
+  ('exponential', :er),
+  ('gaussian', :gr),
+  ('zipfian', :zr);
+} });
+}
+
+# check that all runs generated the same 4 values
+my ($ret, $out, $err) =
+  $node->psql('postgres',
+	'SELECT rand, val, COUNT(*) FROM seeded_random GROUP BY rand, val');
+
+ok($ret == 0, "psql seeded_random count ok");
+ok($err eq '', "psql seeded_random count stderr is empty");
+ok($out =~ /uniform\|1\d\d\d\|4/, "psql seeded_random count uniform");
+ok($out =~ /exponential\|2\d\d\d\|4/, "psql seeded_random count exponential");
+ok($out =~ /gaussian\|3\d\d\d\|4/, "psql seeded_random count gaussian");
+ok($out =~ /zipfian\|4\d\d\d\|4/, "psql seeded_random count zipfian");
+
+$node->safe_psql('postgres', 'DROP TABLE seeded_random;');
+
 =head
 
 } });
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..c015f36 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one' from --random-seed option, expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',

#21

Fabien COELHO

coelho@cri.ensmp.fr

almost 8 years ago

In reply to: Fabien COELHO (#20)

1 attachment(s)

Re: Re: [HACKERS] pgbench randomness initialization

Here is a rebase, plus some more changes:

I have improved the error message to say where the value came from.

I have removed the test on the exact values produced by the expression test
run.

I have added a test which runs from the same seed value several times
and checks that the output values are the same.

This version adds a :random_seed script variable, for information.
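
A minimal sketch of how such a variable can be backed, under the assumption that the seed is an unsigned int kept in a signed 64-bit global with -1 meaning "not set yet". The names recorded_seed, record_seed and seed_is_recorded are hypothetical and used only for this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* -1 means "no seed recorded yet"; a real seed is never negative */
static int64_t recorded_seed = -1;

/* Remember the seed that was handed to the generator. */
static void
record_seed(unsigned int iseed)
{
	/*
	 * Widening an unsigned int (32 bits on the usual platforms) to a
	 * signed 64-bit integer keeps the value intact, so even the largest
	 * seed stays positive and cannot collide with the -1 sentinel.
	 */
	recorded_seed = (int64_t) iseed;
	assert(recorded_seed >= 0);
}

/* Tell whether a seed has been recorded. */
static bool
seed_is_recorded(void)
{
	return recorded_seed >= 0;
}
```

Keeping the value in a wider signed type is what allows a sentinel outside the seed's range while still exposing the seed as an ordinary integer variable to scripts.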

--
Fabien.

Attachments:

pgbench-seed-7.patch (text/x-diff)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..aa1a9e0 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,45 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or any unsigned octal (<literal>012470</literal>),
+        decimal (<literal>5432</literal>) or
+        hexadecimal (<literal>0x1538</literal>) integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
@@ -874,14 +913,19 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
 
      <tbody>
       <row>
-       <entry> <literal>scale</literal> </entry>
-       <entry>current scale factor</entry>
-      </row>
-
-      <row>
        <entry> <literal>client_id</literal> </entry>
        <entry>unique number identifying the client session (starts from zero)</entry>
       </row>
+
+      <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random generator seed</entry>
+      </row>
+
+      <row>
+       <entry> <literal>scale</literal> </entry>
+       <entry>current scale factor</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 31ea6ca..c49a48a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -146,6 +146,9 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  *********************************************************************/
@@ -560,6 +563,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4362,6 +4366,57 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse uint seed value */
+		char garbage;
+		if (!
+			/* hexa */
+			((seed[0] == '0' && (seed[1] == 'x' || seed[1] == 'X') &&
+			 sscanf(seed, "%x%c", &iseed, &garbage) == 1) ||
+			 /* octal */
+			(seed[0] == '0' && seed[1] != 'x' && seed[1] != 'X' &&
+			 sscanf(seed, "%o%c", &iseed, &garbage) == 1) ||
+			 /* decimal */
+			 (seed[0] != '0' &&
+			  sscanf(seed, "%u%c", &iseed, &garbage) == 1)))
+		{
+			fprintf(stderr,
+					"error while scanning '%s' from %s, expecting an unsigned integer, 'time' or 'rand'\n",
+					seed, origin);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+	random_seed = iseed;
+}
+
 
 int
 main(int argc, char **argv)
@@ -4404,6 +4459,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4472,6 +4528,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4744,6 +4803,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg, "--random-seed option");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5033,6 +5096,16 @@ main(int argc, char **argv)
 		}
 	}
 
+	/* idem for :random_seed */
+	if (lookupVariable(&state[0], "random_seed") == NULL)
+	{
+		for (i = 0; i < nclients; i++)
+		{
+			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+				exit(1);
+		}
+	}
+
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
@@ -5050,10 +5123,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index a8b2962..de99e4e 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -210,14 +210,18 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 20 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
+	'--random-seed=0x1538 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=1.: int 1\d\b},
-	    qr{command=2.: int 1\d\d\b},
-	    qr{command=3.: int 1\d\d\d\b},
-	    qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		# After explicit seeding, the four * random checks (1-3,20) should be
+		# deterministic, but not necessarily portable.
+		qr{command=1.: int 1\d\b}, # uniform random: 12 on linux
+		qr{command=2.: int 1\d\d\b}, # exponential random: 106 on linux
+		qr{command=3.: int 1\d\d\d\b}, # gaussian random: 1462 on linux
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -230,7 +234,7 @@ pgbench(
 		qr{command=16.: double 16\b},
 		qr{command=17.: double 17\b},
 		qr{command=18.: int 9223372036854775807\b},
-		qr{command=20.: int [1-9]\b},
+		qr{command=20.: int \d\b}, # zipfian random: 1 on linux
 		qr{command=21.: double -27\b},
 		qr{command=22.: double 1024\b},
 		qr{command=23.: double 1\b},
@@ -259,6 +263,9 @@ pgbench(
 		qr{command=46.: int 46\b},
 		qr{command=47.: boolean true\b},
 		qr{command=48.: boolean true\b},
+		qr{command=53.: int 1\b},    # :scale
+		qr{command=54.: int 0\b},    # :client_id
+		qr{command=55.: int 5432\b}, # :random_seed
 	],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
@@ -332,6 +339,10 @@ pgbench(
 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy = 0 or (1 / :zy) < 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy > 0 and (1 / :zy) < 0 then (1 / :zy) else 1 end)
+-- check automatic variables
+\set sc debug(:scale)
+\set ci debug(:client_id)
+\set rs debug(:random_seed)
 -- substitute variables of all possible types
 \set v0 NULL
 \set v1 TRUE
@@ -340,6 +351,45 @@ pgbench(
 SELECT :v0, :v1, :v2, :v3;
 } });
 
+# random determinism when seeded
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE seeded_random(rand TEXT NOT NULL, val INTEGER NOT NULL);');
+
+# same value in decimal, octal and hexadecimal
+for my $seed ('5432', '012470', '0x1538', '0X1538')
+{
+    pgbench("--random-seed=$seed -t 1",
+	0,
+	[qr{processed: 1/1}],
+	[qr{setting random seed to 5432\b}],
+	"random seeded with $seed",
+	{ "001_pgbench_random_seed_$seed" => q{-- test random functions
+\set ur random(1000, 1999)
+\set er random_exponential(2000, 2999, 2.0)
+\set gr random_gaussian(3000, 3999, 3.0)
+\set zr random_zipfian(4000, 4999, 2.5)
+INSERT INTO seeded_random(rand, val) VALUES
+  ('uniform', :ur),
+  ('exponential', :er),
+  ('gaussian', :gr),
+  ('zipfian', :zr);
+} });
+}
+
+# check that all runs generated the same 4 values
+my ($ret, $out, $err) =
+  $node->psql('postgres',
+	'SELECT rand, val, COUNT(*) FROM seeded_random GROUP BY rand, val');
+
+ok($ret == 0, "psql seeded_random count ok");
+ok($err eq '', "psql seeded_random count stderr is empty");
+ok($out =~ /uniform\|1\d\d\d\|4/, "psql seeded_random count uniform");
+ok($out =~ /exponential\|2\d\d\d\|4/, "psql seeded_random count exponential");
+ok($out =~ /gaussian\|3\d\d\d\|4/, "psql seeded_random count gaussian");
+ok($out =~ /zipfian\|4\d\d\d\|4/, "psql seeded_random count zipfian");
+
+$node->safe_psql('postgres', 'DROP TABLE seeded_random;');
+
 =head
 
 } });
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..c015f36 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one' from --random-seed option, expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',

#22

Fabien COELHO

coelho@cri.ensmp.fr

almost 8 years ago

In reply to: Fabien COELHO (#21)

1 attachment(s)

Re: Re: [HACKERS] pgbench randomness initialization

Here is a rebase, plus some more changes:

I have improved the error message to say where the value came from.

I have removed the test on the exact values produced by the expression
test run.

I have added a test which runs from the same seed value several times
and checks that the output values are the same.

This version adds a :random_seed script variable, for information.

Rebased.
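
For readers who want to see why a fixed seed gives repeatable runs, here is a minimal sketch of the idea: seed one global generator, then draw each thread's initial state from it. It deliberately swaps in the ISO C srand()/rand() pair for srandom()/random() to stay portable, and ThreadSeed, derive_thread_seeds and thread_seeds_equal are hypothetical names for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* three 16-bit words per thread, the state shape erand48-style generators use */
typedef struct
{
	unsigned short state[3];
} ThreadSeed;

/* Seed the global generator once, then draw every thread's state from it. */
static void
derive_thread_seeds(unsigned int seed, ThreadSeed *seeds, int nthreads)
{
	assert(seeds != NULL && nthreads >= 0);

	srand(seed);
	for (int i = 0; i < nthreads; i++)
	{
		/* rand() may provide as few as 15 bits; enough for a sketch */
		seeds[i].state[0] = (unsigned short) rand();
		seeds[i].state[1] = (unsigned short) rand();
		seeds[i].state[2] = (unsigned short) rand();
	}
}

/* Compare two sets of per-thread states word by word. */
static bool
thread_seeds_equal(const ThreadSeed *a, const ThreadSeed *b, int nthreads)
{
	for (int i = 0; i < nthreads; i++)
		for (int j = 0; j < 3; j++)
			if (a[i].state[j] != b[i].state[j])
				return false;
	return true;
}
```

Calling derive_thread_seeds() twice with the same seed and thread count yields identical states, which is the property the repeated-seed test checks from the outside; whether a whole run is then reproducible still depends on scheduling and on the data, as the documentation in the patch points out.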

--
Fabien.

Attachments:

pgbench-seed-8.patch (text/x-diff)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..aa1a9e0 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,45 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or any unsigned octal (<literal>012470</literal>),
+        decimal (<literal>5432</literal>) or
+        hexadecimal (<literal>0x1538</literal>) integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
@@ -874,14 +913,19 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
 
      <tbody>
       <row>
-       <entry> <literal>scale</literal> </entry>
-       <entry>current scale factor</entry>
-      </row>
-
-      <row>
        <entry> <literal>client_id</literal> </entry>
        <entry>unique number identifying the client session (starts from zero)</entry>
       </row>
+
+      <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random generator seed</entry>
+      </row>
+
+      <row>
+       <entry> <literal>scale</literal> </entry>
+       <entry>current scale factor</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 31ea6ca..c49a48a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -146,6 +146,9 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  *********************************************************************/
@@ -560,6 +563,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4362,6 +4366,57 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse uint seed value */
+		char garbage;
+		if (!
+			/* hexa */
+			((seed[0] == '0' && (seed[1] == 'x' || seed[1] == 'X') &&
+			 sscanf(seed, "%x%c", &iseed, &garbage) == 1) ||
+			 /* octal */
+			(seed[0] == '0' && seed[1] != 'x' && seed[1] != 'X' &&
+			 sscanf(seed, "%o%c", &iseed, &garbage) == 1) ||
+			 /* decimal */
+			 (seed[0] != '0' &&
+			  sscanf(seed, "%u%c", &iseed, &garbage) == 1)))
+		{
+			fprintf(stderr,
+					"error while scanning '%s' from %s, expecting an unsigned integer, 'time' or 'rand'\n",
+					seed, origin);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+	random_seed = iseed;
+}
+
 
 int
 main(int argc, char **argv)
@@ -4404,6 +4459,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4472,6 +4528,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4744,6 +4803,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg, "--random-seed option");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5033,6 +5096,16 @@ main(int argc, char **argv)
 		}
 	}
 
+	/* idem for :random_seed */
+	if (lookupVariable(&state[0], "random_seed") == NULL)
+	{
+		for (i = 0; i < nclients; i++)
+		{
+			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+				exit(1);
+		}
+	}
+
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
@@ -5050,10 +5123,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 99286f6..fa9d9ec 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -210,14 +210,18 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 23 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
+	'--random-seed=0x1538 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=1.: int 1\d\b},
-	    qr{command=2.: int 1\d\d\b},
-	    qr{command=3.: int 1\d\d\d\b},
-	    qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		# After explicit seeding, the four * random checks (1-3,20) should be
+		# deterministic, but not necessarily portable.
+		qr{command=1.: int 1\d\b}, # uniform random: 12 on linux
+		qr{command=2.: int 1\d\d\b}, # exponential random: 106 on linux
+		qr{command=3.: int 1\d\d\d\b}, # gaussian random: 1462 on linux
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -230,7 +234,7 @@ pgbench(
 		qr{command=16.: double 16\b},
 		qr{command=17.: double 17\b},
 		qr{command=18.: int 9223372036854775807\b},
-		qr{command=20.: int [1-9]\b},
+		qr{command=20.: int \d\b}, # zipfian random: 1 on linux
 		qr{command=21.: double -27\b},
 		qr{command=22.: double 1024\b},
 		qr{command=23.: double 1\b},
@@ -259,6 +263,9 @@ pgbench(
 		qr{command=46.: int 46\b},
 		qr{command=47.: boolean true\b},
 		qr{command=48.: boolean true\b},
+		qr{command=53.: int 1\b},    # :scale
+		qr{command=54.: int 0\b},    # :client_id
+		qr{command=55.: int 5432\b}, # :random_seed
 	],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
@@ -332,6 +339,10 @@ pgbench(
 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy = 0 or (1 / :zy) < 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy > 0 and (1 / :zy) < 0 then (1 / :zy) else 1 end)
+-- check automatic variables
+\set sc debug(:scale)
+\set ci debug(:client_id)
+\set rs debug(:random_seed)
 -- substitute variables of all possible types
 \set v0 NULL
 \set v1 TRUE
@@ -340,6 +351,45 @@ pgbench(
 SELECT :v0, :v1, :v2, :v3;
 } });
 
+# random determinism when seeded
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE seeded_random(rand TEXT NOT NULL, val INTEGER NOT NULL);');
+
+# same value in decimal, octal and hexadecimal
+for my $seed ('5432', '012470', '0x1538', '0X1538')
+{
+    pgbench("--random-seed=$seed -t 1",
+	0,
+	[qr{processed: 1/1}],
+	[qr{setting random seed to 5432\b}],
+	"random seeded with $seed",
+	{ "001_pgbench_random_seed_$seed" => q{-- test random functions
+\set ur random(1000, 1999)
+\set er random_exponential(2000, 2999, 2.0)
+\set gr random_gaussian(3000, 3999, 3.0)
+\set zr random_zipfian(4000, 4999, 2.5)
+INSERT INTO seeded_random(rand, val) VALUES
+  ('uniform', :ur),
+  ('exponential', :er),
+  ('gaussian', :gr),
+  ('zipfian', :zr);
+} });
+}
+
+# check that all runs generated the same 4 values
+my ($ret, $out, $err) =
+  $node->psql('postgres',
+	'SELECT rand, val, COUNT(*) FROM seeded_random GROUP BY rand, val');
+
+ok($ret == 0, "psql seeded_random count ok");
+ok($err eq '', "psql seeded_random count stderr is empty");
+ok($out =~ /uniform\|1\d\d\d\|4/, "psql seeded_random count uniform");
+ok($out =~ /exponential\|2\d\d\d\|4/, "psql seeded_random count exponential");
+ok($out =~ /gaussian\|3\d\d\d\|4/, "psql seeded_random count gaussian");
+ok($out =~ /zipfian\|4\d\d\d\|4/, "psql seeded_random count zipfian");
+
+$node->safe_psql('postgres', 'DROP TABLE seeded_random;');
+
 # backslash commands
 pgbench(
 	'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..c015f36 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one' from --random-seed option, expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',

#23

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Fabien COELHO (#19)

Re: [HACKERS] pgbench randomness initialization

Fabien COELHO <coelho@cri.ensmp.fr> writes:

This is a simple patch that does what it says on the tin. I ran into
trouble with the pgbench TAP test *even before applying the patch*, but
only because I was doing a VPATH build as a user without 'write'
on the source tree (001_pgbench_with_server.pl tried to make pgbench
create log files there). Bad me. Oddly, that was the only test in the
whole tree to have such an issue, so here I add a pre-patch to fix that.
Now my review needs a review. :)

Yep. I find the multiple chdir solution a little bit too extreme.

ISTM that it should rather add the correct path to --log-prefix by
prepending $node->basedir, like the pgbench function does for -f scripts.
See attached.

Hm ... so I tried to replicate this problem, and failed to: the log files
get made under the VPATH build directory, as desired, even without this
patch. Am I doing something wrong, or is this platform-dependent somehow?

regards, tom lane

#24

Fabien COELHO

coelho@cri.ensmp.fr

almost 8 years ago

In reply to: Tom Lane (#23)

Re: [HACKERS] pgbench randomness initialization

Hello Tom,

Fabien COELHO <coelho@cri.ensmp.fr> writes:

This is a simple patch that does what it says on the tin. I ran into
trouble with the pgbench TAP test *even before applying the patch*, but
only because I was doing a VPATH build as a user without 'write'
on the source tree (001_pgbench_with_server.pl tried to make pgbench
create log files there). Bad me. Oddly, that was the only test in the
whole tree to have such an issue, so here I add a pre-patch to fix that.
Now my review needs a review. :)

Yep. I find the multiple chdir solution a little bit too extreme.

ISTM that it should rather add the correct path to --log-prefix by
prepending $node->basedir, like the pgbench function does for -f scripts.
See attached.

Hm ... so I tried to replicate this problem, and failed to: the log files
get made under the VPATH build directory, as desired, even without this
patch. Am I doing something wrong, or is this platform-dependent somehow?

As I recall, it indeed works if the source directories are rw, but fails
if they are ro because then the local mkdir fails. So you would have to do
a vpath build with sources that are ro to get the issue the patch is
fixing. Otherwise, the issue would have been caught earlier by the
buildfarm, which I guess is doing vpath compilation and full validation.

--
Fabien.

#25

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Fabien COELHO (#24)

Re: [HACKERS] pgbench randomness initialization

Fabien COELHO <coelho@cri.ensmp.fr> writes:

Hm ... so I tried to replicate this problem, and failed to: the log files
get made under the VPATH build directory, as desired, even without this
patch. Am I doing something wrong, or is this platform-dependent somehow?

As I recall, it indeed works if the source directories are rw, but fails
if they are ro because then the local mkdir fails. So you would have to do
a vpath build with sources that are ro to get the issue the patch is
fixing.

Ah, right you are. Apparently, if the source is rw, the temporary files
in question are made there but immediately deleted, so the bug isn't
obvious.

Fix verified and pushed.

regards, tom lane

#26

Chapman Flack

chap@anastigmatix.net

almost 8 years ago

In reply to: Fabien COELHO (#22)

Re: pgbench randomness initialization

Patch v8 works and addresses the things I noticed earlier.

It needs s/explicitely/explicitly/ in the docs.

The parsing of the seed involves matters of taste, I guess: if it were a signed int, then
sscanf's built-in %i would do everything those three explicit hex/octal/decimal branches
do, but there's no unsigned version of %i. Then there's strtoul(..., base=0), which accepts
the same choice of bases, but there's no unsigned-int-width version of that. Maybe it
would still look cleaner to use strtoul and just check that the result fits in unsigned int?
As I began, it comes down to taste ... this code does work.
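
A minimal sketch of the strtoul() alternative mentioned above, for illustration only. The helper name parse_seed_strtoul is hypothetical and does not appear in the patch; the sketch lets base 0 pick decimal, octal or hexadecimal, then checks that the result fits in an unsigned int:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not taken from the patch: parse an unsigned seed
 * with strtoul() and base 0, so that "5432", "012470" and "0x1538" are
 * all accepted.  Returns true and stores the value on success.
 */
static bool
parse_seed_strtoul(const char *text, unsigned int *result)
{
	char	   *end;
	unsigned long value;

	assert(text != NULL && result != NULL);

	/* strtoul() accepts a sign and negates silently, so insist on a digit */
	if (!isdigit((unsigned char) text[0]))
		return false;

	errno = 0;
	value = strtoul(text, &end, 0);

	/* reject unsigned long overflow, empty input and trailing garbage */
	if (errno == ERANGE || end == text || *end != '\0')
		return false;

	/* unsigned long may be wider than unsigned int */
	if (value > UINT_MAX)
		return false;

	*result = (unsigned int) value;
	return true;
}
```

The leading-digit check is a choice made for this sketch: it also rejects leading whitespace and a '+' sign, both of which the sscanf() "%u" conversion would tolerate.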

I am not sure about the "idem for :random_seed" part: Does this mean that a value
could be given with -Drandom_seed on the command line, and become the value
of :random_seed, possibly different from the value given to --random-seed?
Is that intended? (Perhaps it is; I'm merely asking.)

-Chap

The new status of this patch is: Waiting on Author

#27

Fabien COELHO

coelho@cri.ensmp.fr

almost 8 years ago

In reply to: Chapman Flack (#26)

1 attachment(s)

Re: pgbench randomness initialization

Hello Chapman,

Here is v9.

It needs s/explicitely/explicitly/ in the docs.

Done.

The parsing of the seed involves matters of taste, I guess: if it were a
signed int, then sscanf's built-in %i would do everything those three
explicit hex/octal/decimal branches do, but there's no unsigned version
of %i. Then there's strtoul(..., base=0), which accepts the same choice
of bases, but there's no unsigned-int-width version of that. Maybe it
would still look cleaner to use strtoul and just check that the result
fits in unsigned int? As I began, it comes down to taste ... this code
does work.

I must admit that I'm not too happy with the result either, so I dropped
the octal/hexadecimal parsing.

I am not sure about the "idem for :random_seed" part: Does this mean that a value
could be given with -Drandom_seed on the command line, and become the value
of :random_seed, possibly different from the value given to --random-seed?
Is that intended? (Perhaps it is; I'm merely asking.)

The "idem" is about setting the variable but not overwriting it if it
already exists. The intention is that :random_seed is the random seed,
unless the user set it to something else in which case it is the user's
value. I've improved the variable description in the doc to point out that
the value may be overwritten with -D.
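
The "set unless already defined" rule described above can be sketched as follows. This is a simplified stand-in, not pgbench's code: the VarTable type and the lookup_var, put_var and put_default helpers are hypothetical, and the sketch uses a single table where the patch checks the first client and then fills every client state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* Hypothetical, simplified variable store (pgbench keeps one per client). */
typedef struct
{
	const char *name;
	long long	value;
} Var;

typedef struct
{
	Var			vars[MAX_VARS];
	int			nvars;
} VarTable;

/* Return the entry called "name", or NULL if it is not defined. */
static Var *
lookup_var(VarTable *table, const char *name)
{
	assert(table != NULL && name != NULL);

	for (int i = 0; i < table->nvars; i++)
	{
		if (strcmp(table->vars[i].name, name) == 0)
			return &table->vars[i];
	}
	return NULL;
}

/* Define or overwrite a variable, as -Dname=value would. */
static bool
put_var(VarTable *table, const char *name, long long value)
{
	Var		   *var = lookup_var(table, name);

	if (var == NULL)
	{
		if (table->nvars >= MAX_VARS)
			return false;
		var = &table->vars[table->nvars++];
		var->name = name;
	}
	var->value = value;
	return true;
}

/* Define a variable only if the user has not already set it. */
static bool
put_default(VarTable *table, const char *name, long long value)
{
	if (lookup_var(table, name) != NULL)
		return true;			/* keep the user's value */
	return put_var(table, name, value);
}
```

With this shape, a value stored first through put_var (standing in for -Drandom_seed=42) survives a later put_default call carrying the real seed, which is the behavior the documentation now points out.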

--
Fabien.

Attachments:

pgbench-seed-9.patch (text/plain)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 5f28023..4ef0adc 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,43 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or an unsigned integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
@@ -874,14 +911,19 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
 
      <tbody>
       <row>
-       <entry> <literal>scale</literal> </entry>
-       <entry>current scale factor</entry>
-      </row>
-
-      <row>
        <entry> <literal>client_id</literal> </entry>
        <entry>unique number identifying the client session (starts from zero)</entry>
       </row>
+
+      <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random generator seed (unless overwritten with <option>-D</option>)</entry>
+      </row>
+
+      <row>
+       <entry> <literal>scale</literal> </entry>
+       <entry>current scale factor</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 5c07dd9..b516fb3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -146,6 +146,9 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  *********************************************************************/
@@ -561,6 +564,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4353,6 +4357,49 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+	/* srandom expects an unsigned int */
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse seed unsigned int value */
+		char garbage;
+		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s' from %s, expecting an unsigned integer, 'time' or 'rand'\n",
+					seed, origin);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+	random_seed = iseed;
+}
+
 
 int
 main(int argc, char **argv)
@@ -4395,6 +4442,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4463,6 +4511,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4735,6 +4786,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg, "--random-seed option");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5024,6 +5079,16 @@ main(int argc, char **argv)
 		}
 	}
 
+	/* idem for :random_seed */
+	if (lookupVariable(&state[0], "random_seed") == NULL)
+	{
+		for (i = 0; i < nclients; i++)
+		{
+			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+				exit(1);
+		}
+	}
+
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
@@ -5041,10 +5106,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 0c23d2f..2a69dfb 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -29,6 +29,12 @@ sub pgbench
 			$filename =~ s/\@\d+$//;
 
 			#push @filenames, $filename;
+			# filenames are expected to be unique on a test
+			if (-e $filename)
+			{
+				ok(0, "$filename must not already exists");
+				unlink $filename or die "cannot unlink $filename: $!";
+			}
 			append_to_file($filename, $$files{$fn});
 		}
 	}
@@ -210,14 +216,18 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 23 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
+	'--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=1.: int 1\d\b},
-	    qr{command=2.: int 1\d\d\b},
-	    qr{command=3.: int 1\d\d\d\b},
-	    qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		# After explicit seeding, the four * random checks (1-3,20) should be
+		# deterministic, but not necessarily portable.
+		qr{command=1.: int 1\d\b}, # uniform random: 12 on linux
+		qr{command=2.: int 1\d\d\b}, # exponential random: 106 on linux
+		qr{command=3.: int 1\d\d\d\b}, # gaussian random: 1462 on linux
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -230,7 +240,7 @@ pgbench(
 		qr{command=16.: double 16\b},
 		qr{command=17.: double 17\b},
 		qr{command=18.: int 9223372036854775807\b},
-		qr{command=20.: int [1-9]\b},
+		qr{command=20.: int \d\b}, # zipfian random: 1 on linux
 		qr{command=21.: double -27\b},
 		qr{command=22.: double 1024\b},
 		qr{command=23.: double 1\b},
@@ -259,6 +269,9 @@ pgbench(
 		qr{command=46.: int 46\b},
 		qr{command=47.: boolean true\b},
 		qr{command=48.: boolean true\b},
+		qr{command=53.: int 1\b},    # :scale
+		qr{command=54.: int 0\b},    # :client_id
+		qr{command=55.: int 5432\b}, # :random_seed
 	],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
@@ -332,6 +345,10 @@ pgbench(
 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy = 0 or (1 / :zy) < 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy > 0 and (1 / :zy) < 0 then (1 / :zy) else 1 end)
+-- check automatic variables
+\set sc debug(:scale)
+\set ci debug(:client_id)
+\set rs debug(:random_seed)
 -- substitute variables of all possible types
 \set v0 NULL
 \set v1 TRUE
@@ -340,6 +357,46 @@ pgbench(
 SELECT :v0, :v1, :v2, :v3;
 } });
 
+# random determinism when seeded
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE seeded_random(seed INT8 NOT NULL, rand TEXT NOT NULL, val INTEGER NOT NULL);');
+
+# same value to check for determinism
+my $seed = int(rand(1000000000));
+for my $i (1, 2)
+{
+    pgbench("--random-seed=$seed -t 1",
+	0,
+	[qr{processed: 1/1}],
+	[qr{setting random seed to $seed\b}],
+	"random seeded with $seed",
+	{ "001_pgbench_random_seed_$i" => q{-- test random functions
+\set ur random(1000, 1999)
+\set er random_exponential(2000, 2999, 2.0)
+\set gr random_gaussian(3000, 3999, 3.0)
+\set zr random_zipfian(4000, 4999, 2.5)
+INSERT INTO seeded_random(seed, rand, val) VALUES
+  (:random_seed, 'uniform', :ur),
+  (:random_seed, 'exponential', :er),
+  (:random_seed, 'gaussian', :gr),
+  (:random_seed, 'zipfian', :zr);
+} });
+}
+
+# check that all runs generated the same 4 values
+my ($ret, $out, $err) =
+  $node->psql('postgres',
+	'SELECT seed, rand, val, COUNT(*) FROM seeded_random GROUP BY seed, rand, val');
+
+ok($ret == 0, "psql seeded_random count ok");
+ok($err eq '', "psql seeded_random count stderr is empty");
+ok($out =~ /\b$seed\|uniform\|1\d\d\d\|2/, "psql seeded_random count uniform");
+ok($out =~ /\b$seed\|exponential\|2\d\d\d\|2/, "psql seeded_random count exponential");
+ok($out =~ /\b$seed\|gaussian\|3\d\d\d\|2/, "psql seeded_random count gaussian");
+ok($out =~ /\b$seed\|zipfian\|4\d\d\d\|2/, "psql seeded_random count zipfian");
+
+$node->safe_psql('postgres', 'DROP TABLE seeded_random;');
+
 # backslash commands
 pgbench(
 	'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..c015f36 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one' from --random-seed option, expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',

#28

Chapman Flack

chap@anastigmatix.net

almost 8 years ago

In reply to: Fabien COELHO (#27)

Re: Re: pgbench randomness initialization

I'm sorry, I must have missed your reply on the 5th somehow.

On 03/05/18 07:01, Fabien COELHO wrote:

I must admit that I'm not too happy with the result either, so I dropped
the octal/hexadecimal parsing.

That seems perfectly reasonable to me; perfectly adequate to accept only
one base.
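
For reference, the single-base parsing kept in v9 relies on a trailing "%c" conversion to detect garbage after the number. A small sketch of that idiom, wrapped in a hypothetical helper named parse_seed_decimal (the patch inlines the sscanf() call in set_random_seed instead):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical wrapper around the sscanf() idiom used in the patch.
 * Returns true when "text" holds a decimal unsigned integer and nothing
 * else after it.  On failure *result may already have been modified.
 */
static bool
parse_seed_decimal(const char *text, unsigned int *result)
{
	char		garbage;

	assert(text != NULL && result != NULL);

	/*
	 * Exactly one successful conversion means that a number was read and
	 * that no further character was available for "%c".
	 */
	return sscanf(text, "%u%c", result, &garbage) == 1;
}
```

Since "%u" is decimal only, "0x1538" is rejected (the 'x' is picked up by "%c"), while "012470" is accepted and read as decimal 12470 rather than octal.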

But now the documentation is back to its original state of silence on
what base or how many bases might be allowed. Could it just say
"or an unsigned decimal integer value"? Then no one will wonder.

The "idem" is about setting the variable but not overwriting it if it
already exists. The intention is that :random_seed is the random seed,
unless the user set it to something else in which case it is the user's
value. I've improved the variable description in the doc to point out that
the value may be overwritten with -D.

Ok.

-Chap

#29

Fabien COELHO

coelho@cri.ensmp.fr

almost 8 years ago

In reply to: Chapman Flack (#28)

1 attachment(s)

Re: Re: pgbench randomness initialization

But now the documentation is back to its original state of silence on
what base or how many bases might be allowed. Could it just say
"or an unsigned decimal integer value"? Then no one will wonder.

Done in the attached.

Thanks for the reviews.

--
Fabien.

Attachments:

pgbench-seed-10.patch (text/plain)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 5f28023..86a91ba 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,43 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or an unsigned decimal integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
@@ -874,14 +911,19 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
 
      <tbody>
       <row>
-       <entry> <literal>scale</literal> </entry>
-       <entry>current scale factor</entry>
-      </row>
-
-      <row>
        <entry> <literal>client_id</literal> </entry>
        <entry>unique number identifying the client session (starts from zero)</entry>
       </row>
+
+      <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random generator seed (unless overwritten with <option>-D</option>)</entry>
+      </row>
+
+      <row>
+       <entry> <literal>scale</literal> </entry>
+       <entry>current scale factor</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 29d69de..a4c6c7b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -146,6 +146,9 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  *********************************************************************/
@@ -561,6 +564,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4353,6 +4357,49 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+	/* srandom expects an unsigned int */
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse seed unsigned int value */
+		char garbage;
+		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s' from %s, expecting an unsigned integer, 'time' or 'rand'\n",
+					seed, origin);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+	random_seed = iseed;
+}
+
 
 int
 main(int argc, char **argv)
@@ -4395,6 +4442,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4463,6 +4511,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4735,6 +4786,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg, "--random-seed option");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5024,6 +5079,16 @@ main(int argc, char **argv)
 		}
 	}
 
+	/* idem for :random_seed */
+	if (lookupVariable(&state[0], "random_seed") == NULL)
+	{
+		for (i = 0; i < nclients; i++)
+		{
+			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+				exit(1);
+		}
+	}
+
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
@@ -5041,10 +5106,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 0c23d2f..2a69dfb 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -29,6 +29,12 @@ sub pgbench
 			$filename =~ s/\@\d+$//;
 
 			#push @filenames, $filename;
+			# filenames are expected to be unique on a test
+			if (-e $filename)
+			{
+				ok(0, "$filename must not already exist");
+				unlink $filename or die "cannot unlink $filename: $!";
+			}
 			append_to_file($filename, $$files{$fn});
 		}
 	}
@@ -210,14 +216,18 @@ COMMIT;
 } });
 
 # test expressions
+# commands 1..3 and 20 depend on the random seed, which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
+	'--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=1.: int 1\d\b},
-	    qr{command=2.: int 1\d\d\b},
-	    qr{command=3.: int 1\d\d\d\b},
-	    qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		# After explicit seeding, the four * random checks (1-3,20) should be
+		# deterministic, but not necessarily portable.
+		qr{command=1.: int 1\d\b}, # uniform random: 12 on linux
+		qr{command=2.: int 1\d\d\b}, # exponential random: 106 on linux
+		qr{command=3.: int 1\d\d\d\b}, # gaussian random: 1462 on linux
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -230,7 +240,7 @@ pgbench(
 		qr{command=16.: double 16\b},
 		qr{command=17.: double 17\b},
 		qr{command=18.: int 9223372036854775807\b},
-		qr{command=20.: int [1-9]\b},
+		qr{command=20.: int \d\b}, # zipfian random: 1 on linux
 		qr{command=21.: double -27\b},
 		qr{command=22.: double 1024\b},
 		qr{command=23.: double 1\b},
@@ -259,6 +269,9 @@ pgbench(
 		qr{command=46.: int 46\b},
 		qr{command=47.: boolean true\b},
 		qr{command=48.: boolean true\b},
+		qr{command=53.: int 1\b},    # :scale
+		qr{command=54.: int 0\b},    # :client_id
+		qr{command=55.: int 5432\b}, # :random_seed
 	],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
@@ -332,6 +345,10 @@ pgbench(
 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy = 0 or (1 / :zy) < 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy > 0 and (1 / :zy) < 0 then (1 / :zy) else 1 end)
+-- check automatic variables
+\set sc debug(:scale)
+\set ci debug(:client_id)
+\set rs debug(:random_seed)
 -- substitute variables of all possible types
 \set v0 NULL
 \set v1 TRUE
@@ -340,6 +357,46 @@ pgbench(
 SELECT :v0, :v1, :v2, :v3;
 } });
 
+# random determinism when seeded
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE seeded_random(seed INT8 NOT NULL, rand TEXT NOT NULL, val INTEGER NOT NULL);');
+
+# same value to check for determinism
+my $seed = int(rand(1000000000));
+for my $i (1, 2)
+{
+    pgbench("--random-seed=$seed -t 1",
+	0,
+	[qr{processed: 1/1}],
+	[qr{setting random seed to $seed\b}],
+	"random seeded with $seed",
+	{ "001_pgbench_random_seed_$i" => q{-- test random functions
+\set ur random(1000, 1999)
+\set er random_exponential(2000, 2999, 2.0)
+\set gr random_gaussian(3000, 3999, 3.0)
+\set zr random_zipfian(4000, 4999, 2.5)
+INSERT INTO seeded_random(seed, rand, val) VALUES
+  (:random_seed, 'uniform', :ur),
+  (:random_seed, 'exponential', :er),
+  (:random_seed, 'gaussian', :gr),
+  (:random_seed, 'zipfian', :zr);
+} });
+}
+
+# check that all runs generated the same 4 values
+my ($ret, $out, $err) =
+  $node->psql('postgres',
+	'SELECT seed, rand, val, COUNT(*) FROM seeded_random GROUP BY seed, rand, val');
+
+ok($ret == 0, "psql seeded_random count ok");
+ok($err eq '', "psql seeded_random count stderr is empty");
+ok($out =~ /\b$seed\|uniform\|1\d\d\d\|2/, "psql seeded_random count uniform");
+ok($out =~ /\b$seed\|exponential\|2\d\d\d\|2/, "psql seeded_random count exponential");
+ok($out =~ /\b$seed\|gaussian\|3\d\d\d\|2/, "psql seeded_random count gaussian");
+ok($out =~ /\b$seed\|zipfian\|4\d\d\d\|2/, "psql seeded_random count zipfian");
+
+$node->safe_psql('postgres', 'DROP TABLE seeded_random;');
+
 # backslash commands
 pgbench(
 	'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..c015f36 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one' from --random-seed option, expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',
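
The 'bad random seed' test above exercises the `sscanf(seed, "%u%c", ...)` check in set_random_seed(). Here is a minimal sketch of that idiom on its own, assuming a hypothetical helper named `parse_seed` that is not part of pgbench. The call returns 1 only when a number was read and no further character follows it, so both `one` and `12abc` are rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a pgbench function): parse an unsigned decimal
 * seed, rejecting empty input and trailing characters.
 */
static bool
parse_seed(const char *text, unsigned int *out)
{
	char		garbage;

	assert(text != NULL && out != NULL);

	/*
	 * sscanf returns the number of conversions performed:
	 *   1    -> a number and nothing after it (accepted)
	 *   2    -> a number followed by some other character (rejected)
	 *   0/-1 -> no number at all, or empty input (rejected)
	 */
	return sscanf(text, "%u%c", out, &garbage) == 1;
}
```

The extra `%c` conversion is what turns a prefix match into a whole-string match, which seems to be the reason for the `garbage` variable in the patch.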

#30

Chapman Flack

chap@anastigmatix.net

almost 8 years ago

In reply to: Fabien COELHO (#29)

Re: pgbench randomness initialization

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: not tested
Documentation: tested, failed

This is a simple patch, includes documentation, includes and passes tests, and, in my rookie opinion, is ready for committer.

The new status of this patch is: Ready for Committer

#31

Teodor Sigaev

teodor@sigaev.ru

almost 8 years ago

In reply to: Fabien COELHO (#29)

Re: pgbench randomness initialization

The patch does not apply cleanly anymore.

Fabien COELHO wrote:

But now the documentation is back to its original state of silence on
what base or how many bases might be allowed. Could it just say
"or an unsigned decimal integer value"? Then no one will wonder.

Done in the attached.

Thanks for the reviews.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#32

Fabien COELHO

coelho@cri.ensmp.fr

almost 8 years ago

In reply to: Teodor Sigaev (#31)

1 attachment(s)

Re: pgbench randomness initialization

The patch does not apply cleanly anymore.

Indeed. Here is a rebase. All pgbench patches conflict on the test cases.

--
Fabien.

Attachments:

pgbench-seed-11.patch (text/plain)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index f07ddf1..e4582bf 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,43 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or an unsigned decimal integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows reproducing a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
@@ -884,6 +921,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </row>
 
       <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random generator seed (unless overwritten with <option>-D</option>)</entry>
+      </row>
+
+      <row>
        <entry> <literal>scale</literal> </entry>
        <entry>current scale factor</entry>
       </row>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index a15aa06..8397d25 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -154,6 +154,9 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  *********************************************************************/
@@ -569,6 +572,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4433,6 +4437,49 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+	/* srandom expects an unsigned int */
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse seed unsigned int value */
+		char garbage;
+		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s' from %s, expecting an unsigned integer, 'time' or 'rand'\n",
+					seed, origin);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+	random_seed = iseed;
+}
+
 
 int
 main(int argc, char **argv)
@@ -4475,6 +4522,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4543,6 +4591,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4815,6 +4866,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg, "--random-seed option");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5043,10 +5098,6 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	if (internal_script_used)
 	{
 		/*
@@ -5102,10 +5153,8 @@ main(int argc, char **argv)
 	if (lookupVariable(&state[0], "client_id") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
-		{
 			if (!putVariableInt(&state[i], "startup", "client_id", i))
 				exit(1);
-		}
 	}
 
 	/* set default seed for hash functions */
@@ -5121,6 +5170,14 @@ main(int argc, char **argv)
 				exit(1);
 	}
 
+	/* set random seed unless overwritten */
+	if (lookupVariable(&state[0], "random_seed") == NULL)
+	{
+		for (i = 0; i < nclients; i++)
+			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+				exit(1);
+	}
+
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 50cbb23..6fbe39b 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -29,6 +29,12 @@ sub pgbench
 			$filename =~ s/\@\d+$//;
 
 			#push @filenames, $filename;
+			# filenames are expected to be unique on a test
+			if (-e $filename)
+			{
+				ok(0, "$filename must not already exist");
+				unlink $filename or die "cannot unlink $filename: $!";
+			}
 			append_to_file($filename, $$files{$fn});
 		}
 	}
@@ -210,14 +216,18 @@ COMMIT;
 } });
 
 # test expressions
+# commands 1..3 and 20 depend on the random seed, which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
+	'--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=1.: int 1\d\b},
-	    qr{command=2.: int 1\d\d\b},
-	    qr{command=3.: int 1\d\d\d\b},
-	    qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		# After explicit seeding, the four * random checks (1-3,20) should be
+		# deterministic, but not necessarily portable.
+		qr{command=1.: int 1\d\b}, # uniform random: 12 on linux
+		qr{command=2.: int 1\d\d\b}, # exponential random: 106 on linux
+		qr{command=3.: int 1\d\d\d\b}, # gaussian random: 1462 on linux
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -230,7 +240,7 @@ pgbench(
 		qr{command=16.: double 16\b},
 		qr{command=17.: double 17\b},
 		qr{command=18.: int 9223372036854775807\b},
-		qr{command=20.: int [1-9]\b},
+		qr{command=20.: int \d\b}, # zipfian random: 1 on linux
 		qr{command=21.: double -27\b},
 		qr{command=22.: double 1024\b},
 		qr{command=23.: double 1\b},
@@ -264,6 +274,9 @@ pgbench(
 		qr{command=51.: int -7793829335365542153\b},
 		qr{command=52.: int -?\d+\b},
 		qr{command=53.: boolean true\b},
+		qr{command=58.: int 1\b},    # :scale
+		qr{command=59.: int 0\b},    # :client_id
+		qr{command=60.: int 5432\b}, # :random_seed
 	],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
@@ -343,6 +356,10 @@ pgbench(
 \set yz debug(case when :zy = 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy = 0 or (1 / :zy) < 0 then -1 else (1 / :zy) end)
 \set yz debug(case when :zy > 0 and (1 / :zy) < 0 then (1 / :zy) else 1 end)
+-- check automatic variables
+\set sc debug(:scale)
+\set ci debug(:client_id)
+\set rs debug(:random_seed)
 -- substitute variables of all possible types
 \set v0 NULL
 \set v1 TRUE
@@ -351,6 +368,46 @@ pgbench(
 SELECT :v0, :v1, :v2, :v3;
 } });
 
+# random determinism when seeded
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE seeded_random(seed INT8 NOT NULL, rand TEXT NOT NULL, val INTEGER NOT NULL);');
+
+# same value to check for determinism
+my $seed = int(rand(1000000000));
+for my $i (1, 2)
+{
+    pgbench("--random-seed=$seed -t 1",
+	0,
+	[qr{processed: 1/1}],
+	[qr{setting random seed to $seed\b}],
+	"random seeded with $seed",
+	{ "001_pgbench_random_seed_$i" => q{-- test random functions
+\set ur random(1000, 1999)
+\set er random_exponential(2000, 2999, 2.0)
+\set gr random_gaussian(3000, 3999, 3.0)
+\set zr random_zipfian(4000, 4999, 2.5)
+INSERT INTO seeded_random(seed, rand, val) VALUES
+  (:random_seed, 'uniform', :ur),
+  (:random_seed, 'exponential', :er),
+  (:random_seed, 'gaussian', :gr),
+  (:random_seed, 'zipfian', :zr);
+} });
+}
+
+# check that all runs generated the same 4 values
+my ($ret, $out, $err) =
+  $node->psql('postgres',
+	'SELECT seed, rand, val, COUNT(*) FROM seeded_random GROUP BY seed, rand, val');
+
+ok($ret == 0, "psql seeded_random count ok");
+ok($err eq '', "psql seeded_random count stderr is empty");
+ok($out =~ /\b$seed\|uniform\|1\d\d\d\|2/, "psql seeded_random count uniform");
+ok($out =~ /\b$seed\|exponential\|2\d\d\d\|2/, "psql seeded_random count exponential");
+ok($out =~ /\b$seed\|gaussian\|3\d\d\d\|2/, "psql seeded_random count gaussian");
+ok($out =~ /\b$seed\|zipfian\|4\d\d\d\|2/, "psql seeded_random count zipfian");
+
+$node->safe_psql('postgres', 'DROP TABLE seeded_random;');
+
 # backslash commands
 pgbench(
 	'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..c015f36 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one' from --random-seed option, expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',
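
The documentation in this patch says that the seed feeds the system generator, which then produces one initial generator state per thread. The following is a rough sketch of why that makes the per-thread states repeatable. It assumes a small 64-bit LCG (`demo_next`) as a stand-in for random() so that the example is self-contained and platform independent; `demo_rng`, `demo_thread` and `init_thread_states` are hypothetical names, and the generator pgbench actually uses is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the global generator seeded by srandom() (assumption). */
typedef struct
{
	uint64_t	state;
} demo_rng;

/* Stand-in for the per-thread state later consumed by pg_erand48(). */
typedef struct
{
	unsigned short random_state[3];
} demo_thread;

/* One LCG step; returns 31 bits, like random() does. */
static uint32_t
demo_next(demo_rng *rng)
{
	rng->state = rng->state * 6364136223846793005ULL + 1442695040888963407ULL;
	return (uint32_t) (rng->state >> 33);
}

/* Fill every thread state from a single seeded generator, in thread order. */
static void
init_thread_states(unsigned int seed, demo_thread *threads, size_t nthreads)
{
	demo_rng	rng = {seed};

	assert(threads != NULL || nthreads == 0);

	for (size_t i = 0; i < nthreads; i++)
		for (int k = 0; k < 3; k++)
			threads[i].random_state[k] = (unsigned short) demo_next(&rng);
}
```

With the same seed and the same number of threads, every thread starts from the same state on each run. Changing the thread count only adds or removes states at the end of the sequence, since the states are drawn in thread order.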

#33

Fabien COELHO

coelho@cri.ensmp.fr

almost 8 years ago

In reply to: Fabien COELHO (#32)

1 attachment(s)

Re: pgbench randomness initialization

The patch does not apply cleanly anymore.

Indeed. Here is a rebase. All pgbench patches conflict on the test cases.

Patch v12, yet another rebase.

--
Fabien.

Attachments:

pgbench-seed-12.patch (text/plain; charset=us-ascii)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index d52d324..41d9030 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,43 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or an unsigned decimal integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitly set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows reproducing a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
@@ -884,6 +921,11 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
       </row>
 
       <row>
+       <entry> <literal>random_seed</literal> </entry>
+       <entry>random generator seed (unless overwritten with <option>-D</option>)</entry>
+      </row>
+
+      <row>
        <entry> <literal>scale</literal> </entry>
        <entry>current scale factor</entry>
       </row>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 894571e..48604a1 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -155,6 +155,9 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  *********************************************************************/
@@ -579,6 +582,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4664,6 +4668,49 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed, const char *origin)
+{
+	/* srandom expects an unsigned int */
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse seed unsigned int value */
+		char garbage;
+		if (sscanf(seed, "%u%c", &iseed, &garbage) != 1)
+		{
+			fprintf(stderr,
+					"error while scanning '%s' from %s, expecting an unsigned integer, 'time' or 'rand'\n",
+					seed, origin);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+	/* no precision loss: 32 bit unsigned int cast to 64 bit int */
+	random_seed = iseed;
+}
+
 
 int
 main(int argc, char **argv)
@@ -4706,6 +4753,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4774,6 +4822,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"), "PGBENCH_RANDOM_SEED environment variable");
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -5046,6 +5097,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg, "--random-seed option");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -5280,10 +5335,6 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	if (internal_script_used)
 	{
 		/*
@@ -5339,10 +5390,8 @@ main(int argc, char **argv)
 	if (lookupVariable(&state[0], "client_id") == NULL)
 	{
 		for (i = 0; i < nclients; i++)
-		{
 			if (!putVariableInt(&state[i], "startup", "client_id", i))
 				exit(1);
-		}
 	}
 
 	/* set default seed for hash functions */
@@ -5358,6 +5407,14 @@ main(int argc, char **argv)
 				exit(1);
 	}
 
+	/* set random seed unless overwritten */
+	if (lookupVariable(&state[0], "random_seed") == NULL)
+	{
+		for (i = 0; i < nclients; i++)
+			if (!putVariableInt(&state[i], "startup", "random_seed", random_seed))
+				exit(1);
+	}
+
 	if (!is_no_vacuum)
 	{
 		fprintf(stderr, "starting vacuum...");
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 7448a96..0929418 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -29,6 +29,12 @@ sub pgbench
 			$filename =~ s/\@\d+$//;
 
 			#push @filenames, $filename;
+			# filenames are expected to be unique on a test
+			if (-e $filename)
+			{
+				ok(0, "$filename must not already exist");
+				unlink $filename or die "cannot unlink $filename: $!";
+			}
 			append_to_file($filename, $$files{$fn});
 		}
 	}
@@ -210,14 +216,18 @@ COMMIT;
 } });
 
 # test expressions
+# commands 1..3 and 20 depend on the random seed, which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
+	'--random-seed=5432 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808 -Dn=null -Dt=t -Df=of -Dd=1.0',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=1.: int 1\d\b},
-	    qr{command=2.: int 1\d\d\b},
-	    qr{command=3.: int 1\d\d\d\b},
-	    qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		# After explicit seeding, the four * random checks (1-3,20) should be
+		# deterministic, but not necessarily portable.
+		qr{command=1.: int 1\d\b}, # uniform random: 12 on linux
+		qr{command=2.: int 1\d\d\b}, # exponential random: 106 on linux
+		qr{command=3.: int 1\d\d\d\b}, # gaussian random: 1462 on linux
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -230,7 +240,7 @@ pgbench(
 		qr{command=16.: double 16\b},
 		qr{command=17.: double 17\b},
 		qr{command=18.: int 9223372036854775807\b},
-		qr{command=20.: int [1-9]\b},
+		qr{command=20.: int \d\b}, # zipfian random: 1 on linux
 		qr{command=21.: double -27\b},
 		qr{command=22.: double 1024\b},
 		qr{command=23.: double 1\b},
@@ -270,6 +280,9 @@ pgbench(
 		qr{command=86.: int 86\b},
 		qr{command=93.: int 93\b},
 		qr{command=95.: int 0\b},
+		qr{command=96.: int 1\b},    # :scale
+		qr{command=97.: int 0\b},    # :client_id
+		qr{command=98.: int 5432\b}, # :random_seed
 	],
 	'pgbench expressions',
 	{   '001_pgbench_expressions' => q{-- integer functions
@@ -390,8 +403,52 @@ SELECT :v0, :v1, :v2, :v3;
 \endif
 -- must be zero if false branches where skipped
 \set nope debug(:nope)
+-- check automatic variables
+\set sc debug(:scale)
+\set ci debug(:client_id)
+\set rs debug(:random_seed)
 } });
 
+# random determinism when seeded
+$node->safe_psql('postgres',
+	'CREATE UNLOGGED TABLE seeded_random(seed INT8 NOT NULL, rand TEXT NOT NULL, val INTEGER NOT NULL);');
+
+# same value to check for determinism
+my $seed = int(rand(1000000000));
+for my $i (1, 2)
+{
+    pgbench("--random-seed=$seed -t 1",
+	0,
+	[qr{processed: 1/1}],
+	[qr{setting random seed to $seed\b}],
+	"random seeded with $seed",
+	{ "001_pgbench_random_seed_$i" => q{-- test random functions
+\set ur random(1000, 1999)
+\set er random_exponential(2000, 2999, 2.0)
+\set gr random_gaussian(3000, 3999, 3.0)
+\set zr random_zipfian(4000, 4999, 2.5)
+INSERT INTO seeded_random(seed, rand, val) VALUES
+  (:random_seed, 'uniform', :ur),
+  (:random_seed, 'exponential', :er),
+  (:random_seed, 'gaussian', :gr),
+  (:random_seed, 'zipfian', :zr);
+} });
+}
+
+# check that all runs generated the same 4 values
+my ($ret, $out, $err) =
+  $node->psql('postgres',
+	'SELECT seed, rand, val, COUNT(*) FROM seeded_random GROUP BY seed, rand, val');
+
+ok($ret == 0, "psql seeded_random count ok");
+ok($err eq '', "psql seeded_random count stderr is empty");
+ok($out =~ /\b$seed\|uniform\|1\d\d\d\|2/, "psql seeded_random count uniform");
+ok($out =~ /\b$seed\|exponential\|2\d\d\d\|2/, "psql seeded_random count exponential");
+ok($out =~ /\b$seed\|gaussian\|3\d\d\d\|2/, "psql seeded_random count gaussian");
+ok($out =~ /\b$seed\|zipfian\|4\d\d\d\|2/, "psql seeded_random count zipfian");
+
+$node->safe_psql('postgres', 'DROP TABLE seeded_random;');
+
 # backslash commands
 pgbench(
 	'-t 1', 0,
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 80c5aed..682bc22 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -110,6 +110,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one' from --random-seed option, expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',
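
For readers skimming the patch, the choice of seed source made in set_random_seed() can be summarized roughly as below. This is only a sketch under stated assumptions: `seed_kind`, `classify_seed` and `effective_seed` are hypothetical names that do not appear in pgbench. In particular, `effective_seed` models the outcome of the patch calling set_random_seed() first for PGBENCH_RANDOM_SEED and then again for --random-seed, so that the option ends up overriding the environment variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the accepted SEED values. */
typedef enum
{
	SEED_TIME,					/* NULL or "time": current time, the default */
	SEED_STRONG_RANDOM,			/* "rand": strong random source */
	SEED_INTEGER				/* anything else: must parse as an unsigned int */
} seed_kind;

static seed_kind
classify_seed(const char *seed)
{
	if (seed == NULL || strcmp(seed, "time") == 0)
		return SEED_TIME;
	if (strcmp(seed, "rand") == 0)
		return SEED_STRONG_RANDOM;
	return SEED_INTEGER;
}

/* The option, when given, is applied last and therefore wins. */
static const char *
effective_seed(const char *env_value, const char *option_value)
{
	return option_value != NULL ? option_value : env_value;
}
```

Note that in the patch an invalid value in the environment variable is still reported as an error even when the option is also given, because the environment is parsed before the command line; the sketch leaves that detail out.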

#34

Teodor Sigaev

teodor@sigaev.ru

almost 8 years ago

In reply to: Fabien COELHO (#33)

Re: pgbench randomness initialization

Thank you, pushed

Fabien COELHO wrote:

The patch does not apply cleanly anymore.

Indeed. Here is a rebase. All pgbench patches conflict on the test cases.

Patch v12, yet another rebase.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/