too much pgbench init output

Started by Peter Eisentrautover 13 years ago31 messages

peter_e@gmx.net

over 13 years ago

1 attachment(s)

When initializing a large database, pgbench writes tons of "%d tuples
done" lines. I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

Attachments:

pg-pgbench-report.patchtext/x-patch; charset=UTF-8; name=pg-pgbench-report.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 00cab73..b5f3054 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -1420,8 +1420,9 @@
 		}
 
 		if (j % 100000 == 0)
-			fprintf(stderr, "%d tuples done.\n", j);
+			fprintf(stderr, "%d of %d tuples done.\r", j, naccounts * scale);
 	}
+	fprintf(stderr, "\n");
 	if (PQputline(con, "\\.\n"))
 	{
 		fprintf(stderr, "very last PQputline failed\n");

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Peter Eisentraut (#1)

Re: too much pgbench init output

On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

When initializing a large database, pgbench writes tons of "%d tuples
done" lines. I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

I'm not sure I like this - what if the output is being saved off to a file?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Tomas Vondra

tv@fuzzy.cz

over 13 years ago

In reply to: Robert Haas (#2)

Re: too much pgbench init output

On 1 Září 2012, 12:30, Robert Haas wrote:

On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

When initializing a large database, pgbench writes tons of "%d tuples
done" lines. I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

I'm not sure I like this - what if the output is being saved off to a
file?

What about using istty(stdout) to handle this situation? Although I find
it usually confusing, because it prints one thing when executed directly
and something else when the output is redirected to a file.

I see two other options:

(1) removing this output altogether (I can't imagine a situation when this
really matters) and replace it with a simple "inserted 23% of rows,
estimated remaining time 14:23 (863 sec)", updated each 1%

(2) adding a switch (--verbose) that enables these lines, don't print them
by default

Another option might be updating the process title so that the "top" shows
current progress more precisely than (1).

Tomas

Peter Eisentraut

peter_e@gmx.net

over 13 years ago

In reply to: Robert Haas (#2)

Re: too much pgbench init output

On 9/1/12 6:30 AM, Robert Haas wrote:

On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

When initializing a large database, pgbench writes tons of "%d tuples
done" lines. I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

I'm not sure I like this - what if the output is being saved off to a file?

I suppose we could print \n instead of \r then.

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Peter Eisentraut (#4)

Re: too much pgbench init output

Peter Eisentraut <peter_e@gmx.net> writes:

On 9/1/12 6:30 AM, Robert Haas wrote:

On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

When initializing a large database, pgbench writes tons of "%d tuples
done" lines. I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

I'm not sure I like this - what if the output is being saved off to a file?

I suppose we could print \n instead of \r then.

Possibly off-the-wall idea: we could fix the "too much output" problem
once and for all by going to a log scale.

10 tuples done
100 tuples done
1000 tuples done
10000 tuples done
100000 tuples done
...

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Tom Lane (#5)

Re: too much pgbench init output

On Tue, Sep 4, 2012 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

On 9/1/12 6:30 AM, Robert Haas wrote:

On Sat, Sep 1, 2012 at 12:00 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

When initializing a large database, pgbench writes tons of "%d tuples
done" lines. I propose to change this to a sort of progress counter
that stays on the same line, as in the attached patch.

I'm not sure I like this - what if the output is being saved off to a file?

I suppose we could print \n instead of \r then.

Possibly off-the-wall idea: we could fix the "too much output" problem
once and for all by going to a log scale.

10 tuples done
100 tuples done
1000 tuples done
10000 tuples done
100000 tuples done
...

I don't like that, because one of the things you can see by following
the current output is where the checkpoint stalls are happening during
the load. You'd lose the ability to notice any kind of slowdown after
the first few tuples with this kind of format.

Actually, this whole things seems like a solution in search of a
problem to me. We just reduced the verbosity of pgbench -i tenfold in
the very recent past - I would have thought that enough to address
this problem. But maybe not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Peter Eisentraut

peter_e@gmx.net

over 13 years ago

In reply to: Robert Haas (#6)

Re: too much pgbench init output

On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:

Actually, this whole things seems like a solution in search of a
problem to me. We just reduced the verbosity of pgbench -i tenfold in
the very recent past - I would have thought that enough to address
this problem. But maybe not.

The problem is that

a) It blasts out too much output and everything scrolls off the screen,
and

b) There is no indication of where the end is.

These are independent problems, and I'd be happy to address them
separately if there are such specific concerns attached to this.

Speaking of tenfold, we could reduce the output frequency tenfold to
once every 1000000, which would alleviate this problem for a while
longer.

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Peter Eisentraut (#7)

Re: too much pgbench init output

Peter Eisentraut <peter_e@gmx.net> writes:

On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:

Actually, this whole things seems like a solution in search of a
problem to me. We just reduced the verbosity of pgbench -i tenfold in
the very recent past - I would have thought that enough to address
this problem. But maybe not.

The problem is that

a) It blasts out too much output and everything scrolls off the screen,
and

Robert evidently thinks that the verbosity of the output is a feature
not a bug. I'm not convinced that eyeballing pgbench output is a
particularly useful way to measure checkpoint stalls, but ...

b) There is no indication of where the end is.

Well, surely *that* can be fixed in a noncontroversial way: just
print "M/N tuples done", where N is the target.

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Peter Eisentraut (#7)

Re: too much pgbench init output

On Tue, Sep 4, 2012 at 11:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:

Actually, this whole things seems like a solution in search of a
problem to me. We just reduced the verbosity of pgbench -i tenfold in
the very recent past - I would have thought that enough to address
this problem. But maybe not.

The problem is that

a) It blasts out too much output and everything scrolls off the screen,
and

b) There is no indication of where the end is.

These are independent problems, and I'd be happy to address them
separately if there are such specific concerns attached to this.

Speaking of tenfold, we could reduce the output frequency tenfold to
once every 1000000, which would alleviate this problem for a while
longer.

Well, I wouldn't object to displaying a percentage on each output
line. But I don't really like the idea of having them less frequent
than they already are, because if you run into a situation that makes
pgbench -i run slowly, as I occasionally do, it's marginal to tell the
difference between "slow" and "completely hung" even with the current
level of verbosity.

However, we could add a -q flag to run more quietly, or something like
that. Actually, I'd even be fine with making the default quieter,
though we can't use -v for verbose since that's already taken. But
I'd like to preserve the option of getting the current amount of
output because sometimes I need that to troubleshoot problems.
Actually it'd be nice to even get a bit more output: say, a timestamp
on each line, and a completion percentage... but now I'm getting
greedy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10

Pavan Deolasee

pavan.deolasee@gmail.com

over 13 years ago

In reply to: Robert Haas (#9)

Re: too much pgbench init output

On Wed, Sep 5, 2012 at 9:47 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Actually it'd be nice to even get a bit more output: say, a timestamp
on each line, and a completion percentage... but now I'm getting
greedy.

May be we need a verbosity level and print a lot less or a lot more
information than what we do today. That will satisfy everyone. Hopefully.

Thanks,
Pavan

#11

Peter Eisentraut

peter_e@gmx.net

over 13 years ago

In reply to: Tom Lane (#8)

Re: too much pgbench init output

On Tue, 2012-09-04 at 23:44 -0400, Tom Lane wrote:

b) There is no indication of where the end is.

Well, surely *that* can be fixed in a noncontroversial way: just
print "M/N tuples done", where N is the target.

I have made this change. I won't pursue using \r if others find it
useful as is.

#12

Tomas Vondra

tv@fuzzy.cz

over 13 years ago

In reply to: Robert Haas (#9)

1 attachment(s)

Re: too much pgbench init output

On 5.9.2012 06:17, Robert Haas wrote:

On Tue, Sep 4, 2012 at 11:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

On Tue, 2012-09-04 at 23:14 -0400, Robert Haas wrote:

Actually, this whole things seems like a solution in search of a
problem to me. We just reduced the verbosity of pgbench -i tenfold in
the very recent past - I would have thought that enough to address
this problem. But maybe not.

The problem is that

a) It blasts out too much output and everything scrolls off the screen,
and

b) There is no indication of where the end is.

These are independent problems, and I'd be happy to address them
separately if there are such specific concerns attached to this.

Speaking of tenfold, we could reduce the output frequency tenfold to
once every 1000000, which would alleviate this problem for a while
longer.

Well, I wouldn't object to displaying a percentage on each output
line. But I don't really like the idea of having them less frequent
than they already are, because if you run into a situation that makes
pgbench -i run slowly, as I occasionally do, it's marginal to tell the
difference between "slow" and "completely hung" even with the current
level of verbosity.

However, we could add a -q flag to run more quietly, or something like
that. Actually, I'd even be fine with making the default quieter,
though we can't use -v for verbose since that's already taken. But
I'd like to preserve the option of getting the current amount of
output because sometimes I need that to troubleshoot problems.
Actually it'd be nice to even get a bit more output: say, a timestamp
on each line, and a completion percentage... but now I'm getting
greedy.

Hi,

I've been thinking about this a bit more, and do propose to use an
option that determines "logging step" i.e. number of items (either
directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either integers or percents. For example if you want to print a line
each 1000 lines, you can to this

$ pgbench -i -s 1000 --logging-step 1000 testdb

and if you want to print a line each 5%, you can do this

$ pgbench -i -s 1000 --logging-step 5% testdb

and that's it.

Moreover the patch adds a record of elapsed an estimate of remaining
time. So for example with 21% you may get this:

creating tables...
21000 of 100000 tuples (21%) done (elapsed 1.56 s, remaining 5.85 s).
42000 of 100000 tuples (42%) done (elapsed 3.15 s, remaining 4.35 s).
63000 of 100000 tuples (63%) done (elapsed 4.73 s, remaining 2.78 s).
84000 of 100000 tuples (84%) done (elapsed 6.30 s, remaining 1.20 s).
100000 of 100000 tuples (100%) done (elapsed 8.17 s, remaining 0.00 s).
vacuum...
set primary keys...

Now, I've had a hard time with the patch - no matter what I do, I do get
"invalid option" error whenever I try to run that from command line for
some reason. But when I run it from gdb, it works just fine.

kind regards
Tomas

Attachments:

pgbench-logging-v1.difftext/plain; charset=UTF-8; name=pgbench-logging-v1.diffDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index f5ac3b1..ce7e240 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -130,6 +130,11 @@ int			foreign_keys = 0;
 int			unlogged_tables = 0;
 
 /*
+ * logging step (inserts)
+ */
+int			log_step = 100000;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -356,6 +361,8 @@ usage(void)
 		   "               create tables in the specified tablespace\n"
 		   "  --unlogged-tables\n"
 		   "               create tables as unlogged tables\n"
+		   "  --logging-step NUM\n"
+		   "               how often to print info about init progress\n"
 		   "\nBenchmarking options:\n"
 		"  -c NUM       number of concurrent database clients (default: 1)\n"
 		   "  -C           establish new connection for each transaction\n"
@@ -1340,6 +1347,10 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double elapsed_sec, remaining_sec;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1408,6 +1419,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1419,10 +1432,18 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
-			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
+		if (j % log_step == 0 || j == scale * naccounts)
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+			
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+			
+			fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
 					j, naccounts * scale,
-					j * 100 / (naccounts * scale));
+					j * 100 / (naccounts * scale), elapsed_sec, remaining_sec);
+		}
 	}
 	if (PQputline(con, "\\.\n"))
 	{
@@ -1901,6 +1922,7 @@ main(int argc, char **argv)
 	int			do_vacuum_accounts = 0; /* do vacuum accounts before testing? */
 	int			ttype = 0;		/* transaction type. 0: TPC-B, 1: SELECT only,
 								 * 2: skip update of branches and tellers */
+	float		log_step_pct = 0;	/* logging step in percent */
 	int			optindex;
 	char	   *filename = NULL;
 	bool		scale_given = false;
@@ -1920,6 +1942,7 @@ main(int argc, char **argv)
 		{"index-tablespace", required_argument, NULL, 3},
 		{"tablespace", required_argument, NULL, 2},
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
+		{"logging-step", required_argument, NULL, 6},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -2125,6 +2148,14 @@ main(int argc, char **argv)
 			case 3:				/* index-tablespace */
 				index_tablespace = optarg;
 				break;
+			case 6:
+				if (optarg[strlen(optarg)-1] == '%') {
+					optarg[strlen(optarg)-1] = '\0';
+					log_step_pct = atof(optarg);
+				} else {
+					log_step = atol(optarg);
+				}
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -2144,6 +2175,11 @@ main(int argc, char **argv)
 			dbName = "";
 	}
 
+	/* compute the log_step from total number of accounts and log_step_pct */
+	if (log_step_pct != 0) {
+		log_step = log_step_pct * naccounts * scale / 100;
+	}
+
 	if (is_init_mode)
 	{
 		init(is_no_vacuum);

#13

Alvaro Herrera

alvherre@2ndquadrant.com

about 13 years ago

In reply to: Tomas Vondra (#12)

Re: too much pgbench init output

Tomas Vondra wrote:

I've been thinking about this a bit more, and do propose to use an
option that determines "logging step" i.e. number of items (either
directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either integers or percents. For example if you want to print a line
each 1000 lines, you can to this

$ pgbench -i -s 1000 --logging-step 1000 testdb

I find it hard to get excited about having to specify a command line
argument to tweak this. Would it work to have it emit messages
depending on elapsed time and log scale of tuples emitted? So for
example emit the first message after 5 seconds or 100k tuples, then back
off until (say) 15 seconds have lapsed and 1M tuples, etc? The idea is
to make it verbose enough to keep a human satisfied with what he sees,
but not flood the terminal with pointless updates. (I think printing
the ETA might be nice as well, not sure).

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14

Robert Haas

robertmhaas@gmail.com

about 13 years ago

In reply to: Alvaro Herrera (#13)

Re: too much pgbench init output

On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Tomas Vondra wrote:

I've been thinking about this a bit more, and do propose to use an
option that determines "logging step" i.e. number of items (either
directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either integers or percents. For example if you want to print a line
each 1000 lines, you can to this

$ pgbench -i -s 1000 --logging-step 1000 testdb

I find it hard to get excited about having to specify a command line
argument to tweak this. Would it work to have it emit messages
depending on elapsed time and log scale of tuples emitted? So for
example emit the first message after 5 seconds or 100k tuples, then back
off until (say) 15 seconds have lapsed and 1M tuples, etc? The idea is
to make it verbose enough to keep a human satisfied with what he sees,
but not flood the terminal with pointless updates. (I think printing
the ETA might be nice as well, not sure).

I like this idea. One of the times when the more verbose output is
really useful is when you expect it to run fast but then it turns out
that for some reason it runs really slow. If you make the output too
terse, then you end up not really knowing what's going on. Having it
give an update at least every 5 seconds would be a nice way to give
the user a heads-up if things aren't going as planned, without
cluttering the normal case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Robert Haas (#14)

1 attachment(s)

Re: too much pgbench init output

On 23.10.2012 18:21, Robert Haas wrote:

On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Tomas Vondra wrote:

I've been thinking about this a bit more, and do propose to use an
option that determines "logging step" i.e. number of items (either
directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either integers or percents. For example if you want to print a line
each 1000 lines, you can to this

$ pgbench -i -s 1000 --logging-step 1000 testdb

I find it hard to get excited about having to specify a command line
argument to tweak this. Would it work to have it emit messages
depending on elapsed time and log scale of tuples emitted? So for
example emit the first message after 5 seconds or 100k tuples, then back
off until (say) 15 seconds have lapsed and 1M tuples, etc? The idea is
to make it verbose enough to keep a human satisfied with what he sees,
but not flood the terminal with pointless updates. (I think printing
the ETA might be nice as well, not sure).

I like this idea. One of the times when the more verbose output is
really useful is when you expect it to run fast but then it turns out
that for some reason it runs really slow. If you make the output too
terse, then you end up not really knowing what's going on. Having it
give an update at least every 5 seconds would be a nice way to give
the user a heads-up if things aren't going as planned, without
cluttering the normal case.

I've prepared a patch along these lines. The attached version used only
elapsed time to print the log messages each 5 seconds, so now it prints
a meessage each 5 seconds no matter what, along with an estimate of
remaining time.

I've removed the config option, although it might be useful to specify
the interval?

I'm not entirely sure how the 'log scale of tuples' should work - for
example when the time 15 seconds limit is reached, should it be reset
back to the previous step (5 seconds) to give a more detailed info, or
should it be kept at 15 seconds?

Tomas

Attachments:

pgbench-logging-v2.patchtext/plain; charset=UTF-8; name=pgbench-logging-v2.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 090c210..3cbb564 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -135,6 +135,11 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * logging steps (seconds between log messages)
+ */
+int			log_step_seconds = 5;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -1362,6 +1367,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double elapsed_sec, remaining_sec;
+	int log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1430,6 +1440,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1453,28 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
-			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+		/* let's not call the timing for each row, but only each 100 rows */
+		if (j % 100 == 0 || j == scale * naccounts)
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+			
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+			
+			/* have ve reached the next interval? */
+			if (elapsed_sec >= log_interval * log_step_seconds) {
+			
+				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+						j, naccounts * scale,
+						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
+				
+				/* skip to the next interval */
+				log_interval = (int)ceil(elapsed_sec/log_step_seconds);
+				
+			}
+		}
+
 	}
 	if (PQputline(con, "\\.\n"))
 	{

#16

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 13 years ago

In reply to: Tomas Vondra (#15)

Re: too much pgbench init output

Hi,

I gone through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It is also
suggested to use an options to provide the interval but few of us are not
much agree on it. So final discussion ended at keeping 5 sec interval
between each log line.

However, I see, there are two types of users here:
1. Who likes these log lines, so that they can troubleshoot some slowness
and all
2. Who do not like these log lines.

So keeping these in mind, I rather go for an option which will control
this. People falling in category one can set this option to very low where
as users falling under second category can keep it high.

However, assuming we settled on 5 sec delay, here are few comments on that
patch attached:

Comments:
=========

Patch gets applied cleanly with some whitespace errors. make and make
install too went smooth.
make check was smooth. Rather it should be smooth since there are NO
changes in other part of the code rather than just pgbench.c and we do not
have any test-case as well.

However, here are few comments on changes in pgbench.c

1.
Since the final discussion ended at keeping a 5 seconds interval will be
good enough, Author used a global int variable for that.
Given that it's just a constant, #define would be a better choice.

2.
+ /* let's not call the timing for each row, but only each 100 rows
*/
Why only 100 rows ? Have you done any testing to come up with number 100 ?
To me it seems very low. It will be good to test with 1K or even 10K.
On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in VM
with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
checking every 100 rows looks overkill.

3.
Please indent following block as per the indentation just above that

/* used to track elapsed time and estimate of the remaining time */
instr_time start, diff;
double elapsed_sec, remaining_sec;
int log_interval = 1;

4.
+ /* have ve reached the next interval? */
Do you mean "have WE reached..."

5.
While applying a patch, I got few white-space errors. But I think every
patch goes through pgindent which might take care of this.

Thanks

On Sun, Nov 11, 2012 at 11:02 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 23.10.2012 18:21, Robert Haas wrote:

On Tue, Oct 23, 2012 at 12:02 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Tomas Vondra wrote:

I've been thinking about this a bit more, and do propose to use an
option that determines "logging step" i.e. number of items (either
directly or as a percentage) between log lines.

The attached patch defines a new option "--logging-step" that accepts
either integers or percents. For example if you want to print a line
each 1000 lines, you can to this

$ pgbench -i -s 1000 --logging-step 1000 testdb

I find it hard to get excited about having to specify a command line
argument to tweak this. Would it work to have it emit messages
depending on elapsed time and log scale of tuples emitted? So for
example emit the first message after 5 seconds or 100k tuples, then back
off until (say) 15 seconds have lapsed and 1M tuples, etc? The idea is
to make it verbose enough to keep a human satisfied with what he sees,
but not flood the terminal with pointless updates. (I think printing
the ETA might be nice as well, not sure).

I like this idea. One of the times when the more verbose output is
really useful is when you expect it to run fast but then it turns out
that for some reason it runs really slow. If you make the output too
terse, then you end up not really knowing what's going on. Having it
give an update at least every 5 seconds would be a nice way to give
the user a heads-up if things aren't going as planned, without
cluttering the normal case.

I've prepared a patch along these lines. The attached version used only
elapsed time to print the log messages each 5 seconds, so now it prints
a meessage each 5 seconds no matter what, along with an estimate of
remaining time.

I've removed the config option, although it might be useful to specify
the interval?

I'm not entirely sure how the 'log scale of tuples' should work - for
example when the time 15 seconds limit is reached, should it be reset
back to the previous step (5 seconds) to give a more detailed info, or
should it be kept at 15 seconds?

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

#17

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Jeevan Chalke (#16)

Re: too much pgbench init output

On 19.11.2012 11:59, Jeevan Chalke wrote:

Hi,

I gone through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It is
also suggested to use an options to provide the interval but few of us
are not much agree on it. So final discussion ended at keeping 5 sec
interval between each log line.

However, I see, there are two types of users here:
1. Who likes these log lines, so that they can troubleshoot some
slowness and all
2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

So keeping these in mind, I rather go for an option which will control
this. People falling in category one can set this option to very low
where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against "automagic" heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some "clever" logic that causes long delays between log messages.

That's why I choose a very simple approach with constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

However, assuming we settled on 5 sec delay, here are few comments on
that patch attached:

Comments:
=========

Patch gets applied cleanly with some whitespace errors. make and make
install too went smooth.
make check was smooth. Rather it should be smooth since there are NO
changes in other part of the code rather than just pgbench.c and we do
not have any test-case as well.

However, here are few comments on changes in pgbench.c

1.
Since the final discussion ended at keeping a 5 seconds interval will be
good enough, Author used a global int variable for that.
Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a
variable is a better match.

2.
+ /* let's not call the timing for each row, but only each 100
rows */
Why only 100 rows ? Have you done any testing to come up with number 100
? To me it seems very low. It will be good to test with 1K or even 10K.
On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in
VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
checking every 100 rows looks overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a
value large enough to amortize the getlocaltime() cost, but small enough
to print info even in cases when the performance sucks for some reason.

I've seen issues where the speed suddenly dropped to a fraction of the
expected value, e.g. 100 rows/second, and in those cases you'd have to
wait for a very long time to actually get the next log message (with the
suggested 10k step).

So 100 seems like a good compromise to me ...

3.
Please indent following block as per the indentation just above that

/* used to track elapsed time and estimate of the remaining time */
instr_time start, diff;
double elapsed_sec, remaining_sec;
int log_interval = 1;

4.
+ /* have ve reached the next interval? */
Do you mean "have WE reached..."

5.
While applying a patch, I got few white-space errors. But I think every
patch goes through pgindent which might take care of this.

Thanks

Thanks for the review. I'll wait for a bit more discussion about the
choices before submitting another version of the patch.

Tomas

#18

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 13 years ago

In reply to: Tomas Vondra (#17)

Re: too much pgbench init output

Hi,

On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 19.11.2012 11:59, Jeevan Chalke wrote:

Hi,

I gone through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It is
also suggested to use an options to provide the interval but few of us
are not much agree on it. So final discussion ended at keeping 5 sec
interval between each log line.

However, I see, there are two types of users here:
1. Who likes these log lines, so that they can troubleshoot some
slowness and all
2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall in second category.

But from the discussion, I believe some people may need detailed (or lot
more) output.

So keeping these in mind, I rather go for an option which will control
this. People falling in category one can set this option to very low
where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

Interval length.

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against "automagic" heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some "clever" logic that causes long delays between log
messages.

That's why I choose a very simple approach with constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

I am preferring an option for choosing an interval, say from 1 second to 10
seconds.

BTW, what if, we put one log message every 10% (or 5%) with time taken
(time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing time taken for each block, if any slowness
happens, one can easily find a block by looking at the timings and
troubleshoot it.
Though 10% or 5% is again a debatable number, but keeping it constant will
eliminate the requirement of an option.

However, assuming we settled on 5 sec delay, here are few comments on
that patch attached:

Comments:
=========

Patch gets applied cleanly with some whitespace errors. make and make
install too went smooth.
make check was smooth. Rather it should be smooth since there are NO
changes in other part of the code rather than just pgbench.c and we do
not have any test-case as well.

However, here are few comments on changes in pgbench.c

1.
Since the final discussion ended at keeping a 5 seconds interval will be
good enough, Author used a global int variable for that.
Given that it's just a constant, #define would be a better choice.

Good point. Although if we add an option to supply different values, a
variable is a better match.

Exactly, if we ended up with an option then it looks good. But in your
current patch it was constant, so #define should be preferred.

2.
+ /* let's not call the timing for each row, but only each 100
rows */
Why only 100 rows ? Have you done any testing to come up with number 100
? To me it seems very low. It will be good to test with 1K or even 10K.
On my machine (2.4 GHz Intel core 2 duo Macbook PRO, running Ubuntu in
VM with 4GB RAM, 1067 DDR3), in 5 Sec, approx 1M rows were inserted. So
checking every 100 rows looks overkill.

Well, the 100 is clearly a magical constant. The goal was to choose a
value large enough to amortize the getlocaltime() cost, but small enough
to print info even in cases when the performance sucks for some reason.

I've seen issues where the speed suddenly dropped to a fraction of the
expected value, e.g. 100 rows/second, and in those cases you'd have to
wait for a very long time to actually get the next log message (with the
suggested 10k step).

So 100 seems like a good compromise to me ...

Hmm... inserting just 100 rows / seconds is really slow.
Since you already seen such behaviour, I have no objection keeping it 100.

3.
Please indent following block as per the indentation just above that

/* used to track elapsed time and estimate of the remaining time */
instr_time start, diff;
double elapsed_sec, remaining_sec;
int log_interval = 1;

OK

4.
+ /* have ve reached the next interval? */
Do you mean "have WE reached..."

OK

5.
While applying a patch, I got few white-space errors. But I think every
patch goes through pgindent which might take care of this.

OK

Thanks

Thanks for the review. I'll wait for a bit more discussion about the
choices before submitting another version of the patch.

Sure

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thanks
--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

#19

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Jeevan Chalke (#18)

1 attachment(s)

Re: too much pgbench init output

On 20.11.2012 08:22, Jeevan Chalke wrote:

Hi,

On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz
<mailto:tv@fuzzy.cz>> wrote:

On 19.11.2012 11:59, Jeevan Chalke wrote:

Hi,

I gone through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines. It is
also suggested to use an options to provide the interval but few of us
are not much agree on it. So final discussion ended at keeping 5 sec
interval between each log line.

However, I see, there are two types of users here:
1. Who likes these log lines, so that they can troubleshoot some
slowness and all
2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall in second category.

But from the discussion, I believe some people may need detailed (or lot
more) output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

(1) it's rather pointless to print a message every 100k rows, as it
usually fills the console with garbabe

(2) it's handy to have regular updates of the progress

I don't think there're people (in the thread) that require to keep the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

So keeping these in mind, I rather go for an option which will control
this. People falling in category one can set this option to very low
where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

Interval length.

Well, I can surely imagine something like "--interval N".

A switch that'd choose between the old and new behavior might be a good
idea, but I'd strongly vote against "automagic" heuristics. It makes the
behavior very difficult to predict and I really don't want to force the
users to wonder whether the long delay is due to general slowness of the
machine or some "clever" logic that causes long delays between log
messages.

That's why I choose a very simple approach with constant time interval.
It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

I am preferring an option for choosing an interval, say from 1 second to
10 seconds.

Ummmm, why not to allow arbitrary integer? Why saying 1 to 10 seconds?

BTW, what if, we put one log message every 10% (or 5%) with time taken
(time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing time taken for each block, if any slowness
happens, one can easily find a block by looking at the timings and
troubleshoot it.
Though 10% or 5% is again a debatable number, but keeping it constant
will eliminate the requirement of an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not relly excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

Tomas

Attachments:

pgbench-logging-v3.patchtext/plain; charset=UTF-8; name=pgbench-logging-v3.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..334ce4c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -135,6 +135,11 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * logging steps (seconds between log messages)
+ */
+int			log_step_seconds = 5;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -1362,6 +1367,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+	int			log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1430,6 +1440,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1453,27 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
-			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+		/* let's not call the timing for each row, but only each 100 rows */
+		if (j % 100 == 0 || j == scale * naccounts)
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			/* have we reached the next interval? */
+			if (elapsed_sec >= log_interval * log_step_seconds) {
+
+				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+						j, naccounts * scale,
+						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
+
+				/* skip to the next interval */
+				log_interval = (int)ceil(elapsed_sec/log_step_seconds);
+			}
+		}
+
 	}
 	if (PQputline(con, "\\.\n"))
 	{

#20

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 13 years ago

In reply to: Tomas Vondra (#19)

Re: too much pgbench init output

On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 20.11.2012 08:22, Jeevan Chalke wrote:

Hi,

On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz
<mailto:tv@fuzzy.cz>> wrote:

On 19.11.2012 11:59, Jeevan Chalke wrote:

Hi,

I gone through the discussion for this patch and here is my review:

The main aim of this patch is to reduce the number of log lines.

It is

also suggested to use an options to provide the interval but few

of us

are not much agree on it. So final discussion ended at keeping 5

sec

interval between each log line.

However, I see, there are two types of users here:
1. Who likes these log lines, so that they can troubleshoot some
slowness and all
2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall in second category.

But from the discussion, I believe some people may need detailed (or lot
more) output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

(1) it's rather pointless to print a message every 100k rows, as it
usually fills the console with garbabe

(2) it's handy to have regular updates of the progress

I don't think there're people (in the thread) that require to keep the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

Both the options are fine with me, but the default should be the current
behaviour.

So keeping these in mind, I rather go for an option which will

control

this. People falling in category one can set this option to very

low

where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the interval
length or something else?

Interval length.

Well, I can surely imagine something like "--interval N".

A switch that'd choose between the old and new behavior might be a

good

idea, but I'd strongly vote against "automagic" heuristics. It makes

the

behavior very difficult to predict and I really don't want to force

the

users to wonder whether the long delay is due to general slowness of

the

machine or some "clever" logic that causes long delays between log
messages.

That's why I choose a very simple approach with constant time

interval.

It does what I was aiming for (less logs) and it's easy to predict.
Sure, we could choose different interval (or make it an option).

I am preferring an option for choosing an interval, say from 1 second to
10 seconds.

Ummmm, why not to allow arbitrary integer? Why saying 1 to 10 seconds?

Hmm.. actually, I have no issues with any number there. Just put 1..10 as
we hard-coded it 5. No particular reason as such.

BTW, what if, we put one log message every 10% (or 5%) with time taken
(time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing time taken for each block, if any slowness
happens, one can easily find a block by looking at the timings and
troubleshoot it.
Though 10% or 5% is again a debatable number, but keeping it constant
will eliminate the requirement of an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not relly excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

OK. Looks good now.

Any other views / suggestions are welcome.

Thanks

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

#21

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Jeevan Chalke (#20)

1 attachment(s)

Re: too much pgbench init output

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

I'm still not convinced there should be yet another know for tuning the
log interval - opinions?

Tomas

Show quoted text

On 11.12.2012 10:23, Jeevan Chalke wrote:

On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz
<mailto:tv@fuzzy.cz>> wrote:

On 20.11.2012 08:22, Jeevan Chalke wrote:

Hi,

On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz

<mailto:tv@fuzzy.cz>

<mailto:tv@fuzzy.cz <mailto:tv@fuzzy.cz>>> wrote:

On 19.11.2012 11:59, Jeevan Chalke wrote:

Hi,

I gone through the discussion for this patch and here is my

review:

The main aim of this patch is to reduce the number of log

lines. It is

also suggested to use an options to provide the interval but

few of us

are not much agree on it. So final discussion ended at

keeping 5 sec

interval between each log line.

However, I see, there are two types of users here:
1. Who likes these log lines, so that they can troubleshoot some
slowness and all
2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall in second category.

But from the discussion, I believe some people may need detailed

(or lot

more) output.

I've read the thread again and my impression is that no one really needs
or likes those lines, but

(1) it's rather pointless to print a message every 100k rows, as it
usually fills the console with garbabe

(2) it's handy to have regular updates of the progress

I don't think there're people (in the thread) that require to keep the
current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the new
one? If both, which one should be the default one?

Both the options are fine with me, but the default should be the current
behaviour.

So keeping these in mind, I rather go for an option which

will control

this. People falling in category one can set this option to

very low

where as users falling under second category can keep it high.

So what option(s) would you expect? Something that tunes the

interval

length or something else?

Interval length.

Well, I can surely imagine something like "--interval N".

+1

A switch that'd choose between the old and new behavior might

be a good

idea, but I'd strongly vote against "automagic" heuristics. It

makes the

behavior very difficult to predict and I really don't want to

force the

users to wonder whether the long delay is due to general

slowness of the

machine or some "clever" logic that causes long delays between log
messages.

That's why I choose a very simple approach with constant time

interval.

It does what I was aiming for (less logs) and it's easy to

predict.

Sure, we could choose different interval (or make it an option).

I am preferring an option for choosing an interval, say from 1

second to

10 seconds.

Ummmm, why not to allow arbitrary integer? Why saying 1 to 10 seconds?

Hmm.. actually, I have no issues with any number there. Just put 1..10
as we hard-coded it 5. No particular reason as such.

BTW, what if, we put one log message every 10% (or 5%) with time taken
(time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing time taken for each block, if any slowness
happens, one can easily find a block by looking at the timings and
troubleshoot it.
Though 10% or 5% is again a debatable number, but keeping it constant
will eliminate the requirement of an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not relly excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

OK. Looks good now.

Any other views / suggestions are welcome.

Thanks

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org
<mailto:pgsql-hackers@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com <http://www.enterprisedb.com>
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the
intended recipient, any use, dissemination, distribution, retention,
archiving, or copying of this communication is strictly prohibited. If
you have received this e-mail in error, please notify the sender
immediately by reply e-mail and delete this message.

Attachments:

pgbench-logging-v4.patchtext/plain; charset=UTF-8; name=pgbench-logging-v4.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 334ce4c..14d8ad5 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int	optind;
 #define MAXCLIENTS	1024
 #endif
 
+#define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
 int			nxacts = 0;			/* number of transactions per client */
@@ -135,11 +137,6 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
- * logging steps (seconds between log messages)
- */
-int			log_step_seconds = 5;
-
-/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -155,6 +152,7 @@ char	   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;			/* log transaction latencies to a file */
+bool		use_quiet;			/* quiet logging onto stderr */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -394,6 +392,7 @@ usage(void)
 		   "  -v           vacuum all four standard tables before tests\n"
 		   "\nCommon options:\n"
 		   "  -d             print debugging output\n"
+		   "  -q           quiet logging (a message each 5 seconds)\n"
 		   "  -h HOSTNAME    database server host or socket directory\n"
 		   "  -p PORT        database server port number\n"
 		   "  -U USERNAME    connect as specified database user\n"
@@ -1453,8 +1452,14 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
+		/* If we want to stick with the original logging, print a message each
+		 * 100k inserted rows. */
+		if ((! use_quiet) && (j % 100000 == 0))
+			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
+							j, naccounts * scale,
+							(int) (((int64) j * 100) / (naccounts * scale)));
 		/* let's not call the timing for each row, but only each 100 rows */
-		if (j % 100 == 0 || j == scale * naccounts)
+		else if (use_quiet && (j % 100 == 0))
 		{
 			INSTR_TIME_SET_CURRENT(diff);
 			INSTR_TIME_SUBTRACT(diff, start);
@@ -1462,15 +1467,15 @@ init(bool is_no_vacuum)
 			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
 			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
 
-			/* have we reached the next interval? */
-			if (elapsed_sec >= log_interval * log_step_seconds) {
+			/* have we reached the next interval (or end)? */
+			if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS)) {
 
 				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
 						j, naccounts * scale,
 						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
 
 				/* skip to the next interval */
-				log_interval = (int)ceil(elapsed_sec/log_step_seconds);
+				log_interval = (int)ceil(elapsed_sec/LOG_STEP_SECONDS);
 			}
 		}
 
@@ -2016,7 +2021,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2124,6 +2129,9 @@ main(int argc, char **argv)
 			case 'l':
 				use_log = true;
 				break;
+			case 'q':
+				use_quiet = true;
+				break;
 			case 'f':
 				ttype = 3;
 				filename = pg_strdup(optarg);

#22

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 13 years ago

In reply to: Tomas Vondra (#21)

Re: too much pgbench init output

On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
interval

I'm still not convinced there should be yet another know for tuning the
log interval - opinions?

It seems that you have generated a patch over your earlier version and due
to that it is not cleanly applying on fresh sources.
Please generate patch on fresh sources.

However, I absolutely no issues with the design. Also code review is
already done and looks good to me.
I think to move forward on this we need someone from core-team. So I am
planning to change its status to "ready-for-committor".

Before that please provide updated patch for final code review.

Thanks

Tomas

On 11.12.2012 10:23, Jeevan Chalke wrote:

On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra <tv@fuzzy.cz
<mailto:tv@fuzzy.cz>> wrote:

On 20.11.2012 08:22, Jeevan Chalke wrote:

Hi,

On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra <tv@fuzzy.cz

<mailto:tv@fuzzy.cz>

<mailto:tv@fuzzy.cz <mailto:tv@fuzzy.cz>>> wrote:

On 19.11.2012 11:59, Jeevan Chalke wrote:

Hi,

I gone through the discussion for this patch and here is my

review:

The main aim of this patch is to reduce the number of log

lines. It is

also suggested to use an options to provide the interval but

few of us

are not much agree on it. So final discussion ended at

keeping 5 sec

interval between each log line.

However, I see, there are two types of users here:
1. Who likes these log lines, so that they can troubleshoot

some

slowness and all
2. Who do not like these log lines.

Who likes these lines / needs them for something useful?

No idea. I fall in second category.

But from the discussion, I believe some people may need detailed

(or lot

more) output.

I've read the thread again and my impression is that no one really

needs

or likes those lines, but

(1) it's rather pointless to print a message every 100k rows, as it
usually fills the console with garbabe

(2) it's handy to have regular updates of the progress

I don't think there're people (in the thread) that require to keep

the

current amount of log messages.

But there might be users who actually use the current logs to do
something (although I can't imagine what). If we want to do this in a
backwards compatible way, we should probably use a new option (e.g.
"-q") to enable the new (less verbose) logging.

Do we want to allow both types of logging, or shall we keep only the

new

one? If both, which one should be the default one?

Both the options are fine with me, but the default should be the current
behaviour.

So keeping these in mind, I rather go for an option which

will control

this. People falling in category one can set this option to

very low

where as users falling under second category can keep it

high.

So what option(s) would you expect? Something that tunes the

interval

length or something else?

Interval length.

Well, I can surely imagine something like "--interval N".

+1

A switch that'd choose between the old and new behavior might

be a good

idea, but I'd strongly vote against "automagic" heuristics. It

makes the

behavior very difficult to predict and I really don't want to

force the

users to wonder whether the long delay is due to general

slowness of the

machine or some "clever" logic that causes long delays between

log

messages.

That's why I choose a very simple approach with constant time

interval.

It does what I was aiming for (less logs) and it's easy to

predict.

Sure, we could choose different interval (or make it an

option).

I am preferring an option for choosing an interval, say from 1

second to

10 seconds.

Ummmm, why not to allow arbitrary integer? Why saying 1 to 10

seconds?

Hmm.. actually, I have no issues with any number there. Just put 1..10
as we hard-coded it 5. No particular reason as such.

BTW, what if, we put one log message every 10% (or 5%) with time

taken

(time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
This will have only 10 (or 20) lines per pgbench initialisation.
And since we are showing time taken for each block, if any slowness
happens, one can easily find a block by looking at the timings and
troubleshoot it.
Though 10% or 5% is again a debatable number, but keeping it

constant

will eliminate the requirement of an option.

That's what I originally proposed in September (see the messages from
17/9), and Alvaro was not relly excited about this.

Attached is a patch with fixed whitespace / indentation errors etc.
Otherwise it's the same as the previous version.

OK. Looks good now.

Any other views / suggestions are welcome.

Thanks

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org
<mailto:pgsql-hackers@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com <http://www.enterprisedb.com>
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are
not the intended recipient or authorized to receive this for the
intended recipient, any use, dissemination, distribution, retention,
archiving, or copying of this communication is strictly prohibited. If
you have received this e-mail in error, please notify the sender
immediately by reply e-mail and delete this message.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

#23

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Jeevan Chalke (#22)

1 attachment(s)

Re: too much pgbench init output

On 19.12.2012 06:30, Jeevan Chalke wrote:

On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz
<mailto:tv@fuzzy.cz>> wrote:

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
interval

I'm still not convinced there should be yet another know for tuning the
log interval - opinions?

It seems that you have generated a patch over your earlier version and
due to that it is not cleanly applying on fresh sources.
Please generate patch on fresh sources.

Seems you're right - I've attached the proper patch against current master.

However, I absolutely no issues with the design. Also code review is
already done and looks good to me.
I think to move forward on this we need someone from core-team. So I am
planning to change its status to "ready-for-committor".

Before that please provide updated patch for final code review.

thanks
Tomas

Attachments:

pgbench-logging-v5.patchtext/plain; charset=UTF-8; name=pgbench-logging-v5.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..f3953a7 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int	optind;
 #define MAXCLIENTS	1024
 #endif
 
+#define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
 int			nxacts = 0;			/* number of transactions per client */
@@ -150,6 +152,7 @@ char	   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;			/* log transaction latencies to a file */
+bool		use_quiet;			/* quiet logging onto stderr */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -389,6 +392,7 @@ usage(void)
 		   "  -v           vacuum all four standard tables before tests\n"
 		   "\nCommon options:\n"
 		   "  -d             print debugging output\n"
+		   "  -q             quiet logging (a message each 5 seconds)\n"
 		   "  -h HOSTNAME    database server host or socket directory\n"
 		   "  -p PORT        database server port number\n"
 		   "  -U USERNAME    connect as specified database user\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+	int			log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1452,33 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
+		/* If we want to stick with the original logging, print a message each
+		 * 100k inserted rows. */
+		if ((! use_quiet) && (j % 100000 == 0))
 			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+							j, naccounts * scale,
+							(int) (((int64) j * 100) / (naccounts * scale)));
+		/* let's not call the timing for each row, but only each 100 rows */
+		else if (use_quiet && (j % 100 == 0))
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			/* have we reached the next interval (or end)? */
+			if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS)) {
+
+				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+						j, naccounts * scale,
+						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
+
+				/* skip to the next interval */
+				log_interval = (int)ceil(elapsed_sec/LOG_STEP_SECONDS);
+			}
+		}
+
 	}
 	if (PQputline(con, "\\.\n"))
 	{
@@ -1987,7 +2021,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2095,6 +2129,9 @@ main(int argc, char **argv)
 			case 'l':
 				use_log = true;
 				break;
+			case 'q':
+				use_quiet = true;
+				break;
 			case 'f':
 				ttype = 3;
 				filename = pg_strdup(optarg);

#24

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 13 years ago

In reply to: Tomas Vondra (#23)

Re: too much pgbench init output

Looks good to me.

Will mark "Ready for Committer"

Thanks

On Thu, Dec 20, 2012 at 2:30 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 19.12.2012 06:30, Jeevan Chalke wrote:

On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz
<mailto:tv@fuzzy.cz>> wrote:

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
interval

I'm still not convinced there should be yet another know for tuning

the

log interval - opinions?

It seems that you have generated a patch over your earlier version and
due to that it is not cleanly applying on fresh sources.
Please generate patch on fresh sources.

Seems you're right - I've attached the proper patch against current master.

However, I absolutely no issues with the design. Also code review is
already done and looks good to me.
I think to move forward on this we need someone from core-team. So I am
planning to change its status to "ready-for-committor".

Before that please provide updated patch for final code review.

thanks
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

#25

Tatsuo Ishii

ishii@postgresql.org

about 13 years ago

In reply to: Tomas Vondra (#23)

Re: too much pgbench init output

On 19.12.2012 06:30, Jeevan Chalke wrote:

On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra <tv@fuzzy.cz
<mailto:tv@fuzzy.cz>> wrote:

Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a "-q" switch that enables the new logging with a 5-second
interval

I'm still not convinced there should be yet another know for tuning the
log interval - opinions?

It seems that you have generated a patch over your earlier version and
due to that it is not cleanly applying on fresh sources.
Please generate patch on fresh sources.

Seems you're right - I've attached the proper patch against current master.

However, I absolutely no issues with the design. Also code review is
already done and looks good to me.
I think to move forward on this we need someone from core-team. So I am
planning to change its status to "ready-for-committor".

Before that please provide updated patch for final code review.

As a committer, I have looked into the patch. I noticed two things:

1) In the help you put '-q' option into "Common options" section. I
think this should be moved to "Initialization options" section because
the option is only applied while initializing.

2) Shouldn't a long option for '-q' be provided? Something like
'--quiet-progress-logging'?

3) No patches for docs found (doc/src/sgml/pgbench.sgml)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Tatsuo Ishii (#25)

1 attachment(s)

Re: too much pgbench init output

On 6.1.2013 03:03, Tatsuo Ishii wrote:

As a committer, I have looked into the patch. I noticed two things:

1) In the help you put '-q' option into "Common options" section. I
think this should be moved to "Initialization options" section because
the option is only applied while initializing.

Good point, moved.

2) Shouldn't a long option for '-q' be provided? Something like
'--quiet-progress-logging'?

I don't think so. Currently pgbench has either short or long option, not
both (for the same thing). I believe we should stick to this and either
choose "-q" or "--quiet-logging" but not both.

3) No patches for docs found (doc/src/sgml/pgbench.sgml)

I've added a brief description of the "-q" option into the docs. IMHO
it's enough but feel free to add some more details.

There's one more thing I've just noticed - the original version of the
patch simply removed the old logging, but this one keeps both old and
quiet logging. But the old logging still uses this:

fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....

while the new logging does this

fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s,
remaining %.2f s).\n",

i.e. it prints additional info about elapsed/estimated time. Do we want
to keep it this way (i.e. not to mess with the old logging) or do we
want to add these new fields to the old logging too?

I suggest to add it to the old logging, to keep the log messages the
same, the only difference being the logging frequency.

Tomas

Attachments:

pgbench-logging-v6.patchtext/plain; charset=UTF-8; name=pgbench-logging-v6.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..31764fe 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int	optind;
 #define MAXCLIENTS	1024
 #endif
 
+#define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
 int			nxacts = 0;			/* number of transactions per client */
@@ -150,6 +152,7 @@ char	   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;			/* log transaction latencies to a file */
+bool		use_quiet;			/* quiet logging onto stderr */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -359,6 +362,7 @@ usage(void)
 		   "  -n           do not run VACUUM after initialization\n"
 		   "  -F NUM       fill factor\n"
 		   "  -s NUM       scaling factor\n"
+		   "  -q           quiet logging (one message each 5 seconds)\n"
 		   "  --foreign-keys\n"
 		   "               create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+	int			log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1452,33 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
+		/* If we want to stick with the original logging, print a message each
+		 * 100k inserted rows. */
+		if ((! use_quiet) && (j % 100000 == 0))
 			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+							j, naccounts * scale,
+							(int) (((int64) j * 100) / (naccounts * scale)));
+		/* let's not call the timing for each row, but only each 100 rows */
+		else if (use_quiet && (j % 100 == 0))
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			/* have we reached the next interval (or end)? */
+			if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS)) {
+
+				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+						j, naccounts * scale,
+						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
+
+				/* skip to the next interval */
+				log_interval = (int)ceil(elapsed_sec/LOG_STEP_SECONDS);
+			}
+		}
+
 	}
 	if (PQputline(con, "\\.\n"))
 	{
@@ -1987,7 +2021,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2095,6 +2129,9 @@ main(int argc, char **argv)
 			case 'l':
 				use_log = true;
 				break;
+			case 'q':
+				use_quiet = true;
+				break;
 			case 'f':
 				ttype = 3;
 				filename = pg_strdup(optarg);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 91530ab..58686b1 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -190,6 +190,17 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
+      <term><option>-q</option></term>
+      <listitem>
+       <para>
+        Switch logging to quiet mode, producing only one progress message per 5
+        seconds. The default logging prints one message each 100000 rows, which
+        often outputs many lines per second (especially on good hardware).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--foreign-keys</option></term>
       <listitem>
        <para>

#27

Tatsuo Ishii

ishii@postgresql.org

about 13 years ago

In reply to: Tomas Vondra (#26)

Re: too much pgbench init output

On 6.1.2013 03:03, Tatsuo Ishii wrote:

As a committer, I have looked into the patch. I noticed two things:

1) In the help you put '-q' option into "Common options" section. I
think this should be moved to "Initialization options" section because
the option is only applied while initializing.

Good point, moved.

In addition to this, I'd suggest to add checking -q is only possible
with -i option since without -i, -q is meaningless.

2) Shouldn't a long option for '-q' be provided? Something like
'--quiet-progress-logging'?

I don't think so. Currently pgbench has either short or long option, not
both (for the same thing). I believe we should stick to this and either
choose "-q" or "--quiet-logging" but not both.

Ok.

3) No patches for docs found (doc/src/sgml/pgbench.sgml)

I've added a brief description of the "-q" option into the docs. IMHO
it's enough but feel free to add some more details.

Good.

There's one more thing I've just noticed - the original version of the
patch simply removed the old logging, but this one keeps both old and
quiet logging. But the old logging still uses this:

fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....

while the new logging does this

fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s,
remaining %.2f s).\n",

i.e. it prints additional info about elapsed/estimated time. Do we want
to keep it this way (i.e. not to mess with the old logging) or do we
want to add these new fields to the old logging too?

I suggest to add it to the old logging, to keep the log messages the
same, the only difference being the logging frequency.

If we do so, probably '-q' is not appropeate option name any more,
since the only difference between old logging and new one is, the
former is printed every 10000 lines while the latter is every 5
seconds, which is not really "quiet". What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Tatsuo Ishii (#27)

1 attachment(s)

Re: too much pgbench init output

On 6.1.2013 05:07, Tatsuo Ishii wrote:

On 6.1.2013 03:03, Tatsuo Ishii wrote:

As a committer, I have looked into the patch. I noticed two things:

1) In the help you put '-q' option into "Common options" section. I
think this should be moved to "Initialization options" section because
the option is only applied while initializing.

Good point, moved.

In addition to this, I'd suggest to add checking -q is only possible
with -i option since without -i, -q is meaningless.

Done.

There's one more thing I've just noticed - the original version of the
patch simply removed the old logging, but this one keeps both old and
quiet logging. But the old logging still uses this:

fprintf(stderr, "%d of %d tuples (%d%%) done.\n", ....

while the new logging does this

fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s,
remaining %.2f s).\n",

i.e. it prints additional info about elapsed/estimated time. Do we want
to keep it this way (i.e. not to mess with the old logging) or do we
want to add these new fields to the old logging too?

I suggest to add it to the old logging, to keep the log messages the
same, the only difference being the logging frequency.

If we do so, probably '-q' is not appropeate option name any more,
since the only difference between old logging and new one is, the
former is printed every 10000 lines while the latter is every 5
seconds, which is not really "quiet". What do you think?

AFAIK the "5 second" logging is much quieter in most cases (and a bit
more verbose when the initialization gets very slower), so I think the
"quiet" logging is not a bad match although maybe there's a better name.

This change (adding the elapsed/remaining fields to the original loggin)
would be consistent with this name, because considering a single line,
the "-q" is more verbose right now.

So I'd stick with the "-q" option and added the fields to the original
logging. But I'm not opposing a different name, I just can't think of a
better one.

Tomas

Attachments:

pgbench-logging-v7.patchtext/plain; charset=UTF-8; name=pgbench-logging-v7.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..39bd8a5 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int	optind;
 #define MAXCLIENTS	1024
 #endif
 
+#define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
 int			nxacts = 0;			/* number of transactions per client */
@@ -150,6 +152,7 @@ char	   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;			/* log transaction latencies to a file */
+bool		use_quiet;			/* quiet logging onto stderr */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -359,6 +362,7 @@ usage(void)
 		   "  -n           do not run VACUUM after initialization\n"
 		   "  -F NUM       fill factor\n"
 		   "  -s NUM       scaling factor\n"
+		   "  -q           quiet logging (one message each 5 seconds)\n"
 		   "  --foreign-keys\n"
 		   "               create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+	int			log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1452,33 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
+		/* If we want to stick with the original logging, print a message each
+		 * 100k inserted rows. */
+		if ((! use_quiet) && (j % 100000 == 0))
 			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+							j, naccounts * scale,
+							(int) (((int64) j * 100) / (naccounts * scale)));
+		/* let's not call the timing for each row, but only each 100 rows */
+		else if (use_quiet && (j % 100 == 0))
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			/* have we reached the next interval (or end)? */
+			if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS)) {
+
+				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+						j, naccounts * scale,
+						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
+
+				/* skip to the next interval */
+				log_interval = (int)ceil(elapsed_sec/LOG_STEP_SECONDS);
+			}
+		}
+
 	}
 	if (PQputline(con, "\\.\n"))
 	{
@@ -1987,7 +2021,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2095,6 +2129,9 @@ main(int argc, char **argv)
 			case 'l':
 				use_log = true;
 				break;
+			case 'q':
+				use_quiet = true;
+				break;
 			case 'f':
 				ttype = 3;
 				filename = pg_strdup(optarg);
@@ -2198,6 +2235,13 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	/* -q may be used only with -i */
+	if (use_quiet && !is_init_mode)
+	{
+		fprintf(stderr, "quiet-logging is allowed only in initialization mode (-i)\n");
+		exit(1);
+	}
+
 	/*
 	 * is_latencies only works with multiple threads in thread-based
 	 * implementations, not fork-based ones, because it supposes that the
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 91530ab..58686b1 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -190,6 +190,17 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
+      <term><option>-q</option></term>
+      <listitem>
+       <para>
+        Switch logging to quiet mode, producing only one progress message per 5
+        seconds. The default logging prints one message each 100000 rows, which
+        often outputs many lines per second (especially on good hardware).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--foreign-keys</option></term>
       <listitem>
        <para>

#29

Tatsuo Ishii

ishii@postgresql.org

about 13 years ago

In reply to: Tomas Vondra (#28)

Re: too much pgbench init output

If we do so, probably '-q' is not appropeate option name any more,
since the only difference between old logging and new one is, the
former is printed every 10000 lines while the latter is every 5
seconds, which is not really "quiet". What do you think?

AFAIK the "5 second" logging is much quieter in most cases (and a bit
more verbose when the initialization gets very slower), so I think the
"quiet" logging is not a bad match although maybe there's a better name.

This change (adding the elapsed/remaining fields to the original loggin)
would be consistent with this name, because considering a single line,
the "-q" is more verbose right now.

So I'd stick with the "-q" option and added the fields to the original
logging. But I'm not opposing a different name, I just can't think of a
better one.

Ok, I'm with you ("-q" and along with adding the elapsed/remaining
fields to the original logging).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Tomas Vondra

tv@fuzzy.cz

about 13 years ago

In reply to: Tatsuo Ishii (#29)

1 attachment(s)

Re: too much pgbench init output

On 6.1.2013 10:35, Tatsuo Ishii wrote:

If we do so, probably '-q' is not appropeate option name any more,
since the only difference between old logging and new one is, the
former is printed every 10000 lines while the latter is every 5
seconds, which is not really "quiet". What do you think?

AFAIK the "5 second" logging is much quieter in most cases (and a bit
more verbose when the initialization gets very slower), so I think the
"quiet" logging is not a bad match although maybe there's a better name.

This change (adding the elapsed/remaining fields to the original loggin)
would be consistent with this name, because considering a single line,
the "-q" is more verbose right now.

So I'd stick with the "-q" option and added the fields to the original
logging. But I'm not opposing a different name, I just can't think of a
better one.

Ok, I'm with you ("-q" and along with adding the elapsed/remaining
fields to the original logging).

Great, attached is a patch that does that.

Tomas

Attachments:

pgbench-logging-v8.patchtext/plain; charset=UTF-8; name=pgbench-logging-v8.patchDownload

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..9b9ca77 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int	optind;
 #define MAXCLIENTS	1024
 #endif
 
+#define LOG_STEP_SECONDS	5	/* seconds between log messages */
 #define DEFAULT_NXACTS	10		/* default nxacts */
 
 int			nxacts = 0;			/* number of transactions per client */
@@ -150,6 +152,7 @@ char	   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;			/* log transaction latencies to a file */
+bool		use_quiet;			/* quiet logging onto stderr */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -359,6 +362,7 @@ usage(void)
 		   "  -n           do not run VACUUM after initialization\n"
 		   "  -F NUM       fill factor\n"
 		   "  -s NUM       scaling factor\n"
+		   "  -q           quiet logging (one message each 5 seconds)\n"
 		   "  --foreign-keys\n"
 		   "               create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+	int			log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1452,42 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
-			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+		/* If we want to stick with the original logging, print a message each
+		 * 100k inserted rows. */
+		if ((! use_quiet) && (j % 100000 == 0))
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+							j, naccounts * scale,
+							(int) (((int64) j * 100) / (naccounts * scale)),
+							elapsed_sec, remaining_sec);
+		}
+		/* let's not call the timing for each row, but only each 100 rows */
+		else if (use_quiet && (j % 100 == 0))
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			/* have we reached the next interval (or end)? */
+			if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS)) {
+
+				fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
+						j, naccounts * scale,
+						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
+
+				/* skip to the next interval */
+				log_interval = (int)ceil(elapsed_sec/LOG_STEP_SECONDS);
+			}
+		}
+
 	}
 	if (PQputline(con, "\\.\n"))
 	{
@@ -1987,7 +2030,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2095,6 +2138,9 @@ main(int argc, char **argv)
 			case 'l':
 				use_log = true;
 				break;
+			case 'q':
+				use_quiet = true;
+				break;
 			case 'f':
 				ttype = 3;
 				filename = pg_strdup(optarg);
@@ -2198,6 +2244,13 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	/* -q may be used only with -i */
+	if (use_quiet && !is_init_mode)
+	{
+		fprintf(stderr, "quiet-logging is allowed only in initialization mode (-i)\n");
+		exit(1);
+	}
+
 	/*
 	 * is_latencies only works with multiple threads in thread-based
 	 * implementations, not fork-based ones, because it supposes that the
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 91530ab..58686b1 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -190,6 +190,17 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
+      <term><option>-q</option></term>
+      <listitem>
+       <para>
+        Switch logging to quiet mode, producing only one progress message per 5
+        seconds. The default logging prints one message each 100000 rows, which
+        often outputs many lines per second (especially on good hardware).
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--foreign-keys</option></term>
       <listitem>
        <para>

#31

Tatsuo Ishii

ishii@postgresql.org

about 13 years ago

In reply to: Tomas Vondra (#30)

Re: too much pgbench init output

AFAIK the "5 second" logging is much quieter in most cases (and a bit
more verbose when the initialization gets very slower), so I think the
"quiet" logging is not a bad match although maybe there's a better name.

This change (adding the elapsed/remaining fields to the original loggin)
would be consistent with this name, because considering a single line,
the "-q" is more verbose right now.

So I'd stick with the "-q" option and added the fields to the original
logging. But I'm not opposing a different name, I just can't think of a
better one.

Ok, I'm with you ("-q" and along with adding the elapsed/remaining
fields to the original logging).

Great, attached is a patch that does that.

Committed. Thanks.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers