pgbench and timestamps

Started by Jaime Soleralmost 6 years ago8 messagesgeneral
Jump to latest
#1Jaime Soler
jaime.soler@gmail.com

Hi, does anybody know what is wrong with pgbench in this case ?. Here is a
simple query to generate a random date in a interval time.sql:

(select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01
00:00:00' - timestamp '2005-09-01 00:00:00' ));

query executed successfullly with psql

/usr/lib/postgresql/12/bin/psql -p 5432 -h localhost -d picp -U
postgres -f time.sql
BEGIN
?column?
--------------------------
2005-11-24 13:22:02.4781
(1 fila)COMMIT

psql (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)

but look at what happen with pgbench

pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U
postgres -p 5432
pghost: localhost pgport: 5432 nclients: 2 nxacts: 10 dbName: picp
starting vacuum...ERROR: no existe la relación «pgbench_branches»
(ignoring this error and continuing anyway)
ERROR: no existe la relación «pgbench_tellers»
(ignoring this error and continuing anyway)
ERROR: no existe la relación «pgbench_history»
(ignoring this error and continuing anyway)
end.
client 0 executing script "time.sql"
ERROR: la sintaxis de entrada no es válida para tipo timestamp:
«2006-03-01 00$1$2»
LINE 1: ...t timestamp '2005-09-01' + random() * ( timestamp '2006-03-0...
^
client 0 sending P0_0
client 0 receiving
client 0 receiving
client 0 sending P0_1
client 0 receiving
client 0 receiving
client 0 script 0 aborted in command 1 query 0: ERROR: no existe la
sentencia preparada «P0_1»
client 1 executing script "time.sql"
ERROR: la sintaxis de entrada no es válida para tipo timestamp:
«2006-03-01 00$1$2»
LINE 1: ...t timestamp '2005-09-01' + random() * ( timestamp '2006-03-0...
^Run was
aborted; the above results are incomplete.

pgbench (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)

I don't know why pgbench use timestamp: «2006-03-01 00$1$2» instead of
timestamp '2006-03-01 00:00:00'

Regards

#2David Rowley
dgrowleyml@gmail.com
In reply to: Jaime Soler (#1)
Re: pgbench and timestamps

On Wed, 24 Jun 2020 at 20:41, Jaime Soler <jaime.soler@gmail.com> wrote:

Hi, does anybody know what is wrong with pgbench in this case ?. Here is a simple query to generate a random date in a interval time.sql:

(select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01 00:00:00' - timestamp '2005-09-01 00:00:00' ));
pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U postgres -p 5432
ERROR: la sintaxis de entrada no es válida para tipo timestamp: «2006-03-01 00$1$2»

I don't know why pgbench use timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

https://www.postgresql.org/docs/12/pgbench.html says:

"There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
digits, and underscores. Variables can be set by the command-line -D
option, explained above, or by the meta commands explained below. In
addition to any variables preset by -D command-line options, there are
a few variables that are preset automatically, listed in Table 257. A
value specified for these variables using -D takes precedence over the
automatic presets. Once set, a variable's value can be inserted into a
SQL command by writing :variablename. When running more than one
client session, each session has its own set of variables. pgbench
supports up to 255 variable uses in one statement."

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

I won't pretend to be familiar enough with pgbench internals to know
if there's any reasonable reasons why we do each of the above, but...

I guess you could work around this problem by just not putting the
midnight time in your timestamp. However, that might not work so well
if you want to specify a time other than midnight.

David

#3Jaime Soler
jaime.soler@gmail.com
In reply to: David Rowley (#2)
Re: pgbench and timestamps

Hi,

Thanks for your comments, I worked around that problem because I was able
to truncate the timestamp and use only the date part , alsoit might
works the use of to_timestamp. But I would like to understand what is
happening , I realized that pgbench is identified erroneously the minutes
and seconds parts :00:00 as two variables .

Regards

El mié., 24 jun. 2020 a las 14:50, David Rowley (<dgrowleyml@gmail.com>)
escribió:

Show quoted text

On Wed, 24 Jun 2020 at 20:41, Jaime Soler <jaime.soler@gmail.com> wrote:

Hi, does anybody know what is wrong with pgbench in this case ?. Here is

a simple query to generate a random date in a interval time.sql:

(select timestamp '2005-09-01' + random() * ( timestamp '2006-03-01

00:00:00' - timestamp '2005-09-01 00:00:00' ));

pgbench -c 2 -j 2 -M prepared --file time.sql -h localhost -d picp -U

postgres -p 5432

ERROR: la sintaxis de entrada no es válida para tipo timestamp:

«2006-03-01 00$1$2»

I don't know why pgbench use timestamp: «2006-03-01 00$1$2» instead of

timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

https://www.postgresql.org/docs/12/pgbench.html says:

"There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
digits, and underscores. Variables can be set by the command-line -D
option, explained above, or by the meta commands explained below. In
addition to any variables preset by -D command-line options, there are
a few variables that are preset automatically, listed in Table 257. A
value specified for these variables using -D takes precedence over the
automatic presets. Once set, a variable's value can be inserted into a
SQL command by writing :variablename. When running more than one
client session, each session has its own set of variables. pgbench
supports up to 255 variable uses in one statement."

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit,
and;
3) We replace variables that are undefined.

I won't pretend to be familiar enough with pgbench internals to know
if there's any reasonable reasons why we do each of the above, but...

I guess you could work around this problem by just not putting the
midnight time in your timestamp. However, that might not work so well
if you want to specify a time other than midnight.

David

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: pgbench and timestamps

David Rowley <dgrowleyml@gmail.com> writes:

On Wed, 24 Jun 2020 at 20:41, Jaime Soler <jaime.soler@gmail.com> wrote:

I don't know why pgbench use timestamp: «2006-03-01 00$1$2» instead of timestamp '2006-03-01 00:00:00'

I've not debugged it, but it looks like pgbench thinks that :00 is a
pgbench variable and is replacing each instance with a query
parameter.

Yeah.

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.

1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

Also (4) this only happens when in non-simple query mode --- the
example works fine without "-M prepared".

I can think of use-cases for substituting variables inside quotes,
so maybe (1) isn't a bug; but it sure seems like (3) and (4) are.

In any case, the documentation about this seems pretty inadequate.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: pgbench and timestamps

I wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.
1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

Also (4) this only happens when in non-simple query mode --- the
example works fine without "-M prepared".

After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode. So that explains the
behavioral difference.

The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet. We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.

Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode. The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.

If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).

regards, tom lane

#6Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#5)
Re: pgbench and timestamps

I'll look into it. Thanks for the analysis and CC-ing.

--
Fabien.

#7Jaime Soler
jaime.soler@gmail.com
In reply to: Tom Lane (#5)
Re: pgbench and timestamps

Thanks for your analysis.

Regards

El mié., 24 jun. 2020 a las 17:17, Tom Lane (<tgl@sss.pgh.pa.us>) escribió:

Show quoted text

I wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.
1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a

digit, and;

3) We replace variables that are undefined.

Also (4) this only happens when in non-simple query mode --- the
example works fine without "-M prepared".

After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode. So that explains the
behavioral difference.

The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet. We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.

Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode. The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.

If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).

regards, tom lane

#8Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Tom Lane (#5)
Re: pgbench and timestamps

Hello Tom,

The attached patch fixes some of the underlying problems reported by
delaying the :var to $1 substitution to the last possible moments, so that
what variables are actually defined is known. PREPARE-ing is also delayed
to after these substitutions are done.

It requires a mutex around the commands, I tried to do some windows
implementation which may or may not work.

The attached patch fixes (2) & (3) for extended & prepared.

I have a doubt about fixing (1) because it would be a significant
behavioral change and it requires changing the replace routine
significantly to check for quoting, comments, and so on. This means that
currently ':var' is still broken under -M extended & prepared, I could
only break it differently by providing a nicer error message and also
break it under simple whereas it currently works there. I'm not thrilled
by spending efforts to do that.

The patches change the name of "parseQuery" to "makeVariablesParameters",
because it was not actually parsing any query. Maybe the new name could be
improved.

In passing, there was a bug in how NULL was passed, which I tried to fix
as well.

I don't often do much with pgbench and variables, but there are a few
things that surprise me here.
1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, and;
3) We replace variables that are undefined.

Also (4) this only happens when in non-simple query mode --- the
example works fine without "-M prepared".

After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode. So that explains the
behavioral difference.

Yes.

The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet. We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.

Yep, done at first execution of the Command, so that variables are known.

Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode. The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.

Yep. See my comments above.

If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).

I did (2) but not (1), for now.

--
Fabien.

Attachments:

pgbench-var-fix-1.patchtext/x-diff; name=pgbench-var-fix-1.patchDownload+191-61