extend pgbench expressions with functions
This patch extends pgbench expressions with functions. Currently only one
"abs" function is added. The point is rather to bootstrap the
infrastructure for other functions (such as hash, random variants...) to
be added later.
--
Fabien.
Attachments:
pgbench-expr-abs-1.patch (text/x-diff, +66 -3)
> This patch extends pgbench expressions with functions. Currently only one
> "abs" function is added. The point is rather to bootstrap the infrastructure
> for other functions (such as hash, random variants...) to be added later.
Here is a small v2 update:
- better error messages on non-existing functions
- minimal documentation
--
Fabien.
Attachments:
pgbench-expr-abs-2.patch (text/x-diff, +68 -4)
> Here is a small v2 update:
v3, just a rebase.
--
Fabien.
Attachments:
pgbench-expr-abs-3.patch (text/x-diff, +72 -4)
On Sun, Mar 29, 2015 at 2:20 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > Here is a small v2 update:
> v3, just a rebase.
Thanks for working on this. I see it's already registered in the
2015-06 CF, and will have a look at it when we get there.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
> > v3, just a rebase.
> Thanks for working on this. I see it's already registered in the
> 2015-06 CF, and will have a look at it when we get there.
v4, rebase (after pgbench moved to src) and use of the recently added
syntax_error function.
--
Fabien.
Attachments:
pgbench-expr-abs-4.patch (text/x-diff, +80 -6)
> v4, rebase (after pgbench moved to src) and use of the recently added
> syntax_error function.
v5 which adds a forgotten header declaration for a new function, so as to
avoid a warning.
--
Fabien.
Attachments:
pgbench-expr-abs-5.patch (text/x-diff, +81 -6)
> > v4, rebase (after pgbench moved to src) and use of the recently added
> > syntax_error function.
> v5 which adds a forgotten header declaration for a new function, so as to
> avoid a warning.
While playing around with adding other functions, I noticed that with the
parsing rules I set, unknown functions were detected quite late in the
process, so that the error message was imprecise.
v6 fixes this by checking function names earlier.
--
Fabien.
Attachments:
pgbench-expr-abs-6.patch (text/x-diff, +94 -7)
v7: rebase after pgindent run for 9.6.
--
Fabien.
Attachments:
pgbench-expr-abs-7.patch (text/x-diff, +94 -7)
On 03/06/2015 11:41 AM, Fabien COELHO wrote:
> This patch extends pgbench expressions with functions. Currently only one
> "abs" function is added. The point is rather to bootstrap the
> infrastructure for other functions (such as hash, random variants...) to
> be added later.
I think it would actually be good to add at least some of those other
functions in the initial patch. The infrastructure that this patch adds
only supports functions with a single argument, so it won't get us very
far. Also, will we need non-integer (e.g. string, numeric, whatever)
arguments for the functions? How about other datatypes for variables in
general? Perhaps not, or if we do that can be a separate patch, but it's
something to keep in mind. The pgbench script language is evolving into
a full-blown Turing-complete programming language...
As an initial list of functions, I'd suggest:
abs(x)
min(x, y, ...)
max(x, y, ...)
random_uniform(min, max)
random_gaussian(min, max, threshold)
random_exponential(min, max, threshold)
Would that be enough to specify e.g. the
As soon as we add more functions, the way they are documented needs to
be reworked too; we'll need to add a table in the manual to list them.
- Heikki
Hello Heikki,
> > This patch extends pgbench expressions with functions. Currently only one
> > "abs" function is added. The point is rather to bootstrap the
> > infrastructure for other functions (such as hash, random variants...) to
> > be added later.
> I think it would actually be good to add at least some of those other
> functions in the initial patch.
Hmmm, sure. I wanted some feedback on the "how" before doing that, hence
the infrastructure patch submitted with just one function. Obviously I can
expand, but before that, any opinion on the "how"?
For instance, I decided against having individual functions recognized by
the lexer, in favor of a real infrastructure for storing, checking and
adding them without touching lex/yacc.
> The infrastructure that this patch adds only supports functions with a
> single argument, so it won't get us very far.
> Also, will we need non-integer (e.g. string, numeric, whatever)
> arguments for the functions?
Maybe float *constants* for random exponential & gaussian.
> How about other datatypes for variables in general?
The point is not to develop another full language in pgbench. People can
do things with PL/pgSQL on the server side if it must be really advanced;
the point is really to facilitate "simple" pgbench scripts.
> Perhaps not, or if we do, that can be a separate patch, but it's
> something to keep in mind. The pgbench script language is evolving into
> a full-blown Turing-complete programming language...
The point is *NOT* to do that. For Turing completeness you would basically
need a while loop or recursion plus conditionals. Currently there is no such
thing, there is no plan for such a thing, and I do not think it is desirable.
> As an initial list of functions, I'd suggest:
> abs(x)
This is the one included in the patch.
> min(x, y, ...)
> max(x, y, ...)
Hmm, varargs...
> random_uniform(min, max)
Ok, probably just "random"?
> random_gaussian(min, max, threshold)
> random_exponential(min, max, threshold)
Hmm, threshold is a float.
> As soon as we add more functions, the way they are documented needs to
> be reworked too; we'll need to add a table in the manual to list them.
Yep.
--
Fabien.
Hello Heikki,
> As an initial list of functions, I'd suggest:
> abs(x)
> min(x, y, ...)
> max(x, y, ...)
> random_uniform(min, max)
> random_gaussian(min, max, threshold)
> random_exponential(min, max, threshold)
> Would that be enough to specify e.g. the
> As soon as we add more functions, the way they are documented needs to be
> reworked too; we'll need to add a table in the manual to list them.
Here is a v8 with "abs", "min", "max", "random", "gaussian" and
"exponential".
There is no expression typing, but expressions are evaluated and cast to
the expected type depending on what is needed (variables expect an int
value, function arguments are usually int except for thresholds...).
There is an evalDouble function to evaluate double arguments for the
gaussian & exponential thresholds.
There is a special "int" function which evaluates its argument as a double
and casts the result to an int, for testing purposes.
Variables are only int values, and functions only return ints.
There is no real doc, WIP...
Also included are a set of sql stuff I used for minimal tests.
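For instance, a script along these lines (using the v8 function names above; the exact signatures are an assumption based on this thread, not the patch text) would exercise several functions and the implicit casts at once:

```
\set size 100000 * :scale
\set aid random(1, :size)
\set gid gaussian(1, :size, 2.5)
\set lim max(1, min(:aid, abs(:aid - :gid)))
```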
--
Fabien.
Hello Heikki,
> > As soon as we add more functions, the way they are documented needs to be
> > reworked too; we'll need to add a table in the manual to list them.
> Here is a v8 with "abs", "min", "max", "random", "gaussian" and
> "exponential". [...]
> There is no real doc, WIP...
Here is a v9 with a doc. The implicit typing of expressions is improved.
I also added two debug functions which make it possible to show intermediate
integer (idebug) or double (ddebug) values.
\set i idebug(random(1, 10))
will print the random value and assign it to i.
I updated the default scripts, which seems to speed up meta command
evaluations. The initial version does less than 2 million evals per
second:
sh> cat before.sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
sh> ./pgbench -T 3 -P 1 -f before.sql
[...]
tps = 1899004.793098 (excluding connections establishing)
The updated version does more than 3 million evals per second:
sh> cat after.sql
\set aid random(1, 100000 * :scale)
sh> ./pgbench -T 3 -P 1 -f after.sql
[...]
tps = 3143168.813690 (excluding connections establishing)
Suggestion:
A possibility would be to remove altogether the \setrandom stuff as the
functionality is available through \set, maybe with an error message to
advise about using \set with one of the random functions. That would
remove about 200 ugly locs...
Another option would be to translate the \setrandom stuff into a \set
expression; that would maybe save 100 locs for the eval, but would keep
and expand a little the "manual" parser part.
--
Fabien.
Attachments:
pgbench-expr-abs-9.patch (text/x-diff, +539 -45)
On Sun, Jul 26, 2015 at 6:37 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a v9 with a doc. The implicit typing of expressions is improved.
> [...]
I have moved this patch to the next CF.
--
Michael
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed
Hello.
I installed CentOS 6.7 in VirtualBox and cloned postgres from GitHub.
I then applied the patch to postgres with the following command:
$ patch -p1 < pgbench-expr-abs-9.patch
To make sure that the patch applies to postgres, I wrote two script files.
The script files are:
$ cat before.sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
$ cat after.sql
\set aid random(1, 100000 * :scale)
Using these script files with pgbench, I ran:
$ ./pgbench -T 3 -P 1 -f before.sql
$ ./pgbench -T 3 -P 1 -f after.sql
These ran well. I also tested the functions (e.g. abs, ddebug, double, exporand, idebug, int, gaussrand, min, max, random, sqrt) mentioned in the patch; they executed without error.
Finally, I reviewed the code in the patch but did not find any errors.
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed
Hello.
I installed CentOS 6.7 in VirtualBox and cloned postgres from GitHub.
I then applied the patch to postgres with the following command:
$ patch -p1 < pgbench-expr-abs-9.patch
To make sure that the patch applies to postgres, I wrote two script files.
The script files are:
$ cat before.sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
$ cat after.sql
\set aid random(1, 100000 * :scale)
Using these script files with pgbench, I ran:
$ ./pgbench -T 3 -P 1 -f before.sql
$ ./pgbench -T 3 -P 1 -f after.sql
These ran well. I also tested the functions (e.g. abs, ddebug, double, exporand, idebug, int, gaussrand, min, max, random, sqrt) mentioned in the patch; they executed without error.
Finally, I reviewed the code in the patch but did not find any errors.
Hi, I had a look at this; it gives amazing performance. (I
myself haven't tried it yet :-p) But anyway the new syntax looks
far smarter than preparing arbitrary metacommands.
michael> I have moved this patch to the next CF.
From a quick glance at the patch, I have two simple comments.
Would you consider them?
1. evalInt and evalDouble are mutually calling on a single expr
node. It looks simple and seems to work, but could easily be broken
so as to fall into infinite calls and finally (in a moment) exceed
the stack depth.
I think it is better not to do so. For example, merging
evalDouble and evalInt into one evaluateExpr() which can return
both double and int results would do so. The interface would be
something like the following, or others. Some individual
functions might be better split out.
static ExprRetType evaluateExpr(TState, CState, PgBenchExpr,
                                int *iret, double *dret);
or
typedef struct EvalRetVal {
    bool    is_double;
    union {
        int64   ival;
        double  dval;
    }       u;
} EvalRetVal;
static bool evaluateExpr(..., EvalRetVal *ret);
2. You combined abs/sqrt/ddebug and then make a branch for
them. It'd be better to split them even if the eval*() code looks
duplicated.
regards,
--
Michael
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello Kyotaro-san,
> 1. evalInt and evalDouble are mutually calling on a single expr
> node.
Indeed.
> It looks simple and seems to work but could easily be broken
> to fall into infinite calls and finally (in a moment) exceed
> the stack depth.
The recursion is on the tree-structure of the expression, which is
evaluated depth-first. As the tree must be finite, and there is no such
thing as recursive functions in the expression tree, this should never
happen. If it happened, it would be a bug in the eval functions, probably
a forgotten "case", and could be fixed there easily.
> I think it is better not to do so. For example, merging
> evalDouble and evalInt into one evaluateExpr() which can return
> both double and int results would do so. The interface would be
> something like the following, or others. Some individual
> functions might be better split out.
>
> static ExprRetType evaluateExpr(TState, CState, PgBenchExpr,
>                                 int *iret, double *dret);
That would not work as simply as that: this signature does not say whether
a double or an int is expected, and without this information you would not
know what to do on some operators (the typing is explicit and descendant
in the current implementation).
Even if you add this information, or you use the returned type information
to do the typing dynamically, that would mean testing the result types and
implementing the necessary casts depending on this information for every
function within the generic eval, which would result in repetitive and
ugly code for every function and operator: for instance for '+', you would
have to implement 4 cases explicitly, i+i, f+i, i+f, f+f as "int add",
"cast left and double add", "cast right and double add", and finally
"double add". Then you have other operators to consider... Although some
sharing can be done, this would end in lengthy & ugly code.
A possible alternative could be to explicitly and statically type all
operations, adding the necessary casts explicitly and distinguishing
operators, but that would mean more non-obvious code to "compile" the
parsed expressions; I'm not sure that pgbench deserves this.
The two eval functions and the descendant typing make it possible to
hide/ignore these issues because each eval function handles just one type;
type boundaries are explicitly handled by calling the other function, so
there is no reason to test and cast everything all the time.
So I think that it would not be an improvement to try to go the way you
suggest.
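To make the descendant-typing argument concrete, here is a much-simplified, self-contained sketch (hypothetical node names and types, not the patch's actual code): each evaluator handles one type, and a cast appears only where the other evaluator is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, much-simplified expression nodes (not the patch's types). */
typedef enum { NODE_INT_CONST, NODE_DOUBLE_CONST, NODE_ADD } NodeType;

typedef struct Expr
{
    NodeType    type;
    int64_t     ival;           /* NODE_INT_CONST */
    double      dval;           /* NODE_DOUBLE_CONST */
    struct Expr *left, *right;  /* NODE_ADD (integer-typed in this toy) */
} Expr;

static bool evalDouble(Expr *e, double *ret);

/* Integer-context evaluation: a cast happens only at a type boundary. */
static bool
evalInt(Expr *e, int64_t *ret)
{
    switch (e->type)
    {
        case NODE_INT_CONST:
            *ret = e->ival;
            return true;
        case NODE_DOUBLE_CONST:
            {
                /* type boundary: delegate to evalDouble, cast exactly once */
                double      d;

                if (!evalDouble(e, &d))
                    return false;
                *ret = (int64_t) d;
                return true;
            }
        case NODE_ADD:
            {
                int64_t     l, r;

                if (!evalInt(e->left, &l) || !evalInt(e->right, &r))
                    return false;
                *ret = l + r;
                return true;
            }
    }
    /* unknown node type: fail explicitly instead of recursing forever */
    fprintf(stderr, "unrecognized node type: %d\n", (int) e->type);
    return false;
}

/* Double-context evaluation: integer-typed subtrees are delegated, then cast. */
static bool
evalDouble(Expr *e, double *ret)
{
    if (e->type == NODE_DOUBLE_CONST)
    {
        *ret = e->dval;
        return true;
    }
    else
    {
        /* type boundary: an int-typed subtree in a double context */
        int64_t     i;

        if (!evalInt(e, &i))
            return false;
        *ret = (double) i;
        return true;
    }
}
```

With these definitions, evaluating 2 + 3.5 in an integer context yields 5: the double constant is cast once at the boundary, and an unrecognized node type fails instead of recursing.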
> 2. You combined abs/sqrt/ddebug and then make a branch for
> them. It'd be better to split them even if eval*() looks to be
> duplicate.
Hm, why not.
Here is a v10 where I did that when it did not add too much code
repetition (e.g. I kept the shared branches for MIN & MAX and for the 3
RANDOM functions so as to avoid large copy-pastes).
--
Fabien.
> [...]
> Here is a v10 where I did that when it did not add too much code repetition
> (e.g. I kept the shared branches for MIN & MAX and for the 3 RANDOM
> functions so as to avoid large copy-pastes).
Ooops, forgotten attachment.
--
Fabien.
Attachments:
pgbench-expr-abs-10.patch (text/x-diff, +549 -45)
Hello,
At Tue, 15 Sep 2015 19:26:51 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1509151844080.6503@sto>
> > 1. evalInt and evalDouble are mutually calling on a single expr
> > node.
> Indeed.
> > It looks simple and seems to work but could easily be broken
> > to fall into infinite calls and finally (in a moment) exceed
> > the stack depth.
> The recursion is on the tree-structure of the expression, which is
> evaluated depth-first. As the tree must be finite, and there is no
> such thing as recursive functions in the expression tree, this should
> never happen. If it happened, it would be a bug in the eval functions,
> probably a forgotten "case", and could be fixed there easily.
My description must have been obscure. Indeed the call tree is
finite for a *sane* expression node, but it makes infinite calls for
a value of expr->etype unknown to both evalDouble and
evalInt. This is what I intended to express by the words 'easily
broken'. AFAICS most switches over node types in the core would
stop and emit a message like the following:
| default:
| elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
So I think the following code would do well.
| evalDouble()
| {
|     if (IS_INT_EXPR_NODE(expr->etype))
|     {
|         if (!evalInt(...))
|             return false;
|         *retval = (double) ival;
|         return true;
|     }
|
|     switch (expr->etype)
|     {
|         case ENODE_DOUBLE_CONSTANT:
|             ...
|         default:
|             elog(ERROR, "unrecognized expr type: %d", (int) expr->etype);
I suppose there's a big difference between a SEGV after running off
the bottom of the stack and the server returning an ERROR for the same
bug. But IS_INT_EXPR_NODE() would be redundant with the
switch-case..
Do you think I'm taking the difference too seriously? However, this
is a needless discussion if the following change is acceptable
to you.
> > I think it is better not to do so. For example, merging
> > evalDouble and evalInt into one evaluateExpr() which can return
> > both double and int results would do so. The interface would be
> > something like the following, or others. Some individual
> > functions might be better split out.
> >
> > static ExprRetType evaluateExpr(TState, CState, PgBenchExpr,
> >                                 int *iret, double *dret);
> That would not work as simply as that: this signature does not say
> whether a double or an int is expected, and without this information
> you would not know what to do on some operators (the typing is
> explicit and descendant in the current implementation).
> Even if you add this information, or you use the returned type
> information to do the typing dynamically, that would mean testing the
> result types and implementing the necessary casts depending on this
> information for every function within the generic eval, which would
> result in repetitive and ugly code for every function and operator:
> for instance for '+', you would have to implement 4 cases explicitly,
> i+i, f+i, i+f, f+f as "int add", "cast left and double add", "cast
> right and double add", and finally "double add". Then you have other
> operators to consider... Although some sharing can be done, this would
> end in lengthy & ugly code.
Yes, I also think it is ugly, as I anticipated :-(.
By the way, the complexity comes from separating integer and
double. If there is no serious reason to separate them, handling
all values as doubles makes things far simpler.
Could you let me know the reason why it strictly separates
integer and double?
| evaluateExpr(..., double *retval)
| {
|     switch (expr->etype)
|     {
|         case ENODE_CONSTANT:
|             *retval = expr->u.constant.val;
|             return true;
|         case ENODE_OPERATOR:
|             /* Nothing to worry about */
|         case ENODE_FUNCTION:
|         {
|             switch (func)
|             {
|                 case PGBENCH_ABS/SQRT/DEBUG:
|                     /* Nothing to worry about, too */
|                 /* case PGBENCH_DOUBLE is useless ever after */
|                 case PGBENCH_INT:
|                 {
|                     double arg;
|                     if (!evaluateExpr(..., &arg))
|                         return false;
|                     *retval = floor(arg);
I don't see any problem with possible errors of floating point
calculations for this purpose. Is there any?
Anyway, the \set meta command finally converts the result to an integer
regardless of the real value, so the following conversion does the
same as your current code.
in doCustom()
else if (pg_strcasecmp(argv[0], "set") == 0)
{
char res[64];
PgBenchExpr *expr = commands[st->state]->expr;
int64 result;
- if (!evalInt(thread, st, expr, &result))
+ if (!evaluateExpr(thread, st, expr, &dresult))
+ result = (int64)dresult;
This should make exprparse.y simpler.
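For comparison, the single-evaluator idea above can be sketched as follows (again with hypothetical names, not the patch's code): every value is carried as a double, and the final cast to int64 would happen once in \set, outside the evaluator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-type evaluator: all values are doubles. */
typedef enum { ENODE_CONSTANT, ENODE_ADD, ENODE_FUNC_INT } EType;

typedef struct Expr
{
    EType       etype;
    double      val;            /* ENODE_CONSTANT */
    struct Expr *left, *right;  /* ENODE_ADD; ENODE_FUNC_INT uses left only */
} Expr;

static bool
evaluateExpr(Expr *expr, double *retval)
{
    double      l, r;

    switch (expr->etype)
    {
        case ENODE_CONSTANT:
            *retval = expr->val;
            return true;
        case ENODE_ADD:         /* no int/double cases to distinguish */
            if (!evaluateExpr(expr->left, &l) ||
                !evaluateExpr(expr->right, &r))
                return false;
            *retval = l + r;
            return true;
        case ENODE_FUNC_INT:    /* int(x): truncate toward zero */
            if (!evaluateExpr(expr->left, &l))
                return false;
            *retval = (double) (int64_t) l;
            return true;
    }
    return false;               /* unrecognized expr type */
}
```

This is indeed simpler; the open question, raised next in the thread, is whether int64 values survive the double representation exactly.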
> A possible alternative could be to explicitly and statically type all
> operations, adding the necessary casts explicitly and distinguishing
> operators, but that would mean more non-obvious code to "compile" the
> parsed expressions; I'm not sure that pgbench deserves this.
> The two eval functions and the descendant typing make it possible to
> hide/ignore these issues because each eval function handles just one
> type; type boundaries are explicitly handled by calling the other
> function, so there is no reason to test and cast everything all the
> time.
> So I think that it would not be an improvement to try to go the way
> you suggest.
> > 2. You combined abs/sqrt/ddebug and then make a branch for
> > them. It'd be better to split them even if eval*() looks to be
> > duplicate.
> Hm, why not.
Thanks.
> Here is a v10 where I did that when it did not add too much code
> repetition (e.g. I kept the shared branches for MIN & MAX and for the 3
> RANDOM functions so as to avoid large copy-pastes).
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello Kyotaro-san,
> My description must have been obscure. Indeed the call tree is
> finite for a *sane* expression node, but it makes infinite calls for
> a value of expr->etype unknown to both evalDouble and
> evalInt.
Such an issue would be detected if the function is actually tested;
hopefully this should be the case... :-)
However I agree that relying implicitly on the "default" case is not
very good practice, so I updated the code in the attached v11 to fail
explicitly on such errors.
I also attached a small test script, which exercises most (all?)
functions:
./pgbench -f functions.sql -t 1
[...]
> By the way, the complexity comes from separating integer and
> double. If there is no serious reason to separate them, handling
> all values as doubles makes things far simpler.
Yep, but no.
> Could you let me know the reason why it strictly separates integer and
> double? I don't see any problem with possible errors of floating point
> calculations for this purpose. Is there any?
Indeed it would make things simpler, but it would break large integers as
the int64 -> double -> int64 casts would result in approximations. The
integer type is the important one here because it is used for primary
keys, and you do not want a key to be approximated in any way, so the
int64 type must be fully and exactly supported.
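That rounding is easy to demonstrate (a standalone sketch, not code from the patch): a double has a 53-bit significand, so an int64 -> double -> int64 round trip silently rounds any magnitude beyond 2^53.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trip an int64 through double, as an all-doubles evaluator would. */
static int64_t
roundtrip_via_double(int64_t v)
{
    return (int64_t) (double) v;
}
```

For example, (2^53 - 1) survives the round trip exactly, while (2^60 + 1) comes back as 2^60: the low bit of a large key is silently lost.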
--
Fabien.