WIP: hooking parser
Hello
some years ago there was some plans about parser's extensibility. I am
able write bison extensions, but I thing, so lot of work should be
done via hooking of transform stage.
I did small example - real implementation of Oracle's decode function.
It's based on hooking transformExpr function.
It works. And I thing, this should to solve lot of special task
related to increase compatibility with Oracle, Informix, or it could
be usefull for some others (json support).
postgres=# load 'decode';
LOAD
postgres=# select
decode(null::integer,2,'ahoj',3,'Petr',1,'Pavel',null, 'jaja');
decode
--------
jaja
(1 row)
postgres=# select decode(3,2,'ahoj',3,'Petr',1,'Pavel',null, 'jaja');
decode
--------
Petr
(1 row)
postgres=# select decode(6,2,'ahoj',3,'Petr',1,'Pavel',null, 'jaja');
decode
--------
(1 row)
postgres=# select decode(6,2,'ahoj',3,'Petr',1,'Pavel',null, 'jaja', 'Milos');
decode
--------
Milos
(1 row)
Any ideas, notes?
regards
Pavel Stehule
Pavel Stehule <pavel.stehule@gmail.com> writes:
some years ago there was some plans about parser's extensibility. I am
able write bison extensions, but I thing, so lot of work should be
done via hooking of transform stage.
This strikes me as next door to useless, because it can only handle
things that look like valid expressions to the existing grammar.
So pretty much all you can do is weird sorts of functions, which are
already accommodated at less effort with existing features such as
function overloading.
A hook check in that particular place is not going to have negligible
performance impact, since it's going to be hit tens or hundreds or
thousands of times per query rather than just once. So it's going to
require more than a marginal use case to persuade me we ought to have
it.
regards, tom lane
2009/2/11 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
some years ago there was some plans about parser's extensibility. I am
able write bison extensions, but I thing, so lot of work should be
done via hooking of transform stage.This strikes me as next door to useless, because it can only handle
things that look like valid expressions to the existing grammar.
So pretty much all you can do is weird sorts of functions, which are
already accommodated at less effort with existing features such as
function overloading.
Usually we don't need change syntax. But we need to control of
coercion stage. I afraid so function overloading is bad when there lot
of combination, and polymorphic functions are not enough.
for some cases we need more polymorphic types - anyelement1,
anyelement2, anyarray1, ...
A hook check in that particular place is not going to have negligible
performance impact, since it's going to be hit tens or hundreds or
thousands of times per query rather than just once. So it's going to
require more than a marginal use case to persuade me we ought to have
it.
Because this stage isn't repeated (I don't expect bigger performance
impact), it's similar to other's hooks. But, sure, wrong hook should
do strange things. It's risk.
+ argument - it increase customisability and allows gentle syntax
tuning. Function decode is first sample from today morning.
regards
Pavel Stehule
Show quoted text
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
2009/2/11 Tom Lane <tgl@sss.pgh.pa.us>:
This strikes me as next door to useless, because it can only handle
things that look like valid expressions to the existing grammar.
So pretty much all you can do is weird sorts of functions, which are
already accommodated at less effort with existing features such as
function overloading.
Usually we don't need change syntax. But we need to control of
coercion stage. I afraid so function overloading is bad when there lot
of combination, and polymorphic functions are not enough.
for some cases we need more polymorphic types - anyelement1,
anyelement2, anyarray1, ...
Well, then we should go fix those things.
A hook function whose purpose is to fundamentally change query semantics
strikes me as a very dangerous thing anyway, because your queries either
stop working or suddenly do something completely different if the hook
happens not to be loaded. The hooks we've accepted to date are intended
for either monitoring or experimentation with planner behavior, neither
of which will change query semantics.
regards, tom lane
2009/2/11 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2009/2/11 Tom Lane <tgl@sss.pgh.pa.us>:
This strikes me as next door to useless, because it can only handle
things that look like valid expressions to the existing grammar.
So pretty much all you can do is weird sorts of functions, which are
already accommodated at less effort with existing features such as
function overloading.Usually we don't need change syntax. But we need to control of
coercion stage. I afraid so function overloading is bad when there lot
of combination, and polymorphic functions are not enough.
for some cases we need more polymorphic types - anyelement1,
anyelement2, anyarray1, ...Well, then we should go fix those things.
I am for it, and I doing on it.
A hook function whose purpose is to fundamentally change query semantics
strikes me as a very dangerous thing anyway, because your queries either
stop working or suddenly do something completely different if the hook
happens not to be loaded. The hooks we've accepted to date are intended
for either monitoring or experimentation with planner behavior, neither
of which will change query semantics.
I agree, and I understand well this risk. But still it is better and
wide used than custom patching. Look on executor hook. There are only
three cases - useful cases. It is some corner, that is far for general
using (integrating into core) and too sugar for ignore it for ever.
It's possibility, nothing less, nothing more.
regards
Pavel Stehule
Show quoted text
regards, tom lane
On Wednesday 11 February 2009 12:05:03 Pavel Stehule wrote:
It works. And I thing, this should to solve lot of special task
related to increase compatibility with Oracle, Informix, or it could
be usefull for some others (json support).postgres=# load 'decode';
LOAD
postgres=# select
decode(null::integer,2,'ahoj',3,'Petr',1,'Pavel',null, 'jaja');
decode
--------
jaja
(1 row)
I think what you want here is some way to define a function that takes an
arbitrary number of arguments of arbitrary type and let the function figure
everything out. I see no reason why this can't be a variant on CREATE
FUNCTION, except that of course you need to figure out some API and function
resolution details. But it doesn't have to be a completely different concept
like a binary plugin.
2009/2/12 Peter Eisentraut <peter_e@gmx.net>:
On Wednesday 11 February 2009 12:05:03 Pavel Stehule wrote:
It works. And I thing, this should to solve lot of special task
related to increase compatibility with Oracle, Informix, or it could
be usefull for some others (json support).postgres=# load 'decode';
LOAD
postgres=# select
decode(null::integer,2,'ahoj',3,'Petr',1,'Pavel',null, 'jaja');
decode
--------
jaja
(1 row)I think what you want here is some way to define a function that takes an
arbitrary number of arguments of arbitrary type and let the function figure
everything out. I see no reason why this can't be a variant on CREATE
FUNCTION, except that of course you need to figure out some API and function
resolution details. But it doesn't have to be a completely different concept
like a binary plugin.
Actually I need add some metada to parameter list, Question is, what
is more simple and more readable - descriptive or procedural solution.
And what we are able to implement.
example DECODE(any1, any2, (asany1, asany2).,(asany2)+)
Actually I thing so with some hook of parser transform stage we should
to this task more simply.
I found next sample, that should be solved via hook - emulation of
Oracle behave '' is null.
Regards
Pavel
Peter Eisentraut <peter_e@gmx.net> writes:
I think what you want here is some way to define a function that takes an
arbitrary number of arguments of arbitrary type and let the function figure
everything out. I see no reason why this can't be a variant on CREATE
FUNCTION, except that of course you need to figure out some API and function
resolution details.
We've already got "variadic any" functions --- the problem is to tell
the parser what the function's result type will be, given a particular
parameter list. I agree that hooking transformExpr is not exactly the
most ideal way to attack that from a performance or complexity
standpoint.
regards, tom lane
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I think what you want here is some way to define a function that takes an
arbitrary number of arguments of arbitrary type and let the function figure
everything out. I see no reason why this can't be a variant on CREATE
FUNCTION, except that of course you need to figure out some API and function
resolution details.We've already got "variadic any" functions --- the problem is to tell
the parser what the function's result type will be, given a particular
parameter list. I agree that hooking transformExpr is not exactly the
most ideal way to attack that from a performance or complexity
standpoint.
What is the defined return type logic for the decode() function anyway?
If you want the full CASE-like resolution logic, it might be very hard
to fit that into a general system.
Peter Eisentraut wrote:
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I think what you want here is some way to define a function that
takes an arbitrary number of arguments of arbitrary type and let the
function figure everything out. I see no reason why this can't be a
variant on CREATE FUNCTION, except that of course you need to figure
out some API and function resolution details.We've already got "variadic any" functions --- the problem is to tell
the parser what the function's result type will be, given a particular
parameter list. I agree that hooking transformExpr is not exactly the
most ideal way to attack that from a performance or complexity
standpoint.What is the defined return type logic for the decode() function anyway?
If you want the full CASE-like resolution logic, it might be very hard
to fit that into a general system.
And on top of that, decode() is supposed to do short-circuit evaluation
of the arguments.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
2009/2/13 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Peter Eisentraut wrote:
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I think what you want here is some way to define a function that takes
an arbitrary number of arguments of arbitrary type and let the function
figure everything out. I see no reason why this can't be a variant on
CREATE FUNCTION, except that of course you need to figure out some API and
function resolution details.We've already got "variadic any" functions --- the problem is to tell
the parser what the function's result type will be, given a particular
parameter list. I agree that hooking transformExpr is not exactly the
most ideal way to attack that from a performance or complexity
standpoint.What is the defined return type logic for the decode() function anyway?
If you want the full CASE-like resolution logic, it might be very hard to
fit that into a general system.And on top of that, decode() is supposed to do short-circuit evaluation of
the arguments.
yes, you should to look so this work do transform hook very vell
regards
Pavel
Show quoted text
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
And on top of that, decode() is supposed to do short-circuit evaluation
of the arguments.
Then the only solution is to hack it right into the parser.
There is an existing decode() function however ...
Next sample of parser hook using:
attachment contains module that transform every empty string to null.
I am not sure, if this behave is exactly compatible with Oracle, but
for first iteration it is good.
postgres=# select length('') is null;
?column?
----------
t
(1 row)
I thing, so this should be used for emulation of some constructors too.
Regards
Pavel Stehule
2009/2/13 Peter Eisentraut <peter_e@gmx.net>:
Show quoted text
Heikki Linnakangas wrote:
And on top of that, decode() is supposed to do short-circuit evaluation of
the arguments.Then the only solution is to hack it right into the parser.
There is an existing decode() function however ...
Attachments:
null.ctext/plain; charset=US-ASCII; name=null.cDownload
On Mon, Feb 16, 2009 at 02:35:54PM +0100, Pavel Stehule wrote:
attachment contains module that transform every empty string to null.
Why would anyone ever want to do this? This would appear to break all
sorts of things in very non-obvious ways:
SELECT CASE s WHEN '' THEN 'empty string' ELSE s END FROM foo;
UPDATE foo SET s = NULL WHERE s = '';
would no longer do the expected thing. It would only do the expected
thing (in my eyes) when strings of zero length were actually being
inserted into the database. Like:
INSERT INTO foo (s) VALUES ('');
UPDATE foo SET s = '' WHERE s = 'empty string';
Or am I missing something obvious?
--
Sam http://samason.me.uk/
2009/2/16 Sam Mason <sam@samason.me.uk>:
On Mon, Feb 16, 2009 at 02:35:54PM +0100, Pavel Stehule wrote:
attachment contains module that transform every empty string to null.
Why would anyone ever want to do this? This would appear to break all
sorts of things in very non-obvious ways:
I agree, so this behave is strange - but Oracle does it.
so normal query in Oracle for empty value looks like
select * from people where surname is null;
and some application expect transformation from '' to null.
http://www.thunderguy.com/semicolon/2003/04/26/oracle-empty-string-null/
so these modules (decode, oraemptystr) decrease differences between
PostgreSQL and Oracle.
Regards
Pavel Stehule
p.s. I am not Oracle expert, I expect so here are more qualified men.
Show quoted text
SELECT CASE s WHEN '' THEN 'empty string' ELSE s END FROM foo;
UPDATE foo SET s = NULL WHERE s = '';would no longer do the expected thing. It would only do the expected
thing (in my eyes) when strings of zero length were actually being
inserted into the database. Like:INSERT INTO foo (s) VALUES ('');
UPDATE foo SET s = '' WHERE s = 'empty string';Or am I missing something obvious?
--
Sam http://samason.me.uk/--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Feb 16, 2009 at 03:21:12PM +0100, Pavel Stehule wrote:
2009/2/16 Sam Mason <sam@samason.me.uk>:
On Mon, Feb 16, 2009 at 02:35:54PM +0100, Pavel Stehule wrote:
attachment contains module that transform every empty string to null.
Why would anyone ever want to do this? This would appear to break all
sorts of things in very non-obvious ways:I agree, so this behave is strange - but Oracle does it.
so normal query in Oracle for empty value looks like
select * from people where surname is null;
and some application expect transformation from '' to null.
http://www.thunderguy.com/semicolon/2003/04/26/oracle-empty-string-null/
that's pretty grim!
<rant>
I'd agree with the comment saying "A string variable that can't be
set empty is like a number variable that can't be set zero".
Oracle have just thrown out (or, at best, rewritten) the inductive
base case for strings. For numbers you (logically) start with a Zero
and a Succ (successor function) and model numbers as an arbitrary
number of applications of Succ to Zero ("2" is (Succ (Succ Zero))).
For strings, you start with an empty string and an append function
("hi" being (Append (Append '' \h) \i)).
</rant>
so these modules (decode, oraemptystr) decrease differences between
PostgreSQL and Oracle.
wouldn't it be better/easier to extend something like pgpool to
transform Oracle style SQL code to PG style code? You'd certainly
be able to get it more complete in reasonable amounts of time, but
performance would suffer when you went to look up table definitions to
check the types of various things.
--
Sam http://samason.me.uk/
Pavel Stehule <pavel.stehule@gmail.com> writes:
Next sample of parser hook using:
attachment contains module that transform every empty string to null.
I am not sure, if this behave is exactly compatible with Oracle,
Surely a parser hook like this would have nothing whatsoever to do
with Oracle's behavior.
regards, tom lane
2009/2/16 Sam Mason <sam@samason.me.uk>:
On Mon, Feb 16, 2009 at 03:21:12PM +0100, Pavel Stehule wrote:
2009/2/16 Sam Mason <sam@samason.me.uk>:
On Mon, Feb 16, 2009 at 02:35:54PM +0100, Pavel Stehule wrote:
attachment contains module that transform every empty string to null.
Why would anyone ever want to do this? This would appear to break all
sorts of things in very non-obvious ways:I agree, so this behave is strange - but Oracle does it.
so normal query in Oracle for empty value looks like
select * from people where surname is null;
and some application expect transformation from '' to null.
http://www.thunderguy.com/semicolon/2003/04/26/oracle-empty-string-null/
that's pretty grim!
<rant>
I'd agree with the comment saying "A string variable that can't be
set empty is like a number variable that can't be set zero".Oracle have just thrown out (or, at best, rewritten) the inductive
base case for strings. For numbers you (logically) start with a Zero
and a Succ (successor function) and model numbers as an arbitrary
number of applications of Succ to Zero ("2" is (Succ (Succ Zero))).
For strings, you start with an empty string and an append function
("hi" being (Append (Append '' \h) \i)).
</rant>so these modules (decode, oraemptystr) decrease differences between
PostgreSQL and Oracle.wouldn't it be better/easier to extend something like pgpool to
transform Oracle style SQL code to PG style code? You'd certainly
be able to get it more complete in reasonable amounts of time, but
performance would suffer when you went to look up table definitions to
check the types of various things.
then you should to rewrite complete PostgreSQL parser :) and
performance will be worse (you have to parse query string two times).
For this transformation you need query's semantic tree and access to
dictionary (some caches) . Lot of things should by done via
extensibility features of PostgreSQL. Sure - you can do this things
difficult outside of PostgreSQL or simply via parser's hook.
These samples are only for Oracle. But I am sure, so this technique
should be used for different databases too. Example. Informix uses
convention for named params like paramname = value. PostgreSQL 8.5
will use syntax paramname AS value. So you need change app. code. With
hook I am able transform transparently Informix syntax to PostgreSQL
syntax without significant increase of load or complexity.
regards
Pavel Stehule
Show quoted text
--
Sam http://samason.me.uk/--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Feb 16, 2009 at 04:40:23PM +0100, Pavel Stehule wrote:
2009/2/16 Sam Mason <sam@samason.me.uk>:
On Mon, Feb 16, 2009 at 03:21:12PM +0100, Pavel Stehule wrote:
so these modules (decode, oraemptystr) decrease differences between
PostgreSQL and Oracle.wouldn't it be better/easier to extend something like pgpool to
transform Oracle style SQL code to PG style code? You'd certainly
be able to get it more complete in reasonable amounts of time, but
performance would suffer when you went to look up table definitions to
check the types of various things.then you should to rewrite complete PostgreSQL parser :) and
performance will be worse (you have to parse query string two times).
Yes, there'd be a few thousand lines of code to write.
Note that you only need to parse things twice, planning only needs to be
done by PG, so it shouldn't be too bad. It'll add maybe a millisecond
or so to query execution times, with most of that time spent going off
to find table and function definitions from the real database.
For this transformation you need query's semantic tree and access to
dictionary (some caches) . Lot of things should by done via
extensibility features of PostgreSQL. Sure - you can do this things
difficult outside of PostgreSQL or simply via parser's hook.
But to do it properly inside PG would be difficult; how would your hooks
know to transform:
SELECT s FROM foo WHERE s IS NULL;
into:
SELECT s FROM foo WHERE (s = '' OR s IS NULL);
that all looks a bit tricky to me. Hum... actually it's not. All you
need to do is to rewrite any string reference "s" into NULLIF(s,'').
That would tank performance as indexes wouldn't be used most of the
time, but never mind.
These samples are only for Oracle. But I am sure, so this technique
should be used for different databases too. Example. Informix uses
convention for named params like paramname = value. PostgreSQL 8.5
will use syntax paramname AS value. So you need change app. code. With
hook I am able transform transparently Informix syntax to PostgreSQL
syntax without significant increase of load or complexity.
That would be a *much* bigger change; you're actually changing PG's
parser there and not just modifying the parse tree. If it was done
externally it would be a much easier thing to do.
--
Sam http://samason.me.uk/
2009/2/16 Sam Mason <sam@samason.me.uk>:
On Mon, Feb 16, 2009 at 04:40:23PM +0100, Pavel Stehule wrote:
2009/2/16 Sam Mason <sam@samason.me.uk>:
On Mon, Feb 16, 2009 at 03:21:12PM +0100, Pavel Stehule wrote:
so these modules (decode, oraemptystr) decrease differences between
PostgreSQL and Oracle.wouldn't it be better/easier to extend something like pgpool to
transform Oracle style SQL code to PG style code? You'd certainly
be able to get it more complete in reasonable amounts of time, but
performance would suffer when you went to look up table definitions to
check the types of various things.then you should to rewrite complete PostgreSQL parser :) and
performance will be worse (you have to parse query string two times).Yes, there'd be a few thousand lines of code to write.
Note that you only need to parse things twice, planning only needs to be
done by PG, so it shouldn't be too bad. It'll add maybe a millisecond
or so to query execution times, with most of that time spent going off
to find table and function definitions from the real database.
Hello
But to do it properly inside PG would be difficult; how would your hooks
know to transform:SELECT s FROM foo WHERE s IS NULL;
into:
SELECT s FROM foo WHERE (s = '' OR s IS NULL);
I don't need it. Oracle store NULL without ''. So expression some IS
NULL is stable.
that all looks a bit tricky to me. Hum... actually it's not. All you
need to do is to rewrite any string reference "s" into NULLIF(s,'').
That would tank performance as indexes wouldn't be used most of the
time, but never mind.
look to source what I do. It' just simple. But you have to emulate
Oracle behave everywhere. Then all is simple, because Oracle doesn't
know ''.
That would be a *much* bigger change; you're actually changing PG's
parser there and not just modifying the parse tree. If it was done
externally it would be a much easier thing to do.
No I don't do it. Loadable modules are really external. I need only
hook inside parser.
Regards
Pavel Stehule
Show quoted text
--
Sam http://samason.me.uk/--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers