patch: plpgsql - access records with rec.(expr)
Hi,
I got extremely frustrated with having to create a temp table every time
I wanted to access an arbitrary column from a record plpgsql. After
seeing a note on the developers TODO about accessing plpgsql records
with a 'dot bracket' notation I started digging into the plpgsql source.
My diff (against 8beta4) is attached.
Warning: I Am Not a C Programmer! I haven't even written a hello world
in C before, and I knew nothing about Flex before yesterday. It was fun
figuring stuff out, I'm amazed it mostly works, but I'm really hoping
someone can point out my mistakes.
Goal:
Enable users to access fields in record variables using the following
syntax like the following:
rec.(1)
rec.('foo')
rec.(myvar::int)
rec.(myvar || '_id')
Files changed:
plpgsql.h
- added 'expr' member to PLpgSQL_recfield type for holding the
PLpgSQL_expr structure.
scan.l
- added match for {identifier}{space}*\. AFAIK this should only match
if a longer expression doesn't?
pl_comp.c
- added plpgsql_parse_wordexpr() function called by above match. Ripped
off code from plpgsql_parse_word that deals with arg_v[expr] to find our
expression. Prob a dumb name for the function!
pl_exec.c
- hacked exec_assign_value() and exec_eval_datum() to use the expression
to get the field name/number.
Stuff I need help with:
1. It should recognise OLD.(1) as a field number, not a column name. I
think I've got to check the returned type from exec_eval_expr() then
exec_simple_cast_value() on it, but that seems beyond me.
2. Freeing stuff. As I explained, this is all pretty new to me, and the
comments about it around exec_eval_expr() et al just confused me :(
Please give some hints about what needs freeing!
3. Would probably be good to add check in pl_comp.c to see if the
expression actually needs to be evaluated at runtime (ie isn't just a
field name/number). How?
4. Make this also work for row.(expr), label.record.(expr) and
label.row.(expr) - but want to get the basics working first!
5. Because of the way the expression is parsed (looking for closing
parenth), this will choke if you try and put a function in there. Would
it be better to use curly braces '{expr}' or another character to mark
the expression?
I hope at this eventually leads to some really useful extra
functionality, particularly for writing generic triggers. And it's a
tribute to the existing code that a complete newbie can cut-and-paste
their way to a halfarsed solution in a (rather long) night!
Regards,
Matt
Attachments:
record_dot_parenth.difftext/x-patch; charset=UTF-8; name=record_dot_parenth.diffDownload+167-25
5. Because of the way the expression is parsed (looking for closing
parenth), this will choke if you try and put a function in there. Would
it be better to use curly braces '{expr}' or another character to mark
the expression?
I lie! pgpgsql_read_expression() is smarter than that!
However, I do have another problem. If the value of the expr changes
inside a loop to a fieldname of a different type, it dies with the "type
of \"%s\" does not match that when preparing the plan" message, which is
quite true: it obviously doesn't.
Just setting expectedtypeoid to InvalidOid bombs the whole thing :(
Hrm.... the "best made plans" and all that...
Matt
Hi all,
I've cleaned this up a bit so that assigning to a dynamic record field
now works - ie NEW.(myvar) := 'someval' - and accessing a field by
number works - ie myvar := OLD.(1)
It still doesn't work in a loop if the type of field referenced by the
expression changes - looking at it more I'm really not sure this is
possible, but that's a limitation I could live with. I'll also try
figuring out freeing stuff after casting values over the weekend.
But is this a worthwhile approach? If there are objections, please let
me know!
For myself, being able to pass a column name as an argument to a trigger
would make writing generic triggers a whole lot easier. And going back
through the archives on this list I see I'm not alone.
Regards,
Matt
Show quoted text
On Thu, 2004-11-18 at 13:18, Matt wrote:
Hi,
I got extremely frustrated with having to create a temp table every time
I wanted to access an arbitrary column from a record plpgsql. After
seeing a note on the developers TODO about accessing plpgsql records
with a 'dot bracket' notation I started digging into the plpgsql source.My diff (against 8beta4) is attached.
Warning: I Am Not a C Programmer! I haven't even written a hello world
in C before, and I knew nothing about Flex before yesterday. It was fun
figuring stuff out, I'm amazed it mostly works, but I'm really hoping
someone can point out my mistakes.Goal:
Enable users to access fields in record variables using the following
syntax like the following:
rec.(1)
rec.('foo')
rec.(myvar::int)
rec.(myvar || '_id')Files changed:
plpgsql.h
- added 'expr' member to PLpgSQL_recfield type for holding the
PLpgSQL_expr structure.scan.l
- added match for {identifier}{space}*\. AFAIK this should only match
if a longer expression doesn't?pl_comp.c
- added plpgsql_parse_wordexpr() function called by above match. Ripped
off code from plpgsql_parse_word that deals with arg_v[expr] to find our
expression. Prob a dumb name for the function!pl_exec.c
- hacked exec_assign_value() and exec_eval_datum() to use the expression
to get the field name/number.Stuff I need help with:
1. It should recognise OLD.(1) as a field number, not a column name. I
think I've got to check the returned type from exec_eval_expr() then
exec_simple_cast_value() on it, but that seems beyond me.2. Freeing stuff. As I explained, this is all pretty new to me, and the
comments about it around exec_eval_expr() et al just confused me :(
Please give some hints about what needs freeing!3. Would probably be good to add check in pl_comp.c to see if the
expression actually needs to be evaluated at runtime (ie isn't just a
field name/number). How?4. Make this also work for row.(expr), label.record.(expr) and
label.row.(expr) - but want to get the basics working first!5. Because of the way the expression is parsed (looking for closing
parenth), this will choke if you try and put a function in there. Would
it be better to use curly braces '{expr}' or another character to mark
the expression?I hope at this eventually leads to some really useful extra
functionality, particularly for writing generic triggers. And it's a
tribute to the existing code that a complete newbie can cut-and-paste
their way to a halfarsed solution in a (rather long) night!Regards,
Matt
______________________________________________________________________
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Attachments:
record_dot_parenth.difftext/x-patch; charset=UTF-8; name=record_dot_parenth.diffDownload+205-24
On Thu, 2004-11-18 at 13:18 +0000, Matt wrote:
I got extremely frustrated with having to create a temp table every time
I wanted to access an arbitrary column from a record plpgsql.
FYI, one thing I want to implement is an EVALUATE statement in plpgsql
(analogous to eval() in Perl, for example). If I understand your use
case, I think this will help somewhat, although of course it is still
clumsier than direct syntactic support.
Warning: I Am Not a C Programmer! I haven't even written a hello world
in C before, and I knew nothing about Flex before yesterday. It was fun
figuring stuff out, I'm amazed it mostly works, but I'm really hoping
someone can point out my mistakes.
An impressive first hack! :)
Enable users to access fields in record variables using the following
syntax like the following [...]
rec.(1)
Looks good.
rec.('foo')
I don't like this: it implicitly coerces a string literal into an
identifier (i.e. a column name). Treating data as code can be useful,
but I think we need to make it more obvious to the user. I think a
proper EVALUATE statement might be a better solution.
5. Because of the way the expression is parsed (looking for closing
parenth), this will choke if you try and put a function in there. Would
it be better to use curly braces '{expr}' or another character to mark
the expression?
How much thought went into choosing parentheses? (i.e. is a similar
syntax used in the procedural languages in other DBs?)
In any case, I don't think switching switching to braces is the right
solution to the lexing problem. I _believe_ the patch is OK as is --
plpgsql_read_expression() keeps track of parenthesis nesting so it
should be able to figure out when it sees an "unmatched" parenthesis
(indicating the end of the expression).
BTW, attached is a trivial incremental patch that fixes a few compile
warnings (which are indicative of actual problems) in your patch.
-Neil
Attachments:
record_dot_improv.patchtext/x-patch; charset=ISO-8859-1; name=record_dot_improv.patchDownload+6-6
Hi Neil,
Thanks for the comments. I've actually got (yet) another version ready
to go, which fixes the compile warnings and adds some sanity checks.
I'll post it as soon as I've got beta5 downloaded and tried out :)
FYI, one thing I want to implement is an EVALUATE statement in plpgsql
(analogous to eval() in Perl, for example). If I understand your use
case, I think this will help somewhat, although of course it is still
clumsier than direct syntactic support.
This would execute a string and pass back the result? I'm sure I'll find
a use for it at some point :)
rec.('foo')
I don't like this: it implicitly coerces a string literal into an
identifier (i.e. a column name). Treating data as code can be useful,
but I think we need to make it more obvious to the user. I think a
proper EVALUATE statement might be a better solution.
See your point. But what about NEW.($1)?
5. Because of the way the expression is parsed (looking for closing
parenth), this will choke if you try and put a function in there. Would
it be better to use curly braces '{expr}' or another character to mark
the expression?How much thought went into choosing parentheses? (i.e. is a similar
syntax used in the procedural languages in other DBs?)
Only used them because of the small note I saw on the developer's TODO
about accessing cols by ordinal - the example there was rec.(1). But I
was wrong about functions not working there - plpgsql_read_expression()
is smarter than that, as you say.
OK, download is done. I've got some more general ideas which relate to
this. I'll post along with updated version.
Matt
Hi,
Updated patch (now against beta5) attached. It now pfree's any converted
strings, avoids pointlessly casting an int4oid to int4oid, complies to
CS (uses tabs, not spaces) and works with label.record.(expression) too.
I'm still testing, it now does what I set out to achieve.
I haven't done anything to implement the same thing for row types. I'm
wondering if there's any point. The person writing the function is
implicitly going to know the structure of a row; the only use I can
foresee is iterating over the columns by ordinal, but that isn't
possible (at present, see below).
Neil's raised a coupe of good points:
- is the parenthesis format the right way to go? How do other sp
languages do it?
- if we have a more genaralised EVALUATE statement, do we still need
this?
I'm not at all fixated on the parenthesis idea. If there's a better way,
great. But if this stands a chance of making it in to the language one
day, it'd be good to know so I can start building stuff that uses the
format.
One thought I had about the format would be to use a kind of
javascript-ish syntax:
rec.item(expr) - as above
This might later lend itself to other record-specific functions:
rec.length() - returns number of columns
rec.getColType(expr) - returns oid of given column
rec.getColNumber(expr) - returns ordinal of given column
I can't see a crying need for any of these, mind... but length() would
be useful _if_ you could iterate over columns, so:
Loops:
The issue with loops is a nuisance - if the underlying column type
changes, my stuff throws a data type mismatch exception. I imagine this
could affect an EVALUATE expression inside a loop, too (or would their
plans never be saved, like EXECUTE?).
I might be totally wrong, but here's what I see:
To stop the mismatch you'd have to stop exec_eval_expr() saving the plan
for that expression. A flag in the expression struct could do that,
though you'd want to add a check that the function was defined as
volatile before allowing it.
The flag could be set during the function's compilation stage, but then
all functions using the rec.(expr) format anywhere would have to be
volatile.
A better approach would be to walk through the function's statement tree
just before execution, checking for troublesome expressions appearing
inside loops (and whose arguments are reassigned inside the loop?) and
marking them as dynamic.
Does that make any sense? Is it worth the work? Or should we just tell
anyone who actually needs it (I don't, at present) 'use another PL'?
Regards,
Matt
Attachments:
record_dot_parenth.difftext/x-patch; charset=UTF-8Download+294-14
Neil Conway <neilc@samurai.com> writes:
FYI, one thing I want to implement is an EVALUATE statement in plpgsql
(analogous to eval() in Perl, for example).
I'm confused. How/why is this different from EXECUTE?
regards, tom lane
Matt <matt@kynx.org> writes:
Does that make any sense? Is it worth the work? Or should we just tell
anyone who actually needs it (I don't, at present) 'use another PL'?
I don't really see this going anywhere --- it's contorting the semantics
of plpgsql too much for too little gain. The typical use-case I've
heard of for this sort of thing is "I want to write a generic trigger,
so I need to iterate over the columns of NEW.* without knowing their
names or data types in advance". Your proposal doesn't address the
issue of how the function would find out the column names in order to
make use of the proposed notation; and as you noted there's still a
serious problem with varying datatypes.
Either plperl or pltcl (probably also plpython but I'm not familiar
with that language) is a better choice for writing such triggers,
because those languages already have answers for all these issues.
regards, tom lane
Hi Tom,
Does that make any sense? Is it worth the work? Or should we just tell
anyone who actually needs it (I don't, at present) 'use another PL'?I don't really see this going anywhere --- it's contorting the semantics
of plpgsql too much for too little gain.
Yup, the last bit was definitely contortionist! Prob should have split
it into two separate posts.
What I'm most looking for comments on what's actually in the patch: with
it I can use NEW.($1). _That_ is a big gain.
On the rest...
Your proposal doesn't address the
issue of how the function would find out the column names in order to
make use of the proposed notation;
It doesn't need them. The posted patch already allows:
for i in 1..3 loop
myvar := rec.(i);
end loop;
...so long as all cols have the same datatype (note that col numbers are
1-based, like in the spi funcs).
and as you noted there's still a
serious problem with varying datatypes.
My contortions were an attempt to think my way around the varying
datatypes problem. Obviously my thinking got tied in knots ;) I agree,
that part is too much work for too little gain.
Either plperl or pltcl (probably also plpython but I'm not familiar
with that language) is a better choice for writing such triggers,
because those languages already have answers for all these issues.
I didn't think plperl could be used to write triggers. Is this new in 8?
As an application developer, though, it'd be nice to keep everything in
one pl language, and ideally one guaranteed available (or easily
installable without other dependencies) on every postgres db out there.
That's pretty much why I wrote the patch: to add the one missing feature
I need to write my generic triggers in plpgsql.
Regards,
Matt
On Mon, 2004-11-22 at 10:57 -0500, Tom Lane wrote:
I'm confused. How/why is this different from EXECUTE?
EVALUATE would take a string and evaluate it as a PL/PgSQL statement;
EXECUTE takes a string and executes it as a SQL statement. We've
discussed this before (although I may not have called it "EVALUATE" at
the time):
http://archives.postgresql.org/pgsql-bugs/2004-10/msg00005.php
-Neil
On Mon, 2004-11-22 at 10:06 +0000, Matt wrote:
This would execute a string and pass back the result?
It would evaluate a string as a PL/PgSQL statement (which means you
could construct any PL/PgSQL statement dynamically, including access to
fields of a RECORD determined at runtime).
I don't like this: it implicitly coerces a string literal into an
identifier (i.e. a column name). Treating data as code can be useful,
but I think we need to make it more obvious to the user. I think a
proper EVALUATE statement might be a better solution.See your point. But what about NEW.($1)?
I don't follow -- what do you mean?
(BTW, I think my comment also applies to variables of type "text" and
similar -- I think the patch would be a lot simpler if you just
implement access to record fields by ordinal position, and don't
implement access by field name.)
-Neil
See your point. But what about NEW.($1)?
I don't follow -- what do you mean?
I want to be able to be able to write a trigger function that accesses a
column passed as an argument to the function in the row that caused the
trigger. This is my use case.
I guess that would actually written NEW.(TG_ARGV[1]).
(BTW, I think my comment also applies to variables of type "text" and
similar -- I think the patch would be a lot simpler if you just
implement access to record fields by ordinal position, and don't
implement access by field name.)
Yes, it would be marginally simpler: I'd still have to call
exec_eval_datum() on the variable and check whether it could be
evaluated to an integer (trigger args are all text AFAIK). The only
difference would be throwing an error if it wasn't, instead of making
use of the value... and a slightly less readable 'create trigger'
statement.
It would be a good idea to check that the variable was either a constant
or a trigger arg. This would stop the looping problem, since the type of
the underlying field couldn't change.
But I've somehow got the feeling that this sort of thing isn't the
issue. The issue is whether we want to allow dynamic access to columns
in any syntax at all. A simple yes or no would do :)
Matt
BTW: here's the original post adding the rec.(3) syntax to the TODO:
http://archives.postgresql.org/pgsql-hackers/2003-07/msg00425.php
here's someone else who tried something very similar:
http://archives.postgresql.org/pgsql-hackers/2003-09/msg00533.php
Just to put in my .02$, I would absolutely love to see this
functionality included in plpgsql. With some extra error checking for
the know changing datatype failure, and docs that mention that
limitation, I'd say this is a great extension to the language.
plpgsql feels quicker than the interpreted PLs and it's far easier
than C to work with for writing triggers, so this patch makes plpgsql
a much more attractive target for general purpose stored procs. And
my gut feeling is that an EVALUATE statement would be significantly
slower than this.
In any case, thanks for the great work, Matt. Please, CORE, include this one!
As an alternative, what would be the possibility of creating a new PL
as a contrib module, say PLPGSQL_NG, to move forward with extensions
like this and perhaps EVALUATE?
--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
Show quoted text
On 23 Nov 2004 09:03:10 +0000, Matt <matt@kynx.org> wrote:
See your point. But what about NEW.($1)?
I don't follow -- what do you mean?
I want to be able to be able to write a trigger function that accesses a
column passed as an argument to the function in the row that caused the
trigger. This is my use case.I guess that would actually written NEW.(TG_ARGV[1]).
(BTW, I think my comment also applies to variables of type "text" and
similar -- I think the patch would be a lot simpler if you just
implement access to record fields by ordinal position, and don't
implement access by field name.)Yes, it would be marginally simpler: I'd still have to call
exec_eval_datum() on the variable and check whether it could be
evaluated to an integer (trigger args are all text AFAIK). The only
difference would be throwing an error if it wasn't, instead of making
use of the value... and a slightly less readable 'create trigger'
statement.It would be a good idea to check that the variable was either a constant
or a trigger arg. This would stop the looping problem, since the type of
the underlying field couldn't change.But I've somehow got the feeling that this sort of thing isn't the
issue. The issue is whether we want to allow dynamic access to columns
in any syntax at all. A simple yes or no would do :)Matt
BTW: here's the original post adding the rec.(3) syntax to the TODO:
http://archives.postgresql.org/pgsql-hackers/2003-07/msg00425.php
here's someone else who tried something very similar:
http://archives.postgresql.org/pgsql-hackers/2003-09/msg00533.php---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
Matt <matt@kynx.org> writes:
It would be a good idea to check that the variable was either a constant
or a trigger arg. This would stop the looping problem, since the type of
the underlying field couldn't change.
What about
for i in ...
... new.(tg_argv[i]) ...
But I've somehow got the feeling that this sort of thing isn't the
issue. The issue is whether we want to allow dynamic access to columns
in any syntax at all. A simple yes or no would do :)
MHO: this is a really ugly wart on the language, and it does not solve
the problems people would want to solve. It might solve *your* problem
but that's not enough to justify a wart of this size.
We do need to do something about the fact that EXECUTE can't access
plpgsql variables, though I'm afraid that fixing that is going to
require a rather complete overhaul of plpgsql :-(. But it needs one
anyway.
regards, tom lane
What about
for i in ...
... new.(tg_argv[i]) ...
Ooof! <clutches chest, falls to ground groaning> Constants or digits or
nothing, then <gets up, dusts himself off>
MHO: this is a really ugly wart on the language, and it does not solve
the problems people would want to solve. It might solve *your* problem
but that's not enough to justify a wart of this size.
<falls down again> But my warts are beautiful!
OK, fair enough. I had to try.
We do need to do something about the fact that EXECUTE can't access
plpgsql variables, though I'm afraid that fixing that is going to
require a rather complete overhaul of plpgsql :-(. But it needs one
anyway.
Look forward to seeing it.
Matt
Mike Rylander wrote:
As an alternative, what would be the possibility of creating a new PL
as a contrib module, say PLPGSQL_NG, to move forward with extensions
like this and perhaps EVALUATE?
I think the idea of rewriting PL/PgSQL from scratch has merit (and it's
something that I think would be well worth doing). IMHO it's not really
worth the trouble to fork the existing code base and add new features to
something that, hopefully, has a limited life span.
-Neil
On Tue, Nov 23, 2004 at 10:33:50AM -0500, Tom Lane wrote:
We do need to do something about the fact that EXECUTE can't access
plpgsql variables, though I'm afraid that fixing that is going to
require a rather complete overhaul of plpgsql :-(. But it needs one
anyway.
Why do you think that it would be difficult to do it with the existing code?
Actually I wanted to implement this for 8.1. I've already had a quick look
at it. My idea was to allow something like "EXECUTE SELECT INTO var1, var2
col1 col2 FROM ...". The code would have to parse the list of variables and
check if they match pl/pgsql variables, execute the query (without the INTO
stuff of course) via SPI and copy back the results, checking type mismatches
or a mismatch in the number of columns. I haven't thought about more complex
types however. Did I miss something?
Which other limitations are there in the current implementation of pl/pgsql?
Joachim
I think the idea of rewriting PL/PgSQL from scratch has merit (and it's
something that I think would be well worth doing). IMHO it's not really
worth the trouble to fork the existing code base and add new features to
something that, hopefully, has a limited life span.
I dunno, I kind of like the idea.
There's always going to be the age old conflict between people who are
basically users like me, and want to see a needed feature asap, and
developers who want to see it done right. And of course doing it right
is only way to go long term.
But so long as I can be fairly sure the syntax is agreed (EXECUTE SELECT
INTO ... in this case) and eventually will make it into the main code
base, I'd be willing to live on the edge for a while.
There'd have to be big 'experimental, everything may change' warnings
all over the contrib version. My only concern is if it would actually
delay the discussed rewrite of plpgsql by splitting the effort.
That's my two one hundreths of a euro, anyway.
Matt
Joachim Wieland <joe@mcknight.de> writes:
On Tue, Nov 23, 2004 at 10:33:50AM -0500, Tom Lane wrote:
We do need to do something about the fact that EXECUTE can't access
plpgsql variables, though I'm afraid that fixing that is going to
require a rather complete overhaul of plpgsql :-(. But it needs one
anyway.
Why do you think that it would be difficult to do it with the existing code?
Because the parsing context all gets thrown away, in particular the
namespace stack; so you can't tell what set of variables ought to be
visible to the EXECUTE.
Which other limitations are there in the current implementation of pl/pgsql?
Its memory management really sucks. Function parsetrees ought to be put
into private contexts, not malloc'd, so that they can reasonably be
freed when the function is deleted or redefined. Likewise for the
cached query plans. Also we need a way to discard cached query plans
when relevant object definitions change (though this is not plpgsql's
fault in particular, it's a generic problem).
Another issue is that we need a better solution for recursive plpgsql
functions. Function parsetrees have to be read-only (with the
exception of adding cached plans that weren't there before) during
execution, else recursive functions don't work right. We have a really
ugly and inefficient answer to this, and I'm not even sure that it
covers 100% of the cases (at one time it definitely didn't, but I can't
recall right now if that got fixed completely). I'm not entirely sure
what a cleaner answer would look like, but I know I don't like it now.
regards, tom lane