Deparsing DDL command strings
Hi,
Working on the Event Trigger next patch series, one of the challenges to
address is deparsing the DDL commands so that the User Defined Function
used by the trigger definition has that information.
I'm making good progress on that; it's some amount of code but pretty
straightforward. The only roadblock for now is best summarized by the
following comment in src/backend/commands/tablecmds.c:
* Now add any newly specified column default values and CHECK constraints
* to the new relation. These are passed to us in the form of raw
* parsetrees; we need to transform them to executable expression trees
* before they can be added. The most convenient way to do that is to
* apply the parser's transformExpr routine, but transformExpr doesn't
* work unless we have a pre-existing relation. So, the transformation has
* to be postponed to this final step of CREATE TABLE.
So I have a Node *parsetree containing some CHECK and DEFAULT raw
expressions to work with. Those can reference non-existing tables,
either to-be-created or already-dropped.
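To make the problem concrete, here is a hypothetical command (table and
column names made up for illustration) whose raw CHECK and DEFAULT
expressions cannot be run through transformExpr() at raw-parse time,
because the relation they reference does not exist yet:

```sql
-- At raw-parse time neither "inventory" nor its columns exist, so
-- transformExpr() has no relation to resolve "reserved" and "total"
-- against; per the tablecmds.c comment, the transformation has to be
-- postponed until the relation itself has been created.
CREATE TABLE inventory (
    total    integer DEFAULT 0,
    reserved integer DEFAULT 0,
    CHECK (reserved <= total)
);
```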
Should I work on transformExpr() so that it knows how to work with a non
existing relation, or should I write a new transformRawExpr() that knows
how to handle this case?
Or do we want to limit the deparsing feature and not accept some CHECK
and DEFAULT expressions (though not being able to cope with T_A_Const is
a bummer)? I don't mean to advocate that; I just have to mention the option.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
So I have a Node *parsetree containing some CHECK and DEFAULT raw
expressions to work with. Those can reference non existing tables,
either to-be-created or already-dropped.
Why don't you just pass the original query string, instead of writing
a mass of maintenance-requiring new code to reproduce it?
This would require (1) making sure the query string is still available
where needed. I think we are 99% of the way there but maybe not 100%.
(2) being able to identify the substring corresponding to the current
command, when we're processing a multi-command string. The parser could
easily provide that, I think --- we've just never insisted that it do
so before.
regards, tom lane
On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
So I have a Node *parsetree containing some CHECK and DEFAULT raw
expressions to work with. Those can reference non existing tables,
either to-be-created or already-dropped.
Why don't you just pass the original query string, instead of writing
a mass of maintenance-requiring new code to reproduce it?
It's not easy to know which tables are referenced in e.g. an ALTER TABLE
statement if the original statement didn't schema-qualify everything.
It's also not really possible to trigger cascading effects, like the creation
of a sequence from a serial column, that way...
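To sketch the cascading-effects point: the original command string below
never mentions a sequence, yet executing it creates one, so replaying the
raw string alone tells a consumer nothing about the cascaded objects.

```sql
-- The raw command string mentions no sequence, yet executing it
-- implicitly creates the sequence "t_id_seq" and attaches
-- nextval('t_id_seq'::regclass) as the column default.
CREATE TABLE t (id serial PRIMARY KEY);
```

Only after parse analysis and execution do the derived objects (the
sequence, its ownership link, the default expression) become visible.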
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote:
Why don't you just pass the original query string, instead of writing
a mass of maintenance-requiring new code to reproduce it?
It's not easy to know which tables are referenced in e.g. an ALTER TABLE
statement if the original statement didn't schema-qualify everything.
What he's talking about is deparsing the raw grammar output, which by
definition contains no more information than is in the query string.
Deparsing post-parse-analysis trees is a different problem (for which
code already exists, unlike the raw-tree case).
regards, tom lane
On Friday, October 05, 2012 04:24:55 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote:
Why don't you just pass the original query string, instead of writing
a mass of maintenance-requiring new code to reproduce it?
It's not easy to know which tables are referenced in e.g. an ALTER TABLE
statement if the original statement didn't schema-qualify everything.
What he's talking about is deparsing the raw grammar output, which by
definition contains no more information than is in the query string.
Deparsing post-parse-analysis trees is a different problem (for which
code already exists, unlike the raw-tree case).
Sure. I am not saying Dimitri's approach is perfect. I am just listing some of
the reasons why I think just using the raw input string isn't sufficient...
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Tom Lane <tgl@sss.pgh.pa.us> writes:
Why don't you just pass the original query string, instead of writing
a mass of maintenance-requiring new code to reproduce it?
Do we have that original query string in all cases, including EXECUTE-like
SPI calls from any PL? What about commands that internally build a
parsetree to feed ProcessUtility() directly? Do we want to refactor them
all just now as a prerequisite?
Also, we need to normalize that command string. Tools needing to look at
it won't want to depend on random white spacing and other oddities.
Those tools could also use the Node *parsetree and be written only in C,
but then what about giving them a head start by having a parsetree
walker in our code base?
Then we want to qualify object names. Some type names have apparently
already been taken care of by the parser here; relation names have not
yet, and we need to cope with non-existing relation names.
My freshly grown limited understanding is that we currently only know
how to produce a "cooked" parse tree from the raw one if all referenced
objects do exist in the catalogs, so that we will postpone some
"cooking" (transform*) until the main object of a CREATE command is
defined, right?
Is that something we want to revisit?
Another option would be to capture search_path and other parse time
impacting GUCs, call that the query environment, and have a way to
serialize and pass in the environment and restore it either on the same
host or on another (replication is an important use case here).
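A sketch of what capturing such a "query environment" could look like;
only current_setting() is existing API here, and which GUCs belong in the
environment is exactly the open question:

```sql
-- Capture the parse-time-relevant settings alongside the command string,
-- so a consumer (e.g. a replication target) can restore them first.
SELECT current_setting('search_path') AS search_path,
       current_setting('standard_conforming_strings') AS std_strings;

-- Restoring before replay would then be along the lines of:
--   SET search_path = <captured value>;
--   <replay original command string>
```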
Yet another option would be to output both the original query string and
something that's meant for easy machine parsing yet is not the internal
representation of the query, so that we're free to hack the parser at
will in between releases, even minor. Building that new code friendly
document will require about the same amount of code as spitting out
normalized SQL, I believe.
Yet another option would be to go the "sax" way rather than the "dom"
one: instead of spitting out a new command string have the user register
callbacks and only implement walking down the parsetree and calling
those. I'm not sure how much maintenance work we would save here, and
I don't see another reason to go that way.
Yet another option would be to only provide for a hook and some space in
the EventTriggerData structure for extensions to register themselves and
provide whatever deparsing they need. But then we need to figure out a
way for the user defined function to use the resulting opaque data, from
any PL language, if only to be able to call some extension's API to
process it. Looks like a very messy way to punt the work outside of
core.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
Why don't you just pass the original query string, instead of writing
a mass of maintenance-requiring new code to reproduce it?
Do we have that original query string in all cases, including EXECUTE-like
SPI calls from any PL?
As I said, I believe we are pretty close to that if not there already.
There are a number of utility statements that *require* the source
string to be provided, because they do parse analysis and since 8.4 or
so the original string has been required for that. So we certainly have
the string available at ProcessUtility. I've not looked at the event
trigger patch at all, so I don't know what amount of refactoring is
going to be required below that ... but the point I'm trying to make is
that it would be a one-and-done task. Adding a requirement to be able
to decompile raw parse trees will be a permanent drag on every type of
SQL feature addition.
Also, we need to normalize that command string. Tools needing to look at
it won't want to depend on random white spacing and other oddities.
Instead, they'll get to depend on the oddities of parse transformations
(ie, what's done in the raw grammar vs. what's done in parse_analyze),
which is something we whack around regularly. Besides which, you've
merely asserted this requirement without providing any evidence that
it's important at all, let alone important enough to justify the kind of
maintenance burden that you want to saddle us with.
Those tools could also use the Node *parsetree and be written only in C,
but then what about giving them a head start by having a parsetree
walker in our code base?
Well, as far as a raw parsetree *walker* goes, there already is one in
nodeFuncs.c. It does not follow that we need something that tries to
reconstruct SQL from that. It's not clear to me that there is any
production-grade use-case for which reconstructed SQL is more useful
than examining the parse tree. Now, if you're talking about half-baked
code that gets only some cases right, then yeah grepping reconstructed
SQL might serve. But I'm not excited about doing a lot of work to
support partial solutions.
Then we want to qualify object names. Some type names have already been
taken care of apparently by the parser here, relation names not yet and
we need to cope with non existing relation names.
Which is exactly what you *won't* be able to do anything about when
looking at a raw parse tree. It's just a different presentation of the
query string.
My freshly grown limited understanding is that we currently only know
how to produce a "cooked" parse tree from the raw one if all referenced
objects do exist in the catalogs,
Well, yeah. Anything else is magic not code.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
going to be required below that ... but the point I'm trying to make is
that it would be a one-and-done task. Adding a requirement to be able
to decompile raw parse trees will be a permanent drag on every type of
SQL feature addition.
I'll show some examples of very involved commands (CREATE and ALTER TABLE
are the most complex we have, I think) and some very simple commands
(DROP TABLE is one of the simplest), so that we can make up our minds on
that angle.
Then we want to qualify object names. Some type names have already been
taken care of apparently by the parser here, relation names not yet and
we need to cope with non-existing relation names.
Which is exactly what you *won't* be able to do anything about when
looking at a raw parse tree. It's just a different presentation of the
query string.
So, I'm currently adding the deparsing to the only event we currently
have, which is ddl_command_start. That's maybe not the best place to
do it; I now even wonder if we can do it there at all.
Doing the same thing at ddl_command_end would allow us to have all the
information we need and leave nothing to magic guesses: full schema
qualification of all objects involved, main object(s) OIDs available,
all the jazz.
Well, yeah. Anything else is magic not code.
Well, prepending an object name with the first entry of the current
search_path as its schema is not that far a stretch when the object is
being created, as far as I see it. It's more reasonable to document that
the rewritten no-ambiguities command string is only available for
ddl_command_end events, though.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 10/5/12 11:15 AM, Tom Lane wrote:
Also, we need to normalize that command string. Tools needing to look at
it won't want to depend on random white spacing and other oddities.
Instead, they'll get to depend on the oddities of parse transformations
(ie, what's done in the raw grammar vs. what's done in parse_analyze),
which is something we whack around regularly. Besides which, you've
merely asserted this requirement without providing any evidence that
it's important at all, let alone important enough to justify the kind of
maintenance burden that you want to saddle us with.
I definitely want to be able to parse DDL commands to be able to either enforce things or to drive other parts of the system based on what's changing. Without the ability to capture (and parse) DDL commands I'm stuck creating wrapper functions around anything I want to capture and then trying to ensure that everyone uses the wrappers and not the raw DDL commands.
Event triggers that just spit out raw SQL give me the first part of this, but not the second part: I'm still stuck trying to parse raw SQL on my own. Having normalized SQL to parse should make that a bit easier, but ideally I'd like to be able to pull specific elements out of a command. I'd want to be able to do things like:
IF command is ALTER TABLE THEN
    FOR EACH subcommand
        IF subcommand IS DROP COLUMN THEN
            do something that needs to know what column is being dropped
        ELSE IF subcommand IS ADD COLUMN THEN
            do something that needs to know the definition of the column being added
I don't think every bit of that has to be dealt with by the event trigger code itself. For example, if you're adding a column to a table and the entries have already been made in the catalog, you could query to get the details of the column definition if you were given an OID into pg_attribute.
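For instance, given the OID of the altered table, a trigger function
could fetch the new column's definition from the catalogs instead of
parsing the command. A sketch (`:table_oid` is a placeholder for the OID
the trigger would receive; this assumes the catalog entries are already
visible, as they would be at ddl_command_end):

```sql
-- Look up the most recently added, live column of the table
-- identified by :table_oid, with its formatted type.
SELECT attname,
       format_type(atttypid, atttypmod) AS coltype
  FROM pg_attribute
 WHERE attrelid = :table_oid
   AND attnum > 0
   AND NOT attisdropped
 ORDER BY attnum DESC
 LIMIT 1;
```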
Having said all that, an event system that spits back the raw SQL would certainly be better than nothing. But realize that people would still need to do parsing on it (ie: replication solutions will need to know what table just got ALTER'd).
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes:
I definitely want to be able to parse DDL commands to be able to
either enforce things or to drive other parts of the system based on
what's changing. Without the ability to capture (and parse) DDL
commands I'm stuck creating wrapper functions around anything I want
to capture and then trying to ensure that everyone uses the wrappers
and not the raw DDL commands.
Are you mainly working on some Auditing system?
Event triggers that just spit out raw SQL give me the first part of
this, but not the second part: I'm still stuck trying to parse raw SQL
on my own. Having normalized SQL to parse should make that a bit
easier, but ideally I'd like to be able to pull specific elements out
of a command. I'd want to be able to do things like:
The current design for event triggers is to spit out several things:
- command tag (already committed)
- object id, which can be null
- schema name, which can be null
- object name
- operation: ALTER, CREATE or DROP, …
- object type: TABLE, VIEW, FUNCTION, …
- normalized command string
After some more thinking, it appears that in several cases you want to
have all that information filled in, and you don't want to care whether that
means your trigger needs to run at ddl_command_start or ddl_command_end.
The proposal I want to make here is to introduce a generic event (or an
event alias) named ddl_command_trace that the system provides at the
right spot where you have the information. That's useful when you don't
intend to divert the execution of the DDL and need to know all about it.
For a DROP operation, ddl_command_trace would be ddl_command_start, and
for a CREATE operation, that would be ddl_command_end, so that the
target object (still|already) exists when the trigger is fired.
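Under that proposal, usage would look something like the sketch below.
Note that ddl_command_trace is the proposed event name, not existing
syntax, and the function body only echoes the command tag:

```sql
CREATE OR REPLACE FUNCTION trace_ddl() RETURNS event_trigger AS $$
BEGIN
  -- TG_TAG is e.g. 'DROP TABLE' or 'CREATE TABLE'; the system picks the
  -- firing point: ddl_command_start for DROPs, ddl_command_end for
  -- CREATEs, so the target object exists when the trigger fires.
  RAISE NOTICE 'traced: %', tg_tag;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER trace_all ON ddl_command_trace
  EXECUTE PROCEDURE trace_ddl();
```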
IF command is ALTER TABLE THEN
That's called TG_TAG, see
http://www.postgresql.org/docs/devel/static/plpgsql-trigger.html#PLPGSQL-EVENT-TRIGGER
FOR EACH subcommand
IF subcommand IS DROP COLUMN THEN
do something that needs to know what column is being dropped
ELSE IF subcommand IS ADD COLUMN THEN
do something that needs to know the definition of the column being added
We decided not to publish any notion of subcommand at this stage.
I don't think every bit of that has to be dealt with by the event
trigger code itself. For example, if you're adding a column to a table
and the entries have already been made in the catalog, you could query
to get the details of the column definition if you were given an OID
into pg_attributes.
It's easy enough to provide the OID of the newly created main command
target object; it's harder to provide in a generic way all the OIDs of
the objects you might be interested in, because each command has its
own set of such.
DROP can target multiple objects, and they all are the main target. ALTER
targets only a single object, but can link to dependent objects. CREATE
an operator class or a cast and you're talking about a bunch of
operators and functions to tie together. It's not that easy.
Having said all that, an event system that spits back the raw SQL
would certainly be better than nothing. But realize that people would
still need to do parsing on it (ie: replication solutions will need to
know what table just got ALTER'd).
You would have most of what you're asking for. I think that looking into the
normalized string to get the information you need, when you already know
you're looking at an ALTER TABLE statement and you already have the
object references (schema, name, OID), is going to make things easier.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
I'll show some examples of very involved commands (CREATE and ALTER TABLE
are the most complex we have, I think) and some very simple commands
(DROP TABLE is one of the simplest), so that we can make up our minds on
that angle.
So please find attached a demo patch to show what it takes to deparse
complex command strings, and here are some inline examples of why it's a
good idea to actually deparse them rather than hand out whatever the
user typed in:
\dy
List of event triggers
-[ RECORD 1 ]--------------------------
Name | regress_event_trigger_trace
Event | ddl_command_trace
Owner | dim
Enabled | enabled
Procedure | test_event_trigger
Tags |
foo=# drop table foo;
NOTICE: test_event_trigger: ddl_command_start DROP TABLE
NOTICE: test_event_trigger: DROP, TABLE
NOTICE: test_event_trigger: DROP TABLE public.foo RESTRICT;
DROP TABLE
foo=# create table foo(id serial primary key,
f2 text default 'plop' check (f2 != ''));
NOTICE: test_event_trigger: ddl_command_end CREATE TABLE
NOTICE: test_event_trigger: CREATE, TABLE
NOTICE: test_event_trigger: CREATE TABLE public.foo (id integer PRIMARY KEY DEFAULT nextval('foo_id_seq'::regclass) NOT NULL, f2 text DEFAULT 'plop' CHECK ((f2 <> ''::text)), CHECK ((f2 <> ''::text)));
CREATE TABLE
The user of that command string still has to know what to look for and
maybe should include a proper SQL parser, but at least they don't need
to do much guesswork about how the serial column's attached sequence will
get named by the system, and other such oddities.
The attached patch also includes support for the complete ALTER TABLE
command and some more (CREATE SEQUENCE, CREATE EXTENSION).
Doing the same thing at ddl_command_end would allow us to have all the
information we need and leave nothing to magic guesses: full schema
qualification of all objects involved, main object(s) OIDs available,
all the jazz.
That's what happens now in the attached patch, also with a new
event called 'ddl_command_trace' which will map to either _start or _end
depending on the operation (we want _start when doing DROP TABLE; we
want the operation to be complete before tracing it when talking about
CREATE or ALTER TABLE).
And here's the scope we're talking about, including new command types,
new information passed down to user triggers, and the rewrite support
itself, isolated away:
git diff --stat postgres/master..
src/backend/catalog/heap.c | 5 +-
src/backend/commands/event_trigger.c | 241 ++++-
src/backend/tcop/utility.c | 187 ++--
src/backend/utils/adt/Makefile | 2 +-
src/backend/utils/adt/ddl_rewrite.c | 1415 +++++++++++++++++++++++++++
src/backend/utils/adt/ruleutils.c | 9 +-
src/backend/utils/cache/evtcache.c | 4 +
src/include/catalog/heap.h | 4 +
src/include/commands/event_trigger.h | 43 +-
src/include/utils/builtins.h | 14 +
src/include/utils/evtcache.h | 4 +-
src/pl/plpgsql/src/pl_comp.c | 40 +
src/pl/plpgsql/src/pl_exec.c | 53 +-
src/pl/plpgsql/src/plpgsql.h | 5 +
src/test/regress/expected/event_trigger.out | 40 +-
src/test/regress/sql/event_trigger.sql | 36 +-
16 files changed, 1938 insertions(+), 164 deletions(-)
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachments:
ddl_rewrite.1.patch.gz (application/octet-stream)