Command Triggers patch v18
Hi,
I guess I sent v17 a little early considering that we now already have
v18 including support for CREATE TABLE AS and SELECT INTO, thanks to the
work of Andres and Tom.
There were some spurious tags in the sgml files in v17 that I cleaned
up too.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 20 March 2012 17:49, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Hi,
I guess I sent v17 a little early considering that we now already have
v18 including support for CREATE TABLE AS and SELECT INTO, thanks to the
work of Andres and Tom.
There were some spurious tags in the sgml files in v17 that I cleaned
up too.
The new command triggers work correctly.
Having looked at your regression tests, you don't seem to have enough
"before" triggers in the tests. There's no test for before CREATE
TABLE, CREATE TABLE AS or SELECT INTO. In my tests I have 170 unique
command triggers, but there are only 44 in the regression test. Is
there a reason why there aren't many tests?
A problem still outstanding is that when I build the docs, the CREATE
COMMAND TRIGGER is listed after COMMAND TRIGGER in
html/sql-commands.html. I recall you mentioned you didn't have this
issue on your side. Can you just confirm this again? I believe I've
located the cause of this problem. In doc/src/sgml/reference.sgml the
ALTER/CREATE/DROP COMMAND TRIGGER references are placed below their
respective trigger counterparts. Putting them back into alphabetical
order corrects the issue.
On the pg_cmdtrigger page in the documentation, I'd recommend the
following changes:
s/Trigger's name/Trigger name/
s/Command TAG/Command tag/
s/The OID of the function that is called by this command trigger./The
OID of the function called by this command trigger./
Also "contype" should read "ctgtype".
Note that I haven't tested pl/Perl, pl/Python or pl/tcl yet.
Regards
--
Thom
Thom Brown <thom@linux.com> writes:
The new command triggers work correctly.
Thanks for your continued testing :)
Having looked at your regression tests, you don't seem to have enough
"before" triggers in the tests. There's no test for before CREATE
TABLE, CREATE TABLE AS or SELECT INTO. In my tests I have 170 unique
command triggers, but there are only 44 in the regression test. Is
there a reason why there aren't many tests?
Now that we share the same code for ANY triggers and specific ones, I
guess we could drop a lot of specific command triggers from the
regression tests.
A problem still outstanding is that when I build the docs, the CREATE
I would like to get back on code level review now if at all possible,
and I would integrate your suggestions here into the next patch revision
if another one is needed.
The only point yet to address from last round from Andres is about the
API around CommandFiresTrigger() and the Memory Context we use here.
We're missing an explicit Reset call, and to be able to have one we
need a more complex API, because of the way RemoveObjects() and
RemoveRelations() work.
We would need to add no-reset APIs and an entry point to manually reset
the memory context, which currently gets disposed at the same time as
its parent context, the current one that's been setup before entering
standard_ProcessUtility().
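To make the shape of that API concrete, here is a minimal sketch; all names are invented for this example and the toy arena is only a stand-in for a real PostgreSQL MemoryContext. The idea is that loops such as the ones in RemoveObjects()/RemoveRelations() would use a "no-reset" allocation variant across several trigger firings, and then call the explicit reset entry point once, instead of relying on the parent context set up before standard_ProcessUtility() being destroyed:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a per-command MemoryContext (hypothetical). */
typedef struct CommandContext
{
	void	   *chunks[64];
	int			nchunks;
} CommandContext;

/*
 * No-reset variant: allocations accumulate, so several trigger
 * invocations can share the same command data without anything
 * being freed underneath them.
 */
static char *
command_trigger_data_noreset(CommandContext *cxt, const char *tag)
{
	char	   *copy = malloc(strlen(tag) + 1);

	strcpy(copy, tag);
	cxt->chunks[cxt->nchunks++] = copy;
	return copy;
}

/* The explicit reset entry point the current code is missing. */
static void
reset_command_trigger_context(CommandContext *cxt)
{
	int			i;

	for (i = 0; i < cxt->nchunks; i++)
		free(cxt->chunks[i]);
	cxt->nchunks = 0;
}
```

The point of the split is that the caller, not the allocator, decides when the per-command memory goes away, which is what the multi-object drop paths need.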
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Friday, March 23, 2012 04:32:02 PM Dimitri Fontaine wrote:
I would like to get back on code level review now if at all possible,
and I would integrate your suggestions here into the next patch revision
if another one is needed.
Ok, I will give it another go.
Btw I just wanted to alert you to being careful when checking in the expect
files ;)
NOTICE: snitch: BEFORE any DROP TRIGGER
-ERROR: unexpected name list length (3)
+NOTICE: snitch: BEFORE DROP TRIGGER <NULL> foo_trigger
+NOTICE: snitch: AFTER any DROP TRIGGER
create conversion test for 'utf8' to 'sjis' from utf8_to_sjis;
j
you had an apparently unnoticed error in there ;)
1.
if (!HeapTupleIsValid(tup))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg("command trigger \"%s\" does not exist, skipping",
trigname)));
The "skipping" part looks like a copy/pasto...
2.
In PLy_exec_command_trigger you're doing a PG_TRY() which looks pointless in
the current incarnation. Did you intend to add something in the catch?
I think without doing a decref of pltdata both in the success and the failure
path you're leaking memory.
3.
In plpython: Why do you pass objectId/pltobjectname/... as "NULL" instead of
None? Using a string for it seems like a bad form of in-band signalling to me.
4.
Not sure whether InitCommandContext is the best place to suppress command
trigger usage for some commands. That seems rather unintuitive to me. But
perhaps the implementation-ease is big enough...
That's everything new I found... not bad, I think. After this somebody else
should take a look at it, I think (committer or not).
The only point yet to address from last round from Andres is about the
API around CommandFiresTrigger() and the Memory Context we use here.
We're missing an explicit Reset call, and to be able to have one we
need a more complex API, because of the way RemoveObjects() and
RemoveRelations() work.
We would need to add no-reset APIs and an entry point to manually reset
the memory context, which currently gets disposed at the same time as
its parent context, the current one that's been setup before entering
standard_ProcessUtility().
Not sure if you're expecting further input from me about that?
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 20, 2012 at 1:49 PM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
Hi,
I guess I sent v17 a little early considering that we now already have
v18 including support for CREATE TABLE AS and SELECT INTO, thanks to the
work of Andres and Tom.There was some spurious tags in the sgml files in v17 that I did clean
up too.
There are spurious whitespace changes in the documentation. Some of
these are of the following form:
- return_arr
+ return_arr
create_trigger.sgml adds a stray blank line as well.
I think that the syntax for enabling or disabling a command trigger
should not use the keyword SET. So just:
ALTER COMMAND TRIGGER name ENABLE [ ALWAYS | REPLICA ];
ALTER COMMAND TRIGGER name DISABLE;
That way is more parallel with the existing syntax for ordinary triggers.
+ The name to give the new trigger. This must be distinct from the name
+ of any other trigger for the same table. The name cannot be
+ schema-qualified.
"For the same table" is a copy-and-pasteo.
- /* Look up object address. */
+ /* Look up object address. */
Spurious diff.
I think that get_object_address_cmdtrigger should be turned into a
case within get_object_address_unqualified; I can't see a reason for
it to have its own function.
+ elog(ERROR, "could not find tuple for command trigger
%u", trigOid);
The standard phrasing is "cache lookup failed for command trigger %u".
+/*
+ * Functions to execute the command triggers.
+ *
+ * We call the functions that matches the command triggers definitions in
+ * alphabetical order, and give them those arguments:
+ *
+ * command tag, text
+ * objectId, oid
+ * schemaname, text
+ * objectname, text
+ *
+ */
This will not survive pgindent, unless you surround it with ------ and
-----. More generally, it would be nice if you could pgindent the
whole patch and then fix anything that breaks in such a way that it
will survive the next pgindent run.
+ * if (CommandFiresTriggers(cmd)
Missing paren.
+ /* before or instead of command trigger might have cancelled
the command */
Leftovers.
@@ -169,7 +169,7 @@ ExplainQuery(ExplainStmt *stmt, const char *queryString,
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("EXPLAIN option BUFFERS
requires ANALYZE")));
-
+
/* if the timing was not set explicitly, set default value */
es.timing = (timing_set) ? es.timing : es.analyze;
Spurious diff.
All the changes to ruleutils.c appear spurious. Ditto the change to
src/test/regress/sql/triggers.sql.
In the pg_dump support, it seems you're going to try to execute an
empty query if this is run against a pre-9.2 server. It seems like
you should just return, or something like that. What do we do in
comparable cases elsewhere?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 26, 2012 at 9:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
[ various trivial issues ]
OK, now I got that out of my system. Now on to bigger topics.
I am extremely concerned about the way in which this patch arranges to
invoke command triggers. You've got call sites spattered all over the
place, and I think that's going to be, basically, an unfixable mess
and a breeding ground for bugs. On a first read-through:
1. BEFORE ALTER TABLE triggers are fired in AlterTable(). However, an
ALTER TABLE statement does not necessarily call AlterTable() once and
only once. The real top-level logic for AlterTable is in
ProcessUtility(), which runs transformAlterTableStmt() to generate a
list of commands and then either calls AlterTable() or recursively
invokes ProcessUtility() for each one. This means that if IF EXISTS
is used and the table does not exist, then BEFORE command triggers
won't get invoked at all. On the other hand, if multiple commands are
specified, then I think AlterTable() may get invoked either once or
more than once, depending on exactly which commands are specified; and
we might manage to fire some CREATE INDEX command triggers or whatnot
as well, again depending on exactly what that ALTER TABLE command is
doing.
2. BEFORE CREATE TABLE triggers seem to have similar issues; see
transformCreateStmt.
rhaas=# create table foo (a serial primary key);
NOTICE: CREATE TABLE will create implicit sequence "foo_a_seq" for
serial column "foo.a"
NOTICE: snitch: BEFORE CREATE SEQUENCE public.foo_a_seq [<NULL>]
NOTICE: snitch: AFTER CREATE SEQUENCE public.foo_a_seq [16392]
NOTICE: snitch: BEFORE CREATE TABLE public.foo [<NULL>]
NOTICE: snitch: BEFORE CREATE INDEX public.foo_pkey [<NULL>]
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
NOTICE: snitch: AFTER CREATE INDEX public.foo_pkey [16398]
NOTICE: snitch: BEFORE ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE: snitch: AFTER ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE: snitch: AFTER CREATE TABLE public.foo [16394]
CREATE TABLE
3. RemoveRelations() and RemoveObjects() similarly take the position
that when the object does not exist, command triggers need not fire.
This seems entirely arbitrary. CREATE EXTENSION IF NOT EXISTS,
however, takes the opposite (and probably correct) position that even
if we decide not to create the extension, we should still fire command
triggers. In a similar vein, AlterFunctionOwner_oid() skips firing
the command triggers if the old and new owners happen to be the same,
but other forms of ALTER FUNCTION (e.g. ALTER FUNCTION .. COST) fire
triggers even if the old and new parameters are the same; and
AlterForeignDataWrapperOwner_internal does NOT skip firing command
triggers just because the old and new owners are the same.
4. RemoveRelations() and RemoveObjects() also take the position that a
statement like "DROP TABLE foo, bar" should fire each relevant BEFORE
command trigger twice, then drop both objects, then fire each relevant
AFTER command trigger twice. I think that's wrong. It's 100% clear
that the user executed one, and only one, command. The only real
argument for it is that if we were to only fire command triggers once,
we wouldn't be able to pass identifying information about the objects
being dropped, since the argument-passing mechanism only has room to
pass details about a single object. I think that means that the
argument-passing mechanism needs improvement, not that we should
redefine one command as two commands. Something like CREATE SCHEMA
foo CREATE VIEW bar ... has the same problem: the user only entered
one command, but we're firing command triggers as if they entered
multiple commands. This case is possibly more defensible than the
other one, but note that the two aren't consistent with each other as
regards the order of trigger firing, and I actually think that they're
both wrong, and that only the toplevel command should fire triggers.
5. It seems desirable for BEFORE command triggers to fire at a
consistent point during command execution, but currently they don't.
For example, BEFORE DROP VIEW triggers don't fire until we've verified
that q exists, is a view, and that we have permission to drop it, but
LOAD triggers fire much earlier, before we've really checked anything
at all. And ALTER TABLE is somewhere in the middle: we fire the
BEFORE trigger after checking permissions on the main table, but
before all permissions checks are done, viz:
rhaas=> alter table p add foreign key (a) references p2 (a);
NOTICE: snitch: BEFORE ALTER TABLE public.p [16418]
ERROR: permission denied for relation p2
6. Another pretty hideous aspect of the CREATE TABLE behavior is that
AFTER triggers are fired from a completely different place than BEFORE
triggers. It's not this patch's fault that our CREATE TABLE and
ALTER TABLE code is a Rube Goldberg machine, but finding some place to
jam each trigger invocation in that happens to (mostly) work as the
code exists today is not sufficiently future-proof.
I am not too sure the above list is exhaustive. I think that what's
leading you down the garden path here is the desire to pass as much
information as possible to each command trigger. But for that, we
could simply put the hooks in ProcessUtility(), adding perhaps a bit
of logic to deal with recursive invocations of ProcessUtility() to
handle subcommands, and call it good. In fact, I think that for AFTER
triggers, you could make that work anyway - every time you enter
ProcessUtility(), you push something onto a global stack, and then
individual commands annotate that data structure with details such as
the OIDs of objects being operated on, and after processing the
command we fire the after triggers from ProcessUtility(). That way,
no matter how anybody changes the code in the future, we're guaranteed
that AFTER triggers will always fire exactly once per command, and the
worst thing that happens if someone fails to make the right changes in
some other part of the code is that the AFTER trigger doesn't get all
the command details, and even that seems less likely since a lot of
the early exit paths won't have those details to supply anyway.
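The stack-based scheme described above could be sketched roughly like this; every name and structure here is invented for illustration, and the real thing would live inside ProcessUtility() and use the backend's own list machinery:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* One frame per (possibly recursive) ProcessUtility() invocation. */
typedef struct UtilityFrame
{
	const char *commandTag;
	Oid			objectIds[8];
	int			nobjects;
	struct UtilityFrame *parent;
} UtilityFrame;

static UtilityFrame *utilityStack = NULL;
static int	afterTriggerFirings = 0;

/* Entering ProcessUtility() pushes a frame onto the global stack. */
static void
push_utility_frame(UtilityFrame *frame, const char *tag)
{
	frame->commandTag = tag;
	frame->nobjects = 0;
	frame->parent = utilityStack;
	utilityStack = frame;
}

/* Individual commands annotate the current frame when (and if) they
 * learn the OID of the object being operated on. */
static void
annotate_current_command(Oid objectId)
{
	if (utilityStack && utilityStack->nobjects < 8)
		utilityStack->objectIds[utilityStack->nobjects++] = objectId;
}

/* Fired unconditionally on the way out of ProcessUtility(), so AFTER
 * triggers run exactly once per command, annotated or not. */
static void
pop_frame_and_fire_after_triggers(void)
{
	afterTriggerFirings++;
	utilityStack = utilityStack->parent;
}
```

The guarantee comes from structure, not discipline: even if a command's early-exit path never calls annotate_current_command(), the pop still fires the trigger, just with fewer details.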
BEFORE triggers are a lot harder, because there you want to gather a
certain amount of information and then fire the trigger. There's more
than a little bit of tension there. The longer you delay firing the
command trigger, the more information you can reliably supply - but on
the other hand, the more chance that things will error out before we
even reach the command trigger firing site. I think this is going to
cause problems in the future as people want to supply more details to
the command trigger - people who want more details to, say,
selectively deny commands will want to fire the command trigger later
in the execution sequence, but people who want to use it for auditing
will want to fire it earlier in the sequence, to log all attempts. I
don't know how to resolve this tension, but it seems to me that we'd
better try to pick a rule and follow it consistently; and we'd better
make sure that the rule is clear, documented, and well-defined.
More nitpicking:
- Creating a command trigger on a nonexistent function fails with
"cache lookup failed for function 0".
- tablespace.c has useless leftover changes.
- One of the examples in the docs says "RETURNS command trigger"
instead of "RETURNS command_trigger".
- The "DROP COMMAND TRIGGER" documentation contains leftovers that
refer to a parameter called "command" which no longer exists.
- The example in the "DROP COMMAND TRIGGER" documentation fails to
execute, as the syntax is no longer recognized.
- check_cmdtrigger_name is mostly duplicative of get_cmdtrigger_oid.
I think it can be rewritten as if (OidIsValid(get_cmdtrigger_oid(...)))
ereport(...).
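A sketch of that suggested rewrite, with the catalog lookup stubbed out: the real get_cmdtrigger_oid() would scan pg_cmdtrigger, and the real check would ereport(ERROR) on a duplicate name rather than return a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;

#define InvalidOid	((Oid) 0)
#define OidIsValid(objectId)  ((objectId) != InvalidOid)

/* Stub: pretend a single command trigger named "snitch" exists. */
static Oid
get_cmdtrigger_oid(const char *trigname, bool missing_ok)
{
	(void) missing_ok;
	return strcmp(trigname, "snitch") == 0 ? (Oid) 16385 : InvalidOid;
}

/*
 * The duplicate-name check expressed through the existing lookup,
 * instead of a second copy of the catalog scan.  Returns true when
 * the name is already taken.
 */
static bool
check_cmdtrigger_name(const char *trigname)
{
	return OidIsValid(get_cmdtrigger_oid(trigname, true));
}
```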
- I believe that all the changes to dropdb() can be reverted.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
First things first, thanks for the review!
I'm working on a new version of the patch to fix all the specific
comments you called "nitpicking" here and in your previous email. This
new patch will also implement a single list of triggers to run in
alphabetical order, not split by ANY/specific command.
Robert Haas <robertmhaas@gmail.com> writes:
I am extremely concerned about the way in which this patch arranges to
invoke command triggers. You've got call sites spattered all over the
place, and I think that's going to be, basically, an unfixable mess
and a breeding ground for bugs. On a first read-through:
In the first versions of the patch I did try to have a single point
where to integrate the feature and that didn't work out. If you want to
publish information such as object id, name and schema you need to have
specialized code spread out everywhere.
Then about where to call the trigger, it's a per-command decision with a
general policy. Your comments here are of two kinds, mostly about having
to guess the policy because it's not explicit, and some specifics that
either are in question or not following up on the policy.
The policy I've been willing to install is that command triggers should
get fired once the basic error checking is done. That's not perfect for
auditing purposes *if you want to log all attempts* but it's good enough
to audit all commands that had an impact on your system, and you still
can block commands.
Also, in most commands you can't get the object id and name before the
basic error checking is done.
Now, about the IF NOT EXISTS case, with the policy just described
there's no reason to trigger user code, but I can also see your position
here.
1. BEFORE ALTER TABLE triggers are fired in AlterTable(). However, an
I used to have that in utility.c but Andres had me move that out, and it
seems like I should get back to utility.c for the reasons you're
mentioning and that I missed.
2. BEFORE CREATE TABLE triggers seem to have similar issues; see
transformCreateStmt.
rhaas=# create table foo (a serial primary key);
NOTICE: CREATE TABLE will create implicit sequence "foo_a_seq" for
serial column "foo.a"
NOTICE: snitch: BEFORE CREATE SEQUENCE public.foo_a_seq [<NULL>]
NOTICE: snitch: AFTER CREATE SEQUENCE public.foo_a_seq [16392]
NOTICE: snitch: BEFORE CREATE TABLE public.foo [<NULL>]
NOTICE: snitch: BEFORE CREATE INDEX public.foo_pkey [<NULL>]
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
NOTICE: snitch: AFTER CREATE INDEX public.foo_pkey [16398]
NOTICE: snitch: BEFORE ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE: snitch: AFTER ALTER SEQUENCE public.foo_a_seq [16392]
NOTICE: snitch: AFTER CREATE TABLE public.foo [16394]
CREATE TABLE
That's meant to be a feature of the command trigger system, that's been
asked about by a lot of people. Having triggers fire for sub commands is
possibly The Right Thing™, or at least the most popular one.
3. RemoveRelations() and RemoveObjects() similarly take the position
that when the object does not exist, command triggers need not fire.
This seems entirely arbitrary. CREATE EXTENSION IF NOT EXISTS,
however, takes the opposite (and probably correct) position that even
if we decide not to create the extension, we should still fire command
triggers. In a similar vein, AlterFunctionOwner_oid() skips firing
the command triggers if the old and new owners happen to be the same,
but other forms of ALTER FUNCTION (e.g. ALTER FUNCTION .. COST) fire
triggers even if the old and new parameters are the same; and
AlterForeignDataWrapperOwner_internal does NOT skip firing command
triggers just because the old and new owners are the same.
We integrate here with the code as it stands, I didn't question the
error management already existing. Do we need to revise that in the
scope of the command triggers patch?
4. RemoveRelations() and RemoveObjects() also take the position that a
statement like "DROP TABLE foo, bar" should fire each relevant BEFORE
command trigger twice, then drop both objects, then fire each relevant
AFTER command trigger twice. I think that's wrong. It's 100% clear
See above, it's what users are asking us to implement as a feature.
5. It seems desirable for BEFORE command triggers to fire at a
consistent point during command execution, but currently they don't.
The policy should be to fire the triggers once the usual error checking
is done.
For example, BEFORE DROP VIEW triggers don't fire until we've verified
that q exists, is a view, and that we have permission to drop it, but
LOAD triggers fire much earlier, before we've really checked anything
at all. And ALTER TABLE is somewhere in the middle: we fire the
BEFORE trigger after checking permissions on the main table, but
before all permissions checks are done, viz:
I will rework LOAD, and ALTER TABLE too, which is more work as you can
imagine, because now we have to insinuate the command trigger calling
into each branch of what ALTER TABLE is able to implement.
6. Another pretty hideous aspect of the CREATE TABLE behavior is that
AFTER triggers are fired from a completely different place than BEFORE
triggers. It's not this patch's fault that our CREATE TABLE and
ALTER TABLE code is a Rube Goldberg machine, but finding some place to
jam each trigger invocation in that happens to (mostly) work as the
code exists today is not sufficiently future-proof.
I'm willing to make it better, what are your ideas?
command we fire the after triggers from ProcessUtility(). That way,
no matter how anybody changes the code in the future, we're guaranteed
that AFTER triggers will always fire exactly once per command, and the
From last year's cluster hackers meeting, then several mails on this list,
it appears that we don't want to do it the way you propose, which indeed
would be simpler to implement.
the other hand, the more chance that things will error out before we
even reach the command trigger firing site. I think this is going to
I think that's a feature. You skip calling the commands trigger when you
know you won't run the command at all.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Mar 27, 2012 at 4:27 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
In the first versions of the patch I did try to have a single point
where to integrate the feature and that didn't work out. If you want to
publish information such as object id, name and schema you need to have
specialized code spread out everywhere.
[...]
That's meant to be a feature of the command trigger system, that's been
asked about by a lot of people. Having triggers fire for sub commands is
possibly The Right Thing™, or at least the most popular one.
[...]
I will rework LOAD, and ALTER TABLE too, which is more work as you can
imagine, because now we have to insinuate the command trigger calling
into each branch of what ALTER TABLE is able to implement.
[...]
From last year's cluster hacker meeting then several mails on this list,
it appears that we don't want to do it the way you propose, which indeed
would be simpler to implement.
[...]
I think that's a feature. You skip calling the commands trigger when you
know you won't run the command at all.
I am coming more and more strongly to the conclusion that we're going
in the wrong direction here. It seems to me that you're spending an
enormous amount of energy implementing something that goes by the name
COMMAND TRIGGER when what you really want is an EVENT TRIGGER.
Apparently, you don't want a callback every time someone types CREATE
TABLE: you want a callback every time a table gets created. You don't
want a callback every time someone types DROP FUNCTION; you want a
callback every time a function gets dropped. If the goal here is to
do replication, you'd more than likely even want such callbacks to
occur when the function is dropped indirectly via CASCADE. In the
interest of getting event triggers, you're arguing that we should
contort the definition of the word "command" to include sub-commands,
but not necessarily commands that turn out to be a no-op, and maybe
things that are sort of like what commands do even though nobody
actually ever executed a command by that name. I just don't think
that's a good idea. We either have triggers on commands, and they
execute once per command, period; or we have triggers on events and
they execute every time that event happens.
As it turns out, two people have already put quite a bit of work into
designing and implementing what amounts to an event trigger system for
PostgreSQL: me, and KaiGai Kohei. It's called the ObjectAccessHook
mechanism, and it fires every time we create or drop an object. It
passes the type of object created or dropped, the OID of the object,
and the column number also in the case of a column. The major
drawback of this mechanism is that you have to write the code you want
to execute in C, and you can't arrange for it to be executed via a DDL
command; instead, you have to set a global variable from your shared
library's _PG_init() function. However, I don't think that would be
very hard to fix. We could simply replace the
InvokeObjectAccessHook() macro with a function that calls a list of
triggers pulled from the catalog.
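To illustrate the direction (all names and signatures here are simplified inventions, not the actual objectaccess.h interface): the macro becomes a function that walks a list of trigger functions, where the real implementation would pull that list from a catalog rather than from in-memory registration.

```c
#include <assert.h>

typedef unsigned int Oid;
typedef enum
{
	OAT_POST_CREATE,
	OAT_DROP
} ObjectAccessType;

/* Callback signature, loosely modeled on the object-access hook. */
typedef void (*object_access_trigger) (ObjectAccessType access,
									   Oid classId, Oid objectId, int subId);

static object_access_trigger triggerList[8];
static int	ntriggers = 0;

/* In the real design this registration is replaced by a catalog scan. */
static void
register_object_access_trigger(object_access_trigger fn)
{
	triggerList[ntriggers++] = fn;
}

/* The macro-turned-function: fires every registered trigger. */
static void
InvokeObjectAccessHook(ObjectAccessType access,
					   Oid classId, Oid objectId, int subId)
{
	int			i;

	for (i = 0; i < ntriggers; i++)
		triggerList[i] (access, classId, objectId, subId);
}

/* Sample trigger that just counts object creations. */
static int	createsSeen = 0;

static void
count_creates(ObjectAccessType access, Oid classId, Oid objectId, int subId)
{
	(void) classId;
	(void) objectId;
	(void) subId;
	if (access == OAT_POST_CREATE)
		createsSeen++;
}
```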
I think there are a couple of advantages of this approach over what
you've got right now. First, there are a bunch of tested hook points
that are already committed. They have well-defined semantics that are
easy to understand: every time we create or drop an object, you get a
callback with these arguments. Second, KaiGai Kohei is interested in
adding more hook points in the future to service sepgsql. If the
command triggers code and the sepgsql code both use the same set of
hook points then (1) we won't clutter the code with multiple sets of
hook points and (2) any features that get added for purposes of
command triggers will also benefit sepgsql, and vice versa. Since
the two of you are trying to solve very similar problems, it would be
nice if both of you were pulling in the same direction. Third, and
most importantly, it seems to match the semantics you want, which is a
callback every time something *happens* rather than a callback every
time someone *types something*.
Specifically, I propose the following plan:
- Rename COMMAND TRIGGER to EVENT TRIGGER. Rewrite the documentation
to say that we're going to fire triggers every time an *event*
happens. Rewrite the code to put the firing mechanism inside
InvokeObjectAccessHook, which will become a function rather than a
macro.
- Change the list of supported trigger types to match what the
ObjectAccessHook mechanism understands, which means, at present,
post-create and drop. It might even make sense to forget about having
separate hooks for every type of object that can be created or dropped
and instead just let people say CREATE EVENT TRIGGER name ON { CREATE
| DROP } EXECUTE PROCEDURE function_name ( arguments ).
- Once that's done, start adding object-access-hook invocations (and
thus, the corresponding command trigger functionality) to whatever
other operations you'd like to have support it.
I realize this represents a radical design change from what you have
right now, but what you have right now is messy and ill-defined and I
don't think you can easily fix it. You're exposing great gobs of
implementation details which means that, inevitably, every time anyone
wants to refactor some code, the semantics of command triggers are
going to change, or else the developer will have to go to great
lengths to ensure that they don't. I don't think either of those
things is going to make anyone very happy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tuesday, March 27, 2012 02:55:47 PM Robert Haas wrote:
On Tue, Mar 27, 2012 at 4:27 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
In the first versions of the patch I did try to have a single point
where to integrate the feature and that didn't work out. If you want to
publish information such as object id, name and schema you need to have
specialized code spread out everywhere.
[...]
That's meant to be a feature of the command trigger system, that's been
asked about by a lot of people. Having triggers fire for sub commands is
possibly The Right Thing™, or at least the most popular one.
[...]
I will rework LOAD, and ALTER TABLE too, which is more work as you can
imagine, because now we have to insinuate the command trigger calling
into each branch of what ALTER TABLE is able to implement.
[...]
From last year's cluster hacker meeting then several mails on this list,
it appears that we don't want to do it the way you propose, which indeed
would be simpler to implement.
[...]
I think that's a feature. You skip calling the commands trigger when you
know you won't run the command at all.
I am coming more and more strongly to the conclusion that we're going
in the wrong direction here. It seems to me that you're spending an
enormous amount of energy implementing something that goes by the name
COMMAND TRIGGER when what you really want is an EVENT TRIGGER.
Apparently, you don't want a callback every time someone types CREATE
TABLE: you want a callback every time a table gets created.
Not necessarily. One use-case is doing something *before* something happens
like enforcing naming conventions, data types et al during development/schema
migration.
In the
interest of getting event triggers, you're arguing that we should
contort the definition of the work "command" to include sub-commands,
but not necessarily commands that turn out to be a no-op, and maybe
things that are sort of like what commands do even though nobody
actually ever executed a command by that name. I just don't think
that's a good idea. We either have triggers on commands, and they
execute once per command, period; or we have triggers on events and
they execute every time that event happens.
I don't think that's a very helpful definition. What on earth would you want to
do with plain passing of
CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();
So some decomposition seems to be necessary. Getting the level right sure
ain't totally easy.
It might be helpful to pass in the context from which something like that
happened.
As it turns out, two people have already put quite a bit of work into
designing and implementing what amounts to an event trigger system for
PostgreSQL: me, and KaiGai Kohei. It's called the ObjectAccessHook
mechanism, and it fires every time we create or drop an object. It
passes the type of object created or dropped, the OID of the object,
and the column number also in the case of a column. The major
drawback of this mechanism is that you have to write the code you want
to execute in C, and you can't arrange for it to be executed via a DDL
command; instead, you have to set a global variable from your shared
library's _PG_init() function. However, I don't think that would be
very hard to fix. We could simply replace the
InvokeObjectAccessHook() macro with a function that calls a list of
triggers pulled from the catalog.
Which would basically require including pg_dump in the backend to implement
replication and logging. I don't think librarifying pg_dump is a fair burden
at all.
Also I have a *very hard* time imagining how to sensibly implement replicating
CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object access hooks.
There is just not enough context. How would you discern between an ADD COLUMN
and a CREATE TABLE? They look very similar or even identical on a catalog
level.
I also have the strong feeling that all this would expose implementation
details *at least* as much as command triggers. A slight change in order of
catalog modification would be *way* harder to hide via the object hook than
something of a similar scale via command triggers.
I think there are a couple of advantages of this approach over what
you've got right now. First, there are a bunch of tested hook points
that are already committed. They have well-defined semantics that are
easy to understand: every time we create or drop an object, you get a
callback with these arguments. Second, KaiGai Kohei is interested in
adding more hook points in the future to service sepgsql. If the
command triggers code and the sepgsql code both use the same set of
hook points then (1) we won't clutter the code with multiple sets of
hook points and (2) any features that get added for purposes of
command triggers will also benefit sepgsql, and vice versa. Since
the two of you are trying to solve very similar problems, it would be
nice if both of you were pulling in the same direction. Third, and
most importantly, it seems to match the semantics you want, which is a
callback every time something *happens* rather than a callback every
time someone *types something*.
Obviously unifying both on implementation level would be nice, but I don't
really see it happening. The ObjectAccessHook mechanism seems too low level to
me.
Specifically, I propose the following plan:
- Rename COMMAND TRIGGER to EVENT TRIGGER. Rewrite the documentation
to say that we're going to fire triggers every time an *event*
happens. Rewrite the code to put the firing mechanism inside
InvokeObjectAccessHook, which will become a function rather than a
macro.
That would mean extending the ObjectAccessHook mechanism to handle:
- altering objects
- before events (which would make the current data passing mechanism moot, you
can't look in the catalogs yet)
- passing names (otherwise simple triggers are *WAY* too hard to write)
- giving enough useful context to know what's happening. Knowing that a new
row was created is just about pointless in itself if you don't know why/from
where.
- ...
Which would - at least as far as I can see - essentially morph into the command
triggers patch.
I realize this represents a radical design change from what you have
right now, but what you have right now is messy and ill-defined and I
don't think you can easily fix it. You're exposing great gobs of
implementation details which means that, inevitably, every time anyone
wants to refactor some code, the semantics of command triggers are
going to change, or else the developer will have to go to great
lengths to ensure that they don't. I don't think either of those
things is going to make anyone very happy.
Hm. I agree that there is some work needed to make the whole handling more
consistent. But I don't think it's as bad as you paint it.
I don't want to brush off your review here - you definitely do raise valid
points. I don't particularly like the current implementation very much. But I
simply fail to see a less annoying way.
It's also not that I am just defending a colleague - you might have noticed the
@2ndquadrant - I got interested in this patch quite a bit before that was on
the horizon.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 27, 2012 at 9:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Not necessarily. One use-case is doing something *before* something happens
like enforcing naming conventions, data types et al during development/schema
migration.
That is definitely a valid use case. The only reason why we don't
have something like that built into the ObjectAccessHook framework is
because we haven't yet figured out a clean way to make it work. Most
of KaiGai's previous attempts involved passing a bunch of garbage
selected apparently at random into the hook function, and I rejected
that as unmaintainable. Dimitri's code here doesn't have that problem
- it passes in a consistent set of information across the board. But
I still think it's unmaintainable, because there's no consistency
about when triggers get invoked, or whether they get invoked at all.
We need something that combines the strengths of both approaches.
I actually think that, to really meet all needs here, we may need the
ability to get control in more than one place. For example, one thing
that KaiGai has wanted to do (and I bet Dimitri would love to be able
to do it too, and I'm almost sure that Dan Farina would like to do it)
is override the built-in security policy for particular commands. I
think that KaiGai only wants to be able to deny things that would
normally be allowed, but I suspect some of those other folks would
also like to be able to allow things that would normally be denied.
For those use cases, you want to get control at permissions-checking
time. However, where Dimitri has the hooks right now, BEFORE triggers
for ALTER commands fire after you've already looked up the object that
you're manipulating. That has the advantage of allowing you to use
the OID of the object, rather than its name, to make policy decisions;
but by that time it's too late to cancel a denial-of-access by the
first-line permissions checks. Dimitri also mentioned wanting to be
able to cancel the actual operation - and presumably, do something
else instead, like go execute it on a different node, and I think
that's another valid use case for this kind of trigger. It's going to
take some work, though, to figure out what the right set of control
points is, and it's probably going to require some refactoring of the
existing code, which is a mess that I have been slowly trying to clean
up.
In the
interest of getting event triggers, you're arguing that we should
contort the definition of the word "command" to include sub-commands,
but not necessarily commands that turn out to be a no-op, and maybe
things that are sort of like what commands do even though nobody
actually ever executed a command by that name. I just don't think
that's a good idea. We either have triggers on commands, and they
execute once per command, period; or we have triggers on events and
they execute every time that event happens.
I don't think that's a very helpful definition. What on earth would you want to
do with plain passing of
CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();
So some decomposition seems to be necessary. Getting the level right sure
ain't totally easy.
It might be helpful to pass in the context from which something like that
happened.
I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)
Which would basically require including pg_dump in the backend to implement
replication and logging. I don't think librarifying pg_dump is a fair burden
at all.
I don't either, but that strikes me as a largely unrelated problem.
As you may recall, I've complained that too much of that functionality
is in the client in the past, and I haven't changed my opinion on
that. But how is that any different with Dimitri's approach? You can
get a callback AFTER CREATE TABLE, and you'll get the table name. Now
what? If you get the trigger in C you can get the node tree, but
that's hardly any better. You're still going to need to do some
pretty tricky push-ups to get reliable replication. It's not at all
evident to me that the parse-tree is any better a place to start than
the system catalog representation; in fact, I would argue that it's
probably much worse, because you'll have to exactly replicate whatever
the backend did to decide what catalog entries to create, or you'll
get drift between servers.
Also I have a *very hard* time imagining how to sensibly implement replicating
CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object access hooks.
There is just not enough context. How would you discern between an ADD COLUMN
and a CREATE TABLE? They look very similar or even identical on a catalog
level.
That can be fixed using the optional argument to
InvokeObjectAccessHook, similar to what we've done for differentiating
internal drops from other drops.
I also have the strong feeling that all this would expose implementation
details *at least* as much as command triggers. A slight change in order of
catalog modification would be *way* harder to hide via the object hook than
something of a similar scale via command triggers.
I don't see that. Right now, when you do something like CREATE TABLE
foo (a serial primary key), the exact order in which the objects are
created is apparent from the order of the trigger firing. The order
of BEFORE vs. AFTER triggers is also drawn out of a hat. So I don't
see how it's any worse. It also avoids a lot of definitional
ambiguity. If you say that we're going to have a trigger on the
CREATE SEQUENCE command, then what happens when the user creates a
sequence via some other method? The current patch says that we should
handle that by calling the CREATE SEQUENCE trigger if it happens to be
convenient because we're going through the same code path that a
normal CREATE SEQUENCE would have gone through, but if it uses a
different code path then let's not bother. Otherwise, how do you
explain the fact that a DROP .. CASCADE doesn't fire triggers for the
subordinate objects dropped, but CREATE TABLE does fire triggers for
the subordinate objects created? If you trigger on events, then
things are a lot more cut-and-dried. You fire the trigger when the
event happens. If the event doesn't happen, you don't fire the
trigger. If an implementation change causes objects to be created
when they weren't before, or not created when they were before, then,
yes, trigger firing behavior will change, but it will be clear that
the new behavior is correct. Right now, future developers modifying
this code have no reasonable basis for deciding whether or not a
trigger ought to fire in a given situation. There's no consistent
rule, no specification, and no documentation for how that's intended
to happen.
Hm. I agree that there is some work needed to make the whole handling more
consistent. But I don't think it's as bad as you paint it.
I don't want to brush off your review here - you definitely do raise valid
points. I don't particularly like the current implementation very much. But I
simply fail to see a less annoying way.
It's also not that I am just defending a colleague - you might have noticed the
@2ndquadrant - I got interested in this patch quite a bit before that was on
the horizon.
Yep, understood, and I'd have the same complaints about this patch
that I do presently if it were submitted by someone who worked for
EnterpriseDB, or some independent third party. I do think it's a bit
of a shame that more people didn't pay attention to and help improve
the ObjectAccessHook machinery as we were working on that. I mean,
Dimitri is not the first or last person to want to get control during
DDL operations, and KaiGai's already done a lot of work figuring out
how to make it work reasonably. Pre-create hooks don't exist in that
machinery not because nobody wants them, but because it's hard. This
whole problem is hard. It would be wrong to paint it as a problem
that is unsolvable or not valuable, but it would be equally wrong to
expect that it's easy or that anyone's first attempt (mine, yours,
Dimitri's, KaiGai's, or Tom Lane's) is going to fall painlessly into
place without anyone needing to sweat a little blood.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I am coming more and more strongly to the conclusion that we're going
in the wrong direction here. It seems to me that you're spending an
enormous amount of energy implementing something that goes by the name
COMMAND TRIGGER when what you really want is an EVENT TRIGGER.
No.
The following two features are not allowed by what you call an EVENT
TRIGGER yet the very reason why I started working on COMMAND TRIGGERs:
- BEFORE COMMAND TRIGGER
- Having the command string available in the command trigger
Now, because of scheduling, the current patch has been reduced not to
include the second feature yet, which is a good trade-off for now. Yet
it's entirely possible to implement such a feature as an extension once
9.2 is out given the current COMMAND TRIGGER patch.
I realize this represents a radical design change from what you have
right now, but what you have right now is messy and ill-defined and I
That's only because I've not been making the hard choices alone; I wanted
to be able to speak about them here, and the only time that discussion
happens is when serious, hands-on code review is being done.
My take? Let's make the hard decisions together. Mechanisms are
implemented. The plural is what is causing problems here, but that also
means we can indeed implement several policies now.
I've been proposing a non-messy policy in a previous mail, which I
realize the patch is not properly implementing now. I'd think moving the
patch to implement said policy (or another one after discussion) is next
step.
don't think you can easily fix it. You're exposing great gobs of
implementation details which means that, inevitably, every time anyone
wants to refactor some code, the semantics of command triggers are
going to change, or else the developer will have to go to great
lengths to ensure that they don't. I don't think either of those
things is going to make anyone very happy.
I guess you can't really have your cake and eat it too, right?
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes:
I actually think that, to really meet all needs here, we may need the
ability to get control in more than one place. For example, one thing
that KaiGai has wanted to do (and I bet Dimitri would love to be able
to do it too, and I'm almost sure that Dan Farina would like to do it)
is override the built-in security policy for particular commands. I
I had that in a previous version of the patch, and removed it because
you were concerned about our ability to review it in time for 9.2, which
has obviously been the right decision.
That was called INSTEAD OF command triggers, and they could call a
SECURITY DEFINER function.
I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)
Color me unconvinced about event triggers. That's not answering my use
cases.
that. But how is that any different with Dimitri's approach? You can
get a callback AFTER CREATE TABLE, and you'll get the table name. Now
what? If you get the trigger in C you can get the node tree, but
that's hardly any better. You're still going to need to do some
pretty tricky push-ups to get reliable replication. It's not at all
What you do with the parse tree is rewrite the command. It's possible to
do, but would entail exposing the internal parser state which Tom
objects to. I'm now thinking that can be maintained as a C extension.
evident to me that the parse-tree is any better a place to start than
the system catalog representation; in fact, I would argue that it's
probably much worse, because you'll have to exactly replicate whatever
the backend did to decide what catalog entries to create, or you'll
get drift between servers.
Try to build a command string from the catalogs… even if you can store a
snapshot of them before and after the command. Remember that you might
want to “replicate” to things that are NOT a PostgreSQL server.
ambiguity. If you say that we're going to have a trigger on the
CREATE SEQUENCE command, then what happens when the user creates a
sequence via some other method? The current patch says that we should
handle that by calling the CREATE SEQUENCE trigger if it happens to be
convenient because we're going through the same code path that a
normal CREATE SEQUENCE would have gone through, but if it uses a
different code path then let's not bother. Otherwise, how do you
Yes, the current set of which commands fire which triggers is explained
by how the code is written wrt standard_ProcessUtility() calls. We could
mark re-entrant calls and disable the command trigger feature, it would
not be our first backend global variable in flight.
Dimitri is not the first or last person to want to get control during
DDL operations, and KaiGai's already done a lot of work figuring out
how to make it work reasonably. Pre-create hooks don't exist in that
machinery not because nobody wants them, but because it's hard. This
I've been talking with KaiGai about using the Command Trigger
infrastructure to implement his control hooks, while reviewing one of
his patches, and he said that's not low-level enough for him.
whole problem is hard. It would be wrong to paint it as a problem
that is unsolvable or not valuable, but it would be equally wrong to
expect that it's easy or that anyone's first attempt (mine, yours,
Dimitri's, KaiGai's, or Tom Lane's) is going to fall painlessly into
place without anyone needing to sweat a little blood.
Sweating over that feature is a good summary of a whole lot of my and
some others' time lately.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi,
On Tuesday, March 27, 2012 04:29:58 PM Robert Haas wrote:
On Tue, Mar 27, 2012 at 9:37 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Not necessarily. One use-case is doing something *before* something
happens like enforcing naming conventions, data types et al during
development/schema migration.
That is definitely a valid use case. The only reason why we don't
have something like that built into the ObjectAccessHook framework is
because we haven't yet figured out a clean way to make it work. Most
of KaiGai's previous attempts involved passing a bunch of garbage
selected apparently at random into the hook function, and I rejected
that as unmaintainable. Dimitri's code here doesn't have that problem
- it passes in a consistent set of information across the board. But
I still think it's unmaintainable, because there's no consistency
about when triggers get invoked, or whether they get invoked at all.
We need something that combines the strengths of both approaches.
Yes.
I still think the likeliest way towards that is defining some behaviour we want
regarding
* naming conflict handling
* locking
And then religiously make sure the patch adheres to that. That might need some
refactoring of existing code (like putting a part of the utility.c handling of
create table into its own function and such), but should be doable.
I think BEFORE command triggers ideally should run
* before permission checks
* before locking
* before internal checks are done (naming conflicts, type checks and such)
Obviously some things will be caught before that (parse analysis of some
commands) and I think we won't be able to fully stop that, but it's not totally
consistent now, and while doing some work in the course of this patch seems
sensible, it cannot do over everything wrt this.
Looking up oids and such before calling the trigger doesn't seem to be
problematic from my pov. Command triggers are a superuser-only facility;
any additional information they gain is no problem.
Thinking about that I think we should enforce command trigger functions to be
security definer functions.
I actually think that, to really meet all needs here, we may need the
ability to get control in more than one place.
Not sure what you mean by that. Before/after permission checks, before/after
consistency checks? That sort of thing?
For example, one thing
that KaiGai has wanted to do (and I bet Dimitri would live to be able
to do it too, and I'm almost sure that Dan Farina would like to do it)
is override the built-in security policy for particular commands.
Dim definitely seems to want that: https://github.com/dimitri/pgextwlist ;)
I think that KaiGai only wants to be able to deny things that would
normally be allowed, but I suspect some of those other folks would
also like to be able to allow things that would normally be denied.
Denying seems to be easier than allowing stuff safely....
For those use cases, you want to get control at permissions-checking
time. However, where Dimitri has the hooks right now, BEFORE triggers
for ALTER commands fire after you've already looked up the object that
you're manipulating. That has the advantage of allowing you to use
the OID of the object, rather than its name, to make policy decisions;
but by that time it's too late to cancel a denial-of-access by the
first-line permissions checks.
Why? Just throw an access-denied exception? Unless it's done after the locking,
that won't be visible by anything but timing?
Additional granting is more complex though, I am definitely with you there.
That will probably need INSTEAD triggers which I don't see for now. Those will
have their own share of problems.
Dimitri also mentioned wanting to be able to cancel the actual operation -
and presumably, do something else instead, like go execute it on a different
node, and I think that's another valid use case for this kind of trigger.
It's going to take some work, though, to figure out what the right set of
control points is, and it's probably going to require some refactoring of
the existing code, which is a mess that I have been slowly trying to clean
up.
I commend your bravery...
In the interest of getting event triggers, you're arguing that we should
contort the definition of the word "command" to include sub-commands,
but not necessarily commands that turn out to be a no-op, and maybe
things that are sort of like what commands do even though nobody
actually ever executed a command by that name. I just don't think
that's a good idea. We either have triggers on commands, and they
execute once per command, period; or we have triggers on events and
they execute every time that event happens.
I don't think that's a very helpful definition. What on earth would you
want to do with plain passing of
CREATE SCHEMA blub CREATE TABLE foo() CREATE TABLE bar();
So some decomposition seems to be necessary. Getting the level right sure
ain't totally easy.
It might be helpful to pass in the context from which something like that
happened.
I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)
Well, but on the other hand what will you do with a "pg_attribute row added"
event. It doesn't even have an oid ;)
Which would basically require including pg_dump in the backend to
implement replication and logging. I don't think librarifying pg_dump is
a fair burden at all.
I don't either, but that strikes me as a largely unrelated problem.
As you may recall, I've complained that too much of that functionality
is in the client in the past, and I haven't changed my opinion on
that.
I am not really happy with that as well. But if we start putting all that into
this patch we will never get anywhere. I think moving that nearer to the
server in some form is a MAJOR project on its own.
Also I don't see it solving differential stuff at all.
But how would you do something like logging the originating statement with the
object access hook machinery?
But how is that any different with Dimitri's approach? You can
get a callback AFTER CREATE TABLE, and you'll get the table name. Now
what? If you get the trigger in C you can get the node tree, but
that's hardly any better. You're still going to need to do some
pretty tricky push-ups to get reliable replication. It's not at all
evident to me that the parse-tree is any better a place to start than
the system catalog representation; in fact, I would argue that it's
probably much worse, because you'll have to exactly replicate whatever
the backend did to decide what catalog entries to create, or you'll
get drift between servers.
Well, the problem is that with just catalog access it's just about impossible to
generate differential changes. Also the catalogs between clusters
*won't be the same* in a large part of the use-cases. Oids will basically never
match.
Matching the intent, which is mostly visible in the parsetree, seems to be a
whole lot more helpful.
I am pretty confident, given time, that I could write fairly complete
differential-SQL generation for the usual parts of DDL from the parsetree alone
in 2-3 days with a C-level trigger. I am also very sure that I wouldn't be
able to even remotely get that far with catalog-only access.
Also I have a *very hard* time to imagining to sensibly implement
replicating CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object
access hooks. There is just not enough context. How would you discern
between an ADD COLUMN and a CREATE TABLE? They look very similar or even
identical on a catalog level.
That can be fixed using the optional argument to
InvokeObjectAccessHook, similar to what we've done for differentiating
internal drops from other drops.
Several of the points where those hooks are invoked don't even seem to have
the necessary information for that... Doing that seems to involve passing
heaps of additional state through the backend.
I also have the strong feeling that all this would expose implementation
details *at least* as much as command triggers. A slight change in order
of catalog modification would be *way* harder to hide via the object hook
than something of a similar scale via command triggers.
I don't see that. Right now, when you do something like CREATE TABLE
foo (a serial primary key), the exact order in which the objects are
created is apparent from the order of the trigger firing. The order
of BEFORE vs. AFTER triggers is also drawn out of a hat. So I don't
see how it's any worse. It also avoids a lot of definitional
ambiguity. If you say that we're going to have a trigger on the
CREATE SEQUENCE command, then what happens when the user creates a
sequence via some other method? The current patch says that we should
handle that by calling the CREATE SEQUENCE trigger if it happens to be
convenient because we're going through the same code path that a
normal CREATE SEQUENCE would have gone through, but if it uses a
different code path then let's not bother.
Otherwise, how do you explain the fact that a DROP .. CASCADE doesn't fire
triggers for the subordinate objects dropped, but CREATE TABLE does fire
triggers for the subordinate objects created?
I complained about that several times. As I recall, Dim basically agreed
but didn't want to do it now because it involves passing more data around.
Didn't convince me much, but I think I could live with it.
Adding proper command trigger calling via that doesn't seem to be too complex
to me.
If you trigger on events, then things are a lot more cut-and-dried. You fire
the trigger when the event happens. If the event doesn't happen, you don't
fire the trigger. If an implementation change causes objects to be created
when they weren't before, or not created when they were before, then,
yes, trigger firing behavior will change, but it will be clear that
the new behavior is correct. Right now, future developers modifying
this code have no reasonable basis for deciding whether or not a
trigger ought to fire in a given situation. There's no consistent
rule, no specification, and no documentation for how that's intended
to happen.
This stuff definitely needs to be properly specified. And I am sorry that I
didn't point this out earlier. I guess I just looked at too many iterations of
this to still have a big picture.
Hm. I agree that there is some work needed to make the whole handling
more consistent. But I don't think it's as bad as you paint it.
I don't want to brush off your review here - you definitely do raise valid
points. I don't particularly like the current implementation very much.
But I simply fail to see a less annoying way.
It's also not that I am just defending a colleague - you might have
noticed the @2ndquadrant - I got interested in this patch quite a bit
before that was on the horizon.
Yep, understood, and I'd have the same complaints about this patch
that I do presently if it were submitted by someone who worked for
EnterpriseDB, or some independent third party.
I didn't want to suggest that you do! I just wanted to make it explicit that
it's not somebody asking me via external channels (except maybe Dim asking for
reviews on #postgresql on irc) to do a round of review for some company
reason.
I do think it's a bit of a shame that more people didn't pay attention to
and help improve the ObjectAccessHook machinery as we were working on that.
I mean, Dimitri is not the first or last person to want to get control during
DDL operations, and KaiGai's already done a lot of work figuring out
how to make it work reasonably.
Pre-create hooks don't exist in that machinery not because nobody wants
them, but because it's hard.
I am not sure if the machinery is a very good fit for pre-create hooks. That's
no argument against the machinery, but against reusing it for this use case.
This whole problem is hard. It would be wrong to paint it as a problem that
is unsolvable or not valuable, but it would be equally wrong to expect that
it's easy or that anyone's first attempt (mine, yours, Dimitri's, KaiGai's,
or Tom Lane's) is going to fall painlessly into place without anyone needing
to sweat a little blood.
If I appeared a bit grumpy about your review it's mostly because I would like
to get the feature into 9.3 because it would have saved me many hours of
pain in the past, and the review pushed it further away from that.
The review also didn't come in that early - but that's definitely not your
fault; Tom and you did a lot of work on many patches and unfortunately for us
you can't scale limitlessly. You shouldn't need to, but reality seems to tell
us otherwise :(
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 27, 2012 at 11:05 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)
Color me unconvinced about event triggers. That's not answering my use
cases.
Could we get a list of those use cases, maybe on a wiki page
somewhere, and add to it all the use cases that KaiGai Kohei or others
who are interested in this are aware of? Or maybe we can just discuss
via email, but it's going to be hard to agree that we've got something
that meets the requirements or doesn't if we're all imagining
different sets of requirements. The main use cases I can think of
are:
1. Replication. That is, after we perform a DDL operation, we call a
trigger and tell it what we did, so that it can make a record of that
information and ship it to another machine, where it will arrange to
do the same thing on the remote side.
2. Redirection. That is, before we perform a DDL operation, we call a
trigger and tell it what we've been asked to do, so that it can go
execute the request elsewhere (e.g. the master rather than the Hot
Standby slave).
3. Additional security or policy checks. That is, before we perform a
DDL operation, we call a trigger and tell it what we've been asked to
do, so that it can throw an error if it doesn't like our credentials
or, say, the naming convention we've used for our column names.
4. Relaxation of security checks. That is, we let a trigger get
control at permissions-checking time and let it make the go-or-no-go
decision in lieu of the usual permissions-checking code.
5. Auditing. Either when something is attempted (i.e. before) or
after it happens, we log the attempt/event somewhere.
Anything else?
In that list of use cases, it seems to me that you want BEFORE and
AFTER triggers to have somewhat different firing points. For the
BEFORE cases, you really need the command trigger to fire early, and
once. For example, if someone says "ALTER TABLE dimitri ADD COLUMN
fontaine INTEGER, DROP COLUMN IF EXISTS haas", permissions on dimitri
should really only get checked once, not once for each subcommand.
That's the point at which you need to get control for #3 or #4, and it
would be workable for #5 as well; I'm less sure about #2. On the
other hand, for the AFTER cases I've listed here, I think you really
want to know what *actually* happened, not what somebody thought about
doing. You want to know which tables, sequences, etc. *actually* got
created or dropped, not the ones that the user mentioned. If the user
mentioned a table but we didn't drop it (and we also didn't error out,
because IF EXISTS is used), none of the AFTER cases really care; if we
dropped other stuff (because of CASCADE) the AFTER cases may very well
care.
Another thing to think about with respect to deciding on the correct
firing points is that you can't easily fire the trigger after we've
identified the object in question but before we've checked permissions
on it, which otherwise seems like an awfully desirable thing to do for
use cases 3, 4, and 5 from the above list, and maybe 2 as well. We
don't want to try to take locks on objects that the current user
doesn't have permission to access, because then a user with no
permissions whatsoever on an object can interfere with access by
authorized users. On the flip side, we can't reliably check
permissions before we've locked the object, because somebody else
might rename or drop it after we check permissions and before we get
the lock. Noah Misch invented a clever technique that I then used to
fix a bunch of these problems in 9.2: the fixed code (sadly, not all
cases are fixed yet, due to the fact that we ran out of time in the
development cycle) looks up the object name, checks permissions
(erroring out if the check fails), and then locks the object. Once it
gets the lock, it checks whether any shared-invalidation messages have
been processed since the point just before we looked up the object
name. If so, it redoes the name lookup. If the referent of the name
has not changed, we're done; if it has, we drop the old lock and
relock the new object and loop around again, not being content until
we're sure that the object we locked is still the referent of the
name. This leads to much more logical behavior than the old way of
doing things, and not incidentally gets rid of a lot of errors of the
form "cache lookup failed for relation %u" that users of existing
releases will remember, probably not too fondly.
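The retry loop described here could be sketched roughly as follows. This is a toy Python simulation with invented names (`Catalog`, `lock_object_by_name`, and the callbacks are all hypothetical); the real implementation is C code in the backend, e.g. around RangeVarGetRelidExtended.

```python
# Toy simulation of the lookup / permission-check / lock / recheck loop.
# All names here are invented for illustration; this is not backend code.

class Catalog:
    """Toy catalog: name -> OID mapping, plus an invalidation counter that
    is bumped whenever any catalog change is broadcast."""
    def __init__(self):
        self.names = {}
        self.inval_counter = 0

    def rename(self, old, new):
        self.names[new] = self.names.pop(old)
        self.inval_counter += 1

def lock_object_by_name(catalog, name, check_permissions, lock, unlock):
    """Look the name up, check permissions (erroring out on failure), take
    the lock, then verify the name still refers to the same object; if it
    changed underneath us, drop the lock and retry with a fresh lookup."""
    while True:
        seen = catalog.inval_counter
        oid = catalog.names.get(name)
        if oid is None:
            raise LookupError("relation %r does not exist" % name)
        if not check_permissions(oid):
            raise PermissionError("permission denied for %r" % name)
        lock(oid)
        # If no invalidation was processed, or the name still maps to the
        # same OID, the object we locked is still the referent: done.
        if catalog.inval_counter == seen or catalog.names.get(name) == oid:
            return oid
        unlock(oid)  # someone renamed/dropped it underneath us; retry
```

The point of the post-lock recheck is that once the lock is held, the mapping can no longer change, so a successful recheck is final.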
However, it's got serious implications for triggers that want to relax
security policy, because the scope of what you can do inside that loop
is pretty limited. You can't really do anything to the relation while
you're checking permissions on it, because you haven't locked it yet.
If you injected a trigger there, it would have to be similarly
limited, and I don't know how we'd enforce that, and it would have to
be prepared to get called multiple times for the same operation to
handle the retry logic, which would be awkward for other cases, like
auditing. For cases where people just want to impose extra security
or policy checks, we could possibly just punt and do them after the
lock is taken, as Dimitri's patch does. That has the disadvantage
that you can possibly obtain a lock on an object that you don't have
permissions on, but maybe that's tolerable. To understand the issue
here, think about LOCK TABLE foo. If the table isn't otherwise locked
and you lack permissions, this will fail immediately. But in 9.1, it
will try to AccessExclusiveLock the table first and then check
permissions. As soon as it gets the lock, it will fail, but if other
people are using the table, those will block the AccessExclusiveLock,
but the pending AccessExclusiveLock will also block any new locks from
being granted. So essentially all new operations against the table
get blocked until all the existing transactions that have touched that
table commit or roll back, even though the guy doing the LOCK TABLE
has no privileges. Bummer. However, I think we could possibly live
with some limitations of this kind for *additional* checks imposed by
command triggers; we'll still be protected against someone getting a
table lock when they have no privileges at all. It's a lot more
awkward when someone wants to *relax* privilege checks, though - I'm
fairly confident that we *don't* want to go back to the pre-9.2
behavior of locking first and asking questions later.
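The queue-blocking behavior in the LOCK TABLE example can be illustrated with a toy model (invented names, and deliberately not PostgreSQL's real lock manager or its full conflict matrix): a *pending* AccessExclusiveLock blocks all later requests, even though it has not itself been granted yet.

```python
# Toy lock queue: a request conflicts if an AccessExclusive lock is held
# OR merely waiting. Only that one conflict rule is modeled here.

class LockQueue:
    def __init__(self):
        self.granted = []   # (who, mode) currently held
        self.waiting = []   # (who, mode) queued behind a conflict

    def request(self, who, mode):
        conflicts = any(
            mode == "AccessExclusive" or held_mode == "AccessExclusive"
            for _, held_mode in self.granted + self.waiting
        )
        if conflicts:
            self.waiting.append((who, mode))
            return "waits"
        self.granted.append((who, mode))
        return "granted"
```

In this model, once the unprivileged LOCK TABLE enqueues its AccessExclusiveLock behind an existing reader, every new reader queues behind it too, which is the "bummer" described above.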
Try to build a command string from the catalogs… even if you can store a
snapshot of them before and after the command. Remember that you might
want to “replicate” to things that are NOT a PostgreSQL server.
For CREATE commands, I am sure this is doable, although there also
other ways. For ALTER commands, I agree that you need to pass
detailed information about what is being altered and how. But that
drags you well away from the idea of a command trigger. At a minimum,
you're going to want to know about each subcommand of ALTER TABLE.
However, I think it's unrealistic to suppose we're going to get all of
that straightened out in time for 9.2. KaiGai took a whole release
cycle to get working support for just CREATE and DROP.
ambiguity. If you say that we're going to have a trigger on the
CREATE SEQUENCE command, then what happens when the user creates a
sequence via some other method? The current patch says that we should
handle that by calling the CREATE SEQUENCE trigger if it happens to be
convenient because we're going through the same code path that a
normal CREATE SEQUENCE would have gone through, but if it uses a
different code path then let's not bother. Otherwise, how do you

Yes, the current set of which commands fire which triggers is explained
by how the code is written wrt standard_ProcessUtility() calls. We could
mark re-entrant calls and disable the command trigger feature; it would
not be our first backend global variable in flight.
Agreed, although I think just adding an argument to ProcessUtility
would be enough.
Dimitri is not the first or last person to want to get control during
DDL operations, and KaiGai's already done a lot of work figuring out
how to make it work reasonably. Pre-create hooks don't exist in that
machinery not because nobody wants them, but because it's hard. This

I've been talking with KaiGai about using the Command Trigger
infrastructure to implement his control hooks, while reviewing one of
his patches, and he said that's not low-level enough for him.
Right. That worries me. The infrastructure KaiGai is using is
low-level in two senses. First, it requires you to write a C module,
load it into the backend, and use _PG_init to frob global variables.
That is, it's not user-friendly; you have to do very low-level things
to get access to the feature. Second, it's fine-grained. A single
command can potentially hit those hooks many times; each individual
operation triggers a hook call, not the command as a whole. When
KaiGai says that command triggers wouldn't work for his purposes, he's
talking about the second issue, not the first one. It might be less
efficient for him to define his hooks as C-callable functions,
register them in pg_proc, and install them using CREATE COMMAND
TRIGGER, but it would work. No problem. However, the firing points
you've chosen would not be useful for his purposes. I'm beating a
dead horse here, but I think that's because you've got the wrong
firing points, not because his needs are somehow very different from
everything that anyone else wants to do.
Sweating over that feature is a good summary of a whole lot of my and
some others' time lately.
Understood.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I still think the likeliest way towards that is defining some behaviour we want
regarding
* naming conflict handling
* locking

And then religiously make sure the patch adheres to that. That might need some
refactoring of existing code (like putting a part of the utility.c handling of
create table into its own function and such), but should be doable.

I think BEFORE command triggers ideally should run
* before permission checks
* before locking
* before internal checks are done (naming conflicts, type checks and such)
It is possible to do this, and it would actually be much easier and
less invasive to implement than what Dimitri has done here, but the
downside is that you won't have done the name lookup either. See my
last email to Dimitri for a long explanation of why that restriction
is not easily circumventable: name lookups, locking, and permission
checks are necessarily and inextricably intertwined. Still, if others
agree this is useful, I think it would be a lot easier to get right
than what we have now.
Obviously some things will be caught before that (parse analysis of some
commands) and I think we won't be able to fully stop that, but it's not totally
consistent now, and while doing some work in the path of this patch seems
sensible, it cannot do over everything wrt this.
Some of what we're now doing as part of parse analysis should really
be reclassified. For example, the ProcessUtility branch that handles
T_CreateStmt and T_CreateForeignTableStmt should really be split out
as a separate function that lives in tablecmds.c, and I think at least
some of what's in transformCreateStmt should be moved to tablecmds.c
as well. The current arrangement makes it really hard to guarantee
things like "the name gets looked up just once", which seems obviously
desirable, since strange things will surely happen if the whole
command doesn't agree on which OID it's operating on.
Looking up OIDs and such before calling the trigger doesn't seem to be
problematic from my pov. Command triggers are a superuser-only facility;
the additional information they gain is no problem.
Thinking about that I think we should enforce command trigger functions to be
security definer functions.
I don't see any benefit from that restriction.
I actually think that, to really meet all needs here, we may need the
ability to get control in more than one place.

Not sure what you mean by that. Before/after permission checks, before/after
consistency checks? That sort of thing?
Yes. For example, above you proposed a kind of trigger that fires
really early - basically after parsing but before everything else.
What Dimitri has implemented is a much later trigger that is still
before the meat of the command, but not before *everything*. And then
there's an AFTER trigger, which could fire either after an individual
piece or stage of the command, or after the whole command is complete,
either of which seems potentially useful depending on the
circumstances. I almost think that the BEFORE/AFTER name serves to
confuse rather than to clarify, in this case. It's really a series of
specific hook points: whenever we parse a command (but before we
execute it), after security and sanity checks but before we begin
executing the command, before or after various subcommands, after the
whole command is done, and maybe a few others. When we say BEFORE or
AFTER, we implicitly assume that we want at most two of the things
from that list, and I am not at all sure that's what's going to be
enough.
One thing I had thought about doing, in the context of sepgsql, and we
may yet do it, is create a hook that gets invoked whenever we need to
decide whether it's OK to perform an action on an object. For
example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook
both "is it OK to add a foreign key to table X?" and also "is it OK to
make a foreign key refer to table Y"? This doesn't fit into the
command-trigger framework at all, but it's definitely useful for
sepgsql, and I bet it's good for other things, too - maybe not that
specific example, but that kind of thing.
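The two-sided check sketched above might look something like this. Everything here is hypothetical (the hook, its arguments, and the policy are invented to make the shape concrete); no such hook exists as described.

```python
# Hypothetical per-object permission hook for ALTER TABLE .. ADD FOREIGN KEY:
# the hook is consulted once for each table involved, with a distinct
# "action" describing that table's role in the command.

def check_add_foreign_key(hook, referencing_table, referenced_table):
    """Ask the hook about both sides of the foreign key; deny if either
    side is refused."""
    return (hook("add foreign key on", referencing_table)
            and hook("be referenced by foreign key", referenced_table))

# A toy policy that forbids anything touching tables in an "audit" set:
audit_tables = {"audit_log"}

def policy(action, table):
    return table not in audit_tables
```

The design point is that the hook sees per-object actions, not whole commands, which is why it does not fit the command-trigger framework.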
I think that KaiGai only wants to be able to deny things that would
normally be allowed, but I suspect some of those other folks would
also like to be able to allow things that would normally be denied.

Denying seems to be easier than allowing stuff safely...
Yes.
For those use cases, you want to get control at permissions-checking
time. However, where Dimitri has the hooks right now, BEFORE triggers
for ALTER commands fire after you've already looked up the object that
you're manipulating. That has the advantage of allowing you to use
the OID of the object, rather than its name, to make policy decisions;
but by that time it's too late to cancel a denial-of-access by the
first-line permissions checks.

Why? Just throw an access-denied exception? Unless it's done after the locking,
that won't be visible to anything but timing?
I think you misread what I wrote. It's still possible to deny things,
but it's too late to allow something that an earlier check has already
denied. Right now, sepgsql actually does its denying very late, from
the post-create hook. That's not ideal, of course, but it works. I
know KaiGai would like it to happen earlier, to avoid wasting work.
Additional granting is more complex though, I am definitely with you there.
That will probably need INSTEAD triggers which I don't see for now. Those will
have their own share of problems.
Right.
I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)

Well, but on the other hand what will you do with a "pg_attribute row added"
event. It doesn't even have an oid ;)
We refer to everything using ObjectAddress notation - the OID of the
system catalog that contains objects of that type, the OID of the
object itself, and a "sub-object ID" which is 0 in general but the
column number in that specific case. What you can do is go fetch the
pg_attribute row and then do whatever you like - log it, throw an
error, etc. It's a hook that gets invoked when you add a column. If
it's important to differentiate between a column that gets added at
CREATE TABLE time and one that gets added by ALTER TABLE .. ADD
COLUMN, that could be done. It's a lot easier to deal with triggers
that fire in cases you don't care about than to deal with triggers
that don't fire in cases you do care about. The former you can just
ignore.
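As a loose illustration of the ObjectAddress notation (Python rather than the backend's C struct; the field names mirror the backend's ObjectAddress, 1259 is pg_class's OID, and the object OID and column number are made-up example values):

```python
# Sketch of ObjectAddress: (catalog OID, object OID, sub-object ID),
# where the sub-object ID is 0 in general and the attribute number
# when addressing a single column.

from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectAddress:
    classId: int        # OID of the system catalog holding the object
    objectId: int       # OID of the object itself
    objectSubId: int = 0  # 0 in general; column number for a column

# A whole table: sub-object ID is 0.
table = ObjectAddress(classId=1259, objectId=16384)

# Column 3 of that same table: same catalog and OID, nonzero sub-object ID.
column = ObjectAddress(classId=1259, objectId=16384, objectSubId=3)
```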
Which would basically require including pg_dump in the backend to
implement replication and logging. I don't think librarifying pg_dump is
a fair burden at all.

I don't either, but that strikes me as a largely unrelated problem.
As you may recall, I've complained that too much of that functionality
is in the client in the past, and I haven't changed my opinion on
that.

I am not really happy with that as well. But if we start putting all that into
this patch we will never get anywhere.
Just to be clear, I wasn't proposing that.
I think moving that nearer to the
server in some form is a MAJOR project on its own.
Also I don't see it solving differential stuff at all.
True.
But how would you do something like logging the originating statement with the
object access hook machinery?
You can't, but you can do a lot of other useful stuff that command
triggers as currently envisioned won't allow. I'm starting to think
that maybe BEFORE triggers need to fire on commands and AFTER triggers
on events? Can we define "start of command execution" as an event? I
don't think we need two completely different facilities, but I really
want to make sure we have a plan for capturing events. I can't
imagine how replication is gonna work if we can't log exactly what we
did, rather than just the raw query string. Statement-based
replication is sometimes useful, especially in heterogeneous
environments, but it's always kinda sucky, in that there are edge
cases that break. If you want to know what schema the server is going
to use for a newly created object before it creates it, you've got to
do that calculation yourself, and now you've created a risk of getting
a different answer. If you want to capture all the drops and replay
them on the slave, you have to be able to handle CASCADE.
Well, the problem is that with just catalog access it's just about impossible to
generate differential changes from them. Also the catalog between clusters
*won't be the same* in a large part of the use-cases. OIDs will basically never
match.
True. ALTER commands are going to need additional information. But
some of that information also won't be available until quite late.
For example, an ALTER on a parent might cascade down to children, and
a command trigger might want to know exactly which tables were
affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF;
it's more like DURING, and at a very specific point DURING.
Matching the intent, which is mostly visible in the parsetree, seems to be a
whole lot more helpful.
But note that there are a lot of possible things you can't tell from
the parse-tree without deducing them. Sure, for LISTEN, the parse
tree is fine. But for ALTER TABLE or DROP the parse tree doesn't even
tell you what object you're operating on (at least, not unless the
user includes an explicit schema-qualification), let alone the whole
list of objects the command is ultimately going to process due to
cascading, inheritance, etc. You can reimplement that logic in your
trigger but that's both fragile and laborious.
Also I have a *very hard* time imagining sensibly implementing
replication of CREATE TABLE or ALTER TABLE ... ADD COLUMN with just object
access hooks. There is just not enough context. How would you discern
between an ADD COLUMN and a CREATE TABLE? They look very similar or even
identical on a catalog level.

That can be fixed using the optional argument to
InvokeObjectAccessHook, similar to what we've done for differentiating
internal drops from other drops.

Several of the points where those hooks are invoked don't even seem to have
the necessary information for that... Doing that seems to involve passing
heaps of additional state through the backend.
I've been trying pretty hard to avoid that, but I agree that there are
possible stumbling blocks there. I don't think it's an insurmountable
problem, though. We just need to start with a clean definition of
what we want the behavior to be; then, we may need to refactor
somewhat to get there. The problem with the current code is that it's
just accepting whatever fits nicely into the current code as a
specification-in-fact, and I don't think that's a good basis for a
feature on which people will want to build complex and robust systems.
If I appeared a bit grumpy about your review, it's mostly because I would like
to get the feature into 9.3, because it would have saved me many hours of
pain in the past, and the review pushed it further away from that.
The review also didn't come in that early - but that's definitely not your
fault; Tom and you did a lot of work on many patches, and unfortunately for us
you can't scale limitlessly. You shouldn't need to, but reality seems to tell
us otherwise :(
Yeah. I probably should have looked at this more sooner, but (1) I
was busy, (2) it seemed like a lot of active development was
happening, and (3) Thom was reviewing away. And I hate reviewing code
when there are new versions coming out every day, because obviously
there's no point in my doing final pre-commit hacking when that's
happening. I did do some big picture review that resulted in some
significant scope-trimming early in the CommitFest, but I wish now
that I'd dived into it a bit more. I find it rather regrettable that
there were so few people doing review this CommitFest. I spent a lot
of time on things that could have been done by other people, and the
result is that some things like this have gone rather late.
All that having been said, if I had my druthers, we would have bounced
this patch out of the CommitFest on day 1, because it has long been
and continues to be my belief that getting a major feature done in one
CommitFest is unrealistic, unless we're willing to have that
CommitFest take as long as 2 or 3 CommitFests, which I dislike. The
last CommitFest is hard enough without last-minute major feature
submissions; and you'll note that the latest any feature of mine has
ever been committed is late January, and not because I ignore everyone
else to work on my own stuff.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On Tuesday, March 27, 2012 07:34:46 PM Robert Haas wrote:
On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
I still think the likeliest way towards that is defining some behaviour
we want regarding
* naming conflict handling
* locking

And then religiously make sure the patch adheres to that. That might need
some refactoring of existing code (like putting a part of the utility.c
handling of create table into its own function and such), but should be
doable.

I think BEFORE command triggers ideally should run
* before permission checks
* before locking
* before internal checks are done (naming conflicts, type checks and
such)

It is possible to do this, and it would actually be much easier and
less invasive to implement than what Dimitri has done here, but the
downside is that you won't have done the name lookup either. See my
last email to Dimitri for a long explanation of why that restriction
is not easily circumventable: name lookups, locking, and permission
checks are necessarily and inextricably intertwined.
Read your other mail. Coming back to it, I think the above might be too
restrictive to be useful for a big part of the use-cases. So your idea of more
hook points below has some merit.
Still, if others
agree this is useful, I think it would be a lot easier to get right
than what we have now.
I think it's pretty important to have something that's usable rather easily,
which requires names to be resolved, and thus permission checks already
performed and (some) locks already acquired.
I think quite some of the possible usages need the facility to be as simple as
possible and won't care about already acquired locks/names.
Some of what we're now doing as part of parse analysis should really
be reclassified. For example, the ProcessUtility branch that handles
T_CreateStmt and T_CreateForeignTableStmt should really be split out
as a separate function that lives in tablecmds.c, and I think at least
some of what's in transformCreateStmt should be moved to tablecmds.c
as well. The current arrangement makes it really hard to guarantee
things like "the name gets looked up just once", which seems obviously
desirable, since strange things will surely happen if the whole
command doesn't agree on which OID it's operating on.
Yes, I agree, most of that should go from utility.c.
Looking up OIDs and such before calling the trigger doesn't seem to be
problematic from my pov. Command triggers are a superuser-only facility;
the additional information they gain is no problem.
Thinking about that, I think we should enforce command trigger functions
to be security definer functions.

I don't see any benefit from that restriction.
The reason I was thinking it might be a good idea is that they might get
more knowledge passed than the user could get directly otherwise. Especially
if we extend the scheme to more places/information.
I actually think that, to really meet all needs here, we may need the
ability to get control in more than one place.

Not sure what you mean by that. Before/after permission checks,
before/after consistency checks? That sort of thing?

Yes. For example, above you proposed a kind of trigger that fires
really early - basically after parsing but before everything else.
What Dimitri has implemented is a much later trigger that is still
before the meat of the command, but not before *everything*. And then
there's an AFTER trigger, which could fire either after an individual
piece or stage of the command, or after the whole command is complete,
either of which seems potentially useful depending on the
circumstances. I almost think that the BEFORE/AFTER name serves to
confuse rather than to clarify, in this case. It's really a series of
specific hook points: whenever we parse a command (but before we
execute it), after security and sanity checks but before we begin
executing the command, before or after various subcommands, after the
whole command is done, and maybe a few others. When we say BEFORE or
AFTER, we implicitly assume that we want at most two of the things
from that list, and I am not at all sure that's what's going to be
enough.
You might have a point there. Not sure if the complexity of explaining all the
different hook points is worth the pain.
What are the phases we have:
* after parse
* logging
* after resolving name
* after checking permissions (intertwined with the former)
* override permission check? INSTEAD?
* after locking (intertwined with the former)
* except it needs to be connected to resolving the name/permission check;
this doesn't seem to be an attractive hook point
* after validation
* additional validation likely want to hook here
* after execution
* replication might want to hook here
Am I missing an interesting phase?
Allowing all that probably seems to require a substantial refactoring of
commands/ and catalog/
One thing I had thought about doing, in the context of sepgsql, and we
may yet do it, is create a hook that gets invoked whenever we need to
decide whether it's OK to perform an action on an object. For
example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook
both "is it OK to add a foreign key to table X?" and also "is it OK to
make a foreign key refer to table Y"? This doesn't fit into the
command-trigger framework at all, but it's definitely useful for
sepgsql, and I bet it's good for other things, too - maybe not that
specific example, but that kind of thing.
It's a neat feature, yes. Seems to be orthogonal, yes.
I agree that it's not a very helpful decision, but I'm not the one who
said we wanted command triggers rather than event triggers. :-)

Well, but on the other hand what will you do with a "pg_attribute row
added" event. It doesn't even have an oid ;)

We refer to everything using ObjectAddress notation - the OID of the
system catalog that contains objects of that type, the OID of the
object itself, and a "sub-object ID" which is 0 in general but the
column number in that specific case. What you can do is go fetch the
pg_attribute row and then do whatever you like - log it, throw an
error, etc. It's a hook that gets invoked when you add a column. If
it's important to differentiate between a column that gets added at
CREATE TABLE time and one that gets added by ALTER TABLE .. ADD
COLUMN, that could be done. It's a lot easier to deal with triggers
that fire in cases you don't care about than to deal with triggers
that don't fire in cases you do care about. The former you can just
ignore.
I don't believe that distinction can be done that easily without more impact
than the CT patch.
I think you need a surprisingly high amount of context to know when to ignore
a trigger. Especially as it's not exactly easy to transfer knowledge from one
to the next.
But how would you do something like logging the originating statement with
the object access hook machinery?

You can't, but you can do a lot of other useful stuff that command
triggers as currently envisioned won't allow. I'm starting to think
that maybe BEFORE triggers need to fire on commands and AFTER triggers
on events? Can we define "start of command execution" as an event? I
don't think we need two completely different facilities, but I really
want to make sure we have a plan for capturing events. I can't
imagine how replication is gonna work if we can't log exactly what we
did, rather than just the raw query string. Statement-based
replication is sometimes useful, especially in heterogeneous
environments, but it's always kinda sucky, in that there are edge
cases that break.
I don't think creating *new* DDL from the parsetree can really count as
statement based replication. And again, I don't think implementing that will
cost that much effort.
How would it help us to know - as individual events! - which tuples were
created where? Reassembling that will be a huge mess. I basically fail to see
*any* use case besides access checking.
If you want to know what schema the server is going
to use for a newly created object before it creates it, you've got to
do that calculation yourself, and now you've created a risk of getting
a different answer. If you want to capture all the drops and replay
them on the slave, you have to be able to handle CASCADE.
I would like to have CASCADE handled, yes.
Well, the problem is that with just catalog access it's just about impossible
to generate differential changes from them. Also the catalog between
clusters *won't be the same* in a large part of the use-cases. OIDs will
basically never match.

True. ALTER commands are going to need additional information. But
some of that information also won't be available until quite late.
For example, an ALTER on a parent might cascade down to children, and
a command trigger might want to know exactly which tables were
affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF;
it's more like DURING, and at a very specific point DURING.
Well, *I* would like a separate command trigger to be fired for sub-actions,
but see them qualified as such...
Matching the intent, which is mostly visible in the parsetree, seems to
be a whole lot more helpful.

But note that there are a lot of possible things you can't tell from
the parse-tree without deducing them. Sure, for LISTEN, the parse
tree is fine. But for ALTER TABLE or DROP the parse tree doesn't even
tell you what object you're operating on (at least, not unless the
user includes an explicit schema-qualification), let alone the whole
list of objects the command is ultimately going to process due to
cascading, inheritance, etc. You can reimplement that logic in your
trigger but that's both fragile and laborious.
Resolving objects to the schema qualified variant isn't that much of a problem.
Inheritance is, I definitely can see that.
For lots of simple use-cases even something that has some restrictions
attached would be *WAY* better than the current situation where all that is
not possible at all.
I fear a bit that this discussion is leading to something that's never going to
materialize because it would require a huge amount of work to get there.
That can be fixed using the optional argument to
InvokeObjectAccessHook, similar to what we've done for differentiating
internal drops from other drops.
Several of the points where those hooks are invoked don't even seem to
have the necessary information for that... Doing that seems to involve
passing heaps of additional state through the backend.
I've been trying pretty hard to avoid that, but I agree that there are
possible stumbling blocks there. I don't think it's an insurmountable
problem, though.
We just need to start with a clean definition of
what we want the behavior to be; then, we may need to refactor
somewhat to get there. The problem with the current code is that it's
just accepting whatever fits nicely into the current code as a
specification-in-fact, and I don't think that's a good basis for a
feature on which people will want to build complex and robust systems.
I agree that we need consistency there.
I find it rather regrettable that there were so few people doing review this
CommitFest. I spent a lot of time on things that could have been done by
other people, and the result is that some things like this have gone rather
late.
I find that sad too. Even though I am part of the crowd that didn't do enough.
Thanks for the work!
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 27, 2012 at 4:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Looking up oids and such before calling the trigger doesn't seem to be
problematic from my pov. Command triggers are a superuser only facility,
so any additional information they gain is not a problem.
Thinking about that, I think we should enforce command trigger functions
to be security definer functions.
I don't see any benefit from that restriction.
The reason I was thinking it might be a good idea is that they might get
more information passed to them than the user could obtain directly otherwise,
especially if we extend the scheme to more places.
As long as the superuser gets to decide whether or not a given
function is installed as a command trigger, there's no harm in
allowing him to make it either SECURITY INVOKER or SECURITY DEFINER.
I agree that making it SECURITY DEFINER will often be useful and
appropriate; I just see no reason to enforce that restriction
programmatically.
What are the phases we have:
* after parse
* logging
* after resolving name
* after checking permissions (intertwined with the former)
* override permission check? INSTEAD?
* after locking (intertwined with the former)
* except it needs to be connected to resolving the name/permission check
this doesn't seem to be an attractive hook point
* after validation
* additional validation likely want to hook here
* after execution
* replication might want to hook here
Am I missing an interesting phase?
I'd sort of categorize it like this:
- pre-parse
- post-parse
- at permissions-checking time
- post permissions-checking/name-lookup, before starting the main work
of the command, i.e. pre-execution
- "event" type triggers that happen when specific actions are
performed (e.g. CREATE, DROP, etc.) or as subcommands fire (e.g. in
ALTER TABLE, we could fire an alter-table-subcommand trigger every
time we execute a subcommand)
- post-execution
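The categorization above could be modeled roughly as follows; the enum and function names are invented purely for illustration and do not exist in PostgreSQL:

```c
#include <assert.h>

/* Hypothetical sketch of the firing points enumerated above, so that a
 * dispatcher could fire registered triggers at each phase.  These names
 * are illustrative only, not PostgreSQL's actual API. */
typedef enum CommandTriggerPhase
{
    CT_PRE_PARSE,
    CT_POST_PARSE,
    CT_PERMISSIONS_CHECK,
    CT_PRE_EXECUTION,       /* after name lookup and permission checks */
    CT_EVENT,               /* a specific action, e.g. CREATE or DROP */
    CT_POST_EXECUTION,
    CT_NUM_PHASES
} CommandTriggerPhase;

static const char *const phase_names[CT_NUM_PHASES] = {
    "pre-parse", "post-parse", "permissions-check",
    "pre-execution", "event", "post-execution"
};

const char *command_trigger_phase_name(CommandTriggerPhase p)
{
    return phase_names[p];
}
```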
Allowing all that seems likely to require a substantial refactoring of
commands/ and catalog/
Probably. But we don't need to allow it all at once. What we need to
do is pick one or two things that are relatively easily done,
implement those first, and then build on it. Pre-parse, post-parse,
CREATE-event, DROP-event, and post-execution hooks are all pretty easy
to implement without major refactoring, I think. OTOH,
post-permissions-checking-pre-execution is going to be hard to
implement without refactoring, and ALTER-event hooks are going to be
hard, too.
I think you need a surprisingly high amount of context to know when to ignore
a trigger, especially as it's not exactly easy to transfer knowledge from one
to the next.
I'm not convinced, but maybe it would be easier to resolve this in the
context of a specific proposal.
I don't think creating *new* DDL from the parsetree can really count as
statement based replication. And again, I don't think implementing that will
cost that much effort.
How would it help us to know - as individual events! - which tuples were
created where? Reassembling that will be a huge mess. I basically fail to see
*any* use case besides access checking.
I think if you'd said this to me two years ago, I would have believed
you, but poking through this code in the last year or two, I've come
to the conclusion that there are a lot of subtle things that get
worked out after parse time, during execution. Aside from things like
recursing down the inheritance hierarchy and maybe creating some
indexes or sequences when creating a table, there's also subtle things
like working out exactly what index we're creating: name, access
method, operator class, collation, etc. And I'm pretty sure that if
we examine the code carefully we'll find there are a bunch more.
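As a concrete illustration of details that only get settled during execution, consider how an unnamed index gets its name and access method. The sketch below mimics that defaulting with invented names; it is not PostgreSQL's actual code:

```c
#include <stdio.h>
#include <string.h>

/* Sketch (invented names): CREATE INDEX details such as the index name and
 * access method are often left implicit in the statement and only resolved
 * during execution, so a purely post-parse trigger cannot see them. */
typedef struct IndexStmtSketch
{
    const char *idxname;        /* NULL => name chosen at execution time */
    const char *accessmethod;   /* NULL => default access method */
    const char *table;
    const char *column;
} IndexStmtSketch;

/* Fill in execution-time defaults, in the spirit of the backend's own
 * index-name selection. */
void resolve_index_defaults(IndexStmtSketch *stmt, char *namebuf, size_t len)
{
    if (stmt->accessmethod == NULL)
        stmt->accessmethod = "btree";
    if (stmt->idxname == NULL)
    {
        snprintf(namebuf, len, "%s_%s_idx", stmt->table, stmt->column);
        stmt->idxname = namebuf;
    }
}
```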
I fear a bit that this discussion is leading to something that's never going to
materialize because it would require a huge amount of work to get there.
Again, the way to avoid that is to tackle the simple cases first and
then work on the harder cases after that, but I don't think that's
what the current patch is doing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I think BEFORE command triggers ideally should run
* before permission checks
* before locking
* before internal checks are done (naming conflicts, type checks and such)
It is possible to do this, and it would actually be much easier and
less invasive to implement than what Dimitri has done here, but the
downside is that you won't have done the name lookup either.
There's a trade-off decision to take here, that was different in
previous versions of the patch. You can either run the trigger very
early or provide detailed, command-specific information.
The way to have both and keep your sanity, and that was implemented in
the patch, is to have ANY command triggers run before the process
utility big switch and provide only the command tag and parse tree, and
have the specific triggers called after permission, locking and internal
checks are done.
I've been asked to have a single call site for ANY and specific
triggers, which means you can't have BEFORE triggers running either
before or after those steps.
Now I can see why implementing that on top of the ANY command feature is
surprising enough for wanting to not do it this way. Maybe the answer is
to use another keyword to be able to register command triggers that run
that early and without any specific information other than the command
tag.
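To make the trade-off concrete, here is a hedged sketch of the two call sites described above; every name in it is invented for illustration and is not the patch's actual code:

```c
#include <stddef.h>

/* Sketch only: ANY command triggers fire before the utility command switch
 * with just the command tag and parse tree; command-specific triggers fire
 * after permission checks, locking and internal checks, once object details
 * have been resolved.  All names here are invented. */
typedef struct CommandContext
{
    const char *tag;          /* e.g. "CREATE TABLE", always available */
    void       *parsetree;    /* raw parse tree, always available */
    const char *schemaname;   /* resolved only before specific triggers */
    const char *objectname;
} CommandContext;

static int any_fired;
static int specific_fired;

static void ExecAnyCommandTriggers(CommandContext *cmd)
{
    /* Only cmd->tag and cmd->parsetree may be relied upon here. */
    any_fired++;
}

static void ExecSpecificCommandTriggers(CommandContext *cmd)
{
    /* Name lookup, permission checks and locking have already run. */
    specific_fired++;
}

static void ProcessUtilitySketch(CommandContext *cmd)
{
    ExecAnyCommandTriggers(cmd);        /* before the big switch */
    /* ... permission checks, locking, internal validation ... */
    cmd->schemaname = "public";
    cmd->objectname = "t";
    ExecSpecificCommandTriggers(cmd);   /* after those steps */
    /* ... actual execution of the command would follow ... */
}
```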
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Mar 28, 2012 at 4:12 AM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
Now I can see why implementing that on top of the ANY command feature is
surprising enough for wanting to not do it this way. Maybe the answer is
to use another keyword to be able to register command triggers that run
that early and without any specific information other than the command
tag.
Yeah, I think so. I objected to the way you had it because of the
inconsistency, not because I think it's a useless place to fire
triggers.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 28, 2012 at 7:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, I think so. I objected to the way you had it because of the
inconsistency, not because I think it's a useless place to fire
triggers.
Further thought: I think maybe we shouldn't use keywords at all for
this, and instead use descriptive strings like post-parse or
pre-execution or command-start, because I bet in the end we're going
to end up with a bunch of them (create-schema-subcommand-start?
alter-table-subcommand-start?), and it would be nice not to hassle
with the grammar every time we want to add a new one. To make this
work with the grammar, we could either (1) enforce that each is
exactly one word or (2) require them to be quoted or (3) require those
that are not a single word to be quoted. I tend to think #2 might be the
best option in this case, but...
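Option (2) might look roughly like this: event names are plain quoted strings checked against a known list at trigger-creation time, so adding a new firing point needs no grammar changes. The registry below is entirely hypothetical:

```c
#include <string.h>

/* Hypothetical event-name registry: validating quoted strings at CREATE
 * time avoids touching the grammar each time a firing point is added. */
static const char *const known_events[] = {
    "command-start",
    "post-parse",
    "pre-execution",
    "create-schema-subcommand-start",
    "alter-table-subcommand-start",
};

int is_known_event(const char *name)
{
    for (size_t i = 0; i < sizeof(known_events) / sizeof(known_events[0]); i++)
    {
        if (strcmp(known_events[i], name) == 0)
            return 1;
    }
    return 0;
}
```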
I've also had another general thought about safety: we need to make
sure that we're only firing triggers from places where it's safe for
user code to perform arbitrary actions. In particular, we have to
assume that any triggers we invoke will AcceptInvalidationMessages()
and CommandCounterIncrement(); and we probably can't do it at all (or
at least not without some additional safeguard) in places where the
code does CheckTableNotInUse() up through the point where it's once
again safe to access the table, since the trigger might try to do so. We also
need to think about re-entrancy: that is, in theory, the trigger might
execute any other DDL command - e.g. it might drop the table that
we've been asked to rename. I suspect that some of the current BEFORE
trigger stuff might fall down on that last one, since the existing
code not-unreasonably expects that once it's locked the table, the
catalog entries can't go away. Maybe it all happens to work out OK,
but I don't think we can count on that.
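One conceivable shape for such a safeguard, sketched here with invented names (this is neither proposed nor existing PostgreSQL code): guard against recursive firing, and re-validate the target object after the trigger returns, since the trigger may have dropped it:

```c
/* Hedged sketch (invented names): a depth guard against re-entrant
 * command-trigger firing, plus a revalidation step after the trigger
 * returns, since the trigger may have run DDL that dropped the very
 * object the command was operating on. */
static int cmdtrigger_depth = 0;

typedef void (*trigger_fn)(void);

/* Fire a trigger, then report whether the target object survived it. */
int fire_trigger_safely(trigger_fn fn, int (*object_still_exists)(void))
{
    if (cmdtrigger_depth > 0)
        return object_still_exists();   /* refuse recursive firing */
    cmdtrigger_depth++;
    fn();                               /* may execute arbitrary DDL */
    cmdtrigger_depth--;
    /* The catalog entries may be gone: re-check before proceeding. */
    return object_still_exists();
}

/* Demo doubles for the sketch. */
static int demo_object_alive = 1;
static void demo_trigger_noop(void) {}
static void demo_trigger_drops_object(void) { demo_object_alive = 0; }
static int demo_exists(void) { return demo_object_alive; }
```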
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company