[BUG v13] Crash with event trigger in extension
Hi,
Philippe Beaudoin reported to me a crash he was able to reproduce with some
extension update script. Please, find in attachment a script that triggers it.
Just edit the three first vars to point to a temporary installation and
instance.
Some git bisect shows the bug was introduced by b5810de3f4, where
multi-statement query strings are using different short memory contexts.
To make it short, any ALTER TABLE command in an extension script is
currently calling lappend to add its command to the
currentEventTriggerState->commandList list from its own short context instead
of using the context of currentEventTriggerState.
Because of this, when we call EventTriggerCollectSimpleCommand() for the top
level "ALTER EXTENSION" query, the list has long been removed and a new call of
lappend on currentEventTriggerState->commandList raise
Assert(IsPointerList(list)).
Please, find in attachement a bug fix proposal. I did a very quick look at
other ->commandList manipulations in event_trigger.c. They all appear to
be in the correct context in a first look.
Regards,
On 2020-Sep-02, Jehan-Guillaume de Rorthais wrote:
Hi,
Philippe Beaudoin reported to me a crash he was able to reproduce with some
extension update script. Please, find in attachment a script that triggers it.
Just edit the three first vars to point to a temporary installation and
instance.Some git bisect shows the bug was introduced by b5810de3f4, where
multi-statement query strings are using different short memory contexts.
Good find.
I think we should turn your case into a proper TAP test so that we can
incorporate it, for coverage -- the lines you patch do appear covered,
but obviously not in these specific conditions. I'm just not very sure
*where* to add it; maybe src/test/modules/test_extensions is the place
to have it, though that one doesn't have TAP tests currently. An easy
addition, though.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 2 Sep 2020 13:53:28 -0400
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Sep-02, Jehan-Guillaume de Rorthais wrote:
Hi,
Philippe Beaudoin reported to me a crash he was able to reproduce with some
extension update script. Please, find in attachment a script that triggers
it. Just edit the three first vars to point to a temporary installation and
instance.Some git bisect shows the bug was introduced by b5810de3f4, where
multi-statement query strings are using different short memory contexts.Good find.
I think we should turn your case into a proper TAP test so that we can
incorporate it, for coverage -- the lines you patch do appear covered,
but obviously not in these specific conditions. I'm just not very sure
*where* to add it; maybe src/test/modules/test_extensions is the place
to have it, though that one doesn't have TAP tests currently. An easy
addition, though.
Sure, please find in attachment patch v2 adding a regression test.
Regards,
Attachments:
v2-0001-Fix-crash-with-alter-table-event-triggers-in-extensi.patchtext/x-patchDownload+52-3
On Thu, Sep 03, 2020 at 07:41:14PM +0200, Jehan-Guillaume de Rorthais wrote:
+$node->safe_psql('postgres', + q{CREATE EXTENSION test_event_trigger VERSION '1.0'}); +$node->safe_psql('postgres', + q{ALTER EXTENSION test_event_trigger UPDATE TO '2.0'}); + +my $ans = $node->safe_psql('postgres', q{ + SELECT extversion + FROM pg_catalog.pg_extension + WHERE extname='test_event_trigger' +}); + +is($ans, '2.0', 'Extension test_event_trigger upgraded');
I may be missing something of course, but if this just tests a
sequence of SQL commands, why don't you just include a SQL test in
text_extensions/sql/? That would be much cheaper than having to
create a full scenario with a TAP script.
--
Michael
On Fri, 4 Sep 2020 10:20:13 +0900
Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Sep 03, 2020 at 07:41:14PM +0200, Jehan-Guillaume de Rorthais wrote:
+$node->safe_psql('postgres', + q{CREATE EXTENSION test_event_trigger VERSION '1.0'}); +$node->safe_psql('postgres', + q{ALTER EXTENSION test_event_trigger UPDATE TO '2.0'}); + +my $ans = $node->safe_psql('postgres', q{ + SELECT extversion + FROM pg_catalog.pg_extension + WHERE extname='test_event_trigger' +}); + +is($ans, '2.0', 'Extension test_event_trigger upgraded');I may be missing something of course, but if this just tests a
sequence of SQL commands, why don't you just include a SQL test in
text_extensions/sql/? That would be much cheaper than having to
create a full scenario with a TAP script.
Indeed, it's much more simple and cheaper. However, the result is a bit
confusing as it's not a wrong SQL result, but the backend that actually crash.
The crash message itself appears in results/test_extensions.out, but no other
tests are run. Neither from the same SQL file or from other ones. That Means
the diff is quite large, but the error appears close from the top.
I don't know if some other crash regression tests has been included that way in
the past. But if this is OK, I can append the test to sql/test_extensions.sql
and remove the TAP. Let me know what you think.
Regards,
On 2020-Sep-04, Jehan-Guillaume de Rorthais wrote:
Indeed, it's much more simple and cheaper. However, the result is a bit
confusing as it's not a wrong SQL result, but the backend that actually crash.
The crash message itself appears in results/test_extensions.out, but no other
tests are run. Neither from the same SQL file or from other ones. That Means
the diff is quite large, but the error appears close from the top.
Oh! If it crashes without the extension, then we don't *need* the
exension and surely we don't need the test to be in src/test/modules/;
my inclination would be to put it somewhere in
src/test/regress/sql/event_triggers.sql, maybe at the bottom, with a
comment that references this thread.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 4 Sep 2020 09:27:15 -0400
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Sep-04, Jehan-Guillaume de Rorthais wrote:
Indeed, it's much more simple and cheaper. However, the result is a bit
confusing as it's not a wrong SQL result, but the backend that actually
crash. The crash message itself appears in results/test_extensions.out, but
no other tests are run. Neither from the same SQL file or from other ones.
That Means the diff is quite large, but the error appears close from the
top.Oh! If it crashes without the extension, then we don't *need* the
exension and surely we don't need the test to be in src/test/modules/;
Oh wait. It's not what I meant. We *need* an extension to trigger the
backend crash.
my inclination would be to put it somewhere in
src/test/regress/sql/event_triggers.sql, maybe at the bottom, with a
comment that references this thread.
As the regression need an extension to be installed, it still makes sens to
test from src/test/modules/test_extension where various extensions are
installed to run the checks.
Please find in attachment patch v3.
Attachments:
v3-0001-Fix-crash-with-alter-table-event-triggers-in-extensi.patchtext/x-patchDownload+47-3
On Fri, Sep 04, 2020 at 04:56:58PM +0200, Jehan-Guillaume de Rorthais wrote:
As the regression need an extension to be installed, it still makes sens to
test from src/test/modules/test_extension where various extensions are
installed to run the checks.Please find in attachment patch v3.
Yeah, the manipulation of the memory context makes the situation
tricky to adapt to, and putting a test in src/test/regress/ would
force us to add a special rule like what we do for contrib/spi/ in
GNUMakefile. So let's keep the test as cheap as we can, and the
structure you have here looks good enough to me.
+-- This a regression test to make sure an extension upgrade does not crash when +-- an event trigger is triggered from the extension script. +-- See: https://postgr.es/m/20200902193715.6e0269d4%40firost + +CREATE EXTENSION test_event_trigger VERSION '1.0'; +ALTER EXTENSION test_event_trigger UPDATE TO '2.0';
Here I would advise to use the correct message ID, aka
20200902193715.6e0269d4@firost. I cannot say for others, but
sometimes I do searches with the message ID directly in my inbox, and
this would not match. So it is better to keep a correct reference
without having to apply the extra math than %40 is @.
The two new SQL files to install and upgrade test_event_trigger are
missing a couple of things at their top, like:
/* src/test/modules/test_extensions/test_event_trigger--1.0.sql */
-- complain if script is sourced in psql, rather than via CREATE
EXTENSION
\echo Use "CREATE EXTENSION test_event_trigger" to load this file. \quit
+SELECT extversion +FROM pg_catalog.pg_extension +WHERE extname='test_event_trigger'; + extversion +------------ + 2.0 +(1 row)
What's the point of having this query? The previous command
succeeded, so we would be fine with the upgraded version.
More comments in each script would be helpful to document to the
reader why those objects are defined as such and their purpose in
life.
--
Michael
On Mon, Sep 07, 2020 at 10:35:08AM +0900, Michael Paquier wrote:
More comments in each script would be helpful to document to the
reader why those objects are defined as such and their purpose in
life.
I have looked at that, and tweaked a couple of things as per the
attached. Alvaro, Jehan-Guillaume, is that fine to you?
--
Michael
Attachments:
v4-0001-Fix-crash-with-alter-table-event-triggers-in-exte.patchtext/x-diff; charset=us-asciiDownload+48-3
On Tue, 8 Sep 2020 14:27:03 +0900
Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Sep 07, 2020 at 10:35:08AM +0900, Michael Paquier wrote:
More comments in each script would be helpful to document to the
reader why those objects are defined as such and their purpose in
life.I have looked at that, and tweaked a couple of things as per the
attached. Alvaro, Jehan-Guillaume, is that fine to you?
I was actually working on an answer with a new version of the patch, but you've
been faster. Thanks.
I would probably set the error message in VERSION '1.0' to
CREATE EXTENSION test_event_trigger VERSION '1.0'
But it's really nitpicking at this point.
More importantly, I wonder if this bug might be the visible tip of another
bigger one. I was wondering why this crash do not triggers without disabling
the event trigger first.
It appears that when keeping the event trigger enabled,
currentEventTriggerState might be initiated multiple times: one for the top
level "alter extension ... update", then once per DDL query triggering en event
in the extension update script. I suppose there's no crash because the very
first currentEventTriggerState initialized at top level is overrode by next
ones, each being destroyed in their own context. So at the end of the
extension script, currentEventTriggerState is just destroyed and empty.
I did not had time to investigate further more and produce a demo case though.
I might misunderstand something here.
Comment?
++
On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:
It appears that when keeping the event trigger enabled,
currentEventTriggerState might be initiated multiple times: one for the top
level "alter extension ... update", then once per DDL query triggering en event
in the extension update script. I suppose there's no crash because the very
first currentEventTriggerState initialized at top level is overrode by next
ones, each being destroyed in their own context. So at the end of the
extension script, currentEventTriggerState is just destroyed and empty.
Hmm. I spent a lot of time making sure the event trigger thing worked
correctly in reentrant cases, but I don't think the case of execution of
extension scripts was ever considered. So you're right that it might be
broken, since it appears not to be thoroughly tested (if at all).
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 8 Sep 2020 11:33:24 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:
It appears that when keeping the event trigger enabled,
currentEventTriggerState might be initiated multiple times: one for the top
level "alter extension ... update", then once per DDL query triggering en
event in the extension update script. I suppose there's no crash because
the very first currentEventTriggerState initialized at top level is
overrode by next ones, each being destroyed in their own context. So at the
end of the extension script, currentEventTriggerState is just destroyed and
empty.Hmm. I spent a lot of time making sure the event trigger thing worked
correctly in reentrant cases, but I don't think the case of execution of
extension scripts was ever considered. So you're right that it might be
broken, since it appears not to be thoroughly tested (if at all).
Argh. After some more debugging, I realized I missed the fact that
EventTriggerBeginCompleteQuery/EventTriggerEndCompleteQuery were actually
stacking allocations using currentEventTriggerState->previous. My bad, sorry for
your time.
I still have a last question in mind though. While running the bellow
scenario, the DDL in the extension do not trigger evt_ext_ddl_fnct. Is it
normal/expected?
BEGIN;
CREATE OR REPLACE FUNCTION _evt_ext_ddl_fnct()
RETURNS EVENT_TRIGGER LANGUAGE plpgsql AS
$$
DECLARE r record;
BEGIN
FOR r in SELECT * FROM pg_event_trigger_ddl_commands() LOOP
raise notice 'called %: %.%',
quote_ident(r.command_tag),
quote_ident(r.schema_name),
quote_ident(r.object_identity);
end loop;
END;
$$;
CREATE EVENT TRIGGER evt_ext_ddl_fnct
ON ddl_command_end
EXECUTE PROCEDURE _evt_ext_ddl_fnct();
CREATE EXTENSION test_event_trigger VERSION '1.0';
ALTER EXTENSION test_event_trigger UPDATE TO '2.0';
CREATE TABLE test( i int);
ROLLBACK;
The only output is:
NOTICE: called "CREATE EXTENSION": <NULL>.test_event_trigger
NOTICE: called "ALTER EXTENSION": <NULL>.test_event_trigger
NOTICE: called "CREATE TABLE": public."public.test"
Regards,
On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:
I still have a last question in mind though. While running the bellow
scenario, the DDL in the extension do not trigger evt_ext_ddl_fnct. Is it
normal/expected?
Hmm, I think this is intentional; see standard_ProcessUtility and
ProcessUtilitySlow, noting the value of isCompleteQuery for the
execution of the extension script.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 8 Sep 2020 14:32:22 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Sep-08, Jehan-Guillaume de Rorthais wrote:
I still have a last question in mind though. While running the bellow
scenario, the DDL in the extension do not trigger evt_ext_ddl_fnct. Is it
normal/expected?Hmm, I think this is intentional; see standard_ProcessUtility and
ProcessUtilitySlow, noting the value of isCompleteQuery for the
execution of the extension script.
According to extension.c:execute_sql_string(), queries from
extension script are executed as PROCESS_UTILITY_QUERY context. So
isCompleteQuery is indeed always true in ProcessUtilitySlow. A breakpoint here
while running my test case confirm this.
Maybe you were talking about isTopLevel? But this one doesn't seem considered
while defining if event triggers should trigger or not.
Anyway, if event trigger should not trigger during create/alter extension, I
suppose the original memory context bug that starts this discussion shouldn't
happen in the first place (but need to be fixed anyway), isn't it?
Regards,
On Wed, Sep 09, 2020 at 12:58:40PM +0200, Jehan-Guillaume de Rorthais wrote:
According to extension.c:execute_sql_string(), queries from
extension script are executed as PROCESS_UTILITY_QUERY context. So
isCompleteQuery is indeed always true in ProcessUtilitySlow. A breakpoint here
while running my test case confirm this.Maybe you were talking about isTopLevel? But this one doesn't seem considered
while defining if event triggers should trigger or not.Anyway, if event trigger should not trigger during create/alter extension, I
suppose the original memory context bug that starts this discussion shouldn't
happen in the first place (but need to be fixed anyway), isn't it?
If I read correctly the code of event_trigger.c, command collection
and execution are and should be two different things, meaning that we
should still collect the commands and then at execution time we decide
if the event trigger associated to the commands should be fired or
not.
Anyway, based on your example in [1]/messages/by-id/20200908190759.12405fe5@firost -- Michael, I can see that the event trigger
for ddl_command_end is correctly triggered for the ALTER TABLE command
included in the extension upgrade script if the event trigger is
enabled at the time the extension script triggers, which is the
behavior I would expect. What may be a problem though is that the
NOTICE you are trying to print does not show up, but I think that this
is caused by the particular context where the SQL queries from an
extension script are triggered within the backend in this case.
Also, if you want to make sure of the event trigger execution, you can
just change the notice to an exception in _evt_ext_ddl_fnct() and you
would see ALTER EXTENSION fail, pg_event_trigger_ddl_commands
capturing correctly the information associated to ALTER TABLE for
table "t":
ERROR: P0001: called "ALTER TABLE": public."public.t"
CONTEXT: PL/pgSQL function _evt_ext_ddl_fnct() line 5 at RAISE
LOCATION: exec_stmt_raise, pl_exec.c:3878
FWIW, I think that the fix proposed is fine as-is, and that we had
better apply it. Alvaro, perhaps you would prefer taking care of it?
[1]: /messages/by-id/20200908190759.12405fe5@firost -- Michael
--
Michael
On Fri, 11 Sep 2020 15:14:41 +0900
Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Sep 09, 2020 at 12:58:40PM +0200, Jehan-Guillaume de Rorthais wrote:
According to extension.c:execute_sql_string(), queries from
extension script are executed as PROCESS_UTILITY_QUERY context. So
isCompleteQuery is indeed always true in ProcessUtilitySlow. A breakpoint
here while running my test case confirm this.Maybe you were talking about isTopLevel? But this one doesn't seem
considered while defining if event triggers should trigger or not.Anyway, if event trigger should not trigger during create/alter extension, I
suppose the original memory context bug that starts this discussion
shouldn't happen in the first place (but need to be fixed anyway), isn't
it?If I read correctly the code of event_trigger.c, command collection
and execution are and should be two different things, meaning that we
should still collect the commands and then at execution time we decide
if the event trigger associated to the commands should be fired or
not.Anyway, based on your example in [1], I can see that the event trigger
for ddl_command_end is correctly triggered for the ALTER TABLE command
included in the extension upgrade script if the event trigger is
enabled at the time the extension script triggers, which is the
behavior I would expect. What may be a problem though is that the
NOTICE you are trying to print does not show up, but I think that this
is caused by the particular context where the SQL queries from an
extension script are triggered within the backend in this case.
Also, if you want to make sure of the event trigger execution, you can
just change the notice to an exception in _evt_ext_ddl_fnct() and you
would see ALTER EXTENSION fail, pg_event_trigger_ddl_commands
capturing correctly the information associated to ALTER TABLE for
table "t":
ERROR: P0001: called "ALTER TABLE": public."public.t"
CONTEXT: PL/pgSQL function _evt_ext_ddl_fnct() line 5 at RAISE
LOCATION: exec_stmt_raise, pl_exec.c:3878
Thank for your investigation, explanation and time.
FWIW, I think that the fix proposed is fine as-is, and that we had
better apply it.
+1
On Fri, Sep 11, 2020 at 10:09:00AM +0200, Jehan-Guillaume de Rorthais wrote:
On Fri, 11 Sep 2020 15:14:41 +0900 Michael Paquier <michael@paquier.xyz> wrote:
FWIW, I think that the fix proposed is fine as-is, and that we had
better apply it.+1
Alvaro, would you prefer doing the final checks on this patch by
yourself and potentially commit it?
--
Michael
On 2020-Sep-12, Michael Paquier wrote:
On Fri, Sep 11, 2020 at 10:09:00AM +0200, Jehan-Guillaume de Rorthais wrote:
On Fri, 11 Sep 2020 15:14:41 +0900 Michael Paquier <michael@paquier.xyz> wrote:
FWIW, I think that the fix proposed is fine as-is, and that we had
better apply it.+1
Alvaro, would you prefer doing the final checks on this patch by
yourself and potentially commit it?
Yes, thanks.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Sep-12, Michael Paquier wrote:
On Fri, Sep 11, 2020 at 10:09:00AM +0200, Jehan-Guillaume de Rorthais wrote:
On Fri, 11 Sep 2020 15:14:41 +0900 Michael Paquier <michael@paquier.xyz> wrote:
FWIW, I think that the fix proposed is fine as-is, and that we had
better apply it.+1
Alvaro, would you prefer doing the final checks on this patch by
yourself and potentially commit it?
I gave this a very long look and was convinced that it was going to
break with multi-statement query strings, but it turns out that it
wasn't. So I pushed it without any other changes.
I forgot to add reviewer credit to the commit message :-( My apologies
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 15, 2020 at 09:26:39PM -0300, Alvaro Herrera wrote:
I gave this a very long look and was convinced that it was going to
break with multi-statement query strings, but it turns out that it
wasn't. So I pushed it without any other changes.I forgot to add reviewer credit to the commit message :-( My apologies
No worries. Thanks for committing.
--
Michael