Transaction control in procedures
Here is a patch that implements transaction control in PL/Python
procedures. (This patch goes on top of "SQL procedures" patch v1.)
So you can do this:
CREATE PROCEDURE transaction_test1()
LANGUAGE plpythonu
AS $$
for i in range(0, 10):
    plpy.execute("INSERT INTO test1 (a) VALUES (%d)" % i)
    if i % 2 == 0:
        plpy.commit()
    else:
        plpy.rollback()
$$;
CALL transaction_test1();
I started with PL/Python because the internal structures there are more
manageable. Obviously, people will want this for PL/pgSQL as well, and
I have that in the works. It's not in a usable state, but I have found
that the work needed is essentially the same as in PL/Python for example.
I have discovered three groups of obstacles that needed addressing to
make this work. At this point, the patch is more of a demo of what
these issues are, and if we come to satisfactory solutions for each of
them, things should fall into place more easily afterwards.
1) While calling CommitTransactionCommand() in the middle of a utility
command works just fine (several utility commands do this, of course),
calling AbortCurrentTransaction() in a similar way does not. There are
a few pieces of code that think that a transaction abort will always
result in a return to the main control loop, and so they will just clean
away everything. This is what the changes in portalmem.c are about.
Some comments there already hint about the issue. No doubt this will
need further refinement. I think it would be desirable in general to
separate the control flow concerns from the transaction management
concerns more cleanly.
2) SPI needs some work. It thinks that it can clean everything away at
transaction end. I have found that instead of TopTransactionContext one
can use PortalContext and get a more suitable life cycle for the memory.
I have played with some variants to make this configurable (e.g.,
argument to SPI_connect()), but that didn't seem very useful. There are
some comments indicating that there might not always be a PortalContext,
but the existing tests don't seem to mind. (There was a thread recently
about making a fake PortalContext for autovacuum, so maybe the current
idea is that we make sure there always is a PortalContext.) Maybe we
need another context like StatementContext or ProcedureContext.
There also needs to be a way via SPI to end transactions while allowing
*some* cleanup to happen but leaving the memory around. I have done
that via additional SPI API functions like SPI_commit(), which are then
available to PL implementations. I also tried making it possible to
call transaction statements directly via SPI_exec() or similar, but
that ended up a total disaster. So from the API perspective, I like the
current implementation, but the details will no doubt need refinement.
3) The PL implementations themselves allocate memory in
transaction-bound contexts for convenience as well. This is usually
easy to fix by switching to PortalContext as well. As you see, the
PL/Python code part of the patch is actually very small. Changes in
other PLs would be similar.
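For illustration, here is a sketch of what the equivalent procedure might look like in PL/pgSQL once that work is usable. The exact spelling of the transaction commands inside PL/pgSQL is an assumption at this point:

```sql
CREATE PROCEDURE transaction_test2()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 0..9 LOOP
        INSERT INTO test1 (a) VALUES (i);
        IF i % 2 = 0 THEN
            COMMIT;    -- assumed PL/pgSQL counterpart of plpy.commit()
        ELSE
            ROLLBACK;  -- assumed counterpart of plpy.rollback()
        END IF;
    END LOOP;
END;
$$;

CALL transaction_test2();
```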
Two issues have not been addressed yet:
A) It would be desirable to be able to run commands such as VACUUM and
CREATE INDEX CONCURRENTLY in a procedure. This needs a bit of thinking
and standards-lawyering about the semantics, like where exactly do
transactions begin and end in various combinations. It will probably
also need a different entry point into SPI, because SPI_exec cannot
handle statements ending transactions. But so far my assessment is that
this can be added in a mostly independent way later on.
B) There needs to be some kind of call stack for procedure and function
invocations, so that we can track when we are allowed to make
transaction controlling calls. The key term in the SQL standard is
"non-atomic execution context", which seems specifically devised to
cover this scenario. So for example, if you have CALL -> CALL -> CALL,
the third call can issue a transaction statement. But if you have CALL
-> SELECT -> CALL, then the last call cannot, because the SELECT
introduces an atomic execution context. I don't know if we have such a
thing yet or something that we could easily latch on to.
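To illustrate the distinction, a hedged sketch (the procedure and function names here are made up):

```sql
-- CALL -> CALL keeps a non-atomic execution context, so the inner
-- procedure may issue transaction statements.
CREATE PROCEDURE p_inner() LANGUAGE plpgsql AS $$
BEGIN
    COMMIT;            -- fine when reached only through CALLs
END;
$$;

CREATE PROCEDURE p_outer() LANGUAGE plpgsql AS $$
BEGIN
    CALL p_inner();    -- non-atomic context is preserved
END;
$$;

-- A SELECT introduces an atomic execution context, so a procedure
-- invoked underneath it must not commit.
CREATE FUNCTION f_calls() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    CALL p_inner();    -- the COMMIT inside would have to raise an error
END;
$$;

CALL p_outer();        -- allowed
SELECT f_calls();      -- expected to fail at the COMMIT
```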
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v1-0001-Transaction-control-in-PL-Python-procedures.patch (text/plain)
On 31 October 2017 at 15:38, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Here is a patch that implements transaction control in PL/Python
procedures. (This patch goes on top of "SQL procedures" patch v1.)
The patch is incredibly short for such a feature, which is probably a
good indication that it is feasible.
Amazing!
So you can do this:
CREATE PROCEDURE transaction_test1()
LANGUAGE plpythonu
AS $$
for i in range(0, 10):
    plpy.execute("INSERT INTO test1 (a) VALUES (%d)" % i)
    if i % 2 == 0:
        plpy.commit()
    else:
        plpy.rollback()
$$;
CALL transaction_test1();
I started with PL/Python because the internal structures there are more
manageable.
AFAICS we execute 10 INSERTs and each one runs in a new top-level xid.
The INSERTs succeed in all cases but we then abort odd-numbered ones
and commit even numbered ones.
What would happen if some of the INSERTs failed? Where would control
go to? (Maybe this is just "no change" in this particular proc)
What happens if the procedure is updated during execution? Presumably
it keeps executing the original version as seen in the initial
snapshot?
Does the xmin of this session advance after each transaction, or do we
hold the snapshot used for the procedure body open, causing us to hold
back xmin and prevent vacuuming from being effective?
What would happen if a procedure recursively called itself? And yet it
was updated half-way through? Would that throw an error? (I think it
should.)
Obviously, people will want this for PL/pgSQL as well, and
I have that in the works. It's not in a usable state, but I have found
that the work needed is essentially the same as in PL/Python for example.
I have discovered three groups of obstacles that needed addressing to
make this work. At this point, the patch is more of a demo of what
these issues are, and if we come to satisfactory solutions for each of
them, things should fall into place more easily afterwards.
1) While calling CommitTransactionCommand() in the middle of a utility
command works just fine (several utility commands do this, of course),
calling AbortCurrentTransaction() in a similar way does not. There are
a few pieces of code that think that a transaction abort will always
result in a return to the main control loop, and so they will just clean
away everything. This is what the changes in portalmem.c are about.
Some comments there already hint about the issue. No doubt this will
need further refinement. I think it would be desirable in general to
separate the control flow concerns from the transaction management
concerns more cleanly.
+1
2) SPI needs some work. It thinks that it can clean everything away at
transaction end. I have found that instead of TopTransactionContext one
can use PortalContext and get a more suitable life cycle for the memory.
I have played with some variants to make this configurable (e.g.,
argument to SPI_connect()), but that didn't seem very useful. There are
some comments indicating that there might not always be a PortalContext,
but the existing tests don't seem to mind. (There was a thread recently
about making a fake PortalContext for autovacuum, so maybe the current
idea is that we make sure there always is a PortalContext.) Maybe we
need another context like StatementContext or ProcedureContext.
There also needs to be a way via SPI to end transactions while allowing
*some* cleanup to happen but leaving the memory around. I have done
that via additional SPI API functions like SPI_commit(), which are then
available to PL implementations. I also tried making it possible to
call transaction statements directly via SPI_exec() or similar, but
that ended up a total disaster. So from the API perspective, I like the
current implementation, but the details will no doubt need refinement.
3) The PL implementations themselves allocate memory in
transaction-bound contexts for convenience as well. This is usually
easy to fix by switching to PortalContext as well. As you see, the
PL/Python code part of the patch is actually very small. Changes in
other PLs would be similar.
Is there some kind of interlock to prevent dropping the portal half way thru?
Will the SPI transaction control functions fail if called from a
normal function?
Two issues have not been addressed yet:
A) It would be desirable to be able to run commands such as VACUUM and
CREATE INDEX CONCURRENTLY in a procedure. This needs a bit of thinking
and standards-lawyering about the semantics, like where exactly do
transactions begin and end in various combinations. It will probably
also need a different entry point into SPI, because SPI_exec cannot
handle statements ending transactions. But so far my assessment is that
this can be added in a mostly independent way later on.
Sounds like a separate commit, but perhaps it influences the design?
B) There needs to be some kind of call stack for procedure and function
invocations, so that we can track when we are allowed to make
transaction controlling calls. The key term in the SQL standard is
"non-atomic execution context", which seems specifically devised to
cover this scenario. So for example, if you have CALL -> CALL -> CALL,
the third call can issue a transaction statement. But if you have CALL
-> SELECT -> CALL, then the last call cannot, because the SELECT
introduces an atomic execution context. I don't know if we have such a
thing yet or something that we could easily latch on to.
Yeah. The internal_xact flag is only set at EoXact, so it's not really
set as described in the .h.
It would certainly be useful to have some state that allows sanity
checking on weird state transitions.
What we would benefit from is a README that gives the theory of
operation, so everyone can read and agree.
Presumably we would need to contact authors of main PL languages to
get them to comment on the API requirements for their languages.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Nov 8, 2017 at 5:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 31 October 2017 at 15:38, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:Here is a patch that implements transaction control in PL/Python
procedures. (This patch goes on top of "SQL procedures" patch v1.)The patch is incredibly short for such a feature, which is probably a
good indication that it is feasible.Amazing!
I have to agree with that. I'm really excited about this...
Some questions:
*) Will it be possible to do operations like this in pl/pgsql?
BEGIN
SELECT INTO r * FROM foo;
START TRANSACTION; -- perhaps we ought to have a special function
for this instead (BEGIN is reserved, etc).
SET transaction_isolation TO serializable;
...
*) Will there be any negative consequences to a procedure running
with an unbounded run time? For example, something like:
LOOP
SELECT check_for_stuff_to_do();
IF stuff_to_do
THEN
do_stuff();
ELSE
PERFORM pg_sleep(1);
END IF;
END LOOP;
*) Will pg_cancel_backend() cancel the currently executing statement
or the procedure? (I guess probably the procedure but I'm curious)
*) Will long running procedures be subject to statement timeout (and
does it apply to the entire procedure)? Will they be able to control
statement_timeout from within the procedure itself?
*) Will pg_stat_activity show the invoking CALL or the currently
executing statement? I see a strong argument for showing both of
these things, although I understand that's out of scope here.
If these questions (especially the first two) come down the correct
way, then it will mean that I can stop coding in other languages
(primarily bash) for a fairly large number of cases that I really
think belong in the database itself. This would really simplify
coding, some things in bash are really awkward to get right such as a
mutex to guarantee single script invocation. My only real dependency
on the operating system environment at that point would be cron to
step in to the background daemon process (which would immediately set
an advisory lock).
I'm somewhat surprised that SPI is the point of attack for this
functionality, but if it works that's really the best case scenario
(the only downside I can see is that the various out of core pl/s have
to implement the interface individually).
merlin
On 11/14/17 09:27, Merlin Moncure wrote:
*) Will it be possible to do operations like this in pl/pgsql?
BEGIN
SELECT INTO r * FROM foo;
START TRANSACTION; -- perhaps we ought to have a special function
for this instead (BEGIN is reserved, etc).
SET transaction_isolation TO serializable;
...
Eventually, I don't see why not. Currently, it's not complete.
One detail in your example is that when you enter the procedure, you are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.
Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
have to implement a separate code path for that, but that would just be
a bit of leg work.
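For concreteness, the sequence inside a procedure would then presumably look like this (a sketch; `foo` is from Merlin's example and the procedure name is made up):

```sql
CREATE PROCEDURE serializable_demo() LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    SELECT INTO r * FROM foo;
    COMMIT;  -- end the transaction that the CALL started
    -- a new transaction begins implicitly; SET TRANSACTION must come
    -- before any query in it
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- ... do serializable work here ...
END;
$$;
```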
*) Will there be any negative consequences to a procedure running
with an unbounded run time? For example, something like:
LOOP
SELECT check_for_stuff_to_do();
IF stuff_to_do
THEN
do_stuff();
ELSE
PERFORM pg_sleep(1);
END IF;
END LOOP;
That procedure doesn't do anything with transactions, so it's just like
a long-running function. Otherwise, it'd just be like long-running
client code managing transactions.
*) Will pg_cancel_backend() cancel the currently executing statement
or the procedure? (I guess probably the procedure but I'm curious)
Same as the way it currently works. It will raise an exception, which
will travel up the stack and eventually issue an error or be caught. If
someone knows more specific concerns here I could look into it, but I
don't see any problem.
*) Will long running procedures be subject to statement timeout (and
does it apply to the entire procedure)?
See previous item.
Will they be able to control
statement_timeout from within the procedure itself?
The statement timeout alarm is set by the top-level execution loop, so
you can't change a statement timeout that is already in progress. But
you could change the GUC and commit it for the next top-level statement.
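In other words, something along these lines could take effect for subsequent top-level statements (a sketch, not something the patch demonstrates):

```sql
CREATE PROCEDURE bump_timeout() LANGUAGE plpgsql AS $$
BEGIN
    SET statement_timeout = '10min';  -- affects later top-level statements,
    COMMIT;                           -- not the CALL already in progress
END;
$$;
```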
*) Will pg_stat_activity show the invoking CALL or the currently
executing statement? I see a strong argument for showing both of
these things, although I understand that's out of scope here.
Not different from a function execution, i.e., top-level statement.
If these questions (especially the first two) come down the correct
way, then it will mean that I can stop coding in other languages
(primarily bash) for a fairly large number of cases that I really
think belong in the database itself. This would really simplify
coding, some things in bash are really awkward to get right such as a
mutex to guarantee single script invocation. My only real dependency
on the operating system environment at that point would be cron to
step in to the background daemon process (which would immediately set
an advisory lock).
Well, some kind of scheduler around procedures would be pretty cool at
some point.
I'm somewhat surprised that SPI is the point of attack for this
functionality, but if it works that's really the best case scenario
(the only downside I can see is that the various out of core pl/s have
to implement the interface individually).
So I tried different things here, and I'll list them here to explain how
I got there.
Option zero is to not use SPI at all and implement a whole new internal
command execution system. But that would obviously be a large amount of
work, and it would look 85% like SPI, and as it turns out it's not
necessary.
The first thing I tried was to run transaction control statements
through SPI. That turned out to be very complicated and confusing and
fragile, mainly because of the dichotomy between the internal
subtransaction management that the PLs do on the one hand and the
explicit transaction control statements on the other. It was just a
giant unworkable mess.
The next thing I tried was to shut down (SPI_finish) SPI before a
transaction boundary command and restart it (SPI_connect) afterwards.
That would work in principle, but it would require a fair amount of
work in each PL, because they implicitly rely on SPI (or perhaps are
tangled up with SPI) for memory management.
The setup I finally arrived at was to implement the transaction boundary
commands as SPI API calls and let them internally make sure that only
the appropriate stuff is cleared away at transaction boundaries. This
turned out to be the easiest and cleanest. I have since the last patch
implemented the transaction control capabilities in PL/pgSQL, PL/Perl,
and PL/Tcl, and it was entirely trivial once the details were worked out
as I had shown in PL/Python. I will post an updated patch with this soon.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 14, 2017 at 12:09 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 11/14/17 09:27, Merlin Moncure wrote:
*) Will it be possible to do operations like this in pl/pgsql?
BEGIN
SELECT INTO r * FROM foo;
START TRANSACTION; -- perhaps we ought to have a special function
for this instead (BEGIN is reserved, etc).
SET transaction_isolation TO serializable;
...
Eventually, I don't see why not. Currently, it's not complete.
One detail in your example is that when you enter the procedure, you are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.
Ok, that's good, but it seems a little wonky to me to have to issue
COMMIT first. Shouldn't that be the default? Meaning you would not
be *in* a transaction unless you specified to be in one.
Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
have to implement a separate code path for that, but that would just be
a bit of leg work.
Roger -- I'm more interested in whether your design generally supports
being able to do this (either now or in the future...). I'm hammering on
this point for basically two reasons:
1) Trying to understand if the MVCC snapshot creation can be
meaningfully controlled (I think so, but I'll verify).
2) This is an important case for databases that want to run in a mode
(typically serializable) but lower the isolation for specific cases;
for example to loop on a special flag being set in a table. It's
annoying to only be able to specify this on the client side; I tend to
like to abstract arcane database considerations into the database
whenever possible.
*) Will there be any negative consequences to a procedure running
with an unbounded run time? For example, something like:
LOOP
SELECT check_for_stuff_to_do();
IF stuff_to_do
THEN
do_stuff();
ELSE
PERFORM pg_sleep(1);
END IF;
END LOOP;
That procedure doesn't do anything with transactions, so it's just like
a long-running function. Otherwise, it'd just be like long-running
client code managing transactions.
Can we zero in on this? The question implied, 'can you do this
without being in a transaction'? PERFORM do_stuff() is an implicit
transaction, so it ought to end when the function returns right?
Meaning, assuming I was not already in a transaction when hitting this
block, I would not be subject to an endless transaction duration?
*) Will pg_cancel_backend() cancel the currently executing statement
or the procedure? (I guess probably the procedure but I'm curious)
Same as the way it currently works. It will raise an exception, which
will travel up the stack and eventually issue an error or be caught. If
someone knows more specific concerns here I could look into it, but I
don't see any problem.
Yeah, that works.
I'm somewhat surprised that SPI is the point of attack for this
functionality, but if it works that's really the best case scenario
(the only downside I can see is that the various out of core pl/s have
to implement the interface individually).
So I tried different things here, and I'll list them here to explain how
I got there.
Option zero is to not use SPI at all and implement a whole new internal
command execution system. But that would obviously be a large amount of
work, and it would look 85% like SPI, and as it turns out it's not
necessary.
The first thing I tried was to run transaction control statements
through SPI. That turned out to be very complicated and confusing and
fragile, mainly because of the dichotomy between the internal
subtransaction management that the PLs do on the one hand and the
explicit transaction control statements on the other. It was just a
giant unworkable mess.
The next thing I tried was to shut down (SPI_finish) SPI before a
transaction boundary command and restart it (SPI_connect) afterwards.
That would work in principle, but it would require a fair amount of
work in each PL, because they implicitly rely on SPI (or perhaps are
tangled up with SPI) for memory management.
The setup I finally arrived at was to implement the transaction boundary
commands as SPI API calls and let them internally make sure that only
the appropriate stuff is cleared away at transaction boundaries. This
turned out to be the easiest and cleanest. I have since the last patch
implemented the transaction control capabilities in PL/pgSQL, PL/Perl,
and PL/Tcl, and it was entirely trivial once the details were worked out
as I had shown in PL/Python. I will post an updated patch with this soon.
well, you've convinced me. now that you've got pl/pgsql implemented
I'll fire it up and see if I can make qualitative assessments...
merlin
Will that kind of statement (which is permitted in Oracle but can give
error ORA-01555, snapshot too old) be permitted?
begin
for c in (select id from tab where cond='blabla')
loop
update tab set x=1 where id=c.id;
commit;
end loop;
end;
On 11/14/17 16:33, Merlin Moncure wrote:
One detail in your example is that when you enter the procedure, you are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.
Ok, that's good, but it seems a little wonky to me to have to issue
COMMIT first. Shouldn't that be the default? Meaning you would not
be *in* a transaction unless you specified to be in one.
But that's not how this feature is defined in the SQL standard and AFAIK
other implementations. When you enter the procedure call, you are in a
transaction. For one thing, a procedure does not *have* to do
transaction control. So if it's a plain old procedure like a function
that just runs a few statements, there needs to be a transaction. We
can't know ahead of time whether the procedure will execute a
transaction control statement and then retroactively change when the
transaction should have started. Something like an autonomous
transaction procedure might work more like that, but not this feature.
Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
have to implement a separate code path for that, but that would just be
a bit of leg work.
Roger -- I'm more interested in whether your design generally supports
being able to do this (either now or in the future...).
Nothing in this patch really changes anything about how transactions
themselves work. So I don't see why any of this shouldn't work. As I
fill in the gaps in the code, I'll make sure to come back around to
this, but for the time being I can't say anything more.
Can we zero in on this? The question implied, 'can you do this
without being in a transaction'? PERFORM do_stuff() is an implicit
transaction, so it ought to end when the function returns right?
Meaning, assuming I was not already in a transaction when hitting this
block, I would not be subject to an endless transaction duration?
In the server, you are always in a transaction, so that's not how this
works. I think this also ties into my first response above.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/31/17 15:38, Peter Eisentraut wrote:
Here is a patch that implements transaction control in PL/Python
procedures. (This patch goes on top of "SQL procedures" patch v1.)
Here is an updated patch, now on top of "SQL procedures" v2.
Relative to the previous patch v1 I added transaction control to
PL/pgSQL, PL/Perl, and PL/Tcl with relative ease. (There is a weird
crash in one PL/Perl test that is currently commented out. And I can't
get a proper backtrace. Maybe something to do with recursive Perl
interpreters?)
I crash-coursed myself in PL/Perl and PL/Tcl (and Tcl). If anyone has
more of a feel for those languages and wants to comment on the proposed
interfaces and internals, please chime in.
I also added tracking so that transaction control commands can only be
made in the proper context, currently meaning only top-level procedure
calls, not functions or other procedure calls. This should be extended
to also allow nested CALLs without anything in between, but I need more
time to code that.
I'll spend a bit more time on tidying up a few things, and a bunch of
documentation is missing, but I currently don't see any more major
issues here.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v2-0001-Transaction-control-in-PL-procedures.patch (text/plain)
On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 11/14/17 16:33, Merlin Moncure wrote:
One detail in your example is that when you enter the procedure, you are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.
Ok, that's good, but it seems a little wonky to me to have to issue
COMMIT first. Shouldn't that be the default? Meaning you would not
be *in* a transaction unless you specified to be in one.
But that's not how this feature is defined in the SQL standard and AFAIK
other implementations. When you enter the procedure call, you are in a
transaction. For one thing, a procedure does not *have* to do
transaction control. So if it's a plain old procedure like a function
that just runs a few statements, there needs to be a transaction.
Hm, OK. Well, SQL Server (which is pretty far from the SQL standard)
works that way. See here:
http://www.4guysfromrolla.com/webtech/080305-1.shtml. DB2, which is
very close to the SQL standard, only supports COMMIT/ROLLBACK (not
begin/start etc)
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.0/com.ibm.nz.sproc.doc/c_sproc_transaction_commits_and_rollbacks.html.
Either approach is ok I guess, and always being in a transaction
probably has some advantages, performance being an obvious one. With
DB2, the COMMIT statement acts as kind of a flush, or a paired
'commit;begin;'.
Can we zero in on this? The question implied, 'can you do this
without being in a transaction'? PERFORM do_stuff() is a implicit
transaction, so it ought to end when the function returns right?
Meaning, assuming I was not already in a transaction when hitting this
block, I would not be subject to an endless transaction duration?
In the server, you are always in a transaction, so that's not how this
works. I think this also ties into my first response above.
I'll try this out myself, but as long as we can have a *bounded*
transaction lifetime (basically the time to do stuff + 1 second) via
something like:
LOOP
<do stuff>
COMMIT;
PERFORM pg_sleep(1);
END LOOP;
... I'm good. I'll try your patch out ASAP. Thanks for answering all
my questions.
merlin
2017-11-15 14:38 GMT+01:00 Merlin Moncure <mmoncure@gmail.com>:
On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 11/14/17 16:33, Merlin Moncure wrote:
One detail in your example is that when you enter the procedure, you
are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.
Ok, that's good, but it seems a little wonky to me to have to issue
COMMIT first. Shouldn't that be the default? Meaning you would not
be *in* a transaction unless you specified to be in one.
But that's not how this feature is defined in the SQL standard and AFAIK
other implementations. When you enter the procedure call, you are in a
transaction. For one thing, a procedure does not *have* to do
transaction control. So if it's a plain old procedure like a function
that just runs a few statements, there needs to be a transaction.
Hm, OK. Well, SQL Server (which is pretty far from the SQL standard)
works that way. See here:
http://www.4guysfromrolla.com/webtech/080305-1.shtml. DB2, which is
very close to the SQL standard, only supports COMMIT/ROLLBACK (not
begin/start etc)
https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.0/com.ibm.nz.sproc.doc/c_sproc_transaction_commits_and_rollbacks.html.
Either approach is ok I guess, and always being in a transaction
probably has some advantages, performance being an obvious one. With
DB2, the COMMIT statement acts as kind of a flush, or a paired
'commit;begin;'.
same in Oracle PL/SQL
Can we zero in on this? The question implied, 'can you do this
without being in a transaction'? PERFORM do_stuff() is an implicit
transaction, so it ought to end when the function returns right?
Meaning, assuming I was not already in a transaction when hitting this
block, I would not be subject to an endless transaction duration?
In the server, you are always in a transaction, so that's not how this
works. I think this also ties into my first response above.
I'll try this out myself, but as long as we can have a *bounded*
transaction lifetime (basically the time to do stuff + 1 second) via
something like:
LOOP
<do stuff>
COMMIT;
PERFORM pg_sleep(1);
END LOOP;
... I'm good. I'll try your patch out ASAP. Thanks for answering all
my questions.
merlin
On Wed, Nov 15, 2017 at 7:38 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
Can we zero in on this? The question implied, 'can you do this
without being in a transaction'? PERFORM do_stuff() is an implicit
transaction, so it ought to end when the function returns right?
Meaning, assuming I was not already in a transaction when hitting this
block, I would not be subject to an endless transaction duration?
In the server, you are always in a transaction, so that's not how this
works. I think this also ties into my first response above.
I'll try this out myself, but as long as we can have a *bounded*
transaction lifetime (basically the time to do stuff + 1 second) via
something like:
LOOP
<do stuff>
COMMIT;
PERFORM pg_sleep(1);
END LOOP;
... I'm good. I'll try your patch out ASAP. Thanks for answering all
my questions.
Trying this out (v2 both patches, compiled clean, thank you!),
postgres=# CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
  LOOP
    PERFORM 1;
    COMMIT;
    RAISE NOTICE '%', now();
    PERFORM pg_sleep(1);
  END LOOP;
END;
$$ LANGUAGE PLPGSQL;
CREATE PROCEDURE
Time: 0.996 ms
postgres=# call foo();
NOTICE: 2017-11-15 08:52:08.936025-06
NOTICE: 2017-11-15 08:52:08.936025-06
... I noticed that:
*) now() did not advance with commit and,
*) xact_start via pg_stat_activity did not advance
Shouldn't both of those advance with the in-loop COMMIT?
merlin
On 11/14/17 17:40, legrand legrand wrote:
will that kind of statement (which is permitted with Oracle, though it
can raise ORA-01555 "snapshot too old" errors) be permitted?

begin
for c in (select id from tab where cond='blabla')
loop
update tab set x=1 where id=c.id;
commit;
end loop;
end;
Hmm, that currently results in
ERROR: cannot commit while a portal is pinned
You say this fails in Oracle too. Are we supposed to make this work
somehow?
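For what it's worth, one way to sidestep the pinned-portal error under the proposed patch might be to materialize the key set before looping, so that no cursor remains open across the COMMITs. A hypothetical sketch (the tab table is from the example above; the procedure name is illustrative, and whether this actually avoids the pin is untested):

```sql
-- Sketch: collect the keys first, then iterate over the array.
-- The SELECT completes before the loop begins, so no cursor
-- (pinned portal) is held open across the COMMITs.
CREATE PROCEDURE update_in_batches()
LANGUAGE plpgsql
AS $$
DECLARE
    ids int[];
    i   int;
BEGIN
    SELECT coalesce(array_agg(id), '{}') INTO ids
      FROM tab WHERE cond = 'blabla';
    FOREACH i IN ARRAY ids
    LOOP
        UPDATE tab SET x = 1 WHERE id = i;
        COMMIT;
    END LOOP;
END;
$$;
```

The trade-off is that the key set is fixed up front, so rows matching the condition after the snapshot are not picked up.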
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/8/17 18:48, Simon Riggs wrote:
What would happen if some of the INSERTs failed? Where would control
go to? (Maybe this is just "no change" in this particular proc)
An exception is raised and unless the exception is caught (depending on
the PL), control leaves the procedure. What is already committed stays.
What happens if the procedure is updated during execution? Presumably
it keeps executing the original version as seen in the initial
snapshot?
correct
Does the xmin of this session advance after each transaction, or do we
hold the snapshot used for the procedure body open, causing us to hold
back xmin and prevent vacuuming from being effective?

What would happen if a procedure recursively called itself? And yet it
was updated half-way through? Would that throw an error (I think it
should).
I don't think anything special happens here. The snapshot that is used
to read the procedure source code and other meta information is released
at a transaction boundary.
3) The PL implementations themselves allocate memory in
transaction-bound contexts for convenience as well. This is usually
easy to fix by switching to PortalContext as well. As you see, the
PL/Python code part of the patch is actually very small. Changes in
other PLs would be similar.

Is there some kind of interlock to prevent dropping the portal half way thru?
I should probably look this up, but I don't think this is fundamentally
different from how VACUUM and CREATE INDEX CONCURRENTLY run inside a
portal and issue multiple transactions in sequence.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/15/17 09:54, Merlin Moncure wrote:
... I noticed that:
*) now() did not advance with commit and,
*) xact_start via pg_stat_activity did not advance

Shouldn't both of those advance with the in-loop COMMIT?
I think you are correct. I'll include that in the next patch version.
It shouldn't be difficult.
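As a side note, part of the confusion may be that now() is defined to return the transaction start time, while clock_timestamp() advances continuously. A small sketch to tell the two apart inside the loop (hypothetical procedure name; under the patch as posted, now() reportedly stays fixed even across the in-loop COMMIT):

```sql
-- Sketch: if COMMIT starts a fresh transaction, now() should jump
-- on each iteration; clock_timestamp() advances regardless.
CREATE PROCEDURE show_timestamps()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 1..3 LOOP
        RAISE NOTICE 'now(): %  clock_timestamp(): %',
            now(), clock_timestamp();
        COMMIT;
        PERFORM pg_sleep(1);
    END LOOP;
END;
$$;
```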
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
We are just opening the "close cursors on/at commit" specification ;o)
- MS SQL server: cursor_close_on_commit
- Firebird: close_cursors_at_commit
- DB2: "with hold" syntax
- ...
I think it is a plus to support keeping cursors open at commit time,
but the impacts have to be checked in detail ...

The Oracle ORA-01555 error comes up in the extreme situation where rows
used inside the cursor are modified and committed before being fetched.
On Wed, Nov 15, 2017 at 3:42 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 11/15/17 09:54, Merlin Moncure wrote:
... I noticed that:
*) now() did not advance with commit and,
*) xact_start via pg_stat_activity did not advance

Shouldn't both of those advance with the in-loop COMMIT?
I think you are correct. I'll include that in the next patch version.
It shouldn't be difficult.
Thanks. A couple of more things.
*) This error message is incorrect now:
postgres=# CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
LOOP
SAVEPOINT x;
END LOOP;
END;
$$ LANGUAGE PLPGSQL;
CREATE PROCEDURE
Time: 0.912 ms
postgres=# call foo();
ERROR: cannot begin/end transactions in PL/pgSQL
HINT: Use a BEGIN block with an EXCEPTION clause instead.
CONTEXT: PL/pgSQL function foo() line 5 at SQL statement
I guess there are a few places that assume pl/pgsql is always run from
an in-transaction function.
*) Exception handlers seem to override COMMITs. The following
procedure will not insert any rows. I wonder if this is the correct
behavior. I think there's a pretty good case to be made to raise an
error if a COMMIT is issued while you're in an exception block.
CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
LOOP
INSERT INTO foo DEFAULT VALUES;
COMMIT;
RAISE EXCEPTION 'test';
END LOOP;
EXCEPTION
WHEN OTHERS THEN RAISE NOTICE '%', SQLERRM;
END;
$$ LANGUAGE PLPGSQL;
*) The documentation could use some work. Would you like some help?
merlin
On 11/16/17 07:04, legrand legrand wrote:
We are just opening the "close cursors on/at commit" specification ;o)
- MS SQL server: cursor_close_on_commit
- Firebird: close_cursors_at_commit
- DB2: "with hold" syntax
- ...

I think it is a plus to support keeping cursors open at commit time,
but the impacts have to be checked in detail ...
I think the facilities to support this in PostgreSQL are already there.
We'd just have to tweak PL/pgSQL to make some of its internal portals
"held" and then clean them up manually at some later point. So I think
this is a localized detail, not a fundamental problem.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 16, 2017 at 12:36 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 11/16/17 07:04, legrand legrand wrote:
We are just opening the "close cursors on/at commit" specification ;o)
- MS SQL server: cursor_close_on_commit
- Firebird: close_cursors_at_commit
- DB2: "with hold" syntax
- ...

I think it is a plus to support keeping cursors open at commit time,
but the impacts have to be checked in detail ...

I think the facilities to support this in PostgreSQL are already there.
We'd just have to tweak PL/pgSQL to make some of its internal portals
"held" and then clean them up manually at some later point. So I think
this is a localized detail, not a fundamental problem.
Automatically persisting cursors (WITH HOLD) can have some very
surprising performance implications, except when the current code
execution depends on that particular cursor, in which case the current
behavior of raising a (hopefully better-worded) error seems
appropriate. Cursors based on temporary tables could be exempt from
having to be closed or checked on COMMIT.
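For comparison, top-level SQL already offers held cursors, where the cost is that the result set is materialized at commit time, which is presumably the surprising performance implication alluded to above (this assumes the tab table from the earlier example):

```sql
BEGIN;
DECLARE c CURSOR WITH HOLD FOR SELECT id FROM tab ORDER BY id;
COMMIT;          -- the cursor's result set is materialized here
FETCH 1 FROM c;  -- still usable outside the original transaction
CLOSE c;
```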
plpgsql does not have the facility to create held cursors, FWICT...
automatic promotion seems pretty dubious. It could certainly
be added, and cursors so held could be exempt from being force
closed/errored as well. In lieu of that, having users materialize data
into temp tables for such cases seems reasonable.
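A hypothetical sketch of the materialize-into-a-temp-table pattern (names are illustrative; the loop is structured so that no query portal is held open across a COMMIT, re-fetching one key per iteration instead):

```sql
CREATE PROCEDURE update_via_temp()
LANGUAGE plpgsql
AS $$
DECLARE
    next_id int;
BEGIN
    -- Snapshot the working set up front into a temp table.
    CREATE TEMP TABLE todo AS
        SELECT id FROM tab WHERE cond = 'blabla';
    LOOP
        -- Each iteration runs a fresh query, so no cursor (portal)
        -- spans the COMMIT below.
        SELECT min(id) INTO next_id FROM todo;
        EXIT WHEN next_id IS NULL;
        UPDATE tab SET x = 1 WHERE id = next_id;
        DELETE FROM todo WHERE id = next_id;
        COMMIT;
    END LOOP;
    DROP TABLE todo;
END;
$$;
```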
merlin
On 15 November 2017 at 16:36, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 11/8/17 18:48, Simon Riggs wrote:
What would happen if some of the INSERTs failed? Where would control
go to? (Maybe this is just "no change" in this particular proc)

An exception is raised and unless the exception is caught (depending on
the PL), control leaves the procedure. What is already committed stays.

What happens if the procedure is updated during execution? Presumably
it keeps executing the original version as seen in the initial
snapshot?

correct
Does the xmin of this session advance after each transaction, or do we
hold the snapshot used for the procedure body open, causing us to hold
back xmin and prevent vacuuming from being effective?

What would happen if a procedure recursively called itself? And yet it
was updated half-way through? Would that throw an error (I think it
should).

I don't think anything special happens here. The snapshot that is used
to read the procedure source code and other meta information is released
at a transaction boundary.
I think we need to document that, or at least note it in a README.
It's quite important for VACUUM.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 14 November 2017 at 13:09, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
*) Will pg_cancel_backend() cancel the currently executing statement
or the procedure? (I guess probably the procedure but I'm curious)

Same as the way it currently works. It will raise an exception, which
will travel up the stack and eventually issue an error or be caught. If
someone knows more specific concerns here I could look into it, but I
don't see any problem.

*) Will long running procedures be subject to statement timeout (and
does it apply to the entire procedure)?

See previous item.

Will they be able to control statement_timeout from within the
procedure itself?

The statement timeout alarm is set by the top-level execution loop, so
you can't change a statement timeout that is already in progress. But
you could change the GUC and commit it for the next top-level statement.

*) Will pg_stat_activity show the invoking CALL or the currently
executing statement? I see a strong argument for showing both of
these things, although I understand that's out of scope here.

Not different from a function execution, i.e., top-level statement.
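The described timeout behavior, sketched as a hypothetical procedure (the point being that the committed SET does not re-arm the alarm for the CALL already in progress):

```sql
CREATE PROCEDURE bump_timeout()
LANGUAGE plpgsql
AS $$
BEGIN
    SET statement_timeout = '5s';
    COMMIT;  -- the new value is committed, but per the answer above it
             -- only takes effect at the next top-level statement, not
             -- for the remainder of this CALL
END;
$$;
```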
Which is the "top-level statement"? The CALL or the currently
executing statement within the proc? I think you mean the former.
For the first two answers above the answer was "currently executing
statement", yet the third answer seems to be the procedure. So that is
a slight discrepancy.
ISTM we would like
1) a way to cancel execution of a procedure
2) a way to set a timeout to cancel execution of a procedure
as well as
1) a way to cancel execution of a statement that is running within a procedure
2) a way to set a timeout to cancel execution of a statement in a procedure
Visibility of what a routine is currently executing is the role of a
debugger utility/API.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services