[RFC] Transaction management overhaul is necessary?

Started by Tsunakawa, Takayukiabout 9 years ago4 messages

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

about 9 years ago

Hello,

From our experience in handling customers' problems, I feel it's necessary to evolve PostgreSQL's transaction management. The concrete problems are:

1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java stored functions.
This is often the reason people could not migrate to PostgreSQL.

2. PostgreSQL does not support statement-level rollback.
When some customer ran a batch app using psqlODBC, one postgres process used dozens of GBs of memory and crashed the OS. The batch app prepares some SQL statements with parameters, execute it five millions of times with different parameter values in a single transaction. They didn't experience a problem with Oracle.

This was because psqlODBC starts and ends a subtransaction for each SQL statement by default to implement statement-level rollback. And PostgreSQL creates one CurTransactionContext memory context, which is 8KB, for each subtransaction and retain them until the top transaction ends. The total memory used becomes 40GB (8KB * 5 million subtransactions.) This was avoided by setting the Protocol parameter to 7.4-1, which means transaction-level rollback.

The savepoint approach for supporting statement-level rollback is inefficient, because it adds two roundtrips (SAVEPOINT and RELEASE) for each statement.

I know autonomous transaction is also discussed, which seems to be difficult, so I hope some kind of transaction management overhaul can be discussed to cover all these transaction-related features. How should I start? I found the following item in the TODO list (but I haven't read it yet.) What other discussions should I look at?

--------------------------------------------------
Implement stored procedures
This might involve the control of transaction state and the return of multiple result sets
PL/pgSQL stored procedure returning multiple result sets (SELECTs)?
Proposal: real procedures again (8.4)
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php
Gathering specs and discussion on feature (post 9.1)
--------------------------------------------------

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pavel.stehule@gmail.com

about 9 years ago

In reply to: Tsunakawa, Takayuki (#1)

Re: [RFC] Transaction management overhaul is necessary?

2016-10-21 10:24 GMT+02:00 Tsunakawa, Takayuki <
tsunakawa.takay@jp.fujitsu.com>:

Hello,

From our experience in handling customers' problems, I feel it's necessary
to evolve PostgreSQL's transaction management. The concrete problems are:

1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java
stored functions.
This is often the reason people could not migrate to PostgreSQL.

2. PostgreSQL does not support statement-level rollback.
When some customer ran a batch app using psqlODBC, one postgres process
used dozens of GBs of memory and crashed the OS. The batch app prepares
some SQL statements with parameters, execute it five millions of times with
different parameter values in a single transaction. They didn't experience
a problem with Oracle.

This was because psqlODBC starts and ends a subtransaction for each SQL
statement by default to implement statement-level rollback. And PostgreSQL
creates one CurTransactionContext memory context, which is 8KB, for each
subtransaction and retain them until the top transaction ends. The total
memory used becomes 40GB (8KB * 5 million subtransactions.) This was
avoided by setting the Protocol parameter to 7.4-1, which means
transaction-level rollback.

The savepoint approach for supporting statement-level rollback is
inefficient, because it adds two roundtrips (SAVEPOINT and RELEASE) for
each statement.

I know autonomous transaction is also discussed, which seems to be
difficult, so I hope some kind of transaction management overhaul can be
discussed to cover all these transaction-related features. How should I
start? I found the following item in the TODO list (but I haven't read it
yet.) What other discussions should I look at?

You should to implement a CALL statement - that can be independent on outer
transaction. The behave inside procedure called by CALL statement should be
same like client side - and there you can controll transactions explicitly
without nesting.

Regards

Pavel

Show quoted text

--------------------------------------------------
Implement stored procedures
This might involve the control of transaction state and the return of
multiple result sets
PL/pgSQL stored procedure returning multiple result sets (SELECTs)?
Proposal: real procedures again (8.4)
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php
Gathering specs and discussion on feature (post 9.1)
--------------------------------------------------

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

craig@2ndquadrant.com

about 9 years ago

In reply to: Pavel Stehule (#2)

Re: [RFC] Transaction management overhaul is necessary?

On 21 October 2016 at 18:57, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2016-10-21 10:24 GMT+02:00 Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com>:

Hello,

From our experience in handling customers' problems, I feel it's necessary
to evolve PostgreSQL's transaction management. The concrete problems are:

1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java
stored functions.
This is often the reason people could not migrate to PostgreSQL.

I've run into relatively few for whom this has landed up being a
showstopper, but I agree it's a pain.

There'll probably be more as bigger outfits seek to move from That
Other Database where it's routine to do this.

This was because psqlODBC starts and ends a subtransaction for each SQL
statement by default to implement statement-level rollback. And PostgreSQL
creates one CurTransactionContext memory context, which is 8KB, for each
subtransaction and retain them until the top transaction ends.

Surely that's where to start then. Find a way to pool and re-use,
fully release, or otherwise be done with transaction contexts for
released savepoints.

The total
memory used becomes 40GB (8KB * 5 million subtransactions.) This was
avoided by setting the Protocol parameter to 7.4-1, which means
transaction-level rollback.

You can control transaction level rollback in psqlODBC directly. You
do not need to fall back to the old protocol. Check the driver
options.

The savepoint approach for supporting statement-level rollback is
inefficient, because it adds two roundtrips (SAVEPOINT and RELEASE) for each
statement.

Right. We can't just fire off each statement wrapped in SAVEPOINT and
RELEASE SAVEPOINT because we need to get the result of the statement
and decide whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It
only requires two round trips if you shove the SAVEPOINT in with the
intended statement, but it's still messy.

I'd like to see an alternative statement with semantics more akin to
COMMIT - which automatically into ROLLBACK if the tx is aborted.
COMMIT SAVEPOINT would be too confusing since it's not truly
committed. I don't know what to call it. But basically something that
does RELEASE SAVEPOINT [named savepoint] unless the subxact is in
aborted state, in which case it does ROLLBACK TO [named savepoint].
Bonus points for letting it remember the last savepoint created and
use that.

Furthermore, we should really add it on the protocol level so drivers
can send subtransaction control messages more compactly, without
needing to go through the parser etc, and without massively spamming
the logs. For this purpose savepoint names would be internally
generated so the driver wouldn't have to send them. We'd log savepoint
boundaries when transaction logging was enabled. Since the client
would send the first such protocol request we could do it on the sly
without a protocol version bump; clients could just check server
version and not use the new messages for older servers. If they send
it to an older server they get a protocol error, which is fine.

You should to implement a CALL statement - that can be independent on outer
transaction. The behave inside procedure called by CALL statement should be
same like client side - and there you can controll transactions explicitly
without nesting.

I agree that'd be desirable. Top level "procedures" are necessary for
this, really.

This would also enable us to return multiple result sets.

We'd probably have to start at least one small read-only tx for the
initial cache access to look up the proc and set everything up, but if
we don't allocate xids local transactions are super cheap.

However, I think trying to tackle the memory context bloat reported
upthread would be a more effective starting point since it immediately
targets the problem actually experienced.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

about 9 years ago

In reply to: Craig Ringer (#3)

Re: [RFC] Transaction management overhaul is necessary?

From: Craig Ringer [mailto:craig@2ndquadrant.com]

This was because psqlODBC starts and ends a subtransaction for each
SQL statement by default to implement statement-level rollback. And
PostgreSQL creates one CurTransactionContext memory context, which is
8KB, for each subtransaction and retain them until the top transaction

ends.

Surely that's where to start then. Find a way to pool and re-use, fully
release, or otherwise be done with transaction contexts for released
savepoints.

Yes, I'll investigate this. Any reference information would be appreciated on why the CurTransactionContexts had to be retained, and whether it's difficult to circumvent.

You can control transaction level rollback in psqlODBC directly. You do
not need to fall back to the old protocol. Check the driver options.

That driver option is Protocol=7.4-1. The name is misleading, as the driver now ignores version part (7.4), and interprets 1 as transaction-rollback.

Right. We can't just fire off each statement wrapped in SAVEPOINT and RELEASE
SAVEPOINT because we need to get the result of the statement and decide
whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It only requires
two round trips if you shove the SAVEPOINT in with the intended statement,
but it's still messy.

I'd like to see an alternative statement with semantics more akin to COMMIT
- which automatically into ROLLBACK if the tx is aborted.
COMMIT SAVEPOINT would be too confusing since it's not truly committed.
I don't know what to call it. But basically something that does RELEASE
SAVEPOINT [named savepoint] unless the subxact is in aborted state, in which
case it does ROLLBACK TO [named savepoint].
Bonus points for letting it remember the last savepoint created and use
that.

Furthermore, we should really add it on the protocol level so drivers can
send subtransaction control messages more compactly, without needing to
go through the parser etc, and without massively spamming the logs. For
this purpose savepoint names would be internally generated so the driver
wouldn't have to send them. We'd log savepoint boundaries when transaction
logging was enabled. Since the client would send the first such protocol
request we could do it on the sly without a protocol version bump; clients
could just check server version and not use the new messages for older
servers. If they send it to an older server they get a protocol error, which
is fine.

I'm simply thinking of proposing a new GUC, something like "SET auto_rollback = {none | statement | transaction}", where none is the default and traditional behavior.

You should to implement a CALL statement - that can be independent on
outer transaction. The behave inside procedure called by CALL
statement should be same like client side - and there you can controll
transactions explicitly without nesting.

I agree that'd be desirable. Top level "procedures" are necessary for this,
really.

This would also enable us to return multiple result sets.

We'd probably have to start at least one small read-only tx for the initial
cache access to look up the proc and set everything up, but if we don't
allocate xids local transactions are super cheap.

OK, that would be a very big challenge... I can't imagine how difficult it will be now. But supporting the stored procedure with CALL statement would be a wall to overcome.

However, I think trying to tackle the memory context bloat reported upthread
would be a more effective starting point since it immediately targets the

Yes, I think I'll address this. Maybe I'll start different threads for each topic:

1. Memory context bloat
2. Statement-level rollback
3. Stored procedures where transactions can be ended and started

Regards
Takayuki Tsunakawa

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers