error-handling in ReorderBufferCommit() seems somewhat broken

Started by Tomas Vondraover 8 years ago5 messages
#1Tomas Vondra
tomas.vondra@2ndquadrant.com

Hi,

I've been investigating some failures in test_decoding regression tests,
and it seems to me the error-handling in ReorderBufferCommit() is
somewhat broken, leading to segfault crashes.

The problematic part is this:

PG_CATCH()
{
/*
* Force cache invalidation to happen outside of a valid transaction
* to prevent catalog access as we just caught an error.
*/
AbortCurrentTransaction();

/* make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(rb, txn);

...

}

Triggering it trivial - just add elog(ERROR,...) at the beginning of the
PG_TRY() block.

The problem is that AbortCurrentTransaction() apparently releases the
memory where txn is allocated, making it entirely bogus. So in assert
builds txn->ivalidations are 0x7f7f7f7f7f7f7f7f, triggering a segfault.

Similar issues apply to subsequent calls in the catch block, that also
use txn in some way (e.g. through snapshot_now).

I suppose this is not quite intentional, but rather an example that
error-handling code is an order of magnitude more complicated to write
and test. I've only noticed as I'm investigating some regression
failures on Postgres-XL 10, which does not support subtransactions and
so the BeginInternalSubTransaction() call in the try branch always
fails, triggering the issue.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#1)
Re: error-handling in ReorderBufferCommit() seems somewhat broken

Hi,

On 2017-08-30 01:27:34 +0200, Tomas Vondra wrote:

I've been investigating some failures in test_decoding regression tests,
and it seems to me the error-handling in ReorderBufferCommit() is
somewhat broken, leading to segfault crashes.

The problematic part is this:

PG_CATCH()
{
/*
* Force cache invalidation to happen outside of a valid transaction
* to prevent catalog access as we just caught an error.
*/
AbortCurrentTransaction();

/* make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(rb, txn);

...

}

Triggering it trivial - just add elog(ERROR,...) at the beginning of the
PG_TRY() block.

That's not really a valid thing to do, you should put it after the
BeginInternalSubTransaction()/StartTransactionCommand(). It's basically
assumed that those won't fail - arguably they should be outside of a
PG_TRY then, but that's a different matter. If you start decoding
outside of SQL failing before those will lead to rolling back the parent
tx...

I suppose this is not quite intentional, but rather an example that
error-handling code is an order of magnitude more complicated to write
and test. I've only noticed as I'm investigating some regression
failures on Postgres-XL 10, which does not support subtransactions and
so the BeginInternalSubTransaction() call in the try branch always
fails, triggering the issue.

So, IIUC, there's no live problem in postgres core, besides some ugly &
undocumented assumptions?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#2)
Re: error-handling in ReorderBufferCommit() seems somewhat broken

On 08/30/2017 01:34 AM, Andres Freund wrote:

Hi,

On 2017-08-30 01:27:34 +0200, Tomas Vondra wrote:

I've been investigating some failures in test_decoding regression tests,
and it seems to me the error-handling in ReorderBufferCommit() is
somewhat broken, leading to segfault crashes.

The problematic part is this:

PG_CATCH()
{
/*
* Force cache invalidation to happen outside of a valid transaction
* to prevent catalog access as we just caught an error.
*/
AbortCurrentTransaction();

/* make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(rb, txn);

...

}

Triggering it trivial - just add elog(ERROR,...) at the beginning of the
PG_TRY() block.

That's not really a valid thing to do, you should put it after the
BeginInternalSubTransaction()/StartTransactionCommand(). It's basically
assumed that those won't fail - arguably they should be outside of a
PG_TRY then, but that's a different matter. If you start decoding
outside of SQL failing before those will lead to rolling back the parent
tx...

I suppose this is not quite intentional, but rather an example that
error-handling code is an order of magnitude more complicated to write
and test. I've only noticed as I'm investigating some regression
failures on Postgres-XL 10, which does not support subtransactions and
so the BeginInternalSubTransaction() call in the try branch always
fails, triggering the issue.

So, IIUC, there's no live problem in postgres core, besides some ugly &
undocumented assumptions?

I'm not really following your reasoning. You may very well be right that
the BeginInternalSubTransaction() example is not quite valid on postgres
core, but I don't see how that implies there can be no other errors in
the PG_TRY block. It was merely an explanation how I noticed this issue.

To make it absolutely clear, I claim that the PG_CATCH() block is
demonstrably broken as it calls AbortCurrentTransaction() and then
accesses already freed memory.

The switch in the PG_TRY() blocks includes multiple elog(ERROR) calls in
the switch, and AFAICS hitting any of them will have exactly the same
effect as failure in BeginInternalSubTransaction(). And I suppose there
many other ways to trigger an error and get into the catch block. In
other words, why would we have the block at all?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#3)
Re: error-handling in ReorderBufferCommit() seems somewhat broken

On 2017-08-30 02:11:10 +0200, Tomas Vondra wrote:

On 08/30/2017 01:34 AM, Andres Freund wrote:

Hi,

On 2017-08-30 01:27:34 +0200, Tomas Vondra wrote:

I've been investigating some failures in test_decoding regression tests,
and it seems to me the error-handling in ReorderBufferCommit() is
somewhat broken, leading to segfault crashes.

The problematic part is this:

PG_CATCH()
{
/*
* Force cache invalidation to happen outside of a valid transaction
* to prevent catalog access as we just caught an error.
*/
AbortCurrentTransaction();

/* make sure there's no cache pollution */
ReorderBufferExecuteInvalidations(rb, txn);

...

}

Triggering it trivial - just add elog(ERROR,...) at the beginning of the
PG_TRY() block.

That's not really a valid thing to do, you should put it after the
BeginInternalSubTransaction()/StartTransactionCommand(). It's basically
assumed that those won't fail - arguably they should be outside of a
PG_TRY then, but that's a different matter. If you start decoding
outside of SQL failing before those will lead to rolling back the parent
tx...

I suppose this is not quite intentional, but rather an example that
error-handling code is an order of magnitude more complicated to write
and test. I've only noticed as I'm investigating some regression
failures on Postgres-XL 10, which does not support subtransactions and
so the BeginInternalSubTransaction() call in the try branch always
fails, triggering the issue.

So, IIUC, there's no live problem in postgres core, besides some ugly &
undocumented assumptions?

I'm not really following your reasoning. You may very well be right that
the BeginInternalSubTransaction() example is not quite valid on postgres
core, but I don't see how that implies there can be no other errors in
the PG_TRY block. It was merely an explanation how I noticed this issue.

To make it absolutely clear, I claim that the PG_CATCH() block is
demonstrably broken as it calls AbortCurrentTransaction() and then
accesses already freed memory.

Yea, but that's not what happens normally. The txn memory isn't
allocated in the context created by the started [sub]transaction. I
think you're just seeing what you're seeing because an error thrown
before the BeginInternalSubTransaction() succeeds, will roll back the
*containing* transaction (the one that did the SQL SELECT * FROM
pg_*logical*() call), and free all the entire reorderbuffer stuff.

The switch in the PG_TRY() blocks includes multiple elog(ERROR) calls in
the switch, and AFAICS hitting any of them will have exactly the same
effect as failure in BeginInternalSubTransaction().

No, absolutely not. Once the subtransaction starts, the
AbortCurrentTransaction() will abort that subtransaction, rather than
the containing one.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#4)
Re: error-handling in ReorderBufferCommit() seems somewhat broken

On 08/30/2017 02:19 AM, Andres Freund wrote:

On 2017-08-30 02:11:10 +0200, Tomas Vondra wrote:

I'm not really following your reasoning. You may very well be right that
the BeginInternalSubTransaction() example is not quite valid on postgres
core, but I don't see how that implies there can be no other errors in
the PG_TRY block. It was merely an explanation how I noticed this issue.

To make it absolutely clear, I claim that the PG_CATCH() block is
demonstrably broken as it calls AbortCurrentTransaction() and then
accesses already freed memory.

Yea, but that's not what happens normally. The txn memory isn't
allocated in the context created by the started [sub]transaction. I
think you're just seeing what you're seeing because an error thrown
before the BeginInternalSubTransaction() succeeds, will roll back the
*containing* transaction (the one that did the SQL SELECT * FROM
pg_*logical*() call), and free all the entire reorderbuffer stuff.

The switch in the PG_TRY() blocks includes multiple elog(ERROR) calls in
the switch, and AFAICS hitting any of them will have exactly the same
effect as failure in BeginInternalSubTransaction().

No, absolutely not. Once the subtransaction starts, the
AbortCurrentTransaction() will abort that subtransaction, rather than
the containing one.

Ah, I see. You're right elog(ERROR) calls after the subtransaction start
are handled correctly and don't trigger a segfault. Sorry for the noise
and thanks for the explanation.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers