mvcc catalo gsnapshots and TopTransactionContext
Hi,
since the mvcc catalog patch has gone in we require all users of
systable_* to be in a valid transaction since the snapshot is copied via
CopySnapshot() in RegisterSnapshot(). Which we call in
systable_beginscan(). CopySnapshot() allocates the copied snapshot in
TopTransactionContext.
There doesn't seem be an explicitly stated rule that we cannot use the
syscaches outside of a transaction - but effectively that's required
atm.
Not having investigated at all so far I am not sure how much in core
code that breaks, but it certainly broke some out of tree code of mine
(bidirectional replication stuff, bdr branch on git.pg.o).
Is that acceptable?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, 2013-07-11 at 10:28 +0200, Andres Freund wrote:
There doesn't seem be an explicitly stated rule that we cannot use the
syscaches outside of a transaction - but effectively that's required
atm.
Aren't there other things that already required that before the MVCC
catalog snapshot patch went in? For instance, if you get a syscache
miss, you have to load from the catalogs, meaning you need to acquire a
lock. I've seen problems related to that in the past:
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
since the mvcc catalog patch has gone in we require all users of
systable_* to be in a valid transaction since the snapshot is copied via
CopySnapshot() in RegisterSnapshot().
It never has been, and never will be, allowed to call the catcache code
without being in a transaction. What do you think will happen if the
requested row isn't in cache? A table access, that's what, and that
absolutely requires being in a transaction.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2013-07-11 11:53:57 -0700, Jeff Davis wrote:
On Thu, 2013-07-11 at 10:28 +0200, Andres Freund wrote:
There doesn't seem be an explicitly stated rule that we cannot use the
syscaches outside of a transaction - but effectively that's required
atm.Aren't there other things that already required that before the MVCC
catalog snapshot patch went in? For instance, if you get a syscache
miss, you have to load from the catalogs, meaning you need to acquire a
lock.
There are. Several. I am blaming it on conference induced haze. Or such.
On 2013-07-11 15:09:45 -0400, Tom Lane wrote:
It never has been, and never will be, allowed to call the catcache code
without being in a transaction. What do you think will happen if the
requested row isn't in cache? A table access, that's what, and that
absolutely requires being in a transaction.
Makes sense. I was confused because I thought I saw a
get_database_name() and other users outside of a transaction but that
turned out not looking closely enough.
I'd like to add an Assert like in the attached patch making sure we're
in a transaction. Otherwise it's far too easy not to hit an error during
development because everything is cached - and syscache usage isn't
always obvious from the outside to the naive or the tired.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
assert-transaction-state-in-searchcatcache.patchtext/x-patch; charset=us-asciiDownload+4-0
On Fri, Jul 12, 2013 at 5:42 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I'd like to add an Assert like in the attached patch making sure we're
in a transaction. Otherwise it's far too easy not to hit an error during
development because everything is cached - and syscache usage isn't
always obvious from the outside to the naive or the tired.
Seems like a good idea to me. Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jul 12, 2013 at 11:42:23AM +0200, Andres Freund wrote:
On 2013-07-11 15:09:45 -0400, Tom Lane wrote:
It never has been, and never will be, allowed to call the catcache code
without being in a transaction. What do you think will happen if the
requested row isn't in cache? A table access, that's what, and that
absolutely requires being in a transaction.
I'd like to add an Assert like in the attached patch making sure we're
in a transaction. Otherwise it's far too easy not to hit an error during
development because everything is cached - and syscache usage isn't
always obvious from the outside to the naive or the tired.
The following test case reliably hits this new assertion:
create table t (c int);
begin;
create index on t (c);
savepoint q;
insert into t values (1);
select 1/0;
Stack trace:
#2 0x0000000000761159 in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=<value optimized out>) at assert.c:54
#3 0x000000000074fc53 in SearchCatCache (cache=0xc53048, v1=139479, v2=0, v3=0, v4=0) at catcache.c:1072
#4 0x000000000075c011 in SearchSysCache (cacheId=0, key1=369539, key2=6, key3=18446744073709551615, key4=0) at syscache.c:909
#5 0x0000000000757e57 in RelationReloadIndexInfo (relation=0x7f2dd5fffea0) at relcache.c:1770
#6 0x000000000075817b in RelationClearRelation (relation=0x7f2dd5fffea0, rebuild=1 '\001') at relcache.c:1926
#7 0x00000000007588c6 in RelationFlushRelation (relationId=139479) at relcache.c:2076
#8 RelationCacheInvalidateEntry (relationId=139479) at relcache.c:2138
#9 0x000000000075185d in LocalExecuteInvalidationMessage (msg=0xcde778) at inval.c:546
#10 0x0000000000751185 in ProcessInvalidationMessages (hdr=0xc0d390, func=0x7517d0 <LocalExecuteInvalidationMessage>) at inval.c:422
#11 0x0000000000751a3f in AtEOSubXact_Inval (isCommit=0 '\000') at inval.c:973
#12 0x00000000004cb986 in AbortSubTransaction () at xact.c:4250
#13 0x00000000004cbf05 in AbortCurrentTransaction () at xact.c:2863
#14 0x00000000006968f6 in PostgresMain (argc=1, argv=0x7fff7cf71610, dbname=0xc13b60 "test", username=0xbedc88 "nm") at postgres.c:3848
#15 0x000000000064c2b5 in BackendRun () at postmaster.c:4044
#16 BackendStartup () at postmaster.c:3733
#17 ServerLoop () at postmaster.c:1571
#18 0x000000000064fa8d in PostmasterMain (argc=1, argv=0xbe84a0) at postmaster.c:1227
#19 0x00000000005e2dcd in main (argc=1, argv=0xbe84a0) at main.c:196
When we call AtEOSubXact_Inval() or AtEOXact_Inval() with a relation still
open, we can potentially get a relcache rebuild and therefore a syscache
lookup as shown above. CommitSubTransaction() is also potentially affected,
though I don't have an SQL-level test case for that. It calls
CommandCounterIncrement() after moving to TRANS_COMMIT. That CCI had better
find no invalidations of open relations, or we'll make syscache lookups. (In
simple tests, any necessary invalidations tend to happen at the CCI in
CommitTransactionCommand(), so the later CCI does in fact find nothing to do.
I have little confidence that should be counted upon, though.)
How might we best rearrange things to avoid these hazards?
Thanks,
nm
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-08-05 13:09:31 -0400, Noah Misch wrote:
On Fri, Jul 12, 2013 at 11:42:23AM +0200, Andres Freund wrote:
On 2013-07-11 15:09:45 -0400, Tom Lane wrote:
It never has been, and never will be, allowed to call the catcache code
without being in a transaction. What do you think will happen if the
requested row isn't in cache? A table access, that's what, and that
absolutely requires being in a transaction.I'd like to add an Assert like in the attached patch making sure we're
in a transaction. Otherwise it's far too easy not to hit an error during
development because everything is cached - and syscache usage isn't
always obvious from the outside to the naive or the tired.The following test case reliably hits this new assertion:
create table t (c int);
begin;
create index on t (c);
savepoint q;
insert into t values (1);
select 1/0;Stack trace:
#2 0x0000000000761159 in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=<value optimized out>) at assert.c:54
#3 0x000000000074fc53 in SearchCatCache (cache=0xc53048, v1=139479, v2=0, v3=0, v4=0) at catcache.c:1072
#4 0x000000000075c011 in SearchSysCache (cacheId=0, key1=369539, key2=6, key3=18446744073709551615, key4=0) at syscache.c:909
#5 0x0000000000757e57 in RelationReloadIndexInfo (relation=0x7f2dd5fffea0) at relcache.c:1770
#6 0x000000000075817b in RelationClearRelation (relation=0x7f2dd5fffea0, rebuild=1 '\001') at relcache.c:1926
#7 0x00000000007588c6 in RelationFlushRelation (relationId=139479) at relcache.c:2076
#8 RelationCacheInvalidateEntry (relationId=139479) at relcache.c:2138
#9 0x000000000075185d in LocalExecuteInvalidationMessage (msg=0xcde778) at inval.c:546
#10 0x0000000000751185 in ProcessInvalidationMessages (hdr=0xc0d390, func=0x7517d0 <LocalExecuteInvalidationMessage>) at inval.c:422
#11 0x0000000000751a3f in AtEOSubXact_Inval (isCommit=0 '\000') at inval.c:973
#12 0x00000000004cb986 in AbortSubTransaction () at xact.c:4250
#13 0x00000000004cbf05 in AbortCurrentTransaction () at xact.c:2863
#14 0x00000000006968f6 in PostgresMain (argc=1, argv=0x7fff7cf71610, dbname=0xc13b60 "test", username=0xbedc88 "nm") at postgres.c:3848
#15 0x000000000064c2b5 in BackendRun () at postmaster.c:4044
#16 BackendStartup () at postmaster.c:3733
#17 ServerLoop () at postmaster.c:1571
#18 0x000000000064fa8d in PostmasterMain (argc=1, argv=0xbe84a0) at postmaster.c:1227
#19 0x00000000005e2dcd in main (argc=1, argv=0xbe84a0) at main.c:196When we call AtEOSubXact_Inval() or AtEOXact_Inval() with a relation still
open, we can potentially get a relcache rebuild and therefore a syscache
lookup as shown above. CommitSubTransaction() is also potentially affected,
though I don't have an SQL-level test case for that. It calls
CommandCounterIncrement() after moving to TRANS_COMMIT. That CCI had better
find no invalidations of open relations, or we'll make syscache lookups. (In
simple tests, any necessary invalidations tend to happen at the CCI in
CommitTransactionCommand(), so the later CCI does in fact find nothing to do.
I have little confidence that should be counted upon, though.)How might we best rearrange things to avoid these hazards?
Man. This is a nasty one. And seems to have been there for a long
time. I am rather surprised that we haven't seen this as a cause of
problems.
Deferring invalidation processing a good bit seems to be the only way to
deal with this I can see but that seems to require an uncomfortable
amount of shuffling around in xact.c.
I'll think about it more, but so far I haven't seen anything close to
nice.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-08-05 13:09:31 -0400, Noah Misch wrote:
On Fri, Jul 12, 2013 at 11:42:23AM +0200, Andres Freund wrote:
On 2013-07-11 15:09:45 -0400, Tom Lane wrote:
It never has been, and never will be, allowed to call the catcache code
without being in a transaction. What do you think will happen if the
requested row isn't in cache? A table access, that's what, and that
absolutely requires being in a transaction.I'd like to add an Assert like in the attached patch making sure we're
in a transaction. Otherwise it's far too easy not to hit an error during
development because everything is cached - and syscache usage isn't
always obvious from the outside to the naive or the tired.The following test case reliably hits this new assertion:
create table t (c int);
begin;
create index on t (c);
savepoint q;
insert into t values (1);
select 1/0;Stack trace:
#2 0x0000000000761159 in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=<value optimized out>) at assert.c:54
#3 0x000000000074fc53 in SearchCatCache (cache=0xc53048, v1=139479, v2=0, v3=0, v4=0) at catcache.c:1072
#4 0x000000000075c011 in SearchSysCache (cacheId=0, key1=369539, key2=6, key3=18446744073709551615, key4=0) at syscache.c:909
#5 0x0000000000757e57 in RelationReloadIndexInfo (relation=0x7f2dd5fffea0) at relcache.c:1770
#6 0x000000000075817b in RelationClearRelation (relation=0x7f2dd5fffea0, rebuild=1 '\001') at relcache.c:1926
#7 0x00000000007588c6 in RelationFlushRelation (relationId=139479) at relcache.c:2076
#8 RelationCacheInvalidateEntry (relationId=139479) at relcache.c:2138
#9 0x000000000075185d in LocalExecuteInvalidationMessage (msg=0xcde778) at inval.c:546
#10 0x0000000000751185 in ProcessInvalidationMessages (hdr=0xc0d390, func=0x7517d0 <LocalExecuteInvalidationMessage>) at inval.c:422
#11 0x0000000000751a3f in AtEOSubXact_Inval (isCommit=0 '\000') at inval.c:973
#12 0x00000000004cb986 in AbortSubTransaction () at xact.c:4250
#13 0x00000000004cbf05 in AbortCurrentTransaction () at xact.c:2863
#14 0x00000000006968f6 in PostgresMain (argc=1, argv=0x7fff7cf71610, dbname=0xc13b60 "test", username=0xbedc88 "nm") at postgres.c:3848
#15 0x000000000064c2b5 in BackendRun () at postmaster.c:4044
#16 BackendStartup () at postmaster.c:3733
#17 ServerLoop () at postmaster.c:1571
#18 0x000000000064fa8d in PostmasterMain (argc=1, argv=0xbe84a0) at postmaster.c:1227
#19 0x00000000005e2dcd in main (argc=1, argv=0xbe84a0) at main.c:196When we call AtEOSubXact_Inval() or AtEOXact_Inval() with a relation still
open, we can potentially get a relcache rebuild and therefore a syscache
lookup as shown above. CommitSubTransaction() is also potentially affected,
though I don't have an SQL-level test case for that. It calls
CommandCounterIncrement() after moving to TRANS_COMMIT. That CCI had better
find no invalidations of open relations, or we'll make syscache lookups. (In
simple tests, any necessary invalidations tend to happen at the CCI in
CommitTransactionCommand(), so the later CCI does in fact find nothing to do.
I have little confidence that should be counted upon, though.)
How might we best rearrange things to avoid these hazards?
Ok. After a good bit of code reading, I think this isn't an actual bug
but an overzealous Assert(). I think it should be
Assert(IsTransactionBlock()) not Assert(IsTransactionState());
The reason for that is that when we do the AtEO(Sub)?Xact_Inval(), we've
already done a RecordTransactionAbort(true|false) and
CurrentTransactionState->state = TRANS_ABORT. So the visibility routines
have enough information to consider rows created by the aborted
transaction as invisible.
I am not really happy with the RelationReloadIndexInfo()s in
RelationClearRelation() when we're in an aborted state, especially as
the comment surrounding them are clearly out of date, but I don't see a
bug there anymore.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Aug 6, 2013 at 3:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
The reason for that is that when we do the AtEO(Sub)?Xact_Inval(), we've
already done a RecordTransactionAbort(true|false) and
CurrentTransactionState->state = TRANS_ABORT. So the visibility routines
have enough information to consider rows created by the aborted
transaction as invisible.I am not really happy with the RelationReloadIndexInfo()s in
RelationClearRelation() when we're in an aborted state, especially as
the comment surrounding them are clearly out of date, but I don't see a
bug there anymore.
How can it be safe to try to read catalogs if the transaction is aborted?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-08-08 09:27:24 -0400, Robert Haas wrote:
On Tue, Aug 6, 2013 at 3:06 AM, Andres Freund <andres@2ndquadrant.com> wrote:
The reason for that is that when we do the AtEO(Sub)?Xact_Inval(), we've
already done a RecordTransactionAbort(true|false) and
CurrentTransactionState->state = TRANS_ABORT. So the visibility routines
have enough information to consider rows created by the aborted
transaction as invisible.I am not really happy with the RelationReloadIndexInfo()s in
RelationClearRelation() when we're in an aborted state, especially as
the comment surrounding them are clearly out of date, but I don't see a
bug there anymore.How can it be safe to try to read catalogs if the transaction is aborted?
Well. It isn't. At least not in general. The specific case triggered
here though are cache invalidations being processed which can lead to
the catalog being read (pretty crummy, but not easy to get rid
of). That's actually safe since before we process the invalidations we
have done:
1) CurrentTransactionState->state = TRANS_ABORT
2) RecordTransactionAbort(), marking the transaction as aborted in the
clog
3) marked subxacts as aborted
3) ProcArrayEndTransaction() (for toplevel ones)
Due to these any tqual stuff will treat the current (sub-)xact and it's
children as aborted. So the catalog lookups will use the catalog in a
sensible state.
Now, one could argue that it's certainly not safe for anything but
xact.c itself to play such games. And would be pretty damn right. We
could add some flat to signal catcache.c to temporarily use
Assert(IsTransactionBlock()) instead of IsTransactionStmt() but that
seems overly complex. I think the danger of code doing stuff in an
aborted transaction isn't that big.
This certainly deserves a good comment...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 8, 2013 at 1:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Well. It isn't. At least not in general. The specific case triggered
here though are cache invalidations being processed which can lead to
the catalog being read (pretty crummy, but not easy to get rid
of). That's actually safe since before we process the invalidations we
have done:
1) CurrentTransactionState->state = TRANS_ABORT
2) RecordTransactionAbort(), marking the transaction as aborted in the
clog
3) marked subxacts as aborted
3) ProcArrayEndTransaction() (for toplevel ones)Due to these any tqual stuff will treat the current (sub-)xact and it's
children as aborted. So the catalog lookups will use the catalog in a
sensible state.Now, one could argue that it's certainly not safe for anything but
xact.c itself to play such games. And would be pretty damn right. We
could add some flat to signal catcache.c to temporarily use
Assert(IsTransactionBlock()) instead of IsTransactionStmt() but that
seems overly complex. I think the danger of code doing stuff in an
aborted transaction isn't that big.This certainly deserves a good comment...
Do you want to propose something?
I basically don't have a very good feeling about this. Processing
invalidations should invalidate stuff, not try to reread it. You may
be right that our MVCC snapshot is OK at the point invalidations are
processed, but we don't know what caused the transaction to abort in
the first place.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-08-09 09:05:51 -0400, Robert Haas wrote:
On Thu, Aug 8, 2013 at 1:28 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Well. It isn't. At least not in general. The specific case triggered
here though are cache invalidations being processed which can lead to
the catalog being read (pretty crummy, but not easy to get rid
of). That's actually safe since before we process the invalidations we
have done:
1) CurrentTransactionState->state = TRANS_ABORT
2) RecordTransactionAbort(), marking the transaction as aborted in the
clog
3) marked subxacts as aborted
3) ProcArrayEndTransaction() (for toplevel ones)Due to these any tqual stuff will treat the current (sub-)xact and it's
children as aborted. So the catalog lookups will use the catalog in a
sensible state.Now, one could argue that it's certainly not safe for anything but
xact.c itself to play such games. And would be pretty damn right. We
could add some flat to signal catcache.c to temporarily use
Assert(IsTransactionBlock()) instead of IsTransactionStmt() but that
seems overly complex. I think the danger of code doing stuff in an
aborted transaction isn't that big.This certainly deserves a good comment...
Do you want to propose something?
I can, but it will have to wait a couple of days. I am only still online
because my holiday plans didn't 100% work out and got delayed by a
day...
I basically don't have a very good feeling about this. Processing
invalidations should invalidate stuff, not try to reread it.
I agree. But fixing that seems to require a good amount of surgery. The
problem is that currently we cannot know for sure some index doesn't
still use the index support infrastructure :(.
You may
be right that our MVCC snapshot is OK at the point invalidations are
processed, but we don't know what caused the transaction to abort in
the first place.
Well, we know it has been an ERROR and nothing worse. And that it
successfully longjmp'ed up the way to postgres.c or one of the PLs.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Aug 06, 2013 at 09:06:59AM +0200, Andres Freund wrote:
On 2013-08-05 13:09:31 -0400, Noah Misch wrote:
When we call AtEOSubXact_Inval() or AtEOXact_Inval() with a relation still
open, we can potentially get a relcache rebuild and therefore a syscache
lookup as shown above. CommitSubTransaction() is also potentially affected,
though I don't have an SQL-level test case for that. It calls
CommandCounterIncrement() after moving to TRANS_COMMIT. That CCI had better
find no invalidations of open relations, or we'll make syscache lookups. (In
simple tests, any necessary invalidations tend to happen at the CCI in
CommitTransactionCommand(), so the later CCI does in fact find nothing to do.
I have little confidence that should be counted upon, though.)How might we best rearrange things to avoid these hazards?
Ok. After a good bit of code reading, I think this isn't an actual bug
but an overzealous Assert(). I think it should be
Assert(IsTransactionBlock()) not Assert(IsTransactionState());
IsTransactionBlock() is for higher-level things that care about actual use of
BEGIN. It's false in the middle of executing a single-statement transaction,
but that's of course a perfectly valid time for syscache lookups.
The reason for that is that when we do the AtEO(Sub)?Xact_Inval(), we've
already done a RecordTransactionAbort(true|false) and
CurrentTransactionState->state = TRANS_ABORT. So the visibility routines
have enough information to consider rows created by the aborted
transaction as invisible.I am not really happy with the RelationReloadIndexInfo()s in
RelationClearRelation() when we're in an aborted state, especially as
the comment surrounding them are clearly out of date, but I don't see a
bug there anymore.
Interesting.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-08-08 09:27:24 -0400, Robert Haas wrote:
How can it be safe to try to read catalogs if the transaction is aborted?
Well. It isn't. At least not in general. The specific case triggered
here though are cache invalidations being processed which can lead to
the catalog being read (pretty crummy, but not easy to get rid
of). That's actually safe since before we process the invalidations we
have done:
1) CurrentTransactionState->state = TRANS_ABORT
2) RecordTransactionAbort(), marking the transaction as aborted in the
clog
3) marked subxacts as aborted
3) ProcArrayEndTransaction() (for toplevel ones)
Due to these any tqual stuff will treat the current (sub-)xact and it's
children as aborted. So the catalog lookups will use the catalog in a
sensible state.
I don't have any faith in this argument. You might be right that we'll
correctly see our own output rows as aborted, but that's barely the tip
of the iceberg of risk here. Is it safe to take new locks in an aborted
transaction? (What if we're already past the lock-release point in
the abort sequence?) For that matter, given that we don't know what
exactly caused the transaction abort, how safe is it to do anything at
all --- we might for instance be nearly out of memory. If the catalog
reading attempt itself fails, won't we be in an infinite loop of
transaction aborts? I could probably think of ten more risks if
I spent a few more minutes at it.
Cache invalidation during abort should *not* lead to any attempt to
immediately revalidate the cache. No amount of excuses will make that
okay. I have not looked to see just what the path of control is in this
particular case, but we need to fix it, not paper over it.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 09, 2013 at 02:11:46PM -0400, Tom Lane wrote:
Cache invalidation during abort should *not* lead to any attempt to
immediately revalidate the cache. No amount of excuses will make that
okay. I have not looked to see just what the path of control is in this
particular case, but we need to fix it, not paper over it.
+1. What if (sub)transaction end only manipulated the local invalidation
message queue for later processing? Actual processing would happen after
CleanupSubTransaction() returns control to the owning xact, or at the start of
the next transaction for a top-level ending.
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-08-09 14:11:46 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-08-08 09:27:24 -0400, Robert Haas wrote:
How can it be safe to try to read catalogs if the transaction is aborted?
Well. It isn't. At least not in general. The specific case triggered
here though are cache invalidations being processed which can lead to
the catalog being read (pretty crummy, but not easy to get rid
of). That's actually safe since before we process the invalidations we
have done:
1) CurrentTransactionState->state = TRANS_ABORT
2) RecordTransactionAbort(), marking the transaction as aborted in the
clog
3) marked subxacts as aborted
3) ProcArrayEndTransaction() (for toplevel ones)Due to these any tqual stuff will treat the current (sub-)xact and it's
children as aborted. So the catalog lookups will use the catalog in a
sensible state.I don't have any faith in this argument. You might be right that we'll
correctly see our own output rows as aborted, but that's barely the tip
of the iceberg of risk here. Is it safe to take new locks in an aborted
transaction? (What if we're already past the lock-release point in
the abort sequence?)
Don't get me wrong. I find the idea of doing catalog lookup during abort
horrid. But it's been that way for at least 10 years (I checked 7.4), so
it has at least some resemblance of working.
Today we do a good bit less than back then, for one we don't do a full
cache reload during abort anymore, just for the index support
infrastructure. Also, you've reduced the amount of lookups a bit with the
relmapper introduction.
For that matter, given that we don't know what
exactly caused the transaction abort, how safe is it to do anything at
all --- we might for instance be nearly out of memory. If the catalog
reading attempt itself fails, won't we be in an infinite loop of
transaction aborts?
Looks like that's possible, yes. There seem to be quite some other
opportunities for this to happen if you look at the amount of work done
in AbortSubTransaction(). I guess it rarely happens because we
previously release some memory...
I could probably think of ten more risks if I spent a few more minutes
at it.
No need to convince me here. I neither could believe we were doing this,
nor figure out why it even "works" for the first hour of looking at it.
Cache invalidation during abort should *not* lead to any attempt to
immediately revalidate the cache. No amount of excuses will make that
okay. I have not looked to see just what the path of control is in this
particular case, but we need to fix it, not paper over it.
I agree, although that's easier said than done in the case of
subtransactions. The problem we have there is that it's perfectly valid
to still have references to a relation from the outer, not aborted,
transaction. Those need to be valid for anybody looking at the relcache
entry after we've processed the ROLLBACK TO/...
I guess the fix is something like we do in the commit case, where we
transfer invalidations to the parent transaction. If we then process
local invalidations *after* we've cleaned up the subtransaction
completely we should be fine. We already do an implicity
CommandCounterIncrement() in CommitSubTransaction()...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 9, 2013 at 11:17 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-08-09 14:11:46 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-08-08 09:27:24 -0400, Robert Haas wrote:
How can it be safe to try to read catalogs if the transaction is aborted?
Well. It isn't. At least not in general. The specific case triggered
here though are cache invalidations being processed which can lead to
the catalog being read (pretty crummy, but not easy to get rid
of). That's actually safe since before we process the invalidations we
have done:
1) CurrentTransactionState->state = TRANS_ABORT
2) RecordTransactionAbort(), marking the transaction as aborted in the
clog
3) marked subxacts as aborted
3) ProcArrayEndTransaction() (for toplevel ones)Due to these any tqual stuff will treat the current (sub-)xact and it's
children as aborted. So the catalog lookups will use the catalog in a
sensible state.I don't have any faith in this argument. You might be right that we'll
correctly see our own output rows as aborted, but that's barely the tip
of the iceberg of risk here. Is it safe to take new locks in an aborted
transaction? (What if we're already past the lock-release point in
the abort sequence?)Don't get me wrong. I find the idea of doing catalog lookup during abort
horrid. But it's been that way for at least 10 years (I checked 7.4), so
it has at least some resemblance of working.
Today we do a good bit less than back then, for one we don't do a full
cache reload during abort anymore, just for the index support
infrastructure. Also, you've reduced the amount of lookups a bit with the
relmapper introduction.For that matter, given that we don't know what
exactly caused the transaction abort, how safe is it to do anything at
all --- we might for instance be nearly out of memory. If the catalog
reading attempt itself fails, won't we be in an infinite loop of
transaction aborts?Looks like that's possible, yes. There seem to be quite some other
opportunities for this to happen if you look at the amount of work done
in AbortSubTransaction(). I guess it rarely happens because we
previously release some memory...I could probably think of ten more risks if I spent a few more minutes
at it.No need to convince me here. I neither could believe we were doing this,
nor figure out why it even "works" for the first hour of looking at it.Cache invalidation during abort should *not* lead to any attempt to
immediately revalidate the cache. No amount of excuses will make that
okay. I have not looked to see just what the path of control is in this
particular case, but we need to fix it, not paper over it.I agree, although that's easier said than done in the case of
subtransactions. The problem we have there is that it's perfectly valid
to still have references to a relation from the outer, not aborted,
transaction. Those need to be valid for anybody looking at the relcache
entry after we've processed the ROLLBACK TO/...I guess the fix is something like we do in the commit case, where we
transfer invalidations to the parent transaction. If we then process
local invalidations *after* we've cleaned up the subtransaction
completely we should be fine. We already do an implicity
CommandCounterIncrement() in CommitSubTransaction()...
Andres, are you (or is anyone) going to try to fix this assertion failure?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-10-04 15:15:36 -0400, Robert Haas wrote:
Andres, are you (or is anyone) going to try to fix this assertion failure?
I think short term replacing it by IsTransactionOrTransactionBlock() is
the way to go. Entirely restructuring how cache invalidation in the
abort path works is not a small task.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 4, 2013 at 3:20 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-10-04 15:15:36 -0400, Robert Haas wrote:
Andres, are you (or is anyone) going to try to fix this assertion failure?
I think short term replacing it by IsTransactionOrTransactionBlock() is
the way to go. Entirely restructuring how cache invalidation in the
abort path works is not a small task.
Well, if we're going to go that route, how about something like the
attached? I included the assert-change per se, an explanatory
comment, and the test case that Noah devised to cause the current
assertion to fail.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
weaken-searchcatcache-assert.patchapplication/octet-stream; name=weaken-searchcatcache-assert.patchDownload+28-2
On 2013-10-07 15:02:36 -0400, Robert Haas wrote:
On Fri, Oct 4, 2013 at 3:20 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-10-04 15:15:36 -0400, Robert Haas wrote:
Andres, are you (or is anyone) going to try to fix this assertion failure?
I think short term replacing it by IsTransactionOrTransactionBlock() is
the way to go. Entirely restructuring how cache invalidation in the
abort path works is not a small task.Well, if we're going to go that route, how about something like the
attached? I included the assert-change per se, an explanatory
comment, and the test case that Noah devised to cause the current
assertion to fail.
Sounds good to me. Maybe add a comment to the added regression test
explaining it tests invalidation processing at xact abort?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers