Should XLogInsert() be done only inside a critical section?

Started by Tom Lanealmost 10 years ago5 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Over in <17456.1460832307@sss.pgh.pa.us> I speculated about whether
we should be enforcing that WAL insertion happen only inside critical
sections. We don't currently, and a survey of the backend says that
there are quite a few calls that aren't inside critical sections.
But there are at least two good reasons why we should, IMO:

1. It's not very clear that XLogInsert will recover cleanly if it's
invoked outside a critical section and hits a failure. Certainly,
if we allow such usage, then every potential error inside that code
has to be analyzed under both critical-section and normal rules.

2. With no such check, it's quite easy for calling code to forget to
create a critical section around code stanzas where one is *necessary*
(because you're changing shared-buffer contents).

Both of these points represent pretty clear hazards for introduction
of future bugs, whether or not there are any such bugs today.

As against this, it could be argued that adding critical sections where
they're not absolutely necessary must make crashing failures more probable
than they need to be. But first you'd have to prove that they're not
absolutely necessary, which I'm unsure about because of point #1.

Anyway, I went through our tree and added START/END_CRIT_SECTION calls
around all XLogInsert calls that could currently be reached without one;
see attached. Since this potentially breaks third-party code I would
not propose back-patching it, but I think it's reasonable to propose
applying it to HEAD.

Thoughts?

regards, tom lane

Attachments:

xloginsert-must-be-in-critical-section.patchtext/x-diff; charset=us-ascii; name=xloginsert-must-be-in-critical-section.patchDownload+99-19
#2Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#1)
Re: Should XLogInsert() be done only inside a critical section?

On Thu, Apr 21, 2016 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I went through our tree and added START/END_CRIT_SECTION calls
around all XLogInsert calls that could currently be reached without one;
see attached. Since this potentially breaks third-party code I would
not propose back-patching it, but I think it's reasonable to propose
applying it to HEAD.

+1 for sanitizing those code paths this way. This patch looks sane to
me after having a look with some testing.

--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -610,15 +610,12 @@ brinbuild(Relation heap, Relation index,
IndexInfo *indexInfo)
        elog(ERROR, "index \"%s\" already contains data",
             RelationGetRelationName(index));

- /*
- * Critical section not required, because on error the creation of the
- * whole relation will be rolled back.
- */
Perhaps Alvaro has a opinion to offer regarding this bit removed in brin.c?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#2)
Re: Should XLogInsert() be done only inside a critical section?

Michael Paquier wrote:

On Thu, Apr 21, 2016 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I went through our tree and added START/END_CRIT_SECTION calls
around all XLogInsert calls that could currently be reached without one;
see attached. Since this potentially breaks third-party code I would
not propose back-patching it, but I think it's reasonable to propose
applying it to HEAD.

+1 for sanitizing those code paths this way. This patch looks sane to
me after having a look with some testing.

--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -610,15 +610,12 @@ brinbuild(Relation heap, Relation index,
IndexInfo *indexInfo)
elog(ERROR, "index \"%s\" already contains data",
RelationGetRelationName(index));

- /*
- * Critical section not required, because on error the creation of the
- * whole relation will be rolled back.
- */
Perhaps Alvaro has a opinion to offer regarding this bit removed in brin.c?

I vaguely recall copying this comment from elsewhere, but I didn't see
any other such comment being removed by the patch; I probably copied
something else which got slowly mutated into what's there today during
development.

if we're adding the critical section then the comment should
certainly be removed too.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#3)
Re: Should XLogInsert() be done only inside a critical section?

On Fri, Apr 22, 2016 at 5:18 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Michael Paquier wrote:

On Thu, Apr 21, 2016 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I went through our tree and added START/END_CRIT_SECTION calls
around all XLogInsert calls that could currently be reached without one;
see attached. Since this potentially breaks third-party code I would
not propose back-patching it, but I think it's reasonable to propose
applying it to HEAD.

+1 for sanitizing those code paths this way. This patch looks sane to
me after having a look with some testing.

--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -610,15 +610,12 @@ brinbuild(Relation heap, Relation index,
IndexInfo *indexInfo)
elog(ERROR, "index \"%s\" already contains data",
RelationGetRelationName(index));

- /*
- * Critical section not required, because on error the creation of the
- * whole relation will be rolled back.
- */
Perhaps Alvaro has a opinion to offer regarding this bit removed in brin.c?

I vaguely recall copying this comment from elsewhere, but I didn't see
any other such comment being removed by the patch; I probably copied
something else which got slowly mutated into what's there today during
development.

if we're adding the critical section then the comment should
certainly be removed too.

A scan of the code is showing me that there are 88 sections in the
code containing a comment referring to a critical section, actually a
little bit more because those two terms are sometimes broken between
two lines. With Tom's patch applied, I have found two inconsistencies.

In RecordTransactionAbort@xact.c, there is the following comment that
would need a refresh because XactLogAbortRecord logs a record:
/* XXX do we really need a critical section here? */
START_CRIT_SECTION();

The comment of RelationTruncate@storage.c referring to the use of
critical sections should be updated.

Looking at that the access code while going through the patch, perhaps
it would be good to add some assertions regarding the presence of a
critical section in some code paths of gin. For example
dataExecPlaceToPage and entryExecPlaceToPage should be invoked in a
critical section per their comments. heap_page_prune_execute,
spgPageIndexMultiDelete, spgFormDeadTuple could be as well candidates
for such changes. Surely that's a different, HEAD-only patch though.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#1)
Re: Should XLogInsert() be done only inside a critical section?

On 20/04/16 23:44, Tom Lane wrote:

Over in <17456.1460832307@sss.pgh.pa.us> I speculated about whether
we should be enforcing that WAL insertion happen only inside critical
sections. We don't currently, and a survey of the backend says that
there are quite a few calls that aren't inside critical sections.
But there are at least two good reasons why we should, IMO:

1. It's not very clear that XLogInsert will recover cleanly if it's
invoked outside a critical section and hits a failure. Certainly,
if we allow such usage, then every potential error inside that code
has to be analyzed under both critical-section and normal rules.

It was certainly designed to recover from errors gracefully.
XLogInsertRecord(), which does the low-level work of inserting the
record to the WAL buffer, has a critical section of its own inside it.
The code in xloginsert.c, for constructing the record to insert,
operates on backend-private buffers only.

2. With no such check, it's quite easy for calling code to forget to
create a critical section around code stanzas where one is *necessary*
(because you're changing shared-buffer contents).

Yeah, that is very true.

Both of these points represent pretty clear hazards for introduction
of future bugs, whether or not there are any such bugs today.

As against this, it could be argued that adding critical sections where
they're not absolutely necessary must make crashing failures more probable
than they need to be. But first you'd have to prove that they're not
absolutely necessary, which I'm unsure about because of point #1.

One option would be to put the must-be-in-critical-section check in
XLogRegisterBlock(). A WAL record that is associated with a shared
buffer almost certainly needs a critical section, but many of the others
are safe without it.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers