Online checksums patch - once again

Started by Magnus Haganderalmost 7 years ago91 messageshackers

magnus@hagander.net

almost 7 years ago

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (
/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
)

Andres patch has been enhanced with wait events per
/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com
.

I'm including both those as attachments here to hopefully trick the cfbot
into being able to build things :)

Other than that, we believe that the list of objections from the first one
should be covered by now, but it's been quite some time and many emails, so
it's possible we missed some. So if you find them, point them out!

Documentation needs another go-over in particular base don changes since,
but the basic principles of how it works should not have changed.

//Magnus & Daniel

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Magnus Hagander (#1)

Re: Online checksums patch - once again

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (
/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
)

Andres patch has been enhanced with wait events per
/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com
.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

/usr/bin/xmllint --path . --noout --valid postgres.sgml
reference.sgml:287: parser error : Entity 'pgVerifyChecksums' not defined
&pgVerifyChecksums;
^
reference.sgml:295: parser error : chunk is not well balanced
postgres.sgml:231: parser error : Failure to process entity reference
&reference;
^
postgres.sgml:231: parser error : Entity 'reference' not defined
&reference;
^

Other than bots, this patch doesn't seem to have attracted any reviewers
this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Magnus Hagander

magnus@hagander.net

over 6 years ago

In reply to: Alvaro Herrera (#2)

Re: Online checksums patch - once again

On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (

/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de

)

Andres patch has been enhanced with wait events per

/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com

.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.

I've rebased again on top of todays master, but that was the only change I
had to make.

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 6 years ago

In reply to: Magnus Hagander (#3)

Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (

/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de

)

Andres patch has been enhanced with wait events per

/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com

.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.

I've rebased again on top of todays master, but that was the only change I
had to make.

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Bruce Momjian

bruce@momjian.us

over 6 years ago

In reply to: Tomas Vondra (#4)

Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

For adding cluster-level encryption to Postgres, the plan is to create a
standby that has encryption enabled, then switchover to it. Is that a
method we support now for adding checksums to Postgres? Do we need the
ability to do it in-place too?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Magnus Hagander

magnus@hagander.net

over 6 years ago

In reply to: Bruce Momjian (#5)

Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 4:53 PM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

Other than bots, this patch doesn't seem to have attracted any

reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how

sad

your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it

will

work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

For adding cluster-level encryption to Postgres, the plan is to create a
standby that has encryption enabled, then switchover to it. Is that a
method we support now for adding checksums to Postgres? Do we need the
ability to do it in-place too?

I definitely think we need the ability to do it in-place as well, yes.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Bruce Momjian

bruce@momjian.us

over 6 years ago

In reply to: Magnus Hagander (#6)

Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 04:57:41PM +0200, Magnus Hagander wrote:

On Mon, Sep 30, 2019 at 4:53 PM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

Other than bots, this patch doesn't seem to have attracted any

reviewers

this time around.ï¿½ Perhaps you need to bribe someone?ï¿½ (Maybe "how

sad

your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it

will

work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

For adding cluster-level encryption to Postgres, the plan is to create a
standby that has encryption enabled, then switchover to it.ï¿½ Is that a
method we support now for adding checksums to Postgres?ï¿½ Do we need the
ability to do it in-place too?

I definitely think we need the ability to do it in-place as well, yes.ï¿½

OK, just has to ask. I think for encryption, for the first version, we
will do just replica-only changes.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Magnus Hagander

magnus@hagander.net

over 6 years ago

In reply to: Tomas Vondra (#4)

Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (

/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de

)

Andres patch has been enhanced with wait events per

/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com

.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.

I've rebased again on top of todays master, but that was the only change I
had to make.

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

Not to downvalue your review, but I'd really appreciate a review from
someone who was one of the ones who spotted the issue initially.

Especially -- Andres, any chance I can bribe you to take another look?

And now please uncomment my commit SSH key again, please ;-)

I'll consider it...

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Magnus Hagander (#8)

Re: Online checksums patch - once again

Hi,

On 2019-09-30 16:59:00 +0200, Magnus Hagander wrote:

On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

I don't think the global barrier part is necessarily ready. I wrote it
on a flight back from a conference, to allow Magnus to make some
progress. And I don't think it has received meaningful review so far.

Especially -- Andres, any chance I can bribe you to take another look?

I'll try to take a look.

Greetings,

Andres Freund

#10

Magnus Hagander

magnus@hagander.net

over 6 years ago

In reply to: Andres Freund (#9)

Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-09-30 16:59:00 +0200, Magnus Hagander wrote:

On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra <

tomas.vondra@2ndquadrant.com>

wrote:

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware

of.

So +1 from me to go ahead and push it.

I don't think the global barrier part is necessarily ready. I wrote it
on a flight back from a conference, to allow Magnus to make some
progress. And I don't think it has received meaningful review so far.

I don't believe it has, no. I wouldn't trust my own level of review a
tleast :)

Especially -- Andres, any chance I can bribe you to take another look?

I'll try to take a look.

Much appreciated!

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#11

Michael Paquier

michael@paquier.xyz

over 6 years ago

In reply to: Magnus Hagander (#10)

Re: Online checksums patch - once again

On Wed, Oct 02, 2019 at 08:59:27PM +0200, Magnus Hagander wrote:

Much appreciated!

The latest patch does not apply, could you send a rebase? Moved it to
next CF, waiting on author.
--
Michael

#12

Daniel Gustafsson

daniel@yesql.se

over 6 years ago

In reply to: Michael Paquier (#11)

Re: Online checksums patch - once again

On 1 Dec 2019, at 03:32, Michael Paquier <michael@paquier.xyz> wrote:

The latest patch does not apply, could you send a rebase? Moved it to
next CF, waiting on author.

Attached is a rebased v14 patchset on top of maser. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

cheers ./daniel

#13

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Daniel Gustafsson (#12)

Re: Online checksums patch - once again

On Tue, Dec 3, 2019 at 6:41 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a rebased v14 patchset on top of maser. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

Any chance you and/or Magnus could offer opinions on some of those
patches? I am reluctant to start committing things with nobody having
replied.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14

Daniel Gustafsson

daniel@yesql.se

over 6 years ago

In reply to: Robert Haas (#13)

Re: Online checksums patch - once again

On 5 Dec 2019, at 16:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 3, 2019 at 6:41 PM Daniel Gustafsson <daniel@yesql.se> wrote:
Attached is a rebased v14 patchset on top of maser. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

Any chance you and/or Magnus could offer opinions on some of those
patches? I am reluctant to start committing things with nobody having
replied.

I am currently reviewing your latest patchset, but need a bit more time.

cheers ./daniel

#15

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Daniel Gustafsson (#14)

Re: Online checksums patch - once again

On Thu, Dec 5, 2019 at 10:28 AM Daniel Gustafsson <daniel@yesql.se> wrote:

I am currently reviewing your latest patchset, but need a bit more time.

Oh, great, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16

Daniel Gustafsson

daniel@yesql.se

over 6 years ago

In reply to: Robert Haas (#13)

Re: Online checksums patch - once again

On 5 Dec 2019, at 16:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 3, 2019 at 6:41 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a rebased v14 patchset on top of maser. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

Any chance you and/or Magnus could offer opinions on some of those
patches? I am reluctant to start committing things with nobody having
replied.

Attached is a v15 of the online checksums patchset (minus 0005), rebased on top
of your v3 ProcSignalBarrier patch rather than Andres' PoC GlobalBarrier patch.
It does take the, perhaps, controversial approach of replacing the SAMPLE
barrier with the CHECKSUM barrier. The cfbot will be angry since this email
doesn't contain the procsignalbarrier patch, but it sounded like that would go
in shortly so opted for that.

This version also contains touchups to the documentation part, as well as a
pgindent run.

If reviewers think this version is nearing completion, then a v16 should
address the comment below, but as this version switches its underlying
infrastructure it seemed usefel for testing still.

+   /*
+    * Force a checkpoint to get everything out to disk. XXX: this should
+    * probably not be an IMMEDIATE checkpoint, but leave it there for now for
+    * testing.
+    */
+   RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);

cheers ./daniel

#17

Sergei Kornilov

sk@zsrv.org

over 6 years ago

In reply to: Daniel Gustafsson (#16)

Re: Online checksums patch - once again

Hello

Attached is a v15 of the online checksums patchset (minus 0005), rebased on top
of your v3 ProcSignalBarrier patch rather than Andres' PoC GlobalBarrier patch.
It does take the, perhaps, controversial approach of replacing the SAMPLE
barrier with the CHECKSUM barrier. The cfbot will be angry since this email
doesn't contain the procsignalbarrier patch, but it sounded like that would go
in shortly so opted for that.

ProcSignalBarrier was committed, so online checksums patchset has no other pending dependencies and should be applied cleanly on master. Right? The patchset needs another rebase in this case, does not apply...

regards, Sergei

#18

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Daniel Gustafsson (#16)

Re: Online checksums patch - once again

On Mon, Dec 16, 2019 at 10:16 AM Daniel Gustafsson <daniel@yesql.se> wrote:

If reviewers think this version is nearing completion, then a v16 should
address the comment below, but as this version switches its underlying
infrastructure it seemed usefel for testing still.

I think this patch still needs a lot of work.

- doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+ doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites ||
DataChecksumsInProgress());

This will have a small performance cost in a pretty hot code path. Not
sure that it's enough to worry about, though.

-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
{
Assert(ControlFile != NULL);
return (ControlFile->data_checksum_version > 0);
}

This seems troubling, because data_checksum_version can now change,
but you're still accessing it without a lock. This complain applies
likewise to a bunch of related functions in xlog.c as well.

+ elog(ERROR, "Checksums not in inprogress mode");

Questionable capitalization and punctuation.

+void
+SetDataChecksumsOff(void)
+{
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ ControlFile->data_checksum_version = 0;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+ WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+ XlogChecksums(0);
+}

This looks racey. Suppose that checksums are on. Other backends will
see that checksums are disabled as soon as
ControlFile->data_checksum_version = 0 happens, and they will feel
free to write blocks without checksums. Now we crash, and those blocks
exist on disk even though the on-disk state still otherwise shows
checksums fully enabled. It's a little better if we stop reading
data_checksum_version without a lock, because now nobody else can see
the updated state until we've actually updated the control file. But
even then, isn't it strange that writes of non-checksummed stuff could
appear or be written to disk before XlogChecksums(0) happens? If
that's safe, it seems like it deserves some kind of comment.

+ /*
+ * If we reach this point with checksums in inprogress state, we notify
+ * the user that they need to manually restart the process to enable
+ * checksums. This is because we cannot launch a dynamic background worker
+ * directly from here, it has to be launched from a regular backend.
+ */
+ if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+ ereport(WARNING,
+ (errmsg("checksum state is \"inprogress\" with no worker"),
+ errhint("Either disable or enable checksums by calling the
pg_disable_data_checksums() or pg_enable_data_checksums()
functions.")));

This seems pretty half-baked.

+ (errmsg("could not start checksumhelper: has been canceled")));
+ (errmsg("could not start checksumhelper: already running")));
+ (errmsg("failed to start checksum helper launcher")));

These seem rough around the edges. Using an internal term like
'checksumhelper' in a user-facing error message doesn't seem great.
Generally primary error messages are phrased as a single utterance
where we can, rather than colon-separated fragments like this. The
third message calls it 'checksum helper launcher' whereas the other
two call it 'checksumhelper'. It also isn't very helpful; I don't
think most people like a message saying that something failed with no
explanation given.

+ elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer
exists", relationId);

Here's another way to spell 'checksumhelper', and this time it refers
to the worker rather than the launcher. Also, relation IDs are OIDs,
so need to be printed with %u, and usually we try to print names if
possible. Also, this message, like a lot of messages in this patch,
begins with a capital letter and does not end with a period. That is
neither the style for primary messages nor the style for detail
messages. As these are primary messages, the primary message style
should be used. That style is no capital and no period.

+ if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+ {
+ ereport(LOG,
+ (errmsg("failed to start worker for checksumhelper in \"%s\"",
+ db->dbname)));
+ return FAILED;
+ }

I don't think having constants with names like
SUCCESSFUL/ABORTED/FAILED is a very good idea. Too much chance of name
collisions. I suggest adding a prefix.

Also, the retry logic here doesn't look particularly robust.
RegisterDynamicBackgroundWorker will fail if all slots are available;
if that happens twice for the same database, once on first attempting
it and again when retrying it, the whole process fails, all state is
lost, and all work has to be redone. That seems neither particularly
unlikely nor pleasant.

+ if (DatabaseList == NIL || list_length(DatabaseList) == 0)

I don't think that the second half of this test serves any purpose.

+ snprintf(activity, sizeof(activity), "Waiting for current
transactions to finish (waiting for %d)", waitforxid);

%u here too.

+ if (pgc->relpersistence == 't')

Use the relevant constant.

+ /*
+ * Wait for all temp tables that existed when we started to go away. This
+ * is necessary since we cannot "reach" them to enable checksums. Any temp
+ * tables created after we started will already have checksums in them
+ * (due to the inprogress state), so those are safe.
+ */

This does not seem very nice. It just leaves a worker running more or
less forever. It's essentially waiting for temp-table using sessions
to go away, but that could take a really long time.

+ WAIT_EVENT_PG_SLEEP);

You need to invent a new wait event and add docs for it.

+ if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM))
+ {
+ /*
+ * By virtue of getting here (i.e. interrupts being processed), we
+ * know that this backend won't have any in-progress writes (which
+ * might have missed the checksum change).
+ */
+ }

I don't believe this. I already wrote some about this over here:

/messages/by-id/CA+TgmobORrsgSUydZ3MsSw9L5MBUGz7jRK+973uPZgiyCQ81ag@mail.gmail.com

As a general point, I think that the idea of the ProcSignalBarrier
mechanism is that every backend has some private state that needs to
be updated, and when it absorbs the barrier you know that it's updated
that state, and so when everybody's absorbed the barrier you know that
all the state has been updated. Here, however, there actually is no
backend-private state. The only state that everyone's consulting is
the shared state stored in ControlFile->data_checksum_version. So what
does absorbing the barrier prove? Only that we've reached a
CHECK_FOR_INTERRUPTS(). But that is a useful guarantee only if we
never check for interrupts between the time we examine
ControlFile->data_checksum_version and the time we use it, and I see
no particular reason to believe that should always be true, and I
suspect it isn't, and even if it happens to be true today I think it
could get broken in the future pretty easily. There are no particular
restrictions documented in terms of where DataChecksumsNeedWrite(),
XLogHintBitIsNeeded(), etc. can be checked or what can be done between
checking the value and using it. The issue doesn't arise for today's
DataChecksumsEnabled() because the value can't ever change, but with
this patch things can change, and to me what the patch does about that
doesn't really look adequate.

I'm sort of out of time for right now, but I think this patch needs a
lot more work on the concurrency end of things. It seems to me that it
probably mostly works in practice, but that the whole concurrency
mechanism is not very solid and probably has a lot of rare cases where
it can just misbehave if you get unlucky. I'll try to spend some more
time thinking about this next week. I also think that the fact that
the mechanism for getting from 'inprogress' to 'on' seems fragile and
under-engineered. It still bothers me that there's no mechanism for
persisting the progress that we've made in enabling checksums; but
even apart from that, I think that there just hasn't been enough
thought given here to error/problem cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19

Daniel Gustafsson

daniel@yesql.se

over 6 years ago

In reply to: Robert Haas (#18)

Re: Online checksums patch - once again

On 3 Jan 2020, at 23:07, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Dec 16, 2019 at 10:16 AM Daniel Gustafsson <daniel@yesql.se> wrote:

If reviewers think this version is nearing completion, then a v16 should
address the comment below, but as this version switches its underlying
infrastructure it seemed usefel for testing still.

I think this patch still needs a lot of work.

Thanks a lot for your thorough review, much appreciated! Also, sorry for being
slow to respond. Below are fixes and responses to most of the feedback, but I
need a bit more time to think about the concurrency aspects that you brought
up. However, in the spirit of showing work early/often I opted for still
sending the partial response, to perhaps be able to at least close some of the
raised issues in the meantime.

- doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+ doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites ||
DataChecksumsInProgress());
This will have a small performance cost in a pretty hot code path. Not
sure that it's enough to worry about, though.

Not sure either, and/or how clever compilers are about inlining this. As a
test, I've switched over this to be a static inline function, as it's only
consumer is in xlog.c. Now, as mentioned later in this review, reading the
version unlocked has issues so do consider this a WIP test, not a final
suggestion.

-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
{
Assert(ControlFile != NULL);
return (ControlFile->data_checksum_version > 0);
}

This seems troubling, because data_checksum_version can now change,
but you're still accessing it without a lock. This complain applies
likewise to a bunch of related functions in xlog.c as well.

Right, let me do some more thinking on this before addressing in a next version
of the patch. Simply wrapping it in a SHARED lock still has TOCTOU problems so
a bit more work/thinking is needed.

+ elog(ERROR, "Checksums not in inprogress mode");

Questionable capitalization and punctuation.

Fixed capitalization, but elogs shouldn't end with a period so left that.

+void
+SetDataChecksumsOff(void)
+{
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ ControlFile->data_checksum_version = 0;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+ WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+ XlogChecksums(0);
+}
This looks racey. Suppose that checksums are on. Other backends will
see that checksums are disabled as soon as
ControlFile->data_checksum_version = 0 happens, and they will feel
free to write blocks without checksums. Now we crash, and those blocks
exist on disk even though the on-disk state still otherwise shows
checksums fully enabled. It's a little better if we stop reading
data_checksum_version without a lock, because now nobody else can see
the updated state until we've actually updated the control file. But
even then, isn't it strange that writes of non-checksummed stuff could
appear or be written to disk before XlogChecksums(0) happens? If
that's safe, it seems like it deserves some kind of comment.

As mentioned above, I would like to address this in the next version. I'm
working on it, just need a little more time and wanted to share progress on the
other bits.

+ /*
+ * If we reach this point with checksums in inprogress state, we notify
+ * the user that they need to manually restart the process to enable
+ * checksums. This is because we cannot launch a dynamic background worker
+ * directly from here, it has to be launched from a regular backend.
+ */
+ if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+ ereport(WARNING,
+ (errmsg("checksum state is \"inprogress\" with no worker"),
+ errhint("Either disable or enable checksums by calling the
pg_disable_data_checksums() or pg_enable_data_checksums()
functions.")));

This seems pretty half-baked.

I don't disagree with that. However, given that enabling checksums is a pretty
intensive operation it seems somewhat unfriendly to automatically restart. As
a DBA I wouldn't want that to kick off without manual intervention, but there
is also the risk of this being missed due to assumptions that it would restart.
Any ideas on how to treat this?

If/when we can restart the processing where it left off, without the need to go
over all data again, things might be different wrt the default action.

+ (errmsg("could not start checksumhelper: has been canceled")));
+ (errmsg("could not start checksumhelper: already running")));
+ (errmsg("failed to start checksum helper launcher")));
These seem rough around the edges. Using an internal term like
'checksumhelper' in a user-facing error message doesn't seem great.
Generally primary error messages are phrased as a single utterance
where we can, rather than colon-separated fragments like this. The
third message calls it 'checksum helper launcher' whereas the other
two call it 'checksumhelper'. It also isn't very helpful; I don't
think most people like a message saying that something failed with no
explanation given.

+ elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer
exists", relationId);

Here's another way to spell 'checksumhelper', and this time it refers
to the worker rather than the launcher. Also, relation IDs are OIDs,
so need to be printed with %u, and usually we try to print names if
possible. Also, this message, like a lot of messages in this patch,
begins with a capital letter and does not end with a period. That is
neither the style for primary messages nor the style for detail
messages. As these are primary messages, the primary message style
should be used. That style is no capital and no period.

I've removed checksumhelper from all user facing strings, and only kept them in
the DEBUG strings (which to some extent probably will be removed before a final
version of the patch, so didn't spend too much time on those just now). The
bgworker name is still checksumhelper launcher and checksumhelper worker, but
that could perhaps do with a better name too.

+ if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+ {
+ ereport(LOG,
+ (errmsg("failed to start worker for checksumhelper in \"%s\"",
+ db->dbname)));
+ return FAILED;
+ }
I don't think having constants with names like
SUCCESSFUL/ABORTED/FAILED is a very good idea. Too much chance of name
collisions. I suggest adding a prefix.

Fixed.

Also, the retry logic here doesn't look particularly robust.
RegisterDynamicBackgroundWorker will fail if all slots are available;
if that happens twice for the same database, once on first attempting
it and again when retrying it, the whole process fails, all state is
lost, and all work has to be redone. That seems neither particularly
unlikely nor pleasant.

Agreed, this was a brick or two shy of a load. I've rewritten this logic to
cope better with the conditions around startup/shutdown of bgworkers. I think
there should be some form of backoff mechanism as well in case there
temporarily aren't any slots, to avoid running through all the databases in
short order only to run up the retry counter. Something like if X databases in
succession fail on no slot being available, back off a little before trying X+1
to allow for operations that consume the slot(s) to finish. Or something. It
wont help for systems which are permanently starved with a too low
max_worker_processes, but nothing sort of will. For the latter, I've added a
note to the documentation.

+ if (DatabaseList == NIL || list_length(DatabaseList) == 0)

I don't think that the second half of this test serves any purpose.

True, but I think the code is clearer if the second half is the one we keep, so
went with that.

+ snprintf(activity, sizeof(activity), "Waiting for current
transactions to finish (waiting for %d)", waitforxid);

%u here too.

Fixed.

+ if (pgc->relpersistence == 't')

Use the relevant constant.

Fixed.

+ /*
+ * Wait for all temp tables that existed when we started to go away. This
+ * is necessary since we cannot "reach" them to enable checksums. Any temp
+ * tables created after we started will already have checksums in them
+ * (due to the inprogress state), so those are safe.
+ */
This does not seem very nice. It just leaves a worker running more or
less forever. It's essentially waiting for temp-table using sessions
to go away, but that could take a really long time.

It can, but is there a realistic alternative? I can't think of one but if you
have ideas I'd love for this requirement to go away, or be made less blocking.

At the same time, enabling checksums is hardly the kind of operation one does
casually in a busy database, but probably a more planned operation. This
requirement is mentioned in the documentation such that a DBA can plan for when
to start the processing.

+ WAIT_EVENT_PG_SLEEP);

You need to invent a new wait event and add docs for it.

Done. I failed to figure out a (IMO) good name though, and welcome suggestions
that are more descriptive. CHECKSUM_ENABLE_STARTCONDITION was what I settled on
but I'm not too excited about it.

+ if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM))
+ {
+ /*
+ * By virtue of getting here (i.e. interrupts being processed), we
+ * know that this backend won't have any in-progress writes (which
+ * might have missed the checksum change).
+ */
+ }
I don't believe this. I already wrote some about this over here:

/messages/by-id/CA+TgmobORrsgSUydZ3MsSw9L5MBUGz7jRK+973uPZgiyCQ81ag@mail.gmail.com

As a general point, I think that the idea of the ProcSignalBarrier
mechanism is that every backend has some private state that needs to
be updated, and when it absorbs the barrier you know that it's updated
that state, and so when everybody's absorbed the barrier you know that
all the state has been updated. Here, however, there actually is no
backend-private state. The only state that everyone's consulting is
the shared state stored in ControlFile->data_checksum_version. So what
does absorbing the barrier prove? Only that we've reached a
CHECK_FOR_INTERRUPTS(). But that is a useful guarantee only if we
never check for interrupts between the time we examine
ControlFile->data_checksum_version and the time we use it, and I see
no particular reason to believe that should always be true, and I
suspect it isn't, and even if it happens to be true today I think it
could get broken in the future pretty easily. There are no particular
restrictions documented in terms of where DataChecksumsNeedWrite(),
XLogHintBitIsNeeded(), etc. can be checked or what can be done between
checking the value and using it. The issue doesn't arise for today's
DataChecksumsEnabled() because the value can't ever change, but with
this patch things can change, and to me what the patch does about that
doesn't really look adequate.

I don't disagree with this, but I need to do a bit more thinking before
presenting a suggested fix for this concurrency issue.

I'm sort of out of time for right now, but I think this patch needs a
lot more work on the concurrency end of things. It seems to me that it
probably mostly works in practice, but that the whole concurrency
mechanism is not very solid and probably has a lot of rare cases where
it can just misbehave if you get unlucky. I'll try to spend some more
time thinking about this next week. I also think that the fact that
the mechanism for getting from 'inprogress' to 'on' seems fragile and
under-engineered. It still bothers me that there's no mechanism for
persisting the progress that we've made in enabling checksums; but
even apart from that, I think that there just hasn't been enough
thought given here to error/problem cases.

Thanks again for reviewing (and working on the infrastructure required for this
patch to begin with)! Regarding the persisting the progress; that would be a
really neat feature but I don't have any suggestion on how to do that safely
for real use-cases.

Attached is a v16 rebased on top of current master which addresses the above
commented points, and which I am basing the concurrency work on.

cheers ./daniel

#20

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Daniel Gustafsson (#19)

Re: Online checksums patch - once again

On Sat, Jan 18, 2020 at 6:18 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Thanks again for reviewing (and working on the infrastructure required for this
patch to begin with)! Regarding the persisting the progress; that would be a
really neat feature but I don't have any suggestion on how to do that safely
for real use-cases.

Leaving to one side the question of how much work is involved, could
we do something conceptually similar to relfrozenxid/datfrozenxid,
i.e. use catalog state to keep track of which objects have been
handled and which not?

Very rough sketch:

* set a flag indicating that checksums must be computed for all page writes
* use barriers and other magic to make sure everyone has gotten the
memo from the previous step
* use new catalog fields pg_class.relhaschecksums and
pg_database.dathaschecksums to track whether checksums are enabled
* keep launching workers for databases where !pg_class.dathaschecksums
until none remain
* mark checksums as fully enabled
* party

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21

Magnus Hagander

magnus@hagander.net

over 6 years ago

In reply to: Robert Haas (#20)

#22

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Magnus Hagander (#21)

#23

Magnus Hagander

magnus@hagander.net

over 6 years ago

In reply to: Robert Haas (#22)

#24

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Magnus Hagander (#23)

#25

Daniel Gustafsson

daniel@yesql.se

over 6 years ago

In reply to: Robert Haas (#24)

#26

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Daniel Gustafsson (#25)

#27

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Robert Haas (#26)

#28

David Steele

david@pgmasters.net

about 6 years ago

In reply to: Daniel Gustafsson (#19)

#29

David Steele

david@pgmasters.net

about 6 years ago

In reply to: David Steele (#28)

#30

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Robert Haas (#26)

#31

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Daniel Gustafsson (#30)

#32

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Robert Haas (#31)

#33

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Daniel Gustafsson (#32)

#34

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Robert Haas (#33)

#35

Justin Pryzby

pryzby@telsasoft.com

almost 6 years ago

In reply to: Daniel Gustafsson (#32)

#36

Robert Haas

robertmhaas@gmail.com

almost 6 years ago

In reply to: Daniel Gustafsson (#30)

#37

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Robert Haas (#36)

#38

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Justin Pryzby (#35)

#39

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Robert Haas (#36)

#40

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Daniel Gustafsson (#39)

#41

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Daniel Gustafsson (#39)

#42

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Michael Paquier (#40)

#43

Justin Pryzby

pryzby@telsasoft.com

over 5 years ago

In reply to: Daniel Gustafsson (#39)

#44

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Justin Pryzby (#43)

#45

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Daniel Gustafsson (#44)

#46

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Michael Paquier (#45)

#47

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#44)

#48

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#44)

#49

Alvaro Herrera

alvherre@2ndquadrant.com

over 5 years ago

In reply to: Heikki Linnakangas (#47)

#50

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Alvaro Herrera (#49)

#51

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#48)

#52

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#51)

#53

Magnus Hagander

magnus@hagander.net

over 5 years ago

In reply to: Heikki Linnakangas (#52)

#54

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#47)

#55

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#54)

#56

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#55)

#57

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#56)

#58

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#57)

#59

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Daniel Gustafsson (#58)

#60

Michael Banck

michael.banck@credativ.de

over 5 years ago

In reply to: Daniel Gustafsson (#59)

#61

Michael Banck

michael.banck@credativ.de

over 5 years ago

In reply to: Daniel Gustafsson (#59)

#62

Justin Pryzby

pryzby@telsasoft.com

over 5 years ago

In reply to: Michael Banck (#61)

#63

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Michael Banck (#61)

#64

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Michael Banck (#60)

#65

Michael Banck

michael.banck@credativ.de

over 5 years ago

In reply to: Daniel Gustafsson (#64)

#66

Magnus Hagander

magnus@hagander.net

over 5 years ago

In reply to: Daniel Gustafsson (#64)

#67

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Magnus Hagander (#66)

#68

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Daniel Gustafsson (#67)

#69

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Daniel Gustafsson (#68)

#70

Magnus Hagander

magnus@hagander.net

over 5 years ago

In reply to: Daniel Gustafsson (#69)

#71

Michael Banck

michael.banck@credativ.de

over 5 years ago

In reply to: Daniel Gustafsson (#69)

#72

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Michael Banck (#71)

#73

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#72)

#74

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Heikki Linnakangas (#73)

#75

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#73)

#76

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Heikki Linnakangas (#74)

#77

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#76)

#78

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#75)

#79

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#78)

#80

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Daniel Gustafsson (#79)

#81

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Daniel Gustafsson (#80)

#82

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Heikki Linnakangas (#81)

#83

Michael Banck

michael.banck@credativ.de

over 5 years ago

In reply to: Michael Paquier (#82)

#84

Magnus Hagander

magnus@hagander.net

over 5 years ago

In reply to: Heikki Linnakangas (#81)

#85

Bruce Momjian

bruce@momjian.us

over 5 years ago

In reply to: Magnus Hagander (#84)

#86

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 5 years ago

In reply to: Magnus Hagander (#84)

#87

Bruce Momjian

bruce@momjian.us

over 5 years ago

In reply to: Bruce Momjian (#85)

#88

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Heikki Linnakangas (#81)

#89

Daniel Gustafsson

daniel@yesql.se

over 5 years ago

In reply to: Bruce Momjian (#87)

#90

Bruce Momjian

bruce@momjian.us

about 5 years ago

In reply to: Daniel Gustafsson (#89)

#91

Daniel Gustafsson

daniel@yesql.se

about 5 years ago

In reply to: Bruce Momjian (#90)

Online checksums patch - once again

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: