Online enabling of checksums

Started by Magnus Haganderabout 8 years ago198 messageshackers
Jump to latest
#1Magnus Hagander
magnus@hagander.net

*Once more, here is an attempt to solve the problem of on-line enabling of
checksums that me and Daniel have been hacking on for a bit. See for
example
/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
</messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com&gt;
and
/messages/by-id/FF393672-5608-46D6-9224-6620EC532693@endpoint.com
</messages/by-id/FF393672-5608-46D6-9224-6620EC532693@endpoint.com
for some previous discussions.Base design:Change the checksum flag to
instead of on and off be an enum. off/inprogress/on. When checksums are off
and on, they work like today. When checksums are in progress, checksums are
*written* but not verified. State can go from “off” to “inprogress”, from
“inprogress” to either “on” or “off”, or from “on” to “off”.Two new
functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes to
disable. The enable one will change the state to inprogress, and then start
a background worker (the “checksumhelper launcher”). This worker in turn
will start one sub-worker (“checksumhelper worker”) in each database
(currently all done sequentially). This worker will enumerate all
tables/indexes/etc in the database and validate their checksums. If there
is no checksum, or the checksum is incorrect, it will compute a new
checksum and write it out. When all databases have been processed, the
checksum state changes to “on” and the launcher shuts down. At this point,
the cluster has checksums enabled as if it was initdb’d with checksums
turned on.If the cluster shuts down while “inprogress”, the DBA will have
to manually either restart the worker (by calling pg_enable_checksums()) or
turn checksums off again. Checksums “in progress” only carries a cost and
no benefit.The change of the checksum state is WAL logged with a new xlog
record. All the buffers written by the background worker are forcibly
enabled full page writes to make sure the checksum is fully updated on the
standby even if no actual contents of the buffer changed.We’ve also
included a small commandline tool, bin/pg_verify_checksums, that can be run
against an offline cluster to validate all checksums. Future improvements
includes being able to use the background worker/launcher to perform an
online check as well. Being able to run more parallel workers in the
checksumhelper might also be of interest.The patch includes two sets of
tests, an isolation test turning on checksums while one session is writing
to the cluster and another is continuously reading, to simulate turning on
checksums in a production database. There is also a TAP test which enables
checksums with streaming replication turned on to test the new xlog record.
The isolation test ran into the 1024 character limit of the isolation test
lexer, with a separate patch and discussion at
/messages/by-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se
</messages/by-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se&gt;*

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

Attachments:

online_checksums.patchtext/x-patch; charset=US-ASCII; name=online_checksums.patchDownload+1761-32
#2Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#1)
Re: Online enabling of checksums

Re-sending this one with proper formatting. Apologies for the horrible
gmail-screws-up-the-text-part of the last one!

No change to patch or text, just the formatting.

//Magnus

Once more, here is an attempt to solve the problem of on-line enabling of
checksums that me and Daniel have been hacking on for a bit. See for
example
/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
and
/messages/by-id/FF393672-5608-46D6-9224-6620EC532693@endpoint.com
for some previous discussions.

Base design:

Change the checksum flag to instead of on and off be an enum.
off/inprogress/on. When checksums are off and on, they work like today.
When checksums are in progress, checksums are *written* but not verified.
State can go from “off” to “inprogress”, from “inprogress” to either “on”
or “off”, or from “on” to “off”.

Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes to
disable. The enable one will change the state to inprogress, and then start
a background worker (the “checksumhelper launcher”). This worker in turn
will start one sub-worker (“checksumhelper worker”) in each database
(currently all done sequentially). This worker will enumerate all
tables/indexes/etc in the database and validate their checksums. If there
is no checksum, or the checksum is incorrect, it will compute a new
checksum and write it out. When all databases have been processed, the
checksum state changes to “on” and the launcher shuts down. At this point,
the cluster has checksums enabled as if it was initdb’d with checksums
turned on.

If the cluster shuts down while “inprogress”, the DBA will have to manually
either restart the worker (by calling pg_enable_checksums()) or turn
checksums off again. Checksums “in progress” only carries a cost and no
benefit.

The change of the checksum state is WAL logged with a new xlog record. All
the buffers written by the background worker are forcibly enabled full page
writes to make sure the checksum is fully updated on the standby even if no
actual contents of the buffer changed.

We’ve also included a small commandline tool, bin/pg_verify_checksums, that
can be run against an offline cluster to validate all checksums. Future
improvements includes being able to use the background worker/launcher to
perform an online check as well. Being able to run more parallel workers in
the checksumhelper might also be of interest.

The patch includes two sets of tests, an isolation test turning on
checksums while one session is writing to the cluster and another is
continuously reading, to simulate turning on checksums in a production
database. There is also a TAP test which enables checksums with streaming
replication turned on to test the new xlog record. The isolation test ran
into the 1024 character limit of the isolation test lexer, with a separate
patch and discussion at
/messages/by-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se

Attachments:

online_checksums.patchtext/x-patch; charset=US-ASCII; name=online_checksums.patchDownload+1761-32
#3Peter Eisentraut
peter_e@gmx.net
In reply to: Magnus Hagander (#1)
Re: Online enabling of checksums

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Andrey Borodin
amborodin@acm.org
In reply to: Peter Eisentraut (#3)
Re: Online enabling of checksums

Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will help to fix some mistakes made during initdb long time ago...

22 февр. 2018 г., в 18:22, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> написал(а):

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review patch this patch and some code comparision could be handy....

So far I've found only this [0,1] (without code) and threads mentioned by Magnus [2,3]

Or do you mean extracting "worker+lancher" for reuse for other purposes?

Best regards, Andrey Borodin.

[0]: /messages/by-id/E2B195BF-7AA1-47AF-85BE-0E936D157902@endpoint.com </messages/by-id/E2B195BF-7AA1-47AF-85BE-0E936D157902@endpoint.com
[1]: /messages/by-id/7A00D9D1-535A-4C37-94C7-02296AAF063F@endpoint.com </messages/by-id/7A00D9D1-535A-4C37-94C7-02296AAF063F@endpoint.com
[2]: /messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com </messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
[3]: /messages/by-id/FF393672-5608-46D6-9224-6620EC532693@endpoint.com

#5Magnus Hagander
magnus@hagander.net
In reply to: Andrey Borodin (#4)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 4:47 PM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will
help to fix some mistakes made during initdb long time ago...

22 февр. 2018 г., в 18:22, Peter Eisentraut <peter.eisentraut@2ndquadrant.
com> написал(а):

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review patch this patch and some code comparision could be
handy....

So far I've found only this [0,1] (without code) and threads mentioned by
Magnus [2,3]

Or do you mean extracting "worker+lancher" for reuse for other purposes?

I'm pretty sure Peter means the second. Which could be interesting, but as
he says, not the topic for this patch.

I'm not entirely sure which the others ones are. Auto-Vacuum obviously is
one, which doesn't use the worker infrastructure. But I'm not sure which
the others are referring to?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#6Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#3)
Re: Online enabling of checksums

On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Greetings,

Andres Freund

#7Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#6)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and

then

start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Are you saying this should be done *in general*, or specifically for
background workers? I'm assuming you mean the general case? That would be
very useful, but is probably a fairly non-trivial task (TM).

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#8Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#7)
Re: Online enabling of checksums

Hi,

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:

On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P

Are you saying this should be done *in general*, or specifically for
background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.

That would be very useful, but is probably a fairly non-trivial task
(TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

Greetings,

Andres Freund

#9Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#8)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:

On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de>

wrote:

I suspect I'm going to get some grief for this, but I think the time

has

come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P

Hah! Are you handing out binoculars? :)

Are you saying this should be done *in general*, or specifically for
background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.

Yeah, it'd probably be easier. I don't know exactly what it'd involve but
clearly less.

In this particular case that would at least phase 1 simplify it because
we'd only need one process instead of worker/launcher. However, if we'd
ever want to parallellize it -- or any other process of the style, like
autovacuum -- you'd still need a launcher+worker combo. So making that
particular scenario simpler might be worthwhile on it's own.

That would be very useful, but is probably a fairly non-trivial task
(TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

I'll take your word for it :) I haven't dug into that part.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#10Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#9)
Re: Online enabling of checksums

On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
wrote:
In this particular case that would at least phase 1 simplify it because
we'd only need one process instead of worker/launcher. However, if we'd
ever want to parallellize it -- or any other process of the style, like
autovacuum -- you'd still need a launcher+worker combo. So making that
particular scenario simpler might be worthwhile on it's own.

Why is that needed? You can just start two bgworkers and process a list of items stored in shared memory. Or even just check, I assume there'd be a catalog flag somewhere, whether a database / table / object of granularity has already been processed and use locking to prevent concurrent access.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Magnus Hagander (#5)
Re: Online enabling of checksums

On 2/22/18 12:38, Magnus Hagander wrote:

I'm not entirely sure which the others ones are. Auto-Vacuum obviously
is one, which doesn't use the worker infrastructure. But I'm not sure
which the others are referring to? 

autovacuum, subscription workers, auto prewarm

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#10)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 8:52 PM, Andres Freund <andres@anarazel.de> wrote:

On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net>
wrote:

On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
wrote:
In this particular case that would at least phase 1 simplify it because
we'd only need one process instead of worker/launcher. However, if we'd
ever want to parallellize it -- or any other process of the style, like
autovacuum -- you'd still need a launcher+worker combo. So making that
particular scenario simpler might be worthwhile on it's own.

Why is that needed? You can just start two bgworkers and process a list of
items stored in shared memory. Or even just check, I assume there'd be a
catalog flag somewhere, whether a database / table / object of granularity
has already been processed and use locking to prevent concurrent access.

You could do that, but then you've moving the complexity to managing that
list in shared memory instead. I'm not sure that's any easier... And
certainly adding a catalog flag for a usecase like this one is not making
it easier.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#13Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#12)
Re: Online enabling of checksums

Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:

You could do that, but then you've moving the complexity to managing that
list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.

I'm not sure that's any easier... And
certainly adding a catalog flag for a usecase like this one is not making
it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

Regards,

Andres Freund

#14Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#13)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 9:23 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:

You could do that, but then you've moving the complexity to managing that
list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.

Yes, you probably need something like that if you want to be able to
parallelize on things inside each database. If you are OK parallelizing
things on a per-database level, you don't need it.

I'm not sure that's any easier... And
certainly adding a catalog flag for a usecase like this one is not making
it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had
it from the beginning.

Accepting that the process will start over (but only read, not re-write,
the blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a on-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#15Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#6)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

I'd like to see that. Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup. It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.
--
Michael

#16Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#15)
Re: Online enabling of checksums

How does:

On 2018-02-23 11:48:16 +0900, Michael Paquier wrote:

On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

I'd like to see that. Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup.

relate to:

It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.

?

#17Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#15)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 9:48 PM, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

I'd like to see that. Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup. It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.

My position is that allowing processes to change databases is a good
idea but (1) it will probably take some work to get correct and (2) it
probably won't be super-fast due to the need to flush absolutely every
bit of state in sight that might've been influenced by the choice of
database.

I also agree with Andres that this email is not very easy to
understand, although my complaint is not so much that I don't see how
the parts relate as that you seem to be contradicting yourself.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Magnus Hagander
magnus@hagander.net
In reply to: Peter Eisentraut (#11)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 9:09 PM, Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

On 2/22/18 12:38, Magnus Hagander wrote:

I'm not entirely sure which the others ones are. Auto-Vacuum obviously
is one, which doesn't use the worker infrastructure. But I'm not sure
which the others are referring to?

autovacuum, subscription workers, auto prewarm

Oh, for some reason I thought you were thinking in pending patches. Yeah,
for those it makes sense -- though autovacuum isn't (currently) using
background workers for what it does, the rest certainly makes sense to do
something with.

But as you say, that's a separate patch :)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#19Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#14)
Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had it
from the beginning.

Accepting that the process will start over (but only read, not re-write, the
blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a on-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way. I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature. The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation. You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#19)
Re: Online enabling of checksums

On 02/24/2018 01:34 AM, Robert Haas wrote:

On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had it
from the beginning.

Accepting that the process will start over (but only read, not re-write, the
blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a on-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way. I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature. The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation. You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is the large databases often store most of the data (>80%) in one or two
central tables (think fact tables in star schema, etc.). So if you
crash, it's likely half-way while processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

But perhaps you meant something like "position" instead of just a simple
true/false flag?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#20)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#21)
#23Stephen Frost
sfrost@snowman.net
In reply to: Tomas Vondra (#22)
#24Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#1)
#25Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#1)
#26Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#19)
#27Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#24)
#28Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#25)
#29Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#26)
#30Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#29)
#31Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#30)
#32Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#1)
#33Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#32)
#34Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#31)
#35Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#27)
#36Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#23)
#37Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#34)
#38Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#26)
#39Daniel Gustafsson
daniel@yesql.se
In reply to: Tomas Vondra (#38)
#40Andrey Borodin
amborodin@acm.org
In reply to: Magnus Hagander (#35)
#41Daniel Gustafsson
daniel@yesql.se
In reply to: Andrey Borodin (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#33)
#43Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Magnus Hagander (#35)
#44Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#43)
#45Daniel Gustafsson
daniel@yesql.se
In reply to: Tomas Vondra (#44)
#46Craig Ringer
craig@2ndquadrant.com
In reply to: Alvaro Herrera (#43)
#47Andrey Borodin
amborodin@acm.org
In reply to: Robert Haas (#42)
#48Andrey Borodin
amborodin@acm.org
In reply to: Daniel Gustafsson (#41)
#49Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#48)
#50Andres Freund
andres@anarazel.de
In reply to: Andrey Borodin (#48)
#51Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#50)
#52Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#51)
#53Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#52)
#54Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#44)
#55Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#42)
#56Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Magnus Hagander (#54)
#57Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#56)
#58Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#57)
#59Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#55)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#52)
#61Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#55)
#62Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#61)
#63Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#62)
#64Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#63)
#65Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#64)
#66Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#65)
#67Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#66)
#68Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#67)
#69Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#68)
#70Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#69)
#71Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Banck (#70)
#72Michael Banck
michael.banck@credativ.de
In reply to: Daniel Gustafsson (#71)
#73Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#72)
#74Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#73)
#75Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#69)
#76Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Banck (#75)
#77Michael Banck
michael.banck@credativ.de
In reply to: Daniel Gustafsson (#76)
#78Andrey Borodin
amborodin@acm.org
In reply to: Michael Banck (#77)
#79Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Banck (#77)
#80Daniel Gustafsson
daniel@yesql.se
In reply to: Andrey Borodin (#78)
#81Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Daniel Gustafsson (#79)
#82Andrey Borodin
amborodin@acm.org
In reply to: Daniel Gustafsson (#80)
#83Magnus Hagander
magnus@hagander.net
In reply to: Andrey Borodin (#82)
#84Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#83)
#85Andrey Borodin
amborodin@acm.org
In reply to: Magnus Hagander (#84)
#86Andrey Borodin
amborodin@acm.org
In reply to: Heikki Linnakangas (#81)
#87Magnus Hagander
magnus@hagander.net
In reply to: Andrey Borodin (#86)
#88Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#81)
#89Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#88)
#90Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#89)
#91Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#90)
#92Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#88)
#93Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#88)
#94Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#93)
#95Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#94)
#96Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#95)
#97Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#96)
#98Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#97)
#99Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#98)
#100Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#99)
#101Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#99)
#102Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#100)
#103Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#102)
#104Andrey Borodin
amborodin@acm.org
In reply to: Tomas Vondra (#103)
#105Magnus Hagander
magnus@hagander.net
In reply to: Andrey Borodin (#104)
#106Andrey Borodin
amborodin@acm.org
In reply to: Magnus Hagander (#105)
#107Magnus Hagander
magnus@hagander.net
In reply to: Andrey Borodin (#106)
#108Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#107)
#109Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#108)
#110Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#109)
#111Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#110)
#112Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#111)
#113Joshua D. Drake
jd@commandprompt.com
In reply to: Andres Freund (#109)
#114Andres Freund
andres@anarazel.de
In reply to: Joshua D. Drake (#113)
#115Joshua D. Drake
jd@commandprompt.com
In reply to: Andres Freund (#114)
In reply to: Andres Freund (#112)
#117Magnus Hagander
magnus@hagander.net
In reply to: Peter Geoghegan (#116)
In reply to: Joshua D. Drake (#113)
#119Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#108)
#120Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#119)
#121Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#120)
#122Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#108)
#123Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#122)
#124Daniel Gustafsson
daniel@yesql.se
In reply to: Magnus Hagander (#117)
#125Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#121)
#126Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#125)
#127Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#125)
#128Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#126)
#129Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#128)
#130Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#129)
#131Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#130)
#132Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#131)
#133Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#114)
#134Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#133)
#135Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#132)
#136Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#132)
#137Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#124)
#138Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#124)
#139Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#137)
#140Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#139)
#141Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#138)
#142Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#141)
#143Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#142)
#144Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#140)
#145Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#144)
#146Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#143)
#147Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#145)
#148Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#109)
#149Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#148)
#150Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#149)
#151Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#150)
#152Andres Freund
andres@anarazel.de
In reply to: Michael Banck (#151)
#153Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#150)
#154Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#153)
#155Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#148)
#156Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#155)
#157Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#154)
In reply to: Magnus Hagander (#157)
In reply to: Sergei Kornilov (#158)
#160Daniel Gustafsson
daniel@yesql.se
In reply to: Sergei Kornilov (#159)
In reply to: Daniel Gustafsson (#160)
#162Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#157)
#163Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#162)
#164Bruce Momjian
bruce@momjian.us
In reply to: Daniel Gustafsson (#160)
#165Joshua D. Drake
jd@commandprompt.com
In reply to: Bruce Momjian (#164)
#166Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#165)
#167Daniel Gustafsson
daniel@yesql.se
In reply to: Joshua D. Drake (#165)
#168Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#163)
#169Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#168)
#170Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#169)
#171Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#170)
#172Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#170)
#173Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#172)
#174Michael Banck
michael.banck@credativ.de
In reply to: Alvaro Herrera (#172)
#175Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Michael Banck (#174)
In reply to: Tomas Vondra (#175)
#177Andres Freund
andres@anarazel.de
In reply to: Michael Banck (#174)
#178Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#175)
#179Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#177)
#180Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#178)
#181Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#179)
#182Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#180)
#183Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#173)
#184Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#183)
#185Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#184)
#186Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#185)
#187Joshua D. Drake
jd@commandprompt.com
In reply to: Alvaro Herrera (#179)
#188Andres Freund
andres@anarazel.de
In reply to: Joshua D. Drake (#187)
In reply to: Joshua D. Drake (#187)
#190Andres Freund
andres@anarazel.de
In reply to: Sergei Kornilov (#189)
In reply to: Andres Freund (#190)
#192Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Daniel Gustafsson (#160)
#193Stephen Frost
sfrost@snowman.net
In reply to: Tomas Vondra (#192)
#194Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#193)
#195Stephen Frost
sfrost@snowman.net
In reply to: Tomas Vondra (#194)
#196Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#195)
#197Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#196)
#198Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#197)