pg_checksums?

Started by Paul Försterover 2 years ago28 messagesgeneral

paul.foerster@gmail.com

over 2 years ago

Hi,

I have a few questions about pg_checksums.

Long story short, checksums were disabled in our environment but are getting enabled now (again!) because of block corruptions which destroyed a few databases in a database cluster. And before you say "told you so", the decision to disable checksums was not mine. Some people just have to learn the hard way.

Anyway, re-enabling data checksums creates a few questions:

a) why isn't it possible to enable checksumming while a database cluster is up?

b) why isn't it possible to check whether checksums are enabled or not?

c) in a Patroni cluster consisting of a primary and a sync standby, is it safe to enable checksumming in the replica, then switch over and enable it in the ex-primary, i.e. now new replica, without any penalty? Or do I have to perform a reinit to really get them in sync again, though paronictl happily reports them to be in sync?

d) how long does it take to enable checksums in general? Minimizing down-time is crucial. Does it depend on the database cluster size, or the number of files it uses, or what can be taken as a criterion to estimate then necessary down-time.

Thanks in advance for your insights.

Cheers
Paul

ronljohnsonjr@gmail.com

over 2 years ago

In reply to: Paul Förster (#1)

Re: pg_checksums?

On 10/27/23 13:34, Paul Förster wrote:

Hi,

I have a few questions about pg_checksums.

Long story short, checksums were disabled in our environment but are getting enabled now (again!) because of block corruptions which destroyed a few databases in a database cluster. And before you say "told you so", the decision to disable checksums was not mine. Some people just have to learn the hard way.

Anyway, re-enabling data checksums creates a few questions:

a) why isn't it possible to enable checksumming while a database cluster is up?

Data might be changing.

b) why isn't it possible to check whether checksums are enabled or not?

(This is my tiny test instance.)

$ pg_controldata | grep checksum
Data page checksum version: 0

postgres=# show data_checksums;
data_checksums
----------------
off
(1 row)

c) in a Patroni cluster consisting of a primary and a sync standby, is it safe to enable checksumming in the replica, then switch over and enable it in the ex-primary, i.e. now new replica, without any penalty? Or do I have to perform a reinit to really get them in sync again, though paronictl happily reports them to be in sync?

d) how long does it take to enable checksums in general? Minimizing down-time is crucial. Does it depend on the database cluster size, or the number of files it uses, or what can be taken as a criterion to estimate then necessary down-time.

Thanks in advance for your insights.

Cheers
Paul

--
Born in Arizona, moved to Babylonia.

paul.foerster@gmail.com

over 2 years ago

In reply to: Ron (#2)

Re: pg_checksums?

Hi Ron,

On Oct 27, 2023, at 21:02, Ron <ronljohnsonjr@gmail.com> wrote:

b) why isn't it possible to check whether checksums are enabled or not?

(This is my tiny test instance.)

$ pg_controldata | grep checksum
Data page checksum version: 0

postgres=# show data_checksums;
data_checksums
----------------
off
(1 row)

this helps a lot. Thanks very much.

Now, there are only two other questions.

Cheers
Paul

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Paul Förster (#1)

Re: pg_checksums?

On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote:

a) why isn't it possible to enable checksumming while a database cluster is up?

It is surprisingly complicated to enable checksums on a live cluster, a patch
was submitted a while back but ultimately never made it into postgres. The
below threads may shine some light on the problem:

/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
/messages/by-id/560A2239-5DE2-4B9C-92BC-878C6822F47C@yesql.se

--
Daniel Gustafsson

bruce@momjian.us

over 2 years ago

In reply to: Daniel Gustafsson (#4)

Re: pg_checksums?

On Fri, Oct 27, 2023 at 10:45:16PM +0200, Daniel Gustafsson wrote:

On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote:

a) why isn't it possible to enable checksumming while a database cluster is up?

It is surprisingly complicated to enable checksums on a live cluster, a patch
was submitted a while back but ultimately never made it into postgres. The
below threads may shine some light on the problem:

/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
/messages/by-id/560A2239-5DE2-4B9C-92BC-878C6822F47C@yesql.se

Yeah, it was a big surprise that this feature was so hard to implement
because we have _no_ infrastructure for having multiple data layouts
active in a live system. The discussion eventually made that clear.

If we have more features that need this kind of dynamic ability, we
might revisit this feature too.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

paul.foerster@gmail.com

over 2 years ago

In reply to: Bruce Momjian (#5)

Re: pg_checksums?

Hi Bruce, hi Daniel,

On Oct 27, 2023, at 23:21, Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Oct 27, 2023 at 10:45:16PM +0200, Daniel Gustafsson wrote:

On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote:

a) why isn't it possible to enable checksumming while a database cluster is up?

It is surprisingly complicated to enable checksums on a live cluster, a patch
was submitted a while back but ultimately never made it into postgres. The
below threads may shine some light on the problem:

/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
/messages/by-id/560A2239-5DE2-4B9C-92BC-878C6822F47C@yesql.se

Yeah, it was a big surprise that this feature was so hard to implement
because we have _no_ infrastructure for having multiple data layouts
active in a live system. The discussion eventually made that clear.

If we have more features that need this kind of dynamic ability, we
might revisit this feature too.

Ok, I see.

But unfortunately still, my questions c) and d) are unanswered. I'd especially be interested in an answer to c), i.e. is it *safe* to "pg_checksum -e" the replica instance in a patroni cluster, switch over, and then do the other one?

Cheers
Paul

Peter J. Holzer

hjp-pgsql@hjp.at

over 2 years ago

In reply to: Paul Förster (#6)

Re: pg_checksums?

On 2023-10-27 23:37:24 +0200, Paul Förster wrote:

But unfortunately still, my questions c) and d) are unanswered. I'd
especially be interested in an answer to c), i.e. is it *safe* to
"pg_checksum -e" the replica instance in a patroni cluster, switch
over, and then do the other one?

I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

paul.foerster@gmail.com

over 2 years ago

In reply to: Peter J. Holzer (#7)

Re: pg_checksums?

Hi Peter

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to enable checksums on a replica, switch over and then do it again on the ex primary, i.e. now new replica without doing a reinit.

Cheers
Paul

Peter J. Holzer

hjp-pgsql@hjp.at

over 2 years ago

In reply to: Paul Förster (#8)

Re: pg_checksums?

On 2023-10-29 10:11:07 +0100, Paul Förster wrote:

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to
enable checksums on a replica, switch over and then do it again on the
ex primary, i.e. now new replica without doing a reinit.

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

paul.foerster@gmail.com

over 2 years ago

In reply to: Peter J. Holzer (#9)

Re: pg_checksums?

Hi Peter,

On Oct 29, 2023, at 11:49, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

As soon as I stop the replica to enable checksums, even writes can't get replicated anymore. So during enabling checksums, a replica is definitely protected against modifications by its primary, simply because it's down. The modifications of the primary are applied to the replica when it comes back online. So, I don't see a problem at this particular stage.

My fear is merely that enabling checksums does something to the physical state of the data files which are not compatible with the other side. Like for example manipulate the file headers in some way.

Maybe this question is better suited for the admin list than this general list?

Cheers
Paul

ronljohnsonjr@gmail.com

over 2 years ago

In reply to: Paul Förster (#8)

Re: pg_checksums?

On 10/29/23 04:11, Paul Förster wrote:

Hi Peter

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to enable checksums on a replica, switch over and then do it again on the ex primary, i.e. now new replica without doing a reinit.

For that to work, the secondary files would have to remain identical to the
primary files. Theoretically that _should_ happen, but it might not, or
whatever command that enables checksums after the fact might have a sanity
check.

As for safety, what do you mean by "safe"?

--
Born in Arizona, moved to Babylonia.

Peter J. Holzer

hjp-pgsql@hjp.at

over 2 years ago

In reply to: Paul Förster (#10)

Re: pg_checksums?

On 2023-10-29 16:15:37 +0100, Paul Förster wrote:

On Oct 29, 2023, at 11:49, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

As soon as I stop the replica to enable checksums, even writes can't
get replicated anymore. So during enabling checksums, a replica is
definitely protected against modifications by its primary, simply
because it's down. The modifications of the primary are applied to the
replica when it comes back online.

And this is where it would break down. The modifications can't be
applied to the replica any more because the replica now contains
checksums and the modifications don't. In the best case the replica
would catch the discrepancy and refuse to apply the modifications which
would lead to the loss of these modifications. In the worst case it
would apply them anyway causing severe data corruption.

So, I don't see a problem at this particular stage.

My fear is merely that enabling checksums does something to the
physical state of the data files which are not compatible with the
other side.

Exactly. Those checksums have to be stored somewhere.

Like for example manipulate the file headers in some way.

Not just the file headers. Every single data block.

(Ok, it looks like the space for the checksum is reserved even if
checksums aren't enabled[1]https://www.postgresql.org/docs/current/storage-page-layout.html#PAGEHEADERDATA-TABLE. So at least pg_checksums doesn't have to
move data around to enable them. But overwriting a page with a checksum
with one without one would still be bad.)

hp

[1]: https://www.postgresql.org/docs/current/storage-page-layout.html#PAGEHEADERDATA-TABLE

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

paul.foerster@gmail.com

over 2 years ago

In reply to: Ron (#11)

Re: pg_checksums?

Hi Ron,

On Oct 29, 2023, at 16:37, Ron <ronljohnsonjr@gmail.com> wrote:

As for safety, what do you mean by "safe"?

Safe in the sense that, if I enable checksums on a replica, switch over and the enable checksums on the other side, if this is ok, or whether future mutations on the primary will corrupt the replica.

That's why I asked if I need to perform a patronictl reinit.

Cheers
Paul

paul.foerster@gmail.com

over 2 years ago

In reply to: Peter J. Holzer (#12)

Re: pg_checksums?

Hi Ron,

On Oct 29, 2023, at 16:38, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

And this is where it would break down. The modifications can't be
applied to the replica any more because the replica now contains
checksums and the modifications don't. In the best case the replica
would catch the discrepancy and refuse to apply the modifications which
would lead to the loss of these modifications. In the worst case it
would apply them anyway causing severe data corruption.

...

Not just the file headers. Every single data block.

(Ok, it looks like the space for the checksum is reserved even if
checksums aren't enabled[1]. So at least pg_checksums doesn't have to
move data around to enable them. But overwriting a page with a checksum
with one without one would still be bad.)

Those are the kind of answers and insights I was looking for. Thank you very much.

Ok, I will do a reinit then.

Cheers
Paul

ronljohnsonjr@gmail.com

over 2 years ago

In reply to: Paul Förster (#13)

Re: pg_checksums?

On 10/29/23 12:57, Paul Förster wrote:

Hi Ron,

On Oct 29, 2023, at 16:37, Ron <ronljohnsonjr@gmail.com> wrote:

As for safety, what do you mean by "safe"?

Safe in the sense that, if I enable checksums on a replica, switch over and the enable checksums on the other side, if this is ok, or whether future mutations on the primary will corrupt the replica.

Trying it would tell you something.

That's why I asked if I need to perform a patronictl reinit.

Best to ask Percona.

--
Born in Arizona, moved to Babylonia.

Michael Paquier

michael@paquier.xyz

over 2 years ago

In reply to: Peter J. Holzer (#9)

Re: pg_checksums?

On Sun, Oct 29, 2023 at 11:49:11AM +0100, Peter J. Holzer wrote:

On 2023-10-29 10:11:07 +0100, Paul Förster wrote:

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to
enable checksums on a replica, switch over and then do it again on the
ex primary, i.e. now new replica without doing a reinit.

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

Hm? Page checksums are written when a page is flushed to disk, we
don't set them for dirty buffers or full-page writes included in WAL,
so it should be OK to do something like the following:
- Stop cleanly a standby.
- Run pg_checksums on the standby to enable them.
- Restart the standby.
- Catchup with the latest changes
- Stop cleanly the primary, letting the shutdown checkpoint be
replicated to the standby.
- Promote the standby.
- Enable checksums on the previous primary.
- Start the previous primary to be a standby of the node you failed
over to.
--
Michael

Peter J. Holzer

hjp-pgsql@hjp.at

over 2 years ago

In reply to: Michael Paquier (#16)

Re: pg_checksums?

On 2023-10-30 09:56:31 +0900, Michael Paquier wrote:

Hm? Page checksums are written when a page is flushed to disk, we
don't set them for dirty buffers or full-page writes included in WAL,
so it should be OK to do something like the following:
- Stop cleanly a standby.
- Run pg_checksums on the standby to enable them.
- Restart the standby.
- Catchup with the latest changes
- Stop cleanly the primary, letting the shutdown checkpoint be
replicated to the standby.
- Promote the standby.
- Enable checksums on the previous primary.
- Start the previous primary to be a standby of the node you failed
over to.

I stand corrected.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

Peter J. Holzer

hjp-pgsql@hjp.at

over 2 years ago

In reply to: Ron (#15)

Re: pg_checksums?

On 2023-10-29 13:26:27 -0500, Ron wrote:

On 10/29/23 12:57, Paul Förster wrote:

Safe in the sense that, if I enable checksums on a replica, switch
over and the enable checksums on the other side, if this is ok, or
whether future mutations on the primary will corrupt the replica.

Trying it would tell you something.

That's why I asked if I need to perform a patronictl reinit.

Best to ask Percona.

Why Percona?

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

paul.foerster@gmail.com

over 2 years ago

In reply to: Michael Paquier (#16)

Re: pg_checksums?

Hi Michael,

On Oct 30, 2023, at 01:56, Michael Paquier <michael@paquier.xyz> wrote:

Hm? Page checksums are written when a page is flushed to disk, we
don't set them for dirty buffers or full-page writes included in WAL,
so it should be OK to do something like the following:
- Stop cleanly a standby.
- Run pg_checksums on the standby to enable them.
- Restart the standby.
- Catchup with the latest changes
- Stop cleanly the primary, letting the shutdown checkpoint be
replicated to the standby.
- Promote the standby.
- Enable checksums on the previous primary.
- Start the previous primary to be a standby of the node you failed
over to.

That's exactly the reasoning behind my initial idea and question. Patroni does the switchover job for me including catching up on the latest changes, etc.

Seems that opinions vary. Are there any hard facts?

It turns out that enabling checksums can take quite some time to complete, i.e. downtime for the application which is hard to do in a 24x7 environment.

Cheers
Paul

paul.foerster@gmail.com

over 2 years ago

In reply to: Peter J. Holzer (#18)

Re: pg_checksums?

Hi Peter,

On Oct 30, 2023, at 11:03, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-10-29 13:26:27 -0500, Ron wrote:

Best to ask Percona.

Why Percona?

Probably a typo. Patroni is used.

Cheers
Paul

ronljohnsonjr@gmail.com

over 2 years ago

In reply to: Paul Förster (#20)

Alexander Kukushkin

cyberdemn@gmail.com

over 2 years ago

In reply to: Ron (#21)

paul.foerster@gmail.com

over 2 years ago

In reply to: Alexander Kukushkin (#22)

Alexander Kukushkin

cyberdemn@gmail.com

over 2 years ago

In reply to: Paul Förster (#23)

b55white@gmail.com

over 2 years ago

In reply to: Michael Paquier (#16)

paul.foerster@gmail.com

over 2 years ago

In reply to: Alexander Kukushkin (#24)

Nikolay Samokhvalov

samokhvalov@gmail.com

over 2 years ago

In reply to: Alexander Kukushkin (#22)

paul.foerster@gmail.com

over 2 years ago

In reply to: Nikolay Samokhvalov (#27)