pg_checksums?

Started by Paul Försterover 2 years ago28 messagesgeneral
Jump to latest
#1Paul Förster
paul.foerster@gmail.com

Hi,

I have a few questions about pg_checksums.

Long story short, checksums were disabled in our environment but are getting enabled now (again!) because of block corruptions which destroyed a few databases in a database cluster. And before you say "told you so", the decision to disable checksums was not mine. Some people just have to learn the hard way.

Anyway, re-enabling data checksums creates a few questions:

a) why isn't it possible to enable checksumming while a database cluster is up?

b) why isn't it possible to check whether checksums are enabled or not?

c) in a Patroni cluster consisting of a primary and a sync standby, is it safe to enable checksumming in the replica, then switch over and enable it in the ex-primary, i.e. now new replica, without any penalty? Or do I have to perform a reinit to really get them in sync again, though paronictl happily reports them to be in sync?

d) how long does it take to enable checksums in general? Minimizing down-time is crucial. Does it depend on the database cluster size, or the number of files it uses, or what can be taken as a criterion to estimate then necessary down-time.

Thanks in advance for your insights.

Cheers
Paul

#2Ron
ronljohnsonjr@gmail.com
In reply to: Paul Förster (#1)
Re: pg_checksums?

On 10/27/23 13:34, Paul Förster wrote:

Hi,

I have a few questions about pg_checksums.

Long story short, checksums were disabled in our environment but are getting enabled now (again!) because of block corruptions which destroyed a few databases in a database cluster. And before you say "told you so", the decision to disable checksums was not mine. Some people just have to learn the hard way.

Anyway, re-enabling data checksums creates a few questions:

a) why isn't it possible to enable checksumming while a database cluster is up?

Data might be changing.

b) why isn't it possible to check whether checksums are enabled or not?

(This is my tiny test instance.)

$ pg_controldata | grep checksum
Data page checksum version:           0

postgres=# show data_checksums;
 data_checksums
----------------
 off
(1 row)

c) in a Patroni cluster consisting of a primary and a sync standby, is it safe to enable checksumming in the replica, then switch over and enable it in the ex-primary, i.e. now new replica, without any penalty? Or do I have to perform a reinit to really get them in sync again, though paronictl happily reports them to be in sync?

d) how long does it take to enable checksums in general? Minimizing down-time is crucial. Does it depend on the database cluster size, or the number of files it uses, or what can be taken as a criterion to estimate then necessary down-time.

Thanks in advance for your insights.

Cheers
Paul

--
Born in Arizona, moved to Babylonia.

#3Paul Förster
paul.foerster@gmail.com
In reply to: Ron (#2)
Re: pg_checksums?

Hi Ron,

On Oct 27, 2023, at 21:02, Ron <ronljohnsonjr@gmail.com> wrote:

b) why isn't it possible to check whether checksums are enabled or not?

(This is my tiny test instance.)

$ pg_controldata | grep checksum
Data page checksum version: 0

postgres=# show data_checksums;
data_checksums
----------------
off
(1 row)

this helps a lot. Thanks very much.

Now, there are only two other questions.

Cheers
Paul

#4Daniel Gustafsson
daniel@yesql.se
In reply to: Paul Förster (#1)
Re: pg_checksums?

On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote:

a) why isn't it possible to enable checksumming while a database cluster is up?

It is surprisingly complicated to enable checksums on a live cluster, a patch
was submitted a while back but ultimately never made it into postgres. The
below threads may shine some light on the problem:

/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
/messages/by-id/560A2239-5DE2-4B9C-92BC-878C6822F47C@yesql.se

--
Daniel Gustafsson

#5Bruce Momjian
bruce@momjian.us
In reply to: Daniel Gustafsson (#4)
Re: pg_checksums?

On Fri, Oct 27, 2023 at 10:45:16PM +0200, Daniel Gustafsson wrote:

On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote:

a) why isn't it possible to enable checksumming while a database cluster is up?

It is surprisingly complicated to enable checksums on a live cluster, a patch
was submitted a while back but ultimately never made it into postgres. The
below threads may shine some light on the problem:

/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
/messages/by-id/560A2239-5DE2-4B9C-92BC-878C6822F47C@yesql.se

Yeah, it was a big surprise that this feature was so hard to implement
because we have _no_ infrastructure for having multiple data layouts
active in a live system. The discussion eventually made that clear.

If we have more features that need this kind of dynamic ability, we
might revisit this feature too.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

#6Paul Förster
paul.foerster@gmail.com
In reply to: Bruce Momjian (#5)
Re: pg_checksums?

Hi Bruce, hi Daniel,

On Oct 27, 2023, at 23:21, Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Oct 27, 2023 at 10:45:16PM +0200, Daniel Gustafsson wrote:

On 27 Oct 2023, at 20:34, Paul Förster <paul.foerster@gmail.com> wrote:

a) why isn't it possible to enable checksumming while a database cluster is up?

It is surprisingly complicated to enable checksums on a live cluster, a patch
was submitted a while back but ultimately never made it into postgres. The
below threads may shine some light on the problem:

/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
/messages/by-id/560A2239-5DE2-4B9C-92BC-878C6822F47C@yesql.se

Yeah, it was a big surprise that this feature was so hard to implement
because we have _no_ infrastructure for having multiple data layouts
active in a live system. The discussion eventually made that clear.

If we have more features that need this kind of dynamic ability, we
might revisit this feature too.

Ok, I see.

But unfortunately still, my questions c) and d) are unanswered. I'd especially be interested in an answer to c), i.e. is it *safe* to "pg_checksum -e" the replica instance in a patroni cluster, switch over, and then do the other one?

Cheers
Paul

#7Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Paul Förster (#6)
Re: pg_checksums?

On 2023-10-27 23:37:24 +0200, Paul Förster wrote:

But unfortunately still, my questions c) and d) are unanswered. I'd
especially be interested in an answer to c), i.e. is it *safe* to
"pg_checksum -e" the replica instance in a patroni cluster, switch
over, and then do the other one?

I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#8Paul Förster
paul.foerster@gmail.com
In reply to: Peter J. Holzer (#7)
Re: pg_checksums?

Hi Peter

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to enable checksums on a replica, switch over and then do it again on the ex primary, i.e. now new replica without doing a reinit.

Cheers
Paul

#9Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Paul Förster (#8)
Re: pg_checksums?

On 2023-10-29 10:11:07 +0100, Paul Förster wrote:

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to
enable checksums on a replica, switch over and then do it again on the
ex primary, i.e. now new replica without doing a reinit.

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#10Paul Förster
paul.foerster@gmail.com
In reply to: Peter J. Holzer (#9)
Re: pg_checksums?

Hi Peter,

On Oct 29, 2023, at 11:49, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

As soon as I stop the replica to enable checksums, even writes can't get replicated anymore. So during enabling checksums, a replica is definitely protected against modifications by its primary, simply because it's down. The modifications of the primary are applied to the replica when it comes back online. So, I don't see a problem at this particular stage.

My fear is merely that enabling checksums does something to the physical state of the data files which are not compatible with the other side. Like for example manipulate the file headers in some way.

Maybe this question is better suited for the admin list than this general list?

Cheers
Paul

#11Ron
ronljohnsonjr@gmail.com
In reply to: Paul Förster (#8)
Re: pg_checksums?

On 10/29/23 04:11, Paul Förster wrote:

Hi Peter

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to enable checksums on a replica, switch over and then do it again on the ex primary, i.e. now new replica without doing a reinit.

For that to work, the secondary files would have to remain identical to the
primary files.  Theoretically that _should_ happen, but it might not, or
whatever command that enables checksums after the fact might have a sanity
check.

As for safety, what do you mean by "safe"?

--
Born in Arizona, moved to Babylonia.

#12Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Paul Förster (#10)
Re: pg_checksums?

On 2023-10-29 16:15:37 +0100, Paul Förster wrote:

On Oct 29, 2023, at 11:49, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

As soon as I stop the replica to enable checksums, even writes can't
get replicated anymore. So during enabling checksums, a replica is
definitely protected against modifications by its primary, simply
because it's down. The modifications of the primary are applied to the
replica when it comes back online.

And this is where it would break down. The modifications can't be
applied to the replica any more because the replica now contains
checksums and the modifications don't. In the best case the replica
would catch the discrepancy and refuse to apply the modifications which
would lead to the loss of these modifications. In the worst case it
would apply them anyway causing severe data corruption.

So, I don't see a problem at this particular stage.

My fear is merely that enabling checksums does something to the
physical state of the data files which are not compatible with the
other side.

Exactly. Those checksums have to be stored somewhere.

Like for example manipulate the file headers in some way.

Not just the file headers. Every single data block.

(Ok, it looks like the space for the checksum is reserved even if
checksums aren't enabled[1]https://www.postgresql.org/docs/current/storage-page-layout.html#PAGEHEADERDATA-TABLE. So at least pg_checksums doesn't have to
move data around to enable them. But overwriting a page with a checksum
with one without one would still be bad.)

hp

[1]: https://www.postgresql.org/docs/current/storage-page-layout.html#PAGEHEADERDATA-TABLE

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#13Paul Förster
paul.foerster@gmail.com
In reply to: Ron (#11)
Re: pg_checksums?

Hi Ron,

On Oct 29, 2023, at 16:37, Ron <ronljohnsonjr@gmail.com> wrote:

As for safety, what do you mean by "safe"?

Safe in the sense that, if I enable checksums on a replica, switch over and the enable checksums on the other side, if this is ok, or whether future mutations on the primary will corrupt the replica.

That's why I asked if I need to perform a patronictl reinit.

Cheers
Paul

#14Paul Förster
paul.foerster@gmail.com
In reply to: Peter J. Holzer (#12)
Re: pg_checksums?

Hi Ron,

On Oct 29, 2023, at 16:38, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

And this is where it would break down. The modifications can't be
applied to the replica any more because the replica now contains
checksums and the modifications don't. In the best case the replica
would catch the discrepancy and refuse to apply the modifications which
would lead to the loss of these modifications. In the worst case it
would apply them anyway causing severe data corruption.

...

Not just the file headers. Every single data block.

(Ok, it looks like the space for the checksum is reserved even if
checksums aren't enabled[1]. So at least pg_checksums doesn't have to
move data around to enable them. But overwriting a page with a checksum
with one without one would still be bad.)

Those are the kind of answers and insights I was looking for. Thank you very much.

Ok, I will do a reinit then.

Cheers
Paul

#15Ron
ronljohnsonjr@gmail.com
In reply to: Paul Förster (#13)
Re: pg_checksums?

On 10/29/23 12:57, Paul Förster wrote:

Hi Ron,

On Oct 29, 2023, at 16:37, Ron <ronljohnsonjr@gmail.com> wrote:

As for safety, what do you mean by "safe"?

Safe in the sense that, if I enable checksums on a replica, switch over and the enable checksums on the other side, if this is ok, or whether future mutations on the primary will corrupt the replica.

Trying it would tell you something.

That's why I asked if I need to perform a patronictl reinit.

Best to ask Percona.

--
Born in Arizona, moved to Babylonia.

#16Michael Paquier
michael@paquier.xyz
In reply to: Peter J. Holzer (#9)
Re: pg_checksums?

On Sun, Oct 29, 2023 at 11:49:11AM +0100, Peter J. Holzer wrote:

On 2023-10-29 10:11:07 +0100, Paul Förster wrote:

On Oct 29, 2023, at 02:43, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:

I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

I agree with the last sentence. This is why I asked if it is safe to
enable checksums on a replica, switch over and then do it again on the
ex primary, i.e. now new replica without doing a reinit.

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

Hm? Page checksums are written when a page is flushed to disk, we
don't set them for dirty buffers or full-page writes included in WAL,
so it should be OK to do something like the following:
- Stop cleanly a standby.
- Run pg_checksums on the standby to enable them.
- Restart the standby.
- Catchup with the latest changes
- Stop cleanly the primary, letting the shutdown checkpoint be
replicated to the standby.
- Promote the standby.
- Enable checksums on the previous primary.
- Start the previous primary to be a standby of the node you failed
over to.
--
Michael

#17Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Michael Paquier (#16)
Re: pg_checksums?

On 2023-10-30 09:56:31 +0900, Michael Paquier wrote:

Hm? Page checksums are written when a page is flushed to disk, we
don't set them for dirty buffers or full-page writes included in WAL,
so it should be OK to do something like the following:
- Stop cleanly a standby.
- Run pg_checksums on the standby to enable them.
- Restart the standby.
- Catchup with the latest changes
- Stop cleanly the primary, letting the shutdown checkpoint be
replicated to the standby.
- Promote the standby.
- Enable checksums on the previous primary.
- Start the previous primary to be a standby of the node you failed
over to.

I stand corrected.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#18Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Ron (#15)
Re: pg_checksums?

On 2023-10-29 13:26:27 -0500, Ron wrote:

On 10/29/23 12:57, Paul Förster wrote:

Safe in the sense that, if I enable checksums on a replica, switch
over and the enable checksums on the other side, if this is ok, or
whether future mutations on the primary will corrupt the replica.

Trying it would tell you something.

That's why I asked if I need to perform a patronictl reinit.

Best to ask Percona.

Why Percona?

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#19Paul Förster
paul.foerster@gmail.com
In reply to: Michael Paquier (#16)
Re: pg_checksums?

Hi Michael,

On Oct 30, 2023, at 01:56, Michael Paquier <michael@paquier.xyz> wrote:

Hm? Page checksums are written when a page is flushed to disk, we
don't set them for dirty buffers or full-page writes included in WAL,
so it should be OK to do something like the following:
- Stop cleanly a standby.
- Run pg_checksums on the standby to enable them.
- Restart the standby.
- Catchup with the latest changes
- Stop cleanly the primary, letting the shutdown checkpoint be
replicated to the standby.
- Promote the standby.
- Enable checksums on the previous primary.
- Start the previous primary to be a standby of the node you failed
over to.

That's exactly the reasoning behind my initial idea and question. Patroni does the switchover job for me including catching up on the latest changes, etc.

Seems that opinions vary. Are there any hard facts?

It turns out that enabling checksums can take quite some time to complete, i.e. downtime for the application which is hard to do in a 24x7 environment.

Cheers
Paul

#20Paul Förster
paul.foerster@gmail.com
In reply to: Peter J. Holzer (#18)
Re: pg_checksums?

Hi Peter,

On Oct 30, 2023, at 11:03, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-10-29 13:26:27 -0500, Ron wrote:

Best to ask Percona.

Why Percona?

Probably a typo. Patroni is used.

Cheers
Paul

#21Ron
ronljohnsonjr@gmail.com
In reply to: Paul Förster (#20)
#22Alexander Kukushkin
cyberdemn@gmail.com
In reply to: Ron (#21)
#23Paul Förster
paul.foerster@gmail.com
In reply to: Alexander Kukushkin (#22)
#24Alexander Kukushkin
cyberdemn@gmail.com
In reply to: Paul Förster (#23)
#25Brad White
b55white@gmail.com
In reply to: Michael Paquier (#16)
#26Paul Förster
paul.foerster@gmail.com
In reply to: Alexander Kukushkin (#24)
#27Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Alexander Kukushkin (#22)
#28Paul Förster
paul.foerster@gmail.com
In reply to: Nikolay Samokhvalov (#27)