Move --data-checksums to common options in initdb --help

Started by Michael Banck over 5 years ago, 46 messages, hackers
#1Michael Banck
michael.banck@credativ.de

Hi,

I noticed -k/--data-checksums is currently in the less commonly used
options part of the initdb --help output:

|Less commonly used options:
| -d, --debug generate lots of debugging output
| -k, --data-checksums use data page checksums

I think enough people use data checksums these days that it warrants
being moved into the "normal part", as in the attached patch.

Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz

Attachments:

initdb_move_k_to_common_options.patch (text/x-patch; charset=UTF-8), +1 -1
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#1)
Re: Move --data-checksums to common options in initdb --help

On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:

I think enough people use data checksums these days that it warrants
being moved into the "normal part", as in the attached patch.

+1.  Let's see first what others think about this change.
--
Michael
#3Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#2)
Re: Move --data-checksums to common options in initdb --help

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:

I think enough people use data checksums these days that it warrants
being moved into the "normal part", as in the attached patch.

+1. Let's see first what others think about this change.

I agree with this, but I'd also like to propose, again, as has been
discussed a few times, making it the default too.

Thanks,

Stephen

#4Michael Banck
michael.banck@credativ.de
In reply to: Stephen Frost (#3)
data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Heya,

(changing the subject as we're moving the goalposts)

On Saturday, 2021-01-02 at 10:47 -0500, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:

I think enough people use data checksums these days that it warrants
being moved into the "normal part", as in the attached patch.

+1. Let's see first what others think about this change.

I agree with this, but I'd also like to propose, again, as has been
discussed a few times, making it the default too.

One thing my colleagues have complained about is the seemingly excessive
amount of WAL generated when checksums are enabled (compared to the
default, where data_checksums and wal_log_hints are both off) due to
additional full-page images (FPIs).

So I ran some quick benchmarks based on pgbench -i (i.e. just
initializing the data, not actually running queries) to see how much
WAL is produced during a VACUUM with a forced CHECKPOINT beforehand.

This creates a new instance, turns archiving on and then first does the
data-load with scale-factor 100 in pgbench (initialization steps "dtg"),
followed by a CHECKPOINT and then the VACUUM/PK generation steps
(initialization steps "vp"), followed by a final CHECKPOINT. It looks
like this where $CHECKSUM is either empty or '-k':

pg_ctl -D data1 stop; rm -rf data1/ data1_archive/*;
initdb $CHECKSUM -D data1; cp postgresql.conf data1;
pg_ctl -D data1 -l data1_logfile start;
pgbench -s 100 -i -p 65432 -I dtg; echo CHECKPOINT | psql -p 65432;
pgbench -s 100 -i -p 65432 -I vp; echo CHECKPOINT | psql -p 65432;
du -s -h data1/pg_wal data1/base data1_archive/

All runs were repeated twice. These are the $PGDATA/{pg_wal,base} sizes
and the archive, as well as the timing for the second pgbench
initialization step:

data_checksums=off, wal_compression=off

1,1G data1/pg_wal
1,5G data1/base
1,3G data1_archive/

done in 10.24 s (vacuum 3.31 s, primary keys 6.92 s).
done in 8.81 s (vacuum 2.72 s, primary keys 6.09 s).
done in 8.35 s (vacuum 2.32 s, primary keys 6.03 s).

data_checksums=on, wal_compression=off

1,5G data1/pg_wal
1,5G data1/base
2,5G data1_archive/

done in 67.42 s (vacuum 54.57 s, primary keys 12.85 s).
done in 65.03 s (vacuum 53.25 s, primary keys 11.78 s).
done in 77.57 s (vacuum 62.64 s, primary keys 14.94 s).

So data_checksums (and/or wal_log_hints; I omitted those numbers as
they are basically identical to the data_checksums=on case) makes
vacuum run 20x and primary keys 2x longer, increases the generated WAL
by 40% for pg_wal and roughly doubles the WAL in the archive.
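
As a rough cross-check of these factors, the run timings and sizes listed
earlier in this message can be averaged and compared with a few lines of
shell. This is only a sketch: the avg and ratio helpers are made-up names
for illustration, and the inputs are simply the figures quoted above for
the wal_compression=off runs.

```shell
# Hypothetical helpers, not part of PostgreSQL or pgbench.

# Mean of the timings (in seconds) passed as arguments, two decimals.
avg() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ s += $1 } END { printf "%.2f\n", s / NR }'
}

# a / b, rounded to one decimal.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Timings copied from the wal_compression=off runs above.
vac_off=$(avg 3.31 2.72 2.32)     # vacuum, data_checksums=off
vac_on=$(avg 54.57 53.25 62.64)   # vacuum, data_checksums=on
pk_off=$(avg 6.92 6.09 6.03)      # primary keys, data_checksums=off
pk_on=$(avg 12.85 11.78 14.94)    # primary keys, data_checksums=on

echo "vacuum:       $(ratio "$vac_on" "$vac_off")x slower"
echo "primary keys: $(ratio "$pk_on" "$pk_off")x slower"
echo "pg_wal:       $(ratio 1.5 1.1)x larger"
echo "archive:      $(ratio 2.5 1.3)x larger"
```

With these inputs the averages come out at roughly 20x for vacuum and 2x
for primary keys, with pg_wal about 1.4x and the archive about 1.9x the
size, which lines up with the summary.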

I then re-ran the tests with wal_compression=on in order to see how much
that helps:

data_checksums=off, wal_compression=on

1,1G data1/pg_wal
1,5G data1/base
1,2G data1_archive/

done in 26.60 s (vacuum 3.30 s, primary keys 23.30 s).
done in 19.54 s (vacuum 3.11 s, primary keys 16.43 s).
done in 19.50 s (vacuum 3.46 s, primary keys 16.04 s).

data_checksums=on, wal_compression=on

1,1G data1/pg_wal
1,5G data1/base
1,3G data1_archive/

done in 60.24 s (vacuum 42.52 s, primary keys 17.72 s).
done in 62.07 s (vacuum 45.64 s, primary keys 16.43 s).
done in 56.20 s (vacuum 40.96 s, primary keys 15.24 s).

This looks much better from the WAL size perspective, there's now almost
no additional WAL. However, that is because pgbench doesn't do TOAST, so
in a real-world example it might still be quite a bit larger. Also, the vacuum
runtime is still 15x longer.

So maybe we should switch on wal_compression if we enable data checksums
by default.

Michael


#5Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#4)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote:

On Saturday, 2021-01-02 at 10:47 -0500, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:

I think enough people use data checksums these days that it warrants
being moved into the "normal part", as in the attached patch.

+1. Let's see first what others think about this change.

I agree with this.

Okay, so I have applied this part as it makes sense independently.

But I'd also like to propose, again, as has been
discussed a few times, making it the default too.

While I don't particularly disagree, I think that this needs careful
evaluation.

So maybe we should switch on wal_compression if we enable data checksums
by default.

I don't agree with this assumption. In some CPU-bound workloads, I
have seen that wal_compression = on leads to performance degradation
with or without checksums enabled.
--
Michael

#6Andres Freund
andres@anarazel.de
In reply to: Michael Banck (#4)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Hi,

On 2021-01-04 19:11:43 +0100, Michael Banck wrote:

On Saturday, 2021-01-02 at 10:47 -0500, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:

I think enough people use data checksums these days that it warrants
being moved into the "normal part", as in the attached patch.

+1. Let's see first what others think about this change.

I agree with this, but I'd also like to propose, again, as has been
discussed a few times, making it the default too.

FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
logging overhead, the copy we do via PageSetChecksumCopy() shows up
quite significantly in profiles here. Together with the checksums
computation that's *halving* write throughput on fast drives in my aio
branch.

This looks much better from the WAL size perspective, there's now almost
no additional WAL. However, that is because pgbench doesn't do TOAST, so
in a real-world example it might still be quite a bit larger. Also, the vacuum
runtime is still 15x longer.

That's obviously an issue.

So maybe we should switch on wal_compression if we enable data checksums
by default.

It unfortunately also hurts other workloads. If we moved towards a saner
compression algorithm that'd perhaps not be an issue anymore...

Greetings,

Andres Freund

#7Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#5)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Hi,

On Wednesday, 2021-01-06 at 10:52 +0900, Michael Paquier wrote:

On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote:

So maybe we should switch on wal_compression if we enable data checksums
by default.

I don't agree with this assumption. In some CPU-bound workloads, I
have seen that wal_compression = on leads to performance degradation
with or without checksums enabled.

I meant just flipping the default; admins could still turn off
wal_compression if they think it'd help their performance. But it might
be tricky to implement, not sure.
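
For reference, overriding such a default would be a small step for an
admin. A minimal sketch, assuming superuser access via psql: the run and
set_wal_compression helpers and the DRY_RUN variable are hypothetical
names, while ALTER SYSTEM and pg_reload_conf() are standard PostgreSQL.

```shell
# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Persist wal_compression=$1 (on or off) and reload the configuration.
# $2 is the server port, defaulting to 5432.
set_wal_compression() {
    value=$1
    port=${2:-5432}
    run psql -p "$port" -c "ALTER SYSTEM SET wal_compression = $value" &&
        run psql -p "$port" -c "SELECT pg_reload_conf()"
}

# Example (prints the two psql invocations without running them):
#   DRY_RUN=1 set_wal_compression off 65432
```

In dry-run mode the quoting of the SQL is lost in the printed output, so
the printed lines are meant for reading, not for pasting back into a shell.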

Michael


#8Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#6)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Greetings,

* Andres Freund (andres@anarazel.de) wrote:

On 2021-01-04 19:11:43 +0100, Michael Banck wrote:

On Saturday, 2021-01-02 at 10:47 -0500, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:

I think enough people use data checksums these days that it warrants
being moved into the "normal part", as in the attached patch.

+1. Let's see first what others think about this change.

I agree with this, but I'd also like to propose, again, as has been
discussed a few times, making it the default too.

FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
logging overhead, the copy we do via PageSetChecksumCopy() shows up
quite significantly in profiles here. Together with the checksums
computation that's *halving* write throughput on fast drives in my aio
branch.

Our defaults are not going to win any performance trophies and so I
don't see the value in stressing over it here.

This looks much better from the WAL size perspective, there's now almost
no additional WAL. However, that is because pgbench doesn't do TOAST, so
in a real-world example it might still be quite a bit larger. Also, the vacuum
runtime is still 15x longer.

That's obviously an issue.

It'd certainly be nice to figure out a way to improve the VACUUM run but
I don't think the impact on the time to run VACUUM is really a good
reason to not move forward with changing the default.

So maybe we should switch on wal_compression if we enable data checksums
by default.

That does seem like a good idea to me, +1 to also changing that.

It unfortunately also hurts other workloads. If we moved towards a saner
compression algorithm that'd perhaps not be an issue anymore...

I agree that improving compression performance would be good but I don't
see that as relevant to the question of what our defaults should be.

imv, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance? Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Thanks,

Stephen

#9Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#8)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

On Wed, Jan 6, 2021 at 12:02:40PM -0500, Stephen Frost wrote:

It unfortunately also hurts other workloads. If we moved towards a saner
compression algorithm that'd perhaps not be an issue anymore...

I agree that improving compression performance would be good but I don't
see that as relevant to the question of what our defaults should be.

imv, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance? Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Well, you know fsyncs are required to recover from an OS crash, which is
more likely than detecting data corruption.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#10Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#9)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Wed, Jan 6, 2021 at 12:02:40PM -0500, Stephen Frost wrote:

It unfortunately also hurts other workloads. If we moved towards a saner
compression algorithm that'd perhaps not be an issue anymore...

I agree that improving compression performance would be good but I don't
see that as relevant to the question of what our defaults should be.

imv, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance? Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Well, you know fsyncs are required to recover from an OS crash, which is
more likely than detecting data corruption.

Yes, I do know that. That doesn't change my feeling that we should have
checksums enabled by default.

Thanks,

Stephen

#11Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#7)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

On Wed, Jan 6, 2021 at 8:31 AM Michael Banck <michael.banck@credativ.de> wrote:

Hi,

On Wednesday, 2021-01-06 at 10:52 +0900, Michael Paquier wrote:

On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote:

So maybe we should switch on wal_compression if we enable data checksums
by default.

I don't agree with this assumption. In some CPU-bound workloads, I
have seen that wal_compression = on leads to performance degradation
with or without checksums enabled.

I meant just flipping the default, admins could still turn off
wal_compression if they think it'd help their performance. But it might
be tricky to implement, not sure.

The other argument is that admins can cheaply and quickly turn off
checksums if they don't want them.

The same cannot be said for turning them *on* again, that's a very
slow offline operation at this time.

Turning off checksums doesn't take noticeably more time than say
changing the shared_buffers from the default, which is also almost
guaranteed to be wrong for most installations.
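
To make the "cheap and quick" part concrete: on PostgreSQL 12 and later,
disabling checksums amounts to running pg_checksums --disable between a
stop and a start of the cluster. A minimal sketch, assuming local access
to the data directory; the run and disable_checksums helpers, the DRY_RUN
variable and the data1 paths are illustrative names only.

```shell
# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn data checksums off for the cluster in directory $1.
# pg_checksums only works on a cleanly shut down cluster, hence the
# stop before and the start after.
disable_checksums() {
    datadir=$1
    run pg_ctl -D "$datadir" stop &&
        run pg_checksums --disable -D "$datadir" &&
        run pg_ctl -D "$datadir" -l "${datadir}_logfile" start
}

# Example (prints the three commands without running them):
#   DRY_RUN=1 disable_checksums data1
```

The reverse direction, pg_checksums --enable, has to rewrite every block
of the cluster, which is why it is the slow operation.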

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#12Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#8)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Hi,

On 2021-01-06 12:02:40 -0500, Stephen Frost wrote:

* Andres Freund (andres@anarazel.de) wrote:

On 2021-01-04 19:11:43 +0100, Michael Banck wrote:

On Saturday, 2021-01-02 at 10:47 -0500, Stephen Frost wrote:

I agree with this, but I'd also like to propose, again, as has been
discussed a few times, making it the default too.

FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
logging overhead, the copy we do via PageSetChecksumCopy() shows up
quite significantly in profiles here. Together with the checksums
computation that's *halving* write throughput on fast drives in my aio
branch.

Our defaults are not going to win any performance trophies and so I
don't see the value in stressing over it here.

Meh^3. There's a difference between defaults that are about resource
usage (e.g. shared_buffers) and defaults that aren't.

This looks much better from the WAL size perspective, there's now almost
no additional WAL. However, that is because pgbench doesn't do TOAST, so
in a real-world example it might still be quite a bit larger. Also, the vacuum
runtime is still 15x longer.

That's obviously an issue.

It'd certainly be nice to figure out a way to improve the VACUUM run but
I don't think the impact on the time to run VACUUM is really a good
reason to not move forward with changing the default.

Vacuum performance is one of *THE* major complaints about
postgres. Making it run slower by a lot obviously exacerbates that
problem significantly. I think it'd be prohibitively expensive if it
were 1.5x, not to even speak of 15x.

imv, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance? Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Oh for crying out loud.

Greetings,

Andres Freund

#13Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#11)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Hi,

On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote:

The other argument is that admins can cheaply and quickly turn off
checksums if they don't want them.

The same cannot be said for turning them *on* again, that's a very
slow offline operation at this time.

Turning off checksums doesn't take noticeably more time than say
changing the shared_buffers from the default, which is also almost
guaranteed to be wrong for most installations.

It still requires running a binary locally on the DB server, no? Which
means it'll not be an option on most cloud providers...

Greetings,

Andres Freund

#14Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#12)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Greetings,

* Andres Freund (andres@anarazel.de) wrote:

On 2021-01-06 12:02:40 -0500, Stephen Frost wrote:

* Andres Freund (andres@anarazel.de) wrote:

On 2021-01-04 19:11:43 +0100, Michael Banck wrote:

On Saturday, 2021-01-02 at 10:47 -0500, Stephen Frost wrote:

I agree with this, but I'd also like to propose, again, as has been
discussed a few times, making it the default too.

FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
logging overhead, the copy we do via PageSetChecksumCopy() shows up
quite significantly in profiles here. Together with the checksums
computation that's *halving* write throughput on fast drives in my aio
branch.

Our defaults are not going to win any performance trophies and so I
don't see the value in stressing over it here.

Meh^3. There's a difference between defaults that are about resource
usage (e.g. shared_buffers) and defaults that aren't.

fsync isn't about resource usage.

This looks much better from the WAL size perspective, there's now almost
no additional WAL. However, that is because pgbench doesn't do TOAST, so
in a real-world example it might still be quite a bit larger. Also, the vacuum
runtime is still 15x longer.

That's obviously an issue.

It'd certainly be nice to figure out a way to improve the VACUUM run but
I don't think the impact on the time to run VACUUM is really a good
reason to not move forward with changing the default.

Vacuum performance is one of *THE* major complaints about
postgres. Making it run slower by a lot obviously exacerbates that
problem significantly. I think it'd be prohibitively expensive if it
were 1.5x, not to even speak of 15x.

We already make vacuum, when run out of autovacuum, relatively slow,
quite intentionally. If someone's having trouble with vacuum run times
they're going to be adjusting the configuration anyway.

imv, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance? Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Oh for crying out loud.

Not sure what you're hoping to gain from such comments, but it doesn't
do anything to change my opinion.

Thanks,

Stephen

#15Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#13)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Greetings,

* Andres Freund (andres@anarazel.de) wrote:

On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote:

The other argument is that admins can cheaply and quickly turn off
checksums if they don't want them.

The same cannot be said for turning them *on* again, that's a very
slow offline operation at this time.

Turning off checksums doesn't take noticeably more time than say
changing the shared_buffers from the default, which is also almost
guaranteed to be wrong for most installations.

It still requires running a binary locally on the DB server, no? Which
means it'll not be an option on most cloud providers...

... unless they choose to make it an option, which is entirely up to
them and certainly well within what they're capable of doing. I'd also
mention that, at least according to some cloud providers I've talked to,
they specifically wouldn't support PG until data checksums were
available, making me not really feel like having them enabled by default
would be such an issue (not to mention that, clearly, cloud providers
could choose to change the default for their deployments if they wished
to).

Thanks,

Stephen

#16Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#13)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

On Wed, Jan 6, 2021 at 6:58 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote:

The other argument is that admins can cheaply and quickly turn off
checksums if they don't want them.

The same cannot be said for turning them *on* again, that's a very
slow offline operation at this time.

Turning off checksums doesn't take noticeably more time than say
changing the shared_buffers from the default, which is also almost
guaranteed to be wrong for most installations.

It still requires running a binary locally on the DB server, no? Which

It does.

So does changing shared_buffers -- for example you need to run
"systemctl" if you're on systemd, or just pg_ctl if you're using
unpackaged postgres.

means it'll not be an option on most cloud providers...

I really don't see why.

They've implemented the ability to restart postgres. Surely they can
implement the ability to run a single command in between.

Or if that's too complicated, they are more than capable of passing a
parameter to initdb to change what the default is on their platform.
They already do so for other things (such as not using trust or peer
auth by default, or actually not having a superuser, etc.).

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#17Michael Banck
michael.banck@credativ.de
In reply to: Andres Freund (#13)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

On Wednesday, 2021-01-06 at 09:58 -0800, Andres Freund wrote:

It still requires running a binary locally on the DB server, no? Which
means it'll not be an option on most cloud providers...

At least Azure and RDS seem to have data_checksums on anyway, I don't
have a GCP test instance around handily right now to check.

Michael


#18Michael Banck
michael.banck@credativ.de
In reply to: Andres Freund (#12)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Hi,

On Wed, Jan 06, 2021 at 09:55:08AM -0800, Andres Freund wrote:

On 2021-01-06 12:02:40 -0500, Stephen Frost wrote:

* Andres Freund (andres@anarazel.de) wrote:

On 2021-01-04 19:11:43 +0100, Michael Banck wrote:

This looks much better from the WAL size perspective, there's now almost
no additional WAL. However, that is because pgbench doesn't do TOAST, so
in a real-world example it might still be quite a bit larger. Also, the vacuum
runtime is still 15x longer.

That's obviously an issue.

It'd certainly be nice to figure out a way to improve the VACUUM run but
I don't think the impact on the time to run VACUUM is really a good
reason to not move forward with changing the default.

Vacuum performance is one of *THE* major complaints about
postgres. Making it run slower by a lot obviously exacerbates that
problem significantly. I think it'd be prohibitively expensive if it
were 1.5x, not to even speak of 15x.

To maybe clarify, the vacuum slowdown is just as large in my (somewhat
contrived as a worst-case scenario) tests when wal_log_hints is on and
data_checksums is off; I just omitted those numbers because they are
basically identical (or maybe even a bit worse):

|data_checksums=off, wal_log_hints=off:
|
|done in 10.24 s (vacuum 3.31 s, primary keys 6.92 s).
|done in 8.81 s (vacuum 2.72 s, primary keys 6.09 s).
|done in 8.35 s (vacuum 2.32 s, primary keys 6.03 s).
|
|data_checksums=off, wal_log_hints=on:
|
|1,5G data1/pg_wal
|1,5G data1/base
|2,5G data1_archive/
|
|done in 87.89 s (vacuum 69.67 s, primary keys 18.23 s).
|done in 73.71 s (vacuum 60.19 s, primary keys 13.52 s).
|done in 75.12 s (vacuum 62.49 s, primary keys 12.62 s).
|
|data_checksums=on, wal_log_hints=off:
|
|done in 67.42 s (vacuum 54.57 s, primary keys 12.85 s).
|done in 65.03 s (vacuum 53.25 s, primary keys 11.78 s).
|done in 77.57 s (vacuum 62.64 s, primary keys 14.94 s).

Of course, wal_log_hints is not the default either and can be turned off
easily. You mostly lose the ability to run pg_rewind I think, are there
other use-cases for it?

Michael


#19Michael Banck
michael.banck@credativ.de
In reply to: Michael Banck (#17)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

On Wednesday, 2021-01-06 at 19:07 +0100, Michael Banck wrote:

On Wednesday, 2021-01-06 at 09:58 -0800, Andres Freund wrote:

It still requires running a binary locally on the DB server, no? Which
means it'll not be an option on most cloud providers...

At least Azure and RDS seem to have data_checksums on anyway, I don't
have a GCP test instance around handily right now to check.

Well I was curious: GCP SQL Postgres also has checksums enabled.
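
Checking this on a given instance only needs a SQL connection, since
data_checksums is exposed as a read-only setting. A small sketch: the
checksums_on helper is a hypothetical name, and the connection options in
the example are placeholders.

```shell
# Hypothetical helper: reads the output of "SHOW data_checksums" on stdin
# and succeeds only if the value is "on".
checksums_on() {
    read -r value && [ "$value" = "on" ]
}

# Example against a running server (connection options are placeholders):
#   psql -At -p 65432 -c 'SHOW data_checksums' | checksums_on && echo enabled
```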

Michael


#20Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#14)
Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

Hi,

On 2021-01-06 13:01:59 -0500, Stephen Frost wrote:

* Andres Freund (andres@anarazel.de) wrote:

imv, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance? Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Oh for crying out loud.

Not sure what you're hoping to gain from such comments, but it doesn't
do anything to change my opinion.

It seems so facetious to compare fsync=off (will cause corruption) with
data_checksums=off (will not cause corruption) that I find the
comparison to be insulting.

Greetings,

Andres Freund
