Checksums by default?
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
I keep running into situations where people haven't enabled it, because (a)
they didn't know about it, or (b) their packaging system ran initdb for
them so they didn't even know they could. And of course they usually figure
this out once the db has enough data and traffic that the only way to fix
it is to set up something like slony/bucardo/pglogical and a whole new
server to deal with it.. (Which is something that would also be good to fix
-- but having the default changed would be useful as well)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
* Magnus Hagander (magnus@hagander.net) wrote:
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
Yes, please.
We've already agreed to make changes to have a better user experience
and ask those who really care about certain performance aspects to have
to configure for performance instead (see: wal_level changes), I view
this as being very much in that same vein.
I know one argument in the past has been that we don't have a tool that
can be used to check all of the checksums, but that's also changed now
that pgBackRest supports verifying checksums during backups. I'm all
for adding a tool to core to perform a validation too, of course, though
it does make a lot of sense to validate checksums during backup since
you're reading all the pages anyway.
Thanks!
Stephen
On Sat, Jan 21, 2017 at 7:39 PM, Magnus Hagander <magnus@hagander.net> wrote:
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
I keep running into situations where people haven't enabled it, because (a)
they didn't know about it, or (b) their packaging system ran initdb for them
so they didn't even know they could. And of course they usually figure this
out once the db has enough data and traffic that the only way to fix it is
to set up something like slony/bucardo/pglogical and a whole new server to
deal with it.. (Which is something that would also be good to fix -- but
having the default changed would be useful as well)
Perhaps that's not mandatory, but I think that one obstacle in
changing this default is to be able to have pg_upgrade work from a
checksum-disabled old instance to a checksum-enabled instance. That
would really help with its adoption.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jan 21, 2017 at 3:05 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Sat, Jan 21, 2017 at 7:39 PM, Magnus Hagander <magnus@hagander.net>
wrote:
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
I keep running into situations where people haven't enabled it, because (a)
they didn't know about it, or (b) their packaging system ran initdb for them
so they didn't even know they could. And of course they usually figure this
out once the db has enough data and traffic that the only way to fix it is
to set up something like slony/bucardo/pglogical and a whole new server to
deal with it.. (Which is something that would also be good to fix -- but
having the default changed would be useful as well)
Perhaps that's not mandatory, but I think that one obstacle in
changing this default is to be able to have pg_upgrade work from a
checksum-disabled old instance to a checksum-enabled instance. That
would really help with its adoption.
That's a different usecase though.
If we just change the default, then we'd have to teach pg_upgrade to
initialize the upgraded cluster without checksums. We still need to keep
that *option*, just reverse the default.
Being able to enable checksums on the fly is a different feature. Which I'd
really like to have. I have some unfinished code for it, but it's a bit too
unfinished so far :)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
* Michael Paquier (michael.paquier@gmail.com) wrote:
On Sat, Jan 21, 2017 at 7:39 PM, Magnus Hagander <magnus@hagander.net> wrote:
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
I keep running into situations where people haven't enabled it, because (a)
they didn't know about it, or (b) their packaging system ran initdb for them
so they didn't even know they could. And of course they usually figure this
out once the db has enough data and traffic that the only way to fix it is
to set up something like slony/bucardo/pglogical and a whole new server to
deal with it.. (Which is something that would also be good to fix -- but
having the default changed would be useful as well)
Perhaps that's not mandatory, but I think that one obstacle in
changing this default is to be able to have pg_upgrade work from a
checksum-disabled old instance to a checksum-enabled instance. That
would really help with its adoption.
That's moving the goal-posts here about 3000 miles away and I don't
believe it's necessary to have that to make this change.
I agree that it'd be great to have, of course, and we're looking at if
we could do something like: backup a checksum-disabled system, perform a
restore which adds checksums and marks the cluster as now having
checksums. If we can work out a good way to do that *and* have it work
with incremental backup/restore, then we could possibly provide a
small-downtime-window way to upgrade to a database with checksums.
Thanks!
Stephen
Magnus,
* Magnus Hagander (magnus@hagander.net) wrote:
On Sat, Jan 21, 2017 at 3:05 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Sat, Jan 21, 2017 at 7:39 PM, Magnus Hagander <magnus@hagander.net>
wrote:
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
I keep running into situations where people haven't enabled it, because (a)
they didn't know about it, or (b) their packaging system ran initdb for them
so they didn't even know they could. And of course they usually figure this
out once the db has enough data and traffic that the only way to fix it is
to set up something like slony/bucardo/pglogical and a whole new server to
deal with it.. (Which is something that would also be good to fix -- but
having the default changed would be useful as well)
Perhaps that's not mandatory, but I think that one obstacle in
changing this default is to be able to have pg_upgrade work from a
checksum-disabled old instance to a checksum-enabled instance. That
would really help with its adoption.
That's a different usecase though.
Agreed.
If we just change the default, then we'd have to teach pg_upgrade to
initialize the upgraded cluster without checksums. We still need to keep
that *option*, just reverse the default.
Just to clarify- pg_upgrade doesn't init the new database, the user (or
a distribution script) does. As such *pg_upgradecluster* would have to
know to init the new cluster correctly based on the options the old
cluster was init'd with, but it might actually already do that (not sure
off-hand), and, even if it doesn't, it shouldn't be too hard to make it
do that.
Being able to enable checksums on the fly is a different feature. Which I'd
really like to have. I have some unfinished code for it, but it's a bit too
unfinished so far :)
Agreed.
Thanks!
Stephen
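The check described above (initialize the new cluster with the same checksum setting the old cluster was init'd with) could be sketched roughly as follows in Python. This is only an illustration and not what pg_upgradecluster actually does: the helper names checksums_enabled and initdb_args are hypothetical, and the sketch assumes the old cluster's pg_controldata output has already been captured as text and reports a "Data page checksum version" line, where 0 means checksums are off.

```python
import re


def checksums_enabled(controldata_text):
    """Return True if captured pg_controldata output reports checksums as on.

    Assumption: the output contains a line of the form
    'Data page checksum version:   N', where N == 0 means disabled.
    """
    match = re.search(
        r"^Data page checksum version:\s*(\d+)\s*$",
        controldata_text,
        re.MULTILINE,
    )
    if match is None:
        # No such line (e.g. a release that predates data checksums):
        # treat the old cluster as checksum-disabled.
        return False
    return int(match.group(1)) != 0


def initdb_args(old_controldata_text, new_datadir):
    """Build an initdb command line mirroring the old cluster's setting.

    Assumption: checksums are off unless --data-checksums is passed,
    which is the default being debated in this thread.
    """
    args = ["initdb", "-D", new_datadir]
    if checksums_enabled(old_controldata_text):
        args.append("--data-checksums")
    return args
```

If the default were reversed, the same check would instead decide whether to pass whatever "off" switch initdb gains; that switch does not exist yet, so it is left out here.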
On 21/01/17 11:39, Magnus Hagander wrote:
Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?
I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
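A comparison of the kind requested above could be summarized with something like the following Python sketch. It assumes two otherwise identical benchmark runs (one cluster without checksums, one with) have already been measured; the dictionary keys "tps" and "wal_bytes" and both function names are hypothetical and chosen only for illustration. The sketch does no measuring itself and implies nothing about what the real numbers would be.

```python
def percent_change(baseline, with_checksums):
    """Relative change of a metric versus the checksum-disabled baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (with_checksums - baseline) / baseline


def summarize(baseline_run, checksum_run):
    """Compare throughput and WAL volume between two benchmark runs.

    Each run is a dict with hypothetical keys:
      "tps"       - transactions per second (CPU-side cost shows up here)
      "wal_bytes" - WAL generated during the run (hint-bit logging shows up here)
    """
    return {
        "tps_change_pct": percent_change(baseline_run["tps"], checksum_run["tps"]),
        "wal_change_pct": percent_change(
            baseline_run["wal_bytes"], checksum_run["wal_bytes"]
        ),
    }
```

Reporting the two metrics separately matters because the CPU cost and the extra WAL from hint-bit logging can move independently depending on the workload.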
Petr,
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
On 21/01/17 11:39, Magnus Hagander wrote:
Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?
I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.
Benchmarking was done previously, but I don't think it's really all that
relevant, we should be checksum'ing by default because we care about the
data and it's hard to get checksums enabled on a running system.
If this is going to be a serious argument made against making this
change (and, frankly, I don't believe that it should be) then what we
should do is simply provide a way for users to disable checksums. It
would be one-way and require a restart, of course, but it wouldn't be
hard to do.
Thanks!
Stephen
On Sun, Jan 22, 2017 at 12:18 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
On 21/01/17 11:39, Magnus Hagander wrote:
Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?
I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.
+1
If the performance overhead by the checksums is really negligible,
we may be able to get rid of wal_log_hints parameter, as well.
Regards,
--
Fujii Masao
* Fujii Masao (masao.fujii@gmail.com) wrote:
On Sun, Jan 22, 2017 at 12:18 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
On 21/01/17 11:39, Magnus Hagander wrote:
Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?
I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.
+1
If the performance overhead by the checksums is really negligible,
we may be able to get rid of wal_log_hints parameter, as well.
Prior benchmarks showed it to be on the order of a few percent, as I
recall, so I'm not sure that we can say it's negligible (and that's not
why Magnus was proposing changing the default).
Thanks!
Stephen
Magnus Hagander <magnus@hagander.net> writes:
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
Have we seen *even one* report of checksums catching problems in a useful
way?
I think this will be making the average user pay X% for nothing.
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Magnus Hagander <magnus@hagander.net> writes:
Is it time to enable checksums by default, and give initdb a switch to turn
it off instead?
Have we seen *even one* report of checksums catching problems in a useful
way?
This isn't the right question.
The right question is "have we seen reports of corruption which
checksums *would* have caught?" Admittedly, that's a much harder
question to answer. I've definitely seen various reports of
corruption in the field, though it's reasonably rare (which I am sure we
can all be thankful for). I can't say for sure which of those cases
would have been caught if checksums had been enabled, but I have a hard
time believing that none of them would have been caught sooner if
checksums had been enabled and regular checksum validation was being
performed.
Given our current default and the relative rarity that it happens, it'll
be a great deal longer until we see such a report- but when we do (and I
don't doubt that we will, eventually) what are we going to do about it?
Tell the vast majority of people who still don't have checksums enabled
because it wasn't the default that they need to pg_dump/reload? That's
not a good way to treat our users.
I think this will be making the average user pay X% for nothing.
Have we seen *even one* report of someone having to disable checksums
for performance reasons? If so, that's an argument for giving a way for
users who really trust their hardware, virtualization system, kernel,
storage network, and everything else involved, to disable checksums (as
I suggested elsewhere), not a reason to keep the current default.
Thanks!
Stephen
On 21/01/17 16:40, Stephen Frost wrote:
Petr,
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
On 21/01/17 11:39, Magnus Hagander wrote:
Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?
I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.
Benchmarking was done previously, but I don't think it's really all that
relevant, we should be checksum'ing by default because we care about the
data and it's hard to get checksums enabled on a running system.
I do think that performance implications are very relevant. And I
haven't seen any serious benchmark that would incorporate all current
differences between using and not using checksums.
The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Petr,
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
On 21/01/17 16:40, Stephen Frost wrote:
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
On 21/01/17 11:39, Magnus Hagander wrote:
Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?
I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.
Benchmarking was done previously, but I don't think it's really all that
relevant, we should be checksum'ing by default because we care about the
data and it's hard to get checksums enabled on a running system.
I do think that performance implications are very relevant. And I
haven't seen any serious benchmark that would incorporate all current
differences between using and not using checksums.
This is just changing the *default*, not requiring checksums to always
be enabled. We do not hold the same standards for our defaults as we do
for always-enabled code, for clear reasons- not every situation is the
same and that's why we have defaults that people can change.
There are interesting arguments to be made about if checksum'ing is
ever worthwhile at all (some seem to feel that the feature is entirely
useless and we should just rip that code out, but I don't agree with
that), or if we should just always enable it (because fewer options is a
good thing and we care about our users' data and checksum'ing is worth
the performance hit if it's a small hit; I'm more on the fence when it
comes to this one as I have heard people say that they've run into cases
where it makes enough of a difference in performance to matter for them).
We don't currently configure the defaults for any system to be the
fastest possible performance, or we wouldn't have changed wal_level and
we would have more aggressive settings for things like default work_mem,
maintenance_work_mem, shared_buffers, max_wal_size,
checkpoint_completion_target, all of the autovacuum settings,
effective_io_concurrency, effective_cache_size, etc, etc.
The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.
No, it wasn't, it was that people felt the cases where changing
wal_level would seriously hurt performance didn't out-weigh the value of
making the change to the default.
Thanks!
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Have we seen *even one* report of checksums catching problems in a useful
way?
This isn't the right question.
I disagree. If they aren't doing something useful for people who have
turned them on, what's the reason to think they'd do something useful
for the rest?
The right question is "have we seen reports of corruption which
checksums *would* have caught?"
Sure, that's also a useful question, one which hasn't been answered.
A third useful question is "have we seen any reports of false-positive
checksum failures?". Even one false positive, IMO, would have costs that
likely outweigh any benefits for typical installations with reasonably
reliable storage hardware.
I really do not believe that there's a case for turning on checksums by
default, and I *certainly* won't go along with turning them on without
somebody actually making that case. "Is it time yet" is not an argument.
regards, tom lane
On 21/01/17 17:31, Stephen Frost wrote:
Petr,
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
On 21/01/17 16:40, Stephen Frost wrote:
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
On 21/01/17 11:39, Magnus Hagander wrote:
Is it time to enable checksums by default, and give initdb a switch to
turn it off instead?
I'd like to see benchmark first, both in terms of CPU and in terms of
produced WAL (=network traffic) given that it turns on logging of hint bits.
Benchmarking was done previously, but I don't think it's really all that
relevant, we should be checksum'ing by default because we care about the
data and it's hard to get checksums enabled on a running system.
I do think that performance implications are very relevant. And I
haven't seen any serious benchmark that would incorporate all current
differences between using and not using checksums.
This is just changing the *default*, not requiring checksums to always
be enabled. We do not hold the same standards for our defaults as we do
for always-enabled code, for clear reasons- not every situation is the
same and that's why we have defaults that people can change.
I can buy that. If it's possible to turn checksums off without
recreating data directory then I think it would be okay to have default on.
The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.
No, it wasn't, it was that people felt the cases where changing
wal_level would seriously hurt performance didn't out-weigh the value of
making the change to the default.
Really?
/messages/by-id/d34ce5b5-131f-66ce-f7c5-eb406dbe026f@2ndquadrant.com
/messages/by-id/83b33502-1bf8-1ffb-7c73-5b61ddeb68ab@2ndquadrant.com
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/21/2017 04:48 PM, Stephen Frost wrote:
* Fujii Masao (masao.fujii@gmail.com) wrote:
If the performance overhead by the checksums is really negligible,
we may be able to get rid of wal_log_hints parameter, as well.
Prior benchmarks showed it to be on the order of a few percent, as I
recall, so I'm not sure that we can say it's negligible (and that's not
why Magnus was proposing changing the default).
It might be worth looking into using the CRC CPU instruction to reduce
this overhead, like we do for the WAL checksums. Since that is a
different algorithm it would be a compatibility break and we would need
to support the old algorithm for upgraded clusters..
Andreas
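To make concrete what a per-page checksum buys, here is a minimal Python sketch of computing and verifying one. It is explicitly not PostgreSQL's data-page algorithm, and it uses zlib's plain CRC-32 rather than the CRC-32C that hardware CRC instructions accelerate. The names page_checksum and verify_page are hypothetical, the 8192-byte page size matches PostgreSQL's default block size, and a real page stores only a 16-bit checksum inside the page header, which this sketch ignores by keeping the value separately.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size, in bytes


def page_checksum(page, block_number):
    """Illustrative checksum: CRC-32 of the page, seeded with the block number.

    Seeding with the block number means a page written to the wrong
    location also fails verification, not just a page with flipped bits.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")
    return zlib.crc32(page, block_number & 0xFFFFFFFF)


def verify_page(page, block_number, stored_checksum):
    """Recompute the checksum on read and compare with the stored value."""
    return page_checksum(page, block_number) == stored_checksum
```

Whatever algorithm is used, the stored value depends on it, which is why switching algorithms would be a compatibility break for existing clusters, as noted above.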
Stephen Frost <sfrost@snowman.net> writes:
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.
No, it wasn't, it was that people felt the cases where changing
wal_level would seriously hurt performance didn't out-weigh the value of
making the change to the default.
It was "supported" in the sense that somebody took the trouble to measure
the impact, so that we had some facts on which to base the value judgment
that the cost was acceptable. In the case of checksums, you seem to be in
a hurry to arrive at a conclusion without any supporting evidence.
regards, tom lane
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
On 21/01/17 17:31, Stephen Frost wrote:
This is just changing the *default*, not requiring checksums to always
be enabled. We do not hold the same standards for our defaults as we do
for always-enabled code, for clear reasons- not every situation is the
same and that's why we have defaults that people can change.
I can buy that. If it's possible to turn checksums off without
recreating data directory then I think it would be okay to have default on.
I'm glad to hear that.
The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.
No, it wasn't, it was that people felt the cases where changing
wal_level would seriously hurt performance didn't out-weigh the value of
making the change to the default.
Really?
Yes.
/messages/by-id/d34ce5b5-131f-66ce-f7c5-eb406dbe026f@2ndquadrant.com
From the above link:
So while it'd be trivial to construct workloads demonstrating the
optimizations in wal_level=minimal (e.g. initial loads doing CREATE
TABLE + COPY + CREATE INDEX in a single transaction), but that would be
mostly irrelevant I guess.
Instead, I've decided to run regular pgbench TPC-B-like workload on a
bunch of different scales, and measure throughput + some xlog stats with
each of the three wal_level options.
In other words, there was no performance testing of the cases where
wal_level=minimal (the old default) optimizations would have been
compared against wal_level > minimal.
I'm quite sure that the performance numbers for the CREATE TABLE + COPY
case with wal_level=minimal would have been *far* better than for
wal_level > minimal.
That case was entirely punted on as "mostly irrelevant" even though
there are known production environments where those optimizations make a
huge difference. Those are OLAP cases though, and not nearly enough
folks around here seem to care one bit about them, which I continue to
be disappointed by.
Even so, I *did* agree with the change to the default of wal_level,
based on an understanding of its value and that users could change to
wal_level=minimal if they wished to, just as I am arguing that same
thing here when it comes to checksums.
Thanks!
Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Petr Jelinek (petr.jelinek@2ndquadrant.com) wrote:
The change of wal_level was supported by benchmark, I think it's
reasonable to ask for this to be as well.
No, it wasn't, it was that people felt the cases where changing
wal_level would seriously hurt performance didn't out-weigh the value of
making the change to the default.
It was "supported" in the sense that somebody took the trouble to measure
the impact, so that we had some facts on which to base the value judgment
that the cost was acceptable. In the case of checksums, you seem to be in
a hurry to arrive at a conclusion without any supporting evidence.
No, no one measured the impact in the cases where wal_level=minimal
makes a big difference, that I saw, at least.
Further info with links to what was done are in my reply to Petr.
As for checksums, I do see value in them and I'm pretty sure that the
author of that particular feature did as well, or we wouldn't even have
it as an option. You seem to be of the opinion that we might as well
just rip all of that code and work out as being useless.
Thanks!
Stephen