synchronous_commit = remote_flush

Started by Thomas Munro, 8 messages
#1 Thomas Munro
thomas.munro@enterprisedb.com

Hi hackers,

To do something about the confusion I keep seeing about what exactly
"on" means, I've often wished we had "remote_flush". But it's not
obvious how the backwards compatibility could work, ie how to keep the
people happy who use "local" vs "on" to control syncrep, and also the
people who use "off" vs "on" to control asynchronous commit on
single-node systems. Is there any sensible way to do that, or is it
not broken and I should pipe down, or is it just far too entrenched
and never going to change?

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Thomas Munro (#1)
Re: synchronous_commit = remote_flush

On 8/17/16 11:22 PM, Thomas Munro wrote:

> Hi hackers,
>
> To do something about the confusion I keep seeing about what exactly
> "on" means, I've often wished we had "remote_flush". But it's not
> obvious how the backwards compatibility could work, ie how to keep the
> people happy who use "local" vs "on" to control syncrep, and also the
> people who use "off" vs "on" to control asynchronous commit on
> single-node systems. Is there any sensible way to do that, or is it
> not broken and I should pipe down, or is it just far too entrenched
> and never going to change?

I'm wondering if we've hit the point where trying to put all of this in
a single GUC is a bad idea... changing that probably means a config
compatibility break, but I don't think that's necessarily a bad thing at
this point...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#1)
Re: synchronous_commit = remote_flush

On Thu, Aug 18, 2016 at 12:22 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> To do something about the confusion I keep seeing about what exactly
> "on" means, I've often wished we had "remote_flush". But it's not
> obvious how the backwards compatibility could work, ie how to keep the
> people happy who use "local" vs "on" to control syncrep, and also the
> people who use "off" vs "on" to control asynchronous commit on
> single-node systems. Is there any sensible way to do that, or is it
> not broken and I should pipe down, or is it just far too entrenched
> and never going to change?

I don't see why we can't add "remote_flush" as a synonym for "on". Do
you have something else in mind?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#3)
Re: synchronous_commit = remote_flush

On Fri, Aug 19, 2016 at 5:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> I don't see why we can't add "remote_flush" as a synonym for "on". Do
> you have something else in mind?

+1 for adding "remote_flush" as a synonym for "on".
It doesn't break backward compatibility.

Regards,

--
Masahiko Sawada

#5 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Masahiko Sawada (#4)
Re: synchronous_commit = remote_flush

On Fri, Aug 19, 2016 at 7:32 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> +1 for adding "remote_flush" as a synonym for "on".
> It doesn't break backward compatibility.

Right, we could just add it to guc.c after "on", so that you can "SET
synchronous_commit TO remote_flush", but then "SHOW
synchronous_commit" returns "on".

The problem I was thinking about was this: if you add "remote_flush"
before "on" in guc.c, then "SHOW ..." will return "remote_flush",
which would be really helpful for users trying to understand what
syncrep is actually doing; but it would probably confuse single node
users and async replication users.
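
Thomas's point about ordering in guc.c can be illustrated with a small Python model. It assumes, as the message describes, that SET accepts any spelling in the option table while SHOW reports the first entry whose value matches. The table contents, the numeric values and the helper names (`lookup_by_name`, `lookup_by_value`, `table`, `set_then_show`) are illustrative only and are not the actual guc.c code.

```python
from typing import NamedTuple, Optional

# Illustrative numeric levels. In this model "on" and "remote_flush" share one
# value, matching the idea that "on" already means remote flush under syncrep.
OFF, LOCAL_FLUSH, REMOTE_WRITE, REMOTE_FLUSH, REMOTE_APPLY = range(5)


class EnumEntry(NamedTuple):
    """One accepted spelling of the setting and the value it maps to."""
    name: str
    val: int


def lookup_by_name(options: list[EnumEntry], name: str) -> Optional[int]:
    """Model of SET: any spelling in the table is accepted (case-insensitively here)."""
    for entry in options:
        if entry.name.lower() == name.lower():
            return entry.val
    return None


def lookup_by_value(options: list[EnumEntry], val: int) -> Optional[str]:
    """Model of SHOW: the first entry with a matching value is the one displayed."""
    for entry in options:
        if entry.val == val:
            return entry.name
    return None


def table(remote_flush_first: bool) -> list[EnumEntry]:
    """Build a hypothetical option table with "remote_flush" before or after "on"."""
    synonyms = [EnumEntry("on", REMOTE_FLUSH), EnumEntry("remote_flush", REMOTE_FLUSH)]
    if remote_flush_first:
        synonyms.reverse()
    return [
        EnumEntry("local", LOCAL_FLUSH),
        EnumEntry("remote_write", REMOTE_WRITE),
        EnumEntry("remote_apply", REMOTE_APPLY),
        *synonyms,
        EnumEntry("off", OFF),
    ]


def set_then_show(options: list[EnumEntry], spelling: str) -> Optional[str]:
    """What SHOW would print after SET synchronous_commit TO <spelling>."""
    val = lookup_by_name(options, spelling)
    return None if val is None else lookup_by_value(options, val)
```

In this model, `set_then_show(table(False), "remote_flush")` gives `"on"` and `set_then_show(table(True), "on")` gives `"remote_flush"`. The first is the backward-compatible synonym Robert suggests. The second is the more informative display Thomas was weighing against confusing single-node users.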

--
Thomas Munro
http://www.enterprisedb.com

#6 Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#5)
Re: synchronous_commit = remote_flush

Re: Thomas Munro 2016-08-21 <CAEepm=0EQvwhFih7wZ+cHL=UJDvF4KSe0thw1gPEY-ga3DcvmQ@mail.gmail.com>

> Right, we could just add it to guc.c after "on", so that you can "SET
> synchronous_commit TO remote_flush", but then "SHOW
> synchronous_commit" returns "on".
>
> The problem I was thinking about was this: if you add "remote_flush"
> before "on" in guc.c, then "SHOW ..." will return "remote_flush",
> which would be really helpful for users trying to understand what
> syncrep is actually doing; but it would probably confuse single node
> users and async replication users.

Maybe "flush" would work, given it applies locally and on the remote
side? (And "local" could be "local_flush"...?)

Christoph

#7 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Jim Nasby (#2)
Re: synchronous_commit = remote_flush

On Fri, Aug 19, 2016 at 6:30 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> I'm wondering if we've hit the point where trying to put all of this in a
> single GUC is a bad idea... changing that probably means a config
> compatibility break, but I don't think that's necessarily a bad thing at
> this point...

Aside from the (IMHO) slightly confusing way that "on" works, which is
the smaller issue I was raising in this thread, I agree that we might
eventually want to escape from the assumption that "local apply" (=
off), local flush, remote write, remote flush, remote apply happen in
that order and therefore a single linear control knob can describe
which of those to wait for.
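
The "single linear control knob" assumption can be made concrete with a short Python model. Each setting waits for its own event and for every event ordered before it, so only prefixes of the ordering can be expressed. The `WaitLevel` names and the helper functions are hypothetical. The mapping in the comments to the synchronous_commit values discussed in this thread is an approximation for illustration.

```python
from enum import IntEnum


class WaitLevel(IntEnum):
    """Illustrative model of the linear knob, in the order given in the message."""
    LOCAL_APPLY = 0   # roughly synchronous_commit = off
    LOCAL_FLUSH = 1   # roughly synchronous_commit = local
    REMOTE_WRITE = 2  # roughly synchronous_commit = remote_write
    REMOTE_FLUSH = 3  # roughly synchronous_commit = on, with synchronous standbys
    REMOTE_APPLY = 4  # roughly synchronous_commit = remote_apply


def implied_waits(level: WaitLevel) -> set[WaitLevel]:
    """Choosing a level means waiting for it and for every event ordered before it."""
    return {w for w in WaitLevel if w <= level}


def expressible(wanted: set[WaitLevel]) -> bool:
    """A combination fits the single knob only if it is a prefix of the ordering."""
    return any(implied_waits(level) == wanted for level in WaitLevel)
```

Under this model, "visible on other servers without waiting for any disk" would be `{LOCAL_APPLY, REMOTE_APPLY}`. That set is not a prefix of the ordering, so `expressible` rejects it.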

Some pie-in-the-sky thoughts: we currently can't reach
"group-safe"[1], where you wait only for N servers to have the WAL in
memory (let's say that for us that means write but not flush): the
closest we can get is "1-safe and group-safe", using remote_write to
wait for the standbys to write (= "group-safe"), which implies local
flush (= "1-safe"). Now that'd be a terrible level to use unless your
recovery procedure included cluster-wide communication to straighten
things out, and without any such clusterware it makes a lot of sense
to have the master flush before sending, and I'm not actually
proposing we change that, I'm just speculating that someone might
eventually want it. We also can't have standbys apply before they
flush; as far as I know there is no theoretical reason why that
shouldn't be allowed, except maybe for some special synchronisation
steps around checkpoint records so that recovery doesn't get too far
ahead. That'd mirror what happens on the master more closely.
Imagine if you wanted to wait for your transaction to become visible
on certain other servers, but didn't want to wait for any disks:
that'd be the distributed equivalent of today's "off", but today's
"remote_apply" implies local flush and remote flush. Or more likely
you'd want some combination: 2-safe or group-safe on some subset of
servers to satisfy your durability requirements, and applied on some
other perhaps larger subset of servers for consistency. But this is
just water cooler handwaving.
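
Thomas's speculative combination replaces the linear knob with independent requirements: durability on one subset of servers and visibility on another. Below is a minimal Python sketch. It assumes each server reports written, flushed and applied positions as plain integers. `Progress`, `Requirement` and `can_release` are hypothetical names and are not a PostgreSQL API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Progress:
    """Hypothetical per-server acknowledgement state; LSNs are plain ints here."""
    written: int
    flushed: int
    applied: int


@dataclass(frozen=True)
class Requirement:
    """At least `count` of `servers` must have reached `stage` for the commit LSN."""
    servers: frozenset[str]
    count: int
    stage: str  # one of "written", "flushed", "applied"


def satisfied(req: Requirement, progress: dict[str, Progress], commit_lsn: int) -> bool:
    """Count the servers in the subset whose chosen stage has reached commit_lsn."""
    reached = sum(
        1
        for name in req.servers
        if name in progress and getattr(progress[name], req.stage) >= commit_lsn
    )
    return reached >= req.count


def can_release(reqs: list[Requirement], progress: dict[str, Progress], commit_lsn: int) -> bool:
    """A commit returns only when every independent requirement holds."""
    return all(satisfied(r, progress, commit_lsn) for r in reqs)
```

Here, something like "group-safe" is a `Requirement` with `stage="written"` over a set of servers, and a visibility requirement is a separate `Requirement` with `stage="applied"` over a possibly larger set. Neither requirement implies the other.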

[1]: https://infoscience.epfl.ch/record/49936/files/WS03

--
Thomas Munro
http://www.enterprisedb.com

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#7)
Re: synchronous_commit = remote_flush

On Sun, Aug 21, 2016 at 6:08 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

> Aside from the (IMHO) slightly confusing way that "on" works, which is
> the smaller issue I was raising in this thread, I agree that we might
> eventually want to escape from the assumption that "local apply" (=
> off), local flush, remote write, remote flush, remote apply happen in
> that order and therefore a single linear control knob can describe
> which of those to wait for.
>
> Some pie-in-the-sky thoughts: we currently can't reach
> "group-safe"[1], where you wait only for N servers to have the WAL in
> memory (let's say that for us that means write but not flush): the
> closest we can get is "1-safe and group-safe", using remote_write to
> wait for the standbys to write (= "group-safe"), which implies local
> flush (= "1-safe"). Now that'd be a terrible level to use unless your
> recovery procedure included cluster-wide communication to straighten
> things out, and without any such clusterware it makes a lot of sense
> to have the master flush before sending, and I'm not actually
> proposing we change that, I'm just speculating that someone might
> eventually want it. We also can't have standbys apply before they
> flush; as far as I know there is no theoretical reason why that
> shouldn't be allowed, except maybe for some special synchronisation
> steps around checkpoint records so that recovery doesn't get too far
> ahead.

Well, in order to remain recoverable, the standby has to obey the
WAL-before-data rule: if it writes a page with a given LSN, that LSN
had better be flushed to disk first. In practice, this means that if
you want a standby to remain recoverable without needing to contact
the rest of the cluster, you can't let its minimum recovery point pass
the WAL flush point. In short, this comes up anytime you evict a
buffer, not just around checkpoints.
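
Robert's point about the WAL-before-data rule can be sketched in Python under simplifying assumptions. The standby is allowed to replay WAL it has not yet flushed, positions are plain integers, and `Standby` is a hypothetical class rather than anything in PostgreSQL's recovery code. Evicting a dirty buffer either flushes WAL up to the page's LSN first, or it leaves a minimum recovery point that the standby's own WAL cannot reach.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    """Hypothetical standby state; LSNs are plain ints."""
    wal_flushed: int              # WAL known to be durable on the standby's own disk
    replayed: int                 # how far recovery has applied WAL (may exceed wal_flushed here)
    min_recovery_point: int = 0   # replay must reach this LSN before the data is consistent

    def evict_buffer(self, page_lsn: int, flush_first: bool = True) -> None:
        """Write out a dirty page whose latest change has LSN page_lsn."""
        if page_lsn > self.replayed:
            raise ValueError("a page cannot carry changes that were not replayed")
        if flush_first and page_lsn > self.wal_flushed:
            # WAL-before-data: make the WAL durable up to the page's LSN first.
            self.wal_flushed = page_lsn
        # Once the page is on disk, crash recovery must replay at least this far.
        self.min_recovery_point = max(self.min_recovery_point, page_lsn)

    def recoverable_locally(self) -> bool:
        """True if the standby can reach consistency from its own flushed WAL."""
        return self.min_recovery_point <= self.wal_flushed
```

Calling `evict_buffer(page_lsn, flush_first=False)` for a page newer than the flushed WAL models the case Robert warns about. The minimum recovery point passes the WAL flush point, and the standby can no longer recover without help from the rest of the cluster.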

> That'd mirror what happens on the master more closely.
> Imagine if you wanted to wait for your transaction to become visible
> on certain other servers, but didn't want to wait for any disks:
> that'd be the distributed equivalent of today's "off", but today's
> "remote_apply" implies local flush and remote flush. Or more likely
> you'd want some combination: 2-safe or group-safe on some subset of
> servers to satisfy your durability requirements, and applied on some
> other perhaps larger subset of servers for consistency. But this is
> just water cooler handwaving.

Sure, that stuff would be great, and we'll probably have to redesign
synchronous_commit entirely if and when we get there, but I'm not sure
it makes sense to tinker with it now just for that. The original
reason why I suggested the current design for synchronous_commit is to
avoid forcing people to set yet another GUC in order to use
synchronous replication. The default of 'on' means that you can just
configure synchronous_standby_names and away you go. Perhaps a better
design as we added more values would have been to keep
synchronous_commit as on/local/off and use a separate GUC, say,
synchronous_replication to define what "on" means: remote_write,
remote_flush, remote_apply, 2safe+groupsafe, or whatever. And when
synchronous_standby_names='' then the value of synchronous_replication
is ignored, and synchronous_commit=on means the same as
synchronous_commit=local, just as it does today.
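
Robert's alternative design can be expressed as a small resolution function. This is a sketch only. `synchronous_replication` is the hypothetical GUC proposed in the message, the accepted values are limited to the three existing remote levels (the "2safe+groupsafe" idea is left out), and `effective_wait` is an illustrative name.

```python
def effective_wait(synchronous_commit: str,
                   synchronous_replication: str,
                   synchronous_standby_names: str) -> str:
    """Resolve what a commit waits for under the proposed two-GUC design."""
    if synchronous_commit not in ("on", "local", "off"):
        raise ValueError(f"invalid synchronous_commit: {synchronous_commit!r}")
    if synchronous_commit == "off":
        return "off"
    if synchronous_commit == "local" or synchronous_standby_names == "":
        # With no synchronous standbys configured, "on" behaves like "local"
        # and synchronous_replication is ignored entirely.
        return "local"
    if synchronous_replication not in ("remote_write", "remote_flush", "remote_apply"):
        raise ValueError(f"invalid synchronous_replication: {synchronous_replication!r}")
    return synchronous_replication
```

With this design, a user who only sets synchronous_standby_names keeps the default synchronous_commit = on. What "on" waits for is then controlled separately, so single-node users never need to read the replication-specific values.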

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
