Sync Rep and shutdown Re: Sync Rep v19

Started by Fujii Masaoabout 15 years ago42 messageshackers
Jump to latest
#1Fujii Masao
masao.fujii@gmail.com

On Wed, Mar 9, 2011 at 2:14 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

On Tue, Mar 8, 2011 at 11:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The fast shutdown handling seems fine, but why not just handle smart
shutdown the same way?

currently, smart shutdown means no new connections, wait until
existing ones close normally. for consistency, it should behave the
same for sync rep.

Agreed. I think that user who wants to request smart shutdown expects all
the existing connections to basically be closed normally by the client. So it
doesn't seem to be good idea to forcibly close the connection and prevent
the COMMIT from being returned in smart shutdown case. But I'm all ears
for better suggestions.

Anyway, we got the consensus about how fast shutdown should work with
sync rep. So I created the patch. Please feel free to comment and commit
the patch first ;)

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

fast_shutdown_syncrep_v1.patchapplication/octet-stream; name=fast_shutdown_syncrep_v1.patchDownload+81-52
#2Yeb Havinga
yebhavinga@gmail.com
In reply to: Fujii Masao (#1)
Re: Sync Rep and shutdown Re: Sync Rep v19

On 2011-03-09 08:38, Fujii Masao wrote:

On Wed, Mar 9, 2011 at 2:14 PM, Jaime Casanova<jaime@2ndquadrant.com> wrote:

On Tue, Mar 8, 2011 at 11:58 AM, Robert Haas<robertmhaas@gmail.com> wrote:

The fast shutdown handling seems fine, but why not just handle smart
shutdown the same way?

currently, smart shutdown means no new connections, wait until
existing ones close normally. for consistency, it should behave the
same for sync rep.

Agreed. I think that user who wants to request smart shutdown expects all
the existing connections to basically be closed normally by the client. So it
doesn't seem to be good idea to forcibly close the connection and prevent
the COMMIT from being returned in smart shutdown case. But I'm all ears
for better suggestions.

For me smart has always been synonymous to no forced disconnects/exits,
or put different, the 'clean' solution, as opposite to the fast and
unclean shutdown.

An alternative for a clean solution might be to forbid smart shutdown,
if none of the sync standbys is connected. This would prevent the master
to enter a state in which a standby cannot connect anymore.

regards,
Yeb Havinga

#3Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#1)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 9, 2011 at 08:38, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Mar 9, 2011 at 2:14 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

On Tue, Mar 8, 2011 at 11:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The fast shutdown handling seems fine, but why not just handle smart
shutdown the same way?

currently, smart shutdown means no new connections, wait until
existing ones close normally. for consistency, it should behave the
same for sync rep.

Agreed. I think that user who wants to request smart shutdown expects all
the existing connections to basically be closed normally by the client. So it
doesn't seem to be good idea to forcibly close the connection and prevent
the COMMIT from being returned in smart shutdown case. But I'm all ears
for better suggestions.

"don't use smart shutdowns"? ;)

Anyway, for those that *do* use smart intentionally, I agree that
doing any kind of forced close at all is just plain wrong.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#1)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, 2011-03-09 at 16:38 +0900, Fujii Masao wrote:

On Wed, Mar 9, 2011 at 2:14 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:

On Tue, Mar 8, 2011 at 11:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The fast shutdown handling seems fine, but why not just handle smart
shutdown the same way?

currently, smart shutdown means no new connections, wait until
existing ones close normally. for consistency, it should behave the
same for sync rep.

Agreed. I think that user who wants to request smart shutdown expects all
the existing connections to basically be closed normally by the client. So it
doesn't seem to be good idea to forcibly close the connection and prevent
the COMMIT from being returned in smart shutdown case. But I'm all ears
for better suggestions.

Anyway, we got the consensus about how fast shutdown should work with
sync rep. So I created the patch. Please feel free to comment and commit
the patch first ;)

We're just about to publish Alpha4 with this feature in.

If we release waiters too early we will cause effective data loss, that
part is agreed. We've also accepted that there are few ways to release
the waiters.

I want to release the first version as "safe" and then relax from there
after feedback. What I don't want to hear is lots of complaints or
arguments about data loss from the first version. We can relax later,
after some time and thought.

So for now, I don't want to apply this patch or other similar ones that
seek to release waiters in various circumstances. That isn't a rejection
of you, its just a wish to play it safe and slowly.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#5Yeb Havinga
yebhavinga@gmail.com
In reply to: Simon Riggs (#4)
Re: Sync Rep and shutdown Re: Sync Rep v19

On 2011-03-09 15:10, Simon Riggs wrote:

On Wed, 2011-03-09 at 16:38 +0900, Fujii Masao wrote:

On Wed, Mar 9, 2011 at 2:14 PM, Jaime Casanova<jaime@2ndquadrant.com> wrote:

On Tue, Mar 8, 2011 at 11:58 AM, Robert Haas<robertmhaas@gmail.com> wrote:

The fast shutdown handling seems fine, but why not just handle smart
shutdown the same way?

currently, smart shutdown means no new connections, wait until
existing ones close normally. for consistency, it should behave the
same for sync rep.

Agreed. I think that user who wants to request smart shutdown expects all
the existing connections to basically be closed normally by the client. So it
doesn't seem to be good idea to forcibly close the connection and prevent
the COMMIT from being returned in smart shutdown case. But I'm all ears
for better suggestions.

Anyway, we got the consensus about how fast shutdown should work with
sync rep. So I created the patch. Please feel free to comment and commit
the patch first ;)

We're just about to publish Alpha4 with this feature in.

If we release waiters too early we will cause effective data loss, that
part is agreed. We've also accepted that there are few ways to release
the waiters.

I want to release the first version as "safe" and then relax from there
after feedback.

This is not safe and possible in the first version:

1) issue stop on master when no sync standby is connected:
mgrid@mg73:~$ pg_ctl -D /data stop
waiting for server to shut
down............................................................... failed
pg_ctl: server does not shut down

2) start the standby that failed
mgrid@mg72:~$ pg_ctl -D /data start
pg_ctl: another server might be running; trying to start server anyway
LOG: 00000: database system was interrupted while in recovery at log
time 2011-03-09 15:22:31 CET
HINT: If this has occurred more than once some data might be corrupted
and you might need to choose an earlier recovery target.
LOG: 00000: entering standby mode
LOG: 00000: redo starts at 57/1A000078
LOG: 00000: consistent recovery state reached at 57/1A0000A0
FATAL: XX000: could not connect to the primary server: FATAL: the
database system is shutting down

LOCATION: libpqrcv_connect, libpqwalreceiver.c:102
server starting
mgrid@mg72:~$ FATAL: XX000: could not connect to the primary server:
FATAL: the database system is shutting down

A safe solution would be to prevent smart shutdown on the master if it
is in sync mode and there are no sync standbys connected.

The current situation is definately unsafe because it forces people that
are in this state to do a fast shutdown.. but that fails as well, so
they are only left with immediate.

mgrid@mg73:~$ pg_ctl -D /data stop -m fast
waiting for server to shut
down............................................................... failed
pg_ctl: server does not shut down
mgrid@mg73:~$ pg_ctl -D /data stop -m immediate
waiting for server to shut down.... done
server stopped

regards,
Yeb Havinga

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Yeb Havinga (#5)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, 2011-03-09 at 15:37 +0100, Yeb Havinga wrote:

The current situation is definately unsafe because it forces people
that are in this state to do a fast shutdown.. but that fails as well,
so they are only left with immediate.

All the more reason not to change anything, since we disagree.

The idea is that you're supposed to wait for the standby to come back up
or do failover. If you shutdown the master its because you are choosing
to failover.

Shutting down the master and restarting without failover leads to a
situation where some sync rep commits are not on both master and
standby. Making it easier to shutdown encourages that, which I don't
wish to do, at this stage.

We may decide that this is the right approach but I don't wish to rush
into that decision. I want to have clear agreement about all the changes
we want and what we call them if we do them. Zero data loss is
ultimately about users having confidence in us, not about specific
features. Our disagreements on this patch risk damaging that confidence,
whoever is right/wrong.

Further changes can be made over the course of the next few weeks, based
upon feedback from a wider pool of potential users.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#7Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#6)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Thu, Mar 10, 2011 at 12:03 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2011-03-09 at 15:37 +0100, Yeb Havinga wrote:

The current situation is definately unsafe because it forces people
that are in this state to do a fast shutdown.. but that fails as well,
so they are only left with immediate.

I agree with Yeb.

All the more reason not to change anything, since we disagree.

The idea is that you're supposed to wait for the standby to come back up
or do failover. If you shutdown the master its because you are choosing
to failover.

Shutting down the master and restarting without failover leads to a
situation where some sync rep commits are not on both master and
standby. Making it easier to shutdown encourages that, which I don't
wish to do, at this stage.

I'm not sure I follow you. The proposed fast shutdown prevents the
backends which have not received the ACK from the standby yet
from returning the "success" to the client. So even after restarting
the server, there is no data loss from client's point of view. If this is
really unsafe, we *must* forbid immediate shutdown while backend
is waiting for sync rep. Because immediate shutdown creates the
same situation.

What scenario are you concerned?

We may decide that this is the right approach but I don't wish to rush
into that decision. I want to have clear agreement about all the changes
we want and what we call them if we do them. Zero data loss is
ultimately about users having confidence in us, not about specific
features. Our disagreements on this patch risk damaging that confidence,
whoever is right/wrong.

Same as above. I think that it's more problematic to leave the code
as it is. Because smart/fast shutdown can make the server get stuck
until immediate shutdown is requested.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#8Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#7)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 9, 2011 at 11:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Same as above. I think that it's more problematic to leave the code
as it is. Because smart/fast shutdown can make the server get stuck
until immediate shutdown is requested.

I agree that the current state of affairs is a problem. However,
after looking through the code somewhat carefully, it looks a bit
difficult to fix. Suppose that backend A is waiting for sync rep. A
fast shutdown is performed. Right now, backend A shrugs its shoulders
and does nothing. Not good. But suppose we change it so that backend
A closes the connection and exits without either confirming the commit
or throwing ERROR/FATAL. That seems like correct behavior, since, if
we weren't using sync rep, the client would have to interpret that as
indicating that the connection denied in mid-COMMIT, and mustn't
assume anything about the state of the transaction. So far so good.

The problem is that there may be another backend B waiting on a lock
held by A. If backend A exits cleanly (without a PANIC), it will
remove itself from the ProcArray and release locks. That wakes up A,
which can now go do its thing. If the operating system is a bit on
the slow side delivering the signal to B, then the client to which B
is connected might manage to see a database state that shows the
transaction previous running in A as committed, even though that
transaction wasn't committed. That would stink, because the whole
point of having A hold onto locks until the standby ack'd the commit
was that no other transaction would see it as committed until it was
replicated.

This is a pretty unlikely race condition in practice but people who
are running sync rep are intending precisely to guard against unlikely
failure scenarios.

The only idea I have for allowing fast shutdown to still be fast, even
when sync rep is involved, is to shut down the system in two phases.
The postmaster would need to stop accepting new connections, and first
kill off all the backends that aren't waiting for sync rep. Then,
once all remaining backends are waiting for sync rep, we can have them
proceed as above: close the connection without acking the commit or
throwing ERROR/FATAL, and exit. That's pretty complicated, especially
given the rule that the postmaster mustn't touch shared memory, but I
don't see any alternative. We could just not allow fast shutdown, as
now, but I think that's worse.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#8)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 16, 2011 at 11:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The problem is that there may be another backend B waiting on a lock
held by A.  If backend A exits cleanly (without a PANIC), it will
remove itself from the ProcArray and release locks.  That wakes up A,
which can now go do its thing.  If the operating system is a bit on
the slow side delivering the signal to B, then the client to which B
is connected might manage to see a database state that shows the
transaction previous running in A as committed, even though that
transaction wasn't committed.  That would stink, because the whole
point of having A hold onto locks until the standby ack'd the commit
was that no other transaction would see it as committed until it was
replicated.

The lock can be released also when the transaction running in A is
rollbacked. So I could not understand why the client wrongly always
see the transaction as commtted even though it's not committed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#8)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Tue, 2011-03-15 at 22:07 -0400, Robert Haas wrote:

On Wed, Mar 9, 2011 at 11:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Same as above. I think that it's more problematic to leave the code
as it is. Because smart/fast shutdown can make the server get stuck
until immediate shutdown is requested.

I agree that the current state of affairs is a problem. However,
after looking through the code somewhat carefully, it looks a bit
difficult to fix. Suppose that backend A is waiting for sync rep. A
fast shutdown is performed. Right now, backend A shrugs its shoulders
and does nothing. Not good. But suppose we change it so that backend
A closes the connection and exits without either confirming the commit
or throwing ERROR/FATAL. That seems like correct behavior, since, if
we weren't using sync rep, the client would have to interpret that as
indicating that the connection denied in mid-COMMIT, and mustn't
assume anything about the state of the transaction. So far so good.

The problem is that there may be another backend B waiting on a lock
held by A. If backend A exits cleanly (without a PANIC), it will
remove itself from the ProcArray and release locks. That wakes up A,
which can now go do its thing. If the operating system is a bit on
the slow side delivering the signal to B, then the client to which B
is connected might manage to see a database state that shows the
transaction previous running in A as committed, even though that
transaction wasn't committed. That would stink, because the whole
point of having A hold onto locks until the standby ack'd the commit
was that no other transaction would see it as committed until it was
replicated.

This is a pretty unlikely race condition in practice but people who
are running sync rep are intending precisely to guard against unlikely
failure scenarios.

The only idea I have for allowing fast shutdown to still be fast, even
when sync rep is involved, is to shut down the system in two phases.
The postmaster would need to stop accepting new connections, and first
kill off all the backends that aren't waiting for sync rep. Then,
once all remaining backends are waiting for sync rep, we can have them
proceed as above: close the connection without acking the commit or
throwing ERROR/FATAL, and exit. That's pretty complicated, especially
given the rule that the postmaster mustn't touch shared memory, but I
don't see any alternative.

We could just not allow fast shutdown, as
now, but I think that's worse.

Please explain why not allowing fast shutdown makes it worse?

For me, I'd rather not support a whole bunch of dubious code, just to
allow you to type -m fast when you can already type -m immediate.

What extra capability are we actually delivering by doing that??
The risk of introducing a bug and thereby losing data far outweighs the
rather dubious benefit.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#11Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#9)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 16, 2011 at 1:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Mar 16, 2011 at 11:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The problem is that there may be another backend B waiting on a lock
held by A.  If backend A exits cleanly (without a PANIC), it will
remove itself from the ProcArray and release locks.  That wakes up A,
which can now go do its thing.  If the operating system is a bit on
the slow side delivering the signal to B, then the client to which B
is connected might manage to see a database state that shows the
transaction previous running in A as committed, even though that
transaction wasn't committed.  That would stink, because the whole
point of having A hold onto locks until the standby ack'd the commit
was that no other transaction would see it as committed until it was
replicated.

The lock can be released also when the transaction running in A is
rollbacked. So I could not understand why the client wrongly always
see the transaction as commtted even though it's not committed.

The transaction IS committed, but only locally.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#10)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 16, 2011 at 4:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Tue, 2011-03-15 at 22:07 -0400, Robert Haas wrote:

On Wed, Mar 9, 2011 at 11:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Same as above. I think that it's more problematic to leave the code
as it is. Because smart/fast shutdown can make the server get stuck
until immediate shutdown is requested.

I agree that the current state of affairs is a problem.  However,
after looking through the code somewhat carefully, it looks a bit
difficult to fix.  Suppose that backend A is waiting for sync rep.  A
fast shutdown is performed.  Right now, backend A shrugs its shoulders
and does nothing.  Not good.  But suppose we change it so that backend
A closes the connection and exits without either confirming the commit
or throwing ERROR/FATAL.  That seems like correct behavior, since, if
we weren't using sync rep, the client would have to interpret that as
indicating that the connection denied in mid-COMMIT, and mustn't
assume anything about the state of the transaction.  So far so good.

The problem is that there may be another backend B waiting on a lock
held by A.  If backend A exits cleanly (without a PANIC), it will
remove itself from the ProcArray and release locks.  That wakes up A,
which can now go do its thing.  If the operating system is a bit on
the slow side delivering the signal to B, then the client to which B
is connected might manage to see a database state that shows the
transaction previous running in A as committed, even though that
transaction wasn't committed.  That would stink, because the whole
point of having A hold onto locks until the standby ack'd the commit
was that no other transaction would see it as committed until it was
replicated.

This is a pretty unlikely race condition in practice but people who
are running sync rep are intending precisely to guard against unlikely
failure scenarios.

The only idea I have for allowing fast shutdown to still be fast, even
when sync rep is involved, is to shut down the system in two phases.
The postmaster would need to stop accepting new connections, and first
kill off all the backends that aren't waiting for sync rep.  Then,
once all remaining backends are waiting for sync rep, we can have them
proceed as above: close the connection without acking the commit or
throwing ERROR/FATAL, and exit.  That's pretty complicated, especially
given the rule that the postmaster mustn't touch shared memory, but I
don't see any alternative.

We could just not allow fast shutdown, as
now, but I think that's worse.

Please explain why not allowing fast shutdown makes it worse?

For me, I'd rather not support a whole bunch of dubious code, just to
allow you to type -m fast when you can already type -m immediate.

What extra capability are we actually delivering by doing that??
The risk of introducing a bug and thereby losing data far outweighs the
rather dubious benefit.

Well, my belief is that when users ask the database to shut down, they
want it to work. If I'm the only one who thinks that, then whatever.
But I firmly believe we'll get bug reports about this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#12)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 16, 2011 at 7:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The only idea I have for allowing fast shutdown to still be fast, even
when sync rep is involved, is to shut down the system in two phases.
The postmaster would need to stop accepting new connections, and first
kill off all the backends that aren't waiting for sync rep.  Then,
once all remaining backends are waiting for sync rep, we can have them
proceed as above: close the connection without acking the commit or
throwing ERROR/FATAL, and exit.  That's pretty complicated, especially
given the rule that the postmaster mustn't touch shared memory, but I
don't see any alternative.

What extra capability are we actually delivering by doing that??
The risk of introducing a bug and thereby losing data far outweighs the
rather dubious benefit.

Well, my belief is that when users ask the database to shut down, they
want it to work.  If I'm the only one who thinks that, then whatever.
But I firmly believe we'll get bug reports about this.

On further review, the approach proposed above doesn't really work,
because a backend can get a SIGTERM either because the system is doing
a fast shutdown or because a user has issued
pg_terminate_backend(PID); and in the latter case we have to continue
letting in connections.

As of right now, synchronous replication continues to wait even when:

- someone tries to perform a fast shutdown
- someone tries to kill the backend using pg_terminate_backend()
- someone attempts to cancel the query using pg_cancel_backend() or by
pressing control-C in, for example, psql
- someone attempts to shut off synchronous replication by changing
synchronous_standby_names in postgresql.conf and issuing pg_ctl reload

We've worked pretty hard to ensure that things like query cancel and
shutdown work quickly and reliably, and I don't think we want to make
synchronous replication the one part of the system that departs from
that general principle.

So, patch attached. This patch arranges to do the following things:

1. If a die interrupt is received (pg_terminate_backend or fast
shutdown), then terminate the sync rep wait and arrange for the
connection to be closed without acknowledging the commit (but do send
a warning message back). The commit still happened, though, so other
transactions will see its effects. This is unavoidable unless we're
willing to either ignore attempts to terminate a backend waiting for
sync rep, or panic the system when it happens, and I don't think
either of those is appropriate.

2. If a query cancel interrupt is received (pg_cancel_backend or ^C),
then cancel the sync rep wait and issue a warning before acknowledging
the commit. Again, the alternative is to either ignore the cancel or
panic, neither of which I believe to be what users will want.

3. If synchronous_standby_names is changed to '' by editing
postgresql.conf and issuing pg_ctl reload, then cancel all waits in
progress and wake everybody up. As I mentioned before, reloading the
config file from within the waiting backend (which can't safely throw
an error) seems risky, so what I did instead is made WAL writer
responsible for handling this. Nobody's allowed to wait for sync rep
unless a global shared memory flag is set, and the WAL writer process
is responsible for setting and clearing this flag when the config file
is reloaded. This has basically no performance cost; WAL writer only
ever does any extra work at all with this code when it receives a
SIGHUP, and even then the work is trivial except in the case where
synchronous_standby_names has changed from empty to non-empty or visca
versa. The advantage of putting this in WAL writer rather than, say,
bgwriter is that WAL writer doesn't have nearly as many jobs to do and
they don't involve nearly as much I/O, so the chances of a long delay
due to the process being busy are much less.

4. Remove the SYNC_REP_MUST_DISCONNECT state, which actually does
absolutely nothing right now, despite what the name would seem to
imply. In particular, it doesn't arrange for any sort of disconnect.
This patch does arrange for that, but not using this mechanism.

5. The existing code relies on being able to read MyProc->syncRepState
without holding the lock, even while a WAL sender must be updating it
in another process. I'm not 100% sure this is safe on a
multi-processor machine with weak memory ordering. In practice, the
chances of something going wrong here seem extremely small. You'd
need something like this: a WAL sender updates MyProc->syncRepState
just after the wait timeout expires and before the latch is reset, but
the regular backend fails to see the state change due to
memory-ordering effects and drops through the loop, waiting another 60
s, and then finally wakes up and completes the wait (but a minute
later than expected). That seems vanishingly unlikely but it's also
simple to protect against, so I did.

Review appreciated.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

sync-rep-wait-fixes.patchapplication/octet-stream; name=sync-rep-wait-fixes.patchDownload+311-201
#14Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#13)
Re: Sync Rep and shutdown Re: Sync Rep v19

Robert Haas <robertmhaas@gmail.com> writes:

1. If a die interrupt is received (pg_terminate_backend or fast
shutdown), then terminate the sync rep wait and arrange for the
connection to be closed without acknowledging the commit (but do send
a warning message back). The commit still happened, though, so other
transactions will see its effects. This is unavoidable unless we're
willing to either ignore attempts to terminate a backend waiting for
sync rep, or panic the system when it happens, and I don't think
either of those is appropriate.

Is it possible to force the standby out here, so that logs show that
there was something going on wrt replication?

2. If a query cancel interrupt is received (pg_cancel_backend or ^C),
then cancel the sync rep wait and issue a warning before acknowledging
the commit. Again, the alternative is to either ignore the cancel or
panic, neither of which I believe to be what users will want.

Or force the standby to disconnect.

In both those cases what we have is a situation were either we can't
satisfy the user request or we can't continue to offer sync rep. You're
saying that we have to satisfy the user's query, so I say kick off sync
rep or it does not make any sense.

3. If synchronous_standby_names is changed to '' by editing
postgresql.conf and issuing pg_ctl reload, then cancel all waits in
progress and wake everybody up. As I mentioned before, reloading the

Ok.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#15Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#14)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 16, 2011 at 6:23 PM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

1. If a die interrupt is received (pg_terminate_backend or fast
shutdown), then terminate the sync rep wait and arrange for the
connection to be closed without acknowledging the commit (but do send
a warning message back).  The commit still happened, though, so other
transactions will see its effects.  This is unavoidable unless we're
willing to either ignore attempts to terminate a backend waiting for
sync rep, or panic the system when it happens, and I don't think
either of those is appropriate.

Is it possible to force the standby out here, so that logs show that
there was something going on wrt replication?

That's an interesting idea, but I think it might be too much spooky
action at a distance. I think we should look at getting Fujii Masao's
replication_timeout patch committed; that seems like the right way to
kick out unresponsive standbys. Another problem with doing it here is
that any ERROR will turn into a PANIC, which rules out doing anything
very complicated. Also note that we can (and do) log a WARNING, which
I think answers your concern about having something in the logs wrt
replication.

A further point is that even if we could kick out the standby, it'd
presumably reconnect after the usual 2 s interval, so it doesn't seem
like it really accomplishes much. We can't just unilaterally decide
that it is no longer allowed to be a sync standby ever again; that's
controlled by postgresql.conf.

I think the most important part of all this is that it is logged.
Anyone who is running synchronous replication should also be doing
careful monitoring; if not, shame on them, because if your data is
important enough that you need synchronous replication, it's surely
important enough to watch the logs. If you don't, all sorts of bad
things can happen to your data (either related to sync rep, or
otherwise) and you'll have no idea until it's far too late.

2. If a query cancel interrupt is received (pg_cancel_backend or ^C),
then cancel the sync rep wait and issue a warning before acknowledging
the commit.  Again, the alternative is to either ignore the cancel or
panic, neither of which I believe to be what users will want.

Or force the standby to disconnect.

In both those cases what we have is a situation were either we can't
satisfy the user request or we can't continue to offer sync rep.  You're
saying that we have to satisfy the user's query, so I say kick off sync
rep or it does not make any sense.

Same considerations here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Aidan Van Dyk
aidan@highrise.ca
In reply to: Robert Haas (#15)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Wed, Mar 16, 2011 at 8:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the most important part of all this is that it is logged.
Anyone who is running synchronous replication should also be doing
careful monitoring; if not, shame on them, because if your data is
important enough that you need synchronous replication, it's surely
important enough to watch the logs.  If you don't, all sorts of bad
things can happen to your data (either related to sync rep, or
otherwise) and you'll have no idea until it's far too late.

+++++

If your data is that important, your logs/monitoring are *equally*
important, because they are what give you confidence your data is as
safe as you think it is...

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

#17Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#13)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Thu, Mar 17, 2011 at 2:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:

1. If a die interrupt is received (pg_terminate_backend or fast
shutdown), then terminate the sync rep wait and arrange for the
connection to be closed without acknowledging the commit (but do send
a warning message back).  The commit still happened, though, so other
transactions will see its effects.  This is unavoidable unless we're
willing to either ignore attempts to terminate a backend waiting for
sync rep, or panic the system when it happens, and I don't think
either of those is appropriate.

OK.

2. If a query cancel interrupt is received (pg_cancel_backend or ^C),
then cancel the sync rep wait and issue a warning before acknowledging
the commit.  Again, the alternative is to either ignore the cancel or
panic, neither of which I believe to be what users will want.

OK.

3. If synchronous_standby_names is changed to '' by editing
postgresql.conf and issuing pg_ctl reload, then cancel all waits in
progress and wake everybody up.  As I mentioned before, reloading the
config file from within the waiting backend (which can't safely throw
an error) seems risky,

AFAIR, ProcessConfigFile() doesn't throw an error. So I don't think
that's so risky. But, as you said in another thread, reading config file
at that point is inconsistent, I agree. And it seems better to leave
background process to wake up backends.

so what I did instead is made WAL writer
responsible for handling this.  Nobody's allowed to wait for sync rep
unless a global shared memory flag is set, and the WAL writer process
is responsible for setting and clearing this flag when the config file
is reloaded.  This has basically no performance cost; WAL writer only
ever does any extra work at all with this code when it receives a
SIGHUP, and even then the work is trivial except in the case where
synchronous_standby_names has changed from empty to non-empty or visca
versa.  The advantage of putting this in WAL writer rather than, say,
bgwriter is that WAL writer doesn't have nearly as many jobs to do and
they don't involve nearly as much I/O, so the chances of a long delay
due to the process being busy are much less.

This occurs to me; we should ensure that, in shutdown case, walwriter
should exit after all the backends have gone out? I'm not sure if it's worth
thinking of the case, but what if synchronous_standby_names is unset
and config file is reloaded after smart shutdown is requested? In this
case, the reload cannot wake up the waiting backends since walwriter
has already exited. This behavior looks a bit inconsistent.

4. Remove the SYNC_REP_MUST_DISCONNECT state, which actually does
absolutely nothing right now, despite what the name would seem to
imply.  In particular, it doesn't arrange for any sort of disconnect.
This patch does arrange for that, but not using this mechanism.

OK.

5. The existing code relies on being able to read MyProc->syncRepState
without holding the lock, even while a WAL sender must be updating it
in another process.  I'm not 100% sure this is safe on a
multi-processor machine with weak memory ordering.  In practice, the
chances of something going wrong here seem extremely small.  You'd
need something like this: a WAL sender updates MyProc->syncRepState
just after the wait timeout expires and before the latch is reset, but
the regular backend fails to see the state change due to
memory-ordering effects and drops through the loop, waiting another 60
s, and then finally wakes up and completes the wait (but a minute
later than expected).  That seems vanishingly unlikely but it's also
simple to protect against, so I did.

In the patch, in order to read the latest value, you take a light-weight lock.
But I wonder why taking a lock can ensure that the value is up-to-date.

Review appreciated.

Thanks! Here are some comments:

+ * WAL writer calls this as needed to update the shared sync_standbys_needed

Typo: s/sync_standbys_needed/sync_standbys_defined

+ * we exit normally, or SYNC_REP_MUST_DISCONNECT in abnormal cases.

Typo: the reference to SYNC_REP_MUST_DISCONNECT is not required.

+ * So in this case we issue a NOTICE (which some clients may

Typo: s/NOTICE/WARNING

+		if (ProcDiePending)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_ADMIN_SHUTDOWN),
+					 errmsg("canceling the wait for replication and terminating
connection due to administrator command"),
+					 errdetail("The transaction has already been committed locally
but might have not been replicated to the standby.")));
+			whereToSendOutput = DestNone;
+			LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+			MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
+			SHMQueueDelete(&(MyProc->syncRepLinks));

SHMQueueIsDetached() should be checked before calling SHMQueueDelete().
Walsender can already delete the backend from the queue before reaching here.

+		if (QueryCancelPending)
+		{
+			QueryCancelPending = false;
+			ereport(WARNING,
+					(errmsg("canceling wait for synchronous replication due to user request"),
+					 errdetail("The transaction has committed locally, but may not
have replicated to the standby.")));
+			LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+			MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
+			SHMQueueDelete(&(MyProc->syncRepLinks));
+			LWLockRelease(SyncRepLock);

Same as above.

+		if (!PostmasterIsAlive(true))
+		{
+			whereToSendOutput = DestNone;
+			proc_exit(1);

proc_exit() should not be called at that point because it leads PANIC.

I think that it's better to check ProcDiePending, QueryCancelPending
and PostmasterIsAlive *before* waiting on the latch, not after. Because
those events can occur before reaching there, and it's not worth waiting
for 60 seconds to detect them.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#18Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#13)
Re: Sync Rep and shutdown Re: Sync Rep v19

On 16.03.2011 19:35, Robert Haas wrote:

3. If synchronous_standby_names is changed to '' by editing
postgresql.conf and issuing pg_ctl reload, then cancel all waits in
progress and wake everybody up. As I mentioned before, reloading the
config file from within the waiting backend (which can't safely throw
an error) seems risky, so what I did instead is made WAL writer
responsible for handling this. Nobody's allowed to wait for sync rep
unless a global shared memory flag is set, and the WAL writer process
is responsible for setting and clearing this flag when the config file
is reloaded. This has basically no performance cost; WAL writer only
ever does any extra work at all with this code when it receives a
SIGHUP, and even then the work is trivial except in the case where
synchronous_standby_names has changed from empty to non-empty or visca
versa. The advantage of putting this in WAL writer rather than, say,
bgwriter is that WAL writer doesn't have nearly as many jobs to do and
they don't involve nearly as much I/O, so the chances of a long delay
due to the process being busy are much less.

Hmm, so setting synchronous_standby_names to '' takes effect
immediately, but other changes to it don't apply to already-blocked
commits. That seems a bit inconsistent. Perhaps walwriter should store
the parsed list of standby-names in shared memory, not just a boolean.

+1 otherwise.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#17)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Thu, Mar 17, 2011 at 2:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

This occurs to me; we should ensure that, in shutdown case, walwriter
should exit after all the backends have gone out? I'm not sure if it's worth
thinking of the case, but what if synchronous_standby_names is unset
and config file is reloaded after smart shutdown is requested? In this
case, the reload cannot wake up the waiting backends since walwriter
has already exited. This behavior looks a bit inconsistent.

I agree we need to fix smart shutdown. I think that's going to have
to be a separate patch, however; it's got more problems than this
patch can fix without expanding into a monster.

In the patch, in order to read the latest value, you take a light-weight lock.
But I wonder why taking a lock can ensure that the value is up-to-date.

A lock acquisition acts as a memory sequence point.

+ * WAL writer calls this as needed to update the shared sync_standbys_needed

Typo: s/sync_standbys_needed/sync_standbys_defined

Fixed.

+ * we exit normally, or SYNC_REP_MUST_DISCONNECT in abnormal cases.

Typo: the reference to SYNC_REP_MUST_DISCONNECT is not required.

Fixed.

+                * So in this case we issue a NOTICE (which some clients may

Typo: s/NOTICE/WARNING

Fixed.

+               if (ProcDiePending)
+               {
+                       ereport(WARNING,
+                                       (errcode(ERRCODE_ADMIN_SHUTDOWN),
+                                        errmsg("canceling the wait for replication and terminating
connection due to administrator command"),
+                                        errdetail("The transaction has already been committed locally
but might have not been replicated to the standby.")));
+                       whereToSendOutput = DestNone;
+                       LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+                       MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
+                       SHMQueueDelete(&(MyProc->syncRepLinks));

SHMQueueIsDetached() should be checked before calling SHMQueueDelete().
Walsender can already delete the backend from the queue before reaching here.

Fixed. But come to think of it, doesn't this mean
SyncRepCleanupAtProcExit() needs to repeat the test after acquiring
the lock?

+               if (QueryCancelPending)
+               {
+                       QueryCancelPending = false;
+                       ereport(WARNING,
+                                       (errmsg("canceling wait for synchronous replication due to user request"),
+                                        errdetail("The transaction has committed locally, but may not
have replicated to the standby.")));
+                       LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+                       MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
+                       SHMQueueDelete(&(MyProc->syncRepLinks));
+                       LWLockRelease(SyncRepLock);

Same as above.

Fixed.

+               if (!PostmasterIsAlive(true))
+               {
+                       whereToSendOutput = DestNone;
+                       proc_exit(1);

proc_exit() should not be called at that point because it leads PANIC.

Fixed, although I'm not sure it matters.

I think that it's better to check ProcDiePending, QueryCancelPending
and PostmasterIsAlive *before* waiting on the latch, not after. Because
those events can occur before reaching there, and it's not worth waiting
for 60 seconds to detect them.

Not necessary. Inspired by one of your earlier patches, I made die()
and StatementCancelHandler() set the latch. We could still wait up to
60 s to detect postmaster death, but that's a very rare situation and
it's not worth slowing down the common case for it, especially since
there is a race condition no matter what.

Thanks for the review!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

sync-rep-wait-fixes-v2.patchapplication/octet-stream; name=sync-rep-wait-fixes-v2.patchDownload+214-114
#20Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#18)
Re: Sync Rep and shutdown Re: Sync Rep v19

On Thu, Mar 17, 2011 at 8:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Hmm, so setting synchronous_standby_names to '' takes effect immediately,
but other changes to it don't apply to already-blocked commits. That seems a
bit inconsistent. Perhaps walwriter should store the parsed list of
standby-names in shared memory, not just a boolean.

I don't think this is necessary. In general, the current or potential
WAL sender processes are responsible for working out among themselves
whose job it is to release waiters, and doing it. As long as
synchronous_standby_names is non-empty, then either (1) there are one
or more standbys connected that can take on the role of synchronous
standby, and whoever does will release waiters or (2) there are no
standbys connected that can take on the role of synchronous standbys,
in which case no waiters should be released until one connects. But
when synchronous_standby_names becomes completely empty, that doesn't
mean "wait until a standby connects whose application name is in the
empty set and make him the synchronous standby" but rather
"synchronous replication is administratively disabled, don't wait in
the first place". So we just need a Boolean flag.

+1 otherwise.

Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#13)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#21)
#23Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#23)
#25Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#24)
#26Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Davis (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#26)
#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#19)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#28)
#30Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#29)
#33Yeb Havinga
yebhavinga@gmail.com
In reply to: Robert Haas (#29)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Yeb Havinga (#33)
#35Yeb Havinga
yebhavinga@gmail.com
In reply to: Robert Haas (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Yeb Havinga (#35)
#37Yeb Havinga
yebhavinga@gmail.com
In reply to: Robert Haas (#36)
#38Robert Haas
robertmhaas@gmail.com
In reply to: Yeb Havinga (#37)
#39Yeb Havinga
yebhavinga@gmail.com
In reply to: Robert Haas (#38)
#40Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#39)
#41Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Yeb Havinga (#41)