Sync Rep v19

Started by Simon Riggsabout 15 years ago91 messageshackers
Jump to latest
#1Simon Riggs
simon@2ndQuadrant.com

Latest version of Sync Rep, which includes substantial internal changes
and simplifications from previous version. (25-30 changes).

Includes all outstanding technical comments, typos and docs. I will
continue to work on self review and test myself, though actively
encourage others to test and report issues.

Interesting changes

* docs updated

* names listed in synchronous_standby_names are now in priority order

* synchronous_standby_names = "*" matches all standby names

* pg_stat_replication now shows standby priority - this is an ordinal
number so "1" means 1st, "2" means 2nd etc, though 0 means "not a sync
standby".

The only *currently* outstanding point of discussion is the "when to
wait" debate, which we aren't moving quickly towards consensus on at
this stage. I see that as a "How should it work?" debate and something
we can chew over during Alpha/Beta, not as an immediate blocker to
commit.

Please comment on the patch and also watch changes to the repo
git://github.com/simon2ndQuadrant/postgres.git

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

sync_rep.v19.context.patchtext/x-patch; charset=UTF-8; name=sync_rep.v19.context.patchDownload+1190-33
#2Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#1)
Re: Sync Rep v19

On Thu, Mar 3, 2011 at 7:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

Latest version of Sync Rep, which includes substantial internal changes
and simplifications from previous version. (25-30 changes).

Includes all outstanding technical comments, typos and docs. I will
continue to work on self review and test myself, though actively
encourage others to test and report issues.

Thanks for the patch!

* synchronous_standby_names = "*" matches all standby names

Using '*' as the default seems to lead the performance degradation by
being connected from unexpected synchronous standby.

* pg_stat_replication now shows standby priority - this is an ordinal
number so "1" means 1st, "2" means 2nd etc, though 0 means "not a sync
standby".

monitoring.sgml should be updated.

Though I've not read whole of the patch yet, here is the current comment:

Using MyProc->lwWaiting and lwWaitLink for backends to wait for replication
looks fragile. Since they are used also by lwlock, the value of them can be
changed unexpectedly. Instead, how about defining dedicated variables for
replication?

+		else if (WaitingForSyncRep)
+		{
+			/*
+			 * This must NOT be a FATAL message. We want the state of the
+			 * transaction being aborted to be indeterminate to ensure that
+			 * the transaction completion guarantee is never broken.
+			 */

The backend can reach this code path after returning the commit to the client.
Instead, how about doing this in EndCommand, to close the connection before
returning the commit?

+		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+		sync_priority = walsnd->sync_standby_priority;
+		LWLockRelease(SyncRepLock);

LW_SHARE can be used here, instead.

+			/*
+			 * Wait no longer if we have already reached our LSN
+			 */
+			if (XLByteLE(XactCommitLSN, queue->lsn))
+			{
+				/* No need to wait */
+				LWLockRelease(SyncRepLock);
+				return;
+			}

It might take long to acquire SyncRepLock, so how about comparing
our LSN with WalSnd->flush before here?

replication_timeout_client depends on GetCurrentTransactionStopTimestamp().
In COMMIT case, it's OK. But In PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED cases, it seems problematic because they don't call
SetCurrentTransactionStopTimestamp().

In SyncRepWaitOnQueue, the backend can theoretically call WaitLatch() again
after the wake-up from the latch. In this case, the "timeout" should
be calculated
again. Otherwise, it would take unexpectedly very long to cause the timeout.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#2)
Re: Sync Rep v19

On Fri, 2011-03-04 at 00:02 +0900, Fujii Masao wrote:

* synchronous_standby_names = "*" matches all standby names

Using '*' as the default seems to lead the performance degradation by
being connected from unexpected synchronous standby.

You can configure it however you wish. It seemed better to have an out
of the box setting that was useful.

* pg_stat_replication now shows standby priority - this is an ordinal
number so "1" means 1st, "2" means 2nd etc, though 0 means "not a sync
standby".

monitoring.sgml should be updated.

Didn't think it needed to be, but I've added a few lines to explain.

Though I've not read whole of the patch yet, here is the current comment:

Using MyProc->lwWaiting and lwWaitLink for backends to wait for replication
looks fragile. Since they are used also by lwlock, the value of them can be
changed unexpectedly. Instead, how about defining dedicated variables for
replication?

Yes, I think the queue stuff needs a rewrite now.

+		else if (WaitingForSyncRep)
+		{
+			/*
+			 * This must NOT be a FATAL message. We want the state of the
+			 * transaction being aborted to be indeterminate to ensure that
+			 * the transaction completion guarantee is never broken.
+			 */

The backend can reach this code path after returning the commit to the client.
Instead, how about doing this in EndCommand, to close the connection before
returning the commit?

OK, will look.

+		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+		sync_priority = walsnd->sync_standby_priority;
+		LWLockRelease(SyncRepLock);

LW_SHARE can be used here, instead.

Seemed easier to keep it simple and have all lockers use LW_EXCLUSIVE.
But I've changed it for you.

+			/*
+			 * Wait no longer if we have already reached our LSN
+			 */
+			if (XLByteLE(XactCommitLSN, queue->lsn))
+			{
+				/* No need to wait */
+				LWLockRelease(SyncRepLock);
+				return;
+			}

It might take long to acquire SyncRepLock, so how about comparing
our LSN with WalSnd->flush before here?

If we're not the sync standby and we need to takeover the role of sync
standby we may need to issue a wakeup even though our standby reached
that LSN some time before. So we need to check each time.

replication_timeout_client depends on GetCurrentTransactionStopTimestamp().
In COMMIT case, it's OK. But In PREPARE TRANSACTION, COMMIT PREPARED
and ROLLBACK PREPARED cases, it seems problematic because they don't call
SetCurrentTransactionStopTimestamp().

Shame on them!

Seems reasonable that they should call
SetCurrentTransactionStopTimestamp().

I don't want to make a special case there for prepared transactions.

In SyncRepWaitOnQueue, the backend can theoretically call WaitLatch() again
after the wake-up from the latch. In this case, the "timeout" should
be calculated
again. Otherwise, it would take unexpectedly very long to cause the timeout.

That was originally modelled on on the way the statement_timeout timer
works. If it gets nudged and wakes up too early it puts itself back to
sleep to wakeup at the same time again.

I've renamed the variables to make that clearer and edited slightly.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#4Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Simon Riggs (#3)
Re: Sync Rep v19

Simon Riggs <simon@2ndQuadrant.com> writes:

On Fri, 2011-03-04 at 00:02 +0900, Fujii Masao wrote:

* synchronous_standby_names = "*" matches all standby names

Using '*' as the default seems to lead the performance degradation by
being connected from unexpected synchronous standby.

You can configure it however you wish. It seemed better to have an out
of the box setting that was useful.

Well the HBA still needs some opening before anyone can claim to be a
standby. I guess the default line would be commented out and no standby
would be accepted as synchronous by default, assuming this GUC is sighup.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Dimitri Fontaine (#4)
Re: Sync Rep v19

On Thu, 2011-03-03 at 18:51 +0100, Dimitri Fontaine wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Fri, 2011-03-04 at 00:02 +0900, Fujii Masao wrote:

* synchronous_standby_names = "*" matches all standby names

Using '*' as the default seems to lead the performance degradation by
being connected from unexpected synchronous standby.

You can configure it however you wish. It seemed better to have an out
of the box setting that was useful.

Well the HBA still needs some opening before anyone can claim to be a
standby. I guess the default line would be commented out and no standby
would be accepted as synchronous by default, assuming this GUC is sighup.

The patch sets "*" as the default, so all standbys are synchronous by
default.

Would you prefer it if it was blank, meaning no standbys are
synchronous, by default?

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#6Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#5)
Re: Sync Rep v19

On Thu, Mar 3, 2011 at 1:14 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Thu, 2011-03-03 at 18:51 +0100, Dimitri Fontaine wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Fri, 2011-03-04 at 00:02 +0900, Fujii Masao wrote:

* synchronous_standby_names = "*" matches all standby names

Using '*' as the default seems to lead the performance degradation by
being connected from unexpected synchronous standby.

You can configure it however you wish. It seemed better to have an out
of the box setting that was useful.

Well the HBA still needs some opening before anyone can claim to be a
standby.  I guess the default line would be commented out and no standby
would be accepted as synchronous by default, assuming this GUC is sighup.

The patch sets "*" as the default, so all standbys are synchronous by
default.

Would you prefer it if it was blank, meaning no standbys are
synchronous, by default?

I think * is a reasonable default.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#2)
Re: Sync Rep v19

On Fri, 2011-03-04 at 00:02 +0900, Fujii Masao wrote:

+               else if (WaitingForSyncRep)
+               {
+                       /*
+                        * This must NOT be a FATAL message. We want
the state of the
+                        * transaction being aborted to be
indeterminate to ensure that
+                        * the transaction completion guarantee is
never broken.
+                        */

The backend can reach this code path after returning the commit to the
client.
Instead, how about doing this in EndCommand, to close the connection
before
returning the commit?

I don't really understand this comment.

You can't get there after returning the COMMIT message. Once we have
finished waiting we set WaitingForSyncRep = false, before we return to
RecordTransactionCommit() and continue from there.

Anyway, this is code in the interrupt handler and only gets executed
when we receive SIGTERM for a fast shutdown.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#8Yeb Havinga
yebhavinga@gmail.com
In reply to: Simon Riggs (#1)
Re: Sync Rep v19

On 2011-03-03 11:53, Simon Riggs wrote:

Latest version of Sync Rep, which includes substantial internal changes
and simplifications from previous version. (25-30 changes).

Includes all outstanding technical comments, typos and docs. I will
continue to work on self review and test myself, though actively
encourage others to test and report issues.

Interesting changes

* docs updated

* names listed in synchronous_standby_names are now in priority order

* synchronous_standby_names = "*" matches all standby names

* pg_stat_replication now shows standby priority - this is an ordinal
number so "1" means 1st, "2" means 2nd etc, though 0 means "not a sync
standby".

Some initial remarks:

1) this works nice:
application_name not in synchronous_standby_names -> sync_priority = 0 (OK)
change synchronous_standby_names to default *, reload conf ->
sync_priority = 1 (OK)

message in log file
LOG: 00000: standby "walreceiver" is now the synchronous standby with
priority 1

2) priorities
I have to get used to mapping the integers to synchronous replication
meaning.
0 -> asynchronous
1 -> the synchronous standby that is waited for
2 and higher -> potential syncs

Could it be hidden from the user? I liked asynchronous / synchronous /
potential synchronous

then the log message could be
LOG: 00000: standby "walreceiver" is now the synchronous standby

3) walreceiver is the default application name - could there be problems
when a second standby with that name connects (ofcourse the same
question holds for two the same nondefault application_names)?

regards
Yeb Havinga

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Yeb Havinga (#8)
Re: Sync Rep v19

On Thu, 2011-03-03 at 22:27 +0100, Yeb Havinga wrote:

On 2011-03-03 11:53, Simon Riggs wrote:

Latest version of Sync Rep, which includes substantial internal changes
and simplifications from previous version. (25-30 changes).

Includes all outstanding technical comments, typos and docs. I will
continue to work on self review and test myself, though actively
encourage others to test and report issues.

Interesting changes

* docs updated

* names listed in synchronous_standby_names are now in priority order

* synchronous_standby_names = "*" matches all standby names

* pg_stat_replication now shows standby priority - this is an ordinal
number so "1" means 1st, "2" means 2nd etc, though 0 means "not a sync
standby".

Some initial remarks:

1) this works nice:
application_name not in synchronous_standby_names -> sync_priority = 0 (OK)
change synchronous_standby_names to default *, reload conf ->
sync_priority = 1 (OK)

message in log file
LOG: 00000: standby "walreceiver" is now the synchronous standby with
priority 1

2) priorities
I have to get used to mapping the integers to synchronous replication
meaning.
0 -> asynchronous
1 -> the synchronous standby that is waited for
2 and higher -> potential syncs

Could it be hidden from the user? I liked asynchronous / synchronous /
potential synchronous

Yes, that sounds good. I will leave it as it is now to gain other
comments since this need not delay commit.

then the log message could be
LOG: 00000: standby "walreceiver" is now the synchronous standby

The priority is mentioned in the LOG message, so you can understand what
happens when multiple standbys connect.

e.g.

if you have synchronous_standby_names = 'a, b, c'

and then the standbys connect in the order b, c, a then you will see log
messages

LOG: standby "b" is now the synchronous standby with priority 2
LOG: standby "a" is now the synchronous standby with priority 1

It's designed so no matter which order standbys arrive in it is the
highest priority standby that makes it to the front in the end.

3) walreceiver is the default application name - could there be problems
when a second standby with that name connects (ofcourse the same
question holds for two the same nondefault application_names)?

That's documented: in that case which standby is sync is indeterminate.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#7)
Re: Sync Rep v19

Simon Riggs <simon@2ndQuadrant.com> writes:

Anyway, this is code in the interrupt handler and only gets executed
when we receive SIGTERM for a fast shutdown.

I trust it's not getting *directly* executed from the interrupt handler,
at least not without ImmediateInterruptOK.

regards, tom lane

#11Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#10)
Re: Sync Rep v19

On Fri, Mar 4, 2011 at 7:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

Anyway, this is code in the interrupt handler and only gets executed
when we receive SIGTERM for a fast shutdown.

I trust it's not getting *directly* executed from the interrupt handler,
at least not without ImmediateInterruptOK.

Yes, the backend waits for replication while cancel/die interrupt is
being blocked, i.e., InterruptHoldoffCount > 0. So SIGTERM doesn't
lead the waiting backend to there directly. The backend reaches there
after returning the result.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#12Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#11)
Re: Sync Rep v19

On Fri, Mar 4, 2011 at 1:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Mar 4, 2011 at 7:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

Anyway, this is code in the interrupt handler and only gets executed
when we receive SIGTERM for a fast shutdown.

I trust it's not getting *directly* executed from the interrupt handler,
at least not without ImmediateInterruptOK.

Yes, the backend waits for replication while cancel/die interrupt is
being blocked, i.e., InterruptHoldoffCount > 0. So SIGTERM doesn't
lead the waiting backend to there directly. The backend reaches there
after returning the result.

BTW, this is true in COMMIT and PREPARE cases, and false in
COMMIT PREPARED and ROLLBACK PREPARED cases. In the
latter cases, HOLD_INTERRUPT() is not called before waiting for
replication.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#12)
Re: Sync Rep v19

On Fri, 2011-03-04 at 13:35 +0900, Fujii Masao wrote:

On Fri, Mar 4, 2011 at 1:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Mar 4, 2011 at 7:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

Anyway, this is code in the interrupt handler and only gets executed
when we receive SIGTERM for a fast shutdown.

I trust it's not getting *directly* executed from the interrupt handler,
at least not without ImmediateInterruptOK.

Yes, the backend waits for replication while cancel/die interrupt is
being blocked, i.e., InterruptHoldoffCount > 0. So SIGTERM doesn't
lead the waiting backend to there directly. The backend reaches there
after returning the result.

BTW, this is true in COMMIT and PREPARE cases,

CommitTransaction() calls HOLD_INTERRUPT() and then RESUME_INTERRUPTS(),
which was reasonable before we started waiting for syncrep. The
interrupt does occur *before* we send the message back, but doesn't work
effectively at interrupting the wait in the way you would like.

If we RESUME_INTERRUPTS() prior to waiting and then HOLD again that
would allow all signals not just SIGTERM. We would need to selectively
reject everything except SIGTERM messages.

Ideas?

Alter ProcessInterrupts() to accept an interrupt if ProcDiePending &&
WaitingForSyncRep and InterruptHoldoffCount > 0. That looks a little
scary, but looks like it will work.

and false in
COMMIT PREPARED and ROLLBACK PREPARED cases. In the
latter cases, HOLD_INTERRUPT() is not called before waiting for
replication.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

signal_filter.patchtext/x-patch; charset=UTF-8; name=signal_filter.patchDownload+11-2
#14Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#2)
Re: Sync Rep v19

On Fri, Mar 4, 2011 at 12:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Though I've not read whole of the patch yet, here is the current comment:

Here are another comments:

+#replication_timeout_client = 120 # 0 means wait forever

Typo: s/replication_timeout_client/sync_replication_timeout

+			else if (timeout > 0 &&
+				TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
+											wait_start, timeout))

If SetCurrentTransactionStopTimestamp() is called before (i.e., COMMIT case),
the return value of GetCurrentTransactionStopTimestamp() is the same as
"wait_start". So, in this case, the timeout never expires.

+				strcpy(new_status + len, " waiting for sync rep");
+				set_ps_display(new_status, false);

How about changing the message to something like "waiting for %X/%X"
(%X/%X indicates the LSN which the backend is waiting for)?

Please initialize MyProc->procWaitLink to NULL in InitProcess() as well as
do MyProc->lwWaitLink.

+	/*
+	 * We're a potential sync standby. Release waiters if we are the
+	 * highest priority standby. We do this even if the standby is not yet
+	 * caught up, in case this is a restart situation and
+	 * there are backends waiting for us. That allows backends to exit the
+	 * wait state even if new backends cannot yet enter the wait state.
+	 */

I don't think that it's good idea to switch the high priority standby which has
not caught up, to the sync one, especially when there is already another
sync standby. Because that degrades replication from sync to async for
a while, even though there is sync standby which has caught up.

+		if (walsnd->pid != 0 &&
+			walsnd->sync_standby_priority > 0 &&
+			(priority == 0 ||
+			 priority < walsnd->sync_standby_priority))
+		{
+			 priority = walsnd->sync_standby_priority;
+			 syncWalSnd = walsnd;
+		}

According to the code, the last named standby has highest priority. But the
document says the opposite.

ISTM the waiting backends can be sent the wake-up signal by the
walsender multiple times since the walsender doesn't remove any
entry from the queue. Isn't this unsafe? waste of the cycle?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#15Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#13)
Re: Sync Rep v19

On Fri, Mar 4, 2011 at 3:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

CommitTransaction() calls HOLD_INTERRUPT() and then RESUME_INTERRUPTS(),
which was reasonable before we started waiting for syncrep. The
interrupt does occur *before* we send the message back, but doesn't work
effectively at interrupting the wait in the way you would like.

If we RESUME_INTERRUPTS() prior to waiting and then HOLD again that
would allow all signals not just SIGTERM. We would need to selectively
reject everything except SIGTERM messages.

Ideas?

Alter ProcessInterrupts() to accept an interrupt if ProcDiePending &&
WaitingForSyncRep and InterruptHoldoffCount > 0. That looks a little
scary, but looks like it will work.

If shutdown is requested before WaitingForSyncRep is set to TRUE and
after HOLD_INTERRUPT() is called, the waiting backends cannot be
interrupted.

SIGTERM can be sent by pg_terminate_backend(). So we should check
whether shutdown is requested before emitting WARNING and closing
the connection. If it's not requested yet, I think that it's safe to return the
success indication to the client.

I think that it's safer to close the connection and terminate the backend
after cleaning all the resources. So, as I suggested before, how about
doing that in EndCommand()?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#15)
Re: Sync Rep v19

On Fri, 2011-03-04 at 17:34 +0900, Fujii Masao wrote:

On Fri, Mar 4, 2011 at 3:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

CommitTransaction() calls HOLD_INTERRUPT() and then RESUME_INTERRUPTS(),
which was reasonable before we started waiting for syncrep. The
interrupt does occur *before* we send the message back, but doesn't work
effectively at interrupting the wait in the way you would like.

If we RESUME_INTERRUPTS() prior to waiting and then HOLD again that
would allow all signals not just SIGTERM. We would need to selectively
reject everything except SIGTERM messages.

Ideas?

Alter ProcessInterrupts() to accept an interrupt if ProcDiePending &&
WaitingForSyncRep and InterruptHoldoffCount > 0. That looks a little
scary, but looks like it will work.

If shutdown is requested before WaitingForSyncRep is set to TRUE and
after HOLD_INTERRUPT() is called, the waiting backends cannot be
interrupted.

SIGTERM can be sent by pg_terminate_backend(). So we should check
whether shutdown is requested before emitting WARNING and closing
the connection. If it's not requested yet, I think that it's safe to return the
success indication to the client.

I'm not sure if that matters. Nobody apart from the postmaster knows
about a shutdown. All the other processes know is that they received
SIGTERM, which as you say could have been a specific user action aimed
at an individual process.

We need a way to end the wait state explicitly, so it seems easier to
make SIGTERM the initiating action, no matter how it is received.

The alternative is to handle it this way
1) set something in shared memory
2) set latch of all backends
3) have the backends read shared memory and then end the wait

Who would do (1) and (2)? Not the backend, its sleeping, not the
postmaster its shm, nor a WALSender cos it might not be there.

Seems like a lot of effort to avoid SIGTERM. Do we have a good reason
why we need that? Might it introduce other issues?

I think that it's safer to close the connection and terminate the backend
after cleaning all the resources. So, as I suggested before, how about
doing that in EndCommand()?

Yes, if we don't use SIGTERM then we would use EndCommand()

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#14)
Re: Sync Rep v19

On Fri, 2011-03-04 at 16:42 +0900, Fujii Masao wrote:

On Fri, Mar 4, 2011 at 12:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Though I've not read whole of the patch yet, here is the current comment:

Here are another comments:

+#replication_timeout_client = 120 # 0 means wait forever

Typo: s/replication_timeout_client/sync_replication_timeout

Done

+			else if (timeout > 0 &&
+				TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
+											wait_start, timeout))

If SetCurrentTransactionStopTimestamp() is called before (i.e., COMMIT case),
the return value of GetCurrentTransactionStopTimestamp() is the same as
"wait_start". So, in this case, the timeout never expires.

Don't understand (still)

+				strcpy(new_status + len, " waiting for sync rep");
+				set_ps_display(new_status, false);

How about changing the message to something like "waiting for %X/%X"
(%X/%X indicates the LSN which the backend is waiting for)?

Done

Please initialize MyProc->procWaitLink to NULL in InitProcess() as well as
do MyProc->lwWaitLink.

I'm rewriting that aspect now.

+	/*
+	 * We're a potential sync standby. Release waiters if we are the
+	 * highest priority standby. We do this even if the standby is not yet
+	 * caught up, in case this is a restart situation and
+	 * there are backends waiting for us. That allows backends to exit the
+	 * wait state even if new backends cannot yet enter the wait state.
+	 */

I don't think that it's good idea to switch the high priority standby which has
not caught up, to the sync one, especially when there is already another
sync standby. Because that degrades replication from sync to async for
a while, even though there is sync standby which has caught up.

OK, that wasn't really my intention. Changed.

+		if (walsnd->pid != 0 &&
+			walsnd->sync_standby_priority > 0 &&
+			(priority == 0 ||
+			 priority < walsnd->sync_standby_priority))
+		{
+			 priority = walsnd->sync_standby_priority;
+			 syncWalSnd = walsnd;
+		}

According to the code, the last named standby has highest priority. But the
document says the opposite.

Priority is a difficult word here since "1" is the highest priority. I
deliberately avoided using the word "highest" in the code for that
reason.

The code above finds the lowest non-zero standby, which is correct as
documented.

ISTM the waiting backends can be sent the wake-up signal by the
walsender multiple times since the walsender doesn't remove any
entry from the queue. Isn't this unsafe? waste of the cycle?

It's ok to set a latch that isn't set. It's unlikely to wake someone
twice before they can remove themselves.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#18Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#17)
Re: Sync Rep v19

On Fri, 2011-03-04 at 10:51 +0000, Simon Riggs wrote:

+			else if (timeout > 0 &&
+				TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
+											wait_start, timeout))

If SetCurrentTransactionStopTimestamp() is called before (i.e., COMMIT case),
the return value of GetCurrentTransactionStopTimestamp() is the same as
"wait_start". So, in this case, the timeout never expires.

Don't understand (still)

OK, coffee has seeped into brain now, thanks.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#19Yeb Havinga
yebhavinga@gmail.com
In reply to: Simon Riggs (#1)
Re: Sync Rep v19

On 2011-03-03 11:53, Simon Riggs wrote:

Latest version of Sync Rep, which includes substantial internal changes
and simplifications from previous version. (25-30 changes).

Testing more with the post v19 version from github with HEAD

commit 009875662e1b47012e1f4b7d30eb9e238d1937f6
Author: Simon Riggs <simon@2ndquadrant.com>
Date: Fri Mar 4 06:13:43 2011 +0000

Allow SIGTERM messages in ProcessInterrupts() even when interrupts are
held, if WaitingForSyncRep

1) unexpected behaviour
- master has synchronous_standby_names = 'standby1,standby2,standby3'
- standby with 'standby2' connects first.
- LOG: 00000: standby "standby2" is now the synchronous standby with
priority 2

I'm still confused by the priority numbers. At first I thought that
priority 1 meant: this is the one that is currently waited for. Now I'm
not sure if this is the first potential standby that is not used, or
that it is actually the one waited for.
What I expected was that it would be connected with priority 1. And then
if the standby1 connect, it would become the one with prio1 and standby2
with prio2.

2) unexpected behaviour
- continued from above
- standby with 'asyncone' name connects next
- no log message on master

I expected a log message along the lines 'standby "asyncone" is now an
asynchronous standby'

3) more about log messages
- didn't get a log message that the asyncone standby stopped
- didn't get a log message that standby1 connected with priority 1
- after stop / start master, again only got a log that standby2
connectied with priority 2
- pg_stat_replication showed both standb1 and standby2 with correct prio#

4) More about the priority stuff. At this point I figured out prio 2 can
also be 'the real sync'. Still I'd prefer in pg_stat_replication a
boolean that clearly shows 'this is the one', with a source that is
intimately connected to the syncrep implemenation, instead of a
different implementation of 'if lowest connected priority and > 0, then
sync is true. If there are two different implementations, there is room
for differences, which doesn't feel right.

5) performance.
Seems to have dropped a a few dozen %. With v17 I earlier got ~650 tps
and after some more tuning over 900 tps. Now with roughly the same setup
I get ~ 550 tps. Both versions on the same hardware, both compiled
without debugging, and I used the same postgresql.conf start config.

I'm currently thinking about a failure test that would check if a commit
has really waited for the standby. What's the worst thing to do to a
master server? Ideas are welcome :-)

#!/bin/sh
psql -c "create a big table with generate_series"
echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger

regards,
Yeb Havinga

#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Yeb Havinga (#19)
Re: Sync Rep v19

On Fri, 2011-03-04 at 12:24 +0100, Yeb Havinga wrote:

On 2011-03-03 11:53, Simon Riggs wrote:

Latest version of Sync Rep, which includes substantial internal changes
and simplifications from previous version. (25-30 changes).

Testing more with the post v19 version from github with HEAD

Thanks

commit 009875662e1b47012e1f4b7d30eb9e238d1937f6
Author: Simon Riggs <simon@2ndquadrant.com>
Date: Fri Mar 4 06:13:43 2011 +0000

Allow SIGTERM messages in ProcessInterrupts() even when interrupts are
held, if WaitingForSyncRep

1) unexpected behaviour
- master has synchronous_standby_names = 'standby1,standby2,standby3'
- standby with 'standby2' connects first.
- LOG: 00000: standby "standby2" is now the synchronous standby with
priority 2

I'm still confused by the priority numbers. At first I thought that
priority 1 meant: this is the one that is currently waited for. Now I'm
not sure if this is the first potential standby that is not used, or
that it is actually the one waited for.
What I expected was that it would be connected with priority 1. And then
if the standby1 connect, it would become the one with prio1 and standby2
with prio2.

The priority refers to the order in which that standby is listed in
synchronous_standby_names. That is not dependent upon who is currently
connected. It doesn't mean the order in which the currently connected
standbys will become the sync standby.

So the log message allows you to work out that "standby2" is connected
and will operate as sync standby until something mentioned earlier in
synchronous_standby_names, in this case standby1, connects.

2) unexpected behaviour
- continued from above
- standby with 'asyncone' name connects next
- no log message on master

I expected a log message along the lines 'standby "asyncone" is now an
asynchronous standby'

That would introduce messages where there currently aren't any, so I
left that out. I'll put it in for clarity.

3) more about log messages
- didn't get a log message that the asyncone standby stopped

OK

- didn't get a log message that standby1 connected with priority 1

Bad

- after stop / start master, again only got a log that standby2
connectied with priority 2

Bad

- pg_stat_replication showed both standb1 and standby2 with correct prio#

Good

Please send me log output at DEBUG3 offline.

4) More about the priority stuff. At this point I figured out prio 2 can
also be 'the real sync'. Still I'd prefer in pg_stat_replication a
boolean that clearly shows 'this is the one', with a source that is
intimately connected to the syncrep implemenation, instead of a
different implementation of 'if lowest connected priority and > 0, then
sync is true. If there are two different implementations, there is room
for differences, which doesn't feel right.

OK

5) performance.
Seems to have dropped a a few dozen %. With v17 I earlier got ~650 tps
and after some more tuning over 900 tps. Now with roughly the same setup
I get ~ 550 tps. Both versions on the same hardware, both compiled
without debugging, and I used the same postgresql.conf start config.

Will need to re-look at performance after commit

I'm currently thinking about a failure test that would check if a commit
has really waited for the standby. What's the worst thing to do to a
master server? Ideas are welcome :-)

#!/bin/sh
psql -c "create a big table with generate_series"
echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger

regards,
Yeb Havinga

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#21Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#19)
#22Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#17)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#22)
#24Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#16)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#24)
#26Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Robert Haas (#6)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Jaime Casanova (#26)
#28Yeb Havinga
yebhavinga@gmail.com
In reply to: Jaime Casanova (#26)
#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#24)
#30Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#29)
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#30)
#32Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#29)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#31)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#31)
#35Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#35)
#37Yeb Havinga
yebhavinga@gmail.com
In reply to: Robert Haas (#36)
#38Simon Riggs
simon@2ndQuadrant.com
In reply to: Yeb Havinga (#37)
#39Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#32)
#40Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#33)
#41Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#38)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#41)
#43Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Fujii Masao (#41)
#44Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#41)
#45Simon Riggs
simon@2ndQuadrant.com
In reply to: Jaime Casanova (#43)
#46Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#32)
#47Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#40)
#48Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#40)
#49Yeb Havinga
yebhavinga@gmail.com
In reply to: Simon Riggs (#46)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#47)
#51Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#49)
#52Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#51)
#53Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Yeb Havinga (#51)
#54Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#50)
#55Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#46)
#56Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#55)
#57Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#54)
#58Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#56)
#59Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#48)
#60Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#58)
#61Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#1)
#62Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Simon Riggs (#58)
#63Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#61)
#64Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#55)
#65Simon Riggs
simon@2ndQuadrant.com
In reply to: Yeb Havinga (#51)
#66Yeb Havinga
yebhavinga@gmail.com
In reply to: Simon Riggs (#65)
#67Simon Riggs
simon@2ndQuadrant.com
In reply to: Yeb Havinga (#66)
#68Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#63)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#68)
#70Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Robert Haas (#69)
#71Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#23)
#72Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#71)
#73Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#71)
#74Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#73)
#75Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#74)
#76Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#75)
#77Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#76)
#78Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#77)
#79Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#78)
#80Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#79)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#80)
#82Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Robert Haas (#81)
#83Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#68)
#84Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#83)
#85Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#84)
#86Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#85)
#87Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#86)
#88Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#87)
#89Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#88)
#90Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#72)
#91Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#90)