WALWriter active during recovery

Started by Simon Riggsover 11 years ago16 messageshackers
Jump to latest
#1Simon Riggs
simon@2ndQuadrant.com

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

At present this is a WIP patch, for code comments only. Don't bother
with anything other than code questions at this stage.

Implementation questions are

* How should we wake WALReceiver, since it waits on a poll(). Should
we use SIGUSR1, which is already used for latch waits, or another
signal?

* Should we introduce some pacing delays if the WALreceiver gets too
far ahead of apply?

* Other questions you may have?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

walwriter_active_in_recovery.v1.patchapplication/octet-stream; name=walwriter_active_in_recovery.v1.patchDownload+218-73
#2Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#1)
Re: WALWriter active during recovery

Hi,

On 2014-12-15 18:51:44 +0000, Simon Riggs wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Well, it can still buffer data on the network level, but there's
definitely limits to that. So I can see this as being useful.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

At present this is a WIP patch, for code comments only. Don't bother
with anything other than code questions at this stage.

Implementation questions are

* How should we wake WALReceiver, since it waits on a poll(). Should
we use SIGUSR1, which is already used for latch waits, or another
signal?

It's not entirely trivial, but also not hard, to make it use the latch
code for waiting. It'd probably end up requiring less code because then
we could just scratch libqpwalreceiver.c:libpq_select().

* Should we introduce some pacing delays if the WALreceiver gets too
far ahead of apply?

Hm. Why don't we simply start fsyncing in the receiver itself at regular
intervals? If already synced that's cheap, if not, it'll pace us.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#1)
Re: WALWriter active during recovery

On 12/15/2014 08:51 PM, Simon Riggs wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

What other useful actions can WAL receiver do while it's waiting? It
doesn't do much else than receive WAL, and fsync it to disk.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#3)
Re: WALWriter active during recovery

On 2014-12-16 16:12:40 +0200, Heikki Linnakangas wrote:

On 12/15/2014 08:51 PM, Simon Riggs wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

What other useful actions can WAL receiver do while it's waiting? It doesn't
do much else than receive WAL, and fsync it to disk.

It can actually receive further data from the network and write it to
disk? On a relatively low latency network the buffers aren't that
large. Right now we generate quite a bursty IO pattern with the disks
alternating between idle and fully busy.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#3)
Re: WALWriter active during recovery

On 16 December 2014 at 14:12, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

On 12/15/2014 08:51 PM, Simon Riggs wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

What other useful actions can WAL receiver do while it's waiting? It doesn't
do much else than receive WAL, and fsync it to disk.

So now it will only need to do one of those two things.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6didier
did447@gmail.com
In reply to: Simon Riggs (#5)
Re: WALWriter active during recovery

Hi,

On Tue, Dec 16, 2014 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 16 December 2014 at 14:12, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

On 12/15/2014 08:51 PM, Simon Riggs wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
3.13 is better but still it slows the fsync).

If there's a fsync in progress WALReceiver will:
1- slow the fsync because its writes to the same file are grabbed by the fsync
2- stall until the end of fsync.

from 'stracing' a test program simulating this pattern:
two processes, one writes to a file the second fsync it.

20279 11:51:24.037108 fsync(5 <unfinished ...>
20278 11:51:24.053524 <... nanosleep resumed> NULL) = 0 <0.020281>
20278 11:51:24.053691 lseek(3, 1383612416, SEEK_SET) = 1383612416 <0.000119>
20278 11:51:24.053965 write(3, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"...,
8192) = 8192 <0.000111>
20278 11:51:24.054190 nanosleep({0, 20000000}, NULL) = 0 <0.020243>
....
20278 11:51:24.404386 lseek(3, 194772992, SEEK_SET <unfinished ...>
20279 11:51:24.754123 <... fsync resumed> ) = 0 <0.716971>
20279 11:51:24.754202 close(5 <unfinished ...>
20278 11:51:24.754232 <... lseek resumed> ) = 194772992 <0.349825>

Yes that's a 300ms lseek...

What other useful actions can WAL receiver do while it's waiting? It doesn't
do much else than receive WAL, and fsync it to disk.

So now it will only need to do one of those two things.

Regards
Didier

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: didier (#6)
Re: WALWriter active during recovery

On 17 December 2014 at 11:27, didier <did447@gmail.com> wrote:

If there's a fsync in progress WALReceiver will:
1- slow the fsync because its writes to the same file are grabbed by the fsync
2- stall until the end of fsync.

PostgreSQL already fsyncs files while they are being written to. Are
you saying we should stop doing that?

It would be possible to synchronize processes so that we don't write
to a file while it is being fsynced.

fsyncs are also made once the whole 16MB has been written, so in those
cases there is no simultaneous action.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: didier (#6)
Re: WALWriter active during recovery

didier wrote:

On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
3.13 is better but still it slows the fsync).

If there's a fsync in progress WALReceiver will:
1- slow the fsync because its writes to the same file are grabbed by the fsync
2- stall until the end of fsync.

Is this behavior filesystem-dependent?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9didier
did447@gmail.com
In reply to: Alvaro Herrera (#8)
Re: WALWriter active during recovery

Hi

On Wed, Dec 17, 2014 at 2:39 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

didier wrote:

On many Linux systems it may not do that much (2.6.32 and 3.2 are bad,
3.13 is better but still it slows the fsync).

If there's a fsync in progress WALReceiver will:
1- slow the fsync because its writes to the same file are grabbed by the fsync
2- stall until the end of fsync.

Is this behavior filesystem-dependent?

I don't know. I only tested ext4

Attach the trivial code I used, there's a lot of junk in it.

Didier

Attachments:

testf1.ctext/x-csrc; charset=US-ASCII; name=testf1.cDownload
#10Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#1)
Re: WALWriter active during recovery

On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

+1

At present this is a WIP patch, for code comments only. Don't bother
with anything other than code questions at this stage.

Implementation questions are

* How should we wake WALReceiver, since it waits on a poll(). Should
we use SIGUSR1, which is already used for latch waits, or another
signal?

Probably we need to change libpqwalreceiver so that it uses the latch.
This is useful even for the startup process to report the replay location to
the walreceiver in real time.

* Should we introduce some pacing delays if the WALreceiver gets too
far ahead of apply?

I don't think so for now. Instead, we can support synchronous_commit = replay,
and the users can use that new mode if they are worried about the delay of
WAL replay.

* Other questions you may have?

Who should wake the startup process so that it reads and replays the WAL data?
Current walreceiver. But if walwriter is responsible for fsyncing WAL data,
probably walwriter should do that. Because the startup process should not replay
the WAL data which has not been fsync'd yet.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#10)
Re: WALWriter active during recovery

On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com>
wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

+1

At present this is a WIP patch, for code comments only. Don't bother
with anything other than code questions at this stage.

Implementation questions are

* How should we wake WALReceiver, since it waits on a poll(). Should
we use SIGUSR1, which is already used for latch waits, or another
signal?

Probably we need to change libpqwalreceiver so that it uses the latch.
This is useful even for the startup process to report the replay location
to
the walreceiver in real time.

* Should we introduce some pacing delays if the WALreceiver gets too
far ahead of apply?

I don't think so for now. Instead, we can support synchronous_commit =
replay,
and the users can use that new mode if they are worried about the delay of
WAL replay.

* Other questions you may have?

Who should wake the startup process so that it reads and replays the WAL
data?
Current walreceiver. But if walwriter is responsible for fsyncing WAL data,
probably walwriter should do that. Because the startup process should not
replay
the WAL data which has not been fsync'd yet.

Moved this patch to CF 2015-02 to not lose track of it and because it did
not get any reviews.
--
Michael

#12Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#10)
Re: WALWriter active during recovery

On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

With the patch, replication didn't work fine in my machine. I started
the standby server after removing all the WAL files from the standby.
ISTM that the patch doesn't handle that case. That is, in that case,
the standby tries to start up walreceiver and replication to retrieve
the REDO-starting checkpoint record *before* starting up walwriter
(IOW, before reaching the consistent point). Then since walreceiver works
without walwriter, no received WAL data cannot be fsync'd in the standby.
So replication cannot advance furthermore. I think that walwriter needs
to start before walreceiver starts.

I just marked this patch as Waiting on Author.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#12)
Re: WALWriter active during recovery

On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

With the patch, replication didn't work fine in my machine. I started
the standby server after removing all the WAL files from the standby.
ISTM that the patch doesn't handle that case. That is, in that case,
the standby tries to start up walreceiver and replication to retrieve
the REDO-starting checkpoint record *before* starting up walwriter
(IOW, before reaching the consistent point). Then since walreceiver works
without walwriter, no received WAL data cannot be fsync'd in the standby.
So replication cannot advance furthermore. I think that walwriter needs
to start before walreceiver starts.

I just marked this patch as Waiting on Author.

This patch was moved to current CF with the status "Needs review".
But there are already some review comments which have not been addressed yet,
so I marked the patch as "Waiting on Author" again.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#13)
Re: WALWriter active during recovery

On 2 July 2015 at 14:31, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Mar 5, 2015 at 5:22 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao <masao.fujii@gmail.com>

wrote:

On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs <simon@2ndquadrant.com>

wrote:

Currently, WALReceiver writes and fsyncs data it receives. Clearly,
while we are waiting for an fsync we aren't doing any other useful
work.

Following patch starts WALWriter during recovery and makes it
responsible for fsyncing data, allowing WALReceiver to progress other
useful actions.

With the patch, replication didn't work fine in my machine. I started
the standby server after removing all the WAL files from the standby.
ISTM that the patch doesn't handle that case. That is, in that case,
the standby tries to start up walreceiver and replication to retrieve
the REDO-starting checkpoint record *before* starting up walwriter
(IOW, before reaching the consistent point). Then since walreceiver works
without walwriter, no received WAL data cannot be fsync'd in the standby.
So replication cannot advance furthermore. I think that walwriter needs
to start before walreceiver starts.

I just marked this patch as Waiting on Author.

This patch was moved to current CF with the status "Needs review".
But there are already some review comments which have not been addressed
yet,
so I marked the patch as "Waiting on Author" again.

This was pushed back from last CF and I haven't worked on it at all, nor
will I.

Pushing back again.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#14)
Re: WALWriter active during recovery

On 2015-07-02 14:34:48 +0100, Simon Riggs wrote:

This was pushed back from last CF and I haven't worked on it at all, nor
will I.

Pushing back again.

Let's "return with feedback", not " move", it then.. Moving a entries
along which aren't expected to receive updates anytime soon isn't a good
idea, there's more than enough entries each CF.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Andres Freund (#15)
Re: WALWriter active during recovery

On 2 July 2015 at 14:38, Andres Freund <andres@anarazel.de> wrote:

On 2015-07-02 14:34:48 +0100, Simon Riggs wrote:

This was pushed back from last CF and I haven't worked on it at all, nor
will I.

Pushing back again.

Let's "return with feedback", not " move", it then.. Moving a entries
along which aren't expected to receive updates anytime soon isn't a good
idea, there's more than enough entries each CF.

Although I agree, the interface won't let me do that, so will leave as-is.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services