Re: Can PostgreSQL create new WAL files instead of reusing old ones?

Started by Jerry Jelinek, almost 8 years ago, 7 messages, general

jerry.jelinek@joyent.com

almost 8 years ago

As Dave described in his original email on this topic, we'd like to avoid
recycling WAL files since that can cause performance issues when we have a
read-modify-write on a file that has dropped out of the cache.

I have implemented a small change to allow WAL recycling to be disabled. It
is visible at:
https://cr.joyent.us/#/c/4263/

I'd appreciate getting any feedback on this.

Thanks,
Jerry

David Pacheco

dap@joyent.com

almost 8 years ago

In reply to: Jerry Jelinek (#1)

On Wed, Jun 20, 2018 at 10:35 AM, Jerry Jelinek <jerry.jelinek@joyent.com>
wrote:

As Dave described in his original email on this topic, we'd like to avoid
recycling WAL files since that can cause performance issues when we have a
read-modify-write on a file that has dropped out of the cache.

I have implemented a small change to allow WAL recycling to be disabled.
It is visible at:
https://cr.joyent.us/#/c/4263/

I'd appreciate getting any feedback on this.

Thanks,
Jerry

For reference, there's more context in this thread from several months ago:
/messages/by-id/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w@mail.gmail.com

I'll repeat the relevant summary here:

tl;dr: We've found that under many conditions, PostgreSQL's re-use of old
WAL files appears to significantly degrade query latency on ZFS. The
reason is complicated and I have details below. Has it been considered to
make this behavior tunable, to cause PostgreSQL to always create new WAL
files instead of re-using old ones?

Thanks,
Dave
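
To make the two strategies in the summary concrete, here is a toy Python sketch. It is not PostgreSQL's implementation: the function names, file names, and the 1 MiB segment size are hypothetical and chosen only for illustration. Recycling renames an old segment so that later writes land on already-allocated blocks, while the alternative removes the old segment and creates a fresh zero-filled file.

```python
import os
import tempfile

SEGMENT_SIZE = 1024 * 1024  # hypothetical size, kept small for the demo


def recycle_segment(old_path: str, new_path: str) -> None:
    """Reuse an old segment by renaming it; its blocks get overwritten later."""
    os.rename(old_path, new_path)


def create_segment(old_path: str, new_path: str) -> None:
    """Drop the old segment and allocate a fresh zero-filled file instead."""
    os.unlink(old_path)
    with open(new_path, "wb") as f:
        f.write(b"\0" * SEGMENT_SIZE)
        f.flush()
        os.fsync(f.fileno())


def demo_strategies() -> dict:
    """Run both strategies in a temporary directory and report what is left."""
    results = {}
    strategies = (("recycle", recycle_segment), ("create", create_segment))
    with tempfile.TemporaryDirectory() as d:
        for name, strategy in strategies:
            old = os.path.join(d, f"{name}-old")
            new = os.path.join(d, f"{name}-new")
            with open(old, "wb") as f:
                f.write(b"x" * SEGMENT_SIZE)  # stale contents of the old segment
            strategy(old, new)
            with open(new, "rb") as f:
                data = f.read()
            results[name] = {
                "old_exists": os.path.exists(old),
                "size": len(data),
                "stale_bytes": data.count(b"x"),
            }
    return results
```

In the recycled case the new segment still holds the stale bytes, so every later write is an overwrite of existing blocks. That is the pattern described above as turning into a read-modify-write when the file has dropped out of the cache.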

Thomas Munro

thomas.munro@gmail.com

almost 8 years ago

In reply to: David Pacheco (#2)

On Fri, Jun 22, 2018 at 11:22 AM, David Pacheco <dap@joyent.com> wrote:

On Wed, Jun 20, 2018 at 10:35 AM, Jerry Jelinek <jerry.jelinek@joyent.com>
wrote:

I have implemented a small change to allow WAL recycling to be disabled.
It is visible at:
https://cr.joyent.us/#/c/4263/

I'd appreciate getting any feedback on this.

tl;dr: We've found that under many conditions, PostgreSQL's re-use of old
WAL files appears to significantly degrade query latency on ZFS.

I haven't tested but it looks reasonable to me. It needs documentation
in doc/src/sgml/config.sgml. It should be listed in
src/backend/utils/misc/postgresql.conf.sample. We'd want a patch
against our master branch. Could you please register it in
commitfest.postgresql.org so we don't lose track of it?

Hey, a question about PostgreSQL on ZFS: what do you guys think about
pg_flush_data() in fd.c? It does mmap(), msync(), munmap() to try to
influence writeback. I wonder if at least on some operating systems
that schleps a bunch of data out of ZFS ARC into OS page cache, kinda
trashing the latter?

--
Thomas Munro
http://www.enterprisedb.com
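
For readers unfamiliar with the mmap(), msync(), munmap() sequence mentioned above, here is a minimal Python sketch of the same three steps. It is not the code in fd.c: the function name and the 8192-byte range are hypothetical, and Python's `mmap.flush()` may pass different msync flags than the real code does.

```python
import mmap
import os
import tempfile


def flush_range_via_mmap(path: str, offset: int, length: int) -> None:
    """Map a file range, ask the kernel to write it back, then unmap it."""
    fd = os.open(path, os.O_RDWR)
    try:
        # mmap(): offset must be a multiple of mmap.ALLOCATIONGRANULARITY.
        m = mmap.mmap(fd, length, access=mmap.ACCESS_WRITE, offset=offset)
        try:
            m.flush()  # msync() on the mapped range
        finally:
            m.close()  # munmap()
    finally:
        os.close(fd)


def demo_flush() -> bytes:
    """Write a small file, flush it through a mapping, and read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"a" * 8192)
        path = f.name
    try:
        flush_range_via_mmap(path, 0, 8192)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

Mapping the range means the file's pages have to be present in the OS page cache, which is the source of the concern about copying data out of the ZFS ARC.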

Jerry Jelinek

jerry.jelinek@joyent.com

almost 8 years ago

In reply to: Thomas Munro (#3)

Thomas,

Thanks for taking a look at this. I'll work on getting a patch together for
the master branch. I'll also take a look at the other question you raised
and get back to you once I have more information.

Thanks again,
Jerry

On Thu, Jun 21, 2018 at 10:20 PM, Thomas Munro <
thomas.munro@enterprisedb.com> wrote:


On Fri, Jun 22, 2018 at 11:22 AM, David Pacheco <dap@joyent.com> wrote:

On Wed, Jun 20, 2018 at 10:35 AM, Jerry Jelinek <jerry.jelinek@joyent.com>
wrote:

I have implemented a small change to allow WAL recycling to be disabled.
It is visible at:
https://cr.joyent.us/#/c/4263/

I'd appreciate getting any feedback on this.

tl;dr: We've found that under many conditions, PostgreSQL's re-use of old
WAL files appears to significantly degrade query latency on ZFS.

I haven't tested but it looks reasonable to me. It needs documentation
in doc/src/sgml/config.sgml. It should be listed in
src/backend/utils/misc/postgresql.conf.sample. We'd want a patch
against our master branch. Could you please register it in
commitfest.postgresql.org so we don't lose track of it?

Hey, a question about PostgreSQL on ZFS: what do you guys think about
pg_flush_data() in fd.c? It does mmap(), msync(), munmap() to try to
influence writeback. I wonder if at least on some operating systems
that schleps a bunch of data out of ZFS ARC into OS page cache, kinda
trashing the latter?

--
Thomas Munro
http://www.enterprisedb.com

Vick Khera

vivek@khera.org

almost 8 years ago

In reply to: Jerry Jelinek (#1)

On Wed, Jun 20, 2018 at 1:35 PM, Jerry Jelinek <jerry.jelinek@joyent.com>
wrote:

As Dave described in his original email on this topic, we'd like to avoid
recycling WAL files since that can cause performance issues when we have a
read-modify-write on a file that has dropped out of the cache.

I have implemented a small change to allow WAL recycling to be disabled.
It is visible at:
https://cr.joyent.us/#/c/4263/

I'd appreciate getting any feedback on this.

This looks so simple, yet so beneficial. Thanks for making it. Is there
some other mechanism that already cleans out the old unneeded WAL files? I
recall there is something that does it when you start up after changing the
number of files to keep, but I don't recall if that is tested over some
loop regularly.

Is there some way to make it auto-detect when it should be enabled? If not,
please document that it should be used on ZFS and any other file system
with CoW properties on files.

Adam Brusselback

adambrusselback@gmail.com

almost 8 years ago

In reply to: Vick Khera (#5)

Is there some way to make it auto-detect when it should be enabled? If
not, please document that it should be used on ZFS and any other file
system with CoW properties on files.

In addition to this, wondering what type of performance regression this
would show on something like ext4 (if any).

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Adam Brusselback (#6)

Hi,

On 2018-06-22 11:41:45 -0400, Adam Brusselback wrote:

Is there some way to make it auto-detect when it should be enabled? If
not, please document that it should be used on ZFS and any other file
system with CoW properties on files.

In addition to this, wondering what type of performance regression this
would show on something like ext4 (if any).

It's a *massive* regression on ext4 & xfs. You can very trivially
compare the performance of a new cluster (which doesn't have files to
recycle) against one that's running for a while.

Greetings,

Andres Freund
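
As a rough way to probe the difference on a particular filesystem without setting up two clusters, the Python sketch below times synced writes into a preallocated file against the same writes into a newly created one. It is only an assumption-laden micro-benchmark, not a substitute for the cluster comparison described above: the block count, file names, and function names are hypothetical, it skips the zero-filling a real new WAL file would get, and the timings depend entirely on the filesystem and hardware.

```python
import os
import tempfile
import time

BLOCK = 8192
BLOCKS = 32  # hypothetical, kept small so the demo finishes quickly


def write_blocks(fd: int) -> None:
    """Write BLOCKS blocks sequentially, syncing after each one."""
    buf = b"w" * BLOCK
    for _ in range(BLOCKS):
        os.write(fd, buf)
        os.fsync(fd)


def time_overwrite_existing(path: str) -> float:
    """Time synced writes into a preallocated file (the recycled case)."""
    with open(path, "wb") as f:
        f.write(b"\0" * BLOCK * BLOCKS)
        f.flush()
        os.fsync(f.fileno())
    fd = os.open(path, os.O_WRONLY)
    try:
        start = time.perf_counter()
        write_blocks(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def time_write_new(path: str) -> float:
    """Time synced writes into a newly created, initially empty file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        start = time.perf_counter()
        write_blocks(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def demo_benchmark() -> dict:
    """Run both cases in a temporary directory and collect the timings."""
    with tempfile.TemporaryDirectory() as d:
        recycled = os.path.join(d, "recycled")
        fresh = os.path.join(d, "fresh")
        return {
            "overwrite_existing_s": time_overwrite_existing(recycled),
            "write_new_s": time_write_new(fresh),
            "sizes": (os.path.getsize(recycled), os.path.getsize(fresh)),
        }
```

The temporary directory is created wherever `tempfile` defaults to, so to test ext4, xfs, or ZFS specifically it would need to be pointed at a directory on that filesystem (for example through the `dir` argument of `TemporaryDirectory`).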