Horrible CREATE DATABASE Performance in High Sierra
I just recently "upgraded" to High Sierra and experiencing horrendous CREATE
DATABASE performance. Creating a database from a 3G template DB used to
take ~1m but post-upgrade is taking ~22m at a sustained write of around
4MB/s. Occasionally, attempting to create an empty database hangs
indefinitely as well. When this happens, restarting the Postgres server
allows empty database initialization in ~1s.
I had been running on an encrypted APFS volume (FileVault), but after
dropping encryption I saw the time per run drop to about *~15m*. Still a
far cry from the previous *~1m* baseline.
A multi-threaded pg_restore seems to sustain writes of ~38MB/s and completes
in about the same time as pre-upgrade (Sierra), so I'm not sure it's
entirely related to APFS / disk IO.
I've completely rebuilt the Postgres data directory, re-installed Postgres
(Postgres.app 2.0.5) etc. I don't have any reasonable explanation for what
could have broken so catastrophically.
A coworker has seen the exact same issue. Has anyone else experienced this
yet or have any insight as to what could be happening?
Thanks in advance!
Brent Dearth <brent.dearth@gmail.com> writes:
I just recently "upgraded" to High Sierra and experiencing horrendous CREATE
DATABASE performance. Creating a database from a 3G template DB used to
take ~1m but post-upgrade is taking ~22m at a sustained write of around
4MB/s. Occasionally, attempting to create an empty database hangs
indefinitely as well. When this happens, restarting the Postgres server
allows empty database initialization in ~1s.
What PG version are you running?
I tried to reproduce this, using HEAD and what I had handy:
(a) Late 2016 MacBook Pro, 2.7GHz i7, still on Sierra
(b) Late 2013 MacBook Pro, 2.3GHz i7, High Sierra, drive is converted to APFS
I made a ~7.5GB test database using "pgbench -i -s 500 bench" and
then cloned it with "create database b2 with template bench".
Case 1: fsync off.
Machine A did the clone in 5.6 seconds, machine B in 12.9 seconds.
Considering the CPU speed difference and the fact that Apple put
significantly faster SSDs into the 2016 models, I'm not sure this
difference is due to anything but better hardware.
Case 2: fsync on.
Machine A did the clone in 7.5 seconds, machine B in 2523.5 sec (42 min!).
So something is badly busted in APFS' handling of fsync, and/or
we're doing it in a bad way.
Interestingly, pg_test_fsync shows only about a factor-of-2 difference
in the timings for regular file fsyncs. So I poked into non-fsync
logic that we'd added recently, and after awhile found that diking out
the msync code path in pg_flush_data reduces machine B's time to an
entirely reasonable 11.5 seconds.
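For readers who haven't looked at that code path: pg_flush_data()'s msync branch
works by briefly mmap'ing the just-written file range, issuing msync(MS_ASYNC) as
a write-back hint, and unmapping it again. Below is only a minimal sketch of the
pattern (the helper name is invented for illustration; the real code in
src/backend/storage/file/fd.c also page-aligns the request and reports errors):

#include <sys/mman.h>
#include <sys/types.h>

/*
 * Minimal sketch of the msync-based branch of pg_flush_data(): ask the
 * kernel to start writing back a just-written file range by mapping it,
 * issuing an asynchronous msync(), and unmapping it again.
 */
static void
flush_hint_via_msync(int fd, off_t offset, off_t nbytes)
{
	void	   *p;

	p = mmap(NULL, (size_t) nbytes, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, offset);
	if (p == MAP_FAILED)
		return;					/* skip the hint if the mapping fails */

	(void) msync(p, (size_t) nbytes, MS_ASYNC);	/* request async writeback */
	(void) munmap(p, (size_t) nbytes);
}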
In short, therefore, APFS cannot cope with the way we're using msync().
I observe that the copy gets slower and slower as it runs (watching the
transfer rate with "iostat 1"), which makes me wonder if there's some
sort of O(N^2) issue in the kernel logic for this. But anyway, as
a short-term workaround you might try
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index b0c174284b..af35de5f7d 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -451,7 +451,7 @@ pg_flush_data(int fd, off_t offset, off_t nbytes)
 		return;
 	}
 #endif
-#if !defined(WIN32) && defined(MS_ASYNC)
+#if 0
 	{
 		void	   *p;
 		static int	pagesize = 0;
regards, tom lane
I wrote:
In short, therefore, APFS cannot cope with the way we're using msync().
I experimented with this further by seeing whether the msync() code path
is of any value on Sierra either. The answer seems to be "no": cloning
a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
laptop using unmodified HEAD, but if I dike out the msync() logic then
it takes 16-17 seconds. Both numbers jump around a little, but using
msync is strictly worse.
I propose therefore that an appropriate fix is to unconditionally disable
the msync code path on Darwin, as we have already done for Windows. When
and if Apple changes their kernel so that this path is actually of some
value, we can figure out how to detect whether to use it.
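In other words, the idea is to treat Darwin the way the existing conditionals
already treat Windows when deciding whether the mmap/msync hint may be used at
all. A compile-time sketch of that gate (illustrative only, written as a
standalone helper rather than the actual preprocessor test in pg_flush_data()):

#include <stdbool.h>
#include <sys/mman.h>			/* for MS_ASYNC, where it exists */

/*
 * Sketch of the proposed platform gate: never take the mmap/msync(MS_ASYNC)
 * flush path on Windows or Darwin, even if MS_ASYNC is defined there.
 */
static bool
msync_flush_hint_usable(void)
{
#if defined(WIN32) || defined(__darwin__) || !defined(MS_ASYNC)
	return false;
#else
	return true;
#endif
}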
The msync logic seems to date back to this thread:
/messages/by-id/alpine.DEB.2.10.1506011320000.28433@sto
wherein Andres opined
I think this patch primarily needs:
* Benchmarking on FreeBSD/OSX to see whether we should enable the
mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
inclined to leave it off till then.
but so far as I can tell from the thread, only testing on FreeBSD ever
got done. So there's no evidence that this was ever beneficial on macOS,
and we now have evidence that it's between counterproductive and
catastrophic depending on which kernel version you look at.
regards, tom lane
Thanks for this breakdown, Tom!
FWIW - I'm on Postgres 9.6.5 as bundled with Postgres.app (2.0.5), running
on a 2013 MBP (2.7GHz i7 / 16GB / SSD) setup. It looks like this might be a
priority for an upcoming release, so I might try to hold out for a
downstream fix, but thanks for the patch. It will help if we need to get
custom builds out to fellow devs if this becomes too unbearable.
On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
I wrote:
In short, therefore, APFS cannot cope with the way we're using msync().
I experimented with this further by seeing whether the msync() code path
is of any value on Sierra either. The answer seems to be "no": cloning
a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
laptop using unmodified HEAD, but if I dike out the msync() logic then
it takes 16-17 seconds. Both numbers jump around a little, but using
msync is strictly worse.
Well, that's only measuring one type of workload. Could you run a normal
pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
latency differences are?
I propose therefore that an appropriate fix is to unconditionally disable
the msync code path on Darwin, as we have already done for Windows. When
and if Apple changes their kernel so that this path is actually of some
value, we can figure out how to detect whether to use it.
I'm inclined to think you're right.
This is a surprisingly massive regression for a new OS release - we're
not the only database that uses msync...
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
I experimented with this further by seeing whether the msync() code path
is of any value on Sierra either. The answer seems to be "no": cloning
a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
laptop using unmodified HEAD, but if I dike out the msync() logic then
it takes 16-17 seconds. Both numbers jump around a little, but using
msync is strictly worse.
Well, that's only measuring one type of workload. Could you run a normal
pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
latency differences are?
Should I expect there to be any difference at all? We don't enable
*_flush_after by default on non-Linux platforms.
regards, tom lane
On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-10-02 15:42:25 -0400, Tom Lane wrote:
I experimented with this further by seeing whether the msync() code path
is of any value on Sierra either. The answer seems to be "no": cloning
a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
laptop using unmodified HEAD, but if I dike out the msync() logic then
it takes 16-17 seconds. Both numbers jump around a little, but using
msync is strictly worse.
Well, that's only measuring one type of workload. Could you run a normal
pgbench with -P1 or so for 2-3 checkpoint cycles and see how big the
latency differences are?
Should I expect there to be any difference at all? We don't enable
*_flush_after by default on non-Linux platforms.
Right, you'd have to enable that. But your patch would neuter an
intentionally enabled config too, no?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
Should I expect there to be any difference at all? We don't enable
*_flush_after by default on non-Linux platforms.
Right, you'd have to enable that. But your patch would neuter an
intentionally enabled config too, no?
Well, if you want to suggest a specific scenario to try, I'm happy to.
I am not going to guess as to what will satisfy you.
regards, tom lane
On 2017-10-02 15:59:05 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-10-02 15:54:43 -0400, Tom Lane wrote:
Should I expect there to be any difference at all? We don't enable
*_flush_after by default on non-Linux platforms.
Right, you'd have to enable that. But your patch would neuter an
intentionally enabled config too, no?
Well, if you want to suggest a specific scenario to try, I'm happy to.
I am not going to guess as to what will satisfy you.
To demonstrate what I'm observing here, on linux with a fairly fast ssd:
with:
-c autovacuum_analyze_threshold=2147483647 # to avoid analyze snapshot issue
-c fsync=on
-c synchronous_commit=on
-c shared_buffers=4GB
-c max_wal_size=30GB
-c checkpoint_timeout=30s
-c checkpoint_flush_after=0
-c bgwriter_flush_after=0
and
pgbench -i -s 100 -q
a pgbench -M prepared -c 8 -j 8 -n -P1 -T 100
often has periods like:
synchronous_commit = on:
progress: 73.0 s, 395.0 tps, lat 20.029 ms stddev 4.001
progress: 74.0 s, 289.0 tps, lat 23.730 ms stddev 23.337
progress: 75.0 s, 88.0 tps, lat 104.029 ms stddev 178.038
progress: 76.0 s, 400.0 tps, lat 20.055 ms stddev 4.844
latency average = 21.599 ms
latency stddev = 13.865 ms
tps = 370.346506 (including connections establishing)
tps = 370.372550 (excluding connections establishing)
with synchronous_commit=off those periods are a lot worse:
progress: 57.0 s, 21104.3 tps, lat 0.379 ms stddev 0.193
progress: 58.0 s, 9994.1 tps, lat 0.536 ms stddev 3.140
progress: 59.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 60.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 61.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 62.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 63.0 s, 3319.6 tps, lat 12.860 ms stddev 253.664
progress: 64.0 s, 20997.0 tps, lat 0.381 ms stddev 0.190
progress: 65.0 s, 20409.1 tps, lat 0.392 ms stddev 0.303
...
latency average = 0.745 ms
latency stddev = 20.470 ms
tps = 10743.555553 (including connections establishing)
tps = 10743.815591 (excluding connections establishing)
contrasting that to checkpoint_flush_after=256kB and
bgwriter_flush_after=512kB:
synchronous_commit=on
worst:
progress: 87.0 s, 298.0 tps, lat 26.874 ms stddev 26.691
latency average = 21.898 ms
latency stddev = 6.416 ms
tps = 365.308180 (including connections establishing)
tps = 365.318793 (excluding connections establishing)
synchronous_commit=off
worst:
progress: 30.0 s, 7026.8 tps, lat 1.137 ms stddev 11.070
latency average = 0.550 ms
latency stddev = 5.599 ms
tps = 14547.842213 (including connections establishing)
tps = 14548.325102 (excluding connections establishing)
If you do the same on rotational disks, the stall periods can get a
*lot* worse (multi-minute stalls with pretty much no activity).
What I'm basically wondering is whether we're screwing somebody over
that made the effort to manually configure this on OSX. It's fairly
obvious we need to find a way to disable the msync() by default.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
To demonstrate what I'm observing here, on linux with a fairly fast ssd:
...
I tried to replicate this test as closely as I could on the Mac hardware
I have laying about. I only bothered with the synchronous_commit=off
case, though, since you say that shows the worst effects. I used the
same parameters you did and the same pgbench settings. I attach the
pgbench output for six cases, flush_after disabled or enabled on three
different machines:
(A) 2016 MacBook Pro, 2.7GHz i7 + SSD, Sierra, HFS+ file system
(B) 2013 MacBook Pro, 2.3GHz i7 + SSD, High Sierra, APFS file system
(C) 2012 Mac Mini, 2.3GHz i7 + 5400-RPM SATA, High Sierra, HFS+ file system
There is some benefit on the SSD machines, but it's in the range of a
few percent --- clearly, these kernels are not as subject to the basic
I/O-scheduling problem as Linux is. The spinning-rust machine shows a
nice gain in overall TPS with flush enabled, but it's actually a bit
worse off in terms of the worst-case slowdown --- note that only that
case shows things coming to a complete halt. It'd be interesting to
check the behavior of a pre-High-Sierra kernel with spinning rust,
but I don't have any remotely modern machine answering that description.
I'm kind of surprised that machine B doesn't show obvious tanking in this
test given that msync() makes it suck so badly at copying a database.
I wonder what is different from the kernel's standpoint ... maybe the
sheer number of different files mmap'd by a single process during the
copy?
What I'm basically wondering is whether we're screwing somebody over
that made the effort to manually configure this on OSX. It's fairly
obvious we need to find a way to disable the msync() by default.
I suspect that anybody who cares about DB performance on macOS will
be running it on SSD-based hardware these days. The benefit seen on
the Mac Mini would have been worth the trouble of a custom configuration
a few years ago, but I'm dubious that it matters in the real world
anymore.
If we could arrange to not use pg_flush_data in copydir.c on macOS,
I'd be okay with leaving it alone for the configurable flush_after
calls. But I can't think of a way to do that that wouldn't be a
complete kluge. I don't much want to do
+#ifndef __darwin__
pg_flush_data(dstfd, offset, nbytes);
+#endif
but I don't see any better alternative ...
regards, tom lane
Hi,
On 2017-10-02 18:33:17 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
To demonstrate what I'm observing here, on linux with a fairly fast ssd:
...
I tried to replicate this test as closely as I could on the Mac hardware
I have laying about.
Thanks!
I only bothered with the synchronous_commit=off case, though, since
you say that shows the worst effects.
That's the case on linux at least, but I'd suspect it's a much more
general thing - you just can't get that much data dirty with pgbench
otherwise.
I'm kind of surprised that machine B doesn't show obvious tanking in this
test given that msync() makes it suck so badly at copying a database.
I wonder what is different from the kernel's standpoint ... maybe the
sheer number of different files mmap'd by a single process during the
copy?
Yea, that's curious. I've really no clue about OSX, so pardon my
question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu
blocked?
If we could arrange to not use pg_flush_data in copydir.c on macOS,
I'd be okay with leaving it alone for the configurable flush_after
calls. But I can't think of a way to do that that wouldn't be a
complete kluge. I don't much want to do
+#ifndef __darwin__
pg_flush_data(dstfd, offset, nbytes);
+#endif
but I don't see any better alternative ...
What we could do is introduce a guc (~create_database_flush_data) that
controls whether we flush here, and have the defaults set differently on
OSX. Not sure if that's actually any better.
- Andres
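A rough illustration of the GUC Andres sketches above (the GUC name comes from
his suggestion and does not exist in PostgreSQL; the pg_flush_data() stub is
only there so the snippet stands alone):

#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical GUC, defaulting to off on macOS and on elsewhere. */
#if defined(__darwin__)
static bool create_database_flush_data = false;
#else
static bool create_database_flush_data = true;
#endif

/* Stub standing in for PostgreSQL's real pg_flush_data(). */
static void
pg_flush_data(int fd, off_t offset, off_t nbytes)
{
	(void) fd;
	(void) offset;
	(void) nbytes;
}

/* How copy_file()'s per-buffer flush might consult such a GUC. */
static void
maybe_flush_copy_buffer(int dstfd, off_t offset, off_t nbytes)
{
	if (create_database_flush_data)
		pg_flush_data(dstfd, offset, nbytes);
}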
Andres Freund <andres@anarazel.de> writes:
On 2017-10-02 18:33:17 -0400, Tom Lane wrote:
I'm kind of surprised that machine B doesn't show obvious tanking in this
test given that msync() makes it suck so badly at copying a database.
I wonder what is different from the kernel's standpoint ... maybe the
sheer number of different files mmap'd by a single process during the
copy?
Yea, that's curious. I've really no clue about OSX, so pardon my
question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu
blocked?
What I saw was that the backend process was consuming 100% of (one) CPU,
while the I/O transaction rate viewed by "iostat 1" started pretty low
--- under 10% of what the machine is capable of --- and dropped from
there as the copy proceeded. I did not think to check if that was user
or kernel-space CPU, but I imagine it has to be the latter.
regards, tom lane
On 2017-10-02 19:50:51 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-10-02 18:33:17 -0400, Tom Lane wrote:
I'm kind of surprised that machine B doesn't show obvious tanking in this
test given that msync() makes it suck so badly at copying a database.
I wonder what is different from the kernel's standpoint ... maybe the
sheer number of different files mmap'd by a single process during the
copy?Yea, that's curious. I've really no clue about OSX, so pardon my
question: With HEAD and CREATE DATABASE, is it IO blocked or kernel cpu
blocked?
What I saw was that the backend process was consuming 100% of (one) CPU,
while the I/O transaction rate viewed by "iostat 1" started pretty low
--- under 10% of what the machine is capable of --- and dropped from
there as the copy proceeded. I did not think to check if that was user
or kernel-space CPU, but I imagine it has to be the latter.
So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or
msync() that's the problem here. I guess you didn't run a profile?
One interesting thing here is that in the CREATE DATABASE case there'll
probably be a lot larger contiguous mappings than in *_flush_after
cases. So it might be related to the size of the mapping / flush "unit".
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2017-10-02 19:50:51 -0400, Tom Lane wrote:
What I saw was that the backend process was consuming 100% of (one) CPU,
while the I/O transaction rate viewed by "iostat 1" started pretty low
--- under 10% of what the machine is capable of --- and dropped from
there as the copy proceeded. I did not think to check if that was user
or kernel-space CPU, but I imagine it has to be the latter.
So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or
msync() that's the problem here. I guess you didn't run a profile?
Interestingly, profiling with Activity Monitor seems to blame the problem
entirely on munmap() ... which squares with the place I hit every time
when randomly stopping the process with gdb^Hlldb, so I'm inclined to
believe it.
This still offers no insight as to why CREATE DATABASE is hitting the
problem while regular flush activity doesn't.
One interesting thing here is that in the CREATE DATABASE case there'll
probably be a lot larger contiguous mappings than in *_flush_after
cases. So it might be related to the size of the mapping / flush "unit".
Meh, the mapping is only 64K in this case vs. 8K in the other. Hard
to credit that it breaks that easily.
regards, tom lane
Tom, Andres -
Is there an issue tracker I could be looking at to follow along on the
progress on this issue?
Thanks so much!
Brent Dearth <brent.dearth@gmail.com> writes:
Is there an issue tracker I could be looking at to follow along on the
progress on this issue?
This email thread is pretty much it ...
Current status is that I've filed a bug report with Apple and am waiting
to see their response before deciding what to do next. If they fix the
issue promptly then there's little need for us to do anything.
If you want to help move things along, it would be useful to run some
experiments and see if there's a way to ameliorate the problem short of
the brute-force answer of disabling copy_file()'s pg_flush_data() call.
One thing that occurs to me is that even a 64KB file copy buffer is pretty
small on any modern machine. If we were to up that to, say, 1MB, does
that help any? See COPY_BUF_SIZE in src/backend/storage/file/copydir.c.
regards, tom lane
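For anyone wanting to try that experiment, it amounts to changing one constant
in copy_file(); with the default 8kB BLCKSZ, the existing 8 * BLCKSZ gives the
64KB buffer Tom mentions, and 128 * BLCKSZ would give 1MB. A sketch (the
fallback BLCKSZ define is only there so the snippet stands alone):

#ifndef BLCKSZ
#define BLCKSZ 8192				/* PostgreSQL's default block size */
#endif

/* Current buffer size in copy_file(): 8 * BLCKSZ, i.e. 64KB by default. */
/* The experiment: enlarge it to 1MB and re-time CREATE DATABASE. */
#define COPY_BUF_SIZE (128 * BLCKSZ)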
I wrote:
Current status is that I've filed a bug report with Apple and am waiting
to see their response before deciding what to do next. If they fix the
issue promptly then there's little need for us to do anything.
Not having heard a peep from Apple yet, I decided to do a bit more
experimenting. I found that indeed, issuing fewer and larger mmap/msync
requests helps enormously. If you're willing to go as far as issuing
only one per file, the speed seems on par with non-fsync. But that
requires being able to mmap 1GB-sized files, so it's probably not
something we want to do.
What I did instead was to adjust the logic in copy_file() so that the
unit of flush requests can be a multiple of the unit of read/write
requests. (My original thought of just raising the buffer size seems
like not as good an idea; it's less cache-friendly for one thing.)
I find that on both Linux and macOS-with-HFS, requesting a flush only
every 1MB seems to be a win compared to flushing every 64KB as we
currently do. Actually it seems that on macOS, every increment of
increase in the flush distance helps, but with HFS you hit diminishing
returns after 1MB or so. With APFS you need a flush distance of 32MB
or more to have credible performance.
Accordingly I propose the attached patch. If anyone's interested in
experimenting on other platforms, we might be able to refine/complicate
the FLUSH_DISTANCE selection further, but I think this is probably good
enough for a first cut.
regards, tom lane
Attachment: smarter-flush-distance-choice-1.patch (text/x-diff)
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index a5e074e..eae9f5a 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -139,10 +139,24 @@ copy_file(char *fromfile, char *tofile)
 	int			dstfd;
 	int			nbytes;
 	off_t		offset;
+	off_t		flush_offset;
 
-	/* Use palloc to ensure we get a maxaligned buffer */
+	/* Size of copy buffer (read and write requests) */
 #define COPY_BUF_SIZE (8 * BLCKSZ)
 
+	/*
+	 * Size of data flush requests.  It seems beneficial on most platforms to
+	 * do this every 1MB or so.  But macOS, at least with early releases of
+	 * APFS, is really unfriendly to small mmap/msync requests, so there do it
+	 * only every 32MB.
+	 */
+#if defined(__darwin__)
+#define FLUSH_DISTANCE (32 * 1024 * 1024)
+#else
+#define FLUSH_DISTANCE (1024 * 1024)
+#endif
+
+	/* Use palloc to ensure we get a maxaligned buffer */
 	buffer = palloc(COPY_BUF_SIZE);
 
 	/*
@@ -163,11 +177,23 @@ copy_file(char *fromfile, char *tofile)
 	/*
 	 * Do the data copying.
 	 */
+	flush_offset = 0;
 	for (offset = 0;; offset += nbytes)
 	{
 		/* If we got a cancel signal during the copy of the file, quit */
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * We fsync the files later, but during the copy, flush them every so
+		 * often to avoid spamming the cache and hopefully get the kernel to
+		 * start writing them out before the fsync comes.
+		 */
+		if (offset - flush_offset >= FLUSH_DISTANCE)
+		{
+			pg_flush_data(dstfd, flush_offset, offset - flush_offset);
+			flush_offset = offset;
+		}
+
 		pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ);
 		nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
 		pgstat_report_wait_end();
@@ -190,15 +216,11 @@ copy_file(char *fromfile, char *tofile)
 					 errmsg("could not write to file \"%s\": %m", tofile)));
 		}
 		pgstat_report_wait_end();
-
-		/*
-		 * We fsync the files later but first flush them to avoid spamming the
-		 * cache and hopefully get the kernel to start writing them out before
-		 * the fsync comes.
-		 */
-		pg_flush_data(dstfd, offset, nbytes);
 	}
 
+	if (offset > flush_offset)
+		pg_flush_data(dstfd, flush_offset, offset - flush_offset);
+
 	if (CloseTransientFile(dstfd))
 		ereport(ERROR,
 				(errcode_for_file_access(),
On 10/7/17 16:46, Tom Lane wrote:
I wrote:
Current status is that I've filed a bug report with Apple and am waiting
to see their response before deciding what to do next. If they fix the
issue promptly then there's little need for us to do anything.
Accordingly I propose the attached patch. If anyone's interested in
experimenting on other platforms, we might be able to refine/complicate
the FLUSH_DISTANCE selection further, but I think this is probably good
enough for a first cut.
What is the status of this? Is performance on High Sierra still bad?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On November 8, 2017 7:31:17 AM PST, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 10/7/17 16:46, Tom Lane wrote:
I wrote:
Current status is that I've filed a bug report with Apple and am waiting
to see their response before deciding what to do next. If they fix the
issue promptly then there's little need for us to do anything.
Accordingly I propose the attached patch. If anyone's interested in
experimenting on other platforms, we might be able to refine/complicate
the FLUSH_DISTANCE selection further, but I think this is probably good
enough for a first cut.
What is the status of this? Is performance on High Sierra still bad?
Didn't Tom commit a workaround?
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
On 10/7/17 16:46, Tom Lane wrote:
Accordingly I propose the attached patch. If anyone's interested in
experimenting on other platforms, we might be able to refine/complicate
the FLUSH_DISTANCE selection further, but I think this is probably good
enough for a first cut.
What is the status of this? Is performance on High Sierra still bad?
I committed the fix at 643c27e36. If Apple have done anything about the
underlying problem, you couldn't tell it from their non-response to my
bug report.
regards, tom lane
I wrote:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
What is the status of this? Is performance on High Sierra still bad?
I committed the fix at 643c27e36. If Apple have done anything about the
underlying problem, you couldn't tell it from their non-response to my
bug report.
So, after just about one year of radio silence, I received email from
Apple saying that (a) they'd closed my bug report as a duplicate of
another one, and (b) that other one was also closed, and (c) not one
other iota of information.
This seems to be standard practice for them, though I'm at a loss
to say why they consider it even minimally acceptable. From here,
it seems like a great way to piss people off and ensure they won't
bother filing any future bug reports.
Anyway, reading the tea leaves and considering the timing, one might
guess that they actually fixed the problem as of macOS Mojave.
regards, tom lane
Don't worry your tiny little mind, Apple knows best what is good for you!
[Smileys omitted, due to budget constraints]