Pointers towards identifying bulk import bottleneck (walwriter tuning?)

Started by Holtgrewe, Manuel · over 6 years ago · 9 messages · general
#1Holtgrewe, Manuel
manuel.holtgrewe@bihealth.de

Dear all,

I hope that this is the right place to ask.

I have trouble identifying the bottleneck in a bulk import. I suspect the bottleneck is the walwriter. I'm using PostgreSQL 11; my configuration is at [5]https://gist.github.com/holtgrewe/3e029c6c2555e0d9fd3fb247a0109b45#file-postgresql-conf. I'm running on ZFS raidz3 that can write 300 MB/s+. However, I'm only seeing ~35 MB/s from walwriter.

What would be a strategy to improve bulk insertion performance or find out more about potential ways forward/understanding this better?

Thanks,
Manuel

The full description is as follows:

I have the following use case. My table is relatively complex (see [2]https://gist.github.com/holtgrewe/3e029c6c2555e0d9fd3fb247a0109b45#file-from_scratch-sql-L54 for the definition, [1]https://gist.github.com/holtgrewe/3e029c6c2555e0d9fd3fb247a0109b45 contains all scripts and commands while [3]https://file-public.bihealth.org/transient/varfish-debug/ has the data that I'm using for testing; original data from [4]https://figshare.com/articles/Corpasome/693052).

To summarise:

- My table has 47 fields; most are integers, but there are also two VARCHAR(64) arrays and one JSONB field.
- The data can nicely be partitioned by "case_id" as I will only query one case at a time.
- I want to import TSV files into the table, the example data has 370k lines which is realistic.
- Ideally I would be able to import multiple TSV files in parallel to saturate the I/O of the system.

The import strategy is to first import each TSV file into a staging table that has the same number of fields as my actual table but is text-only, and then insert from this staging table into the actual table (I've extracted this SQL from my software, which uses a library that takes this approach for bulk import). I also tried to \copy directly into the target table, but that did not improve performance much.
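For readers following along, the staging approach described above can be sketched roughly like this (the table and column names are illustrative placeholders, not the actual varfish schema):

```sql
-- Stage the TSV into an all-text table, then cast into the real table.
-- Names here are placeholders, not the poster's actual schema.
CREATE TABLE staging_variants (
    case_id TEXT,
    chrom   TEXT,
    pos     TEXT,
    info    TEXT  -- ...one text column per TSV column
);

-- \copy runs client-side in psql and streams the TSV into the staging table.
\copy staging_variants FROM 'variants.tsv' WITH (FORMAT text)

-- Convert and insert into the typed, partitioned target table.
INSERT INTO variants (case_id, chrom, pos, info)
SELECT case_id::INTEGER, chrom, pos::BIGINT, info::JSONB
FROM staging_variants;
```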

For the benchmarking, I have created four separate copies of these "text tables" and pre-filled them (fill[1-4].sql). Then I fill my main table from them (query[1-4].sql), with 1, 2, and 4 queryX.sql executions in parallel. Wall-clock running times are about 14s for one import process, 21s for two import processes, and 44s for four import processes.

As the main table is partitioned and each TSV import process only inserts into one partition, I expected the speedup to be almost linear. I also tried inserting into the partitions directly from TSV in parallel, but that did not help either.

My Postgres configuration is at [5]https://gist.github.com/holtgrewe/3e029c6c2555e0d9fd3fb247a0109b45#file-postgresql-conf. I'm running PostgreSQL 11.5 on a recent CentOS 7 (inside an LXC container). The data is stored on a ZFS raidz3 pool with 16 spinning disks (plus an SSD each for the ZFS log and cache). The file system is capable of writing 300 MB/s+, but I'm seeing 80 MB/s or less from Postgres. With one import process the CPU usage is at 80%, dropping to ~60% for two import processes, and to about 30% for four.

iotop tells me that walwriter does not go beyond ~35MB/s so maybe this is the culprit? Is there a way to tune walwriter I/O performance?

[1]: https://gist.github.com/holtgrewe/3e029c6c2555e0d9fd3fb247a0109b45
[2]: https://gist.github.com/holtgrewe/3e029c6c2555e0d9fd3fb247a0109b45#file-from_scratch-sql-L54
[3]: https://file-public.bihealth.org/transient/varfish-debug/
[4]: https://figshare.com/articles/Corpasome/693052
[5]: https://gist.github.com/holtgrewe/3e029c6c2555e0d9fd3fb247a0109b45#file-postgresql-conf

#2Luca Ferrari
fluca1978@gmail.com
In reply to: Holtgrewe, Manuel (#1)
Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

On Tue, Aug 27, 2019 at 12:06 PM Holtgrewe, Manuel
<manuel.holtgrewe@bihealth.de> wrote:

> iotop tells me that walwriter does not go beyond ~35MB/s so maybe this is the culprit? Is there a way to tune walwriter I/O performance?

As far as I know, walwriter is there to help the background processes, so
if it cannot keep up with the WAL, the backends will write it on their
own. If my understanding is correct, I don't think that could be the
bottleneck.
I've seen you have checkpoints at 4h; that seems quite long to me. Do you
have any hint that checkpoints are happening too frequently?
Any chance you can turn fsync off (only if this is a testbed)?

Also, this <https://www.postgresql.org/list/pgsql-performance/> could
be a better place to ask for help.

Luca

#3Holtgrewe, Manuel
manuel.holtgrewe@bihealth.de
In reply to: Luca Ferrari (#2)
RE: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

Hi,

I also tried creating the table as "UNLOGGED", which led walwriter I/O to drop drastically; I now get no wall-clock time increase with two import processes, but it gets slower with four.
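For reference, a table can be created unlogged, or an existing one can be switched over (the latter rewrites the table); the table names below are placeholders:

```sql
-- UNLOGGED tables skip WAL entirely, which is why walwriter I/O drops;
-- the trade-off is that they are truncated after a crash.
CREATE UNLOGGED TABLE staging_variants (LIKE variants INCLUDING ALL);

-- An existing table can also be switched; note this rewrites the whole
-- table, and for a partitioned table the partitions are presumably
-- switched individually.
ALTER TABLE staging_variants SET UNLOGGED;
```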

Switching off fsync leads to a drastic time improvement but still higher wall-clock time for four threads.

Thanks for the pointer to the other mailing list. I'll try my luck there.

Cheers,
Manuel

#4Jeff Janes
jeff.janes@gmail.com
In reply to: Holtgrewe, Manuel (#3)
Re: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

On Tue, Aug 27, 2019 at 10:43 AM Holtgrewe, Manuel <
manuel.holtgrewe@bihealth.de> wrote:

> Hi,
>
> I also tried creating the table as "UNLOGGED" which led to walwriter I/O
> to drop drastically and I now get no wall-clock time increase with two
> import processes but it gets slower with four.
>
> Switching off fsync leads to a drastic time improvement but still higher
> wall-clock time for four threads.

Does switching fsync off make it faster even when the tables are unlogged
(that would be surprising), or were the two changes made one at a time?
When you say still higher for four threads, do you mean that four threads
with fsync=off are much faster than four threads with fsync=on but still
don't scale linearly within the fsync=off set? Or is the nonlinearity so
bad that fsync=off doesn't even improve the four-thread situation?

PostgreSQL fsyncs each wal segment once it is full. Under heavy load, this
is effectively done in the foreground (even when done by WALwriter),
because other processes inserting WAL records will soon be blocked by locks
taken out by the fsyncing process. So if you can't run your production
database with fsync=off, one thing you can try is setting up a new database
with a larger wal segment size (--wal-segsize argument to initdb).
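A sketch of the larger-segment setup mentioned above (the size, path, and version are illustrative):

```shell
# Initialize a new cluster with 64 MB WAL segments instead of the 16 MB
# default (--wal-segsize is accepted by initdb since PostgreSQL 11);
# the data must then be loaded into the new cluster.
initdb --wal-segsize=64 -D /var/lib/pgsql/11/data-bigwal
```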

Cheers,

Jeff

#5Holtgrewe, Manuel
manuel.holtgrewe@bihealth.de
In reply to: Holtgrewe, Manuel (#1)
RE: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

Dear Jeff,

thanks for your answer.

Your question regarding CPUs pointed me in the right direction. In my container virtualization, I had the CPU restriction set to 2, which explains the drop in performance (d'oh!). With UNLOGGED tables I now get constant wall-clock time up to 4 processes; without UNLOGGED, I see a small penalty for 2 and 4 jobs.

As I can recover from broken bulk imports on these tables on the application level, I will recreate them as UNLOGGED and take home that performance gain.

Thank you for your time and expertise!

Best wishes,
Manuel

________________________________
From: Jeff Janes [jeff.janes@gmail.com]
Sent: Tuesday, August 27, 2019 16:45
To: Holtgrewe, Manuel
Subject: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

On Tue, Aug 27, 2019 at 6:06 AM Holtgrewe, Manuel <manuel.holtgrewe@bihealth.de<mailto:manuel.holtgrewe@bihealth.de>> wrote:
> Dear all,
>
> I hope that this is the right place to ask.
>
> I have trouble identifying the bottleneck in a bulk import. I'm suspecting the bottleneck to be the walwriter. I'm using PostgreSQL 11, my configuration is at [5]. I'm running on ZFS raidz3 that can write 300MB/s+. However, I'm only seeing ~35MB/s from walwriter.

What happens if you turn fsync=off (test only---do not do that on a production environment)? If that doesn't speed things up dramatically, then what happens if you make the partitions of variants_smallvariant be unlogged tables? If that doesn't speed things up dramatically either, then you know the bottleneck has nothing to do with WAL writing.

What does "top" show?

Sample the contents of wait_event_type and wait_event from pg_stat_activity for the INSERT statements. What are they waiting on most?
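The sampling suggested here might look like the following query, run repeatedly during the import (the ILIKE filter is just one way to pick out the insert backends):

```sql
-- Sample what the bulk-insert backends are waiting on; a NULL wait_event
-- means the backend is on-CPU rather than waiting.
SELECT pid, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE query ILIKE 'insert%';
```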

> iotop tells me that walwriter does not go beyond ~35MB/s so maybe this is the culprit? Is there a way to tune walwriter I/O performance?

WALwriter is not the exclusive writer of WAL records. For example the user-connected backends also write WAL records. If you want to know how fast WAL is being generated, you should look directly at the amount of WAL generated over the course of the benchmark, for example by looking at pg_current_wal_lsn() before and after, or looking at the number of wal segments getting archived.
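Measuring the generated WAL directly, as suggested above, could be done in psql with something like:

```sql
-- Before the benchmark: remember the current WAL position.
SELECT pg_current_wal_lsn() AS start_lsn \gset

-- ... run the import ...

-- After: bytes of WAL generated during the run.
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn'));
```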

You can mess around with wal_writer_delay and wal_writer_flush_after, but I think that is not likely to make much difference. You can certainly shove the burden of doing the writes back and forth between WALwriter and user backends, but the total bottleneck is unlikely to change much.

> With one import process the CPU usage is at 80%, dropping to ~60% for two import processes, and to about 30% for four import processes.

How many CPUs do you have? Is that 80% of all your CPU, or 80% of just one of them? Is the rest going to IO wait, system, or idle?

Cheers,

Jeff

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Holtgrewe, Manuel (#3)
Re: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

Holtgrewe, Manuel wrote:

> Switching off fsync leads to a drastic time improvement but still
> higher wall-clock time for four threads.

Don't do that unless you are ready to start from scratch with a new
"initdb" in the case of a crash.

You can do almost as well by setting "synchronous_commit = off",
and that is crash-safe.
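This suggestion can be applied either per session (e.g. for the import connections only) or cluster-wide:

```sql
-- Per session, affecting only the current connection's commits:
SET synchronous_commit = off;

-- Or cluster-wide, followed by a configuration reload:
ALTER SYSTEM SET synchronous_commit = off;
SELECT pg_reload_conf();
```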

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#7Holtgrewe, Manuel
manuel.holtgrewe@bihealth.de
In reply to: Laurenz Albe (#6)
RE: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

I guess this cannot be pointed out too often ;) I'm not intending to do this, but the behaviour of the system with fsync=off helped me understand that the bottleneck had to be CPU-related rather than I/O-related. Of course, I never switched off fsync on the production system.

Best wishes,


#8Michael Lewis
mlewis@entrata.com
In reply to: Laurenz Albe (#6)
Re: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

On Tue, Aug 27, 2019 at 9:45 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

> Holtgrewe, Manuel wrote:
>
> > Switching off fsync leads to a drastic time improvement but still
> > higher wall-clock time for four threads.
>
> Don't do that unless you are ready to start from scratch with a new
> "initdb" in the case of a crash.
>
> You can do almost as good by setting "synchronous_commit = off",
> and that is crash-safe.

It seems like it depends on your definition of crash-safe. Data loss can
occur but not data corruption, right? Do you know any ballpark for how much
difference in performance it makes to turn off synchronous_commit or what
type of hardware or usage it would make the biggest (or least) difference?

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Michael Lewis (#8)
Re: [ext] Re: Pointers towards identifying bulk import bottleneck (walwriter tuning?)

On Wed, 2019-08-28 at 12:27 -0600, Michael Lewis wrote:

> > You can do almost as good by setting "synchronous_commit = off",
> > and that is crash-safe.
>
> It seems like it depends on your definition of crash-safe. Data loss
> can occur but not data corruption, right?

Right.

> Do you know any ballpark for how much difference in performance it
> makes to turn off synchronous_commit or what type of hardware or
> usage it would make the biggest (or least) difference?

In my experience, the performance will be almost as good as
with fsync=off, which is as good as it gets.

For an exact answer for your specific system, run a simple benchmark.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com