checkpoints taking much longer than expected

Started by Tiemen Ruiten · almost 7 years ago · 22 messages · general
#1 Tiemen Ruiten
t.ruiten@tech-lab.io

Hello,

I set up a new 3-node cluster with the following specifications:

2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2*20 cores)
128 GB RAM
8x Crucial MX500 1TB SSD's

FS is ZFS, the dataset with the PGDATA directory on it has the following
properties (only non-default listed):

NAME PROPERTY VALUE SOURCE
tank/pgsql used 234G -
tank/pgsql available 3.28T -
tank/pgsql referenced 234G -
tank/pgsql compressratio 2.68x -
tank/pgsql compression lz4 inherited from tank
tank/pgsql atime off inherited from tank
tank/pgsql canmount on local
tank/pgsql xattr sa inherited from tank

My postgresql.conf (only changed from default listed):

hba_file = '/var/lib/pgsql/pg_hba.conf'
listen_addresses = '*'
max_connections = 800
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
dynamic_shared_memory_type = posix
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = on
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 60min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500 # range 1-10000
idle_in_transaction_session_timeout = 30min # in milliseconds, 0 is disabled
shared_preload_libraries = 'timescaledb, pg_cron'
max_locks_per_transaction = 512
timescaledb.max_background_workers = 8

My problem is that checkpoints are taking a long time. Even when I run a
few manual checkpoints one after the other, they keep taking very long, up
to 10 minutes:

2019-06-14 15:21:10.351 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:25:57.655 CEST [23657] LOG: checkpoint complete: wrote
139831 buffers (13.3%); 0 WAL file(s) added, 148 removed, 40 recycled;
write=284.751 s, sync=0.532 s, total=287.304 s; sync files=537,
longest=0.010 s, average=0.000 s; distance=2745065 kB, estimate=2841407 kB
2019-06-14 15:26:01.988 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:30:30.430 CEST [23657] LOG: checkpoint complete: wrote
238322 buffers (22.7%); 0 WAL file(s) added, 0 removed, 172 recycled;
write=264.794 s, sync=0.415 s, total=268.441 s; sync files=378,
longest=0.011 s, average=0.001 s; distance=2834018 kB, estimate=2840668 kB
2019-06-14 15:30:44.097 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:37:01.438 CEST [23657] LOG: checkpoint complete: wrote
132286 buffers (12.6%); 0 WAL file(s) added, 54 removed, 96 recycled;
write=366.614 s, sync=2.975 s, total=377.341 s; sync files=467,
longest=0.095 s, average=0.006 s; distance=2444291 kB, estimate=2801030 kB

What is going on? It doesn't seem like normal behaviour?
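For scale, the first checkpoint above works out to only a few MiB/s of effective write throughput. A rough back-of-the-envelope calculation (assuming the default 8 kB PostgreSQL block size):

```python
# Effective write rate of the first checkpoint in the log above,
# assuming the default 8 kB (8192-byte) PostgreSQL block size.
buffers = 139831          # "wrote 139831 buffers"
write_seconds = 284.751   # "write=284.751 s"

bytes_written = buffers * 8192
mib_written = bytes_written / (1024 ** 2)
rate_mib_s = mib_written / write_seconds

print(f"{mib_written:.0f} MiB written at {rate_mib_s:.1f} MiB/s")
# prints: 1092 MiB written at 3.8 MiB/s
```

Under 4 MiB/s seems very low for a pool of eight SSDs, which is part of why this doesn't look like normal behaviour to me.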

#2 Ravi Krishna
ravikrishna@mail.com
In reply to: Tiemen Ruiten (#1)
Re: checkpoints taking much longer than expected

On 6/14/19 10:01 AM, Tiemen Ruiten wrote:

LOG:  checkpoint starting: immediate force wait

Does it mean that the DB is blocked until the completion of the
checkpoint? Years ago, Informix used to have this issue until they fixed
it around 2006.

#3 Stephen Frost
sfrost@snowman.net
In reply to: Ravi Krishna (#2)
Re: checkpoints taking much longer than expected

Greetings,

* Ravi Krishna (ravikrishna@mail.com) wrote:

On 6/14/19 10:01 AM, Tiemen Ruiten wrote:

LOG:  checkpoint starting: immediate force wait

Does it mean that the DB is blocked until the completion of the
checkpoint? Years ago, Informix used to have this issue until they fixed
it around 2006.

No.

Thanks,

Stephen

#4 Stephen Frost
sfrost@snowman.net
In reply to: Tiemen Ruiten (#1)
Re: checkpoints taking much longer than expected

Greetings,

* Tiemen Ruiten (t.ruiten@tech-lab.io) wrote:

checkpoint_timeout = 60min

That seems like a pretty long timeout.

My problem is that checkpoints are taking a long time. Even when I run a
few manual checkpoints one after the other, they keep taking very long, up
to 10 minutes:

You haven't said *why* this is an issue... Why are you concerned with
how long it takes to do a checkpoint?

2019-06-14 15:21:10.351 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:25:57.655 CEST [23657] LOG: checkpoint complete: wrote
139831 buffers (13.3%); 0 WAL file(s) added, 148 removed, 40 recycled;
write=284.751 s, sync=0.532 s, total=287.304 s; sync files=537,
longest=0.010 s, average=0.000 s; distance=2745065 kB, estimate=2841407 kB
2019-06-14 15:26:01.988 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:30:30.430 CEST [23657] LOG: checkpoint complete: wrote
238322 buffers (22.7%); 0 WAL file(s) added, 0 removed, 172 recycled;
write=264.794 s, sync=0.415 s, total=268.441 s; sync files=378,
longest=0.011 s, average=0.001 s; distance=2834018 kB, estimate=2840668 kB
2019-06-14 15:30:44.097 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:37:01.438 CEST [23657] LOG: checkpoint complete: wrote
132286 buffers (12.6%); 0 WAL file(s) added, 54 removed, 96 recycled;
write=366.614 s, sync=2.975 s, total=377.341 s; sync files=467,
longest=0.095 s, average=0.006 s; distance=2444291 kB, estimate=2801030 kB

What is going on? It doesn't seem like normal behaviour?

The time information is all there and it tells you what it's doing and
how much had to be done... If you're unhappy with how long it takes to
write out gigabytes of data and fsync hundreds of files, talk to your
storage people...

Thanks,

Stephen

#5 Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Tiemen Ruiten (#1)
Re: checkpoints taking much longer than expected

On 2019-06-14 16:01:40 +0200, Tiemen Ruiten wrote:

FS is ZFS, the dataset with the PGDATA directory on it has the following
properties (only non-default listed):

[...]

My problem is that checkpoints are taking a long time. Even when I run a few
manual checkpoints one after the other, they keep taking very long, up to 10
minutes:

There was a discussion about ZFS' COW behaviour and PostgreSQL reusing
WAL files not being a good combination about a year ago:
/messages/by-id/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w@mail.gmail.com

Maybe you have the same problem?

hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

#6 Tiemen Ruiten
t.ruiten@tech-lab.io
In reply to: Stephen Frost (#4)
Re: checkpoints taking much longer than expected

On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Tiemen Ruiten (t.ruiten@tech-lab.io) wrote:

checkpoint_timeout = 60min

That seems like a pretty long timeout.

My reasoning was that a longer recovery time to avoid writes would be
acceptable because there are two more nodes in the cluster to fall back on
in case of emergency.

My problem is that checkpoints are taking a long time. Even when I run a
few manual checkpoints one after the other, they keep taking very long, up
to 10 minutes:

You haven't said *why* this is an issue... Why are you concerned with
how long it takes to do a checkpoint?

During normal operation I don't mind that it takes a long time, but when
performing maintenance I want to be able to gracefully bring down the
master without long delays to promote one of the standbys.

The time information is all there and it tells you what it's doing and
how much had to be done... If you're unhappy with how long it takes to
write out gigabytes of data and fsync hundreds of files, talk to your
storage people...

I am the storage people too :)


Thanks,

Stephen

#7 Peter Geoghegan
In reply to: Tiemen Ruiten (#6)
Re: checkpoints taking much longer than expected

On Sat, Jun 15, 2019 at 1:50 AM Tiemen Ruiten <t.ruiten@tech-lab.io> wrote:

During normal operation I don't mind that it takes a long time, but when performing maintenance I want to be able to gracefully bring down the master without long delays to promote one of the standbys.

Maybe an "immediate" mode shutdown is appropriate, then. That will
mean that the primary will need to go through crash recovery if and
when you bring it back up, though.

--
Peter Geoghegan

#8 Michael Loftis
mloftis@wgops.com
In reply to: Tiemen Ruiten (#1)
Re: checkpoints taking much longer than expected

On Fri, Jun 14, 2019 at 08:02 Tiemen Ruiten <t.ruiten@tech-lab.io> wrote:

Hello,

I set up a new 3-node cluster with the following specifications:

2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2*20 cores)
128 GB RAM
8x Crucial MX500 1TB SSD's

FS is ZFS, the dataset with the PGDATA directory on it has the following
properties (only non-default listed):

NAME PROPERTY VALUE SOURCE
tank/pgsql used 234G -
tank/pgsql available 3.28T -
tank/pgsql referenced 234G -
tank/pgsql compressratio 2.68x -
tank/pgsql compression lz4 inherited from tank

You're possibly slowing yourself down a lot by using compression here.
Turning off compression, though, takes multiple steps: you basically have
to turn it off in the FS and then rewrite/copy the data. I'd check in a
scratch area how long it takes to write ~3 GB of random data with
compression on vs. off.
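A rough way to run that comparison is to time identical fsync'd writes into two scratch areas, one on a dataset with compression=lz4 and one with compression=off. This is only a sketch; the /tank/scratch-* paths are hypothetical placeholders for real scratch datasets:

```python
import os
import time

def timed_write(path, size_mb=256, chunk_mb=8):
    """Write size_mb of incompressible (random) data to path and fsync it."""
    data = os.urandom(chunk_mb * 1024 * 1024)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(size_mb // chunk_mb):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return time.monotonic() - start

if __name__ == "__main__":
    # Hypothetical mount points: one dataset with compression=lz4,
    # one with compression=off. Skipped if they don't exist.
    for target in ("/tank/scratch-lz4/test.bin", "/tank/scratch-off/test.bin"):
        if os.path.isdir(os.path.dirname(target)):
            elapsed = timed_write(target)
            print(f"{target}: {256 / elapsed:.0f} MB/s")
```

Random data is close to the worst case for lz4; repeating the run with all-zero chunks would show the best case.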

tank/pgsql atime off inherited from tank
tank/pgsql canmount on local
tank/pgsql xattr sa inherited from tank

My postgresql.conf (only changed from default listed):

hba_file = '/var/lib/pgsql/pg_hba.conf'
listen_addresses = '*'
max_connections = 800
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
dynamic_shared_memory_type = posix
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = on
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 60min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500 # range 1-10000
idle_in_transaction_session_timeout = 30min # in milliseconds, 0 is disabled
shared_preload_libraries = 'timescaledb, pg_cron'
max_locks_per_transaction = 512
timescaledb.max_background_workers = 8

My problem is that checkpoints are taking a long time. Even when I run a
few manual checkpoints one after the other, they keep taking very long, up
to 10 minutes:

2019-06-14 15:21:10.351 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:25:57.655 CEST [23657] LOG: checkpoint complete: wrote
139831 buffers (13.3%); 0 WAL file(s) added, 148 removed, 40 recycled;
write=284.751 s, sync=0.532 s, total=287.304 s; sync files=537,
longest=0.010 s, average=0.000 s; distance=2745065 kB, estimate=2841407 kB
2019-06-14 15:26:01.988 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:30:30.430 CEST [23657] LOG: checkpoint complete: wrote
238322 buffers (22.7%); 0 WAL file(s) added, 0 removed, 172 recycled;
write=264.794 s, sync=0.415 s, total=268.441 s; sync files=378,
longest=0.011 s, average=0.001 s; distance=2834018 kB, estimate=2840668 kB
2019-06-14 15:30:44.097 CEST [23657] LOG: checkpoint starting: immediate
force wait
2019-06-14 15:37:01.438 CEST [23657] LOG: checkpoint complete: wrote
132286 buffers (12.6%); 0 WAL file(s) added, 54 removed, 96 recycled;
write=366.614 s, sync=2.975 s, total=377.341 s; sync files=467,
longest=0.095 s, average=0.006 s; distance=2444291 kB, estimate=2801030 kB

What is going on? It doesn't seem like normal behaviour?

--

"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler

#9 Jeff Janes
jeff.janes@gmail.com
In reply to: Tiemen Ruiten (#6)
Re: checkpoints taking much longer than expected

On Sat, Jun 15, 2019 at 4:50 AM Tiemen Ruiten <t.ruiten@tech-lab.io> wrote:

On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost <sfrost@snowman.net> wrote:

The time information is all there and it tells you what it's doing and
how much had to be done... If you're unhappy with how long it takes to
write out gigabytes of data and fsync hundreds of files, talk to your
storage people...

Right, but true only because they were "checkpoint starting: immediate".
Otherwise the reported write time includes intentional sleeps added to
honor the checkpoint_completion_target. A bit confusing to report it that
way, I think.

I am the storage people too :)

So, what did you say to yourself? Have you done fundamental benchmarking
at the OS level, to see how fast you can write out 1-2 GB of data changes
spread out randomly over a few hundred GB of data, when it is not coming
from PostgreSQL?
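One way to approximate that OS-level benchmark is to scatter 8 kB writes at random offsets across a large file and fsync at the end. This is only a sketch with scaled-down placeholder sizes; on the real storage they should be increased toward the hundreds of GB of data and 1-2 GB of writes described above:

```python
import os
import random
import time

def random_write_bench(path, file_size, total_writes, block=8192):
    """Time total_writes random block-sized writes scattered across a
    file of file_size bytes, fsync once at the end; return MiB/s."""
    with open(path, "wb") as f:
        f.truncate(file_size)  # sparse file of the target size
    payload = os.urandom(block)
    start = time.monotonic()
    with open(path, "r+b") as f:
        for _ in range(total_writes):
            f.seek(random.randrange(file_size // block) * block)
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.monotonic() - start
    return (total_writes * block) / elapsed / (1024 ** 2)

if __name__ == "__main__":
    # Placeholder sizes: a 256 MiB file and 16 MiB of random 8 kB writes.
    rate = random_write_bench("bench.dat", file_size=1 << 28, total_writes=2048)
    print(f"~{rate:.0f} MiB/s of random 8 kB writes")
```

If this number at the OS level is in the same low single-digit MiB/s range as the checkpoint logs, the bottleneck is below PostgreSQL.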

Cheers,

Jeff

#10 Stephen Frost
sfrost@snowman.net
In reply to: Tiemen Ruiten (#6)
Re: checkpoints taking much longer than expected

Greetings,

* Tiemen Ruiten (t.ruiten@tech-lab.io) wrote:

On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost <sfrost@snowman.net> wrote:

* Tiemen Ruiten (t.ruiten@tech-lab.io) wrote:

checkpoint_timeout = 60min

That seems like a pretty long timeout.

My reasoning was that a longer recovery time to avoid writes would be
acceptable because there are two more nodes in the cluster to fall back on
in case of emergency.

Ok, so you want fewer checkpoints because you expect to failover to a
replica rather than recover the primary on a failure. If you're doing
synchronous replication, then that certainly makes sense. If you
aren't, then you're deciding that you're alright with losing some number
of writes by failing over rather than recovering the primary, which can
also be acceptable but it's certainly much more questionable.

My problem is that checkpoints are taking a long time. Even when I run a
few manual checkpoints one after the other, they keep taking very long, up
to 10 minutes:

You haven't said *why* this is an issue... Why are you concerned with
how long it takes to do a checkpoint?

During normal operation I don't mind that it takes a long time, but when
performing maintenance I want to be able to gracefully bring down the
master without long delays to promote one of the standbys.

I'm getting the feeling that your replicas are async, but it sounds like
you'd be better off with having at least one sync replica, so that you
can flip to it quickly. Alternatively, having a way to more easily make
the primary stop accepting new writes, flush everything to the replicas,
report that it's completed doing so, to allow you to promote a replica
without losing anything, and *then* go through the process on the
primary of doing a checkpoint, would be kind of nice.

Then again, you run into the issue that if your async replicas are very
far behind then you're still going to have a long period of time between
the "stop accepting new writes" and "finished flushing everything to the
replicas".

The time information is all there and it tells you what it's doing and
how much had to be done... If you're unhappy with how long it takes to
write out gigabytes of data and fsync hundreds of files, talk to your
storage people...

I am the storage people too :)

Great! Make it go faster. :)

Thanks,

Stephen

#11 Stephen Frost
sfrost@snowman.net
In reply to: Jeff Janes (#9)
Re: checkpoints taking much longer than expected

Greetings,

* Jeff Janes (jeff.janes@gmail.com) wrote:

On Sat, Jun 15, 2019 at 4:50 AM Tiemen Ruiten <t.ruiten@tech-lab.io> wrote:

On Fri, Jun 14, 2019 at 5:43 PM Stephen Frost <sfrost@snowman.net> wrote:

The time information is all there and it tells you what it's doing and
how much had to be done... If you're unhappy with how long it takes to
write out gigabytes of data and fsync hundreds of files, talk to your
storage people...

Right, but true only because they were "checkpoint starting: immediate".
Otherwise the reported write time includes intentional sleeps added to
honor the checkpoint_completion_target. A bit confusing to report it that
way, I think.

Hrmpf, yeah, we could probably improve on that, I'd certainly like to
see that broken out.

Thanks,

Stephen

#12 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter J. Holzer (#5)
Re: checkpoints taking much longer than expected

On 2019-Jun-14, Peter J. Holzer wrote:

There was a discussion about ZFS' COW behaviour and PostgreSQL reusing
WAL files not being a good combination about a year ago:
/messages/by-id/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w@mail.gmail.com

Maybe you have the same problem?

Note that Joyent ended up proposing patches to fix their performance
problem (and got them committed). Maybe it would be useful for Tiemen
to try that code? (That commit cherry-picks cleanly on REL_11_STABLE.)

commit 475861b2615dd63ae8431d811749a6f9a15bbfd6
Author: Thomas Munro <tmunro@postgresql.org>
AuthorDate: Tue Apr 2 14:37:14 2019 +1300
CommitDate: Tue Apr 2 14:37:14 2019 +1300

Add wal_recycle and wal_init_zero GUCs.

On at least ZFS, it can be beneficial to create new WAL files every
time and not to bother zero-filling them. Since it's not clear which
other filesystems might benefit from one or both of those things,
add individual GUCs to control those two behaviors independently and
make only very general statements in the docs.

Author: Jerry Jelinek, with some adjustments by Thomas Munro
Reviewed-by: Alvaro Herrera, Andres Freund, Tomas Vondra, Robert Haas and others
Discussion: /messages/by-id/CACPQ5Fo00QR7LNAcd1ZjgoBi4y97+K760YABs0vQHH5dLdkkMA@mail.gmail.com
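For reference, on a build that includes this commit (PostgreSQL 12, or 11 with the patch applied), the new behaviour is controlled by two ordinary GUCs; for ZFS, something like the following in postgresql.conf:

```
wal_recycle = off
wal_init_zero = off
```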

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13 Tiemen Ruiten
t.ruiten@tech-lab.io
In reply to: Alvaro Herrera (#12)
Re: checkpoints taking much longer than expected

On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Note that Joyent ended up proposing patches to fix their performance
problem (and got them committed). Maybe it would be useful for Tiemen
to try that code? (That commit cherry-picks cleanly on REL_11_STABLE.)

Interesting! The performance improvements in that thread make it look
worth trying. Is this patch likely to make it into an 11.x release?

#14 Stephen Frost
sfrost@snowman.net
In reply to: Tiemen Ruiten (#13)
Re: checkpoints taking much longer than expected

Greetings,

* Tiemen Ruiten (t.ruiten@tech-lab.io) wrote:

On Sun, Jun 16, 2019 at 8:57 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Note that Joyent ended up proposing patches to fix their performance
problem (and got them committed). Maybe it would be useful for Tiemen
to try that code? (That commit cherry-picks cleanly on REL_11_STABLE.)

Interesting! The performance improvements in that thread make it look
worth trying. Is this patch likely to make it into an 11.x release?

No, as it's a feature and we don't back-patch those.

Not likely to help with what you're experiencing anyway though...

Thanks,

Stephen

#15 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Stephen Frost (#14)
Re: checkpoints taking much longer than expected

On 2019-Jun-16, Stephen Frost wrote:

Not likely to help with what you're experiencing anyway though...

My gut feeling is that you're wrong, since (as I understand) the
symptoms are the same.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16 Tiemen Ruiten
t.ruiten@tech-lab.io
In reply to: Stephen Frost (#10)
Re: checkpoints taking much longer than expected

On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost <sfrost@snowman.net> wrote:

Ok, so you want fewer checkpoints because you expect to failover to a
replica rather than recover the primary on a failure. If you're doing
synchronous replication, then that certainly makes sense. If you
aren't, then you're deciding that you're alright with losing some number
of writes by failing over rather than recovering the primary, which can
also be acceptable but it's certainly much more questionable.

Yes, in our setup that's the case: a few lost transactions will have a
negligible impact on the business.

I'm getting the feeling that your replicas are async, but it sounds like
you'd be better off with having at least one sync replica, so that you
can flip to it quickly.

They are indeed async, we traded durability for performance here, because
we can accept some lost transactions.

Alternatively, having a way to more easily make
the primary stop accepting new writes, flush everything to the replicas,
report that it's completed doing so, to allow you to promote a replica
without losing anything, and *then* go through the process on the
primary of doing a checkpoint, would be kind of nice.

I suppose that would require being able to demote a master to a slave
during runtime.
That would definitely be nice-to-have.


Thanks,

Stephen

#17 Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#15)
Re: checkpoints taking much longer than expected

Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

On 2019-Jun-16, Stephen Frost wrote:

Not likely to help with what you're experiencing anyway though...

My gut feeling is that you're wrong, since (as I understand) the
symptoms are the same.

The issue in the linked-to thread was that of a synch replica getting
stuck doing a random read of a WAL page in order to append to it, since
we were reusing WAL.

The issue being discussed here is writing out to the heap files during a
checkpoint...

Thanks,

Stephen

#18 Stephen Frost
sfrost@snowman.net
In reply to: Tiemen Ruiten (#16)
Re: checkpoints taking much longer than expected

Greetings,

* Tiemen Ruiten (t.ruiten@tech-lab.io) wrote:

On Sun, Jun 16, 2019 at 7:30 PM Stephen Frost <sfrost@snowman.net> wrote:

Ok, so you want fewer checkpoints because you expect to failover to a
replica rather than recover the primary on a failure. If you're doing
synchronous replication, then that certainly makes sense. If you
aren't, then you're deciding that you're alright with losing some number
of writes by failing over rather than recovering the primary, which can
also be acceptable but it's certainly much more questionable.

Yes, in our setup that's the case: a few lost transactions will have a
negligible impact on the business.

There's a risk here that the impact could be much higher than just 'a
few'. Hopefully you're monitoring it carefully and have some plan of
what to do if the difference grows very quickly.

Alternatively, having a way to more easily make
the primary stop accepting new writes, flush everything to the replicas,
report that it's completed doing so, to allow you to promote a replica
without losing anything, and *then* go through the process on the
primary of doing a checkpoint, would be kind of nice.

I suppose that would require being able to demote a master to a slave
during runtime.

That's.. not the case.

If you're really not worried about losing transactions, then an
immediate shutdown as suggested by Peter G would let you fail over to
one of your not-completely-current replicas.. Or you could just do that
as soon as you start shutting down the primary. Of course, you'll end
up in a situation where the replica will have fork'ed off of the
timeline from a point prior to where the primary is at, meaning you'd
have to use pg_rewind (or do a restore from an earlier backup) to
re-build the former-primary as a replica, if that's the goal.

If the idea is that you don't want to lose any transactions during this
process, and you want to be able to cleanly bring the former primary
back, then I don't think we've really got a great solution where you can
make sure that all the WAL has been flushed out to the replicas and the
primary just finishes its checkpoint (and thinking about that a bit
more, I'm not sure we could actually do it anyway since we're going to
want to write out that shutdown record, making it a catch-22, but maybe
there's something clever that could be done there..).

Thanks,

Stephen

#19 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Stephen Frost (#17)
Re: checkpoints taking much longer than expected

On 2019-Jun-16, Stephen Frost wrote:

The issue being discussed here is writing out to the heap files during a
checkpoint...

We don't really know, as it was already established that the log line is
misattributing time spent ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20 Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#19)
Re: checkpoints taking much longer than expected

Greetings,

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

On 2019-Jun-16, Stephen Frost wrote:

The issue being discussed here is writing out to the heap files during a
checkpoint...

We don't really know, as it was already established that the log line is
misattributing time spent ...

That's in the case of a regular checkpoint, this was from someone
actually issuing 'CHECKPOINT;', so I don't think that's accurate..

Thanks,

Stephen

#21 Tiemen Ruiten
t.ruiten@tech-lab.io
In reply to: Alvaro Herrera (#12)
#22 Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#9)