Postgres 10, slave not catching up with master

Started by Boris Sagadin · over 7 years ago · 10 messages · general
#1 Boris Sagadin
boris@infosplet.com

Hello,

I have a database running on an i3.8xlarge AWS instance (256GB RAM, 32 CPU
cores, 4x 1.9TB NVMe drives) with about 5TB of disk space occupied, ext4,
Ubuntu 16.04.

Multi-tenant DB with about 40000 tables, insert-heavy.

I started a new slave with identical HW specs, using streaming replication
(SR). The DB synced from the master, which took about 4 hours, then it
started applying the WALs. However, it seems it can't catch up. The delay is
still around 3 hours (measured with now() - pg_last_xact_replay_timestamp()),
even a day later. It goes up and down by a few hundred seconds, but seems to
float around the 3h mark.
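
A minimal sketch (assuming PostgreSQL 10 function names) for telling receive
lag from replay lag on the standby; if the received LSN keeps advancing while
the replayed LSN trails behind, the bottleneck is the single recovery process
rather than the network:

-- run on the standby; illustrative only
SELECT pg_last_wal_receive_lsn()  AS received_lsn,
       pg_last_wal_replay_lsn()   AS replayed_lsn,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                       pg_last_wal_replay_lsn()) AS replay_backlog_bytes,
       now() - pg_last_xact_replay_timestamp()   AS replay_time_lag;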

Disk IO is low, at about 10% utilization as measured with iostat, there are
no connected clients, and the recovery process sits at around 90% CPU usage
on a single core.

I tried tuning various parameters, but to no avail. The only thing I found
suspicious is that stracing the recovery process constantly shows many errors
such as:

lseek(428, 0, SEEK_END)                 = 780124160
lseek(30, 0, SEEK_END)                  = 212992
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(680, 0, SEEK_END)                 = 493117440
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(774, 0, SEEK_END)                 = 583368704

...[snip]...

read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(774, 0, SEEK_END)                 = 583368704
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(277, 0, SEEK_END)                 = 502882304
lseek(6, 516096, SEEK_SET)              = 516096
read(6, "\227\320\5\0\1\0\0\0\0\340\7\246\26\274\0\0\315\0\0\0\0\0\0\0}\0178\5&/\260\r"..., 8192) = 8192
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(735, 0, SEEK_END)                 = 272809984
read(9, 0x7ffe4001f557, 1)              = -1 EAGAIN (Resource temporarily unavailable)
lseek(277, 0, SEEK_END)                 = 502882304

ls -l fd/9
lr-x------ 1 postgres postgres 64 Oct 21 06:21 fd/9 -> pipe:[46358]

Perf top on recovery produces:

 27.76%  postgres            [.] pglz_decompress
  9.90%  [kernel]            [k] entry_SYSCALL_64_after_swapgs
  7.09%  postgres            [.] hash_search_with_hash_value
  4.26%  libpthread-2.23.so  [.] llseek
  3.64%  libpthread-2.23.so  [.] __read_nocancel
  2.80%  [kernel]            [k] __fget_light
  2.67%  postgres            [.] 0x000000000034d3ba
  1.85%  [kernel]            [k] ext4_llseek
  1.84%  postgres            [.] pg_comp_crc32c_sse42
  1.44%  postgres            [.] hash_any
  1.35%  postgres            [.] 0x000000000036afad
  1.29%  postgres            [.] MarkBufferDirty
  1.21%  postgres            [.] XLogReadRecord
[...]

I tried changing the process limits with prlimit to unlimited, but there was
no change.

I can turn off WAL compression, but I doubt this is the main culprit. Any
ideas appreciated.
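
If it comes to that, a minimal sketch for turning it off without a restart
(assuming superuser access on the master; illustrative only):

-- wal_compression only affects newly written WAL records
ALTER SYSTEM SET wal_compression = off;
SELECT pg_reload_conf();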

Regards,
Boris

#2 Andy Colson
andy@squeakycode.net
In reply to: Boris Sagadin (#1)
Re: Postgres 10, slave not catching up with master

I don't have an answer, but at a guess, decompression is taking the most
time, and the process spewing "Resource temporarily unavailable" might just
be sitting around waiting for the decompression to finish.

I'd say yes, definitely try disabling WAL compression. I'm really curious
what that would do.

I assume you are using streaming replication?

-Andy

#3 Boris Sagadin
boris@infosplet.com
In reply to: Boris Sagadin (#1)
Re: Postgres 10, slave not catching up with master

Yes, turning wal_compression off improves things. The slave mentioned above
unfortunately lagged too much before this setting was applied and was shut
down. However, the remaining slave lags less now, although still occasionally
up to a few minutes. I think the single-threadedness of recovery is a big
slowdown for write-heavy databases. Maybe the option to increase the WAL
segment size beyond 16MB in v11 will help.
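
For reference, a quick way to check the current segment size (the v11 option
referred to is presumably initdb's --wal-segsize, which lets you pick a
larger size when the cluster is initialized):

SELECT name, setting, unit FROM pg_settings WHERE name = 'wal_segment_size';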

In the meantime we'll solve this by splitting the DB into 2 or 3 clusters, or
maybe trying out a sharding solution like Citus.

Boris

#4 Hellmuth Vargas
hivs77@gmail.com
In reply to: Boris Sagadin (#3)
Re: Postgres 10, slave not catching up with master

Hi

Can you share your recovery.conf file settings?

--
Cordialmente,

Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate

#5 Boris Sagadin
boris@infosplet.com
In reply to: Hellmuth Vargas (#4)
Re: Postgres 10, slave not catching up with master

Nothing special, just:

standby_mode = 'on'
primary_conninfo = 'host=... user=repmgr application_name=nodex'
recovery_target_timeline = 'latest'

Boris

#6 Hellmuth Vargas
hivs77@gmail.com
In reply to: Boris Sagadin (#5)
Re: Postgres 10, slave not catching up with master

Hi

What result do you get from the following query?

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;

source:

https://severalnines.com/blog/postgresql-streaming-replication-deep-dive

--
Cordialmente,

Ing. Hellmuth I. Vargas S.

#7 Hellmuth Vargas
hivs77@gmail.com
In reply to: Hellmuth Vargas (#6)
Re: Postgres 10, slave not catching up with master

Hi

Are both servers configured with the same date, time, and time zone settings?

--
Cordialmente,

Ing. Hellmuth I. Vargas S.

#8 Boris Sagadin
boris@infosplet.com
In reply to: Hellmuth Vargas (#7)
Re: Postgres 10, slave not catching up with master

Yes, times are all identical, set to UTC, ntpd is used.

log_delay
-----------
15.788175

This is the delay at this moment, but we graph replication delay and it
fluctuates between 0 and 30s. Before I turned off wal compression, the lag
was much bigger (from 0 up to 8 minutes). We have lots of tables (40k) and
many upserts.
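
A hedged sketch of the same lag as seen from the master's side (column names
as in PostgreSQL 10's pg_stat_replication; illustrative only):

SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_backlog_bytes,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;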

Boris

#9 Hellmuth Vargas
hivs77@gmail.com
In reply to: Boris Sagadin (#8)
Re: Postgres 10, slave not catching up with master

Hi
On Wed, Oct 24, 2018 at 00:39, Boris Sagadin (boris@infosplet.com) wrote:

Yes, times are all identical, set to UTC, ntpd is used.

log_delay
-----------
15.788175

This is delay at this moment, but we graph replication delay and it's
fluctuating between 0 and 30s.

But the fluctuation is between 0 and 30s, not 4 hours, fortunately. Apart
from the wal compression issue, I think you should check the network.

--
Cordialmente,

Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate

#10 Boris Sagadin
boris@infosplet.com
In reply to: Hellmuth Vargas (#9)
Re: Postgres 10, slave not catching up with master

Yes, as stated, the lag went down considerably after disabling wal
compression; it's manageable at least. The network is 10 Gbit.

Kind regards,

Boris Sagadin
InfoSplet, informacijske tehnologije, d.o.o.
www.infosplet.com | Tel: 0590 / 45 800, GSM: 041 / 337 848
