Slow system due to ReorderBufferGetTupleBuf?

Started by Martin Moore over 8 years ago · 4 messages · general

martin.moore@avbrief.com

over 8 years ago

Postgres v10 on Debian stretch

My system is occasionally very slow. A few weeks ago someone suggested using perf, so I installed it and caught the system during a slow period. It shows the following as the top CPU consumers:

9.09% postgres [.] ReorderBufferGetTupleBuf
6.14% postgres [.] ReorderBufferReturnChange

When ReorderBufferReturnChange is no longer running:

14.35% postgres [.] ReorderBufferGetTupleBuf

Can someone shed some light on this and advise how to prevent it reoccurring?

Cheers,

Martin.
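
Both symbols in the perf output share the ReorderBuffer prefix, which in the PostgreSQL source tree belongs to the logical decoding reorder buffer (reorderbuffer.c). A minimal sketch for totalling the CPU share of such a symbol family from perf report lines follows. The helper name `share_by_prefix` and the line format it expects are assumptions based only on the two lines quoted above, not a general perf parser:

```python
import re

# Matches report lines shaped like the ones quoted in this thread, e.g.
# "9.09% postgres [.] ReorderBufferGetTupleBuf".
# Assumption: percentage, binary name, a bracketed one-character marker, symbol.
PERF_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def share_by_prefix(report: str, prefix: str) -> float:
    """Sum the CPU percentage of all symbols whose name starts with prefix."""
    total = 0.0
    for line in report.splitlines():
        match = PERF_LINE.match(line)
        # Lines that do not fit the assumed format are skipped silently.
        if match and match.group(4).startswith(prefix):
            total += float(match.group(1))
    return total


sample = """\
9.09% postgres [.] ReorderBufferGetTupleBuf
6.14% postgres [.] ReorderBufferReturnChange
"""

print(f"{share_by_prefix(sample, 'ReorderBuffer'):.2f}%")  # prints 15.23%
```

If the combined share stays high across several slow periods, that would point at logical decoding activity rather than at ordinary query execution, though the numbers alone do not say which process or replication slot is responsible.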

Peter Geoghegan

pg@bowt.ie

over 8 years ago

In reply to: Martin Moore (#1)

Re: Slow system due to ReorderBufferGetTupleBuf?

On Mon, Jan 1, 2018 at 8:56 AM, Martin Moore <martin.moore@avbrief.com> wrote:

> Can someone shed some light on this and advise how to prevent it reoccurring?

You're using v10, which has these two commits:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=58b25e98106dbe062cec0f3d31d64977bffaa4af

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fab40ad32efa4038d19eaed975bb4c1713ccbc0

Unfortunately, per the commit message of the first commit, it doesn't
look like the tuple allocator uses any new strategy, at least until
this v11 commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a4ccc1cef5a04cc054af83bc4582a045d5232cb3

My guess is that that would make a noticeable difference, once v11
becomes available. Could you test this yourself by building from the
master branch?

--
Peter Geoghegan

martin.moore@avbrief.com

over 8 years ago

In reply to: Peter Geoghegan (#2)

Re: Slow system due to ReorderBufferGetTupleBuf?

Thanks Peter. I don’t really want to go down that route for various reasons. There’s a task that copies ‘old’ rows to various old_ tables, deletes them from the main tables, and then does a vacuum and analyse. The tables only have 20-30k rows. I’m guessing this may be the trigger for the problem, so I have changed the timing from every 20 minutes to once in the middle of the night when things are quiet.

Would this explain the problem?

Martin.

martin.moore@avbrief.com

over 8 years ago

In reply to: Martin Moore (#3)

Re: Slow system due to ReorderBufferGetTupleBuf?

Having stopped the suspect task, I’m still getting the same problem. I can’t even stop Postgres:

waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down

We’ve spent two years and a chunk of cash on a total system redesign, and this is going to stop it from being released.

Can someone give me an idea of what may be causing this, and of what ReorderBufferGetTupleBuf is actually doing, in case it gives me a clue?

Thanks.