Buffer Allocation Concurrency Limits

Started by Jason Petersen, almost 12 years ago · 2 messages
#1Jason Petersen
jason@citusdata.com

In December, Metin (a coworker of mine) discussed an inability to scale a simple task (parallel scans of many independent tables) to many cores (it’s here). As a ramp-up task at Citus I was tasked to figure out what the heck was going on here.

I have a pretty extensive writeup here (whose length is more a result of my inexperience with the workings of PostgreSQL than anything else) and was looking for some feedback.

In short, my conclusion is that a working set larger than memory results in backends piling up on BufFreelistLock. As much as possible I removed anything that could be blamed for this:

Hyper-Threading is disabled
zone reclaim mode is disabled
numactl was used to ensure interleaved allocation
kernel.sched_migration_cost was set to a high value to effectively disable migration
kernel.sched_autogroup_enabled was disabled
transparent hugepage support was disabled
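To make the contention concrete: the pre-9.5 allocation path is, in essence, a clock sweep run entirely under one lock. Below is a minimal sketch of that pattern in C; the names (`get_victim_locked`, the `BufferDesc` fields) and the pthread mutex standing in for BufFreelistLock are illustrative simplifications, not PostgreSQL's actual code:

```c
#include <pthread.h>

#define NBUFFERS 1024

/* Hypothetical simplified buffer descriptor: usage_count drives the
 * second-chance clock-sweep, loosely modeled on StrategyGetBuffer. */
typedef struct {
    int usage_count;
    int pinned;
} BufferDesc;

static BufferDesc buffers[NBUFFERS];
static int clock_hand = 0;
/* Stand-in for BufFreelistLock: one global lock guarding the whole sweep. */
static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every backend that needs a victim buffer serializes here: the lock is
 * held for the entire sweep, so concurrent allocators pile up behind it. */
int get_victim_locked(void)
{
    pthread_mutex_lock(&freelist_lock);
    for (;;) {
        int b = clock_hand;
        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (buffers[b].pinned)
            continue;                       /* in use; skip */
        if (buffers[b].usage_count > 0) {
            buffers[b].usage_count--;       /* second chance */
        } else {
            pthread_mutex_unlock(&freelist_lock);
            return b;                       /* victim found */
        }
    }
}
```

With a working set larger than shared_buffers, every read needs a victim, so every backend funnels through this one lock for the duration of a sweep; that is the pile-up the profiles show.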

For a way forward, I was thinking the buffer allocation sections could use some of the atomics Andres added here. Rather than workers grabbing BufFreelistLock to iterate the clock hand until they find a victim, the algorithm could be rewritten in a lock-free style, allowing workers to move the clock hand in tandem.
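As a hedged sketch of that direction: C11 atomics let each worker claim the next clock position with a fetch-and-add and claim a victim with a compare-and-swap. The `usage` array and the -1 "claimed" sentinel below are hypothetical simplifications, not PostgreSQL's buffer headers:

```c
#include <stdatomic.h>

#define NBUFFERS 1024

/* Hypothetical per-buffer usage counters; -1 marks "claimed as victim". */
static _Atomic int usage[NBUFFERS];
static atomic_uint clock_hand;

/* Lock-free sweep: each worker atomically claims the next hand position
 * with fetch_add, so many backends advance the clock in tandem without
 * any global lock. */
int get_victim_lockfree(void)
{
    for (;;) {
        unsigned b = atomic_fetch_add(&clock_hand, 1u) % NBUFFERS;
        int u = atomic_load(&usage[b]);
        if (u > 0) {
            /* Second chance: try to decrement. A failed CAS just means
             * another worker touched this buffer concurrently; move on. */
            atomic_compare_exchange_strong(&usage[b], &u, u - 1);
        } else if (u == 0) {
            int expected = 0;
            /* Claim the buffer: only one worker's CAS from 0 to -1 wins. */
            if (atomic_compare_exchange_strong(&usage[b], &expected, -1))
                return (int) b;
        }
        /* u < 0: already claimed by someone else; keep sweeping. */
    }
}
```

Since NBUFFERS here is a power of two, the unsigned counter wraps harmlessly modulo the buffer count; a real implementation would also need pin checks and interaction with the buffer-header locks, which this sketch omits.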

Alternatively, the clock iteration could be moved off to a background process, similar to what Amit Kapila proposed here.
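One way such a background reclaimer might be structured (a sketch with hypothetical names, not the actual proposed patch): the reclaimer runs the clock sweep off the critical path and keeps a small freelist stocked, so backends hold the freelist lock only for an O(1) pop rather than an entire sweep:

```c
#include <pthread.h>

#define FREELIST_CAP 64

static int freelist[FREELIST_CAP];
static int freelist_len = 0;
static pthread_mutex_t freelist_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Backend side: pop a pre-reclaimed buffer. The lock is held for O(1)
 * work, not for a whole clock sweep. */
int pop_free_buffer(void)
{
    int b = -1;
    pthread_mutex_lock(&freelist_mutex);
    if (freelist_len > 0)
        b = freelist[--freelist_len];
    pthread_mutex_unlock(&freelist_mutex);
    return b;   /* -1 => fall back to running the clock sweep inline */
}

/* Reclaimer side: a background process runs the sweep, finds victims,
 * and refills the list whenever it drops below capacity. */
void bgreclaimer_push(int victim)
{
    pthread_mutex_lock(&freelist_mutex);
    if (freelist_len < FREELIST_CAP)
        freelist[freelist_len++] = victim;
    pthread_mutex_unlock(&freelist_mutex);
}
```

Because a backend that finds the list empty can still sweep inline, correctness would not depend on the reclaimer keeping up; it only shortens the common path.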

Is this assessment accurate? I know 9.4 changes a lot about lock organization, but last I looked I didn’t see anything that could alleviate this contention: are there any plans to address this?

—Jason

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Jason Petersen (#1)
Re: Buffer Allocation Concurrency Limits

On Tue, Apr 8, 2014 at 10:38 PM, Jason Petersen <jason@citusdata.com> wrote:

> In December, Metin (a coworker of mine) discussed an inability to scale a
> simple task (parallel scans of many independent tables) to many cores (it's
> here). As a ramp-up task at Citus I was tasked to figure out what the heck
> was going on here.
>
> I have a pretty extensive writeup here (whose length is more a result of my
> inexperience with the workings of PostgreSQL than anything else) and was
> looking for some feedback.

At the moment I am not able to open the above link (here); there may be some
problem (it's showing Service Unavailable). I will try it later.

> In short, my conclusion is that a working set larger than memory results in
> backends piling up on BufFreelistLock.

Here, when you say a working set larger than memory, do you mean *memory*
as in shared_buffers?
I think if the data is larger than the total memory available, the effect of
I/O can in any case overshadow the effect of BufFreelistLock contention.

> As much as possible I removed
> anything that could be blamed for this:
>
> Hyper-Threading is disabled
> zone reclaim mode is disabled
> numactl was used to ensure interleaved allocation
> kernel.sched_migration_cost was set to a high value to effectively disable migration
> kernel.sched_autogroup_enabled was disabled
> transparent hugepage support was disabled

> For a way forward, I was thinking the buffer allocation sections could use
> some of the atomics Andres added here. Rather than workers grabbing
> BufFreelistLock to iterate the clock hand until they find a victim, the
> algorithm could be rewritten in a lock-free style, allowing workers to move
> the clock hand in tandem.
>
> Alternatively, the clock iteration could be moved off to a background
> process, similar to what Amit Kapila proposed here.

I think both of the above ideas can be useful, but I am not sure they are
sufficient for scaling shared buffers.

> Is this assessment accurate? I know 9.4 changes a lot about lock
> organization, but last I looked I didn't see anything that could alleviate
> this contention: are there any plans to address this?

I am planning to work on this for 9.5.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers