larger shared buffers slows down cluster

Started by Andrew Dunstan almost 14 years ago · 5 messages · hackers

andrew@dunslane.net

almost 14 years ago

This problem has been reported by a client.

Consider the following very small table test case:

create table bar as select a,b,c,d,e from generate_series(1,2) a,
generate_series(3,4) b, generate_series(5,6) c,
generate_series(7,8) d, generate_series(9,10) e;
create index bar_a on bar(a);
create index bar_b on bar(b);
create index bar_c on bar(c);
create index bar_d on bar(d);
create index bar_e on bar(e);
create unique index bar_abcde on bar(a,b,c,d,e);

Now running:

cluster bar using bar_abcde;

appears to be very sensitive to the shared_buffers setting. On an Amazon
very-large-memory instance (64GB) running PostgreSQL 9.1.4, I observed the
following timings:

shared_buffers   Time
48GB             2058 ms
8GB               372 ms
1GB                67 ms

Is this expected behaviour? If so, is there a good explanation? I'm not
sure what other operations might be affected this way.

cheers

andrew

Tom Lane

tgl@sss.pgh.pa.us

almost 14 years ago

In reply to: Andrew Dunstan (#1)

Re: larger shared buffers slows down cluster

Andrew Dunstan <andrew@dunslane.net> writes:

> Now running:
> cluster bar using bar_abcde;
> appears to be very sensitive to the shared_buffers setting. On an Amazon
> very-large-memory instance (64GB) running PostgreSQL 9.1.4, I observed the
> following timings:
>
> shared_buffers   Time
> 48GB             2058 ms
> 8GB               372 ms
> 1GB                67 ms

DropRelFileNodeBuffers, perhaps? See recent commits to reduce the cost
of that for large shared_buffers, notably
e8d029a30b5a5fb74b848a8697b1dfa3f66d9697 and
ece01aae479227d9836294b287d872c5a6146a11
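A toy model may help show why dropping a tiny table gets slower as shared_buffers grows. The Python sketch below is not PostgreSQL's DropRelFileNodeBuffers; it assumes the buffer pool can be modeled as a flat list of slots, each either empty or tagged with a (relation, fork, block) triple, and the names `BufferTag` and `drop_fork_buffers` are hypothetical. The invalidation has to visit every slot, so its cost follows the size of the pool rather than the size of the relation.

```python
from typing import NamedTuple, Optional


class BufferTag(NamedTuple):
    """Hypothetical stand-in for a buffer's identity: relation, fork, block."""
    rel: int
    fork: int
    block: int


def drop_fork_buffers(pool: list[Optional[BufferTag]], rel: int, fork: int) -> int:
    """Invalidate every buffer caching a page of (rel, fork).

    Returns the number of slots examined. That is always len(pool):
    the cost depends on the size of the pool, not of the relation.
    """
    examined = 0
    for i, tag in enumerate(pool):
        examined += 1
        if tag is not None and tag.rel == rel and tag.fork == fork:
            pool[i] = None  # slot becomes free for reuse
    return examined


# A 1,000-slot pool in which only two buffers belong to relation 42.
pool: list[Optional[BufferTag]] = [None] * 1000
pool[10] = BufferTag(rel=42, fork=0, block=0)
pool[500] = BufferTag(rel=7, fork=0, block=3)
pool[999] = BufferTag(rel=42, fork=0, block=1)
print(drop_fork_buffers(pool, rel=42, fork=0))  # 1000
```

Only two of the 1,000 slots belong to relation 42, yet all 1,000 are examined; with roughly 6.3 million slots at 48GB (assuming 8kB blocks), the same drop of a two-page relation performs millions of comparisons.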

regards, tom lane

Jeff Janes

jeff.janes@gmail.com

almost 14 years ago

In reply to: Andrew Dunstan (#1)

Re: larger shared buffers slows down cluster

On Wed, Aug 22, 2012 at 1:48 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

> This problem has been reported by a client.
>
> Consider the following very small table test case:
>
> create table bar as select a,b,c,d,e from generate_series(1,2) a,
> generate_series(3,4) b, generate_series(5,6) c,
> generate_series(7,8) d, generate_series(9,10) e;
> create index bar_a on bar(a);
> create index bar_b on bar(b);
> create index bar_c on bar(c);
> create index bar_d on bar(d);
> create index bar_e on bar(e);
> create unique index bar_abcde on bar(a,b,c,d,e);
>
> Now running:
>
> cluster bar using bar_abcde;
>
> appears to be very sensitive to the shared_buffers setting. On an Amazon
> very-large-memory instance (64GB) running PostgreSQL 9.1.4, I observed the
> following timings:
>
> shared_buffers   Time
> 48GB             2058 ms
> 8GB               372 ms
> 1GB                67 ms
>
> Is this expected behaviour?

Yeah. Clustering the table means that the old versions of the table and
of all its indexes get dropped, and each time something is dropped the
entire buffer pool is scoured to remove its old buffers.
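Some rough arithmetic shows why the timings scale with shared_buffers. The sketch below assumes the default 8kB block size, counts the heap plus the six indexes in the test case as seven dropped relations, and assumes 9.1 scans the pool once per fork with four forks (main, free space map, visibility map, init); those counts are illustrative assumptions, not measurements.

```python
BLCKSZ = 8192  # assumption: default PostgreSQL block size in bytes
GIB = 1024 ** 3

RELATIONS = 7  # assumption: heap + six indexes; five int columns get no TOAST table
FORKS_9_1 = 4  # assumption: 9.1 scans the pool once per fork, for four forks


def nbuffers(shared_buffers_bytes: int, blcksz: int = BLCKSZ) -> int:
    """Number of buffer slots, i.e. the length of one full scan of the pool."""
    return shared_buffers_bytes // blcksz


def slots_visited(shared_buffers_gib: int, relations: int, scans_per_relation: int) -> int:
    """Buffer headers examined when every dropped relation triggers full scans."""
    return nbuffers(shared_buffers_gib * GIB) * relations * scans_per_relation


for gib in (48, 8, 1):
    per_fork = slots_visited(gib, RELATIONS, FORKS_9_1)
    per_object = slots_visited(gib, RELATIONS, 1)
    print(f"{gib:>2}GB: {nbuffers(gib * GIB):>9,} buffers, "
          f"{per_fork:>11,} visited per fork, {per_object:>10,} per object")
```

Under these assumptions a 48GB pool is 6,291,456 slots against 131,072 at 1GB, and the CLUSTER in the test case would make 28 full scans with per-fork dropping but 7 with per-object dropping.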

In my hands, this is about 10 times better in 9.2 than in 9.1.4 at 8GB,
because the scouring is now done once per object rather than once per
fork. Also, the check is now done without first taking the buffer header
spinlock.
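The two 9.2 changes described here can be contrasted in the same kind of toy model. This is an illustrative Python sketch, not the 9.1 or 9.2 source: it assumes four forks per relation, treats the header spinlock as a counter, and the helper names are hypothetical. The first function makes one full pass per fork and locks every header before comparing; the second makes a single pass for the whole relation and only locks a header after an unlocked comparison matches.

```python
from typing import NamedTuple, Optional

N_FORKS = 4  # assumption: main, free space map, visibility map, init


class BufferTag(NamedTuple):
    """Hypothetical buffer identity: relation, fork, block."""
    rel: int
    fork: int
    block: int


def drop_per_fork_locked(pool: list[Optional[BufferTag]], rel: int) -> tuple[int, int]:
    """9.1-style model: one full pass per fork, locking every header examined.

    Returns (slots examined, header locks taken).
    """
    examined = locks = 0
    for fork in range(N_FORKS):
        for i, tag in enumerate(pool):
            examined += 1
            locks += 1  # lock first, then compare the tag
            if tag is not None and tag.rel == rel and tag.fork == fork:
                pool[i] = None
    return examined, locks


def drop_per_relation_prechecked(pool: list[Optional[BufferTag]], rel: int) -> tuple[int, int]:
    """9.2-style model: one pass covering all forks; lock only after an unlocked match.

    Returns (slots examined, header locks taken).
    """
    examined = locks = 0
    for i, tag in enumerate(pool):
        examined += 1
        if tag is None or tag.rel != rel:  # unlocked precheck: most slots stop here
            continue
        locks += 1  # real code would lock, re-check the tag, then invalidate
        pool[i] = None
    return examined, locks


def make_pool() -> list[Optional[BufferTag]]:
    """Hypothetical 1,000-slot pool holding three pages of relation 42."""
    pool: list[Optional[BufferTag]] = [None] * 1000
    pool[1] = BufferTag(42, 0, 0)
    pool[2] = BufferTag(42, 1, 0)
    pool[3] = BufferTag(7, 0, 0)
    pool[4] = BufferTag(42, 0, 1)
    return pool


print(drop_per_fork_locked(make_pool(), 42))          # (4000, 4000)
print(drop_per_relation_prechecked(make_pool(), 42))  # (1000, 3)
```

In a real buffer manager the unlocked comparison is only a filter: once the lock is held the tag has to be checked again, because the buffer could have been reassigned in between.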

It could perhaps be improved further by scouring the pool only once,
at the end of the transaction, using a hash of all the objects to be
dropped.
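The suggested improvement can be sketched in the same toy model: instead of one full pass per dropped relation, collect the relations to be dropped and make a single pass that tests each buffer's relation against a hash set. This is a hypothetical illustration that uses a Python `set` as the hash table; the function names and relation ids are made up, and it does not describe code that existed at the time.

```python
from typing import Iterable, NamedTuple, Optional


class BufferTag(NamedTuple):
    """Hypothetical buffer identity: relation, fork, block."""
    rel: int
    fork: int
    block: int


def drop_each(pool: list[Optional[BufferTag]], rels: Iterable[int]) -> int:
    """One full pass per dropped relation; returns slots examined."""
    examined = 0
    for rel in rels:
        for i, tag in enumerate(pool):
            examined += 1
            if tag is not None and tag.rel == rel:
                pool[i] = None
    return examined


def drop_all_at_once(pool: list[Optional[BufferTag]], rels: Iterable[int]) -> int:
    """Single pass at end of transaction, testing each buffer against a hash set."""
    dropped = set(rels)  # hash table: average O(1) membership test
    examined = 0
    for i, tag in enumerate(pool):
        examined += 1
        if tag is not None and tag.rel in dropped:
            pool[i] = None
    return examined


def make_pool(size: int = 1000) -> list[Optional[BufferTag]]:
    """Hypothetical contents: slot i caches block 0 of relation 100 + i % 10."""
    return [BufferTag(100 + i % 10, 0, 0) for i in range(size)]


a, b = make_pool(), make_pool()
rels = range(100, 107)  # seven relations: heap plus six indexes
print(drop_each(a, rels), drop_all_at_once(b, rels))  # 7000 1000
print(a == b)  # True
```

With the seven relations of the test case, the per-relation approach examines 7,000 slots and the single pass 1,000, leaving the pool in the same state; the saving grows with both the pool size and the number of objects dropped in the transaction.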

> If so, is there a good explanation? I'm not sure
> what other operations might be affected this way.

DROP, TRUNCATE, REINDEX, VACUUM FULL. What else causes a table to be
rewritten?

Cheers,

Jeff

Andrew Dunstan

andrew@dunslane.net

almost 14 years ago

In reply to: Jeff Janes (#3)

Re: larger shared buffers slows down cluster

On 08/22/2012 05:19 PM, Jeff Janes wrote:

OK, thanks for the info.

cheers

andrew

Andrew Dunstan

andrew@dunslane.net

almost 14 years ago

In reply to: Jeff Janes (#3)

Re: larger shared buffers slows down cluster

On 08/22/2012 05:19 PM, Jeff Janes wrote:

>> shared_buffers   Time
>> 48GB             2058 ms
>> 8GB               372 ms
>> 1GB                67 ms
>>
>> Is this expected behaviour?
>
> Yeah. Clustering the table means that the old versions of the table and
> of all its indexes get dropped, and each time something is dropped the
> entire buffer pool is scoured to remove its old buffers.
>
> In my hands, this is about 10 times better in 9.2 than in 9.1.4 at 8GB,
> because the scouring is now done once per object rather than once per
> fork. Also, the check is now done without first taking the buffer header
> spinlock.
>
> It could perhaps be improved further by scouring the pool only once,
> at the end of the transaction, using a hash of all the objects to be
> dropped.

FYI, I have rerun the tests on Amazon with 9.2 beta: the improvement I
saw ranged from a factor of roughly 2 (with shared_buffers = 1GB) to
roughly 6 (with 48GB).

cheers

andrew