Larger shared_buffers slows down CLUSTER
This problem has been reported by a client.
Consider the following very small table test case:

    create table bar as
      select a, b, c, d, e
      from generate_series(1,2) a, generate_series(3,4) b,
           generate_series(5,6) c, generate_series(7,8) d,
           generate_series(9,10) e;
    create index bar_a on bar(a);
    create index bar_b on bar(b);
    create index bar_c on bar(c);
    create index bar_d on bar(d);
    create index bar_e on bar(e);
    create unique index bar_abcde on bar(a,b,c,d,e);

Now running:

    cluster bar using bar_abcde;

appears to be very sensitive to the shared_buffers setting. On an Amazon
very large memory instance (64GB) running PostgreSQL 9.1.4, I observed the
following timings:

    Shared Buffers    Time
    48GB              2058ms
    8GB               372ms
    1GB               67ms

Is this expected behaviour? If so, is there a good explanation? I'm not
sure what other operations might be affected this way.
cheers
andrew
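
The three timings in the report can be normalized per gigabyte of shared_buffers to see how closely the cost tracks the size of the buffer pool. The following is only arithmetic on the reported numbers; the names `timings_ms`, `ms_per_gb` and `slowdown_vs_1gb` are illustrative and do not come from the thread.

```python
# Timings from the report: shared_buffers size in GB -> CLUSTER time in ms.
timings_ms = {48: 2058, 8: 372, 1: 67}

baseline_ms = timings_ms[1]


def ms_per_gb(size_gb):
    """CLUSTER time divided by the shared_buffers size."""
    return timings_ms[size_gb] / size_gb


def slowdown_vs_1gb(size_gb):
    """How many times slower than the 1GB configuration."""
    return timings_ms[size_gb] / baseline_ms


for size_gb in sorted(timings_ms, reverse=True):
    print(f"{size_gb:>2}GB: {ms_per_gb(size_gb):5.1f} ms/GB, "
          f"{slowdown_vs_1gb(size_gb):4.1f}x the 1GB time")
```

Between 8GB and 48GB the cost per gigabyte is nearly constant, which is roughly what one would expect if most of the time goes into work proportional to the number of shared buffers, plus a small fixed overhead.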
Andrew Dunstan <andrew@dunslane.net> writes:
> Now running:
> cluster bar using bar_abcde;
> appears to be very sensitive to the shared buffers setting. In an amazon
> very large memory instance (64GB) and PostgreSQL 9.1.4, I observed the
> following timings:
>
> Shared Buffers    Time
> 48GB              2058ms
> 8GB               372ms
> 1GB               67ms

DropRelFileNodeBuffers, perhaps? See recent commits to reduce the cost
of that for large shared_buffers, notably
e8d029a30b5a5fb74b848a8697b1dfa3f66d9697 and
ece01aae479227d9836294b287d872c5a6146a11
regards, tom lane
On Wed, Aug 22, 2012 at 1:48 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> This problem has been reported by a client.
> Consider the following very small table test case:
> create table bar as select a,b,c,d,e from generate_series(1,2) a,
> generate_series(3,4) b, generate_series(5,6) c,
> generate_series(7,8) d, generate_series(9,10) e;
> create index bar_a on bar(a);
> create index bar_b on bar(b);
> create index bar_c on bar(c);
> create index bar_d on bar(d);
> create index bar_e on bar(e);
> create unique index bar_abcde on bar(a,b,c,d,e);
>
> Now running:
> cluster bar using bar_abcde;
> appears to be very sensitive to the shared buffers setting. In an amazon
> very large memory instance (64GB) and PostgreSQL 9.1.4, I observed the
> following timings:
>
> Shared Buffers    Time
> 48GB              2058ms
> 8GB               372ms
> 1GB               67ms
>
> Is this expected behaviour?

Yeah. Clustering the table means that all the indexes and the old
version of the table all get dropped, and each time something is
dropped the entire buffer pool is scoured to remove the old buffers.
In my hands, this is about 10 times better in 9.2 than 9.1.4, at 8GB.
Because now the scouring is done once per object, not once per fork.
Also, the check is done without an initial spinlock.
It perhaps could be improved further by only scouring the pool once,
at the end of the transaction, with a hash of all objects to be
dropped.

> If so, is there a good explanation? I'm not sure
> what other operations might be affected this way.

drop, truncate, reindex, vacuum full. What else causes a table to be
re-written?
Cheers,
Jeff
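
The three strategies described in this reply (one buffer-pool scan per fork, one per dropped object, and the suggested single scan per transaction against a hash of dropped objects) can be compared with a toy count of buffer-header visits. This is a simplified model, not PostgreSQL code. The fork count, the number of relations that CLUSTER discards for this test case, and all function names are assumptions made for illustration; the model ignores locking costs and whether each fork actually exists, so it will not reproduce the measured timings.

```python
# Toy model (not PostgreSQL code): count how many buffer headers are
# visited when dropped relations are purged from the buffer pool.

BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes

# Assumption: forks scanned per relation under the per-fork scheme.
FORKS_PER_RELATION = 4

# Assumption: CLUSTER on "bar" discards the old heap plus six old indexes.
RELATIONS_DROPPED = 7


def buffer_count(shared_buffers_gb):
    """Number of buffers in a pool of the given size."""
    return shared_buffers_gb * 1024**3 // BLOCK_SIZE


def visits_per_fork(nbuffers):
    """One full pool scan for every fork of every dropped relation."""
    return nbuffers * RELATIONS_DROPPED * FORKS_PER_RELATION


def visits_per_relation(nbuffers):
    """One full pool scan per dropped relation, covering all its forks."""
    return nbuffers * RELATIONS_DROPPED


def visits_once(nbuffers):
    """One scan per transaction, checking each buffer against a set."""
    return nbuffers


def purge_once(pool, dropped):
    """Single pass over the pool. `pool` is a list of relation ids (None
    for an empty buffer); `dropped` is a set of relation ids to purge."""
    return [None if rel in dropped else rel for rel in pool]


for gb in (48, 8, 1):
    n = buffer_count(gb)
    print(f"{gb:>2}GB: {n:>9,} buffers"
          f" | per fork {visits_per_fork(n):>12,}"
          f" | per relation {visits_per_relation(n):>11,}"
          f" | once {visits_once(n):>9,}")
```

In every variant the work grows linearly with the number of buffers, which matches the shape of the reported timings; the variants differ only in the constant factor in front of it.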
On 08/22/2012 05:19 PM, Jeff Janes wrote:
> Yeah. Clustering the table means that all the indexes and the old
> version of the table all get dropped, and each time something is
> dropped the entire buffer pool is scoured to remove the old buffers.
>
> In my hands, this is about 10 times better in 9.2 than 9.1.4, at 8GB.
> Because now the scouring is done once per object, not once per fork.
> Also, the check is done without an initial spinlock.
>
> It perhaps could be improved further by only scouring the pool once,
> at the end of the transaction, with a hash of all objects to be
> dropped.
>
>> If so, is there a good explanation? I'm not sure
>> what other operations might be affected this way.
>
> drop, truncate, reindex, vacuum full. What else causes a table to be
> re-written?

OK, thanks for the info.
cheers
andrew
On 08/22/2012 05:19 PM, Jeff Janes wrote:
>> Shared Buffers    Time
>> 48GB              2058ms
>> 8GB               372ms
>> 1GB               67ms
>>
>> Is this expected behaviour?
>
> Yeah. Clustering the table means that all the indexes and the old
> version of the table all get dropped, and each time something is
> dropped the entire buffer pool is scoured to remove the old buffers.
>
> In my hands, this is about 10 times better in 9.2 than 9.1.4, at 8GB.
> Because now the scouring is done once per object, not once per fork.
> Also, the check is done without an initial spinlock.
>
> It perhaps could be improved further by only scouring the pool once,
> at the end of the transaction, with a hash of all objects to be
> dropped.

FYI, I have rerun the tests on Amazon with 9.2 beta. The improvement I
saw ranged from a factor of roughly 2 (with 1GB of shared_buffers) to
roughly 6 (with 48GB).
cheers
andrew
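
Combining the 9.1.4 timings from the original report with the approximate improvement factors quoted here gives a rough idea of the CLUSTER times on 9.2 beta. The thread does not report these as measurements: the results below are only implied by the stated factors, which were themselves given as "roughly" 2 and 6. All names in the snippet are illustrative.

```python
# 9.1.4 timings from the original report (ms) and the approximate
# improvement factors reported for 9.2 beta.
time_91_ms = {"48GB": 2058, "1GB": 67}
improvement_92 = {"48GB": 6, "1GB": 2}


def implied_92_ms(size):
    """Implied (not measured) 9.2 beta time for a shared_buffers size."""
    return time_91_ms[size] / improvement_92[size]


for size in ("48GB", "1GB"):
    print(f"{size}: roughly {implied_92_ms(size):.0f} ms implied for 9.2 beta")
```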