FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

Started by Todd A. Cookalmost 17 years ago10 messagesgeneral
Jump to latest
#1Todd A. Cook
tcook@blackducksoftware.com

Hi,

First, the numbers:

PG Version Load time pg_database_size autovac
----------------------------------------------------------
8.2.13 179 min 92,807,992,820 on
8.3.7 180 min 84,048,744,044 on (defaults)
8.4b2 206 min 84,028,995,344 on (defaults)
8.4b2 183 min 84,028,839,696 off

The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int). Each table has a
single partial index on one of the integer columns. The dump file was 14GB
compressed.

The loads were all done on the same machine, with the DB going on a pair
of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and
8GB RAM. maintenance_work_mem was set to 512MB in all three cases.

-- todd

#2Vick Khera
vivek@khera.org
In reply to: Todd A. Cook (#1)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

On Wed, Jun 17, 2009 at 10:50 AM, Todd A.
Cook<tcook@blackducksoftware.com> wrote:

The loads were all done on the same machine, with the DB going on a pair
of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.

What if you double or triple the number of checkpoint segments?

#3Alan McKay
alan.mckay@gmail.com
In reply to: Todd A. Cook (#1)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

There was an interesting presentation at PG Con from a guy at Sun who
did a series of load tests on 8.3 vs 8.4

http://www.pgcon.org/2009/schedule/events/124.en.html

There is a link to the video from that page so you can watch it. But
he found a strange "corner case" where 8.4 performed way worse. After
he did a bit of digging he found a couple of default settings that
had changed in 8.4, and when he set them back to their old 8.3 values
and re-ran the tests, there was a huge difference in outcome.

On Wed, Jun 17, 2009 at 10:50 AM, Todd A.
Cook<tcook@blackducksoftware.com> wrote:

Hi,

First, the numbers:

 PG Version    Load time     pg_database_size     autovac
----------------------------------------------------------
 8.2.13        179 min       92,807,992,820        on
 8.3.7         180 min       84,048,744,044        on (defaults)
 8.4b2         206 min       84,028,995,344        on (defaults)
 8.4b2         183 min       84,028,839,696        off

The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int).  Each table has a
single partial index on one of the integer columns.  The dump file was 14GB
compressed.

The loads were all done on the same machine, with the DB going on a pair
of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.

-- todd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
“Don't eat anything you've ever seen advertised on TV”
- Michael Pollan, author of "In Defense of Food"

#4Todd A. Cook
tcook@blackducksoftware.com
In reply to: Vick Khera (#2)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

Vick Khera wrote:

On Wed, Jun 17, 2009 at 10:50 AM, Todd A.
Cook<tcook@blackducksoftware.com> wrote:

The loads were all done on the same machine, with the DB going on a pair
of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and
8GB RAM. maintenance_work_mem was set to 512MB in all three cases.

What if you double or triple the number of checkpoint segments?

checkpoint_segments was set to 128 for all tests.

-- todd

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Todd A. Cook (#1)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

"Todd A. Cook" <tcook@blackducksoftware.com> writes:

First, the numbers:

PG Version Load time pg_database_size autovac
----------------------------------------------------------
8.2.13 179 min 92,807,992,820 on
8.3.7 180 min 84,048,744,044 on (defaults)
8.4b2 206 min 84,028,995,344 on (defaults)
8.4b2 183 min 84,028,839,696 off

The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int). Each table has a
single partial index on one of the integer columns.

Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded. If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.

regards, tom lane

#6Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#5)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

Tom Lane wrote:

"Todd A. Cook" <tcook@blackducksoftware.com> writes:

First, the numbers:

PG Version Load time pg_database_size autovac
----------------------------------------------------------
8.2.13 179 min 92,807,992,820 on
8.3.7 180 min 84,048,744,044 on (defaults)
8.4b2 206 min 84,028,995,344 on (defaults)
8.4b2 183 min 84,028,839,696 off

The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int). Each table has a
single partial index on one of the integer columns.

Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded. If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.

given that this was likely a single-thread restore and therefor wal
logged I wonder if the 206min one might be affected by the issue
discussed here
http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php

Stefan

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stefan Kaltenbrunner (#6)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:

Tom Lane wrote:

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.

given that this was likely a single-thread restore and therefor wal
logged I wonder if the 206min one might be affected by the issue
discussed here
http://archives.postgresql.org/pgsql-hackers/2009-06/msg01133.php

It might be, but the current theory about that (ie that it's the
bulkwrite patch doing it) doesn't explain the interaction with
autovacuum on/off. I'd still like to hear whether changing the stats
target changes Todd's results.

regards, tom lane

#8Todd A. Cook
tcook@blackducksoftware.com
In reply to: Tom Lane (#5)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

Tom Lane wrote:

"Todd A. Cook" <tcook@blackducksoftware.com> writes:

First, the numbers:

PG Version Load time pg_database_size autovac
----------------------------------------------------------
8.2.13 179 min 92,807,992,820 on
8.3.7 180 min 84,048,744,044 on (defaults)
8.4b2 206 min 84,028,995,344 on (defaults)
8.4b2 183 min 84,028,839,696 off

The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int). Each table has a
single partial index on one of the integer columns.

Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded. If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.

That would seem to be it:

8.4b2 183 min 84,028,897,040 on (defaults, default_statistics_target=10)

I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.

-- todd

Show quoted text

regards, tom lane
.

#9Todd A. Cook
tcook@blackducksoftware.com
In reply to: Todd A. Cook (#8)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

Todd A. Cook wrote:

Tom Lane wrote:

"Todd A. Cook" <tcook@blackducksoftware.com> writes:

First, the numbers:
PG Version Load time pg_database_size autovac
----------------------------------------------------------
8.2.13 179 min 92,807,992,820 on
8.3.7 180 min 84,048,744,044 on (defaults)
8.4b2 206 min 84,028,995,344 on (defaults)
8.4b2 183 min 84,028,839,696 off
The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int). Each table has a
single partial index on one of the integer columns.

Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded. If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.

That would seem to be it:

8.4b2 183 min 84,028,897,040 on (defaults, default_statistics_target=10)

I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.

The results for this are also consistent with Tom's theory:

8.3.7 205 min 84,048,866,924 on (defaults, default_statistics_target=100)

-- todd

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Todd A. Cook (#9)
Re: FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

"Todd A. Cook" <tcook@blackducksoftware.com> writes:

Todd A. Cook wrote:

Tom Lane wrote:

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.

That would seem to be it:
8.4b2 183 min 84,028,897,040 on (defaults, default_statistics_target=10)

I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.

The results for this are also consistent with Tom's theory:
8.3.7 205 min 84,048,866,924 on (defaults, default_statistics_target=100)

OK, thanks for following up. So this is a different effect from the
COPY ring buffer size issue being argued about over on pgsql-hackers.
I think we can just say that this one is a price being paid
intentionally for better statistics, and if you don't need better
statistics you can back off the target setting ...

regards, tom lane