8.3RC2 vs 8.2.6 testing results

Started by Vladabout 18 years ago14 messagesgeneral
Jump to latest
#1Vlad
marchenko@gmail.com

Hello,

I wanted to share performance-related test results for Postgresql
8.3RC2 and 8.2.6. In both cases we used a freshly imported database
followed by analyze verbose command. Same server was used for testing
(2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were
compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both
cases default configuration was used with increased shared buffers to
1Gb (total server ram is 32Gb), increased work and maintenance mem,
enabled autovacuum, increased default_statistics_target to 100,
increased effective_cache_size to 20Gb, disabled fsync and increased
checkpoint_segments. Total size (on disk) of the tables involved in
the query was around 300Mb.

1. Freshly imported DB size on disk was about 3% smaller for 8.3
2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
We took special measures to make sure that no third factors involved
(no other apps running, all data was cached from disks, etc). Below
is one of the queries that we used for testing (I anonymized table
names) along with query plan for both 8.3 and 8.2. The query execution
plans are the same for both versions, but what we found quite
interesting is that if we add all the times from each line of 8.2's
query plan, it roughly adds-up to the total execution time. For 8.3's
plan each line shows a shorter time, yet resulting in longer total
runtime. Also, summing 8.3's plan lines doesn't come close to the
total execution time:

SELECT _."a_id", SUM(_."counter")
FROM ts.t_c AS _
LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id"
LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id"
LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id"
WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <=
'2008-01-27 23:59:59')
AND __1."status" IS TRUE
AND __2."status" IS TRUE
GROUP BY _."a_id"

8.2.6 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual
time=21517.837..21517.890 rows=47 loops=1)
-> Hash Join (cost=2450.09..111489.75 rows=550289 width=52)
(actual time=76.083..7691.579 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=19.20..95377.74 rows=934651 width=56)
(actual time=0.119..4933.928 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..76276.09 rows=2596252 width=56)
(actual time=0.014..2988.950 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.10 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..76254.99 rows=2596248 width=56) (actual
time=0.011..1979.606 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=18.30..18.30 rows=72 width=4) (actual
time=0.094..0.094 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..18.30
rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual
time=75.931..75.931 rows=59934 loops=1)
-> Hash Join (cost=57.45..1950.44 rows=38436 width=4)
(actual time=0.829..54.760 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1263.82
rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1)
-> Hash (cost=50.90..50.90 rows=524 width=4)
(actual time=0.499..0.499 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..50.90
rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 21518.097 ms

8.3RC2: QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual
time=24354.972..24355.019 rows=47 loops=1)
-> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
(actual time=76.188..8177.510 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=16.20..92904.25 rows=935090 width=56)
(actual time=0.140..5304.968 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..73796.62 rows=2597473 width=56)
(actual time=0.043..3272.024 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.55 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..73775.07 rows=2597469 width=56) (actual
time=0.040..2245.209 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=15.30..15.30 rows=72 width=4) (actual
time=0.091..0.091 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..15.30
rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual
time=76.027..76.027 rows=59934 loops=1)
-> Hash Join (cost=55.45..1882.44 rows=38436 width=4)
(actual time=0.835..54.576 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1197.82
rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1)
-> Hash (cost=48.90..48.90 rows=524 width=4)
(actual time=0.513..0.513 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..48.90
rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 24355.179 ms

Any ideas on what-we-were-doing-wrong are welcomed

--
Vlad

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Vlad (#1)
Re: 8.3RC2 vs 8.2.6 testing results

Hello

8.3 plan is not optimal.

-> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
(actual time=76.188..8177.510 rows=2593557 loops=1)

please, try to increase statistics

default_statistics_target (in postgresql.conf) to 100 and repeat
import and your test.

Regards
Pavel Stehule

Show quoted text

On 28/01/2008, Vlad <marchenko@gmail.com> wrote:

Hello,

I wanted to share performance-related test results for Postgresql
8.3RC2 and 8.2.6. In both cases we used a freshly imported database
followed by analyze verbose command. Same server was used for testing
(2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were
compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both
cases default configuration was used with increased shared buffers to
1Gb (total server ram is 32Gb), increased work and maintenance mem,
enabled autovacuum, increased default_statistics_target to 100,
increased effective_cache_size to 20Gb, disabled fsync and increased
checkpoint_segments. Total size (on disk) of the tables involved in
the query was around 300Mb.

1. Freshly imported DB size on disk was about 3% smaller for 8.3
2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
We took special measures to make sure that no third factors involved
(no other apps running, all data was cached from disks, etc). Below
is one of the queries that we used for testing (I anonymized table
names) along with query plan for both 8.3 and 8.2. The query execution
plans are the same for both versions, but what we found quite
interesting is that if we add all the times from each line of 8.2's
query plan, it roughly adds-up to the total execution time. For 8.3's
plan each line shows a shorter time, yet resulting in longer total
runtime. Also, summing 8.3's plan lines doesn't come close to the
total execution time:

SELECT _."a_id", SUM(_."counter")
FROM ts.t_c AS _
LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id"
LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id"
LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id"
WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <=
'2008-01-27 23:59:59')
AND __1."status" IS TRUE
AND __2."status" IS TRUE
GROUP BY _."a_id"

8.2.6 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual
time=21517.837..21517.890 rows=47 loops=1)
-> Hash Join (cost=2450.09..111489.75 rows=550289 width=52)
(actual time=76.083..7691.579 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=19.20..95377.74 rows=934651 width=56)
(actual time=0.119..4933.928 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..76276.09 rows=2596252 width=56)
(actual time=0.014..2988.950 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.10 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..76254.99 rows=2596248 width=56) (actual
time=0.011..1979.606 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=18.30..18.30 rows=72 width=4) (actual
time=0.094..0.094 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..18.30
rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual
time=75.931..75.931 rows=59934 loops=1)
-> Hash Join (cost=57.45..1950.44 rows=38436 width=4)
(actual time=0.829..54.760 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1263.82
rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1)
-> Hash (cost=50.90..50.90 rows=524 width=4)
(actual time=0.499..0.499 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..50.90
rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 21518.097 ms

8.3RC2: QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual
time=24354.972..24355.019 rows=47 loops=1)
-> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
(actual time=76.188..8177.510 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=16.20..92904.25 rows=935090 width=56)
(actual time=0.140..5304.968 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..73796.62 rows=2597473 width=56)
(actual time=0.043..3272.024 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.55 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..73775.07 rows=2597469 width=56) (actual
time=0.040..2245.209 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=15.30..15.30 rows=72 width=4) (actual
time=0.091..0.091 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..15.30
rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual
time=76.027..76.027 rows=59934 loops=1)
-> Hash Join (cost=55.45..1882.44 rows=38436 width=4)
(actual time=0.835..54.576 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1197.82
rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1)
-> Hash (cost=48.90..48.90 rows=524 width=4)
(actual time=0.513..0.513 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..48.90
rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 24355.179 ms

Any ideas on what-we-were-doing-wrong are welcomed

--
Vlad

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#2)
Re: 8.3RC2 vs 8.2.6 testing results

On 28/01/2008, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

8.3 plan is not optimal.

-> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
(actual time=76.188..8177.510 rows=2593557 loops=1)

please, try to increase statistics

I am blind, I am sorry, It's noise, you did it.

Show quoted text

default_statistics_target (in postgresql.conf) to 100 and repeat
import and your test.

Regards
Pavel Stehule

On 28/01/2008, Vlad <marchenko@gmail.com> wrote:

Hello,

I wanted to share performance-related test results for Postgresql
8.3RC2 and 8.2.6. In both cases we used a freshly imported database
followed by analyze verbose command. Same server was used for testing
(2.6.23.14-107.fc8 x86_64) for each versions; both postgreses were
compiled with "-O3 -mtune=core2 -march=core2 -m64" flags. In both
cases default configuration was used with increased shared buffers to
1Gb (total server ram is 32Gb), increased work and maintenance mem,
enabled autovacuum, increased default_statistics_target to 100,
increased effective_cache_size to 20Gb, disabled fsync and increased
checkpoint_segments. Total size (on disk) of the tables involved in
the query was around 300Mb.

1. Freshly imported DB size on disk was about 3% smaller for 8.3
2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
We took special measures to make sure that no third factors involved
(no other apps running, all data was cached from disks, etc). Below
is one of the queries that we used for testing (I anonymized table
names) along with query plan for both 8.3 and 8.2. The query execution
plans are the same for both versions, but what we found quite
interesting is that if we add all the times from each line of 8.2's
query plan, it roughly adds-up to the total execution time. For 8.3's
plan each line shows a shorter time, yet resulting in longer total
runtime. Also, summing 8.3's plan lines doesn't come close to the
total execution time:

SELECT _."a_id", SUM(_."counter")
FROM ts.t_c AS _
LEFT OUTER JOIN ts.t_a AS __1 ON _."a_id"=__1."id"
LEFT OUTER JOIN ts.t_i AS __3 ON _."i_id"=__3."id"
LEFT OUTER JOIN ts.t_b AS __2 ON __3."b_id"=__2."id"
WHERE (_."date_day" >= '2008-01-01 00:00:00' AND _."date_day" <=
'2008-01-27 23:59:59')
AND __1."status" IS TRUE
AND __2."status" IS TRUE
GROUP BY _."a_id"

8.2.6 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=134877.04..134899.04 rows=200 width=52) (actual
time=21517.837..21517.890 rows=47 loops=1)
-> Hash Join (cost=2450.09..111489.75 rows=550289 width=52)
(actual time=76.083..7691.579 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=19.20..95377.74 rows=934651 width=56)
(actual time=0.119..4933.928 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..76276.09 rows=2596252 width=56)
(actual time=0.014..2988.950 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.10 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..76254.99 rows=2596248 width=56) (actual
time=0.011..1979.606 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=18.30..18.30 rows=72 width=4) (actual
time=0.094..0.094 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..18.30
rows=72 width=4) (actual time=0.004..0.067 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1950.44..1950.44 rows=38436 width=4) (actual
time=75.931..75.931 rows=59934 loops=1)
-> Hash Join (cost=57.45..1950.44 rows=38436 width=4)
(actual time=0.829..54.760 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1263.82
rows=65282 width=8) (actual time=0.005..16.852 rows=65282 loops=1)
-> Hash (cost=50.90..50.90 rows=524 width=4)
(actual time=0.499..0.499 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..50.90
rows=524 width=4) (actual time=0.004..0.312 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 21518.097 ms

8.3RC2: QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=132352.98..132572.98 rows=200 width=52) (actual
time=24354.972..24355.019 rows=47 loops=1)
-> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
(actual time=76.188..8177.510 rows=2593557 loops=1)
Hash Cond: (_.i_id = __3.id)
-> Hash Join (cost=16.20..92904.25 rows=935090 width=56)
(actual time=0.140..5304.968 rows=2596942 loops=1)
Hash Cond: (_.a_id = __1.id)
-> Append (cost=0.00..73796.62 rows=2597473 width=56)
(actual time=0.043..3272.024 rows=2596942 loops=1)
-> Seq Scan on t_c _ (cost=0.00..21.55 rows=4
width=56) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Seq Scan on t_c_2008_01 _
(cost=0.00..73775.07 rows=2597469 width=56) (actual
time=0.040..2245.209 rows=2596942 loops=1)
Filter: ((date_day >= '2008-01-01'::date)
AND (date_day <= '2008-01-27'::date))
-> Hash (cost=15.30..15.30 rows=72 width=4) (actual
time=0.091..0.091 rows=72 loops=1)
-> Seq Scan on t_a __1 (cost=0.00..15.30
rows=72 width=4) (actual time=0.003..0.061 rows=72 loops=1)
Filter: (status IS TRUE)
-> Hash (cost=1882.44..1882.44 rows=38436 width=4) (actual
time=76.027..76.027 rows=59934 loops=1)
-> Hash Join (cost=55.45..1882.44 rows=38436 width=4)
(actual time=0.835..54.576 rows=59934 loops=1)
Hash Cond: (__3.b_id = __2.id)
-> Seq Scan on t_i __3 (cost=0.00..1197.82
rows=65282 width=8) (actual time=0.004..16.096 rows=65282 loops=1)
-> Hash (cost=48.90..48.90 rows=524 width=4)
(actual time=0.513..0.513 rows=524 loops=1)
-> Seq Scan on t_b __2 (cost=0.00..48.90
rows=524 width=4) (actual time=0.003..0.307 rows=524 loops=1)
Filter: (status IS TRUE)
Total runtime: 24355.179 ms

Any ideas on what-we-were-doing-wrong are welcomed

--
Vlad

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#4Vlad
marchenko@gmail.com
In reply to: Pavel Stehule (#2)
Re: 8.3RC2 vs 8.2.6 testing results

Pavel:

thanks for your feedback. To me plans generated by 8.2 and 8.3 are
equal and only differ by execution times. (I don't know, maybe email
wrap'ed lines, so I've attached plans to my message). Also, I confirm
that that parameter was increased (to 100) before the ran tests.

On Jan 28, 2008 4:26 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Hello

8.3 plan is not optimal.

-> Hash Join (cost=2379.09..108954.69 rows=550548 width=52)
(actual time=76.188..8177.510 rows=2593557 loops=1)

please, try to increase statistics

default_statistics_target (in postgresql.conf) to 100 and repeat
import and your test.

Regards
Pavel Stehule

-- Vlad

Attachments:

82_plan.txttext/plain; name=82_plan.txtDownload
83_plan.txttext/plain; name=83_plan.txtDownload
#5Scott Marlowe
scott.marlowe@gmail.com
In reply to: Vlad (#1)
Re: 8.3RC2 vs 8.2.6 testing results

On Jan 28, 2008 3:56 PM, Vlad <marchenko@gmail.com> wrote:

Hello,

1. Freshly imported DB size on disk was about 3% smaller for 8.3
2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.
We took special measures to make sure that no third factors involved
(no other apps running, all data was cached from disks, etc). Below
is one of the queries that we used for testing (I anonymized table
names) along with query plan for both 8.3 and 8.2. The query execution
plans are the same for both versions, but what we found quite
interesting is that if we add all the times from each line of 8.2's
query plan, it roughly adds-up to the total execution time. For 8.3's
plan each line shows a shorter time, yet resulting in longer total
runtime. Also, summing 8.3's plan lines doesn't come close to the
total execution time:

This last bit often means there's some overhead in the systems
timeofday() function calls.

If you just use \timing from psql, and run the script without explain
analyze, what speeds do you get on each?

#6Vlad
marchenko@gmail.com
In reply to: Scott Marlowe (#5)
Re: 8.3RC2 vs 8.2.6 testing results

This last bit often means there's some overhead in the systems
timeofday() function calls.

If you just use \timing from psql, and run the script without explain
analyze, what speeds do you get on each?

17480ms (8.2.6)
20342ms (8.3RC2)

--
Vlad

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vlad (#1)
Re: 8.3RC2 vs 8.2.6 testing results

Vlad <marchenko@gmail.com> writes:

2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.

The particular case you are showing here seems to be all about the speed
of hash aggregation --- at least the time differential is mostly in the
HashAggregate step. What is the data type of a_id? I speculate that
you're noticing the slightly slower/more complicated hash function that
8.3 uses for integers. On a case where the data was well distributed
you'd not see any countervailing efficiency gain from those extra
cycles.

regards, tom lane

#8Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#7)
Re: 8.3RC2 vs 8.2.6 testing results

On Mon, 28 Jan 2008, Tom Lane wrote:

I speculate that you're noticing the slightly slower/more complicated
hash function that 8.3 uses for integers.

There was a similar slowdown in the Clodaldo case you tracked down
recently. Is it worth considering an addition to the release notes
warning about this class of problem? If there have been two of them so
far just in the beta I wonder how many people are going to run into some
variant of this in the future.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#9Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#7)
Re: 8.3RC2 vs 8.2.6 testing results

On 1/29/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Vlad <marchenko@gmail.com> writes:

2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.

The particular case you are showing here seems to be all about the speed
of hash aggregation --- at least the time differential is mostly in the
HashAggregate step. What is the data type of a_id? I speculate that
you're noticing the slightly slower/more complicated hash function that
8.3 uses for integers. On a case where the data was well distributed
you'd not see any countervailing efficiency gain from those extra
cycles.

AFAIK we have a plan to update string hash in 8.4 to fastest
available (Jenkins lookup3). Maybe we should update integer
hash too then to the best:

http://www.cris.com/~Ttwang/tech/inthash.htm

("32 bit Mix Functions" is the one).

--
marko

#10Vlad
marchenko@gmail.com
In reply to: Vlad (#1)
Re: 8.3RC2 vs 8.2.6 testing results

Tom:

Yes, they are ints. To (somewhat) check your guess on the role of the hash
aggregation speed, I just ran oltp test from sysbench
(http://sysbench.sourceforge.net/docs/#database_mode) on a table with 1mln
of records. That test uses pretty simple queries (that do not use
aggregation) and 8.3 showed about the same performance as 8.2 (strictly
speaking 8.3 was about 1-2% slower, but not 10-15% like on my query).

I'm curious if that new hash aggregation algorithm was put in 8.3 with the
performance increase as a goal or it was simply a required change to support
some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't
seems a favorable step for the type of application that we have...

-- vlad

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Vlad" <marchenko@gmail.com>
Cc: "PG-General" <pgsql-general@postgresql.org>
Sent: Monday, January 28, 2008 9:13 PM
Subject: Re: [GENERAL] 8.3RC2 vs 8.2.6 testing results

Show quoted text

Vlad <marchenko@gmail.com> writes:

2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.

The particular case you are showing here seems to be all about the speed
of hash aggregation --- at least the time differential is mostly in the
HashAggregate step. What is the data type of a_id? I speculate that
you're noticing the slightly slower/more complicated hash function that
8.3 uses for integers. On a case where the data was well distributed
you'd not see any countervailing efficiency gain from those extra
cycles.

regards, tom lane

#11Clodoaldo
clodoaldo.pinto.neto@gmail.com
In reply to: Vlad (#10)
Re: 8.3RC2 vs 8.2.6 testing results

2008/1/29, Vlad Marchenko <marchenko@gmail.com>:

Tom:

Yes, they are ints. To (somewhat) check your guess on the role of the hash
aggregation speed, I just ran oltp test from sysbench
(http://sysbench.sourceforge.net/docs/#database_mode) on a table with 1mln
of records. That test uses pretty simple queries (that do not use
aggregation) and 8.3 showed about the same performance as 8.2 (strictly
speaking 8.3 was about 1-2% slower, but not 10-15% like on my query).

I'm curious if that new hash aggregation algorithm was put in 8.3 with the
performance increase as a goal or it was simply a required change to support
some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't
seems a favorable step for the type of application that we have...

Vlad,

What happens if you run the 8.3 test with enable_hashagg set to off?

Saudações, Clodoaldo Pinto Neto

Show quoted text

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Vlad" <marchenko@gmail.com>
Cc: "PG-General" <pgsql-general@postgresql.org>
Sent: Monday, January 28, 2008 9:13 PM
Subject: Re: [GENERAL] 8.3RC2 vs 8.2.6 testing results

Vlad <marchenko@gmail.com> writes:

2. We ran several tests and found 8.3 generally 10% slower than 8.2.6.

The particular case you are showing here seems to be all about the speed
of hash aggregation --- at least the time differential is mostly in the
HashAggregate step. What is the data type of a_id? I speculate that
you're noticing the slightly slower/more complicated hash function that
8.3 uses for integers. On a case where the data was well distributed
you'd not see any countervailing efficiency gain from those extra
cycles.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#12Vlad
marchenko@gmail.com
In reply to: Clodoaldo (#11)
Re: 8.3RC2 vs 8.2.6 testing results

I just tried adjusting two parameters:

enable_hashagg = off
both versions run slower, still 8.2 quicker than 8.3 (by ~ the same %)

enable_hashjoin = off
both versions run slower, still 8.2 quicker than 8.3 (by ~ the same %)

On Jan 29, 2008 10:34 AM, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:

Vlad,

What happens if you run the 8.3 test with enable_hashagg set to off?

Saudações, Clodoaldo Pinto Neto

--
Vlad

#13Vlad
marchenko@gmail.com
In reply to: Vlad (#12)
Re: 8.3RC2 vs 8.2.6 testing results

More test results for public. We ran my original query and found out
that on 4 cores CPU:

1 thread test 8.2 bits 8.3 by 10..15%
4 threads 8.2. wins 8.3 by ~2%
8 threads 8.3 finally wins 8.2 by ~2%

the same data set was affected during multi-threaded runs, but it's
100% cached from the disk. I guess we see the result of concurrent
scans optimization in 8.3...

--
Vlad

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vlad (#10)
Re: 8.3RC2 vs 8.2.6 testing results

"Vlad Marchenko" <marchenko@gmail.com> writes:

I'm curious if that new hash aggregation algorithm was put in 8.3 with the
performance increase as a goal

Yes.
http://archives.postgresql.org/pgsql-hackers/2007-05/msg01179.php
http://archives.postgresql.org/pgsql-committers/2007-06/msg00007.php
(and nearby messages)

or it was simply a required change to support
some other new feature of 8.3? Right now the switch from 8.2 to 8.3 doesn't
seems a favorable step for the type of application that we have...

You are condemning a demonstrably better algorithm on the basis of one
test case. If you poke around a bit harder you will find other cases
where it is faster than 8.2 because of having better distribution.

regards, tom lane