How Postgresql Compares For Query And Load Operations

Started by Mark Kirkwoodover 24 years ago27 messagesgeneral
Jump to latest
#1Mark Kirkwood
mark.kirkwood@catalyst.net.nz

Dear list,

With the advent of Version 7.1.2 I thought it would be interesting to compare
how Postgresql does a certain class of queries (Star Queries), and Data Loads
with some of the other leading databases ( which were in my humble opinion
Mysql 3.23.39, Db2 7.2 ,Oracle 9.0 [ ok - Possibly SQL 2000 too but I dont
run Winanyk] ).

The results were overall very encouraging :

Postgresql can clearly hold its own when compared to the "big guys".

The full details (together with a wee rant) are aviailable on :

http://homepages.slingshot.co.nz/~markir

(if anyone asks I can submit the entire results...but I figured, lets cut to
the chase here....)

There were two areas where Postgresql was slower, and I thought it would be
informative to discuss these briefly :

1 Star query scanning a sigificant portion of a fact table

SELECT
d0.f1,
count(f.val)
FROM dim0 d0,
fact1 f
WHERE d0.d0key = f.d0key
AND d0.f1 between '2007-07-01' AND '2018-07-01'
GROUP BY d0.f1

This query requires summarizing a significant proportion of the 3000000 row (
700Mb ) fact1 table.

Postgres 7.1.2 executed this query like :

Aggregate (cost=2732703.88..2738731.49 rows=120552 width=20)
-> Group (cost=2732703.88..2735717.69 rows=1205521 width=20)
-> Sort (cost=2732703.88..2732703.88 rows=1205521 width=20)
-> Hash Join (cost=1967.52..2550188.93 rows=1205521 width=20)
-> Seq Scan on fact1 f (cost=0.00..1256604.00 rows=3000000
width=8)
-> Hash (cost=1957.47..1957.47 rows=4018 width=12)
-> Index Scan using dim0_q1 on dim0 d0 (cost=0.00..1957.47
rows=4018 width=12)

for an elapsed time of 3m50s

Wheras Oracle 9.0 used :

SELECT STATEMENT Optimizer=CHOOSE (Cost=5810 Card=4020 Bytes =60300)
SORT (GROUP BY) (Cost=5810 Card=4020 Bytes=60300)
HASH JOIN (Cost=5810 Card=2989644 Bytes=44844660)
TABLE ACCESS (FULL) OF 'DIM0' (Cost=4 Card=4020 Bytes= 40200)
TABLE ACCESS (FULL) OF 'FACT1' (Cost=5806 Card=2990089
Bytes=14950445)

for an elapsed time of 50s.

It would seem that Oracle's execution plan is more optimal.

2 Bulk loading data

Buld Load times for a 3000000 row (700Mb ) fact table were

Postgresql 9m30s (copy)
Db2 2m15s (load)
Oracle 5m (sqlldr)
Mysql 2m20s (load)

(Note that Db2 is faster than Mysql here ! )

While I left "fsync = on" for this test... I still think the difference was
worth noting.

Any comments on these sort of results would be welcome.

regards

Mark

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#1)
Re: Outer joins

eddie iannuccelli <eddie.iannuccelli@toulouse.inra.fr> writes:

Can anyone confirm me that Postgres 7.1 does not support outer join ?

What? It definitely *does* support outer joins.

Are functions similar to classical stored procedure ?

Depends on how loose your idea of "similar" is ... a function can't
readily return a recordset at the moment. (This may be fixed in time
for 7.2, though.)

regards, tom lane

#3Noname
wsheldah@lexmark.com
In reply to: Tom Lane (#2)
RE: Outer joins

I think there should be a Postgres Features section. It might be nice to have a
concise comparison chart comparing its features with, for instance, MySQL, SQL
Server 2000, Oracle, and any others that come up often in questions. The
comparison chart should be frank while still unabashedly highlighting the many
places where PostgreSQL really shines. You never know, some lazy journalist may
decide to copy and paste it into their next review. ;--)

"Trewern, Ben" <Ben.Trewern%mowlem.com@interlock.lexmark.com> on 07/13/2001
10:06:41 AM

To: pgsql-general%postgresql.org@interlock.lexmark.com
cc: (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject: RE: [GENERAL] Outer joins

Postgres 7.1 does support Left, right & outer joins.� Functions are similar to
stored procedures but they cannot return rows.� I think that this limitation is
being worked on for the next release (7.2).

By the way should there be a 'Postgres Features' page on the website.� This
would save a large amount of these queries and reduce the amount of e-mails on
this list.� I'm sure that lots of interest is being raised because of the
'Redhat Database' Announcement.� Some of these people will just want to know
what Postgres is capable of.

Regards

Ben

-----Original Message-----
From: eddie iannuccelli [mailto:eddie.iannuccelli@toulouse.inra.fr]
Sent: 13 July 2001 18:51
Cc: pgsql-general@postgresql.org
Subject: [GENERAL] Outer joins

Can anyone confirm me that Postgres 7.1 does not support outer join ?
Are functions similar to classical stored procedure ?

thanks
--
**************************************************
Eddie IANNUCCELLI - tel: 05 61 28 54 44
INRA, Laboratoire de G�n�tique Cellulaire
Chemin de Borde Rouge - Auzeville -BP27
31326 Castanet Tolosan
**************************************************

---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
���� (send "unregister YourEmailAddressHere" to
majordomo@postgresql.org)

Sorry about the footer it's automatic.

*****************************************************************************
This email and any attachments transmitted with it are confidential
and intended solely for the use of the individual or entity to whom
they are addressed. If you have received this email in error please
notify the sender and do not store, copy or disclose the content
to any other person.

It is the responsibility of the recipient to ensure that opening this
message and/or any of its attachments will not adversely affect
its systems. No responsibility is accepted by the Company.
*****************************************************************************

#4Ryan Mahoney
ryan@paymentalliance.net
In reply to: Mark Kirkwood (#1)
Re: How Postgresql Compares For Query And Load Operations

Hey Mark, very interesting results! Thanks for taking the time to collect
this info - it is really helpful!

Quick note, I don't know what your licensing arrangement is with Oracle -
but from what I understand you may be in violation of those terms by
publishing this data (maybe not - Ned from Great Bridge can answer this
question better). If that's is the case, I think publishing something like
xxxxxx 9.0 may be a simple resolution.

Great Work! I'd be interested in seeing how some additional tuning would
affect your pg results.

-Ryan Mahoney

At 10:22 PM 7/13/01 +1200, Mark kirkwood wrote:

Show quoted text

Dear list,

With the advent of Version 7.1.2 I thought it would be interesting to compare
how Postgresql does a certain class of queries (Star Queries), and Data Loads
with some of the other leading databases ( which were in my humble opinion
Mysql 3.23.39, Db2 7.2 ,Oracle 9.0 [ ok - Possibly SQL 2000 too but I dont
run Winanyk] ).

The results were overall very encouraging :

Postgresql can clearly hold its own when compared to the "big guys".

The full details (together with a wee rant) are aviailable on :

http://homepages.slingshot.co.nz/~markir

(if anyone asks I can submit the entire results...but I figured, lets cut to
the chase here....)

There were two areas where Postgresql was slower, and I thought it would be
informative to discuss these briefly :

1 Star query scanning a sigificant portion of a fact table

SELECT
d0.f1,
count(f.val)
FROM dim0 d0,
fact1 f
WHERE d0.d0key = f.d0key
AND d0.f1 between '2007-07-01' AND '2018-07-01'
GROUP BY d0.f1

This query requires summarizing a significant proportion of the 3000000 row (
700Mb ) fact1 table.

Postgres 7.1.2 executed this query like :

Aggregate (cost=2732703.88..2738731.49 rows=120552 width=20)
-> Group (cost=2732703.88..2735717.69 rows=1205521 width=20)
-> Sort (cost=2732703.88..2732703.88 rows=1205521 width=20)
-> Hash Join (cost=1967.52..2550188.93 rows=1205521 width=20)
-> Seq Scan on fact1 f (cost=0.00..1256604.00 rows=3000000
width=8)
-> Hash (cost=1957.47..1957.47 rows=4018 width=12)
-> Index Scan using dim0_q1 on dim0 d0 (cost=0.00..1957.47
rows=4018 width=12)

for an elapsed time of 3m50s

Wheras Oracle 9.0 used :

SELECT STATEMENT Optimizer=CHOOSE (Cost=5810 Card=4020 Bytes =60300)
SORT (GROUP BY) (Cost=5810 Card=4020 Bytes=60300)
HASH JOIN (Cost=5810 Card=2989644 Bytes=44844660)
TABLE ACCESS (FULL) OF 'DIM0' (Cost=4 Card=4020 Bytes= 40200)
TABLE ACCESS (FULL) OF 'FACT1' (Cost=5806 Card=2990089
Bytes=14950445)

for an elapsed time of 50s.

It would seem that Oracle's execution plan is more optimal.

2 Bulk loading data

Buld Load times for a 3000000 row (700Mb ) fact table were

Postgresql 9m30s (copy)
Db2 2m15s (load)
Oracle 5m (sqlldr)
Mysql 2m20s (load)

(Note that Db2 is faster than Mysql here ! )

While I left "fsync = on" for this test... I still think the difference was
worth noting.

Any comments on these sort of results would be welcome.

regards

Mark

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

---
Incoming mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#1)
Re: How Postgresql Compares For Query And Load Operations

Mark kirkwood <markir@slingshot.co.nz> writes:

Postgres 7.1.2 executed this query like :

Aggregate (cost=2732703.88..2738731.49 rows=120552 width=20)
-> Group (cost=2732703.88..2735717.69 rows=1205521 width=20)
-> Sort (cost=2732703.88..2732703.88 rows=1205521 width=20)
-> Hash Join (cost=1967.52..2550188.93 rows=1205521 width=20)
-> Seq Scan on fact1 f (cost=0.00..1256604.00 rows=3000000
width=8)
-> Hash (cost=1957.47..1957.47 rows=4018 width=12)
-> Index Scan using dim0_q1 on dim0 d0 (cost=0.00..1957.47
rows=4018 width=12)

for an elapsed time of 3m50s

Wheras Oracle 9.0 used :

SELECT STATEMENT Optimizer=CHOOSE (Cost=5810 Card=4020 Bytes =60300)
SORT (GROUP BY) (Cost=5810 Card=4020 Bytes=60300)
HASH JOIN (Cost=5810 Card=2989644 Bytes=44844660)
TABLE ACCESS (FULL) OF 'DIM0' (Cost=4 Card=4020 Bytes= 40200)
TABLE ACCESS (FULL) OF 'FACT1' (Cost=5806 Card=2990089
Bytes=14950445)

for an elapsed time of 50s.

It would seem that Oracle's execution plan is more optimal.

Hmm, since I don't know the details of Oracle's plan displays, it's hard
to comment on that --- but it looks to me like the plans are essentially
the same, with the small difference that Postgres chooses to use the
index on dim0 to filter out the dim0 rows that don't meet
d0.f1 between '2007-07-01' AND '2018-07-01'
before they get loaded into the hashtable, whereas Oracle is just
scanning dim0 by brute force (and presumably evaluating the BETWEEN
clause directly at each row). Given that that's the much smaller table,
it's unlikely that that makes much difference. I am assuming that the
different order in which the hash-join's inputs are listed is just an
artifact of the listing format --- surely Oracle is also choosing to
load the dim0 rows into an in-memory hash table and then scan fact1
to probe the hash table, rather than trying to load all of fact1 into
memory.

My guess is that the Oracle plan is misleading where it says "SORT",
and that they are actually doing no such thing as a sort. Given that
they only expect 4000 distinct rows out of the grouping/aggregation
steps, it'd make more sense to do the group+aggregation in one pass
using another in-memory hash table (indexed by GROUP BY key, with
contents being the count() accumulator). Currently, Postgres only knows
how to form groups via a true sort and then a "GROUP" pass (works like
the traditional "sort | uniq" Unix filtering method). But if you
actually want to aggregate the groups, and there aren't too many of
them, then you can form each aggregate in parallel in one unsorted pass
over the input, keeping the aggregate intermediate results in entries in
a hash table. Hash-based aggregation is on our TODO list, and is
fairly high priority in my eyes (though I doubt it'll be done for 7.2).

If Oracle really is doing a sort, it's hard to see where the speed
difference came from --- unless you have set the tuning parameters such
that Oracle does the sort all-in-memory whereas Postgres doesn't. Sorts
that have to go to disk are lots slower.

Can anyone who actually knows how to read Oracle plans confirm or deny
these speculations?

Buld Load times for a 3000000 row (700Mb ) fact table were

Postgresql 9m30s (copy)
Db2 2m15s (load)
Oracle 5m (sqlldr)
Mysql 2m20s (load)

Hmm, I couldn't make out from your webpage exactly how you did the
loading, or which steps are included in your timings.  I see that you
used COPY, which is good ... but did you create the indexes before or
after COPY?  What about the constraints?  I also see a CLUSTER script
--- was this used, and if so where is its time counted?

regards, tom lane

#6eddie iannuccelli
eddie.iannuccelli@toulouse.inra.fr
In reply to: Mark Kirkwood (#1)
Outer joins

Can anyone confirm me that Postgres 7.1 does not support outer join ?
Are functions similar to classical stored procedure ?

thanks
--
**************************************************
Eddie IANNUCCELLI - tel: 05 61 28 54 44
INRA, Laboratoire de G�n�tique Cellulaire
Chemin de Borde Rouge - Auzeville -BP27
31326 Castanet Tolosan
**************************************************

#7eddie iannuccelli
eddie.iannuccelli@toulouse.inra.fr
In reply to: eddie iannuccelli (#6)
Re: Outer joins

Sorry, I do not remember the SQL92.
Is it a JOIN and OUTER join syntax or the natural *= or #= ?

thanks
Neil Conway wrote:

Can anyone confirm me that Postgres 7.1 does not support outer join
? Are functions similar to classical stored procedure ?

Postgres 7.1 DOES support outer joins -- I believe it uses SQL92 syntax.

Cheers,

Neil

--
**************************************************
Eddie IANNUCCELLI - tel: 05 61 28 54 44
INRA, Laboratoire de G�n�tique Cellulaire
Chemin de Borde Rouge - Auzeville -BP27
31326 Castanet Tolosan
**************************************************

#8Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#5)
Re: How Postgresql Compares For Query And Load Operations

Tom,

Good point....

I notice I have set Oracle "pga_aggregate_size=30M" which includes the sort
space for a user process ( oops) whereas every other db has about 2M ( there
are a few typos on the page... forgot to update from an earlier study)

I will have to re-run the Oracle results with 2M ( or re-run the rest with
30M...) I will update you .....

If Oracle really is doing a sort, it's hard to see where the speed
difference came from --- unless you have set the tuning parameters such
that Oracle does the sort all-in-memory whereas Postgres doesn't. Sorts
that have to go to disk are lots slower.

Can anyone who actually knows how to read Oracle plans confirm or deny
these speculations?

I will have a play with a clearer example for the star optimization business
( the thoery being - I believe ...that for a star query with n (small)
dimension tables and 1 (big) fact table, it is best to cartesian product the
dimensions, determine a set of keys, and access to the fact table using
these). My "trivial" example with 1 dimension does not illustrate this that
well...( I have another with 2 dimension tables which should be
better)...again I will update you.

Buld Load times for a 3000000 row (700Mb ) fact table were

Postgresql 9m30s (copy)
Db2 2m15s (load)
Oracle 5m (sqlldr)
Mysql 2m20s (load)

There are a few "optional" scripts in the tar - which I should have indicated
:-( ... I do not do the cluster, primary or foreign keys at all ( there were
too many variations and options for constraints for all the different
databases)....so I just create the table, load via copy and then create the
various indexes. The load timings are for the fact0 table with no indexes
created.

Show quoted text
Hmm, I couldn't make out from your webpage exactly how you did the
loading, or which steps are included in your timings.  I see that you
used COPY, which is good ... but did you create the indexes before or
after COPY?  What about the constraints?  I also see a CLUSTER script
--- was this used, and if so where is its time counted?

regards, tom lane

#9Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#5)
Re: How Postgresql Compares For Query And Load Operations

On Saturday 14 July 2001 02:49, Tom Lane wrote:

It would seem that Oracle's execution plan is more optimal.

Hmm, since I don't know the details of Oracle's plan displays, it's hard
to comment on that --- but it looks to me like the plans are essentially
the same, with the small difference that Postgres chooses to use the
index on dim0 to filter ....(snipped )

After a little thinking, I am inclined to agree with you Tom... I wondered if
the difference might to be due to pure sequential scan performance
differences. I tried this query :

SELECT sum(val) FROM fact0

for Postgres, Db2 and Oracle. The results were

Postgres 2m25s
Db2 40s
Oracle 50s

This seems to be the likely culprit. I suspect that the "many block/page read
at once" type optimzations (prefetch for Db2 and mutli block read for Oracle)
mean that table sequential scans are faster for these guys than Postgres.

Thus on the bright side their access plans are not necessarily any better
than Postgres !

regards

Mark

#10Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Ryan Mahoney (#4)
Re: How Postgresql Compares For Query And Load Operations

On Saturday 14 July 2001 02:49, Ryan Mahoney wrote:

Quick note, I don't know what your licensing arrangement is with Oracle -
but from what I understand you may be in violation of those terms by
publishing this data (maybe not - Ned from Great Bridge can answer this
question better). If that's is the case, I think publishing something like
xxxxxx 9.0 may be a simple resolution.

I have made some mods to the web site to "obfuscate" the relevant
results....as the developer license forbids publishing benchmarking results...

regards

Mark

#11Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#5)
Re: How Postgresql Compares For Query And Load Operations

If Oracle really is doing a sort, it's hard to see where the speed
difference came from --- unless you have set the tuning parameters such
that Oracle does the sort all-in-memory whereas Postgres doesn't. Sorts
that have to go to disk are lots slower.

I redid the tests ensuring everybody used 10M sort area... nothing was
significantly altered !! ( altho Postgres moved in towards the big boys on
the first 3 queries and the elapsed time for queries 4 & 5 converged )

Hmm, I couldn't make out from your webpage exactly how you did the
loading, or which steps are included in your timings.  I see that you
used COPY, which is good ... but did you create the indexes before or
after COPY?  What about the constraints?  I also see a CLUSTER script
--- was this used, and if so where is its time counted?

regards, tom lane

My apologies for the state of the scripts ( to all you who downloaded them
for a play) - I had forgotten to complete the README and also left heaps of
test files lying about in the query directory. I have cleaned these up now !

The story is... the comparison was supposed to be simple... so no special
features ( like clustered indexes/tables, bitmap indexes, materialized views,
automatic summary tables...) just a comparison of how well each db did its
"bread and butter" operations.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#9)
Re: How Postgresql Compares For Query And Load Operations

Mark kirkwood <markir@slingshot.co.nz> writes:

I tried this query :

SELECT sum(val) FROM fact0

for Postgres, Db2 and Oracle. The results were

Postgres 2m25s
Db2 40s
Oracle 50s

This seems to be the likely culprit. I suspect that the "many
block/page read at once" type optimzations (prefetch for Db2 and mutli
block read for Oracle) mean that table sequential scans are faster for
these guys than Postgres.

Hm. The theory about simple sequential reads is that we expect the
kernel to optimize the disk access, since it'll recognize that we are
doing sequential access to the table file and do read-aheads. Or that's
the theory, anyway.

I am not convinced that inefficient I/O is the story here. We could be
paying the price of our very generalized implementation of aggregates.
It would be interesting to know how much CPU time was chewed up by each
DB during the SELECT sum(). It'd also be interesting to know just what
datatype is being summed.

regards, tom lane

#13Bruce Momjian
bruce@momjian.us
In reply to: Mark Kirkwood (#9)
Re: How Postgresql Compares For Query And Load Operations

On Saturday 14 July 2001 02:49, Tom Lane wrote:

It would seem that Oracle's execution plan is more optimal.

Hmm, since I don't know the details of Oracle's plan displays, it's hard
to comment on that --- but it looks to me like the plans are essentially
the same, with the small difference that Postgres chooses to use the
index on dim0 to filter ....(snipped )

After a little thinking, I am inclined to agree with you Tom... I wondered if
the difference might to be due to pure sequential scan performance
differences. I tried this query :

SELECT sum(val) FROM fact0

for Postgres, Db2 and Oracle. The results were

Postgres 2m25s
Db2 40s
Oracle 50s

This seems to be the likely culprit. I suspect that the "many block/page read
at once" type optimzations (prefetch for Db2 and mutli block read for Oracle)
mean that table sequential scans are faster for these guys than Postgres.

Thus on the bright side their access plans are not necessarily any better
than Postgres !

Can you remind me about the OS you are using?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#14Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#12)
Re: How Postgresql Compares For Query And Load Operations

This seems to be the likely culprit. I suspect that the "many
block/page read at once" type optimzations (prefetch for Db2 and mutli
block read for Oracle) mean that table sequential scans are faster for
these guys than Postgres.

Hm. The theory about simple sequential reads is that we expect the
kernel to optimize the disk access, since it'll recognize that we are
doing sequential access to the table file and do read-aheads. Or that's
the theory, anyway.

I am not convinced that inefficient I/O is the story here. We could be
paying the price of our very generalized implementation of aggregates.
It would be interesting to know how much CPU time was chewed up by each
DB during the SELECT sum(). It'd also be interesting to know just what
datatype is being summed.

If it is Linux, they turn off read-ahead on the first fseek() and it
never gets turned on again, or so I am told. And because we have
virtual file descriptors, that table remains open after random access
and the readahead bit doesn't get set for the next sequential scan.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#14)
Re: How Postgresql Compares For Query And Load Operations

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Hm. The theory about simple sequential reads is that we expect the
kernel to optimize the disk access, since it'll recognize that we are
doing sequential access to the table file and do read-aheads. Or that's
the theory, anyway.

If it is Linux, they turn off read-ahead on the first fseek() and it
never gets turned on again, or so I am told. And because we have
virtual file descriptors, that table remains open after random access
and the readahead bit doesn't get set for the next sequential scan.

Ugh. And even if we hacked the VFD code to close/reopen the file, the
shared disk buffers might still have some entries for some blocks of
the file, causing those blocks not to be requested during the seq scan,
thus disabling read-ahead again.

It sounds like we really ought to try to get this Linux behavior fixed
to work more like BSD (ie, some reasonably small number of consecutive
reads turns on read-ahead). Red Hat guys, are you listening?

regards, tom lane

#16Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#15)
Re: How Postgresql Compares For Query And Load Operations

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Hm. The theory about simple sequential reads is that we expect the
kernel to optimize the disk access, since it'll recognize that we are
doing sequential access to the table file and do read-aheads. Or that's
the theory, anyway.

If it is Linux, they turn off read-ahead on the first fseek() and it
never gets turned on again, or so I am told. And because we have
virtual file descriptors, that table remains open after random access
and the readahead bit doesn't get set for the next sequential scan.

Ugh. And even if we hacked the VFD code to close/reopen the file, the
shared disk buffers might still have some entries for some blocks of
the file, causing those blocks not to be requested during the seq scan,
thus disabling read-ahead again.

It sounds like we really ought to try to get this Linux behavior fixed
to work more like BSD (ie, some reasonably small number of consecutive
reads turns on read-ahead). Red Hat guys, are you listening?

I hit them with this yesterday, and sent an email this morning.

The solution is to have the readahead throttle based on the number of
cache hits from previous read-aheads. Basically, increase readahead on
sequential reads and turn off on read cache lookup failures (meaning it
doesn't have the requested block in its cache / random access). This
works in cases where the app does alternating reads from two parts of a
file.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#17Patrick Macdonald
patrickm@redhat.com
In reply to: Bruce Momjian (#14)
Re: How Postgresql Compares For Query And Load Operations

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Hm. The theory about simple sequential reads is that we expect the
kernel to optimize the disk access, since it'll recognize that we are
doing sequential access to the table file and do read-aheads. Or that's
the theory, anyway.

If it is Linux, they turn off read-ahead on the first fseek() and it
never gets turned on again, or so I am told. And because we have
virtual file descriptors, that table remains open after random access
and the readahead bit doesn't get set for the next sequential scan.

Ugh. And even if we hacked the VFD code to close/reopen the file, the
shared disk buffers might still have some entries for some blocks of
the file, causing those blocks not to be requested during the seq scan,
thus disabling read-ahead again.

It sounds like we really ought to try to get this Linux behavior fixed
to work more like BSD (ie, some reasonably small number of consecutive
reads turns on read-ahead). Red Hat guys, are you listening?

Hmmm... Bruce mentioned this yesterday while he was up here and he reiterated
his thoughts in a note this morning. The note has been forwarded to the
appropriate people (ie. kernel folks).

Cheers,
Patrick

#18Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Bruce Momjian (#13)
Re: How Postgresql Compares For Query And Load Operations

Can you remind me about the OS you are using?

Linux 2.4.3 + Glibc 2.2.2 ( Mandrake 8.0)

regards

Mark

#19Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#12)
Re: How Postgresql Compares For Query And Load Operations

I tried this query :

SELECT sum(val) FROM fact0

for Postgres, Db2 and Oracle. The results were

Postgres 2m25s
Db2 40s
Oracle 50s

This seems to be the likely culprit. I suspect that the "many
block/page read at once" type optimzations (prefetch for Db2 and mutli
block read for Oracle) mean that table sequential scans are faster for
these guys than Postgres.

Hm. The theory about simple sequential reads is that we expect the
kernel to optimize the disk access, since it'll recognize that we are
doing sequential access to the table file and do read-aheads. Or that's
the theory, anyway.

I am not convinced that inefficient I/O is the story here. We could be
paying the price of our very generalized implementation of aggregates.
It would be interesting to know how much CPU time was chewed up by each
DB during the SELECT sum(). It'd also be interesting to know just what
datatype is being summed.

regards, tom lane

I monitored the cpu consumed by the relevant db processes ( counting the time
noted against each process from ps -ef, hope that was what you had in mind )

DB Elapsed Cpu
Postgres 2m25s 2m01s
Db2 50s 30s
Oracle 40s 18s

( I seem to have got my numbers for Db2 and the big O around the wrong way in
the last post ! )

I thought it was worth trying a different query as well :

SELECT count(*) FROM fact0

DB Elapsed Cpu
Postgres 1m5s 32s
Db2 23s 15s
Oracle 37s 11s

Finally the datatypes etc for the table

         Table "fact0"
 Attribute |  Type   | Modifier 
-----------+---------+----------
 d0key     | integer | 
 d1key     | integer | 
 d2key     | integer | 
 val       | integer | 
 filler    | text    | 
Index: fact0_pk

In terms of caching etc.... the first query was run from a cold start, the
second immediatly afterwards.

The Postgres db has 4000 (8K) pages of data buffers and the table itself is
57000 pages. ( others were configured analagously )

regards

Mark

#20Dr. Evil
drevil@sidereal.kz
In reply to: Mark Kirkwood (#19)
Re: How Postgresql Compares For Query And Load Operations

Mark, thanks for sharing these results with us. Interesting. PG is
definitely slower, but not overwhelmingly slower.

I wonder how different the result would be if you had a huge amount of
RAM and allocated PG enough buffers to keep the entire table in RAM.
That would take OS IO considerations out of the question, right? RAM
is very cheap these days.

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Kirkwood (#19)
#22Mitch Vincent
mvincent@cablespeed.com
In reply to: Mark Kirkwood (#1)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mitch Vincent (#22)
#24Bruce Momjian
bruce@momjian.us
In reply to: Mark Kirkwood (#19)
#25Sean Chittenden
sean-pgsql-general@chittenden.org
In reply to: Bruce Momjian (#24)
#26Mitch Vincent
mvincent@cablespeed.com
In reply to: Mark Kirkwood (#1)
#27Shaun Thomas
sthomas@townnews.com
In reply to: Sean Chittenden (#25)