Performance problem with 50,000,000 rows
I'm new to PG but this just seems wrong. Can someone take a look:
.-----------. .-----------.
| bk_inv | | bk_title |
|-----------| |-----------|
| isbn |<--->| isbn |
| store | | vendor |
| qty | | |
| week | `-----------'
| | 2,000,000 recs
`-----------'
50,000,000 recs
Actual record numbers:
bk_inv : 46,790,877
bk_title: 2,311,710
VENDOR REPORT
A list of Inventory items, for any one given vendor (e.q. 01672708)
select i.isbn,
t.vendor,
i.store,
i.qty
from bk_inv i,
bk_title t
where i.isbn = t.isbn
and t.vendor = '01672708' ;
This query should be instantaneous. Granted that's 50 million rows, but
I have created an index on the isbn column for both tables.
After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it
spits out:
ERROR: Write to hashjoin temp file failed
tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
and t.vendor ='50000029';
NOTICE: QUERY PLAN:
Merge Join (cost=0.00..11229637.06 rows=2172466 width=72)
-> Index Scan using bk_title_isbn_idx on bk_title t
(cost=0.00..390788.08 rows=107331 width=24)
-> Index Scan using bk_inv_isbn_idx on bk_inv i
(cost=0.00..10252621.38 rows=46790877 width=48)
BIG COST!
These explain queries show the existance of the indexes and give small
costs:
tiger=# explain select * from bk_title where isbn = '50000029';
NOTICE: QUERY PLAN:
Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1
width=24)
tiger=# explain select * from bk_inv where isbn = '0897474228';
NOTICE: QUERY PLAN:
Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55
width=48)
Note. Same tables, same query returns instantaneously with Oracle 8.1.
What I am hoping to show is that Postgres can do our job too.
Any help on this much obliged. (Yes I ran vacuum analyze).
David Link
White Plains, NY
Have you done a VACUUM ANALYZE ?
-Mitch
----- Original Message -----
From: "David Link" <dlink@soundscan.com>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, October 10, 2001 3:46 PM
Subject: [GENERAL] Performance problem with 50,000,000 rows
Show quoted text
I'm new to PG but this just seems wrong. Can someone take a look:
.-----------. .-----------.
| bk_inv | | bk_title |
|-----------| |-----------|
| isbn |<--->| isbn |
| store | | vendor |
| qty | | |
| week | `-----------'
| | 2,000,000 recs
`-----------'
50,000,000 recsActual record numbers:
bk_inv : 46,790,877
bk_title: 2,311,710VENDOR REPORT
A list of Inventory items, for any one given vendor (e.q. 01672708)
select i.isbn,
t.vendor,
i.store,
i.qty
from bk_inv i,
bk_title t
where i.isbn = t.isbn
and t.vendor = '01672708' ;This query should be instantaneous. Granted that's 50 million rows, but
I have created an index on the isbn column for both tables.
After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it
spits out:ERROR: Write to hashjoin temp file failed
tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
and t.vendor ='50000029';
NOTICE: QUERY PLAN:Merge Join (cost=0.00..11229637.06 rows=2172466 width=72)
-> Index Scan using bk_title_isbn_idx on bk_title t
(cost=0.00..390788.08 rows=107331 width=24)
-> Index Scan using bk_inv_isbn_idx on bk_inv i
(cost=0.00..10252621.38 rows=46790877 width=48)BIG COST!
These explain queries show the existance of the indexes and give small
costs:tiger=# explain select * from bk_title where isbn = '50000029';
NOTICE: QUERY PLAN:Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1
width=24)tiger=# explain select * from bk_inv where isbn = '0897474228';
NOTICE: QUERY PLAN:Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55
width=48)Note. Same tables, same query returns instantaneously with Oracle 8.1.
What I am hoping to show is that Postgres can do our job too.Any help on this much obliged. (Yes I ran vacuum analyze).
David Link
White Plains, NY---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
David Link wrote:
Just, reading it and I always enter these kind of queries like:
select i.isbn,
t.vendor,
i.store,
i.qty
from bk_inv i,
bk_title t
where
t.vendor = '01672708' and
i.isbn = t.isbn;
Don't know if it makes a difference but I can imagine that it could because
the t.vendor = '...' limits the possibilities in the first stage.
Regards,
Feite
I'm new to PG but this just seems wrong. Can someone take a look:
.-----------. .-----------.
| bk_inv | | bk_title |
|-----------| |-----------|
| isbn |<--->| isbn |
| store | | vendor |
| qty | | |
| week | `-----------'
| | 2,000,000 recs
`-----------'
50,000,000 recsActual record numbers:
bk_inv : 46,790,877
bk_title: 2,311,710VENDOR REPORT
A list of Inventory items, for any one given vendor (e.q. 01672708)
select i.isbn,
t.vendor,
i.store,
i.qty
from bk_inv i,
bk_title t
where i.isbn = t.isbn
and t.vendor = '01672708' ;This query should be instantaneous. Granted that's 50 million rows, but
I have created an index on the isbn column for both tables.
After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it
spits out:ERROR: Write to hashjoin temp file failed
tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
and t.vendor ='50000029';
NOTICE: QUERY PLAN:Merge Join (cost=0.00..11229637.06 rows=2172466 width=72)
-> Index Scan using bk_title_isbn_idx on bk_title t
(cost=0.00..390788.08 rows=107331 width=24)
-> Index Scan using bk_inv_isbn_idx on bk_inv i
(cost=0.00..10252621.38 rows=46790877 width=48)BIG COST!
These explain queries show the existance of the indexes and give small
costs:tiger=# explain select * from bk_title where isbn = '50000029';
NOTICE: QUERY PLAN:Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1
width=24)tiger=# explain select * from bk_inv where isbn = '0897474228';
NOTICE: QUERY PLAN:Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55
width=48)Note. Same tables, same query returns instantaneously with Oracle 8.1.
What I am hoping to show is that Postgres can do our job too.Any help on this much obliged. (Yes I ran vacuum analyze).
David Link
White Plains, NY---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
--
Feite Brekeveld
feite.brekeveld@osiris-it.nl
http://www.osiris-it.nl
Previously :
Table sizes .....
bk_inv : 46,790,877
bk_title: 2,311,710Query :
select i.isbn,
t.vendor,
i.store,
i.qty
from bk_inv i,
bk_title t
where i.isbn = t.isbn
and t.vendor = '01672708' ;
It might be worth putting and index on bk_title.vendor, then the scan of this
table will quickly find the appropriate rows( assuming vendor is
selective)... then the join to bk_inv can use the bk_inv.isbn index as
before, but hopefully with a smaller dataset....(so hopefully it might be
able to use a nested loop from bk_title -> bk_inv instead of a huge merge
join)
It also might be worth playing with sort_mem and shared_buffers (if you have
not done so already) - since you have a reasonable amount of memory.
good luck
Mark
Import Notes
Resolved by subject fallback
On Thu, 2001-10-11 at 00:03, Feite Brekeveld wrote:
Just, reading it and I always enter these kind of queries like:
select i.isbn,
t.vendor,
i.store,
i.qty
from bk_inv i,
bk_title t
where
t.vendor = '01672708' and
i.isbn = t.isbn;Don't know if it makes a difference but I can imagine that it could because
the t.vendor = '...' limits the possibilities in the first stage.
I have noticed this behaviour too. Is there some sort of guide to use in
order to get the query right? Or is this just part of SQL basics?
Cheers
Tony Grant
Solution to the problem:
I had run vaccum on the two tables in question (which happen to be the
only two tables user defined in the database)
# vacuum analyze bk_inv;
# vacuum analyze bk_title;
When I ran vacuum for all tables (including the system tables I suppose)
# vacuum analyze;
I got the response time I exected. (Why this is perhaps someone can
explain).
Thanks for the feedbacks.
Regards, David
David Link wrote:
Show quoted text
I'm new to PG but this just seems wrong. Can someone take a look:
.-----------. .-----------.
| bk_inv | | bk_title |
|-----------| |-----------|
| isbn |<--->| isbn |
| store | | vendor |
| qty | | |
| week | `-----------'
| | 2,000,000 recs
`-----------'
50,000,000 recsActual record numbers:
bk_inv : 46,790,877
bk_title: 2,311,710VENDOR REPORT
A list of Inventory items, for any one given vendor (e.q. 01672708)
select i.isbn,
t.vendor,
i.store,
i.qty
from bk_inv i,
bk_title t
where i.isbn = t.isbn
and t.vendor = '01672708' ;This query should be instantaneous. Granted that's 50 million rows, but
I have created an index on the isbn column for both tables.
After about 25 minutes (on 4 processor Del 6300 with 1GB Memory) it
spits out:ERROR: Write to hashjoin temp file failed
tiger=# explain select * from bk_inv i, bk_title t where i.isbn = t.isbn
and t.vendor ='50000029';
NOTICE: QUERY PLAN:Merge Join (cost=0.00..11229637.06 rows=2172466 width=72)
-> Index Scan using bk_title_isbn_idx on bk_title t
(cost=0.00..390788.08 rows=107331 width=24)
-> Index Scan using bk_inv_isbn_idx on bk_inv i
(cost=0.00..10252621.38 rows=46790877 width=48)BIG COST!
These explain queries show the existance of the indexes and give small
costs:tiger=# explain select * from bk_title where isbn = '50000029';
NOTICE: QUERY PLAN:Index Scan using bk_title_isbn_idx on bk_title (cost=0.00..4.90 rows=1
width=24)tiger=# explain select * from bk_inv where isbn = '0897474228';
NOTICE: QUERY PLAN:Index Scan using bk_inv_isbn_idx on bk_inv (cost=0.00..225.53 rows=55
width=48)Note. Same tables, same query returns instantaneously with Oracle 8.1.
What I am hoping to show is that Postgres can do our job too.Any help on this much obliged. (Yes I ran vacuum analyze).
David Link
White Plains, NY
Hello,
I tried to search the mailing list archives but got only:
An error occured!
PQconnectPoll() -- connect() failed: Connection refused Is the postmaster running (with -i) at 'db.hub.org' and accepting connections on TCP/IP port 5439?
This doesn�t speak well about the backend we use ;-)).
Kind regards
Andreas (hopefully not repeating someone else because I can�t search
the archive).