TimeOf(Subselects|Joins)FromLargeTables?

Started by Hegedus, Tamas .almost 22 years ago3 messagesgeneral
Jump to latest
#1Hegedus, Tamas .
Hegedus.Tamas@mayo.edu

Dear All,

I am a biologist and I do not know what to expect from an RDB (PgSQL).
I have large tables: 1215607 rows in prots, 2184596 rows in kwx (see table details below). I would like to do something like that:

SELECT name, seq FROM prots WHERE fid in (SELECT fid FROM kwx WHERE kw_acc=812);

After executing this (either as a subquery or joins) the best/fastest result what I had (SET enable_seqscan=off):
83643.482 ms (see EXPLAIN ANALYZE below).

The two (similar) parts of this query are executed much faster:
SELECT fid FROM kwx WHERE kw_acc=812 -- takes 302ms, n(rows)=78050
SELECT name, seq FROM prots WHERE fid < 80000 -- takes 1969.231 ms

Is this realistic? OK?
If not: how can I increase the speed by fine tuning of the RDB (indexes, run-time parameters) or my SQL query?
(It came now into my mind: if I decrease the number of columns in the prots table (to have only 3 fields (fid, name, seq) instead of 20 columns), than the prots table will have smaller file size on disk, than this table may need less disk page fetches, queries may be faster. Is this true?)

Thanks for your help!
Tamas

===============================================
Table "public.prots"
Column | Type | Modifiers
-----------+----------------------+----------
fid | integer | not null
name | character varying(10) | not null
[...other 17 columns...]
seq | text |
Indexes:
"prots_pkey" primary key, btree (fid)
"ix_prots_acc" unique, btree (acc)
"ix_prots_name" unique, btree (name)
"ix_prots_class" btree ("class")
===============================================
Table "public.kwx"
Column | Type | Modifiers
--------+--------+----------
fid | integer |
kw_acc | integer |
Indexes:
"ix_kwx_acc" btree (kw_acc)
"ix_kwx_fid" btree (fid)
Foreign-key constraints:
"fk_kws_acc" FOREIGN KEY (kw_acc) REFERENCES kw_ref(kw_acc)
"fk_kws_fid" FOREIGN KEY (fid) REFERENCES prots(fid)
===============================================

EXPLAIN ANALYZE SELECT name, seq from prots inner join kwx on (prots.fid=kwx.fid) where kwx.kw_acc = 812;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..160429.66 rows=84473 width=349) (actual time=29.039..83505.629 rows=78050 loops=1)
Merge Cond: ("outer".fid = "inner".fid)
-> Index Scan using ix_kwx_fid on kwx (cost=0.00..44987.55 rows=84473 width=4) (actual time=18.893..5730.468 rows=78050 loops=1)
Filter: (kw_acc = 812)
-> Index Scan using prots_pkey on prots (cost=0.00..112005.24 rows=981127 width=353) (actual time=0.083..76059.235 rows=1210377 loops=1)
Total runtime: 83643.482 ms
(6 rows)

#2Dann Corbit
DCorbit@connx.com
In reply to: Hegedus, Tamas . (#1)
Re: TimeOf(Subselects|Joins)FromLargeTables?

How does this query perform:

SELECT p.name, p.seq
FROM prots p, kwx k
WHERE
p.fid=k.fid
AND
k.kw_acc=812
;

Show quoted text

-----Original Message-----
From: Hegedus, Tamas . [mailto:Hegedus.Tamas@mayo.edu]
Sent: Thursday, June 03, 2004 6:48 PM
To: 'pgsql-general@postgresql.org'
Subject: [GENERAL] TimeOf(Subselects|Joins)FromLargeTables?

Dear All,

I am a biologist and I do not know what to expect from an RDB
(PgSQL). I have large tables: 1215607 rows in prots, 2184596
rows in kwx (see table details below). I would like to do
something like that:

SELECT name, seq FROM prots WHERE fid in (SELECT fid FROM kwx
WHERE kw_acc=812);

After executing this (either as a subquery or joins) the
best/fastest result what I had (SET enable_seqscan=off):
83643.482 ms (see EXPLAIN ANALYZE below).

The two (similar) parts of this query are executed much
faster: SELECT fid FROM kwx WHERE kw_acc=812 -- takes 302ms,
n(rows)=78050 SELECT name, seq FROM prots WHERE fid < 80000
-- takes 1969.231 ms

Is this realistic? OK?
If not: how can I increase the speed by fine tuning of the
RDB (indexes, run-time parameters) or my SQL query? (It came
now into my mind: if I decrease the number of columns in the
prots table (to have only 3 fields (fid, name, seq) instead
of 20 columns), than the prots table will have smaller file
size on disk, than this table may need less disk page
fetches, queries may be faster. Is this true?)

Thanks for your help!
Tamas

===============================================
Table "public.prots"
Column | Type | Modifiers
-----------+----------------------+----------
fid | integer | not null
name | character varying(10) | not null
[...other 17 columns...]
seq | text |
Indexes:
"prots_pkey" primary key, btree (fid)
"ix_prots_acc" unique, btree (acc)
"ix_prots_name" unique, btree (name)
"ix_prots_class" btree ("class")
===============================================
Table "public.kwx"
Column | Type | Modifiers
--------+--------+----------
fid | integer |
kw_acc | integer |
Indexes:
"ix_kwx_acc" btree (kw_acc)
"ix_kwx_fid" btree (fid)
Foreign-key constraints:
"fk_kws_acc" FOREIGN KEY (kw_acc) REFERENCES kw_ref(kw_acc)
"fk_kws_fid" FOREIGN KEY (fid) REFERENCES prots(fid)
===============================================

EXPLAIN ANALYZE SELECT name, seq from prots inner join kwx on
(prots.fid=kwx.fid) where kwx.kw_acc = 812;

QUERY PLAN

--------------------------------------------------------------
--------------------------------------------------------------
------------------
Merge Join (cost=0.00..160429.66 rows=84473 width=349)
(actual time=29.039..83505.629 rows=78050 loops=1)
Merge Cond: ("outer".fid = "inner".fid)
-> Index Scan using ix_kwx_fid on kwx
(cost=0.00..44987.55 rows=84473 width=4) (actual
time=18.893..5730.468 rows=78050 loops=1)
Filter: (kw_acc = 812)
-> Index Scan using prots_pkey on prots
(cost=0.00..112005.24 rows=981127 width=353) (actual
time=0.083..76059.235 rows=1210377 loops=1) Total runtime:
83643.482 ms (6 rows)

---------------------------(end of
broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to
majordomo@postgresql.org

#3Duane Lee - EGOVX
DLee@mail.maricopa.gov
In reply to: Dann Corbit (#2)
Re: TimeOf(Subselects|Joins)FromLargeTables?

Or this correlated subquery:

SELECT a.name, a.seq FROM prots as a
WHERE EXISTS (
SELECT '1'FROM kwx as b
WHERE b.kw_acc=812
and b.fid = a.fid
);

-----Original Message-----
From: Dann Corbit [mailto:DCorbit@connx.com]
Sent: Thursday, June 03, 2004 6:59 PM
To: Hegedus, Tamas .; pgsql-general@postgresql.org
Subject: Re: [GENERAL] TimeOf(Subselects|Joins)FromLargeTables?

How does this query perform:

SELECT p.name, p.seq
FROM prots p, kwx k
WHERE
p.fid=k.fid
AND
k.kw_acc=812
;

-----Original Message-----
From: Hegedus, Tamas . [mailto:Hegedus.Tamas@mayo.edu]
Sent: Thursday, June 03, 2004 6:48 PM
To: 'pgsql-general@postgresql.org'
Subject: [GENERAL] TimeOf(Subselects|Joins)FromLargeTables?

Dear All,

I am a biologist and I do not know what to expect from an RDB
(PgSQL). I have large tables: 1215607 rows in prots, 2184596
rows in kwx (see table details below). I would like to do
something like that:

SELECT name, seq FROM prots WHERE fid in (SELECT fid FROM kwx
WHERE kw_acc=812);

After executing this (either as a subquery or joins) the
best/fastest result what I had (SET enable_seqscan=off):
83643.482 ms (see EXPLAIN ANALYZE below).

The two (similar) parts of this query are executed much
faster: SELECT fid FROM kwx WHERE kw_acc=812 -- takes 302ms,
n(rows)=78050 SELECT name, seq FROM prots WHERE fid < 80000
-- takes 1969.231 ms

Is this realistic? OK?
If not: how can I increase the speed by fine tuning of the
RDB (indexes, run-time parameters) or my SQL query? (It came
now into my mind: if I decrease the number of columns in the
prots table (to have only 3 fields (fid, name, seq) instead
of 20 columns), than the prots table will have smaller file
size on disk, than this table may need less disk page
fetches, queries may be faster. Is this true?)

Thanks for your help!
Tamas

===============================================
Table "public.prots"
Column | Type | Modifiers
-----------+----------------------+----------
fid | integer | not null
name | character varying(10) | not null
[...other 17 columns...]
seq | text |
Indexes:
"prots_pkey" primary key, btree (fid)
"ix_prots_acc" unique, btree (acc)
"ix_prots_name" unique, btree (name)
"ix_prots_class" btree ("class")
===============================================
Table "public.kwx"
Column | Type | Modifiers
--------+--------+----------
fid | integer |
kw_acc | integer |
Indexes:
"ix_kwx_acc" btree (kw_acc)
"ix_kwx_fid" btree (fid)
Foreign-key constraints:
"fk_kws_acc" FOREIGN KEY (kw_acc) REFERENCES kw_ref(kw_acc)
"fk_kws_fid" FOREIGN KEY (fid) REFERENCES prots(fid)
===============================================

EXPLAIN ANALYZE SELECT name, seq from prots inner join kwx on
(prots.fid=kwx.fid) where kwx.kw_acc = 812;

QUERY PLAN

--------------------------------------------------------------
--------------------------------------------------------------
------------------
Merge Join (cost=0.00..160429.66 rows=84473 width=349)
(actual time=29.039..83505.629 rows=78050 loops=1)
Merge Cond: ("outer".fid = "inner".fid)
-> Index Scan using ix_kwx_fid on kwx
(cost=0.00..44987.55 rows=84473 width=4) (actual
time=18.893..5730.468 rows=78050 loops=1)
Filter: (kw_acc = 812)
-> Index Scan using prots_pkey on prots
(cost=0.00..112005.24 rows=981127 width=353) (actual
time=0.083..76059.235 rows=1210377 loops=1) Total runtime:
83643.482 ms (6 rows)

---------------------------(end of
broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to
majordomo@postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)