Query performance

Started by Christian Rengstlover 19 years ago10 messagesgeneral
Jump to latest
#1Christian Rengstl
Christian.Rengstl@klinik.uni-regensburg.de

Hi everyone,

i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar), val2(varchar). Example:
pid crit val1 val2
p1 c1 x y
p1 c2 x z
p1 c3 y x
...
What i am doing is to query all val1 and val2 for one pid and all crit values:

select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable);
where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date.

QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------------------
Hash IN Join (cost=1033.67..134959.41 rows=37120 width=23) (actual time=357.11
6..356984.535 rows=37539 loops=1)
Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120 width=23) (
actual time=291.600..356707.737 rows=37539 loops=1)
Recheck Cond: ((pid)::text = '1'::text)
-> Bitmap Index Scan on idx_test2_pid (cost=0.00..232.92 rows=37120 w
idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
Index Cond: ((pid)::text = '1'::text)
-> Hash (cost=700.20..700.20 rows=40220 width=13) (actual time=65.055..65.0
55 rows=40220 loops=1)
-> Seq Scan on snps_test (cost=0.00..700.20 rows=40220 width=13) (act
ual time=0.020..30.131 rows=40220 loops=1)
Total runtime: 357017.259 ms

Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it faster.

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

#2Hakan Kocaman
Hakan.Kocaman@digame.de
In reply to: Christian Rengstl (#1)
Re: Query performance

Hi,

can you post the complete query,schema- and table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120 width=23)
(actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.
Maybe the text-type is not so ideal.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

Show quoted text

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
Christian Rengstl
Sent: Thursday, August 03, 2006 10:13 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Query performance

Hi everyone,

i have a table with around 57 million tuples, with the
following columns: pid(varchar), crit(varchar),
val1(varchar), val2(varchar). Example:
pid crit val1 val2
p1 c1 x y
p1 c2 x z
p1 c3 y x
...
What i am doing is to query all val1 and val2 for one pid and
all crit values:

select val1, val2, crit from mytable where pid='somepid' and
crit in(select crit from myCritTable);
where myCritTable is a table that contains all crit values
(around 42.000) ordered by their insertion date.

QUERY PLAN

--------------------------------------------------------------
------------------
----------------------------------------------------------
Hash IN Join (cost=1033.67..134959.41 rows=37120 width=23)
(actual time=357.11
6..356984.535 rows=37539 loops=1)
Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66
rows=37120 width=23) (
actual time=291.600..356707.737 rows=37539 loops=1)
Recheck Cond: ((pid)::text = '1'::text)
-> Bitmap Index Scan on idx_test2_pid
(cost=0.00..232.92 rows=37120 w
idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
Index Cond: ((pid)::text = '1'::text)
-> Hash (cost=700.20..700.20 rows=40220 width=13)
(actual time=65.055..65.0
55 rows=40220 loops=1)
-> Seq Scan on snps_test (cost=0.00..700.20
rows=40220 width=13) (act
ual time=0.020..30.131 rows=40220 loops=1)
Total runtime: 357017.259 ms

Unfortunately the query takes pretty long for the big table,
so maybe one of you has a suggestion on how to make it faster.

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---------------------------(end of
broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

#3Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Christian Rengstl (#1)
Re: Query performance

On 8/3/06, Christian Rengstl <Christian.Rengstl@klinik.uni-regensburg.de> wrote:

...
Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it faster.

try smth like this:

select val1, val2, crit from mytable as a where pid='somepid' and
exists(select 1 from myCritTable as b where a.crit = b.crit);

--
Best regards,
Nikolay

#4Christian Rengstl
Christian.Rengstl@klinik.uni-regensburg.de
In reply to: Hakan Kocaman (#2)
Re: Query performance

Hi,

the complete query is the one i posted, but here comes the schema for mytable:
entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
pid varchar(15) NOT NULL,
crit varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
crit varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_crit_master PRIMARY KEY (crit)

My server is 8.1.4. As a matter of fact, i have no idea where the text type comes from, because as you can see from above there are only varchar with maximum 15 characters.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 10:34 am:

Hi,

can you post the complete query,schema- and
table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120
width=23)
(actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.
Maybe the text-type is not so ideal.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of
Christian Rengstl
Sent: Thursday, August 03, 2006 10:13 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Query performance

Hi everyone,

i have a table with around 57 million tuples, with the
following columns: pid(varchar), crit(varchar),
val1(varchar), val2(varchar). Example:
pid crit val1 val2
p1 c1 x y
p1 c2 x z
p1 c3 y x
...
What i am doing is to query all val1 and val2 for one pid and
all crit values:

select val1, val2, crit from mytable where pid='somepid' and
crit in(select crit from myCritTable);
where myCritTable is a table that contains all crit values
(around 42.000) ordered by their insertion date.

QUERY PLAN

--------------------------------------------------------------
------------------
----------------------------------------------------------
Hash IN Join (cost=1033.67..134959.41 rows=37120 width=23)
(actual time=357.11
6..356984.535 rows=37539 loops=1)
Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66
rows=37120 width=23) (
actual time=291.600..356707.737 rows=37539 loops=1)
Recheck Cond: ((pid)::text = '1'::text)
-> Bitmap Index Scan on idx_test2_pid
(cost=0.00..232.92 rows=37120 w
idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
Index Cond: ((pid)::text = '1'::text)
-> Hash (cost=700.20..700.20 rows=40220 width=13)
(actual time=65.055..65.0
55 rows=40220 loops=1)
-> Seq Scan on snps_test (cost=0.00..700.20
rows=40220 width=13) (act
ual time=0.020..30.131 rows=40220 loops=1)
Total runtime: 357017.259 ms

Unfortunately the query takes pretty long for the big table,
so maybe one of you has a suggestion on how to make it faster.

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

---------------------------(end of
broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

#5Richard Huxton
dev@archonet.com
In reply to: Christian Rengstl (#4)
Re: Query performance

Christian Rengstl wrote:

Hi,

the complete query is the one i posted, but here comes the schema for mytable:
entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
pid varchar(15) NOT NULL,
crit varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
crit varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_crit_master PRIMARY KEY (crit)

Still doesn't match the EXPLAIN output - where's snp_id? Where's table
test2?

My server is 8.1.4. As a matter of fact, i have no idea where the text
type comes from, because as you can see from above there are only
varchar with maximum 15 characters.

PG is casting it to text. There's no real difference between the types
(other than the size limit) and it's not expensive.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 10:34 am:

Hi,

can you post the complete query,schema- and
table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66 rows=37120
width=23)
(actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.

Yep, it looks like the "Bitmap Heap Scan" is at the heart of this. You
might want to increase work_mem, it could be that the bitmap is spilling
to disk (which is much slower than keeping it all in RAM)

http://www.postgresql.org/docs/8.1/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.

--
Richard Huxton
Archonet Ltd

#6Hakan Kocaman
Hakan.Kocaman@digame.de
In reply to: Richard Huxton (#5)
Re: Query performance

Hi,

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Thursday, August 03, 2006 11:00 AM
To: Christian Rengstl
Cc: Hakan Kocaman; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query performance

Christian Rengstl wrote:

Hi,

the complete query is the one i posted, but here comes the

schema for mytable:

entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
pid varchar(15) NOT NULL,
crit varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
crit varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_crit_master PRIMARY KEY (crit)

Still doesn't match the EXPLAIN output - where's snp_id?
Where's table
test2?

Yep, that bothered me too.

My server is 8.1.4. As a matter of fact, i have no idea

where the text

type comes from, because as you can see from above there are only
varchar with maximum 15 characters.

PG is casting it to text. There's no real difference between
the types
(other than the size limit) and it's not expensive.

But wouldn't a comparison between int4 be much cheaper.
If i see smth like "id" (here snp_id) in a fieldname it should be a int-type, i think.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06

10:34 am:

Hi,

can you post the complete query,schema- and
table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66

rows=37120

width=23)
(actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.

Yep, it looks like the "Bitmap Heap Scan" is at the heart of
this. You
might want to increase work_mem, it could be that the bitmap
is spilling
to disk (which is much slower than keeping it all in RAM)

http://www.postgresql.org/docs/8.1/static/runtime-config-resou

rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.

--
Richard Huxton
Archonet Ltd

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

#7Christian Rengstl
Christian.Rengstl@klinik.uni-regensburg.de
In reply to: Hakan Kocaman (#6)
Re: Query performance

Hi,

i would rather compare int4 too, but the snp_id can be something like "abc123" unfortunately.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am:

Hi,

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Thursday, August 03, 2006 11:00 AM
To: Christian Rengstl
Cc: Hakan Kocaman; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query performance

Christian Rengstl wrote:

Hi,

the complete query is the one i posted, but here comes the

schema for mytable:

entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
pid varchar(15) NOT NULL,
crit varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
crit varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_crit_master PRIMARY KEY (crit)

Still doesn't match the EXPLAIN output - where's snp_id?
Where's table
test2?

Yep, that bothered me too.

My server is 8.1.4. As a matter of fact, i have no idea

where the text

type comes from, because as you can see from above there are only
varchar with maximum 15 characters.

PG is casting it to text. There's no real difference between
the types
(other than the size limit) and it's not expensive.

But wouldn't a comparison between int4 be much cheaper.
If i see smth like "id" (here snp_id) in a fieldname it should be a
int-type, i think.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06

10:34 am:

Hi,

can you post the complete query,schema- and
table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66

rows=37120

width=23)
(actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.

Yep, it looks like the "Bitmap Heap Scan" is at the heart of
this. You
might want to increase work_mem, it could be that the bitmap
is spilling
to disk (which is much slower than keeping it all in RAM)

http://www.postgresql.org/docs/8.1/static/runtime-config-resou

rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.

--
Richard Huxton
Archonet Ltd

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

#8Hakan Kocaman
Hakan.Kocaman@digame.de
In reply to: Christian Rengstl (#7)
Re: Query performance

Hi,

maybe you should overthink your db-design, but thats a bit premature whithout your complete
table-definitions(including table-names, datatypes, indexes, foreign-key constraints,etc.)

If your are using pgadmin3 just cut'n paste the content of the window on the bottom left for
the corresponding tables.

If you're using psql try \d yur-table-name.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

Show quoted text

-----Original Message-----
From: Christian Rengstl
[mailto:Christian.Rengstl@klinik.uni-regensburg.de]
Sent: Thursday, August 03, 2006 11:18 AM
To: Richard Huxton; Hakan Kocaman
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query performance

Hi,

i would rather compare int4 too, but the snp_id can be
something like "abc123" unfortunately.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am:

Hi,

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Thursday, August 03, 2006 11:00 AM
To: Christian Rengstl
Cc: Hakan Kocaman; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query performance

Christian Rengstl wrote:

Hi,

the complete query is the one i posted, but here comes the

schema for mytable:

entry_no int8 NOT NULL DEFAULT

nextval('entry_no_seq''::regclass),

pid varchar(15) NOT NULL,
crit varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
crit varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_crit_master PRIMARY KEY (crit)

Still doesn't match the EXPLAIN output - where's snp_id?
Where's table
test2?

Yep, that bothered me too.

My server is 8.1.4. As a matter of fact, i have no idea

where the text

type comes from, because as you can see from above

there are only

varchar with maximum 15 characters.

PG is casting it to text. There's no real difference between
the types
(other than the size limit) and it's not expensive.

But wouldn't a comparison between int4 be much cheaper.
If i see smth like "id" (here snp_id) in a fieldname it should be a
int-type, i think.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06

10:34 am:

Hi,

can you post the complete query,schema- and
table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66

rows=37120

width=23)
(actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.

Yep, it looks like the "Bitmap Heap Scan" is at the heart of
this. You
might want to increase work_mem, it could be that the bitmap
is spilling
to disk (which is much slower than keeping it all in RAM)

http://www.postgresql.org/docs/8.1/static/runtime-config-resou

rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see

what happens.

--
Richard Huxton
Archonet Ltd

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

---------------------------(end of

broadcast)---------------------------

TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's

datatypes do not

match

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

#9Chris Mair
chrisnospam@1006.org
In reply to: Christian Rengstl (#1)
Re: Query performance

i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar), val2(varchar). Example:
pid crit val1 val2
p1 c1 x y
p1 c2 x z
p1 c3 y x
...
What i am doing is to query all val1 and val2 for one pid and all crit values:

select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable);
where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date.

In case myCritTable doesn't change a lot and this select by contrast is
executed a lot, have you considered precomputing whether a record from
your big table has a crit value from myCritTable?

Of course this info would be invalidated each time myCritTable is
updated, so you would trade fast selects on the big table vs. slow
updates on myCritTable. Don't know wether that makes sence for you...

Bye, Chris.

#10Christian Rengstl
Christian.Rengstl@klinik.uni-regensburg.de
In reply to: Hakan Kocaman (#8)
Re: Query performance

Hi,

here is the definition of the master table which is inherited by around 30 tables based on the value of chr:
CREATE TABLE snp_master
(
entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq'::regclass),
pid varchar(15) NOT NULL,
snp_id varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
chr int2 NOT NULL,
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
CONSTRAINT "UNIQUE_SNP_ALLEL_MASTER" UNIQUE (pid, entry_no, snp_id)
)
WITHOUT OIDS;

The thing is that i load the data from txt files which themselves have more or less the same structure. So for every pid of 1500 there are up to 42000 different snp_id values and for each of this combinations there are different val1 and val2 entries (all together this accounts for up to around 58 million tuples in the biggest table). MyCritTable then just contains the distinct snp_ids so that at least this query does not take very long any more.
CREATE TABLE snps_master
(
snp_id varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_snp_master PRIMARY KEY (snp_id)
)
WITHOUT OIDS;

Up to now there are no foreign key constraints, as once the data is loaded into the db there will be no update or delete operations at all. I only have to export the data to different file formats for which i need the query posted originally.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:36 am:

Hi,

maybe you should overthink your db-design, but thats a bit premature
whithout your complete
table-definitions(including table-names, datatypes, indexes,
foreign-key constraints,etc.)

If your are using pgadmin3 just cut'n paste the content of the window
on the bottom left for
the corresponding tables.

If you're using psql try \d yur-table-name.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

-----Original Message-----
From: Christian Rengstl
[mailto:Christian.Rengstl@klinik.uni-regensburg.de]
Sent: Thursday, August 03, 2006 11:18 AM
To: Richard Huxton; Hakan Kocaman
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query performance

Hi,

i would rather compare int4 too, but the snp_id can be
something like "abc123" unfortunately.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am:

Hi,

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Thursday, August 03, 2006 11:00 AM
To: Christian Rengstl
Cc: Hakan Kocaman; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query performance

Christian Rengstl wrote:

Hi,

the complete query is the one i posted, but here comes the

schema for mytable:

entry_no int8 NOT NULL DEFAULT

nextval('entry_no_seq''::regclass),

pid varchar(15) NOT NULL,
crit varchar(13) NOT NULL,
val1 varchar(1),
val2 varchar(1),
aendat text,
aennam varchar(8),
CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
crit varchar(13) NOT NULL,
chr int2,
aendat timestamp,
CONSTRAINT pk_crit_master PRIMARY KEY (crit)

Still doesn't match the EXPLAIN output - where's snp_id?
Where's table
test2?

Yep, that bothered me too.

My server is 8.1.4. As a matter of fact, i have no idea

where the text

type comes from, because as you can see from above

there are only

varchar with maximum 15 characters.

PG is casting it to text. There's no real difference between
the types
(other than the size limit) and it's not expensive.

But wouldn't a comparison between int4 be much cheaper.
If i see smth like "id" (here snp_id) in a fieldname it should be a
int-type, i think.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06

10:34 am:

Hi,

can you post the complete query,schema- and
table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
-> Bitmap Heap Scan on test2 (cost=232.92..132766.66

rows=37120

width=23)
(actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.

Yep, it looks like the "Bitmap Heap Scan" is at the heart of
this. You
might want to increase work_mem, it could be that the bitmap
is spilling
to disk (which is much slower than keeping it all in RAM)

http://www.postgresql.org/docs/8.1/static/runtime-config-resou

rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see

what happens.

--
Richard Huxton
Archonet Ltd

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

---------------------------(end of

broadcast)---------------------------

TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's

datatypes do not

match

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230