Very slow query

Started by Rory Campbell-Langealmost 22 years ago7 messagesgeneral
Jump to latest
#1Rory Campbell-Lange
rory@campbell-lange.net

The following query on some small datasets takes over a second to run.
I'd be grateful for some help in understanding the explain output, and
to remake the code.

Looks like the sort is using up quite a bit of resources. I made an
index on boards using columns "b.n_type, b.n_id, b.t_name" but the index
was not recorded in explain analyze. (see "testindex" below).

I am using PostgreSQL 7.4.2 on i386-pc-linux-gnu (Debian). The query is
in a psql function (I've changed variables to hard coded integers
below). The db is vacuumed every night.

Thanks for any help;
Rory

SELECT DISTINCT
b.n_id as id,
b.n_type,
CASE b.n_type WHEN 0 THEN 'personal'
WHEN 1 THEN 'private'
WHEN 2 THEN 'blog'
ELSE 'public'
END as type,
b.t_name as title
FROM
boards b, people p, objects o
WHERE
b.b_hidden = 'f'
AND
(
b.n_type = 3
OR
b.n_creator = 71
OR
( b.n_id = o.n_board_id
AND
o.n_creator = 71
AND
o.n_joined > 0
)
)
ORDER BY
b.n_type, b.n_id ASC, b.t_name;

trial=> \d boards
Table "public.boards"
Column | Type | Modifiers
---------------+-----------------------------+----------------------------------------------------------
n_id | integer | not null default nextval('public.boards_n_id_seq'::text)
b_hidden | boolean | default false
dt_created | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
dt_modified | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
t_mobile | character varying(15) |
t_email | character varying(50) |
n_creator | integer | not null
n_type | smallint | not null default 0
t_name | character varying(100) | not null
t_description | character varying(500) |
n_id_photo | integer |
n_bg_colour | integer | default 0
Indexes:
"boards_pkey" primary key, btree (n_id)
"boards_t_email_key" unique, btree (t_email)
"boards_t_mobile_key" unique, btree (t_mobile)
"testindex" btree (n_type, n_id, t_name)
Foreign-key constraints:
"$1" FOREIGN KEY (n_creator) REFERENCES people(n_id) ON UPDATE CASCADE ON DELETE CASCADE
"$2" FOREIGN KEY (n_id_photo) REFERENCES photo(n_id) ON UPDATE CASCADE ON DELETE SET NULL
Triggers:
tr_update_modified_time BEFORE UPDATE ON boards FOR EACH ROW EXECUTE PROCEDURE fn_update_modified_time()

trial=> \d people
Table "public.people"
Column | Type | Modifiers
-------------------+-----------------------------+----------------------------------------------------------
n_id | integer | not null default nextval('public.people_n_id_seq'::text)
n_object_id | integer |
n_objects_counter | integer | default 0
b_hidden | boolean | default false
dt_created | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
dt_modified | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
t_nickname | character varying(20) | not null
t_firstname | character varying(20) | not null
t_surname | character varying(25) | not null
t_mobile | character varying(15) |
t_email | character varying(50) |
t_password | character varying(15) | not null
b_registered | boolean | default false
n_email_status | smallint | default 0
n_txt_status | smallint | default 0
b_work_hours | boolean | default false
b_confirmations | boolean | default true
Indexes:
"people_pkey" primary key, btree (n_id)
"people_t_email_key" unique, btree (t_email)
"people_t_mobile_key" unique, btree (t_mobile)
"people_t_nickname_key" unique, btree (t_nickname)
Foreign-key constraints:
"object_chk" FOREIGN KEY (n_object_id) REFERENCES objects(n_id) ON UPDATE CASCADE ON DELETE SET NULL
Triggers:
tr_update_modified_time BEFORE UPDATE ON people FOR EACH ROW EXECUTE PROCEDURE fn_update_modified_time()

trial=> \d objects
Table "public.objects"
Column | Type | Modifiers
---------------+-----------------------------+-----------------------------------------------------------
n_id | integer | not null default nextval('public.objects_n_id_seq'::text)
t_text_id | character varying(25) | not null
b_hidden | boolean | default false
dt_created | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
dt_modified | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
n_creator | integer | not null
n_type | smallint | not null default 0
t_name | character varying(100) | not null
t_description | text |
t_postcode | character varying(10) |
n_id_photo | integer |
n_board_id | integer | not null
n_joined | smallint |
b_trigger | boolean | default true
Indexes:
"objects_pkey" primary key, btree (n_id)
Foreign-key constraints:
"$1" FOREIGN KEY (n_creator) REFERENCES people(n_id) ON UPDATE CASCADE ON DELETE CASCADE
"$2" FOREIGN KEY (n_id_photo) REFERENCES photo(n_id) ON UPDATE CASCADE ON DELETE SET NULL
"$3" FOREIGN KEY (n_board_id) REFERENCES boards(n_id) ON UPDATE CASCADE ON DELETE CASCADE

Unique (cost=3677.26..3711.48 rows=102 width=18) (actual time=12566.422..13045.404 rows=4 loops=1)
-> Sort (cost=3677.26..3684.10 rows=2738 width=18) (actual time=12566.413..12809.928 rows=158254 loops=1)
Sort Key: b.n_type, b.n_id, b.t_name, CASE WHEN (b.n_type = 0) THEN 'personal'::text WHEN (b.n_type = 1) THEN 'private'::te
xt WHEN (b.n_type = 2) THEN 'blog'::text ELSE 'public'::text END
-> Nested Loop (cost=3442.79..3520.93 rows=2738 width=18) (actual time=0.244..1052.180 rows=158254 loops=1)
-> Seq Scan on people p (cost=0.00..2.67 rows=67 width=0) (actual time=0.012..0.709 rows=67 loops=1)
-> Materialize (cost=3442.79..3443.20 rows=41 width=18) (actual time=0.005..6.528 rows=2362 loops=67)
-> Nested Loop (cost=3.27..3442.79 rows=41 width=18) (actual time=0.216..273.709 rows=2362 loops=1)
Join Filter: ((("inner".n_id = "outer".n_board_id) OR ("inner".n_type = 3) OR ("inner".n_creator = 71)) A
ND (("outer".n_creator = 71) OR ("inner".n_type = 3) OR ("inner".n_creator = 71)) AND (("outer".n_joined > 0) OR ("inner".n_type = 3
) OR ("inner".n_creator = 71)))
-> Seq Scan on objects o (cost=0.00..27.87 rows=787 width=10) (actual time=0.005..1.372 rows=787 loops=
1)
-> Materialize (cost=3.27..4.29 rows=102 width=22) (actual time=0.001..0.105 rows=102 loops=787)
-> Seq Scan on boards b (cost=0.00..3.27 rows=102 width=22) (actual time=0.017..0.352 rows=102 lo
ops=1)
Filter: (b_hidden = false)
Total runtime: 13051.065 ms

--
Rory Campbell-Lange
<rory@campbell-lange.net>
<www.campbell-lange.net>

#2Rory Campbell-Lange
rory@campbell-lange.net
In reply to: Rory Campbell-Lange (#1)
Re: Very slow query

Sorry for replying to my own post, but I'm anxious for an answer. Should
I provide other information?

Thanks
Rory

On 10/05/04, Rory Campbell-Lange (rory@campbell-lange.net) wrote:

The following query on some small datasets takes over a second to run.
I'd be grateful for some help in understanding the explain output, and
to remake the code.

Looks like the sort is using up quite a bit of resources. I made an
index on boards using columns "b.n_type, b.n_id, b.t_name" but the index
was not recorded in explain analyze. (see "testindex" below).

I am using PostgreSQL 7.4.2 on i386-pc-linux-gnu (Debian). The query is
in a psql function (I've changed variables to hard coded integers
below). The db is vacuumed every night.

Thanks for any help;
Rory

SELECT DISTINCT
b.n_id as id,
b.n_type,
CASE b.n_type WHEN 0 THEN 'personal'
WHEN 1 THEN 'private'
WHEN 2 THEN 'blog'
ELSE 'public'
END as type,
b.t_name as title
FROM
boards b, people p, objects o
WHERE
b.b_hidden = 'f'
AND
(
b.n_type = 3
OR
b.n_creator = 71
OR
( b.n_id = o.n_board_id
AND
o.n_creator = 71
AND
o.n_joined > 0
)
)
ORDER BY
b.n_type, b.n_id ASC, b.t_name;

trial=> \d boards
Table "public.boards"
Column | Type | Modifiers
---------------+-----------------------------+----------------------------------------------------------
n_id | integer | not null default nextval('public.boards_n_id_seq'::text)
b_hidden | boolean | default false
dt_created | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
dt_modified | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
t_mobile | character varying(15) |
t_email | character varying(50) |
n_creator | integer | not null
n_type | smallint | not null default 0
t_name | character varying(100) | not null
t_description | character varying(500) |
n_id_photo | integer |
n_bg_colour | integer | default 0
Indexes:
"boards_pkey" primary key, btree (n_id)
"boards_t_email_key" unique, btree (t_email)
"boards_t_mobile_key" unique, btree (t_mobile)
"testindex" btree (n_type, n_id, t_name)
Foreign-key constraints:
"$1" FOREIGN KEY (n_creator) REFERENCES people(n_id) ON UPDATE CASCADE ON DELETE CASCADE
"$2" FOREIGN KEY (n_id_photo) REFERENCES photo(n_id) ON UPDATE CASCADE ON DELETE SET NULL
Triggers:
tr_update_modified_time BEFORE UPDATE ON boards FOR EACH ROW EXECUTE PROCEDURE fn_update_modified_time()

trial=> \d people
Table "public.people"
Column | Type | Modifiers
-------------------+-----------------------------+----------------------------------------------------------
n_id | integer | not null default nextval('public.people_n_id_seq'::text)
n_object_id | integer |
n_objects_counter | integer | default 0
b_hidden | boolean | default false
dt_created | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
dt_modified | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
t_nickname | character varying(20) | not null
t_firstname | character varying(20) | not null
t_surname | character varying(25) | not null
t_mobile | character varying(15) |
t_email | character varying(50) |
t_password | character varying(15) | not null
b_registered | boolean | default false
n_email_status | smallint | default 0
n_txt_status | smallint | default 0
b_work_hours | boolean | default false
b_confirmations | boolean | default true
Indexes:
"people_pkey" primary key, btree (n_id)
"people_t_email_key" unique, btree (t_email)
"people_t_mobile_key" unique, btree (t_mobile)
"people_t_nickname_key" unique, btree (t_nickname)
Foreign-key constraints:
"object_chk" FOREIGN KEY (n_object_id) REFERENCES objects(n_id) ON UPDATE CASCADE ON DELETE SET NULL
Triggers:
tr_update_modified_time BEFORE UPDATE ON people FOR EACH ROW EXECUTE PROCEDURE fn_update_modified_time()

trial=> \d objects
Table "public.objects"
Column | Type | Modifiers
---------------+-----------------------------+-----------------------------------------------------------
n_id | integer | not null default nextval('public.objects_n_id_seq'::text)
t_text_id | character varying(25) | not null
b_hidden | boolean | default false
dt_created | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
dt_modified | timestamp without time zone | default ('now'::text)::timestamp(6) with time zone
n_creator | integer | not null
n_type | smallint | not null default 0
t_name | character varying(100) | not null
t_description | text |
t_postcode | character varying(10) |
n_id_photo | integer |
n_board_id | integer | not null
n_joined | smallint |
b_trigger | boolean | default true
Indexes:
"objects_pkey" primary key, btree (n_id)
Foreign-key constraints:
"$1" FOREIGN KEY (n_creator) REFERENCES people(n_id) ON UPDATE CASCADE ON DELETE CASCADE
"$2" FOREIGN KEY (n_id_photo) REFERENCES photo(n_id) ON UPDATE CASCADE ON DELETE SET NULL
"$3" FOREIGN KEY (n_board_id) REFERENCES boards(n_id) ON UPDATE CASCADE ON DELETE CASCADE

Unique (cost=3677.26..3711.48 rows=102 width=18) (actual time=12566.422..13045.404 rows=4 loops=1)
-> Sort (cost=3677.26..3684.10 rows=2738 width=18) (actual time=12566.413..12809.928 rows=158254 loops=1)
Sort Key: b.n_type, b.n_id, b.t_name, CASE WHEN (b.n_type = 0) THEN 'personal'::text WHEN (b.n_type = 1) THEN 'private'::te
xt WHEN (b.n_type = 2) THEN 'blog'::text ELSE 'public'::text END
-> Nested Loop (cost=3442.79..3520.93 rows=2738 width=18) (actual time=0.244..1052.180 rows=158254 loops=1)
-> Seq Scan on people p (cost=0.00..2.67 rows=67 width=0) (actual time=0.012..0.709 rows=67 loops=1)
-> Materialize (cost=3442.79..3443.20 rows=41 width=18) (actual time=0.005..6.528 rows=2362 loops=67)
-> Nested Loop (cost=3.27..3442.79 rows=41 width=18) (actual time=0.216..273.709 rows=2362 loops=1)
Join Filter: ((("inner".n_id = "outer".n_board_id) OR ("inner".n_type = 3) OR ("inner".n_creator = 71)) A
ND (("outer".n_creator = 71) OR ("inner".n_type = 3) OR ("inner".n_creator = 71)) AND (("outer".n_joined > 0) OR ("inner".n_type = 3
) OR ("inner".n_creator = 71)))
-> Seq Scan on objects o (cost=0.00..27.87 rows=787 width=10) (actual time=0.005..1.372 rows=787 loops=
1)
-> Materialize (cost=3.27..4.29 rows=102 width=22) (actual time=0.001..0.105 rows=102 loops=787)
-> Seq Scan on boards b (cost=0.00..3.27 rows=102 width=22) (actual time=0.017..0.352 rows=102 lo
ops=1)
Filter: (b_hidden = false)
Total runtime: 13051.065 ms

--
Rory Campbell-Lange
<rory@campbell-lange.net>
<www.campbell-lange.net>

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

--
Rory Campbell-Lange
<rory@campbell-lange.net>
<www.campbell-lange.net>

#3Nick Barr
nicky@chuckie.co.uk
In reply to: Rory Campbell-Lange (#2)
Re: Very slow query

[snip]
Rory Campbell-Lange wrote:

SELECT DISTINCT
b.n_id as id,
b.n_type,
CASE b.n_type WHEN 0 THEN 'personal'
WHEN 1 THEN 'private'
WHEN 2 THEN 'blog'
ELSE 'public'
END as type,
b.t_name as title
FROM
boards b, people p, objects o
WHERE
b.b_hidden = 'f'
AND
(
b.n_type = 3
OR
b.n_creator = 71
OR
( b.n_id = o.n_board_id
AND
o.n_creator = 71
AND
o.n_joined > 0
)
)
ORDER BY
b.n_type, b.n_id ASC, b.t_name;

You don't seem to be joining the people table to the boards or objects
table. In fact the people table is never referenced anywhere but the
FROM clause. This might be why it is not running so well. People seems
to be referenced by both boards and objects, so does there need to be a
join between all 3 tables?

[snip]

"$1" FOREIGN KEY (n_creator) REFERENCES people(n_id) ON UPDATE CASCADE ON DELETE CASCADE

[snip]

"$1" FOREIGN KEY (n_creator) REFERENCES people(n_id) ON UPDATE CASCADE ON DELETE CASCADE

It is this "missing" join which seems to cause 158254 rows to appear in
this nested loop, when PG only reckons it is going to see 2738 rows

-> Nested Loop (cost=3442.79..3520.93 rows=2738 width=18) (actual time=0.244..1052.180 rows=158254 loops=1)

I cant remember what the correct term is, but basically all rows from
the seq scan combines with all rows from the materialize, which is what
is causing the 150,000 odd rows to appear (is that called a cross
product anyone?)

i.e

67 rows from the seq x 2362 rows from the materialize = 158254 output
rows for the nested loop to chug through.

-> Seq Scan on people p (cost=0.00..2.67 rows=67 width=0) (actual time=0.012..0.709 rows=67 loops=1)
-> Materialize (cost=3442.79..3443.20 rows=41 width=18) (actual time=0.005..6.528 rows=2362 loops=67)

I reckon fix the unconstrained join and your query might run a little
faster.

-> Nested Loop (cost=3.27..3442.79 rows=41

width=18) (actual time=0.216..273.709 rows=2362 loops=1)

The estimate for this nested loop seems a little off, note the
guesstimate in the first set of brackets of 41, and compare with the
actual result in the second set of 2362. Have you vacuum analyzed recently?

HTH

Nick

#4Paul Thomas
paul@tmsl.demon.co.uk
In reply to: Rory Campbell-Lange (#2)
Re: Very slow query

On 10/05/2004 22:30 Rory Campbell-Lange wrote:

Sorry for replying to my own post, but I'm anxious for an answer. Should
I provide other information?

Look carefully at your column types. I can see several smallint columns in
there WHERE clause which are not expicitely typed as such.

-- 
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for 
Business             |
| Computer Consultants         | 
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+
#5Nick Barr
nicky@chuckie.co.uk
In reply to: Rory Campbell-Lange (#1)
Re: Very slow query

Rory Campbell-Lange wrote:

Wow, this is an education! My php profiler now shows:

0.02 db->db_board_listing C: /var/www/trial/php/db.php:175 M: 867024
0.02 ob_start C: /var/www/trial/php/db.php:238 M: 867024
0.02 pg_exec C: /var/www/trial/php/db.php:239 M: 908328
1.24 pg_result_status C: /var/www/trial/php/db.php:240 M: 908392
1.24 ob_end_clean C: /var/www/trial/php/db.php:241 M: 908416
1.24 db->result_checker C: /var/www/trial/php/db.php:243 M: 867472
1.24 is_resource C: /var/www/trial/php/db.php:1317 M: 867472
1.24 pg_numrows C: /var/www/trial/php/db.php:248 M: 867568
1.24 pg_fetch_array C: /var/www/trial/php/db.php:250 M: 867600
1.24 pg_fetch_array C: /var/www/trial/php/db.php:250 M: 867760

and now shows:

C: /var/www/trial/php/core.php:151 M: 867040
0.02 db->db_board_listing C: /var/www/trial/php/db.php:175 M: 867040
0.02 ob_start C: /var/www/trial/php/db.php:238 M: 867040
0.02 pg_exec C: /var/www/trial/php/db.php:239 M: 908344
0.14 pg_result_status C: /var/www/trial/php/db.php:240 M: 908408
0.14 ob_end_clean C: /var/www/trial/php/db.php:241 M: 908432
0.14 db->result_checker C: /var/www/trial/php/db.php:243 M: 867488
0.14 is_resource C: /var/www/trial/php/db.php:1317 M: 867488
0.14 pg_numrows C: /var/www/trial/php/db.php:248 M: 867584
0.14 pg_fetch_array C: /var/www/trial/php/db.php:250 M: 867616
0.14 pg_fetch_array C: /var/www/trial/php/db.php:250 M: 867776

This is a saving of 1.1 seconds!

Thats a bit better ;-)

On 10/05/2004 22:30 Rory Campbell-Lange wrote:

Sorry for replying to my own post, but I'm anxious for an answer. Should
I provide other information?

Look carefully at your column types. I can see several smallint columns in
there WHERE clause which are not expicitely typed as such.

I'm not sure how to do this, Paul. do I do b.n_id::smallint ? Is
smallint not implied?

Thanks
Rory

Not quite. Explicit casts are needed when you have any numbers in the
WHERE condition and the columns are not of type integer/int4. For
example I have tweaked your query.

WHERE
b.b_hidden = 'f'
AND
(
b.n_type = 3::smallint
OR
b.n_creator = 71
OR
(
b.n_id = o.n_board_id
AND
o.n_creator = 71
AND
o.n_joined > 0::smallint
)
)

Note that b.n_creator and o.n_creator do not need explicit casts because
they are both of type integer anyway. You could of course put them in
for clarity. PG only casts the numbers to integer's, and not to smallint
or bigint, which basically means it does not use any indexes on that
column. This is fixed in 7.4 I believe, which you seem to be running
anyway so you might not be affected.

HTH

Nick

#6Rory Campbell-Lange
rory@campbell-lange.net
In reply to: Nick Barr (#5)
Re: Very slow query

On 11/05/04, Nick Barr (nicky@chuckie.co.uk) wrote:

Rory Campbell-Lange wrote:

Look carefully at your column types. I can see several smallint
columns in there WHERE clause which are not expicitely typed as
such.

I'm not sure how to do this, Paul. do I do b.n_id::smallint ? Is
smallint not implied?

Not quite. Explicit casts are needed when you have any numbers in the
WHERE condition and the columns are not of type integer/int4. For
example I have tweaked your query.

...

Note that b.n_creator and o.n_creator do not need explicit casts because
they are both of type integer anyway. You could of course put them in
for clarity. PG only casts the numbers to integer's, and not to smallint
or bigint, which basically means it does not use any indexes on that
column. This is fixed in 7.4 I believe, which you seem to be running
anyway so you might not be affected.

Mmm. Seems like I should try profiling with and without the cast. Maybe
there is little value in defining a column as a smallint, other than
checking the length of input.

Thanks again for your help,
Rory
--
Rory Campbell-Lange
<rory@campbell-lange.net>
<www.campbell-lange.net>

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rory Campbell-Lange (#1)
Re: Very slow query

Rory Campbell-Lange <rory@campbell-lange.net> writes:

The following query on some small datasets takes over a second to run.
I'd be grateful for some help in understanding the explain output, and
to remake the code.

I think the problem is you have an unconstrained join against "people
p", which is wasting a bunch of time generating duplicate records
(and then sorting them, and then discarding them in the DISTINCT step...)

regards, tom lane