Speeding up subselect ?

Started by Peter Albereralmost 24 years ago5 messagesgeneral
Jump to latest
#1Peter Alberer
h9351252@obelix.wu-wien.ac.at

Hi there,

i have the following query in a pl/pgsql procedure:

update lr_object_usage
set status = (case status
when ''OPEN_SUCC'' then ''CLOSED_SUCC''
when ''OPEN_FAIL'' then ''CLOSED_FAIL'' end)
where lr_object_usage_id in (select lr_object_usage_id from
lr_locked_objects where context = p_exam_usage_id);

the problem is the result of the subselect is obviously calculated for
every row of lr_object_usage (30K rows) -> the update takes very long.
Here is the explain output with p_exam_usage_id being 38191. Two simple
selects with the result of the first one being used in the second one
vs. the subselect. Divided into two selects

Can I somehow tell the planer not to requery the subselect for every row
?

Many TIA,

peter

--
------------------------------------------------------------------

openacs=# select lr_object_usage_id from lr_locked_objects where context
= 38191;
lr_object_usage_id
--------------------
38192
38193
38194
38195
(4 rows)

openacs=# explain analyze select lr_object_usage_id from
lr_locked_objects where context = 38191;
NOTICE: QUERY PLAN:

Seq Scan on lr_locked_objects (cost=0.00..1.04 rows=1 width=4) (actual
time=0.06..0.09 rows=4 loops=1)
Total runtime: 0.18 msec

EXPLAIN

openacs=# select * from lr_object_usage where lr_object_usage_id in
(38192,38193,38194,38195);
lr_object_usage_id | lr_object_id | access_time |
user_id | status | context
--------------------+--------------+-------------------------------+----
-----+--------+---------
38192 | 13496 | 2002-07-08 17:47:01.001332+02 |
2434 | | 38191
38193 | 13503 | 2002-07-08 17:47:01.001332+02 |
2434 | | 38191
38194 | 13434 | 2002-07-08 17:47:01.001332+02 |
2434 | | 38191
(3 rows)

openacs=# explain analyze select * from lr_object_usage where
lr_object_usage_id in (38192,38193,38194,38195);
NOTICE: QUERY PLAN:

Index Scan using lr_object_usage_lr_object_usage,
lr_object_usage_lr_object_usage, lr_object_usage_lr_object_usage,
lr_object_usage_lr_object_usage on lr_object_usage (cost=0.00..12.18
rows=4 width=39) (actual time=0.06..0.20 rows=3 loops=1)
Total runtime: 0.38 msec

EXPLAIN

Doing the same thing with the subselect:

openacs=# explain analyze select * from lr_object_usage where
lr_object_usage_id in (select lr_object_usage_id from lr_locked_objects
where context = 38191);
NOTICE: QUERY PLAN:

Seq Scan on lr_object_usage (cost=0.00..17860.59 rows=16514 width=39)
(actual time=2640.91..2646.47 rows=3 loops=1)
SubPlan
-> Seq Scan on lr_locked_objects (cost=0.00..1.04 rows=1 width=4)
(actual time=0.03..0.06 rows=4 loops=33080)
Total runtime: 2646.60 msec

EXPLAIN

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Peter Alberer (#1)
Re: Speeding up subselect ?

On Mon, Jul 08, 2002 at 06:10:05PM +0200, Peter Alberer wrote:

Hi there,

i have the following query in a pl/pgsql procedure:

update lr_object_usage
set status = (case status
when ''OPEN_SUCC'' then ''CLOSED_SUCC''
when ''OPEN_FAIL'' then ''CLOSED_FAIL'' end)
where lr_object_usage_id in (select lr_object_usage_id from
lr_locked_objects where context = p_exam_usage_id);

Read the FAQ. Use EXISTS rather than IN.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

There are 10 kinds of people in the world, those that can do binary
arithmetic and those that can't.

#3Darren Ferguson
darren@crystalballinc.com
In reply to: Peter Alberer (#1)
Re: Speeding up subselect ?

Is the situation possible where by you can run the subselect first at the
very start and put it into a variable.

Then put that variable in the query instead of the subselect

Darren

On Mon, 8 Jul 2002, Peter Alberer wrote:

Hi there,

i have the following query in a pl/pgsql procedure:

update lr_object_usage
set status = (case status
when ''OPEN_SUCC'' then ''CLOSED_SUCC''
when ''OPEN_FAIL'' then ''CLOSED_FAIL'' end)
where lr_object_usage_id in (select lr_object_usage_id from
lr_locked_objects where context = p_exam_usage_id);

the problem is the result of the subselect is obviously calculated for
every row of lr_object_usage (30K rows) -> the update takes very long.
Here is the explain output with p_exam_usage_id being 38191. Two simple
selects with the result of the first one being used in the second one
vs. the subselect. Divided into two selects

Can I somehow tell the planer not to requery the subselect for every row
?

Many TIA,

peter

--
------------------------------------------------------------------

openacs=# select lr_object_usage_id from lr_locked_objects where context
= 38191;
lr_object_usage_id
--------------------
38192
38193
38194
38195
(4 rows)

openacs=# explain analyze select lr_object_usage_id from
lr_locked_objects where context = 38191;
NOTICE: QUERY PLAN:

Seq Scan on lr_locked_objects (cost=0.00..1.04 rows=1 width=4) (actual
time=0.06..0.09 rows=4 loops=1)
Total runtime: 0.18 msec

EXPLAIN

openacs=# select * from lr_object_usage where lr_object_usage_id in
(38192,38193,38194,38195);
lr_object_usage_id | lr_object_id | access_time |
user_id | status | context
--------------------+--------------+-------------------------------+----
-----+--------+---------
38192 | 13496 | 2002-07-08 17:47:01.001332+02 |
2434 | | 38191
38193 | 13503 | 2002-07-08 17:47:01.001332+02 |
2434 | | 38191
38194 | 13434 | 2002-07-08 17:47:01.001332+02 |
2434 | | 38191
(3 rows)

openacs=# explain analyze select * from lr_object_usage where
lr_object_usage_id in (38192,38193,38194,38195);
NOTICE: QUERY PLAN:

Index Scan using lr_object_usage_lr_object_usage,
lr_object_usage_lr_object_usage, lr_object_usage_lr_object_usage,
lr_object_usage_lr_object_usage on lr_object_usage (cost=0.00..12.18
rows=4 width=39) (actual time=0.06..0.20 rows=3 loops=1)
Total runtime: 0.38 msec

EXPLAIN

Doing the same thing with the subselect:

openacs=# explain analyze select * from lr_object_usage where
lr_object_usage_id in (select lr_object_usage_id from lr_locked_objects
where context = 38191);
NOTICE: QUERY PLAN:

Seq Scan on lr_object_usage (cost=0.00..17860.59 rows=16514 width=39)
(actual time=2640.91..2646.47 rows=3 loops=1)
SubPlan
-> Seq Scan on lr_locked_objects (cost=0.00..1.04 rows=1 width=4)
(actual time=0.03..0.06 rows=4 loops=33080)
Total runtime: 2646.60 msec

EXPLAIN

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

--
Darren Ferguson

#4Joe Conway
mail@joeconway.com
In reply to: Peter Alberer (#1)
Re: Speeding up subselect ?

Peter Alberer wrote:

Hi there,

i have the following query in a pl/pgsql procedure:

update lr_object_usage
set status = (case status
when ''OPEN_SUCC'' then ''CLOSED_SUCC''
when ''OPEN_FAIL'' then ''CLOSED_FAIL'' end)
where lr_object_usage_id in (select lr_object_usage_id from
lr_locked_objects where context = p_exam_usage_id);

the problem is the result of the subselect is obviously calculated for
every row of lr_object_usage (30K rows) -> the update takes very long.
Here is the explain output with p_exam_usage_id being 38191. Two simple
selects with the result of the first one being used in the second one
vs. the subselect. Divided into two selects

Can I somehow tell the planer not to requery the subselect for every row

Try to recast the subselect as a FROM clause subselect. E.g. will this work?

update lr_object_usage
set status = (case status
when ''OPEN_SUCC'' then ''CLOSED_SUCC''
when ''OPEN_FAIL'' then ''CLOSED_FAIL'' end)
from (select lr_object_usage_id
from lr_locked_objects
where context = p_exam_usage_id) as t1
where lr_object_usage.lr_object_usage_id = t1.lr_object_usage_id;

HTH,

Joe

#5Manfred Koizar
mkoi-pg@aon.at
In reply to: Peter Alberer (#1)
Re: Speeding up subselect ?

On Mon, 8 Jul 2002 18:10:05 +0200, "Peter Alberer"
<h9351252@obelix.wu-wien.ac.at> wrote:

Hi there,

i have the following query in a pl/pgsql procedure:

update lr_object_usage
set status = (case status
when ''OPEN_SUCC'' then ''CLOSED_SUCC''
when ''OPEN_FAIL'' then ''CLOSED_FAIL'' end)
where lr_object_usage_id in (select lr_object_usage_id from
lr_locked_objects where context = p_exam_usage_id);

the problem is the result of the subselect is obviously calculated for
every row of lr_object_usage (30K rows) -> the update takes very long.

Peter, try

UPDATE lr_object_usage
SET status = CASE status
WHEN ''OPEN_SUCC'' THEN ''CLOSED_SUCC''
WHEN ''OPEN_FAIL'' THEN ''CLOSED_FAIL''
END
FROM lr_locked_objects o
WHERE lr_object_usage.lr_object_usage_id = o.lr_object_usage_id
AND <onetable>.context = <othertable>.p_exam_usage_id;

I didn't figure out where context and p_exam_usage_id come from.
Use at your own risk, I did not test it.

Servus
Manfred