Re: [HACKERS] Bug on complex subselect (was: Bug on complex join)

Started by Oleg Broytmannabout 27 years ago9 messageshackers
Jump to latest
#1Oleg Broytmann
phd@sun.med.ru

Hello!

I rewrote my 4-tables join to use subselects:

SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(SELECT DISTINCT pos_id
FROM central
WHERE shop_id IN
(SELECT shop_id FROM shops
WHERE distr_id IN
(SELECT distr_id FROM districts
WHERE city_id = 2)
)
)
;

This does not work, either - postgres loops forever, until I cancel
psql.

I splitted it - I ran

(SELECT DISTINCT pos_id
FROM central
WHERE shop_id IN
(SELECT shop_id FROM shops
WHERE distr_id IN
(SELECT distr_id FROM districts
WHERE city_id = 2)
)
)

and stored result in a file. Then I substituted the subselect with the
file:

SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41, 42, 44, 45,
46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64)

and got desired result within a second.

This finally solves my problem, but I need to pass a long way to find
that postgres cannot handle such not too complex joins and subselects.

Oleg.
----
Oleg Broytmann http://members.xoom.com/phd2/ phd2@earthling.net
Programmers don't die, they just GOSUB without RETURN.

#2Jackson, DeJuan
djackson@cpsgroup.com
In reply to: Oleg Broytmann (#1)
RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

Hello!

I rewrote my 4-tables join to use subselects:

SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(SELECT DISTINCT pos_id
FROM central
WHERE shop_id IN
(SELECT shop_id FROM shops
WHERE distr_id IN
(SELECT distr_id FROM districts
WHERE city_id = 2)
)
)
;

This does not work, either - postgres loops forever, until I cancel
psql.

I splitted it - I ran

(SELECT DISTINCT pos_id
FROM central
WHERE shop_id IN
(SELECT shop_id FROM shops
WHERE distr_id IN
(SELECT distr_id FROM districts
WHERE city_id = 2)
)
)

and stored result in a file. Then I substituted the
subselect with the
file:

SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
42, 44, 45,
46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64)

and got desired result within a second.

This finally solves my problem, but I need to pass a long
way to find
that postgres cannot handle such not too complex joins and subselects.

If you think about the query long enough you'll realize that several
things have to be assummed for that query to be efficient.
Looking at you final query first, and assuming that you have an index on
positions(pos_id):
SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
60, 61, 62, 63, 64)

This turns into 36 OR clauses in the backend; which as you have
expressed is not a problem for PostgreSQL. But as soon as you make that
IN clause a subselect Postgres can't assume that the results will be
static for each row that it's evaluating, therefore you have:
X rows in positions compared to Y rows from the subselect = X*Y
compares
Let's assume that there are only 10 rows in positions, and that each
select of central return 36 rows. We get up to 360 comparisons for that
query, and none of the compares is likely to use the index because they
are OR'ed together.
Now let's throw in the other subselect and assume that each table only
has 10 rows besides for central which obviously has to have more so
we'll assume 40.
10 rows from district
indexed on city_id (*YAY*) (btree index (maybe 2 compares)
-------------------------
2 rows results
OR= 10 rows from shops + 2) * 10
-------------------------
5 rows results
OR= 40 rows from central + 5) * 40
-------------------------
36 rows results
OR= 10 rows from position + 36) * 10)
------------------------- ------------------
5 rows results 18360 comparisons for this query
And you have to remember that only the innermost subselect is likely to
use an index. (My math could be wrong but,) I think you get the idea.

Try your query this way:
SELECT DISTINCT subsec_id
FROM positions p
WHERE EXISTS(SELECT 1
FROM central c, shops s, districts d
WHERE p.pos_id = c.pos_id AND
c.shop_id = s.shop_id AND
s.distr_id = d.distr_id AND
d.city_id = 2);
Make sure you have indexes on pos_id, shop_id, distr_id, and city_id.

#3Thomas Reinke
reinke@e-softinc.com
In reply to: Jackson, DeJuan (#2)

Forgive what may be a dumb question. When I subscribed to this group,
instructions for unsubscribing where as indicated in the excerpt below:

--

Welcome to the pgsql-hackers mailing list!

Please save this message for future reference. Thank you.

If you ever want to remove yourself from this mailing list,
you can send mail to <Majordomo@hub.org> with the following
command in the body of your email message:

unsubscribe pgsql-hackers

or from another account, besides reinke@e-softinc.com:

unsubscribe pgsql-hackers reinke@e-softinc.com

This doesn't work. The response was

unsubscribe psql-hackers

**** unsubscribe: unknown list 'psql-hackers'.
**** Help for Majordomo@hub.org:

So I read the instructions on the PostGresQL page
for the hackers mailing list, and it says

"To subscribe or unsubscribe from the list, send mail to
pgsql-hackers-request@postgresql.org. The body of the message
should
contain the single line "subscribe" or "unsubscribe".

When I do this, I get

----- The following addresses had permanent fatal errors -----
<psql-hackers-request@postgresql.org>

----- Transcript of session follows -----
... while talking to postgresql.org.:

RCPT To:<psql-hackers-request@postgresql.org>

<<< 550 <psql-hackers-request@postgresql.org>... User unknown
550 <psql-hackers-request@postgresql.org>... User unknown

Not to be ungrateful or anything, but I would like to get myself off
this
list. (I don't have time to filter through 50-100 messages/day).

Any suggestions on how I get myself off the list? (Perhaps it is
worthwhile
either updating the instructions on your website, or including an
_up_to_date_
copy in the signatures being sent out on the list?)

Thomas

#4Jackson, DeJuan
djackson@cpsgroup.com
In reply to: Thomas Reinke (#3)
RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

The problem is that you keep leaving out the 'g' in pgsql
Take a loser look at the instructions.
-DEJ

Show quoted text

-----Original Message-----
From: Thomas Reinke [mailto:reinke@e-softinc.com]
Sent: Wednesday, March 10, 1999 3:10 PM
To: hackers@postgreSQL.org
Subject: Re: [HACKERS] Bug on complex subselect (was: Bug on complex
join)

Forgive what may be a dumb question. When I subscribed to this group,
instructions for unsubscribing where as indicated in the
excerpt below:

--

Welcome to the pgsql-hackers mailing list!

Please save this message for future reference. Thank you.

If you ever want to remove yourself from this mailing list,
you can send mail to <Majordomo@hub.org> with the following
command in the body of your email message:

unsubscribe pgsql-hackers

or from another account, besides reinke@e-softinc.com:

unsubscribe pgsql-hackers reinke@e-softinc.com

This doesn't work. The response was

unsubscribe psql-hackers

**** unsubscribe: unknown list 'psql-hackers'.
**** Help for Majordomo@hub.org:

So I read the instructions on the PostGresQL page
for the hackers mailing list, and it says

"To subscribe or unsubscribe from the list, send mail to
pgsql-hackers-request@postgresql.org. The body of the message
should
contain the single line "subscribe" or "unsubscribe".

When I do this, I get

----- The following addresses had permanent fatal errors -----
<psql-hackers-request@postgresql.org>

----- Transcript of session follows -----
... while talking to postgresql.org.:

RCPT To:<psql-hackers-request@postgresql.org>

<<< 550 <psql-hackers-request@postgresql.org>... User unknown
550 <psql-hackers-request@postgresql.org>... User unknown

Not to be ungrateful or anything, but I would like to get myself off
this
list. (I don't have time to filter through 50-100 messages/day).

Any suggestions on how I get myself off the list? (Perhaps it is
worthwhile
either updating the instructions on your website, or including an
_up_to_date_
copy in the signatures being sent out on the list?)

Thomas

#5Igor Sysoev
igor@nitek.ru
In reply to: Jackson, DeJuan (#4)

Oleg Broytmann <phd@sun.med.ru> wrote:

I rewrote my 4-tables join to use subselects:

SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(SELECT DISTINCT pos_id
FROM central
WHERE shop_id IN
(SELECT shop_id FROM shops
WHERE distr_id IN
(SELECT distr_id FROM districts
WHERE city_id = 2)
)
)
;

This does not work, either - postgres loops forever, until I cancel
psql.

Yes, it's very ancient bug I knew it from time when subselects fisrt
appeared.

This finally solves my problem, but I need to pass a long way to find
that postgres cannot handle such not too complex joins and subselects.

Postgres cannot quick handle even simpler subselect on small enough
base (~ 1000 records) and when subselect return only one value.
Executing query like "SELECT ... WHERE ... IN ( SELECT ..."
Postgres eats memory and takes too long time too complete.
When it eats to many memory FreeBSD killed it.
The single way to resolve it is to rewrite subselect using EXISTS.

With best regards,
Igor Sysoev
http://www.nitek.ru/~igor/

#6Igor Sysoev
maillist@nitek.ru
In reply to: Igor Sysoev (#5)

Oleg Broytmann <phd@sun.med.ru> wrote:

I rewrote my 4-tables join to use subselects:

SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(SELECT DISTINCT pos_id
FROM central
WHERE shop_id IN
(SELECT shop_id FROM shops
WHERE distr_id IN
(SELECT distr_id FROM districts
WHERE city_id = 2)
)
)
;

This does not work, either - postgres loops forever, until I cancel
psql.

Yes, it's very ancient bug I knew it from time when subselects fisrt
appeared.

This finally solves my problem, but I need to pass a long way to find
that postgres cannot handle such not too complex joins and subselects.

Postgres cannot quick handle even simpler subselect on small enough
base (~ 1000 records) and when subselect return only one value.
Executing query like "SELECT ... WHERE ... IN ( SELECT ..."
Postgres eats memory and takes too long time too complete.
When it eats to many memory FreeBSD killed it.
The single way to resolve it is to rewrite subselect using EXISTS.

With best regards,
Igor Sysoev
http://www.nitek.ru/~igor/

#7Oleg Broytmann
phd@sun.med.ru
In reply to: Jackson, DeJuan (#2)
RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

Hello!

Vadim already gave the idea to use EXISTS. Will try it.
Thanks to all who replied!

On Wed, 10 Mar 1999, Jackson, DeJuan wrote:

Try your query this way:
SELECT DISTINCT subsec_id
FROM positions p
WHERE EXISTS(SELECT 1
FROM central c, shops s, districts d
WHERE p.pos_id = c.pos_id AND
c.shop_id = s.shop_id AND
s.distr_id = d.distr_id AND
d.city_id = 2);

Make sure you have indexes on pos_id, shop_id, distr_id, and city_id.

All these are primary keys in corresponding tables, and hence have
UNIQUE indicies. Is it enough?

Oleg.
----
Oleg Broytmann http://members.xoom.com/phd2/ phd2@earthling.net
Programmers don't die, they just GOSUB without RETURN.

#8Jackson, DeJuan
djackson@cpsgroup.com
In reply to: Oleg Broytmann (#7)
RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

Hello!

Vadim already gave the idea to use EXISTS. Will try it.
Thanks to all who replied!

On Wed, 10 Mar 1999, Jackson, DeJuan wrote:

Try your query this way:
SELECT DISTINCT subsec_id
FROM positions p
WHERE EXISTS(SELECT 1
FROM central c, shops s, districts d
WHERE p.pos_id = c.pos_id AND
c.shop_id = s.shop_id AND
s.distr_id = d.distr_id AND
d.city_id = 2);

Make sure you have indexes on pos_id, shop_id, distr_id,

and city_id.

All these are primary keys in corresponding tables, and hence have
UNIQUE indicies. Is it enough?

Oleg.

You should have indexes on both the primary and the referenced table.
(i.e. positions.pos_id and central.pos_id) It gives PostgreSQL more
options on which join methods to use while still having an index to
reference.

-DEJ

#9Oleg Broytmann
phd@sun.med.ru
In reply to: Jackson, DeJuan (#8)
RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

Hi!

On Thu, 11 Mar 1999, Jackson, DeJuan wrote:

Make sure you have indexes on pos_id, shop_id, distr_id,

and city_id.

All these are primary keys in corresponding tables, and hence have
UNIQUE indicies. Is it enough?

Oleg.

You should have indexes on both the primary and the referenced table.
(i.e. positions.pos_id and central.pos_id) It gives PostgreSQL more
options on which join methods to use while still having an index to
reference.

Understand.
Thank you.

-DEJ

Oleg.
----
Oleg Broytmann http://members.xoom.com/phd2/ phd2@earthling.net
Programmers don't die, they just GOSUB without RETURN.