Postgresql sql query - selecting rows outside a join

Started by Graham Leggettover 22 years ago6 messagesgeneral
Jump to latest
#1Graham Leggett
minfrin@sharp.fm

Hi all,

I am trying to do a query that returns all rows that are _not_ part of
a join, and so far I cannot seem to find a query that doesn't take 30
minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT IN (select
tableA_id from tableB)".

Is there a more efficient way of doing this?

Regards,
Graham
--

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Graham Leggett (#1)
Re: Postgresql sql query - selecting rows outside a join

Hello

On Mon, 1 Dec 2003, Graham Leggett wrote:

Show quoted text

Hi all,

I am trying to do a query that returns all rows that are _not_ part of
a join, and so far I cannot seem to find a query that doesn't take 30
minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT IN (select
tableA_id from tableB)".

Is there a more efficient way of doing this?

Regards,
Graham
--

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

#3Arjen van der Meijden
acmmailing@vulcanus.its.tudelft.nl
In reply to: Graham Leggett (#1)
Re: Postgresql sql query - selecting rows outside a join

Graham Leggett

Hi all,

I am trying to do a query that returns all rows that are
_not_ part of a join, and so far I cannot seem to find a
query that doesn't take 30 minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT
IN (select tableA_id from tableB)".

Is there a more efficient way of doing this?

Mysql's version to do something like that is, afaik:

SELECT * FROM tableA LEFT JOIN tableB USING(tableA_id) WHERE
tableB.tableA_id IS NULL;

Perhaps that makes more efficient use of your indices.
Another version is something like:
SELECT *, COUNT(tableB.*) FROM tableA LEFT JOIN tableB USING(tableA_id)
GROUP BY columns_of_tableA HAVING count(tableB.*) = 0;

And perhaps a rewrite to use EXISTS (although that is claimed to be
similar in speed or even slower as of postgres 7.4) is useful:
SELECT * FROM tableA WHERE NOT EXISTS(SELECT * FROM tableB WHERE
tableB.tableA_id = tableA.tableA_id)

There are a few others, but it all depens on your index structure and
table sizes whether they work more efficient or not.

Best regards,

Arjen

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Graham Leggett (#1)
Re: Postgresql sql query - selecting rows outside a join

Hello

If you can, use PostgreSQL version 7.4. There is some optimalisation for
this questions. You can change your query from select .. IN (select) to
select .. exists (select). More about it you can find in FAQ.

regards
Pavel

On Mon, 1 Dec 2003, Graham Leggett wrote:

Show quoted text

Hi all,

I am trying to do a query that returns all rows that are _not_ part of
a join, and so far I cannot seem to find a query that doesn't take 30
minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT IN (select
tableA_id from tableB)".

Is there a more efficient way of doing this?

Regards,
Graham
--

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

#5Toby Doig
toby@vibrantmedia.com
In reply to: Pavel Stehule (#4)
Re: Postgresql sql query - selecting rows outside a join

A suggestion is to use a left outer join and then test for rows with a
right-hand null value (ones where the join failed).

The following example creates 2 tables, t1 and t2.
t1 has related records in t2 and the relation is indicated by
t1.rel->t2.id

create table t1 (
id integer,
rel integer,
label varchar(10)
);

create table t2 (
id integer,
label varchar(10)
);

insert into t1 (id, rel, label) values (1, 1, 'bob');
insert into t1 (id, rel, label) values (2, 2, 'sam');
insert into t1 (id, rel, label) values (3, 0, 'alice');

insert into t2 (id, label) values (1, 'martin');
insert into t2 (id, label) values (2, 'gwen');

-- this shows you all the records
select t1.*, t2.* from t1 left outer join t2 on (t1.rel = t2.id);

-- this shows you those where the inner join fails (your where X NOT IN
(y) stuff)
-- the result should be where t1.id=3 because it has a t1.rel of 0
therefore no related
-- record in t2
select t1.*, t2.* from t1 left outer join t2 on (t1.rel = t2.id) where
t2.id is null

Toby
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Graham Leggett
Sent: 01 December 2003 17:16
To: pgsql-general@postgresql.org
Subject: [GENERAL] Postgresql sql query - selecting rows outside a join

Hi all,

I am trying to do a query that returns all rows that are _not_ part of
a join, and so far I cannot seem to find a query that doesn't take 30
minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT IN (select
tableA_id from tableB)".

Is there a more efficient way of doing this?

Regards,
Graham
--

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

#6Adam Ruth
aruth@intercation.com
In reply to: Graham Leggett (#1)
Re: Postgresql sql query - selecting rows outside a join

On Dec 1, 2003, at 10:15 AM, Graham Leggett wrote:

Hi all,

I am trying to do a query that returns all rows that are _not_ part of
a join, and so far I cannot seem to find a query that doesn't take 30
minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT IN (select
tableA_id from tableB)".

Is there a more efficient way of doing this?

Regards,
Graham
--

This should be more efficient:

select * from tableA where not exists (select null from tableB where
tableA_id = tableA.tableA_id);

Version 7.4 fixes the slow IN - NOT IN, but I haven't tested it myself
yet to see how it compares to this.

Adam Ruth