The most efficient way to put this?

Started by Arsalan Zaidiabout 24 years ago4 messagesgeneral
Jump to latest
#1Arsalan Zaidi
azaidi@directi.com

The query I'd like to run is fairly simple. Unfortunately, PG doesn't handle
IN's very well.

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN (select
zz.a from zz);

This is with two very large tables. Both zz and aa have many millions of
rows.

The best I've come up with so far is...

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE NOT EXISTS (SELECT
zz.a FROM zz WHERE aa.a = zz.a);

Does anyone have anything better?

--Arsalan

-------------------------------------------------------------------
People often hate those things which they do not know, or cannot understand.
--Ali Ibn Abi Talib (A.S.)

#2Marc Polatschek
Marc.Polatschek@computec.de
In reply to: Arsalan Zaidi (#1)
Re: The most efficient way to put this?

hm, you can try to make a statement with UNION and GROUP by. i dont know
if it is faster...

-----Ursprüngliche Nachricht-----
Von: Arsalan Zaidi [mailto:azaidi@directi.com]
Gesendet: Dienstag, 5. März 2002 11:12
An: pgsql-general@postgresql.org
Betreff: [GENERAL] The most efficient way to put this?

The query I'd like to run is fairly simple. Unfortunately, PG doesn't
handle
IN's very well.

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN
(select
zz.a from zz);

This is with two very large tables. Both zz and aa have many millions of
rows.

The best I've come up with so far is...

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE NOT EXISTS
(SELECT
zz.a FROM zz WHERE aa.a = zz.a);

Does anyone have anything better?

--Arsalan

-------------------------------------------------------------------
People often hate those things which they do not know, or cannot
understand.
--Ali Ibn Abi Talib (A.S.)

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#3Jean-Michel POURE
jm.poure@freesurf.fr
In reply to: Arsalan Zaidi (#1)
Re: The most efficient way to put this?

Le Mardi 5 Mars 2002 11:11, Arsalan Zaidi a écrit :

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN
(select zz.a from zz);

1) LEFT JOIN
SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c
FROM aa
LEFT JOIN zz ON zz.a = aa.a
WHERE zz.a IS NULL;

Run the query twice and VACUUM ANALYSE the database. Re-run. Is it faster
now? Make sure the required fields are indexed.

2) TRIGGER
Another solution if is to add a boolean field in aa and

- create a trigger on aa which stores the result of "NOT IN (select zz.a FROM
zz)". Then, select queries only rely on table aa, which is "lightning fast".
This is the kind of queries where PostgreSQL can be 10 times faster than
MySQL.

- create a trigger on bb which updates calculated values in aa when bb values
are updated.

The trigger solution is only possible if one of the two tables does not
change ofter (its values are not altered continuously).

Cheers,
Jean-Michel POURE

#4Noname
rolf.ostvik@axxessit.no
In reply to: Jean-Michel POURE (#3)
Re: The most efficient way to put this?

On 2002-03-05 "Arsalan Zaidi" <azaidi@directi.com> wrote

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN

(select

zz.a from zz);

Does this work?

select distinct on (aa.a) aa.a, aa.b, aa.c
from aa left join zz on aa.a = zz.a
where zz.a isnull

--
Rolf