Query too slow with "not in" condition

Started by சிவகுமார் மாover 17 years ago5 messagesgeneral
Jump to latest

Dear Friends,

I have loaded the backup from a live database in a test system. Both run
8.3.5 versions. The plan for a query varies in these systems.

Test System
A. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20061115 (prerelease) (SUSE Linux)

B. explain select * from stock_transaction_detail_106 where transaction_id
not in (select transaction_id from transaction_value);
Seq Scan on stock_transaction_detail_106 (cost=1829.78..2867.74 rows=16478
width=128)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on transaction_value (cost=0.00..1598.02 rows=92702
width=4)

The query takes about 300 ms to run.

Production System

1. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.1
(SUSE Linux)

2. explain select * from stock_transaction_detail_106 where transaction_id
not in (select transaction_id from transaction_value);
Seq Scan on stock_transaction_detail_106 (cost=2153.95..25245478.39
rows=17064 width=122)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=2153.95..3401.01 rows=92905 width=4)
-> Seq Scan on transaction_value (cost=0.00..1743.05 rows=92905
width=4)

Here the query did not return any results after 1hour.

In both the computers same query with in condition runs fast (520 ms and 290
ms respectively)

Please help me to resolve this issue. (One configuration difference between
these machines are pg_hba.conf file. In production machine it is password
enabled. In test machine it is trust mode.)

Thanks and regards,

Ma Sivakumar

மா சிவகுமார்
எல்லோரும் எல்லாமும் பெற வேண்டும்
http://masivakumar.blogspot.com

#2David Rowley
dgrowleyml@gmail.com
In reply to: சிவகுமார் மா (#1)
Re: Query too slow with "not in" condition

I have loaded the backup from a live database in a test system. Both run
8.3.5 versions. The plan for a query varies in these systems.

Test System
A. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20061115 (prerelease) (SUSE Linux)

B. explain select * from stock_transaction_detail_106 where transaction_id
not in (select transaction_id from transaction_value);
Seq Scan on stock_transaction_detail_106 (cost=1829.78..2867.74

rows=16478 width=128)

Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on transaction_value (cost=0.00..1598.02 rows=92702

width=4)

The query takes about 300 ms to run.

Production System

1. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.1

(SUSE Linux)

2. explain select * from stock_transaction_detail_106 where transaction_id

not in (select transaction_id from transaction_value);

Seq Scan on stock_transaction_detail_106 (cost=2153.95..25245478.39

rows=17064 width=122)

Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=2153.95..3401.01 rows=92905 width=4)
-> Seq Scan on transaction_value (cost=0.00..1743.05

rows=92905 width=4)

Here the query did not return any results after 1hour.

In both the computers same query with in condition runs fast (520 ms and

290 ms respectively)

Please help me to resolve this issue. (One configuration difference
between these machines are pg_hba.conf file. In production machine it is
password enabled. In test machine it is trust mode.)

You might find this page interesting:
http://www.depesz.com/index.php/2008/08/13/nulls-vs-not-in/

I assume workmem, effective_cache_size and random_page_cost are all the same
in the 2 postgresql.conf?

Do you get the same after you ANALYZE the database?

If you read the page above you might realise NOT IN is not really what you
want. Maybe NOT EXISTS or a LEFT OUTER JOIN ... WHERE transaction_id IS NULL
?

David.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: Query too slow with "not in" condition

"David Rowley" <dgrowley@gmail.com> writes:

I assume workmem, effective_cache_size and random_page_cost are all the same
in the 2 postgresql.conf?

Indeed, work_mem is probably the problem. The critical difference
between the two plans seems to be that the first one is using a
"hashed subplan" and the second one isn't. Assuming the same datatypes
in both databases, the only reason not to use a hashed subplan is if
the hashtable is estimated not to fit in work_mem.

regards, tom lane

In reply to: Tom Lane (#3)
Re: Query too slow with "not in" condition [Resolved]

Thanks a lot.

On Sun, Nov 30, 2008 at 11:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David Rowley" <dgrowley@gmail.com> writes:

I assume workmem, effective_cache_size and random_page_cost are all the

same

in the 2 postgresql.conf?

Indeed, work_mem is probably the problem. The critical difference
between the two plans seems to be that the first one is using a
"hashed subplan" and the second one isn't. Assuming the same datatypes in
both databases, the only reason not to use a hashed subplan is if the
hashtable is estimated not to fit in work_mem.

I changed work_mem in test machine to be implemented in production server
later. Completely forgot that when faced with this issue. Sorry for the
trouble.

I learnt about "hashed subplan" requiring sufficient work_mem, however.

Thanks and regards,

Ma Sivakumar

மா சிவகுமார்
எல்லோரும் எல்லாமும் பெற வேண்டும்
http://masivakumar.blogspot.com

In reply to: David Rowley (#2)
Re: Query too slow with "not in" condition

On Sun, Nov 30, 2008 at 3:28 PM, David Rowley <dgrowley@gmail.com> wrote:

You might find this page interesting:
http://www.depesz.com/index.php/2008/08/13/nulls-vs-not-in/

Thanks David. Another issue I was faced with was exactly what the link you
provided discusses.

Best regards,

Ma Sivakumar

மா சிவகுமார்
எல்லோரும் எல்லாமும் பெற வேண்டும்
http://masivakumar.blogspot.com