BUG #7556 addition info

Started by Nonameover 13 years ago5 messagesbugs
Jump to latest
#1Noname
l1t@tom.com

plan of http://archives.postgresql.org/pgsql-bugs/2012-09/msg00222.php
test=# explain select max(a.info)from sli_test a where a.id not in(select b.id from sli_test2 b where b.id<50000);
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (cost=9243466274.00..9243466274.01 rows=1 width=12)
-> Seq Scan on sli_test a (cost=0.00..9243465024.00 rows=500000 width=12)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..18359.60 rows=50919 width=4)
-> Seq Scan on sli_test2 b (cost=0.00..17906.00 rows=50919 width=4)
Filter: (id < 50000)

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Noname (#1)
Re: BUG #7556 addition info

On 09/20/2012 01:14 PM, l1t@tom.com wrote:

plan of http://archives.postgresql.org/pgsql-bugs/2012-09/msg00222.php
test=# explain select max(a.info)from sli_test a where a.id not
in(select b.id from sli_test2 b where b.id<50000);
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (cost=9243466274.00..9243466274.01 rows=1 width=12)
-> Seq Scan on sli_test a (cost=0.00..9243465024.00 rows=500000
width=12)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..18359.60 rows=50919 width=4)
-> Seq Scan on sli_test2 b (cost=0.00..17906.00
rows=50919 width=4)
Filter: (id < 50000)

That takes about 500ms here, on 9.1. What did you expect? What's the
EXPLAIN ANALYZE if your results differ?

You've failed to show your NOT EXISTS query and plan for comparison, but
I'm guessing:

regress=# explain analyze select max(a.info)from sli_test a where not
exists
(select 1 from sli_test2 b where a.id=b.id and b.id<50000);
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=50256.22..50256.23 rows=1 width=12) (actual
time=671.993..671.993 rows=1 loops=1)
-> Hash Anti Join (cost=18510.11..47877.04 rows=951671 width=12)
(actual time=103.048..459.508 rows=950001 loops=1)
Hash Cond: (a.id = b.id)
-> Seq Scan on sli_test a (cost=0.00..16274.00 rows=1000000
width=16) (actual time=0.016..71.871 rows=1000000 loops=1)
-> Hash (cost=17906.00..17906.00 rows=48329 width=4) (actual
time=86.948..86.948 rows=49999 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 1758kB
-> Seq Scan on sli_test2 b (cost=0.00..17906.00
rows=48329 width=4) (actual time=0.018..76.489 rows=49999 loops=1)
Filter: (id < 50000)
Total runtime: 672.379 ms
(9 rows)

which, if so performs signficiantly worse.

I'm struggling to understand what exactly your complaint is. It's kind
of hard to work out what the intention of the query is in real-world
terms to suggest alternatives, too.

--
Craig Ringer

#3Noname
l1t@tom.com
In reply to: Craig Ringer (#2)
RE:Re: BUG #7556 addition info

I downloaded the latest version,and the cost of "not in" is much higher than that of "not exist". please see attachment for detail.
As the time of query is very long,I didn't get the explain analyze result.
I think the id columns of table a and b are not null, so the query of "not in" and "not exists" are equal,they should use similar plans.
I notice there is a "Materialize" step in my bad plans of "not in",but there isn't in your following plan. I wonder how to get the plan you posted here, are there any configure argument or command?
egress=# explain select max(a.info)from sli_test a where a.id not in(select
regress(# b.id from sli_test2 b where b.id<50000);
QUERY PLAN
---------------------------------------------------------------------------------
Aggregate (cost=38050.82..38050.83 rows=1 width=12)
-> Seq Scan on sli_test a (cost=18026.82..36800.82 rows=500000 width=12)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on sli_test2 b (cost=0.00..17906.00 rows=48329 width=4)
Filter: (id < 50000)
(6 rows)

It runs in about 500ms here.
and I got following plan on kingbase 7.1,a modified version of postgres product.
TEST=# explain select max(a.name)from a where a.id not in(select b.id from b where b.id<50000);
QUERY PLAN
--------------------------------------------------------------------------------
-----------------------
Aggregate (cost=72541.71..72541.72 rows=1 width=12)
-> Merge Anti Join (cost=0.00..70041.71 rows=1000000 width=12)
Merge Cond: (A.ID = B.ID)
-> Index Scan using A_PKEY on A (cost=0.00..32257.36 rows=1000000 width=16)
-> Index Scan using B_ID on B (cost=0.00..31117.69 rows=333333 width=4)
Index Cond: (ID < 50000)
(6 行)
时间: 20.095 ms

Attachments:

post921.txttext/plain; name="=?gb2312?B?cG9zdDkyMS50eHQ=?="Download
#4Noname
l1t@tom.com
In reply to: Noname (#3)
RE:Re: BUG #7556 addition info

My Server has 4GB mem and OS is Windows 2008 R2.
I downloaded the latest version,and the cost of "not in" is much higher than that of "not exist".Please see attachment for detail.
As the time of query is very long,I didn't get the explain analyze result.
I think the id columns of table a and b are not null, so the query of "not in" and "not exists" are equal,they should use similar plans.
I notice there is a "Materialize" step in my bad plans of "not in",but there isn't in your following plan. I wonder how to get the plan you posted here, are there any configure argument or command?
egress=# explain select max(a.info)from sli_test a where a.id not in(select
regress(# b.id from sli_test2 b where b.id<50000);
QUERY PLAN
---------------------------------------------------------------------------------
Aggregate (cost=38050.82..38050.83 rows=1 width=12)
-> Seq Scan on sli_test a (cost=18026.82..36800.82 rows=500000 width=12)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Seq Scan on sli_test2 b (cost=0.00..17906.00 rows=48329 width=4)
Filter: (id < 50000)
(6 rows)

It runs in about 500ms here.
and I got following plan on kingbase 7.1,a modified version of postgres product, no "Materialize"step either.
TEST=# explain select max(a.name)from a where a.id not in(select b.id from b where b.id<50000);
QUERY PLAN
--------------------------------------------------------------------------------
-----------------------
Aggregate (cost=72541.71..72541.72 rows=1 width=12)
-> Merge Anti Join (cost=0.00..70041.71 rows=1000000 width=12)
Merge Cond: (A.ID = B.ID)
-> Index Scan using A_PKEY on A (cost=0.00..32257.36 rows=1000000 width=16)
-> Index Scan using B_ID on B (cost=0.00..31117.69 rows=333333 width=4)
Index Cond: (ID < 50000)
(6 行)
时间: 20.095 ms

Attachments:

post921.txttext/plain; name="=?gb2312?B?cG9zdDkyMS50eHQ=?="Download
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#3)
Re: RE:Re: BUG #7556 addition info

l1t@tom.com writes:

I think the id columns of table a and b are not null, so the query of "not in" and "not exists" are equal,they should use similar plans.

NOT IN and NOT EXISTS are *not* equivalent. Per SQL standard, NOT IN
has different (and usually not very desirable) behavior with NULL
values. The spec-mandated behavior makes it a lot harder to optimize
nicely --- and since you usually don't want what it does anyway, we
haven't tried hard.

regards, tom lane