constraints in query plans

Started by Jeremy Drakeabout 19 years ago4 messages
#1Jeremy Drake
pgsql@jdrake.com

I set up the following experiment:

CREATE DOMAIN m_or_p AS "char" CHECK (VALUE = 'm' OR VALUE = 'p');

CREATE TABLE test_domain (
fkey integer not null,
k integer not null,
x1 integer not null,
x2 integer,
mp m_or_p not null
);

CREATE INDEX test_domain_k_x1_x2_m ON test_domain (k, x1, x2) WHERE mp = 'm';
CREATE INDEX test_domain_k_x1_x2_p ON test_domain (k, x1, x2) WHERE mp = 'p';

then added about 375000 rows, half where mp = 'm' and half where mp = 'p'

Now, I do analyze verbose test_domain
jeremyd=# ANALYZE verbose test_domain;
INFO: analyzing "public.test_domain"
INFO: "test_domain": scanned 2379 of 2379 pages, containing 375226 live
rows and 0 dead rows; 3000 rows in sample, 375226 estimated total rows
ANALYZE

Now, take a look at this
jeremyd=# explain select * from test_domain where k = 1255;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on test_domain (cost=0.00..7069.32 rows=127 width=17)
Filter: (k = 1255)
(2 rows)

I turn constraint_exclusion on and I still get the same plan. I tried
adding the domain's constraint to the table as well

ALTER TABLE test_domain ADD CHECK (mp = 'm' OR mp = 'p');

and I still get the same plan. It seems the constraint is not
incorporated into the plan, since I get a different plan if I include the
constraint in the WHERE clause explicitly:

jeremyd=# explain select * from test_domain where k = 1255 AND (mp = 'm' OR mp = 'p');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_domain (cost=9.97..423.26 rows=95 width=17)
Recheck Cond: (((k = 1255) AND ((mp)::"char" = 'm'::"char")) OR ((k = 1255) AND ((mp)::"char" = 'p'::"char")))
-> BitmapOr (cost=9.97..9.97 rows=127 width=0)
-> Bitmap Index Scan on test_domain_k_x1_x2_m (cost=0.00..4.98 rows=60 width=0)
Index Cond: (k = 1255)
-> Bitmap Index Scan on test_domain_k_x1_x2_p (cost=0.00..4.98 rows=67 width=0)
Index Cond: (k = 1255)
(7 rows)

And the explain analyze for each:

jeremyd=# explain analyze select * from test_domain where k = 1255 AND (mp = 'm' OR mp = 'p');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_domain (cost=9.97..423.26 rows=95 width=17) (actual time=0.325..2.397 rows=261 loops=1)
Recheck Cond: (((k = 1255) AND ((mp)::"char" = 'm'::"char")) OR ((k = 1255) AND ((mp)::"char" = 'p'::"char")))
-> BitmapOr (cost=9.97..9.97 rows=127 width=0) (actual time=0.269..0.269 rows=0 loops=1)
-> Bitmap Index Scan on test_domain_k_x1_x2_m (cost=0.00..4.98 rows=60 width=0) (actual time=0.150..0.150 rows=129 loops=1)
Index Cond: (k = 1255)
-> Bitmap Index Scan on test_domain_k_x1_x2_p (cost=0.00..4.98 rows=67 width=0) (actual time=0.101..0.101 rows=132 loops=1)
Index Cond: (k = 1255)
Total runtime: 3.238 ms
(8 rows)

jeremyd=# explain analyze select * from test_domain where k = 1255;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on test_domain (cost=0.00..7069.32 rows=127 width=17) (actual time=0.427..125.057 rows=261 loops=1)
Filter: (k = 1255)
Total runtime: 125.878 ms
(3 rows)

ISTM that with the constraint_exclusion flag on, it should see from the
constraints that all values but 'm' or 'p' are excluded for the column mp,
and thus the two queries I gave are exactly equivalent. I noticed that
the docs said it looked at table constraints, so I added the check to the
table constraint as well, but it made no difference. I'm not sure if this
is a bug or a limitation of the planner, but it seems that these two
queries are equivalent. I wonder how it would work out with boolean
instead of the "char" column, it should definitely know that there are
only 2 possible values for a boolean not null column, true or false.

DROP INDEX test_domain_k_x1_x2_p;
DROP INDEX test_domain_k_x1_x2_m;
ALTER TABLE test_domain DROP CONSTRAINT test_domain_mp_check;
ALTER TABLE test_domain ALTER COLUMN mp TYPE boolean USING (CASE WHEN mp = 'm' THEN false ELSE true END);
CREATE INDEX test_domain_k_x1_x2_p ON test_domain (k, x1, x2) WHERE mp;
CREATE INDEX test_domain_k_x1_x2_m ON test_domain (k, x1, x2) WHERE NOT mp;

jeremyd=# ANALYZE verbose test_domain;
INFO: analyzing "public.test_domain"
INFO: "test_domain": scanned 2379 of 2379 pages, containing 375226 live
rows and 0 dead rows; 3000 rows in sample, 375226 estimated total rows
ANALYZE

jeremyd=# explain analyze select * from test_domain where k = 1255;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on test_domain (cost=0.00..7069.32 rows=131 width=17) (actual time=0.317..103.822 rows=261 loops=1)
Filter: (k = 1255)
Total runtime: 104.631 ms
(3 rows)

jeremyd=# explain analyze select * from test_domain where k = 1255 AND (mp OR NOT mp);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_domain (cost=10.01..434.58 rows=98 width=17) (actual time=0.162..1.132 rows=261 loops=1)
Recheck Cond: (((k = 1255) AND mp) OR ((k = 1255) AND (NOT mp)))
-> BitmapOr (cost=10.01..10.01 rows=131 width=0) (actual time=0.129..0.129 rows=0 loops=1)
-> Bitmap Index Scan on test_domain_k_x1_x2_p (cost=0.00..5.01 rows=66 width=0) (actual time=0.077..0.077 rows=132 loops=1)
Index Cond: (k = 1255)
-> Bitmap Index Scan on test_domain_k_x1_x2_m (cost=0.00..5.01 rows=65 width=0) (actual time=0.042..0.042 rows=129 loops=1)
Index Cond: (k = 1255)
Total runtime: 1.947 ms
(8 rows)

Now this one looks even more strange, that adding a no-op like 'AND (mp OR
NOT mp)' to the query gives a completely different plan. I'm thinking I
should have named the column to_be ;)

--
Truth is the most valuable thing we have -- so let us economize it.
-- Mark Twain

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeremy Drake (#1)
Re: constraints in query plans

Jeremy Drake <pgsql@jdrake.com> writes:

CREATE TABLE test_domain (
fkey integer not null,
k integer not null,
x1 integer not null,
x2 integer,
mp m_or_p not null
);

CREATE INDEX test_domain_k_x1_x2_m ON test_domain (k, x1, x2) WHERE mp = 'm';
CREATE INDEX test_domain_k_x1_x2_p ON test_domain (k, x1, x2) WHERE mp = 'p';

Perhaps you need a non-partial index.

regards, tom lane

#3Jeremy Drake
pgsql@jdrake.com
In reply to: Tom Lane (#2)
Re: constraints in query plans

On Sun, 15 Oct 2006, Tom Lane wrote:

Jeremy Drake <pgsql@jdrake.com> writes:

CREATE TABLE test_domain (
fkey integer not null,
k integer not null,
x1 integer not null,
x2 integer,
mp m_or_p not null
);

CREATE INDEX test_domain_k_x1_x2_m ON test_domain (k, x1, x2) WHERE mp = 'm';
CREATE INDEX test_domain_k_x1_x2_p ON test_domain (k, x1, x2) WHERE mp = 'p';

Perhaps you need a non-partial index.

I just tried that,
CREATE INDEX test_domain_k_x1_x2_mp ON test_domain (k, x1, x2, mp);

and dropped the others. That actually works properly.
jeremyd=# explain analyze select * from test_domain where k = 1255 and mp;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_domain (cost=5.37..237.21 rows=66 width=17) (actual time=0.115..0.707 rows=132 loops=1)
Recheck Cond: (k = 1255)
Filter: mp
-> Bitmap Index Scan on test_domain_k_x1_x2_mp (cost=0.00..5.37 rows=66 width=0) (actual time=0.081..0.081 rows=132 loops=1)
Index Cond: ((k = 1255) AND (mp = true))
Total runtime: 1.137 ms
(6 rows)

I thought I had to refer to all of the columns in order for this to work,
that I could not skip some in the middle, but it seems to work.

--
If you can survive death, you can probably survive anything.

#4Jeff Davis
pgsql@j-davis.com
In reply to: Jeremy Drake (#3)
Re: constraints in query plans

On Sun, 2006-10-15 at 20:36 -0700, Jeremy Drake wrote:

I just tried that,
CREATE INDEX test_domain_k_x1_x2_mp ON test_domain (k, x1, x2, mp);

and dropped the others. That actually works properly.
jeremyd=# explain analyze select * from test_domain where k = 1255 and mp;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_domain (cost=5.37..237.21 rows=66 width=17) (actual time=0.115..0.707 rows=132 loops=1)
Recheck Cond: (k = 1255)
Filter: mp
-> Bitmap Index Scan on test_domain_k_x1_x2_mp (cost=0.00..5.37 rows=66 width=0) (actual time=0.081..0.081 rows=132 loops=1)
Index Cond: ((k = 1255) AND (mp = true))
Total runtime: 1.137 ms
(6 rows)

I thought I had to refer to all of the columns in order for this to work,
that I could not skip some in the middle, but it seems to work.

As long as "k=1255" is selective enough, the index is useful. That's
because k is the first item in the index key.

Regards,
Jeff Davis