OR vs UNION vs UNION ALL
Hi,
I've reworked a bit an application to make 1 query (with many OR) instead
of thousands (2900 in this bench),
but was a bit disappointed with the perf. Here follow a quick bench and 2
questions at the end:
Here is my table
CREATE TABLE properties
(
item_id text NOT NULL,
calendar_id uuid NOT NULL,
clef text,
valeur text,
recurrence_id bigint,
CONSTRAINT fk_props_id FOREIGN KEY (item_id, calendar_id, recurrence_id)
REFERENCES cal_item (itemid, calendar_id, recurrence_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)
Here are 3 versions of the "same" query, there is ~2900 conditions (OR,
UNION, UNION ALL), returning ~9000 lines (same exact number each time)
(i've ran the tests multiple time, using postgresql 9.3.5 on rhel 6, on an
idle physical server)
1)
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'1263d4ac-d8a3-44d7-b971-5eee947ff36a')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) OR
(ITEM_ID=((E'9a1e40bb-0280-4e79-ae13-ee57d614fdef')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) OR
.....
total time: 7.6s
plan:
Index Cond,
Bitmap index scan on fk_props_id,
BitmapOr,
Recheck Cond,
Bitmap Heap Scan on properties
2)
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'1263d4ac-d8a3-44d7-b971-5eee947ff36a')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'9a1e40bb-0280-4e79-ae13-ee57d614fdef')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION
...
total time: 1s
plan:
Index Cond,
Index Scan using fki_props_id on properties properties_XXX,
Append,
HashAggregate
3)
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'1263d4ac-d8a3-44d7-b971-5eee947ff36a')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION ALL
SELECT clef,valeur,ITEM_ID,CALENDAR_ID,RECURRENCE_ID FROM PROPERTIES WHERE
(ITEM_ID=((E'9a1e40bb-0280-4e79-ae13-ee57d614fdef')::text) AND
CALENDAR_ID=((E'f9532857-ca68-497b-b2a9-162507110e55')::uuid) AND
RECURRENCE_ID=((0)::int8)) UNION ALL
...
total time: 1.2s
plan:
Index Cond,
Index Scan using fki_props_id on properties properties_XXX,
Append
Two questions:
1) Is it normal to have such a big difference between OR and UNION and
should i always prefer UNION when possible?
2) How can UNION ALL be slower than UNION, it's not doing the HashAggregate
but is 0.2s slower?
Thanks in advance
Etienne
On Wed, Sep 23, 2015 at 8:38 AM, Etienne Champetier
<champetier.etienne@gmail.com> wrote:
Two questions:
1) Is it normal to have such a big difference between OR and UNION and
should i always prefer UNION when possible?
yes, it's unfortunate but true. I consider automatic transformation
of these expressions to possible enhancements to the planner.
Although the fact it hasn't already been done hints at high difficulty
or complicated trade-offs.
Note, these queries are not the same. OR/UNION ALL should give the
same result regardless of input but UNION has a deduplication step
that gives a different answer. Avoid pure UNION unless it's very
specifically what you want; it can act as an optimization fence in
more complex queries.
2) How can UNION ALL be slower than UNION, it's not doing the HashAggregate
but is 0.2s slower?
likely measurement noise. take 10 samples and take a median.
merlin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general