pgsql: Allow pushdown of WHERE quals into subqueries with window functi
Allow pushdown of WHERE quals into subqueries with window functions.
We can allow this even without any specific knowledge of the semantics
of the window function, so long as pushed-down quals will either accept
every row in a given window partition, or reject every such row. Because
window functions act only within a partition, such a case can't result
in changing the window functions' outputs for any surviving row.
Eliminating entire partitions in this way obviously can reduce the cost
of the window-function computations substantially.
The fly in the ointment is that it's hard to be entirely sure whether
this is true for an arbitrary qual condition. This patch allows pushdown
if (a) the qual references only partitioning columns, and (b) the qual
contains no volatile functions. We are at risk of incorrect results if
the qual can produce different answers for values that the partitioning
equality operator sees as equal. While it's not hard to invent cases
for which that can happen, it seems to seldom be a problem in practice,
since no one has complained about a similar assumption that we've had
for many years with respect to DISTINCT. The potential performance
gains seem to be worth the risk.
David Rowley, reviewed by Vik Fearing; some credit is due also to
Thomas Mayer who did considerable preliminary investigation.
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/d222585a9f7a18f2d793785c82be4c877b90c461
Modified Files
--------------
src/backend/optimizer/path/allpaths.c | 79 ++++++++++++++++++++++++++++-----
src/test/regress/expected/window.out | 41 +++++++++++++++++
src/test/regress/sql/window.sql | 20 +++++++++
3 files changed, 128 insertions(+), 12 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On Sat, Jun 28, 2014 at 06:08:05AM +0000, Tom Lane wrote:
Allow pushdown of WHERE quals into subqueries with window functions.
We can allow this even without any specific knowledge of the semantics
of the window function, so long as pushed-down quals will either accept
every row in a given window partition, or reject every such row. Because
window functions act only within a partition, such a case can't result
in changing the window functions' outputs for any surviving row.
Eliminating entire partitions in this way obviously can reduce the cost
of the window-function computations substantially.The fly in the ointment is that it's hard to be entirely sure whether
this is true for an arbitrary qual condition. This patch allows pushdown
if (a) the qual references only partitioning columns, and (b) the qual
contains no volatile functions. We are at risk of incorrect results if
the qual can produce different answers for values that the partitioning
equality operator sees as equal. While it's not hard to invent cases
for which that can happen, it seems to seldom be a problem in practice,
since no one has complained about a similar assumption that we've had
for many years with respect to DISTINCT. The potential performance
gains seem to be worth the risk.
SQL leaves room for our DISTINCT optimization but not for this. Where
pushdown of a qual beneath DISTINCT changes query results, it does so by
changing which one of various equal, distinguishable values survives to
represent a group. SQL says:
If the <set quantifier> DISTINCT is specified, and the most specific type
of a result column is character string, datetime with time zone or a
user-defined type, then the precise values retained in that column after
eliminating redundant duplicates is implementation-dependent.
PostgreSQL conforms, making qual pushdown one of the factors determining which
value to retain. This influences queries having a pushed-down qual that
distinguishes equal values. An optimization fence changes the result:
SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x) OFFSET 0) t0 WHERE length(x::text) <> 4;
x
-----
1.1
(1 row)
SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x)) t0 WHERE length(x::text) <> 4;
x
-----
1.0
1.1
(2 rows)
Type "timetamptz" is unaffected in PostgreSQL. Character strings are affected
less than SQL contemplates, because PostgreSQL strings use binary equality. I
don't know why SQL omits "numeric" from the list of affected types.
Here is the corresponding scenario for window functions:
SELECT x, y FROM (
SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y FROM
(VALUES (1.00),(1.0)) t1(x)
OFFSET 0
) t0 WHERE length(x::text) = 4;
x | y
------+-----
1.00 | 1.0
(1 row)
SELECT x, y FROM (
SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y FROM
(VALUES (1.00),(1.0)) t1(x)
) t0 WHERE length(x::text) = 4;
x | y
------+------
1.00 | 1.00
(1 row)
Only the first result is correct. SQL does not give us the wiggle room it
offered for DISTINCT. The conformance sacrifice is perhaps worthwhile
nonetheless. I didn't see this highlighted in the original discussion, so I
wished to ensure this compromise is known explicitly.
If the system catalogs recorded whether each equality operator is also an
identity operator, we could restrict the optimization to types where the
equality operator of the partitioning opclass is so marked. That eliminates
the conformance problem. Such an annotation would facilitate other planner
advances; for example, predtest.c could attempt to refute f(x) given x = k.
Thanks,
nm
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers