query decorrelation in postgres
Hello,
I am a master's student in computer science at IIT Bombay. As part of my
project, I need to get a decorrelated version of a SQL query. Please could
anyone let me know if we have query decorrelation feature implemented in
postgres ?
Thanks,
Mahendra Chavan
mahendra chavan <mahcha@gmail.com> wrote:
I am a master's student in computer science at IIT Bombay. As part of my
project, I need to get a decorrelated version of a SQL query. Please could
anyone let me know if we have query decorrelation feature implemented in
postgres ?
What do you mean by "query decorrelation"? Is it an addtional method for
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
I am sorry for not elaborating on that. What I meant by de-correlation was
optimizing a query to get rid of sub-queirs by using joins.
eg. In the TPC-H schema, a query to find out the names of suppliers who
supply parts having size < 100
*Query with nested subqueries:*
SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)
*Query with joins without subqueries:*
SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100
Thanks,
Mahendra
On Thu, Jul 23, 2009 at 9:02 PM, Itagaki Takahiro <
itagaki.takahiro@oss.ntt.co.jp> wrote:
Show quoted text
mahendra chavan <mahcha@gmail.com> wrote:
I am a master's student in computer science at IIT Bombay. As part of my
project, I need to get a decorrelated version of a SQL query. Pleasecould
anyone let me know if we have query decorrelation feature implemented in
postgres ?What do you mean by "query decorrelation"? Is it an addtional method for
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
mahendra chavan <mahcha@gmail.com> wrote:
What I meant by de-correlation was optimizing a query to get rid of
sub-queirs by using joins.eg. In the TPC-H schema, a query to find out the names of suppliers
who supply parts having size < 100*Query with nested subqueries:*
SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)*Query with joins without subqueries:*
SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100
Those two queries aren't exactly identical, because you could get
duplicate rows in the second which would not be there in the first.
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins".
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.
To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.
http://www.postgresql.org/docs/8.4/interactive/sql-explain.html
-Kevin
Thank you for you response. I was looking for a query rewriting mechanism
which would be outside the optimizer and will do this kind of
transformations at the query level.
~Mahendra
On Fri, Jul 24, 2009 at 7:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov
Show quoted text
wrote:
mahendra chavan <mahcha@gmail.com> wrote:
What I meant by de-correlation was optimizing a query to get rid of
sub-queirs by using joins.eg. In the TPC-H schema, a query to find out the names of suppliers
who supply parts having size < 100*Query with nested subqueries:*
SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)*Query with joins without subqueries:*
SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100Those two queries aren't exactly identical, because you could get
duplicate rows in the second which would not be there in the first.
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins".
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.http://www.postgresql.org/docs/8.4/interactive/sql-explain.html
-Kevin