query decorrelation in postgres

Started by mahendra chavanover 16 years ago5 messages
#1mahendra chavan
mahcha@gmail.com

Hello,

I am a master's student in computer science at IIT Bombay. As part of my
project, I need to get a decorrelated version of a SQL query. Please could
anyone let me know if we have query decorrelation feature implemented in
postgres ?

Thanks,
Mahendra Chavan

#2Itagaki Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: mahendra chavan (#1)
Re: query decorrelation in postgres

mahendra chavan <mahcha@gmail.com> wrote:

I am a master's student in computer science at IIT Bombay. As part of my
project, I need to get a decorrelated version of a SQL query. Please could
anyone let me know if we have query decorrelation feature implemented in
postgres ?

What do you mean by "query decorrelation"? Is it an addtional method for
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#3mahendra chavan
mahcha@gmail.com
In reply to: Itagaki Takahiro (#2)
Re: query decorrelation in postgres

I am sorry for not elaborating on that. What I meant by de-correlation was
optimizing a query to get rid of sub-queirs by using joins.

eg. In the TPC-H schema, a query to find out the names of suppliers who
supply parts having size < 100

*Query with nested subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)

*Query with joins without subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100

Thanks,
Mahendra

On Thu, Jul 23, 2009 at 9:02 PM, Itagaki Takahiro <
itagaki.takahiro@oss.ntt.co.jp> wrote:

Show quoted text

mahendra chavan <mahcha@gmail.com> wrote:

I am a master's student in computer science at IIT Bombay. As part of my
project, I need to get a decorrelated version of a SQL query. Please

could

anyone let me know if we have query decorrelation feature implemented in
postgres ?

What do you mean by "query decorrelation"? Is it an addtional method for
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#4Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: mahendra chavan (#3)
Re: query decorrelation in postgres

mahendra chavan <mahcha@gmail.com> wrote:

What I meant by de-correlation was optimizing a query to get rid of
sub-queirs by using joins.

eg. In the TPC-H schema, a query to find out the names of suppliers
who supply parts having size < 100

*Query with nested subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)

*Query with joins without subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100

Those two queries aren't exactly identical, because you could get
duplicate rows in the second which would not be there in the first.
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins".
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.

To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.

http://www.postgresql.org/docs/8.4/interactive/sql-explain.html

-Kevin

#5mahendra chavan
mahcha@gmail.com
In reply to: Kevin Grittner (#4)
Re: query decorrelation in postgres

Thank you for you response. I was looking for a query rewriting mechanism
which would be outside the optimizer and will do this kind of
transformations at the query level.

~Mahendra

On Fri, Jul 24, 2009 at 7:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov

Show quoted text

wrote:

mahendra chavan <mahcha@gmail.com> wrote:

What I meant by de-correlation was optimizing a query to get rid of
sub-queirs by using joins.

eg. In the TPC-H schema, a query to find out the names of suppliers
who supply parts having size < 100

*Query with nested subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)

*Query with joins without subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100

Those two queries aren't exactly identical, because you could get
duplicate rows in the second which would not be there in the first.
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins".
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.

To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.

http://www.postgresql.org/docs/8.4/interactive/sql-explain.html

-Kevin