star join optimization

Started by Rudyarabout 14 years ago7 messages
#1Rudyar
rudyar.cortes@gmail.com

Hello,

the hybrid hash join algorithm implemented in the current version of
PostgreSQL has any kind of optimization
for star join queries for Data Warehouse model?

Regards.

#2Robert Haas
robertmhaas@gmail.com
In reply to: Rudyar (#1)
Re: star join optimization

On Mon, Nov 14, 2011 at 8:25 AM, Rudyar <rudyar.cortes@gmail.com> wrote:

the hybrid hash join algorithm implemented in the current version of
PostgreSQL has any kind of optimization
for star join queries for Data Warehouse model?

Not really. As much as possible, we try to make the query optimizer a
general-purpose tool that can handle any query you happen to throw at
it, rather than putting in special-purpose hacks to cater to specific
types of queries. I'm not aware of anything in particular that we
could do to better optimize the star-join case than what we do for any
other query.

Now, one thing that was discussed a year or two ago was the
possibility of considering join algorithms that can handle more than
two tables at a time. Currently, we don't do that, so a four-way join
will be implemented either by joining two tables, then the other two
tables, and then the results of those; or more commonly by joining two
tables, joining the results to a third table, and then joining those
results to the final table. Due to the pipelined nature of our
executor, this works pretty well, but it's possible that there are
better algorithms out there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Rudyar
rudyar.cortes@gmail.com
In reply to: Robert Haas (#2)
Re: star join optimization

On 14/11/11 12:20, Robert Haas wrote:

On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com> wrote:

the hybrid hash join algorithm implemented in the current version of
PostgreSQL has any kind of optimization
for star join queries for Data Warehouse model?

Not really. As much as possible, we try to make the query optimizer a
general-purpose tool that can handle any query you happen to throw at
it, rather than putting in special-purpose hacks to cater to specific
types of queries. I'm not aware of anything in particular that we
could do to better optimize the star-join case than what we do for any
other query.

Now, one thing that was discussed a year or two ago was the
possibility of considering join algorithms that can handle more than
two tables at a time. Currently, we don't do that, so a four-way join
will be implemented either by joining two tables, then the other two
tables, and then the results of those; or more commonly by joining two
tables, joining the results to a third table, and then joining those
results to the final table. Due to the pipelined nature of our
executor, this works pretty well, but it's possible that there are
better algorithms out there.

Thanks Robert,

I'm a new programmer in postgreSQL source code and I working in my tesis
project about that optimizations to HHJ algorithm.
I think so is very useful that optimizer recognize one star join and
apply this optimizations..
For example, SQL Server and Oracle databases implements star join query
optimizations for OLAP queries in DW.

How can contribute with my tesis project to postreSQL source code?

Regards.

--

Rudyar Cort�s.
Estudiante de Ingenier�a Civil Inform�tica
Universidad T�cnica Federico Santa Mar�a.

#4Robert Haas
robertmhaas@gmail.com
In reply to: Rudyar (#3)
Re: star join optimization

On Mon, Nov 14, 2011 at 10:36 AM, Rudyar <rudyar.cortes@gmail.com> wrote:

On 14/11/11 12:20, Robert Haas wrote:

On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com>  wrote:

the hybrid hash join algorithm implemented in the current version of
PostgreSQL has any kind of optimization
for star join queries for Data Warehouse model?

Not really.  As much as possible, we try to make the query optimizer a
general-purpose tool that can handle any query you happen to throw at
it, rather than putting in special-purpose hacks to cater to specific
types of queries.  I'm not aware of anything in particular that we
could do to better optimize the star-join case than what we do for any
other query.

Now, one thing that was discussed a year or two ago was the
possibility of considering join algorithms that can handle more than
two tables at a time.  Currently, we don't do that, so a four-way join
will be implemented either by joining two tables, then the other two
tables, and then the results of those; or more commonly by joining two
tables, joining the results to a third table, and then joining those
results to the final table.  Due to the pipelined nature of our
executor, this works pretty well, but it's possible that there are
better algorithms out there.

Thanks Robert,

I'm a new programmer in postgreSQL source code and I working in my tesis
project about that optimizations to HHJ algorithm.
I think so is very useful that optimizer recognize one star join and apply
this optimizations..
For example, SQL Server and Oracle databases implements star join query
optimizations for OLAP queries in DW.

How can contribute with my tesis project to postreSQL source code?

A good example might be to show us some of the specific cases that you
think can be improved. Perhaps with a script to set up the test data,
and EXPLAIN ANALYZE output from the queries involved, and a
description of where you see an opportunity for improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Rudyar
rudyar.cortes@gmail.com
In reply to: Robert Haas (#4)
Re: star join optimization

On 14/11/11 12:37, Robert Haas wrote:

On Mon, Nov 14, 2011 at 10:36 AM, Rudyar<rudyar.cortes@gmail.com> wrote:

On 14/11/11 12:20, Robert Haas wrote:

On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com> wrote:

the hybrid hash join algorithm implemented in the current version of
PostgreSQL has any kind of optimization
for star join queries for Data Warehouse model?

Not really. As much as possible, we try to make the query optimizer a
general-purpose tool that can handle any query you happen to throw at
it, rather than putting in special-purpose hacks to cater to specific
types of queries. I'm not aware of anything in particular that we
could do to better optimize the star-join case than what we do for any
other query.

Now, one thing that was discussed a year or two ago was the
possibility of considering join algorithms that can handle more than
two tables at a time. Currently, we don't do that, so a four-way join
will be implemented either by joining two tables, then the other two
tables, and then the results of those; or more commonly by joining two
tables, joining the results to a third table, and then joining those
results to the final table. Due to the pipelined nature of our
executor, this works pretty well, but it's possible that there are
better algorithms out there.

Thanks Robert,

I'm a new programmer in postgreSQL source code and I working in my tesis
project about that optimizations to HHJ algorithm.
I think so is very useful that optimizer recognize one star join and apply
this optimizations..
For example, SQL Server and Oracle databases implements star join query
optimizations for OLAP queries in DW.

How can contribute with my tesis project to postreSQL source code?

A good example might be to show us some of the specific cases that you
think can be improved. Perhaps with a script to set up the test data,
and EXPLAIN ANALYZE output from the queries involved, and a
description of where you see an opportunity for improvement.

Ok, I'm working in that project. I will send you my results
and comparision with SQL server HHJ optimization in one or two months.

Regards

--
Rudyar Cort�s.
Estudiante de Ingenier�a Civil Inform�tica
Universidad T�cnica Federico Santa Mar�a.

#6Greg Smith
greg@2ndQuadrant.com
In reply to: Rudyar (#5)
Re: star join optimization

On 11/14/2011 10:44 AM, Rudyar wrote:

Ok, I'm working in that project. I will send you my results
and comparision with SQL server HHJ optimization in one or two months.

Please be careful not to share here details of how features like this
are built in any commercial databases you evaluate. Some of those
implementations use patented design approaches that should be avoided in
an open source project. Oracle, Microsoft, and DB2 are all aggressive
about patenting the innovative parts of their database server code.

In addition to not wanting to accidentally incorporate such a design,
it's better for the PostgreSQL project to not be aware of what patents
in this area exist too. We don't even want a survey of patents in this
area published here because there are increased penalties for willful
patent infringement. See http://en.wikipedia.org/wiki/Treble_damages
for example.

What this project likes best are innovative approaches from recent
academic research that haven't been incorporated in any commercial
products yet. A good example is how the Serializable Snapshot Isolation
technique developed by Cahill and others was added to PostgreSQL 9.1:
http://wiki.postgresql.org/wiki/Serializable There was less concern
over accidentally duplicating a patented approach because that technique
wasn't in any of the commercial databases yet.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#7Rudyar
rudyar.cortes@gmail.com
In reply to: Greg Smith (#6)
Re: star join optimization

On 14/11/11 13:09, Greg Smith wrote:

On 11/14/2011 10:44 AM, Rudyar wrote:

Ok, I'm working in that project. I will send you my results
and comparision with SQL server HHJ optimization in one or two months.

Please be careful not to share here details of how features like this
are built in any commercial databases you evaluate. Some of those
implementations use patented design approaches that should be avoided
in an open source project. Oracle, Microsoft, and DB2 are all
aggressive about patenting the innovative parts of their database
server code.

In addition to not wanting to accidentally incorporate such a design,
it's better for the PostgreSQL project to not be aware of what patents
in this area exist too. We don't even want a survey of patents in
this area published here because there are increased penalties for
willful patent infringement. See
http://en.wikipedia.org/wiki/Treble_damages for example.

What this project likes best are innovative approaches from recent
academic research that haven't been incorporated in any commercial
products yet. A good example is how the Serializable Snapshot
Isolation technique developed by Cahill and others was added to
PostgreSQL 9.1: http://wiki.postgresql.org/wiki/Serializable There
was less concern over accidentally duplicating a patented approach
because that technique wasn't in any of the commercial databases yet.

Greg,

Ok. I will consider your recommendations.

Best Regards.

--
Rudyar Cort�s.
Estudiante de Ingenier�a Civil Inform�tica
Universidad T�cnica Federico Santa Mar�a.