Any optimizations to the join code in 7.1?

Started by Mike Mascarialmost 25 years ago11 messageshackers
Jump to latest
#1Mike Mascari
mascarm@mascari.com

Hello.

I have a particular query which performs a 15-way join; I believe in
normalization ;-). Under 7.0.3, using the defaults where GEQO is
enabled after 11, the query (which returns 1 row) takes 10 seconds.
With GEQO turned off, it takes 18 seconds. Naturally I intend to
upgrade as soon as possible, but I looked through the change log and
didn't see anything specific WRT large joins. I was wondering if any
work had been done in that area for 7.1. I realize you can only
squeeze so much blood from stone, but....

Thanks for any info,

Mike Mascari
mascarm@mascari.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mike Mascari (#1)
Re: Any optimizations to the join code in 7.1?

Mike Mascari <mascarm@mascari.com> writes:

I have a particular query which performs a 15-way join;

You should read
http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html

regards, tom lane

#3Joel Burton
jburton@scw.org
In reply to: Tom Lane (#2)
Re: Any optimizations to the join code in 7.1?

On Wed, 25 Apr 2001, Tom Lane wrote:

Mike Mascari <mascarm@mascari.com> writes:

I have a particular query which performs a 15-way join;

You should read
http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html

I was recently poring over this page myself, as I've been working w/some
larger-than-usual queries.

Two questions:

1) it appears (from my tests) that SELECT * FROM

CREATE VIEW joined as
SELECT p.id,
p.pname,
c.cname
FROM p
LEFT OUTER JOIN c using (id)

gives the same answer as SELECT * FROM

CREATE VIEW nested
SELECT p.id,
p.pname,
(select c.cname from c where c.id = p.id)
FROM p

However, I often am writing VIEWs that will be used by developers
in a front-end system. Usually, this view might have 30 items in the
select clause, but the developer using it is likely to only as for
four or five items. In this case, I often prefer the
subquery form because it appears that

SELECT id, pname FROM joined

is more complicated than

SELECT id, pname FROM nested

as the first has to perform the join, and the second doesn't.

Is this actually correct?

2) The explicit-joins help suggests that manual structuring and
experimentation might help -- has anyone written (or could
anyone write) anthing about where to start in guessing what
join order might be optimal?

--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Burton (#3)
Re: Any optimizations to the join code in 7.1?

Joel Burton <jburton@scw.org> writes:

1) it appears (from my tests) that SELECT * FROM

CREATE VIEW joined as
SELECT p.id,
p.pname,
c.cname
FROM p
LEFT OUTER JOIN c using (id)

gives the same answer as SELECT * FROM

CREATE VIEW nested
SELECT p.id,
p.pname,
(select c.cname from c where c.id = p.id)
FROM p

Only if c.id is a unique column (ie, there are always 0 or 1 matches in
c for any given p.id). Otherwise the subselect form will fail.

However, I often am writing VIEWs that will be used by developers
in a front-end system. Usually, this view might have 30 items in the
select clause, but the developer using it is likely to only as for
four or five items. In this case, I often prefer the
subquery form because it appears that
SELECT id, pname FROM joined
is more complicated than
SELECT id, pname FROM nested
as the first has to perform the join, and the second doesn't.

Is this actually correct?

This approach is probably reasonable if the cname field of the view
result is seldom wanted at all, and never used as a WHERE constraint.
You'd get a very nonoptimal plan if someone did

select * from nested where cname like 'foo%'

since the planner has no way to use the LIKE constraint to limit the
rows fetched from p. In the JOIN format, on the other hand, I think
the constraint could be exploited.

Also bear in mind that the subselect form is essentially forcing the
join to be done via a nested loop. If you have an index on c.id then
this may not be too bad, but without one the performance will be
horrid. Even with an index, nested loop with inner indexscan is not
the join method of choice if you are retrieving a lot of rows.

2) The explicit-joins help suggests that manual structuring and
experimentation might help -- has anyone written (or could
anyone write) anthing about where to start in guessing what
join order might be optimal?

The obvious starting point is the plan produced by the planner from an
unconstrained query. Even if you don't feel like trying to improve it,
you could cut the time to reproduce the plan quite a bit --- just CROSS
JOIN a few of the relation pairs that are joined first in the
unconstrained plan.

regards, tom lane

#5Joel Burton
jburton@scw.org
In reply to: Tom Lane (#4)
Re: Any optimizations to the join code in 7.1?

On Wed, 25 Apr 2001, Tom Lane wrote:

2) The explicit-joins help suggests that manual structuring and
experimentation might help -- has anyone written (or could
anyone write) anthing about where to start in guessing what
join order might be optimal?

The obvious starting point is the plan produced by the planner from an
unconstrained query. Even if you don't feel like trying to improve it,
you could cut the time to reproduce the plan quite a bit --- just CROSS
JOIN a few of the relation pairs that are joined first in the
unconstrained plan.

In other words, let it do the work, and steal the credit for
ourselves. :-)

Thanks, Tom. I appreciate your answers to my questions.

In other DB systems I've used, some find that for this original query:

SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo';

that this version

SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo';

has slower performance than

SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo';

because it can reduce b before any join.

Is it safe to assume that this is a valid optimization in PostgreSQL?

If this whole thing were a view, except w/o the WHERE clause, and we were
querying the view w/the b.name WHERE clause, would we still see a
performance boost from the right arrangement? (ie, does our criteria get
pushed down early enough in the joining process?)

TIA,
--
Joel Burton <jburton@scw.org>
Director of Information Systems, Support Center of Washington

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Burton (#5)
Re: Any optimizations to the join code in 7.1?

Joel Burton <jburton@scw.org> writes:

In other DB systems I've used, some find that for this original query:
SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo';
that this version
SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo';
has slower performance than
SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo';
because it can reduce b before any join.

Is it safe to assume that this is a valid optimization in PostgreSQL?

In general, that'd be a waste of time --- our planner considers the same
set of plans in either case.

However, it could make a difference if the planner thinks that the two
choices (a outer or b outer) have exactly the same cost. In that case
the order you wrote them in will influence which plan actually gets
picked; and if the planner's estimate is wrong --- ie, there really is a
considerable difference in the costs --- then you could see a change in
performance depending on which way you wrote it. That's a pretty
unusual circumstance, maybe, but it just happens that I'm in the middle
of looking at a planning bug wherein exactly this behavior occurs...

If this whole thing were a view, except w/o the WHERE clause, and we were
querying the view w/the b.name WHERE clause, would we still see a
performance boost from the right arrangement? (ie, does our criteria get
pushed down early enough in the joining process?)

Shouldn't make a difference; AFAIK the WHERE clause will get pushed down
as far as possible, independently of whether a view is involved or you
wrote it out the hard way.

regards, tom lane

#7Mike Mascari
mascarm@mascari.com
In reply to: Tom Lane (#6)
RE: Any optimizations to the join code in 7.1?

Sorry for the delay in the response. It took be a day to get
everything upgraded to 7.1. To restate the problem - in 7.0 with
GEQO enabled, a 15-way join took 10 seconds. With GEQO disabled it
took 18 seconds. 7.1 out of the box took only 2 seconds! I was amazed
and shocked at this damned impressive improvement in planning
speed....until I actually used the explicit JOIN syntax described in
11.2. Instanteous results! Instantaneous.....

Thanks a bunch,
(still in shock)

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From: Tom Lane [SMTP:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 25, 2001 12:42 PM
To: mascarm@mascari.com
Cc: 'pgsql-hackers@postgresql.org'
Subject: Re: [HACKERS] Any optimizations to the join code in 7.1?

Mike Mascari <mascarm@mascari.com> writes:

I have a particular query which performs a 15-way join;

You should read
http://www.postgresql.org/devel-corner/docs/postgres/explicit-join
s.html

regards, tom lane

#8Bruce Momjian
bruce@momjian.us
In reply to: Mike Mascari (#7)
Re: Any optimizations to the join code in 7.1?

You can thank Tom Lane for most/all of our optimization improvements.

Sorry for the delay in the response. It took be a day to get
everything upgraded to 7.1. To restate the problem - in 7.0 with
GEQO enabled, a 15-way join took 10 seconds. With GEQO disabled it
took 18 seconds. 7.1 out of the box took only 2 seconds! I was amazed
and shocked at this damned impressive improvement in planning
speed....until I actually used the explicit JOIN syntax described in
11.2. Instanteous results! Instantaneous.....

Thanks a bunch,
(still in shock)

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From: Tom Lane [SMTP:tgl@sss.pgh.pa.us]
Sent: Wednesday, April 25, 2001 12:42 PM
To: mascarm@mascari.com
Cc: 'pgsql-hackers@postgresql.org'
Subject: Re: [HACKERS] Any optimizations to the join code in 7.1?

Mike Mascari <mascarm@mascari.com> writes:

I have a particular query which performs a 15-way join;

You should read
http://www.postgresql.org/devel-corner/docs/postgres/explicit-join
s.html

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#9Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Mike Mascari (#7)
Re: Any optimizations to the join code in 7.1?

... 7.1 out of the box took only 2 seconds! I was amazed
and shocked at this damned impressive improvement in planning
speed....until I actually used the explicit JOIN syntax described in
11.2. Instanteous results! Instantaneous.....

But it is possible, under many circumstances, for query optimization to
be a benefit for a many-table query. The docs indicate that explicit
join syntax bypasses that, even for inner joins, so you may find that
this syntax is a net loss in performance depending on the query and your
choice of table order.

Presumably we will be interested in making these two forms of inner join
equivalent in behavior in a future release. Tom, what are the
impediments we might encounter in doing this?

- Thomas

#10Mike Mascari
mascarm@mascari.com
In reply to: Thomas Lockhart (#9)
RE: Re: Any optimizations to the join code in 7.1?

What would be nice, and I don't know how it would be done or what the
syntax would be, would be a feature that allows PostgreSQL to skip
not only the parsing stage, but the planning stage as well. Then,
when the data has changed dramatically enough to warrant it, as you
point out, a command can be issued to 'refresh' the query plan. My
15-way join has expanded to a 19-way join and is still instantaneous,
albeit on a very small set of data. Before 7.1, the query would
simply have taken far too long, and I would have had to denormalize
the database for performance purposes. With the explicit join syntax,
it allows me to design the database 'the right way'. I basically used
EXPLAIN SELECT... to determine the explicit join order, so as the
data changes, its something I'll have to do on occassion to ensure
good performance, but at least its now possible. :-)

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From: Thomas Lockhart [SMTP:lockhart@alumni.caltech.edu]
Sent: Friday, April 27, 2001 9:49 PM
To: mascarm@mascari.com; 'Tom Lane'
Cc: 'pgsql-hackers@postgresql.org'
Subject: [HACKERS] Re: Any optimizations to the join code in 7.1?

... 7.1 out of the box took only 2 seconds! I was amazed
and shocked at this damned impressive improvement in planning
speed....until I actually used the explicit JOIN syntax described

in

11.2. Instanteous results! Instantaneous.....

But it is possible, under many circumstances, for query optimization
to
be a benefit for a many-table query. The docs indicate that explicit
join syntax bypasses that, even for inner joins, so you may find that
this syntax is a net loss in performance depending on the query and
your
choice of table order.

Presumably we will be interested in making these two forms of inner
join
equivalent in behavior in a future release. Tom, what are the
impediments we might encounter in doing this?

- Thomas

---------------------------(end of broadcast)---------------------
------
TIP 4: Don't 'kill -9' the postmaster

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#9)
Re: Re: Any optimizations to the join code in 7.1?

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

But it is possible, under many circumstances, for query optimization to
be a benefit for a many-table query. The docs indicate that explicit
join syntax bypasses that, even for inner joins, so you may find that
this syntax is a net loss in performance depending on the query and your
choice of table order.

Presumably we will be interested in making these two forms of inner join
equivalent in behavior in a future release. Tom, what are the
impediments we might encounter in doing this?

I don't think there are any real technical problems in the way; it's
simply an implementation choice not to treat INNER JOIN the same as an
implicit join list. I did it that way in 7.1 mainly as a flyer, to see
how many people would think it's a feature vs. how many think it's a
bug. The votes aren't all in yet, but here we have Mike apparently
pretty pleased with it, while I recall at least one other person who
was not happy with the 7.1 behavior.

regards, tom lane