pgsql: Apply a band-aid fix for the problem that 8.2 and up completely

Started by Tom Laneover 18 years ago3 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Log Message:
-----------
Apply a band-aid fix for the problem that 8.2 and up completely misestimate
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all. 8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one. A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have. There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea. This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.

Tags:
----
REL8_2_STABLE

Modified Files:
--------------
pgsql/src/backend/optimizer/path:
clausesel.c (r1.82 -> r1.82.2.1)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/clausesel.c?r1=1.82&r2=1.82.2.1)
pgsql/src/backend/utils/adt:
selfuncs.c (r1.214.2.5 -> r1.214.2.6)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.214.2.5&r2=1.214.2.6)
pgsql/src/include/utils:
selfuncs.h (r1.36 -> r1.36.2.1)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/selfuncs.h?r1=1.36&r2=1.36.2.1)

#2Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: pgsql: Apply a band-aid fix for the problem that 8.2 and up completely

"Tom Lane" <tgl@postgresql.org> writes:

Log Message:
-----------
Apply a band-aid fix for the problem that 8.2 and up completely misestimate
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;

I'm a little wary of backpatching planner logic changes like this and another
instance in the past.

Even if the new logic is better in 99% of cases people with existing systems
will have already dealt with the 99% of cases where the existing releases come
up with poor plans. It seems the only people this patch will affect are the 1%
for whom the old planner works fine and upgrading breaks their application.

I'm not sure I would advocate rolling this particular commit back though,
especially since there's already at least one other planner change iirc.
Perhaps just noting the planner changes specifically in the release
announcement as possible compatibility gotchas is what's needed.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: pgsql: Apply a band-aid fix for the problem that 8.2 and up completely

Gregory Stark <stark@enterprisedb.com> writes:

"Tom Lane" <tgl@postgresql.org> writes:

Log Message:
-----------
Apply a band-aid fix for the problem that 8.2 and up completely misestimate
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;

I'm a little wary of backpatching planner logic changes like this and another
instance in the past.

I probably wouldn't have even made this patch if 8.2's behavior weren't
so completely broken on an important class of query. For a significant
number of people, this is a bug fix.

In any case, there have been planner changes much larger than this
committed into the 8.2 branch since release --- both the outer join
rearrangement logic and choose_bitmap_and have needed significant
surgery.

regards, tom lane