pgsql: Add better handling of redundant IS [NOT] NULL quals

Started by David Rowleyabout 2 years ago4 messageshackers

dgrowleyml@gmail.com

about 2 years ago

Add better handling of redundant IS [NOT] NULL quals

Until now PostgreSQL has not been very smart about optimizing away IS
NOT NULL base quals on columns defined as NOT NULL. The evaluation of
these needless quals adds overhead. Ordinarily, anyone who came
complaining about that would likely just have been told to not include
the qual in their query if it's not required. However, a recent bug
report indicates this might not always be possible.

Bug 17540 highlighted that when we optimize Min/Max aggregates the IS NOT
NULL qual that the planner adds to make the rewritten plan ignore NULLs
can cause issues with poor index choice. That particular case
demonstrated that other quals, especially ones where no statistics are
available to allow the planner a chance at estimating an approximate
selectivity for can result in poor index choice due to cheap startup paths
being prefered with LIMIT 1.

Here we take generic approach to fixing this by having the planner check
for NOT NULL columns and just have the planner remove these quals (when
they're not needed) for all queries, not just when optimizing Min/Max
aggregates.

Additionally, here we also detect IS NULL quals on a NOT NULL column and
transform that into a gating qual so that we don't have to perform the
scan at all. This also works for join relations when the Var is not
nullable by any outer join.

This also helps with the self-join removal work as it must replace
strict join quals with IS NOT NULL quals to ensure equivalence with the
original query.

Author: David Rowley, Richard Guo, Andy Fan
Reviewed-by: Richard Guo, David Rowley
Discussion: /messages/by-id/CAApHDvqg6XZDhYRPz0zgOcevSMo0d3vxA9DvHrZtKfqO30WTnw@mail.gmail.com
Discussion: /messages/by-id/17540-7aa1855ad5ec18b4@postgresql.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/b262ad440edecda0b1aba81d967ab560a83acb8a

Modified Files
--------------
contrib/postgres_fdw/expected/postgres_fdw.out | 16 +-
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +-
src/backend/optimizer/plan/initsplan.c | 197 +++++++++++++++++++-
src/backend/optimizer/util/joininfo.c | 28 +++
src/backend/optimizer/util/plancat.c | 19 ++
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/pathnodes.h | 7 +-
src/include/optimizer/planmain.h | 4 +
src/test/regress/expected/equivclass.out | 18 +-
src/test/regress/expected/join.out | 67 ++++---
src/test/regress/expected/predicate.out | 244 +++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/predicate.sql | 122 +++++++++++++
13 files changed, 664 insertions(+), 67 deletions(-)

Alvaro Herrera

alvherre@2ndquadrant.com

about 2 years ago

In reply to: David Rowley (#1)

Re: pgsql: Add better handling of redundant IS [NOT] NULL quals

On 2024-Jan-23, David Rowley wrote:

Add better handling of redundant IS [NOT] NULL quals

Until now PostgreSQL has not been very smart about optimizing away IS
NOT NULL base quals on columns defined as NOT NULL.

Hmm, what happens if a NOT NULL constraint is dropped and you have such
a plan in plancache? As I recall, lack of a mechanism to invalidate
such plans was the main reason for Postgres not to have this. One of
the motivations for adding catalogued NOT NULL constraints was precisely
to have an OID that you could use to cause plancache to invalidate such
a plan. Does this new code add something like that?

Admittedly I didn't read the threads or the patch, just skimmed for some
clues, so I may have failed to notice it. But in the tests you added I
don't see any ALTER TABLE DROP CONSTRAINT.

(Similarly, allowing GROUP BY to ignore columns not in the GROUP BY,
when a UNIQUE constraint exists and all columns are NOT NULL; currently
we allow that for PRIMARY KEY, but if you have the NOT NULL constraint
OIDs to cue the plan invalidation would let that case to be implemented
as well.)

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Por suerte hoy explotó el califont porque si no me habría muerto
de aburrido" (Papelucho)

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Alvaro Herrera (#2)

Re: pgsql: Add better handling of redundant IS [NOT] NULL quals

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2024-Jan-23, David Rowley wrote:

Until now PostgreSQL has not been very smart about optimizing away IS
NOT NULL base quals on columns defined as NOT NULL.

Hmm, what happens if a NOT NULL constraint is dropped and you have such
a plan in plancache? As I recall, lack of a mechanism to invalidate
such plans was the main reason for Postgres not to have this.

IIRC, we realized that that concern was bogus. Removal of such
constraints would cause pg_attribute.attnotnull to change, leading
to a relcache invalidation on the table, forcing replan. If anyone
tried to get rid of attnotnull or make it incompletely reliable,
then we'd have problems; but AFAIK that's not being contemplated.

regards, tom lane

David Rowley

dgrowleyml@gmail.com

about 2 years ago

In reply to: Alvaro Herrera (#2)

Re: pgsql: Add better handling of redundant IS [NOT] NULL quals

On Wed, 24 Jan 2024 at 08:15, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

(Similarly, allowing GROUP BY to ignore columns not in the GROUP BY,
when a UNIQUE constraint exists and all columns are NOT NULL; currently
we allow that for PRIMARY KEY, but if you have the NOT NULL constraint
OIDs to cue the plan invalidation would let that case to be implemented
as well.)

I recall some discussion about the GROUP BY case. I think at the time
there might have been some confusion with plan cache invalidation and
invalidating views that have been created with columns in the target
list which are functionally dependent on columns in the GROUP BY.

i.e, given:

create table ab (a int primary key, b int not null unique);

the following works:

create view v_ab1 as select a,b from ab group by a; -- works

but this one does not:

create view v_ab2 as select a,b from ab group by b; -- does not work
ERROR: column "ab.a" must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: create view v_ab2 as select a,b from ab group by b;

I think thanks to your work on adding pg_constraint records for NOT
NULL conditions, the latter case could now be made to work.

As for the plan optimisation, I agree with Tom about the relcache
invalidation triggering a replan. Maybe it's worth adding a test to
ensure the replan is done after a ALTER TABLE ... DROP NOT NULL,
however.

David