BUG #16586: deduplicate_items=true can be configured for numeric indexes

#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16586
Logged by: Matthias van de Meent
Email address: matthias.vandemeent@cofano.nl
PostgreSQL version: 13beta3
Operating system: Debian Stretch (9.13)
Description:

CREATE INDEX numerical_index ON table USING btree ((num::numeric))
    WITH (deduplicate_items=true);
CREATE INDEX

\d+ numerical_index

                Index "public.numerical_index"
 Column |  Type   | Key? | Definition | Storage | Stats target
--------+---------+------+------------+---------+--------------
 num    | numeric | yes  | num        | main    |
btree, for table "public.table"
Options: deduplicate_items=true

There is no error for specifying the "deduplicate_items" flag. As
deduplication is not supported for indexes with numeric type, I expected the
index creation statement to error.

#2 Peter Geoghegan
pg@bowt.ie
In reply to: PG Bug reporting form (#1)
Re: BUG #16586: deduplicate_items=true can be configured for numeric indexes

On Thu, Aug 20, 2020 at 4:52 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> There is no error for specifying the "deduplicate_items" flag. As
> deduplication is not supported for indexes with numeric type, I expected the
> index creation statement to error.

I don't think that there should be an error. While the
"equalimage"-ness of an operator class (such as btree/numeric_ops) is
in theory static, in practice it could change in either direction. For
example, it's possible (though very unlikely) that somebody will make
the mistake of marking an operator class as equalimage/dedup safe when
they shouldn't have. If this actually happens, a REINDEX shouldn't
raise errors with the same spelling of REINDEX that worked the first
time (e.g. when restoring a dump).
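
To see which B-tree operator families are currently marked "equalimage", the system catalogs can be queried directly. The following is a minimal sketch, assuming PostgreSQL 13 or later, where the equalimage routine is registered as B-tree support function number 4 in pg_amproc. The column aliases are illustrative only.

```sql
-- List B-tree operator families that register support function 4
-- (the "equalimage" routine), together with the input type it is
-- registered for. Families that do not appear here are not treated
-- as deduplication safe.
SELECT opf.opfname,
       amp.amproclefttype::regtype AS input_type,
       amp.amproc                  AS equalimage_proc
FROM pg_amproc AS amp
JOIN pg_opfamily AS opf ON opf.oid = amp.amprocfamily
JOIN pg_am AS am ON am.oid = opf.opfmethod
WHERE am.amname = 'btree'
  AND amp.amprocnum = 4
ORDER BY opf.opfname, input_type;
```

On a stock installation, numeric_ops would not be expected in the result, which matches the behavior described in this thread. Because the marking lives in a catalog row, it can in principle be added or removed, which is the "could change in either direction" case.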

The deduplicate_items storage parameter is kind of an advisory thing.
Deduplication is always applied selectively in unique indexes, even
though it might be slightly better to do so consistently with some
workloads. Also, it's possible that we'll find a way to make some of
the operator classes (though not btree/numeric_ops) deduplication safe
in the future. For example, we could teach container types to report
their "equalimage"-ness by invoking the underlying support function of
contained types. So you could use deduplication with a composite type,
provided it didn't contain unsafe scalar types like numeric.

In general I don't expect that users will consciously think about
deduplication very often -- it's supposed to have very little overhead
in cases that don't benefit, so it will probably fade into the
background even in installations where it provides a lot of benefit. I
don't expect many users will want to make sure that it's enabled in
one index but definitely not enabled in another.

With all of that said, it would be nice if I could raise a NOTICE or
even a WARNING here if and only if the user spelled out
"deduplicate_items = on". Hard to see how to do that with the current
design of reloptions, though, unless it's okay to show it even when
"deduplicate_items = on" was not specifically provided (I don't think
that it's okay). An index access method (such as nbtree) can tell
whether or not all storage params should come from the defaults by
checking if the rel's rd_options is NULL or not, but that's not the
same thing -- it'll be set when fillfactor was explicitly set, for
example.
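
The same distinction is visible at the SQL level in pg_class.reloptions, which is the stored form of the storage parameters. This is only a catalog-level analogue of the rd_options check described above, not the C code itself, and the table and index names are hypothetical.

```sql
-- Hypothetical objects, for illustration only.
CREATE TABLE dedup_demo (num numeric);

-- No storage parameters at all: reloptions is expected to be NULL.
CREATE INDEX dedup_demo_default ON dedup_demo (num);

-- Only fillfactor is set: reloptions is non-NULL even though
-- deduplicate_items was never mentioned.
CREATE INDEX dedup_demo_ff ON dedup_demo (num) WITH (fillfactor = 80);

-- deduplicate_items is spelled out explicitly.
CREATE INDEX dedup_demo_explicit ON dedup_demo (num)
    WITH (deduplicate_items = on);

SELECT relname, reloptions
FROM pg_class
WHERE relname IN ('dedup_demo_default',
                  'dedup_demo_ff',
                  'dedup_demo_explicit')
ORDER BY relname;
```

A "reloptions IS NOT NULL" test therefore cannot distinguish the second index from the third, which is why a NULL check alone is not enough to decide whether the user asked for deduplication explicitly.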

--
Peter Geoghegan

#3 Matthias van de Meent
matthias.vandemeent@cofano.nl
In reply to: Peter Geoghegan (#2)
Re: BUG #16586: deduplicate_items=true can be configured for numeric indexes

On Sat, 22 Aug 2020 at 00:49, Peter Geoghegan <pg@bowt.ie> wrote:

> On Thu, Aug 20, 2020 at 4:52 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > There is no error for specifying the "deduplicate_items" flag. As
> > deduplication is not supported for indexes with numeric type, I expected the
> > index creation statement to error.
>
> I don't think that there should be an error. While the
> "equalimage"-ness of an operator class (such as btree/numeric_ops) is
> in theory static, in practice it could change in either direction. For
> example, it's possible (though very unlikely) that somebody will make
> the mistake of marking an operator class as equalimage/dedup safe when
> they shouldn't have. If this actually happens, a REINDEX shouldn't
> raise errors with the same spelling of REINDEX that worked the first
> time (e.g. when restoring a dump).
>
> The deduplicate_items storage parameter is kind of an advisory thing.

The current documentation is quite unclear about that, as the flag
itself is documented as "Controls usage of the B-tree deduplication
technique described in Section 63.4.2.". A note "Even when configured,
the feature will not be used if it does not pass the limitations as
described in section 63.4.2" would help in preventing confusion.
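
Until the documentation says so, one way to check whether deduplication can actually be used by a given index is to inspect its metapage. This is a minimal sketch, assuming the pageinspect extension is available (it normally requires superuser privileges) and a PostgreSQL 13 or later server, where bt_metap reports an allequalimage column. The index name is the one from the original report.

```sql
-- pageinspect ships with PostgreSQL's contrib modules.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- allequalimage records whether every key column of the index uses an
-- operator class that is considered deduplication safe. When it is
-- false, deduplicate_items has no effect for this index.
SELECT allequalimage
FROM bt_metap('numerical_index');
```

For the numeric index in this report, allequalimage would be expected to be false, even though \d+ shows deduplicate_items=true under Options.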

> Deduplication is always applied selectively in unique indexes, even
> though it might be slightly better to do so consistently with some
> workloads. Also, it's possible that we'll find a way to make some of
> the operator classes (though not btree/numeric_ops) deduplication safe
> in the future. For example, we could teach container types to report
> their "equalimage"-ness by invoking the underlying support function of
> contained types. So you could use deduplication with a composite type,
> provided it didn't contain unsafe scalar types like numeric.
>
> In general I don't expect that users will consciously think about
> deduplication very often -- it's supposed to have very little overhead
> in cases that don't benefit, so it will probably fade into the
> background even in installations where it provides a lot of benefit. I
> don't expect many users will want to make sure that it's enabled in
> one index but definitely not enabled in another.
>
> With all of that said, it would be nice if I could raise a NOTICE or
> even a WARNING here if and only if the user spelled out
> "deduplicate_items = on". Hard to see how to do that with the current
> design of reloptions, though, unless it's okay to show it even when
> "deduplicate_items = on" was not specifically provided (I don't think
> that it's okay). An index access method (such as nbtree) can tell
> whether or not all storage params should come from the defaults by
> checking if the rel's rd_options is NULL or not, but that's not the
> same thing -- it'll be set when fillfactor was explicitly set, for
> example.

Thanks for the reply, it was very insightful.

- Matthias


> --
> Peter Geoghegan