Identifying optimizer usage of indexed expressions
Hello,
I have a Postgres database where many similar expression-based indexes have
been added by a script. It is quite likely that some of the indexes are not
actually used, and I would like to remove these unused indexes.
I can identify indexes that are used directly during query execution using
pg_stat_user_indexes. However, it is possible that the optimizer might be
using the statistics on the indexed expressions to guide query planning. If
that is the case then I might remove an index which appears to be unused,
only to then find that this results in the optimizer choosing poorer plans
as it no longer has the additional statistics on the indexed expressions.
How can I identify indexes which are used by the optimizer when planning?
Are there any relevant internal counters similar to pg_stat_user_indexes?
Or an extension which provides this information?
If there is nothing available inside the database, I was thinking I would
have to parse all of the queries executed on the database, extract the
expressions, work out which columns of which tables are referenced by those
expressions, and then check whether those expressions are indexed. Or
(since that seems like a lot of work), I could remove all indexes which are
reported as unused by pg_stat_user_indexes, but for each and every one of
them create extended statistics on the relevant expression(s) using CREATE
STATISTICS. That might result in a lot of unnecessary expression statistics
- are there any significant costs associated with that? The docs say
"providing benefits similar to an expression index without the overhead of
index maintenance" [1]https://www.postgresql.org/docs/14/sql-createstatistics.html, and don't mention any downside.
Lastly, please could someone point me at the part of the Postgres source
code that handles index expressions in the optimizer?
Thank you
Tim
[1]: https://www.postgresql.org/docs/14/sql-createstatistics.html
On Mon, 2024-02-19 at 23:08 +0000, Tim Palmer wrote:
I can identify indexes that are used directly during query execution using
pg_stat_user_indexes. However, it is possible that the optimizer might be
using the statistics on the indexed expressions to guide query planning.
If that is the case then I might remove an index which appears to be unused,
only to then find that this results in the optimizer choosing poorer plans
as it no longer has the additional statistics on the indexed expressions.How can I identify indexes which are used by the optimizer when planning?
I don't think you can.
If there is nothing available inside the database, I was thinking I would
have to parse all of the queries executed on the database, extract the
expressions, work out which columns of which tables are referenced by
those expressions, and then check whether those expressions are indexed.
Well, in an ideal world, you should know which indexes you created for
which query. But then indexes might be used in ways you didn't intend.
Or (since that seems like a lot of work), I could remove all indexes which
are reported as unused by pg_stat_user_indexes, but for each and every one
of them create extended statistics on the relevant expression(s) using
CREATE STATISTICS. That might result in a lot of unnecessary expression
statistics - are there any significant costs associated with that? The
docs say "providing benefits similar to an expression index without the
overhead of index maintenance" [1], and don't mention any downside.
The small downside is that ANALYZE will have more work, the big downside
is that query planning on tables with advanced statistics takes slightly
longer. That may or may not be a problem for you, but if you could pay
the price of the extra indexes, you will certainly be able to pay the
price of advanced statistics.
You could also consider the option to drop indexes and see if somebody
complains, or watch out for changes in pg_stat_statements.
Yours,
Laurenz Albe