Add important info about ANALYZE after create Functional Index

Started by Fabrízio de Royes Melloabout 5 years ago40 messages

fabriziomello@gmail.com

about 5 years ago

Hi all,

As you all already know Postgres supports functions in index expressions
(marked as immutable ofc) and for this special index the ANALYZE command
creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently
(using the new REINDEX .. CONCURRENTLY or the old manner creating new one
and then swapping) we need to run ANALYZE to update statistics but we don't
mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after
rebuilding concurrently a functional index and they followed the
recommendation we have into our documentation [1]https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499 about how to rebuild it
concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1]: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

David G. Johnston

david.g.johnston@gmail.com

about 5 years ago

In reply to: Fabrízio de Royes Mello (#1)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
fabriziomello@gmail.com> wrote:

Hi all,

As you all already know Postgres supports functions in index expressions
(marked as immutable ofc) and for this special index the ANALYZE command
creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently
(using the new REINDEX .. CONCURRENTLY or the old manner creating new one
and then swapping) we need to run ANALYZE to update statistics but we don't
mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after
rebuilding concurrently a functional index and they followed the
recommendation we have into our documentation [1] about how to rebuild it
concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1]
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

David J.

Nikolay Samokhvalov

samokhvalov@gmail.com

about 5 years ago

In reply to: David G. Johnston (#2)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 3:46 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

+1 to consider it as a major problem of CREATE INDEX [CONCURRENTLY] for
indexes on expressions, it's very easy to forget what I've observed many
times.

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

And another question: how ANALYZE needs to be run? If it's under the
user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

Thanks,
Nik

David G. Johnston

david.g.johnston@gmail.com

about 5 years ago

In reply to: Nikolay Samokhvalov (#3)

Re: Add important info about ANALYZE after create Functional Index

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

Independent concern.

And another question: how ANALYZE needs to be run? If it's under the
user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is
what to specify for “table_and_columns”.

David J.

Nikolay Samokhvalov

samokhvalov@gmail.com

about 5 years ago

In reply to: David G. Johnston (#4)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

Although, this triggers a question – should ANALYZE be automated in, say,
pg_restore as well?

Independent concern.

It's the same class of issues – after we created some objects, we lack
statistics and willing to automate its collection. If the approach is
automated in one case, it should be automated in the others, for
consistency.

And another question: how ANALYZE needs to be run? If it's under the

user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is
what to specify for “table_and_columns”.

I'm not sure I follow.

Thanks,
Nik

Nikolay Samokhvalov

samokhvalov@gmail.com

about 5 years ago

In reply to: Fabrízio de Royes Mello (#1)

2 attachment(s)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <
fabriziomello@gmail.com> wrote:

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Attaching the patches for the docs, one for 11 and older, and another for
12+ (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).

I still think that automating is the right thing to do but of course, it's
a much bigger topic that a quick fix dor the docs.

Attachments:

func-index-analyze.pre12.patchapplication/x-patch; name=func-index-analyze.pre12.patchDownload

From 7e846e6864b48be51bb0afee455f853debe10cb9 Mon Sep 17 00:00:00 2001
From: Nikolay Samokhvalov <samokhvalov@gmail.com>
Date: Tue, 27 Oct 2020 06:26:50 +0000
Subject: [PATCH 1/2] Rebuilding indexes on expressions requires ANALYZE

It is critically important to run ANALYZE before dropping the old index.
This is only relevant to Postgres version up to 11 because which do not have
REINDEX CONCURRENTLY.
---
 doc/src/sgml/maintenance.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 3b649575e9..c430207deb 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -895,7 +895,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
    option can instead be recreated that way. If that is successful and the
    resulting index is valid, the original index can then be replaced by
    the newly built one using a combination of <xref linkend="sql-alterindex"/>
-   and <xref linkend="sql-dropindex"/>. When an index is used to enforce
+   and <xref linkend="sql-dropindex"/>. In the case of rebuilding an index on
+   an expression, it is important to run <literal>ANALYZE</literal> on the
+   table before dropping the original one. When an index is used to enforce
    uniqueness or other constraints, <xref linkend="sql-altertable"/> might
    be necessary to swap the existing constraint with one enforced by
    the new index. Review this alternate multistep rebuild approach
-- 
GitLab


From 58fef5bca6c38e5c6d6ff23896c5f7130631b244 Mon Sep 17 00:00:00 2001
From: Nikolay Samokhvalov <gitlab@samokhvalov.com>
Date: Tue, 27 Oct 2020 00:03:34 -0700
Subject: [PATCH 2/2] Building indexes on expressions requires ANALYZE

It is critically important to run ANALYZE after an index on an
expression is created.

- add a not to "Indexes on Expressions"
- mention that ANALYZE is needed when performing index maintenance
for indexes on expressions (this makes sense only Postgres versions
11 and older, since newer versions have support of REINDEX CONCURRENTLY
that doesn't suffer from lack of ANALYZE)
---
 doc/src/sgml/indices.sgml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 210a9e0adf..41adbbd4ad 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -734,6 +734,15 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
    query.  Thus, indexes on expressions are useful when retrieval speed
    is more important than insertion and update speed.
   </para>
+
+  <note>
+   <title>Note</title>
+   <para>
+    Once an index on an expression is successfuly created, it is important to
+    run <literal>ANALYZE</literal> on the corresponding table to gather
+    statistics for the expression.
+   </para>
+  </note>
  </sect1>

func-index-analyze.master.patchapplication/x-patch; name=func-index-analyze.master.patchDownload

From c6ec56913d933f328f4e54ac1ab7123a02d4bccf Mon Sep 17 00:00:00 2001
From: Nikolay Samokhvalov <samokhvalov@gmail.com>
Date: Tue, 27 Oct 2020 07:01:34 +0000
Subject: [PATCH] Building indexes on expressions requires ANALYZE

It is critically important to run ANALYZE after an index on an
expression is created.
---
 doc/src/sgml/indices.sgml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 671299ff05..bb5d7dfdd5 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -741,6 +741,15 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
    query.  Thus, indexes on expressions are useful when retrieval speed
    is more important than insertion and update speed.
   </para>
+
+  <note>
+   <title>Note</title>
+   <para>
+    Once an index on an expression is successfuly created, it is important to
+    run <literal>ANALYZE</literal> on the corresponding table to gather
+    statistics for the expression.
+   </para>
+  </note>
  </sect1>

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 5 years ago

In reply to: David G. Johnston (#2)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 7:46 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

It would seem preferable to call the lack of auto-analyzing after these

operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

When we create a new table or index they will not have statistics until an
ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

But IMHO there is a misbehaviour with the implementation of CONCURRENTLY on
REINDEX because running it will lose the statistics. Have a look the
example below:

fabrizio=# SELECT version();
version

---------------------------------------------------------------------------------------------------------
PostgreSQL 14devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)

fabrizio=# CREATE TABLE t(f1 BIGSERIAL PRIMARY KEY, f2 TEXT) WITH
(autovacuum_enabled = false);
CREATE TABLE
fabrizio=# INSERT INTO t(f2) SELECT repeat(chr(65+(random()*26)::int),
(random()*300)::int) FROM generate_series(1, 10000);
INSERT 0 10000
fabrizio=# CREATE INDEX t_idx2 ON t(lower(f2));
CREATE INDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
0
(1 row)

fabrizio=# ANALYZE t;
ANALYZE
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
1
(1 row)

fabrizio=# REINDEX INDEX t_idx2;
REINDEX
fabrizio=# REINDEX INDEX t_pkey;
REINDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_pkey'::regclass;
count
-------
0
(1 row)

fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
1
(1 row)
^^^^^^^^
-- A regular REINDEX don't lose the statistics.

fabrizio=# REINDEX INDEX CONCURRENTLY t_idx2;
REINDEX
fabrizio=# SELECT count(*) FROM pg_statistic WHERE starelid =
't_idx2'::regclass;
count
-------
0
(1 row)

^^^^^^^^
-- But the REINDEX CONCURRENTLY loses.

So IMHO here is the place we should rework a bit to execute ANALYZE as a
last step.

Regards,

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 5 years ago

In reply to: Nikolay Samokhvalov (#6)

Re: Add important info about ANALYZE after create Functional Index

On Tue, Oct 27, 2020 at 4:12 AM Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

On Mon, Oct 26, 2020 at 3:08 PM Fabrízio de Royes Mello <

fabriziomello@gmail.com> wrote:

Would be nice if add some information about it into our docs but not

sure where. I'm thinking about:

- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Attaching the patches for the docs, one for 11 and older, and another for

12+ (which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).

Actually the REINDEX CONCURRENTLY suffers with the lack of ANALYZE. See my
previous message on this thread.

So just adding the note on the ANALYZE docs is enough.

I still think that automating is the right thing to do but of course,

it's a much bigger topic that a quick fix dor the docs.

So what we need to do is see how to fix REINDEX CONCURRENTLY.

Regards,

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Fabrízio de Royes Mello (#7)

Re: Add important info about ANALYZE after create Functional Index

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrízio de Royes Mello wrote:

When we create a new table or index they will not have statistics until an
ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

But IMHO there is a misbehaviour with the implementation of CONCURRENTLY on
REINDEX because running it will lose the statistics. Have a look the
example below:

[...]

So IMHO here is the place we should rework a bit to execute ANALYZE as a
last step.

I agree that this is not user-friendly, and I suspect that we will
need to do something within index_concurrently_swap() to fill in the
stats of the new index from the data of the old one (not looked at
that in details yet).
--
Michael

#10

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 5 years ago

In reply to: Michael Paquier (#9)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrízio de Royes Mello wrote:

When we create a new table or index they will not have statistics until

ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

But IMHO there is a misbehaviour with the implementation of

CONCURRENTLY on

REINDEX because running it will lose the statistics. Have a look the
example below:

[...]

So IMHO here is the place we should rework a bit to execute ANALYZE as a
last step.

I agree that this is not user-friendly, and I suspect that we will
need to do something within index_concurrently_swap() to fill in the
stats of the new index from the data of the old one (not looked at
that in details yet).

We already do a similar thing for PgStats [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/index.c;h=0974f3e23a23726b63246cd3a1347e10923dd541;hb=HEAD#l1693 so maybe we should also copy
pg_statistics from old to new index during the swap.

But I'm not sure if it's totally safe anyway and would be better to create
a new phase to issue ANALYZE if necessary (exists statistics for old index).

Regards,

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/index.c;h=0974f3e23a23726b63246cd3a1347e10923dd541;hb=HEAD#l1693
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/index.c;h=0974f3e23a23726b63246cd3a1347e10923dd541;hb=HEAD#l1693

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

#11

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 5 years ago

In reply to: Fabrízio de Royes Mello (#7)

Re: Add important info about ANALYZE after create Functional Index

On Tue, Oct 27, 2020 at 11:06:22AM -0300, Fabrï¿½zio de Royes Mello wrote:

On Mon, Oct 26, 2020 at 7:46 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

It would seem preferable to call the lack of auto-analyzing after these

operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

When we create a new table or index they will not have statistics until an
ANALYZE happens. This is the default behaviour and I think is not a big
problem here, but we need to add some note on docs about the need of
statistics for indexes on expressions.

I think the problem is we notice when a table has not been analyzed yet
(and trigger an analyze), but we won't notice that for an index. So if
the table does not change very often, it may take ages before we build
stats for the index - not great.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 5 years ago

In reply to: David G. Johnston (#2)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:

On Mon, Oct 26, 2020 at 3:08 PM Fabrï¿½zio de Royes Mello <
fabriziomello@gmail.com> wrote:

Hi all,

As you all already know Postgres supports functions in index expressions
(marked as immutable ofc) and for this special index the ANALYZE command
creates some statistics (new pg_statistic entry) about it.

The problem is just after creating a new index or rebuilding concurrently
(using the new REINDEX .. CONCURRENTLY or the old manner creating new one
and then swapping) we need to run ANALYZE to update statistics but we don't
mention it in any part of our documentation.

Last weekend Gitlab went down because the lack of an ANALYZE after
rebuilding concurrently a functional index and they followed the
recommendation we have into our documentation [1] about how to rebuild it
concurrently, but we don't warn users about the ANALYZE after.

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Thoughts?

[1]
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2885#note_436310499

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13

David G. Johnston

david.g.johnston@gmail.com

about 5 years ago

In reply to: Tomas Vondra (#12)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 11:55 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

My guess is that it wouldn't be too difficult to write a patch that could
be safely back-patched and it's worth doing so even if ultimately the
decision is not to. But then again the patch writer isn't going to be me.

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

David J.

#14

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Tomas Vondra (#12)

Re: Add important info about ANALYZE after create Functional Index

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you. I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work. As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

I don't see anything in the CREATE STATISTICS man page pointing
that out, either. But there's probably room for "Notes" entries
about it in both places.

regards, tom lane

#15

David G. Johnston

david.g.johnston@gmail.com

about 5 years ago

In reply to: Nikolay Samokhvalov (#5)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Oct 26, 2020 at 9:44 PM Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

Although, this triggers a question – should ANALYZE be automated in,
say, pg_restore as well?

Independent concern.

It's the same class of issues – after we created some objects, we lack
statistics and willing to automate its collection. If the approach is
automated in one case, it should be automated in the others, for
consistency.

I don't see a need to force consistency between something that will affect,
at most, one table, and something that will affect an entire database or
cluster. The other material difference is that the previous state of a
restore is "nothing" while in the create/reindex cases we are going from
live, populated, state to another.

I do observe that while the create/reindex analyze would run automatically
during the restore on object creation there would be no data present so it
would be close to a no-op in practice.

And another question: how ANALYZE needs to be run? If it's under the

user's control, there is an option to use vacuumdb --analyze and benefit
from using -j to parallelize the work (and, in some cases, benefit from
using --analyze-in-stages). If we had ANALYZE as a part of building indexes
on expressions, should it be parallelized to the same extent as index
creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here. The only relevant parameter I see is
what to specify for “table_and_columns”.

I'm not sure I follow.

Describe how parallelism within the session that is auto-analyzing is
supposed to work. vaccuumdb opens up multiple connections which shouldn't
happen here.

I suppose having the auto-analyze run three times with different targets
would work but I'm doubting that is a win. I may just be underestimating
how long an analyze on an extremely large table with high statistics takes.

David J.

#16

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 5 years ago

In reply to: David G. Johnston (#13)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:

On Wed, Oct 28, 2020 at 11:55 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

My guess is that it wouldn't be too difficult to write a patch that could
be safely back-patched and it's worth doing so even if ultimately the
decision is not to. But then again the patch writer isn't going to be me.

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

Maybe, but I wouldn't be surprised if it was actually a bit trickier in
practice, particularly for the CONCURRENTLY case. But I haven't tried.

Anyway, I think there's an agreement it'd be valuable to do this after
CREATE INDEX in the future, so if someone wants to implement it that'd
be great. We can consider backpatching only once we have an actual patch
anyway.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 5 years ago

In reply to: Tom Lane (#14)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 03:05:39PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Mon, Oct 26, 2020 at 03:46:10PM -0700, David G. Johnston wrote:

It would seem preferable to call the lack of auto-analyzing after these
operations a bug and back-patch a fix that injects an analyze side-effect
just before their completion. It doesn't have to be smart either,
analyzing things even if the created (or newly validated) index doesn't
have statistics of its own isn't a problem in my book.

I agree the lack of stats may be quite annoying and cause issues, but my
guess is the chances of backpatching such change are about 0.000001%. We
have a usable 'workaround' for this - manual analyze.

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you. I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work. As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

True.

I don't see anything in the CREATE STATISTICS man page pointing
that out, either. But there's probably room for "Notes" entries
about it in both places.

I agree. I'll add it to my TODO list for the next CF.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18

David G. Johnston

david.g.johnston@gmail.com

about 5 years ago

In reply to: Tom Lane (#14)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 12:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This doesn't seem clearly different from any other situation where
auto-analyze doesn't react fast enough to suit you.

I would not
call it a bug, at least not without a wholesale redefinition of
how auto-analyze is supposed to work.

The definition of auto-analyze is just fine; the issue is with the user
unfriendly position that the only times analyze is ever run is when it is
run manually or heuristically in a separate process. I agree that this
isn't a bug in the traditional sense - the current behavior is intentional
- but it is a POLA violation.

The fundamental question here is do we want to change our policy in this
regard and make our system more user-friendly? If so, let's do so for v14
in honor of the problem the lack of documentation and POLA violation has
recently caused.

Then, as a separate concern, should we admit the oversight and back-patch
our policy change or just move forward and add documentation to older
versions?

As a close analogy, we
don't make any effort to force an immediate auto-analyze after
CREATE STATISTICS.

At least we have been consistent...

David J.

#19

Tom Lane

tgl@sss.pgh.pa.us

about 5 years ago

In reply to: Tomas Vondra (#16)

Re: Add important info about ANALYZE after create Functional Index

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

Maybe, but I wouldn't be surprised if it was actually a bit trickier in
practice, particularly for the CONCURRENTLY case. But I haven't tried.

Anyway, I think there's an agreement it'd be valuable to do this after
CREATE INDEX in the future, so if someone wants to implement it that'd
be great. We can consider backpatching only once we have an actual patch
anyway.

Just to be clear, I'm entirely *not* on board with that. IMV it's
intentional that we do not force auto-analyze activity after CREATE
INDEX or CREATE STATISTICS. If we change that, people will want a
way to opt out of it, and then your "simple" patch isn't so simple
anymore. (Not that it was simple anyway. What if the CREATE is
inside a transaction block, for instance? There's no use in
kicking autovacuum before commit.)

regards, tom lane

#20

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 5 years ago

In reply to: Tom Lane (#19)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 03:18:52PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Wed, Oct 28, 2020 at 12:00:54PM -0700, David G. Johnston wrote:

Given how simple the manual workaround is not having it be manual seems
like it would be safe and straight-forward to implement.

Maybe, but I wouldn't be surprised if it was actually a bit trickier in
practice, particularly for the CONCURRENTLY case. But I haven't tried.

Anyway, I think there's an agreement it'd be valuable to do this after
CREATE INDEX in the future, so if someone wants to implement it that'd
be great. We can consider backpatching only once we have an actual patch
anyway.

Just to be clear, I'm entirely *not* on board with that. IMV it's
intentional that we do not force auto-analyze activity after CREATE
INDEX or CREATE STATISTICS. If we change that, people will want a
way to opt out of it, and then your "simple" patch isn't so simple
anymore.

True. Some users may have reasons to not want the analyze, I guess.

(Not that it was simple anyway. What if the CREATE is
inside a transaction block, for instance? There's no use in
kicking autovacuum before commit.)

I don't think anyone proposed to do this through autovacuum. There was a
reference to auto-analyze but I think that was meant as 'run analyze
automatically.' Which would work in transactions just fine, I think.

But I agree it'd likely be a more complicated patch than it might seem
at first glance.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 5 years ago

In reply to: Tomas Vondra (#20)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 4:35 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

I don't think anyone proposed to do this through autovacuum. There was a
reference to auto-analyze but I think that was meant as 'run analyze
automatically.' Which would work in transactions just fine, I think.

Maybe I was not very clear at the beginning so will try to clarify my
thoughts:

1) We should add notes on our docs about the need to issue ANALYZE after
creating indexes using expressions and create extended statistics. Nikolay
sent a patch upthread and we can work on it and back patch.

2) REINDEX CONCURRENTLY does not keep statistics (pg_statistc) like a
regular REINDEX for indexes using expressions and to me it's a bug. Michael
pointed out upthread that maybe we should rework a bit
index_concurrently_swap() to copy statistics from old index to new one.

But I agree it'd likely be a more complicated patch than it might seem
at first glance.

If we think about a way to kick AutoAnalyze for sure it will be a more
complicated task but IMHO for now we can do it simply just by copying
statistics like I mentioned above.

Regards,

--
Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#22

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 5 years ago

In reply to: Fabrízio de Royes Mello (#21)

Re: Add important info about ANALYZE after create Functional Index

On Wed, Oct 28, 2020 at 05:43:08PM -0300, Fabrï¿½zio de Royes Mello wrote:

On Wed, Oct 28, 2020 at 4:35 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

I don't think anyone proposed to do this through autovacuum. There was a
reference to auto-analyze but I think that was meant as 'run analyze
automatically.' Which would work in transactions just fine, I think.

Maybe I was not very clear at the beginning so will try to clarify my
thoughts:

1) We should add notes on our docs about the need to issue ANALYZE after
creating indexes using expressions and create extended statistics. Nikolay
sent a patch upthread and we can work on it and back patch.

2) REINDEX CONCURRENTLY does not keep statistics (pg_statistc) like a
regular REINDEX for indexes using expressions and to me it's a bug. Michael
pointed out upthread that maybe we should rework a bit
index_concurrently_swap() to copy statistics from old index to new one.

Yeah. Not sure it counts as a bug, but I see what you mean - it's
definitely an unexpected/undesirable difference in behavior between
plain REINDEX and concurrent one.

But I agree it'd likely be a more complicated patch than it might seem
at first glance.

If we think about a way to kick AutoAnalyze for sure it will be a more
complicated task but IMHO for now we can do it simply just by copying
statistics like I mentioned above.

I very much doubt just we can rely on autoanalyze here. For one, it'll
have issues with transactions, as Tom already pointed out elsewhere in
this thread. So if you do a reindex after a bulk load in a transaction,
followed by some report queries, autoanalyze is not going to help.

But it has another issue - there may not be any free autovacuum workers,
so it'd have to wait for unknown amount of time. In fact, it'd have to
wait for the autovacuum worker to actually do the analyze, otherwise we
could still have unpredictable behavior for queries immediately after
the REINDEX, even outside transactions. That's not good, so this would
have to do an actual analyze I think.

But as Tom pointed out, the automatic analyze may be against wishes of
some users, and there are other similar cases that don't trigger analyze
(CREATE STATISTICS). So not sure about this.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Tomas Vondra (#22)

Re: Add important info about ANALYZE after create Functional Index

On Thu, Oct 29, 2020 at 12:02:11AM +0100, Tomas Vondra wrote:

On Wed, Oct 28, 2020 at 05:43:08PM -0300, Fabrízio de Royes Mello wrote:

2) REINDEX CONCURRENTLY does not keep statistics (pg_statistc) like a
regular REINDEX for indexes using expressions and to me it's a bug. Michael
pointed out upthread that maybe we should rework a bit
index_concurrently_swap() to copy statistics from old index to new one.

Yeah. Not sure it counts as a bug, but I see what you mean - it's
definitely an unexpected/undesirable difference in behavior between
plain REINDEX and concurrent one.

REINDEX CONCURRENTLY is by design wanted to provide an experience
transparent to the user similar to what a plain REINDEX would do, at
least that's the idea behind it, so.. This qualifies as a bug to me,
in spirit.
--
Michael

#24

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Michael Paquier (#23)

1 attachment(s)

Re: Add important info about ANALYZE after create Functional Index

On Thu, Oct 29, 2020 at 10:59:52AM +0900, Michael Paquier wrote:

REINDEX CONCURRENTLY is by design wanted to provide an experience
transparent to the user similar to what a plain REINDEX would do, at
least that's the idea behind it, so.. This qualifies as a bug to me,
in spirit.

And in spirit, it is possible to address this issue with the patch
attached which copies the set of stats from the old to the new index.
For a non-concurrent REINDEX, this does not happen because we keep the
same base relation, while we replace completely the relation with a
concurrent operation. We have a RemoveStatistics() in heap.c, but I
did not really see the point to invent a copy flavor for this
particular case. Perhaps others have an opinion on that?
--
Michael

Attachments:

reindex-stats-v1.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0974f3e23a..3820a03fb6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -49,6 +49,7 @@
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_trigger.h"
 #include "catalog/pg_type.h"
@@ -1711,6 +1712,44 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 		}
 	}
 
+	/*
+	 * Copy over any data of pg_statistic from the old index to the new one.
+	 */
+	{
+		HeapTuple   tup;
+		SysScanDesc scan;
+		ScanKeyData key[1];
+		Relation	statrel;
+
+		statrel = table_open(StatisticRelationId, RowExclusiveLock);
+		ScanKeyInit(&key[0],
+					Anum_pg_statistic_starelid,
+					BTEqualStrategyNumber, F_OIDEQ,
+					ObjectIdGetDatum(oldIndexId));
+
+		scan = systable_beginscan(statrel, StatisticRelidAttnumInhIndexId,
+								  true, NULL, 1, key);
+
+		while (HeapTupleIsValid((tup = systable_getnext(scan))))
+		{
+			Form_pg_statistic statform;
+
+			/* make a modifiable copy */
+			tup = heap_copytuple(tup);
+			statform = (Form_pg_statistic) GETSTRUCT(tup);
+
+			/* update the copy of the tuple and insert it */
+			statform->starelid = newIndexId;
+			CatalogTupleInsert(statrel, tup);
+
+			heap_freetuple(tup);
+		}
+
+		systable_endscan(scan);
+
+		table_close(statrel, RowExclusiveLock);
+	}
+
 	/* Close relations */
 	table_close(pg_class, RowExclusiveLock);
 	table_close(pg_index, RowExclusiveLock);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 6ace7662ee..012c1eb067 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2551,6 +2551,17 @@ CREATE UNIQUE INDEX concur_exprs_index_pred ON concur_exprs_tab (c1)
 CREATE UNIQUE INDEX concur_exprs_index_pred_2
   ON concur_exprs_tab ((1 / c1))
   WHERE ('-H') >= (c2::TEXT) COLLATE "C";
+ANALYZE concur_exprs_tab;
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
+        starelid         | count 
+-------------------------+-------
+ concur_exprs_index_expr |     1
+(1 row)
+
 SELECT pg_get_indexdef('concur_exprs_index_expr'::regclass);
                                                 pg_get_indexdef                                                
 ---------------------------------------------------------------------------------------------------------------
@@ -2608,6 +2619,17 @@ SELECT pg_get_indexdef('concur_exprs_index_pred_2'::regclass);
  CREATE UNIQUE INDEX concur_exprs_index_pred_2 ON public.concur_exprs_tab USING btree (((1 / c1))) WHERE ('-H'::text >= (c2 COLLATE "C"))
 (1 row)
 
+-- Statistics should remain intact.
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
+        starelid         | count 
+-------------------------+-------
+ concur_exprs_index_expr |     1
+(1 row)
+
 DROP TABLE concur_exprs_tab;
 -- Temporary tables and on-commit actions, where CONCURRENTLY is ignored.
 -- ON COMMIT PRESERVE ROWS, the default.
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 37f7259da9..4e45b18613 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1079,6 +1079,12 @@ CREATE UNIQUE INDEX concur_exprs_index_pred ON concur_exprs_tab (c1)
 CREATE UNIQUE INDEX concur_exprs_index_pred_2
   ON concur_exprs_tab ((1 / c1))
   WHERE ('-H') >= (c2::TEXT) COLLATE "C";
+ANALYZE concur_exprs_tab;
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
 SELECT pg_get_indexdef('concur_exprs_index_expr'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred_2'::regclass);
@@ -1091,6 +1097,12 @@ ALTER TABLE concur_exprs_tab ALTER c2 TYPE TEXT;
 SELECT pg_get_indexdef('concur_exprs_index_expr'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred_2'::regclass);
+-- Statistics should remain intact.
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
 DROP TABLE concur_exprs_tab;
 
 -- Temporary tables and on-commit actions, where CONCURRENTLY is ignored.

#25

Justin Pryzby

pryzby@telsasoft.com

about 5 years ago

In reply to: Michael Paquier (#24)

Re: Add important info about ANALYZE after create Functional Index

On Fri, Oct 30, 2020 at 03:22:52PM +0900, Michael Paquier wrote:

On Thu, Oct 29, 2020 at 10:59:52AM +0900, Michael Paquier wrote:

REINDEX CONCURRENTLY is by design wanted to provide an experience
transparent to the user similar to what a plain REINDEX would do, at
least that's the idea behind it, so.. This qualifies as a bug to me,
in spirit.

And in spirit, it is possible to address this issue with the patch
attached which copies the set of stats from the old to the new index.
For a non-concurrent REINDEX, this does not happen because we keep the
same base relation, while we replace completely the relation with a
concurrent operation. We have a RemoveStatistics() in heap.c, but I
did not really see the point to invent a copy flavor for this
particular case. Perhaps others have an opinion on that?

The implementation of REINDEX CONCURRENTLY is "CREATE INDEX CONCURRENTLY
followed by internal index swap". But the command is called "reindex", and so
the user experience is that the statistics are inexplicably lost.

(I'm quoting from the commit message of the patch I wrote, which is same as
your patch).

--
Justin

#26

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 5 years ago

In reply to: Michael Paquier (#24)

1 attachment(s)

Re: Add important info about ANALYZE after create Functional Index

On Fri, Oct 30, 2020 at 3:22 AM Michael Paquier <michael@paquier.xyz> wrote:

And in spirit, it is possible to address this issue with the patch
attached which copies the set of stats from the old to the new index.

Did some tests and everything went ok... some comments below!

For a non-concurrent REINDEX, this does not happen because we keep the
same base relation, while we replace completely the relation with a
concurrent operation.

Exactly!

We have a RemoveStatistics() in heap.c, but I
did not really see the point to invent a copy flavor for this
particular case. Perhaps others have an opinion on that?

Even if we won't use it now, IMHO it is more legible to separate this
responsibility into its own CopyStatistics function as attached.

Regards,

--
Fabrízio de Royes Mello
PostgreSQL Developer at OnGres Inc. - https://ongres.com

Attachments:

reindex-stats-v2.patchtext/x-patch; charset=US-ASCII; name=reindex-stats-v2.patchDownload

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 67144aa3c9..7da1f7eb56 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3142,6 +3142,46 @@ cookConstraint(ParseState *pstate,
 	return expr;
 }
 
+/*
+ * CopyStatistics --- copy entries in pg_statistic from old to new rel
+ *
+ */
+void
+CopyStatistics(Oid oldRelId, Oid newRelId)
+{
+	HeapTuple   tup;
+	SysScanDesc scan;
+	ScanKeyData key[1];
+	Relation	statrel;
+
+	statrel = table_open(StatisticRelationId, RowExclusiveLock);
+	ScanKeyInit(&key[0],
+				Anum_pg_statistic_starelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(oldRelId));
+
+	scan = systable_beginscan(statrel, StatisticRelidAttnumInhIndexId,
+							  true, NULL, 1, key);
+
+	while (HeapTupleIsValid((tup = systable_getnext(scan))))
+	{
+		Form_pg_statistic statform;
+
+		/* make a modifiable copy */
+		tup = heap_copytuple(tup);
+		statform = (Form_pg_statistic) GETSTRUCT(tup);
+
+		/* update the copy of the tuple and insert it */
+		statform->starelid = newRelId;
+		CatalogTupleInsert(statrel, tup);
+
+		heap_freetuple(tup);
+	}
+
+	systable_endscan(scan);
+
+	table_close(statrel, RowExclusiveLock);
+}
 
 /*
  * RemoveStatistics --- remove entries in pg_statistic for a rel or column
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0974f3e23a..a6975c2dbe 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -49,6 +49,7 @@
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_trigger.h"
 #include "catalog/pg_type.h"
@@ -1711,6 +1712,11 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 		}
 	}
 
+	/*
+	 * Copy over any data of pg_statistic from the old index to the new one.
+	 */
+	CopyStatistics(oldIndexId, newIndexId);
+
 	/* Close relations */
 	table_close(pg_class, RowExclusiveLock);
 	table_close(pg_index, RowExclusiveLock);
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index d31141c1a2..3d42edbbf6 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -134,6 +134,7 @@ extern void RemoveAttributeById(Oid relid, AttrNumber attnum);
 extern void RemoveAttrDefault(Oid relid, AttrNumber attnum,
 							  DropBehavior behavior, bool complain, bool internal);
 extern void RemoveAttrDefaultById(Oid attrdefId);
+extern void CopyStatistics(Oid oldRelId, Oid newRelId);
 extern void RemoveStatistics(Oid relid, AttrNumber attnum);
 
 extern const FormData_pg_attribute *SystemAttributeDefinition(AttrNumber attno);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 6ace7662ee..012c1eb067 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2551,6 +2551,17 @@ CREATE UNIQUE INDEX concur_exprs_index_pred ON concur_exprs_tab (c1)
 CREATE UNIQUE INDEX concur_exprs_index_pred_2
   ON concur_exprs_tab ((1 / c1))
   WHERE ('-H') >= (c2::TEXT) COLLATE "C";
+ANALYZE concur_exprs_tab;
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
+        starelid         | count 
+-------------------------+-------
+ concur_exprs_index_expr |     1
+(1 row)
+
 SELECT pg_get_indexdef('concur_exprs_index_expr'::regclass);
                                                 pg_get_indexdef                                                
 ---------------------------------------------------------------------------------------------------------------
@@ -2608,6 +2619,17 @@ SELECT pg_get_indexdef('concur_exprs_index_pred_2'::regclass);
  CREATE UNIQUE INDEX concur_exprs_index_pred_2 ON public.concur_exprs_tab USING btree (((1 / c1))) WHERE ('-H'::text >= (c2 COLLATE "C"))
 (1 row)
 
+-- Statistics should remain intact.
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
+        starelid         | count 
+-------------------------+-------
+ concur_exprs_index_expr |     1
+(1 row)
+
 DROP TABLE concur_exprs_tab;
 -- Temporary tables and on-commit actions, where CONCURRENTLY is ignored.
 -- ON COMMIT PRESERVE ROWS, the default.
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 37f7259da9..4e45b18613 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1079,6 +1079,12 @@ CREATE UNIQUE INDEX concur_exprs_index_pred ON concur_exprs_tab (c1)
 CREATE UNIQUE INDEX concur_exprs_index_pred_2
   ON concur_exprs_tab ((1 / c1))
   WHERE ('-H') >= (c2::TEXT) COLLATE "C";
+ANALYZE concur_exprs_tab;
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
 SELECT pg_get_indexdef('concur_exprs_index_expr'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred_2'::regclass);
@@ -1091,6 +1097,12 @@ ALTER TABLE concur_exprs_tab ALTER c2 TYPE TEXT;
 SELECT pg_get_indexdef('concur_exprs_index_expr'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred'::regclass);
 SELECT pg_get_indexdef('concur_exprs_index_pred_2'::regclass);
+-- Statistics should remain intact.
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
 DROP TABLE concur_exprs_tab;
 
 -- Temporary tables and on-commit actions, where CONCURRENTLY is ignored.

#27

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Fabrízio de Royes Mello (#26)

Re: Add important info about ANALYZE after create Functional Index

On Sat, Oct 31, 2020 at 07:56:33PM -0300, Fabrízio de Royes Mello wrote:

Even if we won't use it now, IMHO it is more legible to separate this
responsibility into its own CopyStatistics function as attached.

By doing so, there is no need to include pg_statistic.h in index.c.
Except that, the logic looks fine at quick glance. In the long-term,
I also think that it would make sense to move both routnes out of
heap.c into a separate pg_statistic.c. That's material for a separate
patch of course.
--
Michael

#28

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Justin Pryzby (#25)

Re: Add important info about ANALYZE after create Functional Index

On Fri, Oct 30, 2020 at 10:30:13PM -0500, Justin Pryzby wrote:

(I'm quoting from the commit message of the patch I wrote, which is same as
your patch).

(I may have missed something, but you did not send a patch, right?)
--
Michael

#29

Justin Pryzby

pryzby@telsasoft.com

about 5 years ago

In reply to: Michael Paquier (#28)

Re: Add important info about ANALYZE after create Functional Index

On Sun, Nov 01, 2020 at 10:11:06AM +0900, Michael Paquier wrote:

On Fri, Oct 30, 2020 at 10:30:13PM -0500, Justin Pryzby wrote:

(I'm quoting from the commit message of the patch I wrote, which is same as
your patch).

(I may have missed something, but you did not send a patch, right?)

Right, because it's the same as yours.

--
Justin

#30

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Michael Paquier (#27)

Re: Add important info about ANALYZE after create Functional Index

On Sun, Nov 01, 2020 at 09:23:44AM +0900, Michael Paquier wrote:

By doing so, there is no need to include pg_statistic.h in index.c.
Except that, the logic looks fine at quick glance. In the long-term,
I also think that it would make sense to move both routnes out of
heap.c into a separate pg_statistic.c. That's material for a separate
patch of course.

I have looked again at that, and applied it after some tweaks.
Particularly, I did not really like the use of "old" and "new" for the
copy from the old to a new relation in the new function, so I have
replaced that by "from" and "to".
--
Michael

#31

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 5 years ago

In reply to: Michael Paquier (#30)

Re: Add important info about ANALYZE after create Functional Index

On Sun, 1 Nov 2020 at 09:29 Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Nov 01, 2020 at 09:23:44AM +0900, Michael Paquier wrote:

By doing so, there is no need to include pg_statistic.h in index.c.
Except that, the logic looks fine at quick glance. In the long-term,
I also think that it would make sense to move both routnes out of
heap.c into a separate pg_statistic.c. That's material for a separate
patch of course.

I have looked again at that, and applied it after some tweaks.
Particularly, I did not really like the use of "old" and "new" for the
copy from the old to a new relation in the new function, so I have
replaced that by "from" and "to".

Awesome thanks!!

Regards,

--
Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#32

Bruce Momjian

bruce@momjian.us

about 5 years ago

In reply to: Nikolay Samokhvalov (#6)

1 attachment(s)

Re: Add important info about ANALYZE after create Functional Index

On Tue, Oct 27, 2020 at 12:12:00AM -0700, Nikolay Samokhvalov wrote:

On Mon, Oct 26, 2020 at 3:08 PM Fabrï¿½zio de Royes Mello <
fabriziomello@gmail.com> wrote:

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Attaching the patches for the docs, one for 11 and older, and another for 12+
(which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).

I still think that automating is the right thing to do but of course, it's a
much bigger topic that a quick fix dor the docs.

I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
updated to mention the need to run ANALYZE or wait for autovacuum before
expression indexes can be fully used by the optimizer. Instead of
putting this mention in the maintenance section, I thought the CREATE
INDEX page make more sense, since it is more of a usability issue,
rather than "why use expression indexes". Patch attached, which I plan
to apply to all supported branches.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

Attachments:

expindex.difftext/x-diff; charset=us-asciiDownload

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 749db28..48c42db
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** Indexes:
*** 746,751 ****
--- 746,761 ----
    </para>
  
    <para>
+    The system collects statistics on all of a table's columns.
+    Newly-created non-expression indexes can immediately
+    use these statistics to determine an index's usefulness.
+    For new expression indexes, it is necessary to run <link
+    linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
+    the <link linkend="autovacuum">autovacuum daemon</link> to analyze
+    the table to generate statistics about new expression indexes.
+   </para>
+ 
+   <para>
     For most index methods, the speed of creating an index is
     dependent on the setting of <xref linkend="guc-maintenance-work-mem"/>.
     Larger values will reduce the time needed for index creation, so long

#33

Fabrízio de Royes Mello

fabriziomello@gmail.com

about 5 years ago

In reply to: Bruce Momjian (#32)

Re: Add important info about ANALYZE after create Functional Index

On Mon, 9 Nov 2020 at 20:27 Bruce Momjian <bruce@momjian.us> wrote:

I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
updated to mention the need to run ANALYZE or wait for autovacuum before
expression indexes can be fully used by the optimizer. Instead of
putting this mention in the maintenance section, I thought the CREATE
INDEX page make more sense, since it is more of a usability issue,
rather than "why use expression indexes". Patch attached, which I plan
to apply to all supported branches.

Did a quick review and totally agree... thanks a lot Bruce to help us don’t
miss it.

Regards,

--
Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#34

Bruce Momjian

bruce@momjian.us

about 5 years ago

In reply to: Fabrízio de Royes Mello (#33)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Nov 9, 2020 at 08:35:46PM -0300, Fabrízio de Royes Mello wrote:

On Mon, 9 Nov 2020 at 20:27 Bruce Momjian <bruce@momjian.us> wrote:

I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
updated to mention the need to run ANALYZE or wait for autovacuum before
expression indexes can be fully used by the optimizer. Instead of
putting this mention in the maintenance section, I thought the CREATE
INDEX page make more sense, since it is more of a usability issue,
rather than "why use expression indexes". Patch attached, which I plan
to apply to all supported branches.

Did a quick review and totally agree... thanks a lot Bruce to help us don’t
miss it.

Patch applied to all branches. Thanks for the review.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#35

Justin Pryzby

pryzby@telsasoft.com

about 5 years ago

In reply to: Bruce Momjian (#32)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Nov 09, 2020 at 06:27:20PM -0500, Bruce Momjian wrote:

On Tue, Oct 27, 2020 at 12:12:00AM -0700, Nikolay Samokhvalov wrote:

On Mon, Oct 26, 2020 at 3:08 PM Fabrï¿½zio de Royes Mello <fabriziomello@gmail.com> wrote:

Would be nice if add some information about it into our docs but not sure
where. I'm thinking about:
- doc/src/sgml/ref/create_index.sgml
- doc/src/sgml/maintenance.sgml (routine-reindex)

Attaching the patches for the docs, one for 11 and older, and another for 12+
(which have REINDEX CONCURRENTLY not suffering from lack of ANALYZE).

I still think that automating is the right thing to do but of course, it's a
much bigger topic that a quick fix dor the docs.

I see REINDEX CONCURRENTLY was fixed in head, but the docs didn't get
updated to mention the need to run ANALYZE or wait for autovacuum before
expression indexes can be fully used by the optimizer. Instead of
putting this mention in the maintenance section, I thought the CREATE
INDEX page make more sense, since it is more of a usability issue,
rather than "why use expression indexes". Patch attached, which I plan
to apply to all supported branches.

The commited patch actually says:

--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -745,6 +745,16 @@ Indexes:
    sort high</quote>, in queries that depend on indexes to avoid sorting steps.
   </para>

+  <para>
+   The regularly system collects statistics on all of a table's
+   columns.  Newly-created non-expression indexes can immediately
+   use these statistics to determine an index's usefulness.
+   For new expression indexes, it is necessary to run <link
+   linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
+   the <link linkend="autovacuum">autovacuum daemon</link> to analyze
+   the table to generate statistics about new expression indexes.
+  </para>
+

I guess it should say "The system regularly ..."

Also, the last sentence begins "For new expression indexes" and ends with
"about new expression indexes", which I guess could instead say "about the
expressions".

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 749db28..48c42db
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** Indexes:
*** 746,751 ****
--- 746,761 ----
</para>

<para>
+    The system collects statistics on all of a table's columns.
+    Newly-created non-expression indexes can immediately
+    use these statistics to determine an index's usefulness.
+    For new expression indexes, it is necessary to run <link
+    linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
+    the <link linkend="autovacuum">autovacuum daemon</link> to analyze
+    the table to generate statistics about new expression indexes.
+   </para>
+ 
+   <para>
For most index methods, the speed of creating an index is
dependent on the setting of <xref linkend="guc-maintenance-work-mem"/>.
Larger values will reduce the time needed for index creation, so long

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581

#36

Bruce Momjian

bruce@momjian.us

about 5 years ago

In reply to: Justin Pryzby (#35)

1 attachment(s)

Re: Add important info about ANALYZE after create Functional Index

On Thu, Nov 12, 2020 at 03:11:43PM -0600, Justin Pryzby wrote:

I guess it should say "The system regularly ..."

Also, the last sentence begins "For new expression indexes" and ends with
"about new expression indexes", which I guess could instead say "about the
expressions".

How is this followup patch?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

Attachments:

expindex-v2.difftext/x-diff; charset=us-asciiDownload

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 844f00904b..29dee5689e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -746,13 +746,13 @@ Indexes:
   </para>
 
   <para>
-   The regularly system collects statistics on all of a table's
+   The system regularly collects statistics on all of a table's
    columns.  Newly-created non-expression indexes can immediately
    use these statistics to determine an index's usefulness.
    For new expression indexes, it is necessary to run <link
    linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
    the <link linkend="autovacuum">autovacuum daemon</link> to analyze
-   the table to generate statistics about new expression indexes.
+   the table to generate statistics for these indexes.
   </para>
 
   <para>

#37

Justin Pryzby

pryzby@telsasoft.com

about 5 years ago

In reply to: Bruce Momjian (#36)

Re: Add important info about ANALYZE after create Functional Index

On Thu, Nov 12, 2020 at 06:01:02PM -0500, Bruce Momjian wrote:

On Thu, Nov 12, 2020 at 03:11:43PM -0600, Justin Pryzby wrote:

I guess it should say "The system regularly ..."

Also, the last sentence begins "For new expression indexes" and ends with
"about new expression indexes", which I guess could instead say "about the
expressions".

How is this followup patch?

I see Alvaro already patched the first issue at bcbd77133.

The problematic language was recently introduced, and I'd reported at:
/messages/by-id/20201112211143.GL30691@telsasoft.com
And Erik at:
/messages/by-id/e92b3fba98a0c0f7afc0a2a37e765954@xs4all.nl

--
Justin

#38

Alvaro Herrera

alvherre@alvh.no-ip.org

about 5 years ago

In reply to: Bruce Momjian (#36)

Re: Add important info about ANALYZE after create Functional Index

On 2020-Nov-12, Bruce Momjian wrote:

For new expression indexes, it is necessary to run <link
linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
the <link linkend="autovacuum">autovacuum daemon</link> to analyze
-   the table to generate statistics about new expression indexes.
+   the table to generate statistics for these indexes.

Looks good to me.

#39

Alvaro Herrera

alvherre@alvh.no-ip.org

about 5 years ago

In reply to: Justin Pryzby (#37)

Re: Add important info about ANALYZE after create Functional Index

On 2020-Nov-16, Justin Pryzby wrote:

I see Alvaro already patched the first issue at bcbd77133.

The problematic language was recently introduced, and I'd reported at:
/messages/by-id/20201112211143.GL30691@telsasoft.com
And Erik at:
/messages/by-id/e92b3fba98a0c0f7afc0a2a37e765954@xs4all.nl

Yeah, sorry I didn't notice you had already reported it.

#40

Bruce Momjian

bruce@momjian.us

about 5 years ago

In reply to: Alvaro Herrera (#38)

Re: Add important info about ANALYZE after create Functional Index

On Mon, Nov 16, 2020 at 11:59:03AM -0300, ï¿½lvaro Herrera wrote:

On 2020-Nov-12, Bruce Momjian wrote:

For new expression indexes, it is necessary to run <link
linkend="sql-analyze"><command>ANALYZE</command></link> or wait for
the <link linkend="autovacuum">autovacuum daemon</link> to analyze
-   the table to generate statistics about new expression indexes.
+   the table to generate statistics for these indexes.

Looks good to me.

Applied to all branches. Thanks for the review.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee