BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

Started by PG Bug reporting formalmost 3 years ago8 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17942
Logged by: Henri Chapelle
Email address: henri.chapelle@dbandmore.com
PostgreSQL version: 13.8
Operating system: Centos 7
Description:

Hi,
We recently added extended statistics on big partitioned tables in
Postgresql 13.8.
While runing analyze tablename or vacuumdb -Z -t tablename populate extended
statistics, the vacuumdb -Z command (without -t) doesn't process the parent
table during the analyze.
We created a script to collecte table names and run the analyze table by
table.
Is it expected ?
The documentation mentions that autovacuum will not process the parent table
due to no data in the parent table.
But vacuumdb should process all tables, right ?

Here is an example :
CREATE TABLE people (
id int not null,
birth_date date not null,
country_code character(2) not null,
name text
) PARTITION BY RANGE (birth_date);

CREATE TABLE people_y2000 PARTITION OF people
FOR VALUES FROM ('2000-01-01') TO ('2001-01-01');

CREATE TABLE people_y2001 PARTITION OF people
FOR VALUES FROM ('2001-01-01') TO ('2002-01-01');

CREATE TABLE people_y2002 PARTITION OF people
FOR VALUES FROM ('2002-01-01') TO ('2003-01-01');

INSERT INTO people (id, birth_date, country_code, name) VALUES (1,
'2000-01-01', 'US', 'John'), (2, '2000-02-02', 'IT', 'Jane'), (3,
'2001-03-03', 'FR', 'Bob');

create statistics people_stat_001 on birth_date, name from people;

vacuumdb -Z -t people : -> ok extende stats collected
vacuumdb -Z : -> no extended stats and analyze much faster

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

At Wed, 24 May 2023 06:41:23 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in

vacuumdb -Z -t people : -> ok extende stats collected
vacuumdb -Z : -> no extended stats and analyze much faster

ANALYZE processes inhertance parents. By a quick search, I found that commit 3c3bb99330 modified get_rel_oids (which has now been relocated to get_all_vacuum_rels) to include RELKIND_PARTITIONED_TABLE. However, it seems like it overlooked updating the same in vacuumdb.c. This dates back to 10.

diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4b17a07089..1369c37504 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -686,7 +686,8 @@ vacuum_one_database(ConnParams *cparams,
 	{
 		appendPQExpBufferStr(&catalog_query, " WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array["
 							 CppAsString2(RELKIND_RELATION) ", "
-							 CppAsString2(RELKIND_MATVIEW) "])\n");
+							 CppAsString2(RELKIND_MATVIEW) ", "
+							 CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
 		has_where = true;
 	}

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#2)
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

On Thu, May 25, 2023 at 01:07:59PM +0900, Kyotaro Horiguchi wrote:

ANALYZE processes inhertance parents. By a quick search, I found
that commit 3c3bb99330 modified get_rel_oids (which has now been
relocated to get_all_vacuum_rels) to include
RELKIND_PARTITIONED_TABLE. However, it seems like it overlooked
updating the same in vacuumdb.c. This dates back to 10.

You cannot do that, because this would make a single run of vacuumdb
process more than once each partition. Am I missing something?
--
Michael

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#3)
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

At Thu, 25 May 2023 16:45:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in

On Thu, May 25, 2023 at 01:07:59PM +0900, Kyotaro Horiguchi wrote:

ANALYZE processes inhertance parents. By a quick search, I found
that commit 3c3bb99330 modified get_rel_oids (which has now been
relocated to get_all_vacuum_rels) to include
RELKIND_PARTITIONED_TABLE. However, it seems like it overlooked
updating the same in vacuumdb.c. This dates back to 10.

You cannot do that, because this would make a single run of vacuumdb
process more than once each partition. Am I missing something?

It seems to be exactly the same as ANALYZE, though. I'm a bit unclear
about our perspective on this SQL command's behavior.

We had a patch proposed to prevent such duplication, but it was
withdrawn for a reason that's I don't remember clearly.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#4)
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

On Fri, May 26, 2023 at 09:15:36AM +0900, Kyotaro Horiguchi wrote:

It seems to be exactly the same as ANALYZE, though. I'm a bit unclear
about our perspective on this SQL command's behavior.

Sorry for being a bit unclear here. When dealing with partitioned
tables, a database-wide ANALYZE processes the partitions individually
as well as a full partition/inheritance tree.

My point is slightly different though: your suggestion of adding
RELKIND_PARTITIONED_TABLE to the filter added in vacuumdb would work
for -Z, but it would cause the vacuum code path of vacuumdb to process
more than once all the partitions in a single run. For instance, take
this schema:
CREATE TABLE parent_list (id int) PARTITION BY LIST (id);
CREATE TABLE child_list PARTITION OF parent_list
FOR VALUES IN (1, 2);

`vacuumdb` would now list both parent_list and child_list, making
child_list being vacuumed twice, which is not necessary. In order to
get a behavior in parity with the SQL commands ANALYZE, VACUUM and
VACUUM ANALYZE, we need to be more careful about the addition of
RELKIND_PARTITIONED_TABLE to the filtering clause.
--
Michael

#6Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#5)
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

At Fri, 26 May 2023 14:48:30 +0900, Michael Paquier <michael@paquier.xyz> wrote in

On Fri, May 26, 2023 at 09:15:36AM +0900, Kyotaro Horiguchi wrote:

It seems to be exactly the same as ANALYZE, though. I'm a bit unclear
about our perspective on this SQL command's behavior.

Sorry for being a bit unclear here. When dealing with partitioned
tables, a database-wide ANALYZE processes the partitions individually
as well as a full partition/inheritance tree.

My point is slightly different though: your suggestion of adding
RELKIND_PARTITIONED_TABLE to the filter added in vacuumdb would work
for -Z, but it would cause the vacuum code path of vacuumdb to process
more than once all the partitions in a single run. For instance, take
this schema:

..

`vacuumdb` would now list both parent_list and child_list, making
child_list being vacuumed twice, which is not necessary. In order to
get a behavior in parity with the SQL commands ANALYZE, VACUUM and
VACUUM ANALYZE, we need to be more careful about the addition of
RELKIND_PARTITIONED_TABLE to the filtering clause.

Ah, thanks. The difference lies in how VACUUM and vacuumdb handle
table names. VACUUM collects all names automatically, while the
vacuumdb specifies individual table names. The difference in handling
table names seems to be due to vacuumdb's certain options that need to
be checked against each table at the client side, specifically
--min-xid-age and min-mxid-age.

It might be nice if we included these options in VACUUM/ANALYZE's
syntax. Then vacuumdb wouldn't have to explicity gather table names.

On the other hand, regarding the existing versions. It would make
sense to allow partitioned tables only for analyze-only
cases. However, in the case of vacuum-analyze, we rather need to
exclude children. Therefore, it might be a better approach to always
exlclude children from the target relation list only if the parent is
present in the list. Anyway, I don't find a simple way to do that for
now.

So, the simplest measure for the issue would be to add the description
about the restriction to the documentation..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#6)
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

At Fri, 26 May 2023 16:49:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

present in the list. Anyway, I don't find a simple way to do that for
now.

For 12 or later, pg_partition_ancestors() is available. Thus something
like the following query would work.

+WITH inh_children AS
+ (SELECT tc.relid::name, t.nspname FROM
+  (SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
+   JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid
+   LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid
+   WHERE c.relkind OPERATOR(pg_catalog.=) 'p') t,
+  LATERAL
+   (SELECT tt.relid, t.nspname
+    FROM pg_catalog.pg_partition_tree(t.relname::text) tt WHERE isleaf) tc)

SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid
LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid
WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm', 'p'])
+ AND NOT EXISTS (SELECT * FROM inh_children i WHERE i.relid = c.relname AND i.nspname = ns.nspname)

ORDER BY c.relpages DESC;

The lines prefixed by a '+' removes other than the topmost-parents of trees.

For 11, we need to do the same withouth using the function.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#8Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#7)
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

On Fri, May 26, 2023 at 06:02:07PM +0900, Kyotaro Horiguchi wrote:

For 12 or later, pg_partition_ancestors() is available. Thus something
like the following query would work.

Yes, we will need to be a bit more imaginative for older servers :(

Any pre-11 logic added had better mention the compatibility tweak it
required, and that pg_partition_ancestors() could be used to simplify
the filtering a lot for the analyze case. It would be good to add
some test cases to check all the patterns we would need to look after,
for both the VACUUM and ANALYZE cases. Perhaps with vacuumdb
--verbose and a pattern match check of the logs produced, with at
least 2 level of partitioning to check after the case of ANALYZE run
across a rather complex partitioning tree?
--
Michael