Autovacuum on Partitioned Tables

Started by Ryan Ruenroengover 3 years ago7 messagesgeneral
Jump to latest
#1Ryan Ruenroeng
rruenroeng@gmail.com

Hello there!

There is a statement in the Postgres Docs: "Partitioned tables are not
processed by autovacuum."

What does the above statement mean?
Does autovacuum not process both the parent and the child tables in a
partition relationship?

What is the definition of a partitioned table?

I have a table with 50+ million rows that gets data added to/wiped from it
every 90 days. We are planning to break this table into a few thousand
partitions. More partitions will likely be added in the future, but we
don't have plans to delete any of the partitions. Will we need to manually
track the statistics of these partitions and manually vacuum the tables or
will autovacuum help to manage them?

Best,
Ryan

Ryan N Ruenroeng (He/His)
(717) 578-3929
| RRuenroeng@gmail.com
| Madison, WI <https://maps.google.com/?q=Madison,%20WI&gt;
<https://github.com/rruenroeng&gt; <https://www.facebook.com/ryan.ruenroeng&gt;
<https://www.linkedin.com/in/ryan-ruenroeng&gt;

#2Ron
ronljohnsonjr@gmail.com
In reply to: Ryan Ruenroeng (#1)
Re: Autovacuum on Partitioned Tables

On 10/31/22 17:26, Ryan Ruenroeng wrote:

Hello there!

There is a statement in the Postgres Docs: "Partitioned tables are not
processed by autovacuum."

What does the above statement mean?
Does autovacuum not process both the parent and the child tables in a
partition relationship?

Partitioned (aka parent) tables are "virtual".  There's nothing to vacuum
are analyze.

What is the definition of a partitioned table?

I have a table with 50+ million rows that gets data added to/wiped from it
every 90 days. We are planning to break this table into a few thousand
partitions.

1. That's a lot of partitions.  Older (like v12) query optimizers don't do a
great job handle that many partitions.
2. Be careful what you partition on.  (We added "partition_date" to PKs so
as to partition by date, even though the "real" PK is a synthetic key; it
was a query performance *disaster* in Pg 12.)

More partitions will likely be added in the future, but we don't have
plans to delete any of the partitions. Will we need to manually track the
statistics of these partitions and manually vacuum the tables or will
autovacuum help to manage them?

Autovacuum will handle it.

--
Angular momentum makes the world go 'round.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron (#2)
Re: Autovacuum on Partitioned Tables

Ron <ronljohnsonjr@gmail.com> writes:

On 10/31/22 17:26, Ryan Ruenroeng wrote:

I have a table with 50+ million rows that gets data added to/wiped from it
every 90 days. We are planning to break this table into a few thousand
partitions.

1. That's a lot of partitions.  Older (like v12) query optimizers don't do a
great job handle that many partitions.

Newer ones don't either, if you are incautious enough to issue a query
that touches all/most of the partitions --- or even just that the
planner cannot prove doesn't touch all/most of the partitions. So
that sort of setup hinges critically on having very stylized queries
that you've vetted in advance. We'll probably continue to make
incremental improvements in how well things work with lots of
partitions, but I don't foresee it ever becoming a cost-free thing.

TBH, if you've got 50m rows, I'm not sure you need partitions at all.
You sure as heck do not need "a few thousand" of them.

You should be basing your partitioning design on predictable maintenance
operations that will match your partition boundaries. For example,
if your business requirement is to keep five years' worth of records
and you want to drop the oldest month's records in bulk once a month,
then partitioning by month would be pretty helpful to make those drops
cheap. That would lead to 60 active partitions which is entirely
reasonable.

regards, tom lane

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Ryan Ruenroeng (#1)
Re: Autovacuum on Partitioned Tables

On Mon, 2022-10-31 at 17:26 -0500, Ryan Ruenroeng wrote:

There is a statement in the Postgres Docs: "Partitioned tables are not processed by autovacuum."

What does the above statement mean?
Does autovacuum not process both the parent and the child tables in a partition relationship? 

I agree that this is unclear and have proposed an improvement:

/messages/by-id/1fd81ddc7710a154834030133c6fea41e55c8efb.camel@cybertec.at

Yours,
Laurenz Albe

#5Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#3)
Re: Autovacuum on Partitioned Tables

On 10/31/22 23:05, Tom Lane wrote:
[snip]

TBH, if you've got 50m rows, I'm not sure you need partitions at all.

Big rows (i.e. document storage tables with bytea or xml fields) can make
databases explode in size even with only 50M rows.

(Yes, I know the arguments against it, but it works quite well when the
database is in a cloud instance.  Worries about backup times, at least, are
eliminated.)

--
Angular momentum makes the world go 'round.

#6Ryan Ruenroeng
rruenroeng@gmail.com
In reply to: Ron (#5)
Re: Autovacuum on Partitioned Tables

Thanks all for your responses. We have a couple of tables. Each with
50-70mil rows currently, but they are expected to grow. Partitioning seems
to be a better long-term strategy, queries to these tables, using their
existing indexes, leaves them basically unusable (loooong run times).

FYI, we are currently on PG v11.16

Let's assume we do go with my current proposal because it is inline with
how we are planning to drop partitions that stop seeing activity in the
future:

[snip] from Ron's post above:
Autovacuum will handle it.

I still have some doubts based on this:

"Tuples changed in partitions and inheritance children do not trigger
analyze on the parent table. If the parent table is empty or rarely
changed, it may never be processed by autovacuum, and the statistics
for the inheritance tree as a whole won't be collected. It is
necessary to run ANALYZE* on the parent table manually in order to
keep the statistics up to date." *[Link
<https://www.postgresql.org/docs/15/routine-vacuuming.html&gt;]

Q1: Will we at least need to call Analyze via a cron job on the parent
table to ensure that the statistics are up to date for autovacuum to
catch the tables?

From reading the documentation that a few of you have pointed me to,
I'm led to believe that the parent table is the "Partition" table. The
children tables are treated by the autovacuum as tables

Q2: Autovacuum will act on the partitions/children to the parent
table. *Is that a correct statement?*

--
It's good to know that the query optimizer will improve with partitions on
versions 12+. Thank you.

Best,
Ryan

Ryan N Ruenroeng (He/His)
(717) 578-3929
| RRuenroeng@gmail.com
| Madison, WI <https://maps.google.com/?q=Madison,%20WI&gt;
<https://github.com/rruenroeng&gt; <https://www.facebook.com/ryan.ruenroeng&gt;
<https://www.linkedin.com/in/ryan-ruenroeng&gt;

On Tue, Nov 1, 2022 at 2:54 AM Ron <ronljohnsonjr@gmail.com> wrote:

Show quoted text

On 10/31/22 23:05, Tom Lane wrote:
[snip]

TBH, if you've got 50m rows, I'm not sure you need partitions at all.

Big rows (i.e. document storage tables with bytea or xml fields) can make
databases explode in size even with only 50M rows.

(Yes, I know the arguments against it, but it works quite well when the
database is in a cloud instance. Worries about backup times, at least,
are
eliminated.)

--
Angular momentum makes the world go 'round.

#7Ron
ronljohnsonjr@gmail.com
In reply to: Ryan Ruenroeng (#6)
Re: Autovacuum on Partitioned Tables

On 11/1/22 10:11, Ryan Ruenroeng wrote:T

Thanks all for your responses. We have a couple of tables. Each with
50-70mil rows currently, but they are expected to grow. Partitioning seems
to be a better long-term strategy, queries to these tables, using their
existing indexes, leaves them basically unusable (loooong run times).

FYI, we are currently on PG v11.16

Let's assume we do go with my current proposal because it is inline with
how we are planning to drop partitions that stop seeing activity in the
future:

[snip] from Ron's post above:
Autovacuum will handle it.

I still have some doubts based on this:
"Tuples changed in partitions and inheritance children do not trigger
analyze on the parent table. If the parent table is empty or rarely
changed, it may never be processed by autovacuum, and the statistics for
the inheritance tree as a whole won't be collected. It is necessary to run
|ANALYZE|/ on the parent table manually in order to keep the statistics up to
date." /[Link <https://www.postgresql.org/docs/15/routine-vacuuming.html&gt;]
Q1: Will we at least need to call Analyze via a cron job on the parent
table to ensure that the statistics are up to date for autovacuum to catch
the tables?

Tuning auto-analyze variables is supposed to obviate the need for manual
vacuuming *analyzing*.  I do manual vacuuming *anayzing* anyway in a cron
job. "This set of tables have had more than X amount of changes, so I'll
/explictly/ analyze them."

Once a week, I do bloat calculations and vacuum based on that.

From reading the documentation that a few of you have pointed me to, I'm
led to believe that the parent table is the "Partition" table.

"\d" will show you exactly what you need to know.

The children tables are treated by the autovacuum as tables

Child tables *are* tables.

Q2: Autovacuum will act on the partitions/children to the parent table.
*Is that a correct statement?*

Parents are meta-tables.  There's nothing to vacuum or analyze.

--
It's good to know that the query optimizer will improve with partitions on
versions 12+. Thank you.

Best,
Ryan

Ryan N Ruenroeng (He/His)
(717) 578-3929 <tel:(717)+578-3929>

| RRuenroeng@gmail.com

| Madison, WI <https://maps.google.com/?q=Madison,%20WI&gt;

<https://github.com/rruenroeng&gt; <https://www.facebook.com/ryan.ruenroeng&gt;
<https://www.linkedin.com/in/ryan-ruenroeng&gt;

On Tue, Nov 1, 2022 at 2:54 AM Ron <ronljohnsonjr@gmail.com> wrote:

On 10/31/22 23:05, Tom Lane wrote:
[snip]

TBH, if you've got 50m rows, I'm not sure you need partitions at all.

Big rows (i.e. document storage tables with bytea or xml fields) can make
databases explode in size even with only 50M rows.

(Yes, I know the arguments against it, but it works quite well when the
database is in a cloud instance.  Worries about backup times, at
least, are
eliminated.)

--
Angular momentum makes the world go 'round.

--
Angular momentum makes the world go 'round.