pg_dump: VACUUM and REINDEXING

Started by Hasan Marzooqalmost 4 years ago8 messagesgeneral
Jump to latest
#1Hasan Marzooq
engr.naqvi@gmail.com

Hello!

I've some questions around Backup & Restore.

1: Is it necessary to perform a VACUUM and REINDEXING operation after
restoring the dump from Postgres 9.6 to Postgres 13? The dump size could be
1/2 TB to 1 TB.

2: Also, are there any other operations that are recommended to perform
after pg_restore?

3: What is the minimum required disk space if taking a dump on the same
machine where the source database exists? Is it the "size of the current
data folder x 2"?

Thanks.
Hasan

#2Guillaume Lelarge
guillaume@lelarge.info
In reply to: Hasan Marzooq (#1)
Re: pg_dump: VACUUM and REINDEXING

Hi,

Le sam. 7 mai 2022 à 04:36, Hasan Marzooq <engr.naqvi@gmail.com> a écrit :

Hello!

I've some questions around Backup & Restore.

1: Is it necessary to perform a VACUUM and REINDEXING operation after
restoring the dump from Postgres 9.6 to Postgres 13? The dump size could be
1/2 TB to 1 TB.

You can perform a VACUUM and an ANALYZE right after restoring, but you
definitely shouldn't REINDEX.

2: Also, are there any other operations that are recommended to perform
after pg_restore?

I don't think you need anything else.

3: What is the minimum required disk space if taking a dump on the same
machine where the source database exists? Is it the "size of the current
data folder x 2"?

There's definitely no rules like that. It's impossible to know before doing
it.

--
Guillaume.

#3Ron
ronljohnsonjr@gmail.com
In reply to: Hasan Marzooq (#1)
Re: pg_dump: VACUUM and REINDEXING

On 5/6/22 21:35, Hasan Marzooq wrote:

Hello!

I've some questions around Backup & Restore.

1: Is it necessary to perform a VACUUM and REINDEXING operation after
restoring the dump from Postgres 9.6 to Postgres 13? The dump size could
be 1/2 TB to 1 TB.

Perform VACUUM after there have been many updates and deletes. There have
been zero updates and deleted after pg_restore; therefore, *no need to vacuum*.

pg_restore loads all tables and then builds all indices.  Thus, *no need to
reindex*.

2: Also, are there any other operations that are recommended to perform
after pg_restore?

ANALYZE all tables.

https://www.postgresql.org/docs/13/app-vacuumdb.html

vacuumdb --dbname=whatever --jobs=`nproc` --analyze-only

3: What is the minimum required disk space if taking a dump on the same
machine where the source database exists? Is it the "size of the current
data folder x 2"?

Probably much less, but maybe (if, for example, you store lots of images
(JPEG, TIFF, PDF, etc) in bytea columns.

Whatever you do, make sure to run pg_dump with these options:
--format=directory --jobs=`nproc`
https://www.postgresql.org/docs/13/app-pgdump.html

--
Angular momentum makes the world go 'round.

#4Guillaume Lelarge
guillaume@lelarge.info
In reply to: Ron (#3)
Re: pg_dump: VACUUM and REINDEXING

Le sam. 7 mai 2022 à 10:21, Ron <ronljohnsonjr@gmail.com> a écrit :

On 5/6/22 21:35, Hasan Marzooq wrote:

Hello!

I've some questions around Backup & Restore.

1: Is it necessary to perform a VACUUM and REINDEXING operation after
restoring the dump from Postgres 9.6 to Postgres 13? The dump size could be
1/2 TB to 1 TB.

Perform VACUUM after there have been many updates and deletes. There have
been zero updates and deleted after pg_restore; therefore, *no need to
vacuum*.

I disagree. You're right about the "zero updates and deletes", so no need
to vacuum for bloat. But you need vacuum to get the visibility map of each
relation, so that the planner can use index-only scans.

--
Guillaume.

#5Hasan Marzooq
engr.naqvi@gmail.com
In reply to: Guillaume Lelarge (#4)
Re: pg_dump: VACUUM and REINDEXING

Hello!

Thanks Guillaume and Ron!

I understand REINDEXING is not required, and as Guillaume highlighted,
vacuum will still be needed after pg_restore.

Is it ok to perform a "standard" vacuum or do we need a "FULL" vacuum after
pg_restore?

Also, I think finding tables which have dead rows and then performing
vacuum on those tables only to save some time/processing here.

@Ron: Yes, we're using --jobs=`nproc` and it has significantly improved the
pg_dump/pg_restore processes. I see there is a similar option "parallel'
with VACUUM as well.

Thanks!

Hasan

On Sat, 7 May 2022 at 18:07, Guillaume Lelarge <guillaume@lelarge.info>
wrote:

Show quoted text

Le sam. 7 mai 2022 à 10:21, Ron <ronljohnsonjr@gmail.com> a écrit :

On 5/6/22 21:35, Hasan Marzooq wrote:

Hello!

I've some questions around Backup & Restore.

1: Is it necessary to perform a VACUUM and REINDEXING operation after
restoring the dump from Postgres 9.6 to Postgres 13? The dump size could be
1/2 TB to 1 TB.

Perform VACUUM after there have been many updates and deletes. There
have been zero updates and deleted after pg_restore; therefore, *no need
to vacuum*.

I disagree. You're right about the "zero updates and deletes", so no need
to vacuum for bloat. But you need vacuum to get the visibility map of each
relation, so that the planner can use index-only scans.

--
Guillaume.

#6Guillaume Lelarge
guillaume@lelarge.info
In reply to: Hasan Marzooq (#5)
Re: pg_dump: VACUUM and REINDEXING

Le sam. 7 mai 2022 à 15:27, Hasan Marzooq <engr.naqvi@gmail.com> a écrit :

Hello!

Thanks Guillaume and Ron!

I understand REINDEXING is not required, and as Guillaume highlighted,
vacuum will still be needed after pg_restore.

Is it ok to perform a "standard" vacuum or do we need a "FULL"
vacuum after pg_restore?

You don't need VACUUM FULL.

Also, I think finding tables which have dead rows and then performing
vacuum on those tables only to save some time/processing here.

Finding dead rows in a table is interesting, but not right after a
pg_restore. pg_dump only dumps live tuples, so there won't be any dead rows
right after pg_restore.

--
Guillaume.

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Guillaume Lelarge (#4)
Re: pg_dump: VACUUM and REINDEXING

On Sat, 2022-05-07 at 12:06 +0200, Guillaume Lelarge wrote:

Le sam. 7 mai 2022 à 10:21, Ron <ronljohnsonjr@gmail.com> a écrit :

 On 5/6/22 21:35, Hasan Marzooq wrote:

I've some questions around Backup & Restore.

 1: Is it necessary to perform a VACUUM and REINDEXING operation after restoring the
dump from Postgres 9.6 to Postgres 13? The dump size could be 1/2 TB to 1 TB.

Perform VACUUM after there have been many updates and deletes.  There have been zero
updates and deleted after pg_restore; therefore, no need to vacuum.

I disagree. You're right about the "zero updates and deletes", so no need to vacuum for bloat.
But you need vacuum to get the visibility map of each relation, so that the planner can use index-only scans.

But from PostgreSQL v13 on, autovacuum is also triggered by INSERTs.
So I'd say that there is nothing to do after restoring a pg_dump, except
to wait until autovacuum is done.

Yours,
Laurenz Albe

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#7)
Re: pg_dump: VACUUM and REINDEXING

Laurenz Albe <laurenz.albe@cybertec.at> writes:

But from PostgreSQL v13 on, autovacuum is also triggered by INSERTs.
So I'd say that there is nothing to do after restoring a pg_dump, except
to wait until autovacuum is done.

You might want to do manual VACUUM ANALYZE (no need for FULL) if
you don't want to wait around for autovacuum to get to it.

regards, tom lane