pg_restore scan

Started by R Wahyudi · 7 months ago · 14 messages · general
#1R Wahyudi
rwahyudi@gmail.com

I'm trying to troubleshoot a slowness issue with pg_restore and stumbled
across a recent post about pg_restore scanning the whole file:

"scanning happens in a very inefficient way, with many seek calls and
small block reads. Try strace to see them. This initial phase can take
hours in a huge dump file, before even starting any actual restoration."

see: /messages/by-id/E48B611D-7D61-4575-A820-B2C3EC2E0551@gmx.net
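The scan described there can be watched directly. A minimal sketch, assuming Linux with strace installed and a pg_restore already running:

```shell
# Attach to the newest pg_restore process and trace its file access.
# During the initial TOC scan you would expect long runs of lseek()
# interleaved with small read() calls as it hunts for data blocks.
strace -f -e trace=lseek,read -p "$(pgrep -n pg_restore)"
```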

I'm currently having the same issue.

In the early stage of the restoration I can see a lot of disk write
activity, but as time goes by the disk writes taper off.
I can see the COPY processes in postgres, but they use no CPU; the
processes that do use CPU are the pg_restore processes.

I can recreate this issue when restoring a specific table to stdout,
e.g.:

pg_restore -vvvv -t <some_table_at_the> DB.pgdump -f -

If the table is at the bottom of the TOC it takes hours before I get a
result, but I get an almost immediate result when the table is at the top.
Parallel restore suffers from the same issue, where each process has to
perform a scan for each table.

What is the best way to speed up the restore ?

More info about my environment :
pg_restore (PostgreSQL) 17.6

Archive :
; Archive created at 2025-09-16 16:08:28 AEST
; dbname: DB
; TOC Entries: 8221
; Compression: none
; Dump Version: 1.14-0
; Format: CUSTOM
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 14.15
; Dumped by pg_dump version: 14.19 (Ubuntu 14.19-1.pgdg22.04+1)
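As an aside, the fields in that listing come straight from the first bytes of the archive. A quick sanity check, as a sketch assuming a POSIX shell (the stand-in bytes mimic the header layout; point `file=` at a real dump instead):

```shell
# A custom-format archive begins with the magic string "PGDMP",
# followed by single bytes for the dump version (1.14-0 here) and
# the integer/offset sizes shown in the listing above.
file=/tmp/fake.pgdump
printf 'PGDMP\001\016\000\004\010' > "$file"   # magic + 1,14,0 + 4 + 8
head -c 5 "$file"    # a real custom-format dump also starts with PGDMP
```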

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: R Wahyudi (#1)
Re: pg_restore scan


This was for pg_dump output that was streamed to a Borg archive and as
result had no object offsets in the TOC.

How are you doing your pg_dump?

--
Adrian Klaver
adrian.klaver@aklaver.com

#3R Wahyudi
rwahyudi@gmail.com
In reply to: Adrian Klaver (#2)
Re: pg_restore scan

pg_dump was done using the following command :
pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>


#4Ron
ronljohnsonjr@gmail.com
In reply to: R Wahyudi (#3)
Re: pg_restore scan

So, piping or redirecting to a file? If so, then that's the problem.

pg_dump directly to a file puts file offsets in the TOC.

This is how I do custom dumps:
cd $BackupDir
pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump 2> ${db}.log


--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: R Wahyudi (#3)
Re: pg_restore scan

On 9/16/25 17:54, R Wahyudi wrote:

pg_dump was done using the following command :
pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>

What do you do with the output?


#6R Wahyudi
rwahyudi@gmail.com
In reply to: Ron (#4)
Re: pg_restore scan

Sorry for not including the full command - yes, it's piped to a
compression command:
| lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>

I think we found the issue! I'll do further testing and see how it goes!
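For anyone hitting the same thing, one workaround (a sketch; host, user, database, and thread count are placeholders, and it assumes lbzip2 is still wanted for compression) is to let pg_dump write the archive directly to a file, so it can rewrite the TOC with data offsets, and compress afterwards:

```shell
# Dump straight to a regular file: pg_dump can then seek back and
# fill in the per-table data offsets in the TOC.
pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database> -f db.pgdump

# Compress as a separate step (decompress again before restoring,
# since pg_restore -j needs a seekable regular file).
lbzip2 -n 8 db.pgdump        # produces db.pgdump.bz2
```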


#7Ron
ronljohnsonjr@gmail.com
In reply to: R Wahyudi (#6)
Re: pg_restore scan

PG 17 has integrated zstd compression, while --format=directory lets you do
multi-threaded dumps. That's much faster than a single-threaded pg_dump
into a multi-threaded compression program.

(If for _Reasons_ you require a single-file backup, then tar the directory
of compressed files using the --remove-files option.)
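A sketch of that approach (database name, job count, and file names are placeholders; --remove-files assumes GNU tar):

```shell
# Multi-threaded directory-format dump with built-in zstd compression.
pg_dump -Fd -j 8 --compress=zstd -d mydb -f mydb.dir

# Optional: pack into a single file, deleting the directory as it goes.
tar -cf mydb.tar --remove-files mydb.dir

# Directory format also restores in parallel, with no archive scan:
# pg_restore -j 8 -d mydb mydb.dir
```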


#8R Wahyudi
rwahyudi@gmail.com
In reply to: Ron (#7)
Re: pg_restore scan

Hi All,

Thanks for the quick and accurate response! I've never been so happy
seeing IOwait on my system!

I might be blind, as I can't find information about 'offset' in the
pg_dump documentation.
Where can I find more info about this?

Regards,
Rianto


#9Ron
ronljohnsonjr@gmail.com
In reply to: R Wahyudi (#8)
Re: pg_restore scan

It's towards the end of this long mailing-list thread from a couple of
weeks ago.

https://www.postgrespro.com/list/id/s0491qrn-343s-0757-8sn5-120rr8610qqq@tzk.arg


#10Adrian Klaver
adrian.klaver@aklaver.com
In reply to: R Wahyudi (#8)
Re: pg_restore scan

On 9/18/25 05:58, R Wahyudi wrote:

Hi All,

Thanks for the quick and accurate response! I've never been so happy
seeing IOwait on my system!

Because?

What did you find?

I might be blind as  I can't find information about 'offset' in pg_dump
documentation.
Where can I find more info about this?

It is not in the user documentation.

From the thread Ron referred to, there is an explanation here:

/messages/by-id/366773.1756749256@sss.pgh.pa.us

I believe the actual code, for the -Fc format, is in pg_backup_custom.c
here:

https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723

Per comment at line 755:

"
If possible, re-write the TOC in order to update the data offset
information. This is not essential, as pg_restore can cope in most
cases without it; but it can make pg_restore significantly faster
in some situations (especially parallel restore). We can skip this
step if we're not dumping any data; there are no offsets to update
in that case.
"


#11R Wahyudi
rwahyudi@gmail.com
In reply to: Adrian Klaver (#10)
Re: pg_restore scan

I'm given a database dump file daily and have been asked to restore it.
I tried everything I could to speed up the process, including using -j 40.

I discovered that at a later stage of the restore process, the
following behaviour repeated a few times:
40 x pg_restore processes at 100% CPU
40 x postgres processes doing COPY but using 0% CPU
..... and zero disk write activity

I don't see this behaviour when restoring the database that was dumped with
-Fd.
Also with an un-piped backup file, I can restore a specific table without
having to wait for hours.


#12Adrian Klaver
adrian.klaver@aklaver.com
In reply to: R Wahyudi (#11)
Re: pg_restore scan


From the docs:

https://www.postgresql.org/docs/current/app-pgrestore.html

"
-j number-of-jobs

Only the custom and directory archive formats are supported with this
option. The input must be a regular file or directory (not, for example,
a pipe or standard input). Also, multiple jobs cannot be used together
with the option --single-transaction.
"


#13R Wahyudi
rwahyudi@gmail.com
In reply to: Adrian Klaver (#12)
Re: pg_restore scan

The input must be a regular file or directory (not, for example, a pipe
or standard input).

Thanks again for the pointer!

I successfully ran a parallel restore with no warnings.
I didn't really pay attention to how the dump was taken until I
accidentally stumbled upon your post.

Regards,
Rianto

On Fri, 19 Sept 2025 at 07:45, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 9/18/25 2:36 PM, R Wahyudi wrote:

I've been given a database dump file daily and I've been asked to
restore it.
I tried everything I could to speed up the process, including using -j

40.

I discovered that at the later stage of the restore process, the
following behaviour repeated a few times :
40 x pg_restore process doing 100% CPU
40 x postgres process doing COPY but using 0% CPU
..... and zero disk write activity

I don't see this behaviour when restoring the database that was dumped
with -Fd.
Also with an un-piped backup file, I can restore a specific table
without having to wait for hours.

From the docs:

https://www.postgresql.org/docs/current/app-pgrestore.html

"
-j number-of-jobs

Only the custom and directory archive formats are supported with this
option. The input must be a regular file or directory (not, for example,
a pipe or standard input). Also, multiple jobs cannot be used together
with the option --single-transaction.
"

--

On Fri, 19 Sept 2025 at 01:54, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 9/18/25 05:58, R Wahyudi wrote:

Hi All,

Thanks for the quick and accurate response! I have never been so happy
seeing IOwait on my system!

Because?

What did you find?

I might be blind as I can't find information about 'offset' in pg_dump
documentation.
Where can I find more info about this?

It is not in the user documentation.

From the thread Ron referred to, there is an explanation here:

https://www.postgresql.org/message-id/366773.1756749256%40sss.pgh.pa.us

I believe the actual code, for the -Fc format, is in pg_backup_custom.c
here:

https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723

Per comment at line 755:

"
If possible, re-write the TOC in order to update the data offset
information. This is not essential, as pg_restore can cope in most
cases without it; but it can make pg_restore significantly faster
in some situations (especially parallel restore). We can skip this
step if we're not dumping any data; there are no offsets to update
in that case.
"

Regards,
Rianto

On Wed, 17 Sept 2025 at 13:48, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

PG 17 has integrated zstd compression, while --format=directory lets
you do multi-threaded dumps. That's much faster than a single-threaded
pg_dump into a multi-threaded compression program.

(If for _Reasons_ you require a single-file backup, then tar the
directory of compressed files using the --remove-files option.)

On Tue, Sep 16, 2025 at 10:50 PM R Wahyudi <rwahyudi@gmail.com> wrote:

Sorry for not including the full command - yes, it's piping to a
compression command :
| lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>

I think we found the issue! I'll do further testing and see how
it goes !

On Wed, 17 Sept 2025 at 11:02, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

So, piping or redirecting to a file? If so, then that's the
problem.

pg_dump directly to a file puts file offsets in the TOC.

This is how I do custom dumps:
cd $BackupDir
pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump 2> ${db}.log

On Tue, Sep 16, 2025 at 8:54 PM R Wahyudi <rwahyudi@gmail.com> wrote:

pg_dump was done using the following command :
pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>

On Wed, 17 Sept 2025 at 08:36, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 9/16/25 15:25, R Wahyudi wrote:

I'm trying to troubleshoot the slowness issue with pg_restore and
stumbled across a recent post about pg_restore scanning the whole file :

"scanning happens in a very inefficient way, with many seek calls and
small block reads. Try strace to see them. This initial phase can take
hours in a huge dump file, before even starting any actual restoration."

see : https://www.postgresql.org/message-id/E48B611D-7D61-4575-A820-B2C3EC2E0551%40gmx.net

This was for pg_dump output that was streamed to a Borg archive and as a
result had no object offsets in the TOC.

How are you doing your pg_dump?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com
#14Ron
ronljohnsonjr@gmail.com
In reply to: R Wahyudi (#11)
Re: pg_restore scan

On Thu, Sep 18, 2025 at 5:37 PM R Wahyudi <rwahyudi@gmail.com> wrote:

I've been given a database dump file daily and I've been asked to restore
it.
I tried everything I could to speed up the process, including using -j 40.

I discovered that at the later stage of the restore process, the
following behaviour repeated a few times :
40 x pg_restore process doing 100% CPU

Threads are not magic. IO and memory limitations still exist.

40 x postgres process doing COPY but using 0% CPU
..... and zero disk write activity

I don't see this behaviour when restoring the database that was dumped
with -Fd.
Also with an un-piped backup file, I can restore a specific table without
having to wait for hours.

We explained this three days ago. Heck, it's in this very email. Click
on "the three dots", scroll down a bit.
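One way to confirm that the stall is the client-side TOC scan rather than the server — a sketch following the strace suggestion from the originally quoted post, with DB.pgdump and some_table as placeholder names (Linux only):

```shell
# Count lseek/read syscalls while pg_restore extracts one table to a
# throwaway output file. A custom-format dump whose TOC lacks data
# offsets produces a long burst of small reads and seeks before any
# output appears.
strace -c -e trace=lseek,read \
    pg_restore -t some_table -f /dev/null DB.pgdump
```

With offsets present in the TOC, the syscall counts should drop sharply, since pg_restore can seek straight to the table's data block instead of scanning the archive.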

On Fri, 19 Sept 2025 at 01:54, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 9/18/25 05:58, R Wahyudi wrote:

Hi All,

Thanks for the quick and accurate response! I have never been so happy
seeing IOwait on my system!

Because?

What did you find?

I might be blind as I can't find information about 'offset' in pg_dump
documentation.
Where can I find more info about this?

It is not in the user documentation.

From the thread Ron referred to, there is an explanation here:

https://www.postgresql.org/message-id/366773.1756749256%40sss.pgh.pa.us

I believe the actual code, for the -Fc format, is in pg_backup_custom.c
here:

https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723

Per comment at line 755:

"
If possible, re-write the TOC in order to update the data offset
information. This is not essential, as pg_restore can cope in most
cases without it; but it can make pg_restore significantly faster
in some situations (especially parallel restore). We can skip this
step if we're not dumping any data; there are no offsets to update
in that case.
"

Regards,
Rianto

On Wed, 17 Sept 2025 at 13:48, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

PG 17 has integrated zstd compression, while --format=directory lets
you do multi-threaded dumps. That's much faster than a single-threaded
pg_dump into a multi-threaded compression program.

(If for _Reasons_ you require a single-file backup, then tar the
directory of compressed files using the --remove-files option.)
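The two-step workflow above might look like this in practice — a sketch, assuming pg_dump 16 or later (for --compress=zstd), GNU tar, and placeholder names:

```shell
# Multi-threaded, zstd-compressed directory-format dump...
pg_dump -Fd -j 8 --compress=zstd -d mydb -f mydb.dir

# ...then roll the directory into a single file; --remove-files
# (GNU tar) deletes each file as it is archived, so peak disk usage
# stays bounded.
tar --remove-files -cf mydb.tar mydb.dir
```

Note that pg_restore cannot read this tar directly — untar it back into a directory first, then point pg_restore -j at that directory.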

On Tue, Sep 16, 2025 at 10:50 PM R Wahyudi <rwahyudi@gmail.com> wrote:

Sorry for not including the full command - yes, it's piping to a
compression command :
| lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>

I think we found the issue! I'll do further testing and see how
it goes !

On Wed, 17 Sept 2025 at 11:02, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

So, piping or redirecting to a file? If so, then that's the
problem.

pg_dump directly to a file puts file offsets in the TOC.

This is how I do custom dumps:
cd $BackupDir
pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump 2> ${db}.log

On Tue, Sep 16, 2025 at 8:54 PM R Wahyudi <rwahyudi@gmail.com> wrote:

pg_dump was done using the following command :
pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>

On Wed, 17 Sept 2025 at 08:36, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 9/16/25 15:25, R Wahyudi wrote:

I'm trying to troubleshoot the slowness issue with pg_restore and
stumbled across a recent post about pg_restore scanning the whole file :

"scanning happens in a very inefficient way, with many seek calls and
small block reads. Try strace to see them. This initial phase can take
hours in a huge dump file, before even starting any actual restoration."

see : https://www.postgresql.org/message-id/E48B611D-7D61-4575-A820-B2C3EC2E0551%40gmx.net

This was for pg_dump output that was streamed to a Borg archive and as a
result had no object offsets in the TOC.

How are you doing your pg_dump?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!