Experience and feedback on pg_restore --data-only

Started by Dimitrios Apostolou · about 1 year ago · 17 messages · general

Rationale:

When restoring a backup in an emergency situation, it's fine to run
pg_restore as superuser and get an exact replica of the dumped db.
AFAICT pg_restore (without --data-only) is optimised for such a case.

But pg_dump/restore can be used as a generic data-copying utility, and in
those cases it often makes sense to get rid of the churn and create a
clean database by running the SQL schema definition from version control,
and then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the
user who owns the database (dbowner), not as a superuser, in order to
avoid changes being introduced under the radar.

Things that made my life hard:

* plenty of permission denials for both ALTER OWNER or SET SESSION
AUTHORIZATION (depending on command line switches). Both of these
require superuser privilege, but in my case this is not really needed.
Dbowner has CREATEROLE and is the one who creates all the roles (WITH
SET TRUE), and their private schemata in the specific database. Things
would work if pg_restore did "SET ROLE" instead of "SET SESSION
AUTHORIZATION" to switch user. Is this a straightforward change, or are
there issues I don't see?
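
To illustrate the difference with a minimal sketch (the role and schema
names here, alice and alice_schema, are hypothetical; dbowner is assumed
to have created alice WITH SET TRUE):

```sql
-- Requires superuser (or already being alice), so it fails for dbowner:
SET SESSION AUTHORIZATION alice;

-- Only requires membership in alice with the SET option, which dbowner has:
SET ROLE alice;
CREATE TABLE alice_schema.t (id int);  -- created owned by alice
RESET ROLE;
```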

* After each failed attempt, I need to issue a TRUNCATE table1,table2,...
before I try again. I wrote my own function for that. It would help if
pg_restore would optionally truncate before COPY. I believe this would
require superuser privilege, which could be obtained through the
--superuser=username option that is used today for disabling the triggers.
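
The helper I wrote does roughly the following (a sketch, not the exact
function; the schema filter is an assumption to adjust to the schemas
actually being restored):

```sql
-- Truncate every user table in one statement before retrying the restore.
DO $$
DECLARE
  tables text;
BEGIN
  SELECT string_agg(format('%I.%I', schemaname, tablename), ', ')
    INTO tables
    FROM pg_tables
   WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
  IF tables IS NOT NULL THEN
    EXECUTE 'TRUNCATE ' || tables || ' CASCADE';
  END IF;
END $$;
```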

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
set wal_level=minimal. I even wrote my own function to ALTER all
tables to UNLOGGED, but failed with "could not change table T to
unlogged because it references logged table". I'm out of ideas on this
one.
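
For reference, a sketch of what I tried and of one possible workaround
(table and constraint names are placeholders):

```sql
-- The direct approach fails when the table has a foreign key to a logged
-- table, because logged and unlogged tables cannot reference each other:
ALTER TABLE child SET UNLOGGED;
-- ERROR: could not change table "child" to unlogged
--        because it references logged table "parent"

-- Workaround sketch: drop the foreign keys first, flip the table, load,
-- then restore everything afterwards.
ALTER TABLE child DROP CONSTRAINT child_parent_fk;
ALTER TABLE child SET UNLOGGED;
-- ... bulk load ...
ALTER TABLE child SET LOGGED;
ALTER TABLE child ADD CONSTRAINT child_parent_fk
  FOREIGN KEY (parent_id) REFERENCES parent (id);
```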

* Indices: Could pg_restore have a switch to DROP indices before each
COPY, and re-CREATE them after, exactly as they were? This would speed
up the process quite a bit.
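
A sketch of how that could be scripted today with pg_indexes (the
saved_indexes table is a placeholder, and indexes backing primary-key or
unique constraints would need ALTER TABLE ... DROP CONSTRAINT instead of
a plain DROP INDEX):

```sql
-- Save the definitions of all user indexes, then drop them.
CREATE TABLE saved_indexes AS
  SELECT schemaname, indexname, indexdef
    FROM pg_indexes
   WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

DO $$
DECLARE r record;
BEGIN
  FOR r IN SELECT schemaname, indexname FROM saved_indexes LOOP
    EXECUTE format('DROP INDEX %I.%I', r.schemaname, r.indexname);
  END LOOP;
END $$;

-- ... run pg_restore --data-only ...

-- Rebuild each index exactly as it was defined.
DO $$
DECLARE r record;
BEGIN
  FOR r IN SELECT indexdef FROM saved_indexes LOOP
    EXECUTE r.indexdef;
  END LOOP;
END $$;
```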

Any feedback for improving my process? Should I put these ideas somewhere
as ideas for improvement on pg_restore?

Thank you in advance,
Dimitris

In reply to: Dimitrios Apostolou (#1)
Re: Experience and feedback on pg_restore --data-only

On Thu, 20 Mar 2025, Dimitrios Apostolou wrote:

Rationale:

When restoring a backup in an emergency situation, it's fine to run
pg_restore as superuser and get an exact replica of the dumped db.
AFAICT pg_restore (without --data-only) is optimised for such case.

But pg_dump/restore can be used as a generic data-copying utility, and in
those cases it makes often sense to get rid of the churn and create a clean
database by running the SQL schema definition from version control, and then
copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the user
who owns the database (dbowner), not as a superuser, in order to avoid
changes being introduced under the radar.

Another important reason I use --data-only: tablespaces. The new host has
different storage configuration and tablespaces and the schema has been
updated to reflect those. Therefore I must create the database using the
updated schema definitions on the new host.

Things that made my life hard:

* plenty of permission denials for both ALTER OWNER or SET SESSION
AUTHORIZATION (depending on command line switches). Both of these
require superuser privilege, but in my case this is not really needed.
Dbowner has CREATEROLE and is the one who creates all the roles (WITH
SET TRUE), and their private schemata in the specific database. Things
would work if pg_restore did "SET ROLE" instead of "SET SESSION
AUTHORIZATION" to switch user. Is this a straightforward change or there
are issues I don't see?

* After each failed attempt, I need to issue a TRUNCATE table1,table2,...
before I try again. I wrote my own function for that. It would help if
pg_restore would optionally truncate before COPY. I believe it would
require superuser privilege for it, that could achieve using the
--superuser=username option used today for disabling the triggers.

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
setting wal_level=minimal. I even wrote my own function to ALTER all
tables to UNLOGGED, but failed with "could not change table T to
unlogged because it references logged table". I'm out of ideas on this
one.

* Indices: Could pg_restore have a switch to DROP indices before each
COPY, and re-CREATE them after, exactly as they were? This would speed
up the process quite a bit.

Any feedback for improving my process? Should I put these ideas somewhere as
ideas for improvement on pg_restore?

Thank you in advance,
Dimitris

#3 Ron
ronljohnsonjr@gmail.com
In reply to: Dimitrios Apostolou (#2)
Re: Experience and feedback on pg_restore --data-only

On Fri, Mar 21, 2025 at 2:36 PM Dimitrios Apostolou <jimis@gmx.net> wrote:

On Thu, 20 Mar 2025, Dimitrios Apostolou wrote:

Rationale:

When restoring a backup in an emergency situation, it's fine to run
pg_restore as superuser and get an exact replica of the dumped db.

How often do you have emergencies requiring database restore???? In my
seven years managing PG systems, I've had TWO.

AFAICT pg_restore (without --data-only) is optimised for such case.

[snip]

Any feedback for improving my process?

Yes: don't use a logical backup tool like pg_dump to backup production
databases.

PgBackRest (the tool I have experience with; there are others, though) has
must-have features like PITR, incremental and differential backups, delta
restores and encryption. Use that instead.

To get rid of the cruft in your database, go through the schema and
manually drop unused tables.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Dimitrios Apostolou (#1)
Re: Experience and feedback on pg_restore --data-only

On 3/20/25 15:48, Dimitrios Apostolou wrote:

Rationale:

When restoring a backup in an emergency situation, it's fine to run
pg_restore as superuser and get an exact replica of the dumped db.
AFAICT pg_restore (without --data-only) is optimised for such case.

But pg_dump/restore can be used as a generic data-copying utility, and in
those cases it makes often sense to get rid of the churn and create a
clean database by running the SQL schema definition from version control,
and then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the
user who owns the database (dbowner), not as a superuser, in order to
avoid changes being introduced under the radar.

Things that made my life hard:

* plenty of permission denials for both ALTER OWNER or SET SESSION
  AUTHORIZATION (depending on command line switches).  Both of these
  require superuser privilege, but in my case this is not really needed.
  Dbowner has CREATEROLE and is the one who creates all the roles (WITH
  SET TRUE), and their private schemata in the specific database.  Things
  would work if pg_restore did "SET ROLE" instead of "SET SESSION
  AUTHORIZATION" to switch user. Is this a straightforward change or there
  are issues I don't see?

If this is --data-only, what are the ALTER OWNER and SET SESSION
AUTHORIZATION for?

* After each failed attempt, I need to issue a TRUNCATE table1,table2,...
  before I try again.  I wrote my own function for that. It would help if
  pg_restore would optionally truncate before COPY.  I believe it would
  require superuser privilege for it, that could achieve using the
  --superuser=username option used today for disabling the triggers.

That is what --clean is for, though it needs the objects (tables) to be
in the restore, i.e. not just --data-only.

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
  setting wal_level=minimal. I even wrote my own function to ALTER all
  tables to UNLOGGED, but failed with "could not change table T to
  unlogged because it references logged table".  I'm out of ideas on this
  one.

* Indices: Could pg_restore have a switch to DROP indices before each
  COPY, and re-CREATE them after, exactly as they were?  This would speed
  up the process quite a bit.

Any feedback for improving my process? Should I put these ideas somewhere
as ideas for improvement on pg_restore?

Thank you in advance,
Dimitris

--
Adrian Klaver
adrian.klaver@aklaver.com

#5 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Dimitrios Apostolou (#1)
Re: Experience and feedback on pg_restore --data-only

On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
   setting wal_level=minimal. I even wrote my own function to ALTER all
   tables to UNLOGGED, but failed with "could not change table T to
   unlogged because it references logged table".  I'm out of ideas on this
   one.

You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.
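
In SQL terms, the optimisation being referred to looks like the following
sketch (the table name and data path are placeholders; it only applies with
wal_level = minimal, which in turn requires max_wal_senders = 0):

```sql
BEGIN;
CREATE TABLE t (id int, payload text);
-- Because t was created in this same transaction, the bulk load can skip
-- WAL-logging the row data; only a final sync of the relation is needed.
COPY t FROM '/path/to/data.csv' (FORMAT csv);
COMMIT;
-- A TRUNCATE of an existing table inside the same transaction as the COPY
-- also qualifies for the optimisation.
```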

Any feedback for improving my process? Should I put these ideas somewhere
as ideas for improvement on pg_restore?

You put your ideas in the right place.

Yours,
Laurenz Albe

In reply to: Laurenz Albe (#5)
Re: Experience and feedback on pg_restore --data-only

On Sun, 23 Mar 2025, Laurenz Albe wrote:

On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
   setting wal_level=minimal. I even wrote my own function to ALTER all
   tables to UNLOGGED, but failed with "could not change table T to
   unlogged because it references logged table".  I'm out of ideas on this
   one.

You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.

That would restore the schema from the dump, while I want to create the
schema from the SQL code in version control.

Something that might work would be for pg_restore to issue a TRUNCATE
before the COPY. I believe this would require superuser privilege though,
which I would prefer to avoid. Currently I issue TRUNCATE for all tables
manually before running pg_restore, but of course this is in a different
transaction so it doesn't help.

By the way, do you see potential problems with using --single-transaction
to restore billion-row tables?

Thank you,
Dimitris

#7 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Dimitrios Apostolou (#6)
Re: Experience and feedback on pg_restore --data-only

On 3/24/25 07:24, Dimitrios Apostolou wrote:

On Sun, 23 Mar 2025, Laurenz Albe wrote:

On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
   setting wal_level=minimal. I even wrote my own function to ALTER all
   tables to UNLOGGED, but failed with "could not change table T to
   unlogged because it references logged table".  I'm out of ideas on this
   one.

You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.

That would restore the schema from the dump, while I want to create the
schema from the SQL code in version control.

I am not following, from your original post:

"
... create a
clean database by running the SQL schema definition from version
control, and then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the
user who owns the database (dbowner), not as a superuser, in order to
avoid changes being introduced under the radar.
"

You are running the process in two steps, where the first does not
involve pg_restore. Not sure why doing the pg_restore --data-only
portion in single transaction is not possible?

Something that might work, would be for pg_restore to issue a TRUNCATE
before the COPY. I believe this would require superuser privelege though,
that I would prefer to avoid. Currently I issue TRUNCATE for all tables
manually before running pg_restore, but of course this is in a different
transaction so it doesn't help.

By the way do you see potential problems with using --single-transaction
to restore billion-rows tables?

COPY is all or none (with a version 17+ caveat: see the ON_ERROR option
at https://www.postgresql.org/docs/current/sql-copy.html), so if the
data load fails in --single-transaction, everything rolls back.
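
For reference, the PostgreSQL 17 syntax looks like this (a sketch; table
and file are placeholders):

```sql
-- v17+: skip rows that fail data-type conversion instead of aborting.
COPY t FROM '/path/to/data.csv' (FORMAT csv, ON_ERROR ignore);
```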

Thank you,
Dimitris

--
Adrian Klaver
adrian.klaver@aklaver.com

#8 Thomas Kellerer
shammat@gmx.net
In reply to: Dimitrios Apostolou (#1)
Re: Experience and feedback on pg_restore --data-only

Dimitrios Apostolou schrieb am 20.03.2025 um 23:48:

Things that made my life hard:

* plenty of permission denials for both ALTER OWNER or SET SESSION
  AUTHORIZATION (depending on command line switches).  Both of these
  require superuser privilege, but in my case this is not really needed.
  Dbowner has CREATEROLE and is the one who creates all the roles (WITH
  SET TRUE), and their private schemata in the specific database.  Things
  would work if pg_restore did "SET ROLE" instead of "SET SESSION
  AUTHORIZATION" to switch user. Is this a straightforward change or there
  are issues I don't see?

I think that can be avoided by using --no-owner and --no-privileges
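
Concretely, something along these lines (a sketch; the dump file name,
database, and user are placeholders, and the two --no-* switches matter
mostly for runs that are not --data-only, where ownership and GRANT
statements would otherwise be emitted):

```shell
# Restore only the data, as dbowner, suppressing ownership changes and GRANTs.
pg_restore --data-only --no-owner --no-privileges \
           --username=dbowner --dbname=mydb dump.pgdump
```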

In reply to: Adrian Klaver (#7)
Re: Experience and feedback on pg_restore --data-only

On Mon, 24 Mar 2025, Adrian Klaver wrote:

On 3/24/25 07:24, Dimitrios Apostolou wrote:

On Sun, 23 Mar 2025, Laurenz Albe wrote:

On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite
having
   setting wal_level=minimal. I even wrote my own function to ALTER all
   tables to UNLOGGED, but failed with "could not change table T to
   unlogged because it references logged table".  I'm out of ideas on
this
   one.

You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.

That would restore the schema from the dump, while I want to create the
schema from the SQL code in version control.

I am not following, from your original post:

"
... create a
clean database by running the SQL schema definition from version control, and
then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the user
who owns the database (dbowner), not as a superuser, in order to avoid
changes being introduced under the radar.
"

You are running the process in two steps, where the first does not involve
pg_restore. Not sure why doing the pg_restore --data-only portion in single
transaction is not possible?

Laurenz informed me that I could avoid writing to the WAL if I "create and
load the table in a single transaction".
I haven't tried, but here is what I would do to try --single-transaction:

Transaction 1: manually issuing all of CREATE TABLE etc.

Transaction 2: pg_restore --single-transaction --data-only

The COPY command in transaction 2 would still need to write to WAL, since
it's separate from the CREATE TABLE.

Am I wrong somewhere?

Something that might work, would be for pg_restore to issue a TRUNCATE
before the COPY. I believe this would require superuser privilege though,
that I would prefer to avoid. Currently I issue TRUNCATE for all tables
manually before running pg_restore, but of course this is in a different
transaction so it doesn't help.

By the way do you see potential problems with using --single-transaction
to restore billion-rows tables?

COPY is all or none(version 17+ caveat(see
https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so if the
data dump fails in --single-transaction everything rolls back.

So if I restore all tables, then an error about a "table not found" would
not roll back already copied tables, since it's not part of a COPY?

Thank you for the feedback,
Dimitris

#10 Ron
ronljohnsonjr@gmail.com
In reply to: Dimitrios Apostolou (#9)
Re: Experience and feedback on pg_restore --data-only

Why are you regularly having emergencies requiring the restoration of
multi-TB tables to databases with lots of cruft?

Fixing that would go a long way towards eliminating your problems with
pg_restore.

On Mon, Mar 24, 2025 at 11:51 AM Dimitrios Apostolou <jimis@gmx.net> wrote:

On Mon, 24 Mar 2025, Adrian Klaver wrote:

On 3/24/25 07:24, Dimitrios Apostolou wrote:

On Sun, 23 Mar 2025, Laurenz Albe wrote:

On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
setting wal_level=minimal. I even wrote my own function to ALTER all
tables to UNLOGGED, but failed with "could not change table T to
unlogged because it references logged table". I'm out of ideas on this
one.

You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.

That would restore the schema from the dump, while I want to create the
schema from the SQL code in version control.

I am not following, from your original post:

"
... create a
clean database by running the SQL schema definition from version

control, and

then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the

user

who owns the database (dbowner), not as a superuser, in order to avoid
changes being introduced under the radar.
"

You are running the process in two steps, where the first does not

involve

pg_restore. Not sure why doing the pg_restore --data-only portion in

single

transaction is not possible?

Laurenz informed me that I could avoid writing to the WAL if I "create and
load the table in a single transaction".
I haven't tried, but here is what I would do to try --single-transaction:

Transaction 1: manually issuing all of CREATE TABLE etc.

Transaction 2: pg_restore --single-transaction --data-only

The COPY command in transaction 2 would still need to write to WAL, since
it's separate from the CREATE TABLE.

Am I wrong somewhere?

Something that might work, would be for pg_restore to issue a TRUNCATE
before the COPY. I believe this would require superuser privilege though,
that I would prefer to avoid. Currently I issue TRUNCATE for all tables
manually before running pg_restore, but of course this is in a different
transaction so it doesn't help.

By the way do you see potential problems with using --single-transaction
to restore billion-rows tables?

COPY is all or none(version 17+ caveat(see
https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so if the
data dump fails in --single-transaction everything rolls back.

So if I restore all tables, then an error about a "table not found" would
not roll back already copied tables, since it's not part of a COPY?

Thank you for the feedback,
Dimitris

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

In reply to: Adrian Klaver (#4)
Re: Experience and feedback on pg_restore --data-only

On Sun, 23 Mar 2025, Adrian Klaver wrote:

On 3/20/25 15:48, Dimitrios Apostolou wrote:

* plenty of permission denials for both ALTER OWNER or SET SESSION
  AUTHORIZATION (depending on command line switches).  Both of these
  require superuser privilege, but in my case this is not really needed.
  Dbowner has CREATEROLE and is the one who creates all the roles (WITH
  SET TRUE), and their private schemata in the specific database.  Things
  would work if pg_restore did "SET ROLE" instead of "SET SESSION
  AUTHORIZATION" to switch user. Is this a straightforward change or
there are issues I don't see?

If this is --data-only what are the ALTER OWNER and SET SESSION AUTHORIZATION
for?

You are probably right; early in my trials I was running pg_restore
without --data-only as a non-superuser, so the error message might come
from there. Unfortunately I haven't noted the exact command that produced
it.

The point still stands though. The dbowner user is the administrator for
this database, and has CREATEd the users with the right to SET ROLE as any
of them. Those other users own tables in their private schemas. But
pg_restore does SET SESSION AUTHORIZATION, which requires superuser
privilege, instead of SET ROLE. I wonder what the reasons are for that.

Maybe pg_restore could either:

- do SET ROLE instead of SET SESSION AUTHORIZATION
- temporarily use the --superuser powers just for issuing the ALTER ROLE.

Regards,
Dimitris

#12 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Dimitrios Apostolou (#9)
Re: Experience and feedback on pg_restore --data-only

On 3/24/25 08:51, Dimitrios Apostolou wrote:

On Mon, 24 Mar 2025, Adrian Klaver wrote:

On 3/24/25 07:24, Dimitrios Apostolou wrote:

 On Sun, 23 Mar 2025, Laurenz Albe wrote:

 On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:

 Performance issues: (important as my db size is >5TB)

 * WAL writes: I didn't manage to avoid writing to the WAL, despite having
    setting wal_level=minimal. I even wrote my own function to ALTER all
    tables to UNLOGGED, but failed with "could not change table T to
    unlogged because it references logged table".  I'm out of ideas on this
    one.

 You'd have to create an load the table in the same transaction, that is,
 you'd have to run pg_restore with --single-transaction.

 That would restore the schema from the dump, while I want to create the
 schema from the SQL code in version control.

I am not following, from your original post:

"
... create a
clean database by running the SQL schema definition from version control, and
then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the user
who owns the database (dbowner), not as a superuser, in order to avoid
changes being introduced under the radar.
"

You are running the process in two steps, where the first does not involve
pg_restore. Not sure why doing the pg_restore --data-only portion in single
transaction is not possible?

Laurenz informed me that I could avoid writing to the WAL if I "create and
load the table in a single transaction".

From here:

/messages/by-id/455d28421ae33c73b73a6f527d2f72816ca5dd29.camel@cybertec.at

What he said was:

"You'd have to create an load the table in the same transaction, that
is, you'd have to run pg_restore with --single-transaction."

Where I assume he meant '... create and load ...'. That is not the same
as what you are doing below.

I haven't tried, but here is what I would do to try --single-transaction:

Transaction 1: manually issuing all of CREATE TABLE etc.

Transaction 2: pg_restore --single-transaction --data-only

The COPY command in transaction 2 would still need to write to WAL, since
it's separate from the CREATE TABLE.

Am I wrong somewhere?

COPY is all or none(version 17+ caveat(see
https://www.postgresql.org/docs/current/sql-copy.html  ON_ERROR)), so
if the
data dump fails in --single-transaction everything rolls back.

So if I restore all tables, then an error about a "table not found" would
not roll back already copied tables, since it's not part of a COPY?

If you are following what you show above, then the tables and other
objects would be created manually from version control outside of
pg_restore, and on successful completion and commit of that transaction
they would persist until such time as you change them. The second step,
pg_restore --single-transaction --data-only, is where you could get 99% of
the way through and have a failure that rolls back all the data entered
into the tables.

Thank you for the feedback,
Dimitris

--
Adrian Klaver
adrian.klaver@aklaver.com

In reply to: Ron (#10)
Re: Experience and feedback on pg_restore --data-only

Hi Ron,

I read your reply in the mailing list archives as I'm not subscribed to
the list, and I'm copy-pasting a response here. Please include me as a
recipient in further replies.

Why are you regularly having emergencies requiring the restoration of
multi-TB tables to databases with lots of cruft?

Fixing that would go a long way towards eliminating your problems with
pg_restore.

I don't have emergencies yet. I'm testing the process of restoring the
database dump, and it takes more than 24 hours currently. A successful
test is vital to approve the process.

But the primary usage of pg_restore that I have is not to save me from
emergencies but to populate the dev database with recent data.

Regards,
Dimitris

#14 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Dimitrios Apostolou (#6)
Re: Experience and feedback on pg_restore --data-only

On Mon, 2025-03-24 at 15:24 +0100, Dimitrios Apostolou wrote:

By the way do you see potential problems with using --single-transaction
to restore billion-rows tables?

No.

Yours,
Laurenz Albe

#15 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Dimitrios Apostolou (#9)
Re: Experience and feedback on pg_restore --data-only

On Mon, 2025-03-24 at 16:51 +0100, Dimitrios Apostolou wrote:

Laurenz informed me that I could avoid writing to the WAL if I "create and
load the table in a single transaction".
I haven't tried, but here is what I would do to try --single-transaction:

Transaction 1: manually issuing all of CREATE TABLE etc.

Transaction 2: pg_restore --single-transaction --data-only

The COPY command in transaction 2 would still need to write to WAL, since
it's separate from the CREATE TABLE.

Am I wrong somewhere?

No, that is correct.

Yours,
Laurenz Albe

#16 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Dimitrios Apostolou (#13)
Re: Experience and feedback on pg_restore --data-only

On 3/24/25 09:15, Dimitrios Apostolou wrote:

Hi Ron,

I read your reply in the mailing list archives as I'm not subscribed to
the list, and I'm copy-pasting a response here. Please include me as a
recipient in further replies.

Why are you regularly having emergencies requiring the restoration of
multi-TB tables to databases with lots of cruft?

Fixing that would go a long way towards eliminating your problems with
pg_restore.

I don't have emergencies yet. I'm testing the process of restoring the
database dump, and it takes more than 24 hours currently. A successful
test is vital to approve the process.

It is doubtful that pg_dump/pg_restore will meet the requirements. You
are probably looking at some process that does incremental updates and
then restores from that. Something like pgbackrest:

https://pgbackrest.org/

comes to mind.

But the primary usage of pg_restore that I have is not to save me from
emergencies but to populate the dev database with recent data.

Regards,
Dimitris

--
Adrian Klaver
adrian.klaver@aklaver.com

In reply to: Laurenz Albe (#5)
Re: Experience and feedback on pg_restore --data-only

Hi Laurenz,

On Sun, 23 Mar 2025, Laurenz Albe wrote:

On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
   setting wal_level=minimal. I even wrote my own function to ALTER all
   tables to UNLOGGED, but failed with "could not change table T to
   unlogged because it references logged table".  I'm out of ideas on this
   one.

You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.

Do you know why --single-transaction can't be combined with -j?
I'm thinking it's because every process would need to have its own
transaction, which apparently results in more than one transaction.

Do you think this could be alleviated somehow with subtransactions?

I'm trying to think of a way to enjoy the performance benefits of
--single-transaction, while also parallelizing the pg_restore. For this
use case I don't care about the content appearing all in one transaction.
Would it be possible then to have dependency resolution in such a way that
we split the tasks into one transaction per worker process?

Just thinking out loud here, I'd be interested in feedback.

Thanks,
Dimitris