BUG #17342: pg_restore with jobs > 1 errors out : a worker process died unexpectedly

Started by PG Bug reporting formover 4 years ago3 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17342
Logged by: GPO
Email address: gparc@online.fr
PostgreSQL version: 12.7
Operating system: Linux CentOS 7.9
Description:

Hello,
I'm migrating from PostgreSQL 9.1.24 to PostgreSQL 12.7

On the source machine (9.1.24), the export process is done like this :
pg_dump --schema=$SCHEMA --no-owner --format=custom $BASE --no-password
--verbose -f $DUMPFILE

On the target machine (12.7), the import process is done like this :
export PGOPTIONS="-c maintenance_work_mem=512MB"
pg_restore --no-owner --no-tablespaces --dbname=$BASE --username=postgres
--role=$ROLE --schema=$SCHEMA --no-password --verbose --jobs=4
--exit-on-error <dumpfile_from_9.1.24>

And I get the following messages :
pg_restore: error: could not execute query: ERROR: there is no unique
constraint matching given keys for referenced table "<table_name>"
Command was: ALTER TABLE ONLY <table_name>
ADD CONSTRAINT <constraint_name> FOREIGN KEY (<column_name>) REFERENCES
<table_name>(column_name);
pg_restore: error: a worker process died unexpectedly

==> if I omit --jobs option or use --jobs=1, i get no errors

I saw this type of error presumably fixed in 12.4 but I'm on 12.7

More infos :
In the instance log with log_error_verbosity=verbose, i have these extra
lines :

host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET LOG:
00000: statement: ALTER TABLE ONLY <table_name>
ADD CONSTRAINT <constraint_name> FOREIGN KEY (<column_name>)
REFERENCES <table_name>(<column_name>);

host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
LOCATION: exec_simple_query, postgres.c:1045
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET ERROR:
42830: there is no unique constraint matching given keys for referenced
table "<table_name>"
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
LOCATION: transformFkeyCheckAttrs, tablecmds.c:10022
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
STATEMENT: ALTER TABLE ONLY <table_name>
ADD CONSTRAINT <constraint_name> FOREIGN KEY (<column_name>)
REFERENCES <table_name>(column_name>);

host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET ERROR:
57014: canceling statement due to user request
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET CONTEXT:
COPY <table_name>, line 53616759: "363925604 2 19950.00
3613 \N 5033006 125 \N \N"
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
LOCATION: ProcessInterrupts, postgres.c:3136
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
STATEMENT: COPY <table_name> (<column_name>,<column_name>,...) FROM
stdin;

host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET LOG:
08P01: incomplete message from client
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET CONTEXT:
COPY <table_name>, line 27482
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
LOCATION: pq_getmessage, pqcomm.c:1329
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
STATEMENT: COPY <table_name>(column_name>,<column_name>,...) FROM stdin;

host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET ERROR:
08006: unexpected EOF on client connection with an open transaction
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET CONTEXT:
COPY <table_name>, line 27482
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
LOCATION: CopyGetData, copy.c:666
host=[local],user=postgres,db=<dbname>,time=2021-12-21 17:57:55 CET
STATEMENT: COPY <table_name>(column_name>,<column_name>,...) FROM stdin;

host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET LOG:
08006: could not send data to client: Broken pipe
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET
LOCATION: internal_flush, pqcomm.c:1462
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET
STATEMENT: COPY <table_name>(column_name>,<column_name>,...) FROM stdin;

host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET LOG:
08006: could not send data to client: Broken pipe
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET
LOCATION: internal_flush, pqcomm.c:1462
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET ERROR:
57014: canceling statement due to user request
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET
LOCATION: ProcessInterrupts, postgres.c:3136
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET FATAL:
08006: connection to client lost
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET
LOCATION: ProcessInterrupts, postgres.c:3033
host=,user=,db=,time=2021-12-21 17:57:55 CET LOG: 00000: checkpoint
complete: wrote 27593 buffers (10.5%); 0 WAL file(s) added, 2 removed, 33
recycled; write=9.290 s, sync=0.823 s, total=10.189 s; sync files=32,
longest=0.441 s, average=0.026 s; distance=580615 kB, estimate=580615 kB
host=,user=,db=,time=2021-12-21 17:57:55 CET LOCATION: LogCheckpointEnd,
xlog.c:8507
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET FATAL:
08P01: terminating connection because protocol synchronization was lost
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:55 CET
LOCATION: PostgresMain, postgres.c:4086
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:56 CET LOG:
08006: could not send data to client: Broken pipe
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:56 CET
LOCATION: internal_flush, pqcomm.c:1462
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:56 CET FATAL:
08006: connection to client lost
host=[local],user=postgres,db=<db_name>,time=2021-12-21 17:57:56 CET
LOCATION: ProcessInterrupts, postgres.c:3033

Also, if it may helps, here are the first few lines of the dump file :
PGDMP
<dbname>
9.1.24
9.1.24
ENCODING
ENCODING
SET client_encoding = 'UTF8';
false
STDSTRINGS
STDSTRINGS
SET standard_conforming_strings = 'on';
false
2615
188092
SCHEMA
CREATE SCHEMA <SCHEMA_NAME>;
DROP SCHEMA <SCHEMA_NAME>;
false
1259
188093

Hope this helps for your analysis
Regards

P.S it also works with a dump created with PG 12.7 version and using
--jobs=4
P.P.S above, i obfuscated real objects names between < and >

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #17342: pg_restore with jobs > 1 errors out : a worker process died unexpectedly

PG Bug reporting form <noreply@postgresql.org> writes:

I'm migrating from PostgreSQL 9.1.24 to PostgreSQL 12.7

On the source machine (9.1.24), the export process is done like this :
pg_dump --schema=$SCHEMA --no-owner --format=custom $BASE --no-password
--verbose -f $DUMPFILE

On the target machine (12.7), the import process is done like this :
export PGOPTIONS="-c maintenance_work_mem=512MB"
pg_restore --no-owner --no-tablespaces --dbname=$BASE --username=postgres
--role=$ROLE --schema=$SCHEMA --no-password --verbose --jobs=4
--exit-on-error <dumpfile_from_9.1.24>

And I get the following messages :
pg_restore: error: could not execute query: ERROR: there is no unique
constraint matching given keys for referenced table "<table_name>"
Command was: ALTER TABLE ONLY <table_name>
ADD CONSTRAINT <constraint_name> FOREIGN KEY (<column_name>) REFERENCES
<table_name>(column_name);
pg_restore: error: a worker process died unexpectedly

==> if I omit --jobs option or use --jobs=1, i get no errors

I saw this type of error presumably fixed in 12.4 but I'm on 12.7

12.5, actually. But the point is that you need to use 12.5 or later
pg_dump to get the benefit of the fix. 9.1's pg_dump will produce
an archive with insufficient dependency links, allowing the parallel
restore to execute steps in the wrong order.

You could use the 12.7 installation's pg_dump to extract data from
the old server. However, doing that across the network might slow
things down enough to outweigh the benefit of being able to restore
in parallel later.

regards, tom lane

#3Noname
gparc@free.fr
In reply to: Tom Lane (#2)
Re: BUG #17342: pg_restore with jobs > 1 errors out : a worker process died unexpectedly

Hello Tom,

thanks for your answer.

There is no direct access between source and target server
so I will restore without job (-j) option.
By chance, the schema size to restore is not so huge.

Regards

----- Mail original -----
De: "Tom Lane" <tgl@sss.pgh.pa.us>
À: gparc@online.fr
Cc: pgsql-bugs@lists.postgresql.org
Envoyé: Mercredi 22 Décembre 2021 19:28:28
Objet: Re: BUG #17342: pg_restore with jobs > 1 errors out : a worker process died unexpectedly

PG Bug reporting form <noreply@postgresql.org> writes:

I'm migrating from PostgreSQL 9.1.24 to PostgreSQL 12.7

On the source machine (9.1.24), the export process is done like this :
pg_dump --schema=$SCHEMA --no-owner --format=custom $BASE --no-password
--verbose -f $DUMPFILE

On the target machine (12.7), the import process is done like this :
export PGOPTIONS="-c maintenance_work_mem=512MB"
pg_restore --no-owner --no-tablespaces --dbname=$BASE --username=postgres
--role=$ROLE --schema=$SCHEMA --no-password --verbose --jobs=4
--exit-on-error <dumpfile_from_9.1.24>

And I get the following messages :
pg_restore: error: could not execute query: ERROR: there is no unique
constraint matching given keys for referenced table "<table_name>"
Command was: ALTER TABLE ONLY <table_name>
ADD CONSTRAINT <constraint_name> FOREIGN KEY (<column_name>) REFERENCES
<table_name>(column_name);
pg_restore: error: a worker process died unexpectedly

==> if I omit --jobs option or use --jobs=1, i get no errors

I saw this type of error presumably fixed in 12.4 but I'm on 12.7

12.5, actually. But the point is that you need to use 12.5 or later
pg_dump to get the benefit of the fix. 9.1's pg_dump will produce
an archive with insufficient dependency links, allowing the parallel
restore to execute steps in the wrong order.

You could use the 12.7 installation's pg_dump to extract data from
the old server. However, doing that across the network might slow
things down enough to outweigh the benefit of being able to restore
in parallel later.

regards, tom lane