[PATCH] minor bug fix for pg_dump --clean

Started by Frédéric Yhuelover 3 years ago5 messageshackers
Jump to latest
#1Frédéric Yhuel
frederic.yhuel@dalibo.com

Hello,

When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every objects at the beginning.

The DROP statement for a view involving circular dependencies is :

CREATE OR REPLACE VIEW [...]

(see commit message of d8c05aff for a much better explanation)

If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.

The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.

Best regards,
Frédéric

Attachments:

0002-pg_dump-fix-up-d8c05aff.patchtext/x-patch; charset=UTF-8; name=0002-pg_dump-fix-up-d8c05aff.patchDownload+6-2
0001-Add-TAP-test-for-pg_restore.patchtext/x-patch; charset=UTF-8; name=0001-Add-TAP-test-for-pg_restore.patchDownload+22-1
In reply to: Frédéric Yhuel (#1)
Re: [PATCH] minor bug fix for pg_dump --clean

Hi,

Good catch! Here are several points for improvement:
1. pg_dump.c:17380
Maybe better to write simpler

appendPQExpBuffer(delcmd, "CREATE SCHEMA IF NOT EXISTS %s;\n",
tbinfo->dobj.namespace->dobj.name);

because there is a schema name inside the `tbinfo->dobj.namespace->dobj.name
`

2. pg_backup_archiver.c:588
Here are no necessary spaces before and after braces, and no spaces around
the '+' sign.

( strncmp(dropStmt, "CREATE SCHEMA IF NOT EXISTS", 27) == 0 &&
strstr(dropStmt+29, "CREATE OR REPLACE VIEW") ))

Best regards,
Viktoria Shepard

чт, 1 сент. 2022 г. в 12:13, Frédéric Yhuel <frederic.yhuel@dalibo.com>:

Show quoted text

Hello,

When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every objects at the beginning.

The DROP statement for a view involving circular dependencies is :

CREATE OR REPLACE VIEW [...]

(see commit message of d8c05aff for a much better explanation)

If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.

The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.

Best regards,
Frédéric

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frédéric Yhuel (#1)
Re: [PATCH] minor bug fix for pg_dump --clean

=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:

When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every objects at the beginning.

Yup ...

The DROP statement for a view involving circular dependencies is :
CREATE OR REPLACE VIEW [...]
(see commit message of d8c05aff for a much better explanation)
If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.
The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.

I am disinclined to accept this as a valid bug, because there's never
been any guarantee that a --clean script would execute error-free in
a database that doesn't match what the source database contains.

(The pg_dump documentation used to say that in so many words.
I see that whoever added the --if-exists option was under the
fond illusion that that fixes all cases, which it surely does not.
We need to back off the promises a bit there.)

An example of a case that won't execute error-free is if the view
having a circular dependency includes a column of a non-built-in
data type. If you try to run that in an empty database, you'll
get an error from the CREATE OR REPLACE VIEW's reference to that
data type. For instance, if I adjust your test case to make
the "payload" column be of type hstore, I get something like

psql:dumpresult.sql:22: ERROR: type "public.hstore" does not exist
LINE 4: NULL::public.hstore AS payload;
^

The same type of failure occurs for user-defined functions and
operators that use a non-built-in type, and I'm sure there are
more cases in the same vein. But it gets *really* messy if
the target database isn't completely empty, but contains objects
with different properties than the dump script expects; for example,
if the view being discussed here exists with a different column set
than the script thinks, or if the dependency chains aren't all the
same.

If this fix were cleaner I might be inclined to accept it anyway,
but it's not very clean at all --- for example, it's far from
obvious to me what are the side-effects of changing the filter
in RestoreArchive like that. Nor am I sure that the schema
you want to create is guaranteed to get dropped again later in
every use-case.

So I think mainly what we ought to do here is to adjust the
documentation to make it clearer that --clean is not guaranteed
to work without errors unless the target database has the same
set of objects as the source. --if-exists can reduce the set
of error cases, but not eliminate it. Possibly we should be
more enthusiastic about recommending --create --clean (ie,
drop and recreate the whole database) instead.

regards, tom lane

#4Frédéric Yhuel
frederic.yhuel@dalibo.com
In reply to: Tom Lane (#3)
Re: [PATCH] minor bug fix for pg_dump --clean

On 10/24/22 03:01, Tom Lane wrote:

=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:

When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every objects at the beginning.

Yup ...

The DROP statement for a view involving circular dependencies is :
CREATE OR REPLACE VIEW [...]
(see commit message of d8c05aff for a much better explanation)
If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.
The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.

I am disinclined to accept this as a valid bug, because there's never
been any guarantee that a --clean script would execute error-free in
a database that doesn't match what the source database contains.

(The pg_dump documentation used to say that in so many words.
I see that whoever added the --if-exists option was under the
fond illusion that that fixes all cases, which it surely does not.
We need to back off the promises a bit there.)

An example of a case that won't execute error-free is if the view
having a circular dependency includes a column of a non-built-in
data type. If you try to run that in an empty database, you'll
get an error from the CREATE OR REPLACE VIEW's reference to that
data type. For instance, if I adjust your test case to make
the "payload" column be of type hstore, I get something like

psql:dumpresult.sql:22: ERROR: type "public.hstore" does not exist
LINE 4: NULL::public.hstore AS payload;
^

The same type of failure occurs for user-defined functions and
operators that use a non-built-in type, and I'm sure there are
more cases in the same vein. But it gets *really* messy if
the target database isn't completely empty, but contains objects
with different properties than the dump script expects; for example,
if the view being discussed here exists with a different column set
than the script thinks, or if the dependency chains aren't all the
same.

If this fix were cleaner I might be inclined to accept it anyway,
but it's not very clean at all --- for example, it's far from
obvious to me what are the side-effects of changing the filter
in RestoreArchive like that. Nor am I sure that the schema
you want to create is guaranteed to get dropped again later in
every use-case.

Hi Tom, Viktoria,

Thank you for your review Viktoria!

Thank you for this detailed explanation, Tom! I didn't have great hope
for this patch. I thought that the TAP test could be accepted, but now I
can see that it is clearly useless.

So I think mainly what we ought to do here is to adjust the
documentation to make it clearer that --clean is not guaranteed
to work without errors unless the target database has the same
set of objects as the source. --if-exists can reduce the set
of error cases, but not eliminate it. Possibly we should be
more enthusiastic about recommending --create --clean (ie,
drop and recreate the whole database) instead.

I beleive a documentation patch would be useful, indeed.

Best regards,
Frédéric

#5Bruce Momjian
bruce@momjian.us
In reply to: Frédéric Yhuel (#4)
Re: [PATCH] minor bug fix for pg_dump --clean

FYI, this was improved in a recent commit:

commit 75af0f401f
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 29 13:13:54 2023 -0400

Doc: improve description of dump/restore's --clean and --if-exists.

Try to make these option descriptions a little clearer for novices.
Per gripe from Attila Gulyás.

Discussion: /messages/by-id/169590536647.3727336.11070254203649648453@wrigleys.postgresql.org

---------------------------------------------------------------------------

On Mon, Oct 24, 2022 at 09:02:46AM +0200, Frédéric Yhuel wrote:

On 10/24/22 03:01, Tom Lane wrote:

=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?= <frederic.yhuel@dalibo.com> writes:

When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every objects at the beginning.

Yup ...

The DROP statement for a view involving circular dependencies is :
CREATE OR REPLACE VIEW [...]
(see commit message of d8c05aff for a much better explanation)
If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.
The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.

I am disinclined to accept this as a valid bug, because there's never
been any guarantee that a --clean script would execute error-free in
a database that doesn't match what the source database contains.

(The pg_dump documentation used to say that in so many words.
I see that whoever added the --if-exists option was under the
fond illusion that that fixes all cases, which it surely does not.
We need to back off the promises a bit there.)

An example of a case that won't execute error-free is if the view
having a circular dependency includes a column of a non-built-in
data type. If you try to run that in an empty database, you'll
get an error from the CREATE OR REPLACE VIEW's reference to that
data type. For instance, if I adjust your test case to make
the "payload" column be of type hstore, I get something like

psql:dumpresult.sql:22: ERROR: type "public.hstore" does not exist
LINE 4: NULL::public.hstore AS payload;
^

The same type of failure occurs for user-defined functions and
operators that use a non-built-in type, and I'm sure there are
more cases in the same vein. But it gets *really* messy if
the target database isn't completely empty, but contains objects
with different properties than the dump script expects; for example,
if the view being discussed here exists with a different column set
than the script thinks, or if the dependency chains aren't all the
same.

If this fix were cleaner I might be inclined to accept it anyway,
but it's not very clean at all --- for example, it's far from
obvious to me what are the side-effects of changing the filter
in RestoreArchive like that. Nor am I sure that the schema
you want to create is guaranteed to get dropped again later in
every use-case.

Hi Tom, Viktoria,

Thank you for your review Viktoria!

Thank you for this detailed explanation, Tom! I didn't have great hope for
this patch. I thought that the TAP test could be accepted, but now I can see
that it is clearly useless.

So I think mainly what we ought to do here is to adjust the
documentation to make it clearer that --clean is not guaranteed
to work without errors unless the target database has the same
set of objects as the source. --if-exists can reduce the set
of error cases, but not eliminate it. Possibly we should be
more enthusiastic about recommending --create --clean (ie,
drop and recreate the whole database) instead.

I beleive a documentation patch would be useful, indeed.

Best regards,
Frédéric

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.