pgsql: Move handling of database properties from pg_dumpall into pg_dum
Move handling of database properties from pg_dumpall into pg_dump.
This patch rearranges the division of labor between pg_dump and pg_dumpall
so that pg_dump itself handles all properties attached to a single
database. Notably, a database's ACL (GRANT/REVOKE status) and local GUC
settings established by ALTER DATABASE SET and ALTER ROLE IN DATABASE SET
can be dumped and restored by pg_dump. This is a long-requested
improvement.
"pg_dumpall -g" will now produce only role- and tablespace-related output,
nothing about individual databases. The total output of a regular
pg_dumpall run remains the same.
pg_dump (or pg_restore) will restore database-level properties only when
creating the target database with --create. This applies not only to
ACLs and GUCs but to the other database properties it already handled,
that is database comments and security labels. This is more consistent
and useful, but does represent an incompatibility in the behavior seen
without --create.
(This change makes the proposed patch to have pg_dump use "COMMENT ON
DATABASE CURRENT_DATABASE" unnecessary, since there is no case where
the command is issued that we won't know the true name of the database.
We might still want that patch as a feature in its own right, but pg_dump
no longer needs it.)
pg_dumpall with --clean will now drop and recreate the "postgres" and
"template1" databases in the target cluster, allowing their locale and
encoding settings to be changed if necessary, and providing a cleaner
way to set nondefault tablespaces for them than we had before. This
means that such a script must now always be started in the "postgres"
database; the order of drops and reconnects will not work otherwise.
Without --clean, the script will not adjust any database-level properties
of those two databases (including their comments, ACLs, and security
labels, which it formerly would try to set).
Another minor incompatibility is that the CREATE DATABASE commands in a
pg_dumpall script will now always specify locale and encoding settings.
Formerly those would be omitted if they matched the cluster's default.
While that behavior had some usefulness in some migration scenarios,
it also posed a significant hazard of unwanted locale/encoding changes.
To migrate to another locale/encoding, it's now necessary to use pg_dump
without --create to restore into a database with the desired settings.
Commit 4bd371f6f's hack to emit "SET default_transaction_read_only = off"
is gone: we now dodge that problem by the expedient of not issuing ALTER
DATABASE SET commands until after reconnecting to the target database.
Therefore, such settings won't apply during the restore session.
In passing, improve some shaky grammar in the docs, and add a note pointing
out that pg_dumpall's output can't be expected to load without any errors.
(Someday we might want to fix that, but this is not that patch.)
Haribabu Kommi, reviewed at various times by Andreas Karlsson,
Vaishnavi Prabakaran, and Robert Haas; further hacking by me.
Discussion: /messages/by-id/CAJrrPGcUurV0eWTeXODwsOYFN=Ekq36t1s0YnFYUNzsmRfdAyA@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/b3f8401205afdaf63cb20dc316d44644c933d5a1
Modified Files
--------------
doc/src/sgml/ref/pg_dump.sgml | 34 ++-
doc/src/sgml/ref/pg_dumpall.sgml | 50 +++-
doc/src/sgml/ref/pg_restore.sgml | 11 +
src/bin/pg_dump/dumputils.c | 51 ++++
src/bin/pg_dump/dumputils.h | 5 +
src/bin/pg_dump/pg_backup_archiver.c | 33 ++-
src/bin/pg_dump/pg_dump.c | 237 ++++++++++++++--
src/bin/pg_dump/pg_dumpall.c | 507 +++++------------------------------
src/bin/pg_dump/t/002_pg_dump.pl | 25 +-
src/bin/pg_upgrade/pg_upgrade.c | 54 ++--
10 files changed, 470 insertions(+), 537 deletions(-)
On Mon, Jan 22, 2018 at 2:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
pg_dumpall with --clean will now drop and recreate the "postgres" and
"template1" databases in the target cluster, allowing their locale and
encoding settings to be changed if necessary, and providing a cleaner
way to set nondefault tablespaces for them than we had before. This
means that such a script must now always be started in the "postgres"
database; the order of drops and reconnects will not work otherwise.
Without --clean, the script will not adjust any database-level properties
of those two databases (including their comments, ACLs, and security
labels, which it formerly would try to set).
Unless we insert some guard that causes a hard error or at least warns
the user if they do the wrong thing, this seems extremely likely to
have people (1) try to dump and restore using pg_dumpall and (2)
complain when it doesn't work. Actually, it'll work fine if your OS
username happens to be "postgres", but otherwise not so much.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jan 22, 2018 at 2:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
pg_dumpall with --clean will now drop and recreate the "postgres" and
"template1" databases in the target cluster, allowing their locale and
encoding settings to be changed if necessary, and providing a cleaner
way to set nondefault tablespaces for them than we had before. This
means that such a script must now always be started in the "postgres"
database; the order of drops and reconnects will not work otherwise.
Without --clean, the script will not adjust any database-level properties
of those two databases (including their comments, ACLs, and security
labels, which it formerly would try to set).
Unless we insert some guard that causes a hard error or at least warns
the user if they do the wrong thing, this seems extremely likely to
have people (1) try to dump and restore using pg_dumpall and (2)
complain when it doesn't work. Actually, it'll work fine if your OS
username happens to be "postgres", but otherwise not so much.
Not entirely following your concern. The initial DB's name is "postgres"
regardless of the bootstrap superuser's name. If you're thinking of the
case where you do "initdb -U joe" and then try to "psql <pgdumpallscript"
without specifying a target DB name, that would have failed before and
still will, because there's no DB named "joe". Or if there is, the
DROP DATABASE on it would have failed because you're connected to it,
so it doesn't seem like a habit people would've got into.
The only practice that worked before and now will not is
"psql template1 <pgdumpallscript", and even there, the only thing that
really happens is that template1's locale/encoding/tablespace don't
get adjusted from the way initdb left them. Which was true anyway
for the locale/encoding.
regards, tom lane
I wrote:
Commit 4bd371f6f's hack to emit "SET default_transaction_read_only = off"
is gone: we now dodge that problem by the expedient of not issuing ALTER
DATABASE SET commands until after reconnecting to the target database.
Therefore, such settings won't apply during the restore session.
The buildfarm just pointed out an issue in that: the delayed SET commands
might affect the interpretation of data during the restore session.
Specifically, I see failures like this on machines where the prevailing
locale isn't C or US:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 4871; 2618 34337 RULE rtest_emp rtest_emp_del pgbf
pg_restore: [archiver (db)] could not execute query: ERROR: invalid input syntax for type money: "$0.00"
LINE 3: ... ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"...
^
Command was: CREATE RULE "rtest_emp_del" AS
ON DELETE TO "rtest_emp" DO INSERT INTO "rtest_emplog" ("ename", "who", "action", "newsal", "oldsal")
VALUES ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"money", "old"."salary");
The dump dumped the money literal with '$' because we set up the
regression database with
ALTER DATABASE regression SET lc_monetary TO 'C';
but at the time we process the CREATE RULE during the restore, we're
still running with whatever the environmental default is.
Not very sure about the best solution here. Clearly, we'd better absorb
these settings to some extent, but how?
I thought for a bit about having pg_dump emit immediate SET operations
along with the ALTER versions, for any GUCs we deem important for
restore purposes. This would probably be about a 99% solution, but
we could only do it for ALTER DATABASE SET not for ALTER ROLE IN DATABASE
SET, because we couldn't be very sure which of the latter should apply.
So it would have some edge-case failure conditions.
Or we could reconnect after applying the ALTERs, ensuring that we have
the expected environment. But then we're back into the problem that
commit 4bd371f6f hoped to solve, namely that "ALTER DATABASE SET
default_transaction_read_only = on" breaks pg_dumpall (and hence
pg_upgrade).
I then thought about splitting the ALTERs into two separate TOC entries,
one for "desirable" GUCs (which we'd apply by reconnecting after emitting
its contents), and one for "undesirables", which we would not reconnect
after. That seemed a bit messy because of the need to maintain a
blacklist going forward, and the possibility that we couldn't agree on
what to blacklist.
Then it occurred to me that none of this works anyway for parallel
pg_restore --- including parallel pg_upgrade --- because the workers
are going to see all the ALTERs in place. And that means that commit
4bd371f6f's hack has been broken since parallel upgrade went in, in 9.3.
So at this point I'm feeling that letting pg_dumpall work around
default_transaction_read_only = on is just too much of a hack, and we
should forget about that consideration entirely. If we do, fixing the
current buildfarm failure is about a one-line patch: just reconnect
after processing DATABASE PROPERTIES.
If someone were to hold a gun to my head and say "make this work", what
I'd probably do is set up the desirable vs undesirable split and then
arrange for the "undesirable" GUCs to be postponed to the end of the
restore, after the parallel portion of the run is complete. But man,
that's a lot of ugliness.
I think the only reason that 4bd371f6f got in at all was that it was just
a two-line kluge, and we were willing to accept that amount of ugliness
to handle a corner case more nicely. The cost of solving it after the
pg_dump/dumpall refactoring will be a lot higher, and I for one don't
think it's worth it.
Comments?
regards, tom lane
I wrote:
Specifically, I see failures like this on machines where the prevailing
locale isn't C or US:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 4871; 2618 34337 RULE rtest_emp rtest_emp_del pgbf
pg_restore: [archiver (db)] could not execute query: ERROR: invalid input syntax for type money: "$0.00"
LINE 3: ... ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"...
^
Command was: CREATE RULE "rtest_emp_del" AS
ON DELETE TO "rtest_emp" DO INSERT INTO "rtest_emplog" ("ename", "who", "action", "newsal", "oldsal")
VALUES ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"money", "old"."salary");
Actually ... maybe what this is pointing out is a pre-existing deficiency
in pg_dump, which is that it's failing to lock down lc_monetary during
restore. Arguably, we should teach _doSetFixedOutputState to set
lc_monetary to whatever prevailed in the dump session, just as we do
for client_encoding. I seem now to recall some user complaints about
unsafety of dump/restore for "money" values, which essentially is the
problem we're seeing here.
I think the reason we haven't done this already is fear of putting
platform-dependent lc_monetary values into dump scripts. That's
certainly an issue, but it seems a minor one: at worst, you'd have
to not use --single-transaction when restoring on a different platform,
so you could ignore an error from the SET command.
While this would fix the specific problem we're seeing in the buildfarm,
I'm thinking we'd still need to do what I said in the previous message,
and change pg_dump so that the restore session will absorb ALTER DATABASE
settings before proceeding. Otherwise, at minimum, we have a hazard of
differing behaviors in serial and parallel restores. It might be that
lc_monetary is the only GUC that makes a real difference for this purpose,
but I haven't got much faith in that proposition at the moment.
regards, tom lane
On Tue, Jan 23, 2018 at 8:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Specifically, I see failures like this on machines where the prevailing
locale isn't C or US:pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 4871; 2618 34337 RULErtest_emp rtest_emp_del pgbf
pg_restore: [archiver (db)] could not execute query: ERROR: invalid
input syntax for type money: "$0.00"
LINE 3: ... ("old"."ename", CURRENT_USER, 'fired'::"bpchar",
'$0.00'::"...
^
Command was: CREATE RULE "rtest_emp_del" AS
ON DELETE TO "rtest_emp" DO INSERT INTO "rtest_emplog" ("ename","who", "action", "newsal", "oldsal")
VALUES ("old"."ename", CURRENT_USER, 'fired'::"bpchar",
'$0.00'::"money", "old"."salary");
Actually ... maybe what this is pointing out is a pre-existing deficiency
in pg_dump, which is that it's failing to lock down lc_monetary during
restore. Arguably, we should teach _doSetFixedOutputState to set
lc_monetary to whatever prevailed in the dump session, just as we do
for client_encoding. I seem now to recall some user complaints about
unsafety of dump/restore for "money" values, which essentially is the
problem we're seeing here.I think the reason we haven't done this already is fear of putting
platform-dependent lc_monetary values into dump scripts. That's
certainly an issue, but it seems a minor one: at worst, you'd have
to not use --single-transaction when restoring on a different platform,
so you could ignore an error from the SET command.While this would fix the specific problem we're seeing in the buildfarm,
I'm thinking we'd still need to do what I said in the previous message,
and change pg_dump so that the restore session will absorb ALTER DATABASE
settings before proceeding. Otherwise, at minimum, we have a hazard of
differing behaviors in serial and parallel restores. It might be that
lc_monetary is the only GUC that makes a real difference for this purpose,
but I haven't got much faith in that proposition at the moment.
Yes, restore session should absorb all the ALTER DATABASE and ALTER ROLE
IN DATABASE settings also to make sure that the target database is in same
state when the dump has started.
currently "default_transaction_read_only" is the only GUC that affects the
absorbing the restore of a database.
As you said in upthread, I feel splitting them into two _TOC entries and
dump
as last commands of each database? Does it have any problem with parallel
restore?
Regards,
Hari Babu
Fujitsu Australia
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
On Tue, Jan 23, 2018 at 8:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm thinking we'd still need to do what I said in the previous message,
and change pg_dump so that the restore session will absorb ALTER DATABASE
settings before proceeding. Otherwise, at minimum, we have a hazard of
differing behaviors in serial and parallel restores. It might be that
lc_monetary is the only GUC that makes a real difference for this purpose,
but I haven't got much faith in that proposition at the moment.
Yes, restore session should absorb all the ALTER DATABASE and ALTER ROLE
IN DATABASE settings also to make sure that the target database is in same
state when the dump has started.
I've pushed a patch to do that.
currently "default_transaction_read_only" is the only GUC that affects the
absorbing the restore of a database.
Well, that's exactly the $64 question. Are we sure we have a complete,
reliable list of which GUCs do or do not affect data restoration?
I wouldn't count on it. If nothing else, extensions might have custom
GUCs that we could not predict the behavior of.
As you said in upthread, I feel splitting them into two _TOC entries and
dump as last commands of each database? Does it have any problem with
parallel restore?
As I said yesterday, I'm not really eager to do that. It's a lot of
complexity and a continuing maintenance headache to solve a use-case
that I find pretty debatable. Anyone who's actually put in
"default_transaction_read_only = on" in a way that restricts their
maintenance roles must have created procedures for undoing it easily;
otherwise day-to-day maintenance would be a problem. Further, I don't
find the original hack's distinction between pg_dump and pg_dumpall
to be really convincing. Maybe that says we should resolve this by just
sticking "SET default_transaction_read_only = off" into regular pg_dump
output after all --- perhaps with a switch added to enable it. But I'd
kind of like to see the argument why we should worry about this at all
made afresh.
regards, tom lane