pg_dumpall --exclude-database option
PFA a patch to provide an --exclude-database option for pg_dumpall. The
option causes pg_dumpall to skip any database whose name matches the
argument pattern. The option can be used multiple times.
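To make the matching rule concrete, here is a minimal standalone sketch. It assumes POSIX fnmatch(3) as a stand-in for psql-style patterns; the patch itself sends each pattern to the server through processSQLNamePattern, so psql's double-quote and case-folding rules are not reproduced here. The function name dbname_is_excluded is hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if dbname matches any exclusion pattern.
 * fnmatch(3) handles *, ? and bracket classes such as [a-z], but not psql's
 * double-quote or case-folding rules.
 */
static bool
dbname_is_excluded(const char *dbname, const char *const patterns[],
                   size_t npatterns)
{
    assert(dbname != NULL);

    for (size_t i = 0; i < npatterns; i++)
    {
        /* fnmatch() returns 0 on a match */
        if (fnmatch(patterns[i], dbname, 0) == 0)
            return true;
    }
    return false;
}
```

Passing the option twice simply adds a second entry to patterns[]; a database is skipped if any entry matches.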
Among other use cases, this is useful where a database name is visible
but the database is not dumpable by the user. Examples of this occur in
some managed Postgres services.
I will add this to the September CF.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
pg_dumpall-exclude-database.patch (text/x-patch, +115-1)
Among other use cases, this is useful where a database name is visible but
the database is not dumpable by the user. Examples of this occur in some
managed Postgres services.
This looks like a reasonable feature.
I will add this to the September CF.
My 0.02€:
Patch applies cleanly, compiles, and works for me.
A question: would it make sense to have a symmetrical
--include-database=PATTERN option as well?
Somehow the option does not make much sense when under -g/-r/-t... maybe
it should complain, like it does when the others are used together?
ISTM that it would have been better to issue just one query with an OR
list, but that would require extending "processSQLNamePattern" a little
bit. Not sure whether it is worth it.
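For illustration, a hypothetical sketch of the single-query alternative: each pattern becomes one anchored regex condition and the conditions are ORed together. build_exclude_where is not part of the patch or of processSQLNamePattern. Following psql's documented pattern rules it translates only `*`, `?` and `$`, letting other regex syntax through; it doubles single quotes for the SQL literal and assumes standard_conforming_strings is on. It skips psql's double-quote handling, case folding and dotted names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: OR together one anchored regex condition per
 * psql-style pattern.  Returns a malloc'd string (caller frees) or NULL.
 */
static char *
build_exclude_where(const char *const patterns[], size_t npatterns)
{
    static const char prefix[] = "datname OPERATOR(pg_catalog.~) '^(";
    static const char suffix[] = ")$'";
    size_t      cap = 1;        /* terminating NUL */

    assert(patterns != NULL || npatterns == 0);

    /* Each input character expands to at most two output characters. */
    for (size_t i = 0; i < npatterns; i++)
        cap += strlen(" OR ") + strlen(prefix) + 2 * strlen(patterns[i]) +
            strlen(suffix);

    char       *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    char       *p = buf;
    for (size_t i = 0; i < npatterns; i++)
    {
        if (i > 0)
            p += sprintf(p, " OR ");
        p += sprintf(p, "%s", prefix);
        for (const char *c = patterns[i]; *c != '\0'; c++)
        {
            switch (*c)
            {
                case '*':       /* any string */
                    *p++ = '.';
                    *p++ = '*';
                    break;
                case '?':       /* any single character */
                    *p++ = '.';
                    break;
                case '$':       /* psql matches $ literally */
                    *p++ = '\\';
                    *p++ = '$';
                    break;
                case '\'':      /* double it inside the SQL literal */
                    *p++ = '\'';
                    *p++ = '\'';
                    break;
                default:        /* other regex syntax passes through */
                    *p++ = *c;
                    break;
            }
        }
        p += sprintf(p, "%s", suffix);
    }
    *p = '\0';
    return buf;
}
```

A caller would wrap the result in parentheses after WHERE so the ORed conditions cannot mix with other clauses.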
Function "database_excluded": I'd suggest considering reusing the
"simple_string_list_member" function instead of reimplementing it as a
special case.
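A simplified stand-in showing why reuse works: once the patterns have been expanded into a list of names, deciding whether a database is excluded is a plain membership test. StrList, str_list_append and str_list_member are hypothetical names; the real SimpleStringList in fe_utils lays out its cells differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for SimpleStringList. */
typedef struct StrCell
{
    struct StrCell *next;
    char       *val;
} StrCell;

typedef struct
{
    StrCell    *head;
    StrCell    *tail;
} StrList;

/* Append a copy of val; returns false on allocation failure. */
static bool
str_list_append(StrList *list, const char *val)
{
    assert(list != NULL && val != NULL);

    StrCell    *cell = malloc(sizeof(StrCell));
    if (cell == NULL)
        return false;
    cell->val = malloc(strlen(val) + 1);
    if (cell->val == NULL)
    {
        free(cell);
        return false;
    }
    strcpy(cell->val, val);
    cell->next = NULL;

    if (list->tail != NULL)
        list->tail->next = cell;
    else
        list->head = cell;
    list->tail = cell;
    return true;
}

/* Linear scan, playing the role of simple_string_list_member(). */
static bool
str_list_member(const StrList *list, const char *val)
{
    for (const StrCell *cell = list->head; cell != NULL; cell = cell->next)
    {
        if (strcmp(cell->val, val) == 0)
            return true;
    }
    return false;
}
```

Like the list in the patch, this sketch never frees its cells, which is harmless in a short-lived client program.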
XML doc: "--exclude-database=dbname", ISTM that
"--exclude-database=pattern" would be closer to what it is? "Multiple
database can be matched by writing multiple switches". Sure, but it can
also be done with a pattern. The documentation seems to assume that the
argument is one database name, and then changes this afterwards. I'd
suggest starting by saying that a psql-like pattern is expected, and then
simply stating that the option can be repeated, instead of implying
that it is a dbname and then saying that it is a pattern.
The simple list is not freed. Ok, it seems to be part of the design of the
data structure.
--
Fabien.
On Fri, Aug 03, 2018 at 11:08:57PM +0200, Fabien COELHO wrote:
Patch applies cleanly, compiles, and works for me.
The last review has not been addressed, so please note that this has been
marked as returned with feedback.
--
Michael
On 08/03/2018 05:08 PM, Fabien COELHO wrote:
Among other use cases, this is useful where a database name is
visible but the database is not dumpable by the user. Examples of
this occur in some managed Postgres services.
This looks like a reasonable feature.
Thanks for the review.
I will add this to the September CF.
My 0.02€:
Patch applies cleanly, compiles, and works for me.
A question: would it make sense to have a symmetrical
--include-database=PATTERN option as well?
I don't think so. If you only want a few databases, just use pg_dump.
The premise of pg_dumpall is that you want all of them and this switch
provides for exceptions to that.
Somehow the option does not make much sense when under -g/-r/-t...
maybe it should complain, like it does when the others are used together?
Added an error check.
ISTM that it would have been better to issue just one query with an OR
list, but that would require extending "processSQLNamePattern" a
little bit. Not sure whether it is worth it.
I don't think it is. This uses the same pattern that is used in
pg_dump.c for similar switches.
Function "database_excluded": I'd suggest considering reusing the
"simple_string_list_member" function instead of reimplementing it as a
special case.
done.
XML doc: "--exclude-database=dbname", ISTM that
"--exclude-database=pattern" would be closer to what it is? "Multiple
database can be matched by writing multiple switches". Sure, but it
can also be done with a pattern. The documentation seems to assume
that the argument is one database name, and then changes this
afterwards. I'd suggest starting by saying that a psql-like pattern is
expected, and then simply stating that the option can be
repeated, instead of implying that it is a dbname and then saying
that it is a pattern.
docco revised.
The simple list is not freed. Ok, it seems to be part of the design of
the data structure.
I don't see much point in freeing it.
revised patch attached.
cheers
andrew
Attachments:
pg_dumpall-exclude-v3.patch (text/x-patch, +113-1)
Hello Andrew,
A question: would it make sense to have a symmetrical
--include-database=PATTERN option as well?
I don't think so. If you only want a few databases, just use pg_dump. The
premise of pg_dumpall is that you want all of them and this switch provides
for exceptions to that.
Ok, sounds reasonable.
Somehow the option does not make much sense when under -g/-r/-t... maybe it
should complain, like it does when the others are used together?
Added an error check.
Ok.
ISTM that it would have been better to issue just one query with an OR
list, but that would require extending "processSQLNamePattern" a little
bit. Not sure whether it is worth it.
I don't think it is. This uses the same pattern that is used in pg_dump.c for
similar switches.
Ok.
revised patch attached.
Patch applies cleanly, compiles, make check ok, pg_dump tap tests ok, doc
build ok.
Very minor comments:
Missing space after comma:
+ {"exclude-database",required_argument, NULL, 5},
Now that C99 is okay, ISTM that both for loops in expand_dbname_patterns
could benefit from using loop-local variables:
for (SimpleStringListCell *cell = ...
for (int i = ...
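A self-contained sketch of the loop shape under discussion, with both control variables declared in their for statements. The "query" is simulated with fnmatch(3) over a fixed array, and PatCell and expand_patterns are hypothetical; in the patch the inner loop would presumably walk the rows of a query result.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical pattern-list cell, standing in for SimpleStringListCell. */
typedef struct PatCell
{
    const struct PatCell *next;
    const char *val;
} PatCell;

/*
 * For each pattern, "run a query" (simulated by fnmatch over catalog[]) and
 * append every matching name to out[], stopping once out[] is full.  A name
 * matched by two patterns is appended twice; that is harmless when the
 * result is only used for membership tests.
 */
static size_t
expand_patterns(const PatCell *patterns, const char *const catalog[],
                int ncatalog, const char *out[], size_t outcap)
{
    size_t      nout = 0;

    assert(out != NULL || outcap == 0);

    /* C99: both loop variables are local to their for statements. */
    for (const PatCell *cell = patterns; cell != NULL; cell = cell->next)
    {
        /* int index, as a loop over PQntuples() rows would use */
        for (int i = 0; i < ncatalog; i++)
        {
            if (nout < outcap && fnmatch(cell->val, catalog[i], 0) == 0)
                out[nout++] = catalog[i];
        }
    }
    return nout;
}
```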
About the documentation:
"When using wildcards, be careful to quote the pattern if needed to prevent
the shell from expanding the wildcards."
I'd suggest simplifying the end, maybe "to prevent shell
wildcard expansion".
The feature is not tested per se. Maybe one existing TAP test could be
extended with minimal fuss to use it, e.g. --exclude-database='[a-z]*'
should be close to only keeping the global stuff? I noticed an "exclude
table" test already exists.
--
Fabien.
On 10/13/2018 10:07 AM, Fabien COELHO wrote:
Hello Andrew,
A question: would it make sense to have a symmetrical
--include-database=PATTERN option as well?
I don't think so. If you only want a few databases, just use pg_dump.
The premise of pg_dumpall is that you want all of them and this
switch provides for exceptions to that.
Ok, sounds reasonable.
Somehow the option does not make much sense when under -g/-r/-t...
maybe it should complain, like it does when the others are used
together?
Added an error check.
Ok.
ISTM that it would have been better to issue just one query with an
OR list, but that would require extending "processSQLNamePattern" a
little bit. Not sure whether it is worth it.
I don't think it is. This uses the same pattern that is used in
pg_dump.c for similar switches.
Ok.
revised patch attached.
Patch applies cleanly, compiles, make check ok, pg_dump tap tests ok,
doc build ok.
Very minor comments:
Missing space after comma:
+ {"exclude-database",required_argument, NULL, 5},
Now that C99 is okay, ISTM that both for loops in
expand_dbname_patterns could benefit from using loop-local variables:
for (SimpleStringListCell *cell = ...
for (int i = ...
About the documentation:
"When using wildcards, be careful to quote the pattern if needed to
prevent the shell from expanding the wildcards."
I'd suggest simplifying the end, maybe "to prevent shell
wildcard expansion".
The feature is not tested per se. Maybe one existing TAP test could be
extended with minimal fuss to use it, e.g. --exclude-database='[a-z]*'
should be close to only keeping the global stuff? I noticed an
"exclude table" test already exists.
This patch addresses all these concerns.
cheers
andrew
Attachments:
pg_dumpall-exclude-v4.patch (text/x-patch, +121-1)
Hello Andrew,
This patch addresses all these concerns.
Patch v4 applies cleanly, compiles, doc generation ok, global & local
tests ok.
Tiny comments: there is a useless added blank line at the beginning of the
added varlistentry.
I have recreated the CF entry and set the patch to ready... but I must
have mixed something up, because now there are two entries :-(
Could anyone remove the duplicate entry (1859 & 1860 are the same)?
Peter??
--
Fabien.
On 10/31/2018 12:44 PM, Fabien COELHO wrote:
Hello Andrew,
This patch addresses all these concerns.
Patch v4 applies cleanly, compiles, doc generation ok, global & local
tests ok.
Tiny comments: there is a useless added blank line at the beginning of
the added varlistentry.
I have recreated the CF entry and set the patch to ready... but I
must have mixed something up, because now there are two entries :-(
Could anyone remove the duplicate entry (1859 & 1860 are the same)?
Peter??
:-( My fault, I just created a new one.
cheers
andrew
:-( My fault, I just created a new one.
Hmmm... so did I :-) We did it a few minutes apart. I did not find yours
when I first searched; I then tried to move the previous CF entry, which
had been marked as "returned", but this was rejected, so I recreated
it without checking whether yours had appeared in the meantime.
Hopefully someone can remove it…
--
Fabien.
On Wed, Oct 31, 2018 at 05:44:26PM +0100, Fabien COELHO wrote:
Patch v4 applies cleanly, compiles, doc generation ok, global & local tests
ok.
+# also fails for -r and -t, but it seems pointless to add more tests for those.
+command_fails_like(
+ [ 'pg_dumpall', '--exclude-database=foo', '--globals-only' ],
+ qr/\Qpg_dumpall: option --exclude-database cannot be used together with -g\/--globals-only\E/,
+ 'pg_dumpall: option --exclude-database cannot be used together with -g/--globals-only');
Usually testing all combinations is preferred, as well as having one
error message for each pattern, which is also more consistent with all
the other sanity checks in pg_dumpall.c and such.
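One way to follow that advice, sketched as a hypothetical helper: each conflicting mode gets its own message. Only the -g wording comes from the test above; the -r/--roles-only and -t/--tablespaces-only messages are assumed to follow the same pattern, and exclude_database_conflict is not a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: --exclude-database conflicts with each of the
 * globals-only, roles-only and tablespaces-only modes, and every conflict
 * gets its own message.  Returns NULL when the combination is allowed.
 */
static const char *
exclude_database_conflict(bool have_excludes, bool globals_only,
                          bool roles_only, bool tablespaces_only)
{
    if (!have_excludes)
        return NULL;
    if (globals_only)
        return "option --exclude-database cannot be used together with -g/--globals-only";
    if (roles_only)
        return "option --exclude-database cannot be used together with -r/--roles-only";
    if (tablespaces_only)
        return "option --exclude-database cannot be used together with -t/--tablespaces-only";
    return NULL;
}
```

With one message per mode, a test per combination can match its own expected text.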
--
Michael
The comment in expand_dbname_patterns is ungrammatical and mentions
"OID" rather than "name", so I suggest
/*
* The loop below might sometimes result in duplicate entries in the
* output name list, but we don't care.
*/
I'm not sure this is grammatical either:
exclude databases whose name matches PATTERN
I would have written it like this:
exclude databases whose names match PATTERN
but I'm not sure (each database has only one name, of course, but aren't
you talking about multiple databases there?)
Other than that, the patch seems fine to me -- I tested and it works as
intended.
Personally I would say "See also expand_table_name_patterns" instead of
"This is similar to code in pg_dump.c for handling matching table names.",
as well as mention this function in expand_table_name_patterns' comment.
(No need to mention expand_schema_name_patterns, since these are
adjacent.) But this is mostly stylistic and left to your own judgement.
In the long run, I think we should add an option to processSQLNamePattern
to use OR instead of AND, which would fix both this problem as well as
pg_dump's. I don't think that's important enough to stall this patch.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/17/18 9:55 AM, Alvaro Herrera wrote:
The comment in expand_dbname_patterns is ungrammatical and mentions
"OID" rather than "name", so I suggest
/*
* The loop below might sometimes result in duplicate entries in the
* output name list, but we don't care.
*/
Will fix.
I'm not sure this is grammatical either:
exclude databases whose name matches PATTERN
I would have written it like this:
exclude databases whose names match PATTERN
but I'm not sure (each database has only one name, of course, but aren't
you talking about multiple databases there?)
I think the original is grammatical.
Other than that, the patch seems fine to me -- I tested and it works as
intended.
Personally I would say "See also expand_table_name_patterns" instead of
"This is similar to code in pg_dump.c for handling matching table names.",
as well as mention this function in expand_table_name_patterns' comment.
(No need to mention expand_schema_name_patterns, since these are
adjacent.) But this is mostly stylistic and left to your own judgement.
In the long run, I think we should add an option to processSQLNamePattern
to use OR instead of AND, which would fix both this problem as well as
pg_dump's. I don't think that's important enough to stall this patch.
Thanks for the review.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Nov 18, 2018 at 7:41 PM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
On 11/17/18 9:55 AM, Alvaro Herrera wrote:
The comment in expand_dbname_patterns is ungrammatical and mentions
"OID" rather than "name", so I suggest
Will fix.
Other than that, the patch seems fine to me -- I tested and it works as
intended.
Personally I would say "See also expand_table_name_patterns" instead of
"This is similar to code in pg_dump.c for handling matching table names.",
as well as mention this function in expand_table_name_patterns' comment.
(No need to mention expand_schema_name_patterns, since these are
adjacent.) But this is mostly stylistic and left to your own judgement.
In the long run, I think we should add an option to processSQLNamePattern
to use OR instead of AND, which would fix both this problem as well as
pg_dump's. I don't think that's important enough to stall this patch.
Thanks for the review.
Unfortunately, judging from the cfbot output, the patch needs to be rebased.
Could you please post an updated version with the fixes mentioned above?
On 11/18/18 1:41 PM, Andrew Dunstan wrote:
On 11/17/18 9:55 AM, Alvaro Herrera wrote:
The comment in expand_dbname_patterns is ungrammatical and mentions
"OID" rather than "name", so I suggest
/*
* The loop below might sometimes result in duplicate entries in the
* output name list, but we don't care.
*/
Will fix.
I'm not sure this is grammatical either:
exclude databases whose name matches PATTERN
I would have written it like this:
exclude databases whose names match PATTERN
but I'm not sure (each database has only one name, of course, but aren't
you talking about multiple databases there?)
I think the original is grammatical.
Other than that, the patch seems fine to me -- I tested and it works as
intended.
Personally I would say "See also expand_table_name_patterns" instead of
"This is similar to code in pg_dump.c for handling matching table names.",
as well as mention this function in expand_table_name_patterns' comment.
(No need to mention expand_schema_name_patterns, since these are
adjacent.) But this is mostly stylistic and left to your own judgement.
In the long run, I think we should add an option to processSQLNamePattern
to use OR instead of AND, which would fix both this problem as well as
pg_dump's. I don't think that's important enough to stall this patch.
Thanks for the review.
Rebased and updated patch attached.
cheers
andrew
Attachments:
pg_dumpall-exclude-v5.patch (text/x-patch, +124-2)
On Fri, Nov 30, 2018 at 04:26:41PM -0500, Andrew Dunstan wrote:
On 11/18/18 1:41 PM, Andrew Dunstan wrote:
On 11/17/18 9:55 AM, Alvaro Herrera wrote:
In the long run, I think we should add an option to processSQLNamePattern
to use OR instead of AND, which would fix both this problem as well as
pg_dump's. I don't think that's important enough to stall this patch.
Agreed. This patch is useful in itself. This option would be nice to
have, and this routine interface would begin to grow too many boolean
switches to my taste so I'd rather use some flags instead.
The patch is doing its work, however I have spotted an issue in the
format of the dumps generated. Each time an excluded database is
processed its set of SET queries (from _doSetFixedOutputState) as well
as the header "PostgreSQL database dump" gets generated. I think that
this data should not show up.
--
Michael
On 12/18/18 11:53 PM, Michael Paquier wrote:
On Fri, Nov 30, 2018 at 04:26:41PM -0500, Andrew Dunstan wrote:
On 11/18/18 1:41 PM, Andrew Dunstan wrote:
On 11/17/18 9:55 AM, Alvaro Herrera wrote:
In the long run, I think we should add an option to processSQLNamePattern
to use OR instead of AND, which would fix both this problem as well as
pg_dump's. I don't think that's important enough to stall this patch.
Agreed. This patch is useful in itself. This option would be nice to
have, and this routine interface would begin to grow too many boolean
switches to my taste so I'd rather use some flags instead.
The patch is doing its work, however I have spotted an issue in the
format of the dumps generated. Each time an excluded database is
processed its set of SET queries (from _doSetFixedOutputState) as well
as the header "PostgreSQL database dump" gets generated. I think that
this data should not show up.
I'll take a look.
cheers
andrew
On 12/19/18 3:55 PM, Andrew Dunstan wrote:
On 12/18/18 11:53 PM, Michael Paquier wrote:
On Fri, Nov 30, 2018 at 04:26:41PM -0500, Andrew Dunstan wrote:
On 11/18/18 1:41 PM, Andrew Dunstan wrote:
On 11/17/18 9:55 AM, Alvaro Herrera wrote:
In the long run, I think we should add an option to processSQLNamePattern
to use OR instead of AND, which would fix both this problem as well as
pg_dump's. I don't think that's important enough to stall this patch.
Agreed. This patch is useful in itself. This option would be nice to
have, and this routine interface would begin to grow too many boolean
switches to my taste so I'd rather use some flags instead.
The patch is doing its work, however I have spotted an issue in the
format of the dumps generated. Each time an excluded database is
processed its set of SET queries (from _doSetFixedOutputState) as well
as the header "PostgreSQL database dump" gets generated. I think that
this data should not show up.
I'll take a look.
I think you're mistaken. The following example shows this clearly -
there is nothing corresponding to the 20 excluded databases. What you're
referring to appears to be the statements that preceded the 'CREATE
DATABASE' statement. That's to be expected.
cheers
andrew
andrew@emma:inst (pg_dumpall--exclude)*$ for x in `seq 1 20` ; do bin/createdb ex$x; done
andrew@emma:inst (pg_dumpall--exclude)*$ bin/createdb inc
andrew@emma:inst (pg_dumpall--exclude)*$ bin/pg_dumpall --exclude-database "ex*"
--
-- PostgreSQL database cluster dump
--
SET default_transaction_read_only = off;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
--
-- Roles
--
CREATE ROLE andrew;
ALTER ROLE andrew WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS;
\connect template1
--
-- PostgreSQL database dump
--
-- Dumped from database version 12devel
-- Dumped by pg_dump version 12devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
--
-- PostgreSQL database dump complete
--
--
-- PostgreSQL database dump
--
-- Dumped from database version 12devel
-- Dumped by pg_dump version 12devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
--
-- Name: inc; Type: DATABASE; Schema: -; Owner: andrew
--
CREATE DATABASE inc WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
ALTER DATABASE inc OWNER TO andrew;
\connect inc
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
--
-- PostgreSQL database dump complete
--
\connect postgres
--
-- PostgreSQL database dump
--
-- Dumped from database version 12devel
-- Dumped by pg_dump version 12devel
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET client_min_messages = warning;
SET row_security = off;
--
-- PostgreSQL database dump complete
--
--
-- PostgreSQL database cluster dump complete
--
On Mon, Dec 24, 2018 at 02:46:48PM -0500, Andrew Dunstan wrote:
I think you're mistaken. The following example shows this clearly -
there is nothing corresponding to the 20 excluded databases. What you're
referring to appears to be the statements that preceded the 'CREATE
DATABASE' statement. That's to be expected.
I would have believed that excluding a database removes a full section
from "PostgreSQL database dump" to "PostgreSQL database dump
complete", and not only a portion of it. Sorry for sounding
nit-picky.
--
Michael
Hello Andrew,
Rebased and updated patch attached.
Here is a review of v5, sorry for the delay.
Patch applies cleanly, compiles, "make check" is ok.
I do not see Michaël's issue, and do not see how it could be so, for me
the whole database-specific section generated by the underlying "pg_dump"
call is removed, as expected.
All is well for me; I have marked the patch as ready.
While poking around the dump output, I noticed some unrelated points:
* Command "pg_dump" could tell which database is dumped in the output at
the start of the section, eg:
--
-- PostgreSQL database "foo" dump
--
Or "pg_dumpall" could issue a comment line in the output telling which
database is being considered.
* The database dumps should have an introductory comment, like there is
one for roles, eg:
--
-- Databases
--
* On extensions, the dump creates both the extension and the extension
comment. However, ISTM that the extension comment is already created by
the extension, so this is redundant:
--
-- Name: pg_dirtyread; Type: EXTENSION; Schema: -; Owner:
--
CREATE EXTENSION IF NOT EXISTS pg_dirtyread WITH SCHEMA public;
--
-- Name: EXTENSION pg_dirtyread; Type: COMMENT; Schema: -; Owner:
--
COMMENT ON EXTENSION pg_dirtyread IS 'Read dead but unvacuumed rows from table';
Maybe it should notice that the comment belongs to the extension and need
not be updated?
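For the first suggestion, a hypothetical formatting helper might look like the sketch below. format_dump_banner is not an existing pg_dump function. It replaces CR and LF in the name because the name lands inside a "--" comment line, where an embedded newline could end the comment early.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a section banner naming the database into buf.
 * CR and LF in the name become spaces so the name cannot end the "--"
 * comment line early.  Returns the banner length, or -1 on error or
 * truncation.
 */
static int
format_dump_banner(char *buf, size_t bufsz, const char *dbname)
{
    /* Database names are at most 63 bytes with the default NAMEDATALEN of 64. */
    char        safe[256];
    size_t      i;

    assert(buf != NULL && dbname != NULL);

    for (i = 0; dbname[i] != '\0' && i < sizeof(safe) - 1; i++)
        safe[i] = strchr("\r\n", dbname[i]) != NULL ? ' ' : dbname[i];
    safe[i] = '\0';

    int         n = snprintf(buf, bufsz,
                             "--\n-- PostgreSQL database \"%s\" dump\n--\n", safe);

    return (n < 0 || (size_t) n >= bufsz) ? -1 : n;
}
```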
--
Fabien.
On Tue, Dec 25, 2018 at 09:36:05AM +0100, Fabien COELHO wrote:
I do not see Michaël's issue, and do not see how it could be so, for me the
whole database-specific section generated by the underlying "pg_dump" call
is removed, as expected.
All is well for me; I have marked the patch as ready.
Sorry for the noise. I have been double-checking what I said
previously and I am in the wrong.
--
-- PostgreSQL database "foo" dump
--
Or "pg_dumpall" could issue a comment line in the output telling which
database is being considered.
Mentioning which database's dump has been completed in the closing comment
would also be nice.
--
Michael