pg_dump exclusion switches and functions/types
Testing out the new pg_dump exclusion switches I've found that excluding a
table means that no functions or types will be dumped. Excluding one
table shouldn't exclude these objects. My real use case for this
functionality is that I have a database that has 99% of its space used by
one big table and I'd like to be able to dump the rest of the database
quickly. If I lose function/type information it's useless.
Kris Jurka
Kris Jurka <books@ejurka.com> writes:
Testing out the new pg_dump exclusion switches I've found that excluding a
table means that no functions or types will be dumped. Excluding one
table shouldn't exclude these objects.
I tend to agree ... will see if I can make it happen. (I never did get
around to reviewing that patch, anyway ...)
One issue is what to do with procedural languages and large objects,
which don't have any associated schema. If we treat them as being
outside all schemas, we'd have semantics like this: dump the PLs and
blobs unless one or more --schema switches appeared. Is that OK?
regards, tom lane
Hi, Tom,
Tom Lane wrote:
One issue is what to do with procedural languages and large objects,
which don't have any associated schema. If we treat them as being
outside all schemas, we'd have semantics like this: dump the PLs and
blobs unless one or more --schema switches appeared. Is that OK?
Sounds fine.
Is there a possibility to dump only those objects? Maybe --large-objects
and --languages?
Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS
Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
Tom Lane wrote:
Kris Jurka <books@ejurka.com> writes:
Testing out the new pg_dump exclusion switches I've found that excluding a
table means that no functions or types will be dumped. Excluding one
table shouldn't exclude these objects.I tend to agree ... will see if I can make it happen. (I never did get
around to reviewing that patch, anyway ...)One issue is what to do with procedural languages and large objects,
which don't have any associated schema. If we treat them as being
outside all schemas, we'd have semantics like this: dump the PLs and
blobs unless one or more --schema switches appeared. Is that OK?
Is there a reason why pg_dump can't do the --list/--use-list flags like
pg_restore, or is it just a matter of round tuits?
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
Is there a reason why pg_dump can't do the --list/--use-list flags
like pg_restore, or is it just a matter of round tuits?
The major reason for having those features as I understand it was to
help overcome dependency difficulties in dumps, which are now largely a
thing of the past.
However, ISTM that a similar facility for fine grained control could
fairly easily be built into pg_dump.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
However, ISTM that a similar facility for fine grained control could
fairly easily be built into pg_dump.
Yeah ... later.
The way I envision it is that the schema-related switches are fine for
selecting things at the level of whole schemas, and the table-related
switches are fine for selecting individual tables, and what we lack are
inclusion/exclusion switches that operate on other kinds of individual
objects. Somebody can design and implement those later, if the itch
strikes. What we have to do today is make sure that the interaction of
schema and table switches is such that an extension in that direction
will fit in naturally.
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
However, ISTM that a similar facility for fine grained control could
fairly easily be built into pg_dump.Yeah ... later.
The way I envision it is that the schema-related switches are fine for
selecting things at the level of whole schemas, and the table-related
switches are fine for selecting individual tables, and what we lack are
inclusion/exclusion switches that operate on other kinds of individual
objects. Somebody can design and implement those later, if the itch
strikes. What we have to do today is make sure that the interaction of
schema and table switches is such that an extension in that direction
will fit in naturally.
totally agree.
cheers
andrew
Kris Jurka <books@ejurka.com> writes:
Testing out the new pg_dump exclusion switches I've found that excluding a
table means that no functions or types will be dumped. Excluding one
table shouldn't exclude these objects.
I've been chewing on this a bit and find that the existing patch has
several behaviors that seem surprising. Considering just one type of
inclusion/exclusion switches at a time (we'll get to the interaction of
schema and table switches below), I think we can all agree without too
much argument on these statements:
* With no inclusion/exclusion switches, all objects except system
objects should be dumped.
* With only exclusion switches given, all objects except system
objects and those matching at least one pattern should be dumped.
* With only inclusion switches given, only those objects matching
at least one pattern should be dumped (whether they are system
objects or not).
That last proviso might be debatable but on balance I think it's OK
(for instance, "pg_dump --schema=information_schema" could be useful
for debugging or documentation purposes). Where things get interesting
is when you have both inclusion and exclusion switches given. The
existing patch's behavior is that "the rightmost switch wins", ie,
if an object's name matches more than one pattern then it is included or
excluded according to the rightmost switch it matches. This is, erm,
poorly documented, but it seems like useful behavior so I don't have
an objection myself. The real question is what should happen to objects
that don't match any of the switch patterns? (This is relevant to Kris'
complaint because non-table objects should be treated the same as tables
that don't match any table name inclusion/exclusion switches.)
What I find in the existing code is that if an inclusion switch appears
first:
pg_dump -n 's.*' -N 'ss.*' ...
then only schemas matching an inclusion switch (and not matching any
later exclusion switch) are dumped. While if an exclusion switch
appears first:
pg_dump -N 's.*' -n 'ss.*' ...
then all schemas are dumped except system schemas and those matching an
exclusion switch (and not matching any later inclusion switch). So the
"default" behavior for unmatched objects flips depending on switch
order. This doesn't seem to satisfy the principle of least surprise,
and it's certainly not adequately documented. It might be the most
useful behavior though. I thought about the alternative rule that
"if any inclusion switches appear at all, the default is not to dump"
--- that is, an object must match at least one inclusion switch (and not
match any later exclusion switch) to be dumped. But with that rule,
exclusion switches before the first inclusion switch are actually
useless. Has anyone got a better idea?
Returning to the point about schema versus table selection switches,
what we've got is that they are independent filters: to be dumped,
a table must be in a schema selected by the schema inclusion/exclusion
switches (if any), and it must have a name selected by the table
inclusion/exclusion switches (if any). I think this is OK but it leads
to the property that the order of -n/-N switches is relevant, and the
order of -t/-T switches is relevant, but their order relative to each
other is not relevant. This could be surprising.
If you're still with me, the payoff is here: what are the rules for
dumping non-table objects, given that there are no inclusion/exclusion
switches for them (but we might want to add such later)? If only schema
inclusion/exclusion switches are present, then it's relatively easy to
say "dump objects that are in selected schemas" --- but what about
objects that don't have a schema, such as PLs? And what about the case
where table inclusion/exclusion switches are present? I said above that
non-table objects should be treated the same way as unmatched tables,
which I think is a necessary rule if we want to extend the set of switch
types later. But that leads to the conclusion that "a non-table object
is dumped unless a -t switch appears before any -T switches". Which
strikes me as a mighty surprising behavior. I'm not sure what to do
differently though.
Lastly, as long as we're questioning the premises of this patch,
I wonder about the choice to use regex pattern matching rules.
The problem with regex is that to be upward-compatible with the old
exact-match switch definitions, a switch value that doesn't contain
any regex special characters is treated as an equality condition not
a pattern, which makes for a discontinuity. For instance, "-t x" is
treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
That's going to burn people. An alternative we could consider is to
use LIKE patterns instead, but since underscore is a wildcard in LIKE,
it's easy to imagine people getting burnt by that too. Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff. None of these are especially attractive :-(
Comments?
regards, tom lane
[Snip explanations]
Comments?
Would it be reasonable to include one more switch: 'include
dependencies' ?
That would work like this:
* first consider all to be included objects (possibly limited by the
include switches);
* if dependencies are included, add all dependent objects, plus
non-schema objects (which arguably can be considered as dependencies for
the whole data base);
* remove all objects targeted by exclude switches;
This way you won't have any dependency on the ordering, and you could
consider all non-schema objects as dependencies, so they will only be
included if dependencies are to be included.
Excluding dependencies would be the default if any switches are
specified, including otherwise (not sure how much is this of the
principle of least surprise, but it would be backwards compatible).
The scenario I most care about is to be able to make a complete data
base dump (including non-schema objects) while excluding only a few
tables. If I understood your explanations correctly, this would not be
easily possible with the current implementation. Note that I have a
patch (kludge ?) on the 8.1 pg_dump which does exactly this, it would be
a pity if I would need to patch the 8.2 one again to do that...
Cheers,
Csaba.
Tom Lane wrote:
Lastly, as long as we're questioning the premises of this patch,
I wonder about the choice to use regex pattern matching rules.
The problem with regex is that to be upward-compatible with the old
exact-match switch definitions, a switch value that doesn't contain
any regex special characters is treated as an equality condition not
a pattern, which makes for a discontinuity. For instance, "-t x" is
treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
That's going to burn people. An alternative we could consider is to
use LIKE patterns instead, but since underscore is a wildcard in LIKE,
it's easy to imagine people getting burnt by that too. Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff. None of these are especially attractive :-(Comments?
1. regexes, please.
2. I'd rather remove backwards compatibility than have the
discontinuity. After all, users can anchor the expression pretty easily.
If not, then let's use an alternate switch for the regexes, (I know we
are running out of option space).
cheers
andrew
Csaba Nagy <nagy@ecircle-ag.com> writes:
Would it be reasonable to include one more switch: 'include
dependencies' ?
We are two months past feature freeze ... adding entirely new features
to pg_dump is *not* on the table for 8.2. What we need to do at the
moment is make sure that the features we've got work sanely and won't
create headaches for likely future extensions; but not actually
implement those extensions.
The scenario I most care about is to be able to make a complete data
base dump (including non-schema objects) while excluding only a few
tables.
Isn't this the same as Kris' complaint? Why do you need additional
dependency analysis to do the above?
regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
Tom Lane wrote:
... Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff. None of these are especially attractive :-(
1. regexes, please.
One argument that occurs to me for importing the psql code is that it's
solved the problem of including a schema name in the pattern. It would
be a lot nicer to say "-t schema.table" than to have to say "-t table -n
schema". In particular this allows one to dump s1.foo and s2.bar
without also getting s1.bar and s2.foo; a problem that is insoluble if
we have only independent schema and table filters. I think that ideally
one would only use the schema switches if one actually wanted a
schema-by-schema dump, not as a wart on the side of the
specific-object-selection switches.
The psql code does allow you to get at most of the functionality of
regexes...
regards, tom lane
On Fri, Oct 06, 2006 at 11:54:51 -0400,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
The problem with regex is that to be upward-compatible with the old
exact-match switch definitions, a switch value that doesn't contain
any regex special characters is treated as an equality condition not
a pattern, which makes for a discontinuity. For instance, "-t x" is
treated like -t '^x$' while -t 'x.*y' doesn't get the anchors added.
That's going to burn people. An alternative we could consider is to
use LIKE patterns instead, but since underscore is a wildcard in LIKE,
it's easy to imagine people getting burnt by that too. Or we could
import the rather ad-hoc shell-wildcard-like rules used by psql's \d
stuff. None of these are especially attractive :-(Comments?
How about making the regex's anchored by default? People who want unanchored
ones can add .* at the beginning and/or end. Since only whether or not
the pattern matches is important (not the string it matched), this keeps
all of the same power, but matches the old behavior in simple cases.
Tom Lane <tgl@sss.pgh.pa.us> writes:
The existing patch's behavior is that "the rightmost switch wins", ie, if an
object's name matches more than one pattern then it is included or excluded
according to the rightmost switch it matches. This is, erm, poorly
documented, but it seems like useful behavior so I don't have an objection
myself.
I don't know, it sounds like it's the source of the confusion you identify
later.
My first thought is that the rule should be to apply all the inclusion
switches (implicitly including everything if there are none), then apply all
the exclusion switches.
That leads to including non-schema objects only if there are no schema
inclusion switches. Which seems pretty logical since if you're explicitly
including objects then you'll only expect objects explicitly included to be
dumped and you'll quickly realize there's no switch to bring in those
non-schema objects. Maybe there should be a switch to include them just for
completeness.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
The existing patch's behavior is that "the rightmost switch wins", ie, if an
object's name matches more than one pattern then it is included or excluded
according to the rightmost switch it matches.
My first thought is that the rule should be to apply all the inclusion
switches (implicitly including everything if there are none), then apply all
the exclusion switches.
I kinda like that, because it makes the behavior completely independent
of switch ordering, which seems like a good property to preserve.
Anyone else have an opinion pro or con?
That leads to including non-schema objects only if there are no schema
inclusion switches. Which seems pretty logical since if you're explicitly
including objects then you'll only expect objects explicitly included to be
dumped and you'll quickly realize there's no switch to bring in those
non-schema objects. Maybe there should be a switch to include them just for
completeness.
Well, pg_dump already has a --blobs switch, which has been a no-op
(because now the default) since 8.1, but it's still in the switch
parser. It wouldn't take much to revive it for the purpose of causing
blobs to be dumped even when there's an inclusion switch. As for PLs,
I'm not really too worried about dumping them per se (since it's usually
easy enough to create the ones you're using). The functionality we're
really lacking there is the "--include-dependencies" switch that was
discussed upthread ... which I think is a fine idea but should wait for
8.3.
regards, tom lane
Tom,
I kinda like that, because it makes the behavior completely independent
of switch ordering, which seems like a good property to preserve.
Anyone else have an opinion pro or con?
The only "con" argument I can think of is that "tar" and "rsync", whose syntax
is familiar to a lot of sysadmins, apply switches left-to-right.
However, I don't feel that that is a compelling argument. The include/exclude
switch order processing is something I've always *hated* about tar and has
messed me up more times than I can count. Also, Windows users could care
less if we behave like tar.
So +1 to go with orderless switching.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
I wrote:
One argument that occurs to me for importing the psql code is that it's
solved the problem of including a schema name in the pattern. It would
be a lot nicer to say "-t schema.table" than to have to say "-t table -n
schema".
The more I think about this, the more I think the above is a killer
argument. We really should have had the ability to say "-t schema.table"
ever since schemas were added in 7.3, but no one got around to making it
happen. If we go over to interpreting the arguments as standard regexes
then we'll never be able to do that, because we'll have foreclosed the
meaning of dot. The psql pattern code was specifically designed as a
compromise notation adapted to SQL needs, and IMHO it's served pretty
well --- so I think we should adopt that into pg_dump rather than pure
regex notation.
The psql code does allow you to get at most of the functionality of
regexes...
Actually, it lets you get at all of it, though perhaps a bit awkwardly.
The transformations it makes are
. => schema vs name separator
* => .*
? => .
So the only regex patterns you can't write directly are dot, R* and R?
for which you can use these locutions:
. => ?
R* => (R+|)
R? => (R|)
(Perhaps this should be documented somewhere...)
So I propose that we should revise the patch to use psql's \d code to
determine which objects match a pattern. I think that together with
Greg's idea of processing all inclusions before all exclusions should
answer the concerns I've got about the patch.
regards, tom lane
On Fri, Oct 06, 2006 at 10:28:21PM -0400, Gregory Stark wrote:
Tom Lane <tgl@sss.pgh.pa.us> writes:
The existing patch's behavior is that "the rightmost switch wins",
ie, if an object's name matches more than one pattern then it is
included or excluded according to the rightmost switch it matches.
This is, erm, poorly documented, but it seems like useful behavior
so I don't have an objection myself.I don't know, it sounds like it's the source of the confusion you
identify later.My first thought is that the rule should be to apply all the
inclusion switches (implicitly including everything if there are
none), then apply all the exclusion switches.
+1 :)
Order-dependent switches are a giant foot gun.
Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter
Remember to vote!
We are two months past feature freeze ... adding entirely new features
to pg_dump is *not* on the table for 8.2.
Ok, clear.
The scenario I most care about is to be able to make a complete data
base dump (including non-schema objects) while excluding only a few
tables.Isn't this the same as Kris' complaint? Why do you need additional
dependency analysis to do the above?
Well, I obviously didn't understand well the complete feature as it is
implemented. Now, is what I want (see above) possible with the new
feature, or if I exclude some tables I implicitly loose some other
things too from the dump which normally would be there ? This is my only
concern...
Cheers,
Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes:
Isn't this the same as Kris' complaint? Why do you need additional
dependency analysis to do the above?
Well, I obviously didn't understand well the complete feature as it is
implemented. Now, is what I want (see above) possible with the new
feature, or if I exclude some tables I implicitly loose some other
things too from the dump which normally would be there ? This is my only
concern...
I think we've agreed that if you use some exclusion switches, but not
any inclusion switches, then only the specific objects matching your
switches are excluded. CVS HEAD gets this wrong, but I'm going to work
on it today.
regards, tom lane