pg_dump --with-* options
I'm looking at the new in PG18 pg_dump --with-* options, and I'm having
trouble understanding them. (I did not look into the source code or the
git or mailing list history for this, to try to understand it as a user.)
We have
-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new
(and there is also --section=data), and then three analogous options for
"schema" and "statistics".
What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?
The man page is only minimally more verbose: "Dump data. This is the
default." But why do you then need this option?
I think we should add some more documenting detail for these, but right
now I don't know what it would be.
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
We have
-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new(and there is also --section=data), and then three analogous options for
"schema" and "statistics".What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?
I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.
--
nathan
On 06.06.25 17:39, Nathan Bossart wrote:
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
We have
-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new(and there is also --section=data), and then three analogous options for
"schema" and "statistics".What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.
I get that idea, but we really need some more documentation for this, I
think. So far I could only guess how this is supposed to be used, and I
also happened to guess wrong.
My initial guess was that --with-data can override --no-data. That
would have been pretty standard "last option wins" behavior. But
pg_dump rejects that. Personally, I think that is kind of wrong.
But you can use --with-data to override, say, --schema-only. That also
seems kind of wrong to me, but anyway. Then, it seems that
--schema-only plus --with-data is kind of the same as everything plus
--no-statistics?
I'm also wondering, if you consider the aforementioned "last option
wins" behavior, should there be a difference between
--schema-only --with-data, and
--with-data --schema-only
or for that matter
--schema-only --data-only
But in any case, if you want that level of precision, wouldn't it make
more sense to use the --section option?
This is all a nice logic puzzle, but some more documentation would be nice.
On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
We have
-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new(and there is also --section=data), and then three analogous options for
"schema" and "statistics".What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.
Am I too late to propose ripping this out?
I mean, if I look at pg_dump --help and there are options for
--with-broccoli and --without-mushrooms, I know that the defaults are
no brocooli, yes mushrooms, and I know which options I need to specify
to get the behavior that I want, whatever that happens to be. If all
options exist in both forms, it's a lot more confusing. Maybe there's
some issue of cross-version compatibility here that justifies this
complexity, but I don't see what it would be. I would think
--with-data has always been the default and always will be, so we just
don't need --with-data for anything. But maybe I'm confused.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2025/06/12 22:47, Peter Eisentraut wrote:
On 06.06.25 17:39, Nathan Bossart wrote:
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
We have
-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new(and there is also --section=data), and then three analogous options for
"schema" and "statistics".What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.I get that idea, but we really need some more documentation for this, I think. So far I could only guess how this is supposed to be used, and I also happened to guess wrong.
My initial guess was that --with-data can override --no-data. That would have been pretty standard "last option wins" behavior. But pg_dump rejects that. Personally, I think that is kind of wrong.
But you can use --with-data to override, say, --schema-only. That also seems kind of wrong to me, but anyway.
While testing pg_dump --with-* in relation to bug #18952 [1]/messages/by-id/18952-be40a620f8b1e755@postgresql.org,
I also ran into this behavior. It was surprising,
as I expected pg_dump to reject that combination of options.
The current behavior seems confusing.
Regards,
[1]: /messages/by-id/18952-be40a620f8b1e755@postgresql.org
--
Fujii Masao
NTT DATA Japan Corporation
On Thu, Jun 12, 2025 at 10:18:56AM -0400, Robert Haas wrote:
On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:
What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.Am I too late to propose ripping this out?
I mean, if I look at pg_dump --help and there are options for
--with-broccoli and --without-mushrooms, I know that the defaults are
no brocooli, yes mushrooms, and I know which options I need to specify
to get the behavior that I want, whatever that happens to be. If all
options exist in both forms, it's a lot more confusing. Maybe there's
some issue of cross-version compatibility here that justifies this
complexity, but I don't see what it would be. I would think
--with-data has always been the default and always will be, so we just
don't need --with-data for anything. But maybe I'm confused.
If the idea is to remove all options for default behavior, we'd be removing
--no-statistics, --with-data, and --with-schema at this point. Maybe we
could go a step further and even rip out --statistics-only (in favor of
--no-schema --no-data --with-statistics). In general, I do think the list
of pg_dump options is pretty unwieldy at this point.
--
nathan
On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote:
My initial guess was that --with-data can override --no-data. That
would have been pretty standard "last option wins" behavior. But
pg_dump rejects that. Personally, I think that is kind of wrong.
Do we have other options that are order-sensitive?
But in any case, if you want that level of precision, wouldn't it
make
more sense to use the --section option?
That's not possible with statistics, because some appear in
SECTION_DATA and some in SECTION_POST_DATA (e.g. stats on indexes,
which are in SECTION_POST_DATA).
Regards,
Jeff Davis
On Thu, 2025-06-12 at 10:18 -0400, Robert Haas wrote:
Am I too late to propose ripping this out?
As long as we keep the functionality, I'm fine changing the
options/names around at this point.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.
That's OK with me.
Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).
I'd probably keep --statistics-only.
Regards,
Jeff Davis
On Thu, Jun 12, 2025 at 08:58:15AM -0700, Jeff Davis wrote:
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.That's OK with me.
� Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).I'd probably keep --statistics-only.
WFM
--
nathan
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.That's OK with me.
Same.
Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).I'd probably keep --statistics-only.
I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 12.06.25 17:14, Jeff Davis wrote:
On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote:
My initial guess was that --with-data can override --no-data. That
would have been pretty standard "last option wins" behavior. But
pg_dump rejects that. Personally, I think that is kind of wrong.Do we have other options that are order-sensitive?
I think most of them are. For example:
psql -p 5432 -p 5433
initdb --data-checksums --no-data-checksums
postgres --shared-buffers=1GB --shared-buffers=2GB
On Thu, 2025-06-12 at 13:36 -0400, Robert Haas wrote:
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.That's OK with me.
Same.
I must be missing something, but I think --no-statistics is sorely needed.
How else can I get the effect of
pg_dump --no-statistics mydb
Yours,
Laurenz Albe
On Thu, Jun 12, 2025 at 10:07:05PM +0200, Laurenz Albe wrote:
I must be missing something, but I think --no-statistics is sorely needed.
How else can I get the effect ofpg_dump --no-statistics mydb
This was recently changed to be the default behavior (see commit 34eb2a8).
--
nathan
On Thu, Jun 12, 2025 at 1:36 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:
If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.That's OK with me.
Same.
Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).I'd probably keep --statistics-only.
I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.
The use case for --statistics-only is to extract the existing statistics
for the tables and indexes that are involved in a given query that is
giving you problems, allowing you to apply those statistics to an existing
QA/dev database and tweak them without further impacting operations on the
production database. I think this will prove to be very useful, and having
a --statistics-only flag conveys the clear intent of "I want the stats, and
only the stats",
--
Robert Haas
EDB: http://www.enterprisedb.com
If we're hot to remove options, how about we remove the sections flags?
Their utility is reliant upon the user understanding exactly which things
go in which section, and further assumes that everything deterministically
goes in exactly one section, which is no longer the case as Jeff
pointed out recently. They have outlived their usefulness.
If we have the full complement of -no-something flags, and the three -only
flags, we wouldn't need the --with-something flags. That would mean making
statistics export the default on dumps, which I think it should be anyway,
because there's nothing else that we don't dump by default, and while it
might seem strange to have them by default now, NOT having them by default
will feel very strange a few years down the road.
On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:
The use case for --statistics-only is to extract the existing statistics
for the tables and indexes that are involved in a given query that is
giving you problems, allowing you to apply those statistics to an existing
QA/dev database and tweak them without further impacting operations on the
production database. I think this will prove to be very useful, and having
a --statistics-only flag conveys the clear intent of "I want the stats, and
only the stats",
I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data. It seems like we're trying
to avoid having multiple ways to do the same thing.
If we're hot to remove options, how about we remove the sections flags?
Their utility is reliant upon the user understanding exactly which things
go in which section, and further assumes that everything deterministically
goes in exactly one section, which is no longer the case as Jeff
pointed out recently. They have outlived their usefulness.
I almost brought this up earlier as something else we could potentially
trim. That's v19 material at this point, though.
--
nathan
On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:
The use case for --statistics-only is to extract the existing statistics
for the tables and indexes that are involved in a given query that is
giving you problems, allowing you to apply those statistics to anexisting
QA/dev database and tweak them without further impacting operations on
the
production database. I think this will prove to be very useful, and
having
a --statistics-only flag conveys the clear intent of "I want the stats,
and
only the stats",
I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data. It seems like we're trying
to avoid having multiple ways to do the same thing.
By that same argument, we should remove --schema-only and --data-only as
well. I think we shouldn't because those two options have proved very
convenient for users and they convey clear intent to the person reading the
script, and I believe that --statistics-only will prove the same over time.
If we're hot to remove options, how about we remove the sections flags?
Their utility is reliant upon the user understanding exactly which things
go in which section, and further assumes that everythingdeterministically
goes in exactly one section, which is no longer the case as Jeff
pointed out recently. They have outlived their usefulness.I almost brought this up earlier as something else we could potentially
trim. That's v19 material at this point, though.
+1 for 19.
On Thu, Jun 12, 2025 at 04:39:00PM -0400, Corey Huinker wrote:
On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data. It seems like we're trying
to avoid having multiple ways to do the same thing.By that same argument, we should remove --schema-only and --data-only as
well. I think we shouldn't because those two options have proved very
convenient for users and they convey clear intent to the person reading the
script, and I believe that --statistics-only will prove the same over time.
Those predate v18, so while might be able to mark them deprecated, I doubt
we'd remove them anytime soon.
FWIW I don't have a tremendously strong opinion about --statistics-only.
I'd probably vote to remove it because 1) it's redundant, 2) once you add
an option, it's hard to remove it, and 3) pg_dump already has so many
options. But I won't cry too hard if we keep it around.
--
nathan
On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote:
FWIW I don't have a tremendously strong opinion about --statistics-
only.
Same here. I won't cast a vote on this particular issue, as long as the
functionality is available.
Regards,
Jeff Davis
On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote:
Do we have other options that are order-sensitive?
I think most of them are. For example:
psql -p 5432 -p 5433
initdb --data-checksums --no-data-checksums
postgres --shared-buffers=1GB --shared-buffers=2GB
Interesting. I don't think the "last option wins" model applies to
other pg_dump options, though. For instance, in PG17:
pg_dump --data-only --schema-only
pg_dump: error: options -s/--schema-only and -a/--data-only cannot be
used together
I don't think it's simple to start using "last option wins" behavior
now. There are probably some combinations of options where it's not
clear whether a later option is an extra constraint or will override a
previous option.
Regards,
Jeff Davis