pg_dump --with-* options

Started by Peter Eisentraut9 months ago72 messages
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

I'm looking at the new in PG18 pg_dump --with-* options, and I'm having
trouble understanding them. (I did not look into the source code or the
git or mailing list history for this, to try to understand it as a user.)

We have

-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new

(and there is also --section=data), and then three analogous options for
"schema" and "statistics".

What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?

The man page is only minimally more verbose: "Dump data. This is the
default." But why do you then need this option?

I think we should add some more documenting detail for these, but right
now I don't know what it would be.

#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#1)
Re: pg_dump --with-* options

On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:

We have

-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new

(and there is also --section=data), and then three analogous options for
"schema" and "statistics".

What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?

I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.

--
nathan

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Nathan Bossart (#2)
Re: pg_dump --with-* options

On 06.06.25 17:39, Nathan Bossart wrote:

On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:

We have

-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new

(and there is also --section=data), and then three analogous options for
"schema" and "statistics".

What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?

I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.

I get that idea, but we really need some more documentation for this, I
think. So far I could only guess how this is supposed to be used, and I
also happened to guess wrong.

My initial guess was that --with-data can override --no-data. That
would have been pretty standard "last option wins" behavior. But
pg_dump rejects that. Personally, I think that is kind of wrong.

But you can use --with-data to override, say, --schema-only. That also
seems kind of wrong to me, but anyway. Then, it seems that
--schema-only plus --with-data is kind of the same as everything plus
--no-statistics?

I'm also wondering, if you consider the aforementioned "last option
wins" behavior, should there be a difference between

--schema-only --with-data, and
--with-data --schema-only

or for that matter

--schema-only --data-only

But in any case, if you want that level of precision, wouldn't it make
more sense to use the --section option?

This is all a nice logic puzzle, but some more documentation would be nice.

#4Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#2)
Re: pg_dump --with-* options

On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:

We have

-a, --data-only dump only the data, not the schema or statistics
--no-data do not dump data
--with-data dump the data # this one is new

(and there is also --section=data), and then three analogous options for
"schema" and "statistics".

What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?

I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.

Am I too late to propose ripping this out?

I mean, if I look at pg_dump --help and there are options for
--with-broccoli and --without-mushrooms, I know that the defaults are
no brocooli, yes mushrooms, and I know which options I need to specify
to get the behavior that I want, whatever that happens to be. If all
options exist in both forms, it's a lot more confusing. Maybe there's
some issue of cross-version compatibility here that justifies this
complexity, but I don't see what it would be. I would think
--with-data has always been the default and always will be, so we just
don't need --with-data for anything. But maybe I'm confused.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Eisentraut (#3)
Re: pg_dump --with-* options

On 2025/06/12 22:47, Peter Eisentraut wrote:

On 06.06.25 17:39, Nathan Bossart wrote:

On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:

We have

   -a, --data-only      dump only the data, not the schema or statistics
   --no-data            do not dump data
   --with-data          dump the data  # this one is new

(and there is also --section=data), and then three analogous options for
"schema" and "statistics".

What is the purpose of the --with-data option?  Dumping the data is the
default.  Is this to override an earlier --no-data option?

I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.

I get that idea, but we really need some more documentation for this, I think.  So far I could only guess how this is supposed to be used, and I also happened to guess wrong.

My initial guess was that --with-data can override --no-data.  That would have been pretty standard "last option wins" behavior.  But pg_dump rejects that.  Personally, I think that is kind of wrong.

But you can use --with-data to override, say, --schema-only.  That also seems kind of wrong to me, but anyway.

While testing pg_dump --with-* in relation to bug #18952 [1]/messages/by-id/18952-be40a620f8b1e755@postgresql.org,
I also ran into this behavior. It was surprising,
as I expected pg_dump to reject that combination of options.
The current behavior seems confusing.

Regards,

[1]: /messages/by-id/18952-be40a620f8b1e755@postgresql.org

--
Fujii Masao
NTT DATA Japan Corporation

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#4)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 10:18:56AM -0400, Robert Haas wrote:

On Fri, Jun 6, 2025 at 11:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Fri, Jun 06, 2025 at 09:14:32AM +0200, Peter Eisentraut wrote:

What is the purpose of the --with-data option? Dumping the data is the
default. Is this to override an earlier --no-data option?

I believe the idea is that these will allow folks to be explicit about what
they want instead of needing to understand the defaults for every
component.

Am I too late to propose ripping this out?

I mean, if I look at pg_dump --help and there are options for
--with-broccoli and --without-mushrooms, I know that the defaults are
no brocooli, yes mushrooms, and I know which options I need to specify
to get the behavior that I want, whatever that happens to be. If all
options exist in both forms, it's a lot more confusing. Maybe there's
some issue of cross-version compatibility here that justifies this
complexity, but I don't see what it would be. I would think
--with-data has always been the default and always will be, so we just
don't need --with-data for anything. But maybe I'm confused.

If the idea is to remove all options for default behavior, we'd be removing
--no-statistics, --with-data, and --with-schema at this point. Maybe we
could go a step further and even rip out --statistics-only (in favor of
--no-schema --no-data --with-statistics). In general, I do think the list
of pg_dump options is pretty unwieldy at this point.

--
nathan

#7Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#3)
Re: pg_dump --with-* options

On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote:

My initial guess was that --with-data can override --no-data.  That
would have been pretty standard "last option wins" behavior.  But
pg_dump rejects that.  Personally, I think that is kind of wrong.

Do we have other options that are order-sensitive?

But in any case, if you want that level of precision, wouldn't it
make
more sense to use the --section option?

That's not possible with statistics, because some appear in
SECTION_DATA and some in SECTION_POST_DATA (e.g. stats on indexes,
which are in SECTION_POST_DATA).

Regards,
Jeff Davis

#8Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#4)
Re: pg_dump --with-* options

On Thu, 2025-06-12 at 10:18 -0400, Robert Haas wrote:

Am I too late to propose ripping this out?

As long as we keep the functionality, I'm fine changing the
options/names around at this point.

Regards,
Jeff Davis

#9Jeff Davis
pgsql@j-davis.com
In reply to: Nathan Bossart (#6)
Re: pg_dump --with-* options

On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:

If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.

That's OK with me.

  Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).

I'd probably keep --statistics-only.

Regards,
Jeff Davis

#10Nathan Bossart
nathandbossart@gmail.com
In reply to: Jeff Davis (#9)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 08:58:15AM -0700, Jeff Davis wrote:

On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:

If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.

That's OK with me.

� Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).

I'd probably keep --statistics-only.

WFM

--
nathan

#11Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#9)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:

If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.

That's OK with me.

Same.

Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).

I'd probably keep --statistics-only.

I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.

--
Robert Haas
EDB: http://www.enterprisedb.com

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Jeff Davis (#7)
Re: pg_dump --with-* options

On 12.06.25 17:14, Jeff Davis wrote:

On Thu, 2025-06-12 at 15:47 +0200, Peter Eisentraut wrote:

My initial guess was that --with-data can override --no-data.  That
would have been pretty standard "last option wins" behavior.  But
pg_dump rejects that.  Personally, I think that is kind of wrong.

Do we have other options that are order-sensitive?

I think most of them are. For example:

psql -p 5432 -p 5433
initdb --data-checksums --no-data-checksums
postgres --shared-buffers=1GB --shared-buffers=2GB

#13Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Robert Haas (#11)
Re: pg_dump --with-* options

On Thu, 2025-06-12 at 13:36 -0400, Robert Haas wrote:

On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:

If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.

That's OK with me.

Same.

I must be missing something, but I think --no-statistics is sorely needed.
How else can I get the effect of

pg_dump --no-statistics mydb

Yours,
Laurenz Albe

#14Nathan Bossart
nathandbossart@gmail.com
In reply to: Laurenz Albe (#13)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 10:07:05PM +0200, Laurenz Albe wrote:

I must be missing something, but I think --no-statistics is sorely needed.
How else can I get the effect of

pg_dump --no-statistics mydb

This was recently changed to be the default behavior (see commit 34eb2a8).

--
nathan

#15Corey Huinker
corey.huinker@gmail.com
In reply to: Robert Haas (#11)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 1:36 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 12, 2025 at 11:58 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Thu, 2025-06-12 at 09:52 -0500, Nathan Bossart wrote:

If the idea is to remove all options for default behavior, we'd be
removing
--no-statistics, --with-data, and --with-schema at this point.

That's OK with me.

Same.

Maybe we
could go a step further and even rip out --statistics-only (in favor
of
--no-schema --no-data --with-statistics).

I'd probably keep --statistics-only.

I'm going to vote for removing it. pg_dump has a lot of options, and
it doesn't seem like a good bet to me to have options that are
equivalent to various combinations of other options. I don't see any
particular reason to believe that --statistics-only is even a
particularly likely combination of options for someone to want. I'd
rather keep it simple.

The use case for --statistics-only is to extract the existing statistics
for the tables and indexes that are involved in a given query that is
giving you problems, allowing you to apply those statistics to an existing
QA/dev database and tweak them without further impacting operations on the
production database. I think this will prove to be very useful, and having
a --statistics-only flag conveys the clear intent of "I want the stats, and
only the stats",

--
Robert Haas
EDB: http://www.enterprisedb.com

If we're hot to remove options, how about we remove the sections flags?
Their utility is reliant upon the user understanding exactly which things
go in which section, and further assumes that everything deterministically
goes in exactly one section, which is no longer the case as Jeff
pointed out recently. They have outlived their usefulness.

If we have the full complement of -no-something flags, and the three -only
flags, we wouldn't need the --with-something flags. That would mean making
statistics export the default on dumps, which I think it should be anyway,
because there's nothing else that we don't dump by default, and while it
might seem strange to have them by default now, NOT having them by default
will feel very strange a few years down the road.

#16Nathan Bossart
nathandbossart@gmail.com
In reply to: Corey Huinker (#15)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:

The use case for --statistics-only is to extract the existing statistics
for the tables and indexes that are involved in a given query that is
giving you problems, allowing you to apply those statistics to an existing
QA/dev database and tweak them without further impacting operations on the
production database. I think this will prove to be very useful, and having
a --statistics-only flag conveys the clear intent of "I want the stats, and
only the stats",

I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data. It seems like we're trying
to avoid having multiple ways to do the same thing.

If we're hot to remove options, how about we remove the sections flags?
Their utility is reliant upon the user understanding exactly which things
go in which section, and further assumes that everything deterministically
goes in exactly one section, which is no longer the case as Jeff
pointed out recently. They have outlived their usefulness.

I almost brought this up earlier as something else we could potentially
trim. That's v19 material at this point, though.

--
nathan

#17Corey Huinker
corey.huinker@gmail.com
In reply to: Nathan Bossart (#16)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Thu, Jun 12, 2025 at 04:12:35PM -0400, Corey Huinker wrote:

The use case for --statistics-only is to extract the existing statistics
for the tables and indexes that are involved in a given query that is
giving you problems, allowing you to apply those statistics to an

existing

QA/dev database and tweak them without further impacting operations on

the

production database. I think this will prove to be very useful, and

having

a --statistics-only flag conveys the clear intent of "I want the stats,

and

only the stats",

I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data. It seems like we're trying
to avoid having multiple ways to do the same thing.

By that same argument, we should remove --schema-only and --data-only as
well. I think we shouldn't because those two options have proved very
convenient for users and they convey clear intent to the person reading the
script, and I believe that --statistics-only will prove the same over time.

If we're hot to remove options, how about we remove the sections flags?
Their utility is reliant upon the user understanding exactly which things
go in which section, and further assumes that everything

deterministically

goes in exactly one section, which is no longer the case as Jeff
pointed out recently. They have outlived their usefulness.

I almost brought this up earlier as something else we could potentially
trim. That's v19 material at this point, though.

+1 for 19.

#18Nathan Bossart
nathandbossart@gmail.com
In reply to: Corey Huinker (#17)
Re: pg_dump --with-* options

On Thu, Jun 12, 2025 at 04:39:00PM -0400, Corey Huinker wrote:

On Thu, Jun 12, 2025 at 4:22 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

I do think this is useful functionality, I only suggested removing it
because AFAICT it is redundant, i.e., you can accomplish the same thing
with --with-statistics --no-schema --no-data. It seems like we're trying
to avoid having multiple ways to do the same thing.

By that same argument, we should remove --schema-only and --data-only as
well. I think we shouldn't because those two options have proved very
convenient for users and they convey clear intent to the person reading the
script, and I believe that --statistics-only will prove the same over time.

Those predate v18, so while might be able to mark them deprecated, I doubt
we'd remove them anytime soon.

FWIW I don't have a tremendously strong opinion about --statistics-only.
I'd probably vote to remove it because 1) it's redundant, 2) once you add
an option, it's hard to remove it, and 3) pg_dump already has so many
options. But I won't cry too hard if we keep it around.

--
nathan

#19Jeff Davis
pgsql@j-davis.com
In reply to: Nathan Bossart (#18)
Re: pg_dump --with-* options

On Thu, 2025-06-12 at 15:57 -0500, Nathan Bossart wrote:

FWIW I don't have a tremendously strong opinion about --statistics-
only.

Same here. I won't cast a vote on this particular issue, as long as the
functionality is available.

Regards,
Jeff Davis

#20Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#12)
Re: pg_dump --with-* options

On Thu, 2025-06-12 at 21:16 +0200, Peter Eisentraut wrote:

Do we have other options that are order-sensitive?

I think most of them are.  For example:

psql -p 5432 -p 5433
initdb --data-checksums --no-data-checksums
postgres --shared-buffers=1GB --shared-buffers=2GB

Interesting. I don't think the "last option wins" model applies to
other pg_dump options, though. For instance, in PG17:

pg_dump --data-only --schema-only
pg_dump: error: options -s/--schema-only and -a/--data-only cannot be
used together

I don't think it's simple to start using "last option wins" behavior
now. There are probably some combinations of options where it's not
clear whether a later option is an extra constraint or will override a
previous option.

Regards,
Jeff Davis

#21Fujii Masao
masao.fujii@gmail.com
In reply to: Jeff Davis (#19)
#22Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#6)
#23Greg Sabino Mullane
greg@turnstep.com
In reply to: Corey Huinker (#15)
#24Peter Eisentraut
peter_e@gmx.net
In reply to: Jeff Davis (#20)
#25Daniel Gustafsson
daniel@yesql.se
In reply to: Fujii Masao (#22)
#26Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#24)
#27Jeff Davis
pgsql@j-davis.com
In reply to: Fujii Masao (#22)
#28Nathan Bossart
nathandbossart@gmail.com
In reply to: Jeff Davis (#27)
#29Corey Huinker
corey.huinker@gmail.com
In reply to: Peter Eisentraut (#24)
#30Corey Huinker
corey.huinker@gmail.com
In reply to: Jeff Davis (#27)
#31Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#28)
#32Corey Huinker
corey.huinker@gmail.com
In reply to: Fujii Masao (#31)
#33Nathan Bossart
nathandbossart@gmail.com
In reply to: Corey Huinker (#32)
#34Jeff Davis
pgsql@j-davis.com
In reply to: Nathan Bossart (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#34)
#36Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#35)
#37Fujii Masao
masao.fujii@gmail.com
In reply to: Corey Huinker (#32)
#38Fujii Masao
masao.fujii@gmail.com
In reply to: Nathan Bossart (#36)
#39Jeff Davis
pgsql@j-davis.com
In reply to: Corey Huinker (#32)
#40Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#9)
#41Nathan Bossart
nathandbossart@gmail.com
In reply to: Jeff Davis (#40)
#42Jeff Davis
pgsql@j-davis.com
In reply to: Nathan Bossart (#41)
#43Nathan Bossart
nathandbossart@gmail.com
In reply to: Jeff Davis (#42)
#44Jeff Davis
pgsql@j-davis.com
In reply to: Nathan Bossart (#41)
#45Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#44)
#46Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Nathan Bossart (#46)
#48Greg Sabino Mullane
greg@turnstep.com
In reply to: Nathan Bossart (#41)
#49Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#47)
#50Nathan Bossart
nathandbossart@gmail.com
In reply to: Greg Sabino Mullane (#48)
#51Nathan Bossart
nathandbossart@gmail.com
In reply to: Fujii Masao (#49)
#52Greg Sabino Mullane
greg@turnstep.com
In reply to: Nathan Bossart (#50)
#53Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#45)
#54Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#44)
#55Jeff Davis
pgsql@j-davis.com
In reply to: Fujii Masao (#49)
#56Fujii Masao
masao.fujii@gmail.com
In reply to: Jeff Davis (#55)
#57Jeff Davis
pgsql@j-davis.com
In reply to: Fujii Masao (#56)
#58Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#54)
#59Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Davis (#58)
#60Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#44)
#61Jeff Davis
pgsql@j-davis.com
In reply to: Alvaro Herrera (#59)
#62Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Fujii Masao (#49)
#63Jeff Davis
pgsql@j-davis.com
In reply to: Alvaro Herrera (#62)
#64Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Davis (#63)
#65Corey Huinker
corey.huinker@gmail.com
In reply to: Alvaro Herrera (#64)
#66Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#60)
#67Jeff Davis
pgsql@j-davis.com
In reply to: Corey Huinker (#65)
#68Corey Huinker
corey.huinker@gmail.com
In reply to: Jeff Davis (#67)
#69Nathan Bossart
nathandbossart@gmail.com
In reply to: Jeff Davis (#66)
#70Fujii Masao
masao.fujii@gmail.com
In reply to: Jeff Davis (#66)
#71Jeff Davis
pgsql@j-davis.com
In reply to: Nathan Bossart (#69)
#72Jeff Davis
pgsql@j-davis.com
In reply to: Fujii Masao (#70)