Purpose of pg_dump tar archive format?

Started by Ronalmost 2 years ago15 messagesgeneral
Jump to latest
#1Ron
ronljohnsonjr@gmail.com

It doesn't support compression nor restore reordering like the custom
format, so I'm having trouble seeing why it still exists (at least without
a doc warning that it's obsolete).

#2Erik Wienhold
ewie@ewie.name
In reply to: Ron (#1)
Re: Purpose of pg_dump tar archive format?

On 2024-06-04 14:13 +0200, Ron Johnson wrote:

It doesn't support compression nor restore reordering like the custom
format, so I'm having trouble seeing why it still exists (at least without
a doc warning that it's obsolete).

Maybe because of this:

* The tar format also includes a 'restore.sql' script which is there for
* the benefit of humans. This script is never used by pg_restore.

I've never used that restore.sql, so I can't say how useful it is,
especially since it requires editing $$PATH$$. Should be mentioned in
the docs if that file is still relevant.

--
Erik

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ron (#1)
Re: Purpose of pg_dump tar archive format?

On 6/4/24 05:13, Ron Johnson wrote:

It doesn't support compression nor restore reordering like the custom
format, so I'm having trouble seeing why it still exists (at least
without a doc warning that it's obsolete).

pg_dump -d test -U postgres -Ft | gzip --stdout > test.tgz

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Ron
ronljohnsonjr@gmail.com
In reply to: Adrian Klaver (#3)
Re: Purpose of pg_dump tar archive format?

On Tue, Jun 4, 2024 at 10:43 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 6/4/24 05:13, Ron Johnson wrote:

It doesn't support compression nor restore reordering like the custom
format, so I'm having trouble seeing why it still exists (at least
without a doc warning that it's obsolete).

pg_dump -d test -U postgres -Ft | gzip --stdout > test.tgz

Who's got meaningful databases that small anymore?

And if you've got meaningfully sized databases, open port 5432 and move
them using pg_dump.

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ron (#4)
Re: Purpose of pg_dump tar archive format?

On 6/4/24 08:25, Ron Johnson wrote:

On Tue, Jun 4, 2024 at 10:43 AM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 6/4/24 05:13, Ron Johnson wrote:

It doesn't support compression nor restore reordering like the

custom

format, so I'm having trouble seeing why it still exists (at least
without a doc warning that it's obsolete).

pg_dump -d test -U postgres -Ft  | gzip --stdout > test.tgz

Who's got meaningful databases that small anymore?

And if you've got meaningfully sized databases, open port 5432 and move
them using pg_dump.

If you don't need the tar format then don't use it.

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adrian Klaver (#5)
Re: Purpose of pg_dump tar archive format?

Adrian Klaver <adrian.klaver@aklaver.com> writes:

If you don't need the tar format then don't use it.

Indeed. FTR, I think the original motivation for tar format was to
have a "standard" structured output format that could be manipulated
with tools other than pg_restore. Years later, we added the directory
format which could be argued to serve the same purpose. But nobody
particularly wanted to remove the tar functionality, especially since
the two were made to be compatible:

commit 7f508f1c6b515df66d27f860b2faa7b5761fa55d
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun Jan 23 23:10:15 2011 +0200

Add 'directory' format to pg_dump. The new directory format is compatible
with the 'tar' format, in that untarring a tar format archive produces a
valid directory format archive.

regards, tom lane

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ron (#1)
Re: Purpose of pg_dump tar archive format?

Reply to list also.
Ccing list

On 6/4/24 10:03 AM, Ron Johnson wrote:

If you don't need the tar format then don't use it.

That's neither the purpose nor the point of my question.

I think that a note in the docs mentioning that it's obsolete would be
helpful for new users who recognize "tar" so choose it.

You are assuming facts not in evidence, namely that the format is obsolete.

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Shaheed Haque
shaheedhaque@gmail.com
In reply to: Adrian Klaver (#7)
Re: Purpose of pg_dump tar archive format?

We use it. I bet lots of others do too.

On Tue, 4 Jun 2024, 18:06 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

Show quoted text

Reply to list also.
Ccing list

On 6/4/24 10:03 AM, Ron Johnson wrote:

If you don't need the tar format then don't use it.

That's neither the purpose nor the point of my question.

I think that a note in the docs mentioning that it's obsolete would be
helpful for new users who recognize "tar" so choose it.

You are assuming facts not in evidence, namely that the format is obsolete.

--
Adrian Klaver
adrian.klaver@aklaver.com

#9Rob Sargent
robjsargent@gmail.com
In reply to: Shaheed Haque (#8)
Re: Purpose of pg_dump tar archive format?

On 6/4/24 11:40, Shaheed Haque wrote:

We use it. I bet lots of others do too.

Of course.  There are lots of small, real, useful databases in the wild.

#10Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#9)
Re: Purpose of pg_dump tar archive format?

On Tue, Jun 4, 2024 at 2:55 PM Rob Sargent <robjsargent@gmail.com> wrote:

On 6/4/24 11:40, Shaheed Haque wrote:

We use it. I bet lots of others do too.

Of course. There are lots of small, real, useful databases in the wild.

But why tar instead of custom? That was part of my original question.

#11Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#10)
Re: Purpose of pg_dump tar archive format?

On 6/4/24 13:15, Ron Johnson wrote:

On Tue, Jun 4, 2024 at 2:55 PM Rob Sargent <robjsargent@gmail.com> wrote:

On 6/4/24 11:40, Shaheed Haque wrote:

We use it. I bet lots of others do too.

Of course.  There are lots of small, real, useful databases in the
wild.

But why tar instead of custom? That was part of my original question.

That I can't say.  Familiarity?  Fewer keystrokes?

#12Gavin Roy
gavinr@aweber.com
In reply to: Ron (#10)
Re: Purpose of pg_dump tar archive format?

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump
for large databases outside of the normal pg_dump/pg_restore workflow. You
don't have to seek through one large binary file to get to the data section
to get at the data.

#13Shaheed Haque
shaheedhaque@gmail.com
In reply to: Gavin Roy (#12)
Re: Purpose of pg_dump tar archive format?

On Tue, 4 Jun 2024 at 20:47, Gavin Roy <gavinr@aweber.com> wrote:

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com>
wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump
for large databases outside of the normal pg_dump/pg_restore workflow. You
don't have to seek through one large binary file to get to the data section
to get at the data.

This is true for us too; specifically, tar, including with compression, is
very convenient for both CLI and Python ecosystems.

#14Ron
ronljohnsonjr@gmail.com
In reply to: Gavin Roy (#12)
Re: Purpose of pg_dump tar archive format?

On Tue, Jun 4, 2024 at 3:47 PM Gavin Roy <gavinr@aweber.com> wrote:

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com>
wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump
for large databases outside of the normal pg_dump/pg_restore workflow. You
don't have to seek through one large binary file to get to the data section
to get at the data.

Interesting. Please explain, though, since a big tarball _is_ "one large
binary file" that you have to sequentially scan. (I don't know the
internal structure of custom format files, and whether they have file
pointers to each table.)

Is it because you need individual .dat "COPY" files for something other
than loading into PG tables (since pg_restore --table=xxxx does that, too),
and directory format archives can be inconvenient?

#15Gavin Roy
gavinr@aweber.com
In reply to: Ron (#14)
Re: Purpose of pg_dump tar archive format?

On Tue, Jun 4, 2024 at 7:36 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

On Tue, Jun 4, 2024 at 3:47 PM Gavin Roy <gavinr@aweber.com> wrote:

On Tue, Jun 4, 2024 at 3:15 PM Ron Johnson <ronljohnsonjr@gmail.com>
wrote:

But why tar instead of custom? That was part of my original question.

I've found it pretty useful for programmatically accessing data in a dump
for large databases outside of the normal pg_dump/pg_restore workflow. You
don't have to seek through one large binary file to get to the data section
to get at the data.

Interesting. Please explain, though, since a big tarball _is_ "one large
binary file" that you have to sequentially scan. (I don't know the
internal structure of custom format files, and whether they have file
pointers to each table.)

Not if you untar it first.

Is it because you need individual .dat "COPY" files for something other
than loading into PG tables (since pg_restore --table=xxxx does that, too),
and directory format archives can be inconvenient?

In the past I've used it for data analysis outside of Postgres.
--
*Gavin M. Roy*
CTO
AWeber