Confusing error message with too-large file in pg_basebackup
Version: 9.4.5
Summary: confusing error message for too-large file failure in pg_basebackup
Details:
1. PostgreSQL previously dumped core on this system and left behind a
9GB core file, which was never deleted.
2. Attempted to pg_basebackup the server.
3. Got this error message:
pg_basebackup: could not get transaction log end position from server:
ERROR: archive member "core" too large for tar format
This was very confusing to the user, because they weren't requesting tar
format, and even setting -Fp got the same error message. I can only
hypothesize that tar is used somewhere under the hood.
pg_basebackup doesn't need to work under these circumstances, but maybe
we could give a less baffling error message?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Fri, Nov 20, 2015 at 9:37 AM, Josh Berkus <josh@agliodbs.com> wrote:
pg_basebackup: could not get transaction log end position from server:
ERROR: archive member "core" too large for tar format
That's a backend-side error.
This was very confusing to the user, because they weren't requesting tar
format, and even setting -Fp got the same error message. I can only
hypothesize that tar is used somewhere under the hood.
Exactly. When a base backup is taken through the replication protocol,
the backend always sends it in tar format, for performance reasons. It
is then up to pg_basebackup to decide whether the output should be
untarred or not.
pg_basebackup doesn't need to work under these circumstances, but maybe
we could give a less baffling error message?
We would need to let the backend know about the output format expected
by the caller of BASE_BACKUP by extending the command in the
replication protocol. It does not sound like a good idea to me just to
make some potential error messages more verbose.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Nov 20, 2015 at 9:37 AM, Josh Berkus <josh@agliodbs.com> wrote:
pg_basebackup: could not get transaction log end position from server:
ERROR: archive member "core" too large for tar format
That's a backend-side error.
This was very confusing to the user, because they weren't requesting tar
format, and even setting -Fp got the same error message. I can only
hypothesize that tar is used somewhere under the hood.
Exactly. When a base backup is taken through the replication protocol,
the backend always sends it in tar format, for performance reasons. It
is then up to pg_basebackup to decide whether the output should be
untarred or not.
It's not unreasonable for pg_basebackup to use tar format, because the
size limitation should not be an issue for files that are expected to
be in a data directory. Leftover core dump files are unexpected :-(.
I wonder if we could put some sort of filter into pg_basebackup so
it would skip this sort of thing.
regards, tom lane
On Fri, Nov 20, 2015 at 1:41 PM, Tom Lane wrote:
It's not unreasonable for pg_basebackup to use tar format, because the
size limitation should not be an issue for files that are expected to
be in a data directory. Leftover core dump files are unexpected :-(.
I wonder if we could put some sort of filter into pg_basebackup so
it would skip this sort of thing.
We could try filtering on the core file name for the common
distribution cases, like "core" or "core*"; however, with
kernel.core_pattern it is easy to set up a custom core file name
format on a given system.
Short of calling "file" through system(), another way would be to look
directly at the file type, but that seems unmaintainable to me; see
for example magic/Magdir/ here, which keeps a reference of those
signatures:
ftp://ftp.astron.com/pub/file/
There is also the possibility of calling "file" directly in the base
backup code path and filtering the result depending on whether "core"
shows up...
--
Michael
On 20 Nov 2015 6:26 AM, "Michael Paquier" <michael.paquier@gmail.com>
wrote:
On Fri, Nov 20, 2015 at 1:41 PM, Tom Lane wrote:
It's not unreasonable for pg_basebackup to use tar format, because the
size limitation should not be an issue for files that are expected to
be in a data directory. Leftover core dump files are unexpected :-(.
I wonder if we could put some sort of filter into pg_basebackup so
it would skip this sort of thing.
We could try to have some filtering with the core file name for most
of the main distribution cases, like "core", or "core*", however with
kernel.core_pattern it is easy to set up on a given system a custom
core file name format.
Without having to call "file" through system(), another way would be
to have directly a look at the file type, but this looks
unmaintainable to me, look for example here in magic/Magdir/ that
keeps a reference of that. That's quite interesting.
ftp://ftp.astron.com/pub/file/
Now there is actually the possibility to call directly "file" in the
base backup code path as well, and filter the result depending on if
"core" shows up...
Looking at the file's size is probably a better idea. As far as I know,
PostgreSQL doesn't create files bigger than 1GB, except for log files. I'm
not sure about this but I guess pg_basebackup doesn't ship log files. So,
looking at the size would work.
On Fri, Nov 20, 2015 at 4:39 PM, Guillaume Lelarge wrote:
Looking at the file's size is probably a better idea.
But isn't that already what the backend does?
As far as I know,
PostgreSQL doesn't create files bigger than 1GB, except for log files.
In most cases where the default is used, yes. Now this depends as well
on --with-segsize.
I'm not sure about this but I guess pg_basebackup doesn't ship log files. So, looking at the size would work.
It does fetch files from pg_log. We actually had discussions on
-hackers not so long ago about allowing some filtering option in
pg_basebackup, partly for this purpose.
--
Michael
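(Editorial aside, not part of the original thread: the --with-segsize
knob Michael refers to fixes the data file segment size at build time.
A minimal illustration, assuming a source tree with configure present:)

```shell
# Illustrative only: build PostgreSQL with 4GB data file segments
# instead of the default 1GB.  Per configure's help, the resulting
# file size must be less than the OS' limit on file size.
./configure --with-segsize=4
```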
Guillaume Lelarge wrote:
Looking at the file's size is probably a better idea. As far as I know,
PostgreSQL doesn't create files bigger than 1GB, except for log files. I'm
not sure about this but I guess pg_basebackup doesn't ship log files. So,
looking at the size would work.
Hmm, so we let configure's --with-segsize change the data file size.
The configure help says that the limit should be "less than your OS'
limit on file size". We don't warn that this could cause backup
problems later on. Should we add a blurb about that somewhere?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 20 Nov 2015 1:34 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com>
wrote:
Guillaume Lelarge wrote:
Looking at the file's size is probably a better idea. As far as I know,
PostgreSQL doesn't create files bigger than 1GB, except for log files.
I'm not sure about this but I guess pg_basebackup doesn't ship log
files. So, looking at the size would work.
Hmm, so we let configure --with-segsize to change the file size. The
configure help says that the limit should be "less than your OS' limit
on file size". We don't warn them that this could cause backup
problems later on. Should we add a blurb about that somewhere?
If we do, we should already have done so because of the file size limit in
the tar format backup done with pg_dump.
Guillaume Lelarge wrote:
On 20 Nov 2015 1:34 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com>
wrote:
Hmm, so we let configure --with-segsize to change the file size. The
configure help says that the limit should be "less than your OS' limit
on file size". We don't warn them that this could cause backup
problems later on. Should we add a blurb about that somewhere?
If we do, we should already have done so because of the file size limit in
the tar format backup done with pg_dump.
Agreed, please do. I cannot lend you my time machine right now, though,
because I'm using it to go to a past conference I missed, so please ask
someone else.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Guillaume Lelarge wrote:
On 20 Nov 2015 1:34 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com>
wrote:
Hmm, so we let configure --with-segsize to change the file size. The
configure help says that the limit should be "less than your OS' limit
on file size". We don't warn them that this could cause backup
problems later on. Should we add a blurb about that somewhere?
If we do, we should already have done so because of the file size limit in
the tar format backup done with pg_dump.
Agreed, please do. I cannot lend you my time machine right now, though,
because I'm using it to go to a past conference I missed, so please ask
someone else.
Um ... the segment size has no effect on pg_dump. It is true that you
can't use pg_dump's -Ft format if the text form of a table's data exceeds
8GB or whatever it is, but it matters not whether the table is segmented
internally. I'm not sure if this limitation is well-documented either,
but it's a completely different issue so far as users are concerned.
regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Guillaume Lelarge wrote:
Looking at the file's size is probably a better idea. As far as I know,
PostgreSQL doesn't create files bigger than 1GB, except for log files. I'm
not sure about this but I guess pg_basebackup doesn't ship log files. So,
looking at the size would work.
Hmm, so we let configure --with-segsize to change the file size. The
configure help says that the limit should be "less than your OS' limit
on file size". We don't warn them that this could cause backup
problems later on. Should we add a blurb about that somewhere?
Actually ... why don't we get rid of the limit? Wikipedia's entry on
the tar format says
... only 11 octal digits can be stored. This gives a maximum file size
of 8 gigabytes on archived files. To overcome this limitation, star in
2001 introduced a base-256 coding that is indicated by setting the
high-order bit of the leftmost byte of a numeric field. GNU-tar and
BSD-tar followed this idea.
If that extension is as widespread as this suggests, then following it
when we have a file > 8GB seems like a better answer than failing
entirely. If you try to read the dump with an old tar program, old
pg_restore, etc, it might fail ... but are you really worse off than
if you couldn't make the dump at all?
regards, tom lane
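(Editorial sketch to make the base-256 idea concrete; this is not the
committed patch, and `tar_write_size` is a name invented for this
example. Sizes of 8GB and up get the marker bit and a big-endian
binary value, as GNU tar and BSD tar understand:)

```c
#include <stdint.h>
#include <string.h>

/* Write "size" into a 12-byte tar header size field.  Values below
 * 8GB use the standard 11-digit octal form; larger values use the
 * base-256 extension: high bit of the first byte set, remaining
 * bytes holding the value in big-endian binary. */
static void
tar_write_size(char *field, uint64_t size)
{
    if (size <= 077777777777ULL)    /* fits in 11 octal digits (< 8GB) */
    {
        for (int i = 10; i >= 0; i--)
        {
            field[i] = (char) ('0' + (size & 7));
            size >>= 3;
        }
        field[11] = '\0';
    }
    else
    {
        memset(field, 0, 12);
        for (int i = 11; i >= 1; i--)
        {
            field[i] = (char) (size & 0xFF);
            size >>= 8;
        }
        field[0] = (char) 0x80;     /* marker bit: base-256 encoding */
    }
}
```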
On Fri, 20 Nov 2015 15:20:12 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Actually ... why don't we get rid of the limit? wikipedia's entry on
tar format says
... only 11 octal digits can be stored. This gives a maximum file size
of 8 gigabytes on archived files. To overcome this limitation, star in
2001 introduced a base-256 coding that is indicated by setting the
high-order bit of the leftmost byte of a numeric field. GNU-tar and
BSD-tar followed this idea.
If that extension is as widespread as this suggests, then following it
when we have a file > 8GB seems like a better answer than failing
entirely. If you try to read the dump with an old tar program, old
pg_restore, etc, it might fail ... but are you really worse off than
if you couldn't make the dump at all?
regards, tom lane
+1
-dg
--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
On 11/20/2015 12:20 PM, Tom Lane wrote:
Actually ... why don't we get rid of the limit? wikipedia's entry on
tar format says
We still shouldn't be dumping core files as part of a pg_basebackup...
but then, why was the core file even IN the $PGDATA directory? Shouldn't
it have been in some other dir, like the postgres daemon user's $HOME?
--
john r pierce, recycling bits in santa cruz
John R Pierce <pierce@hogranch.com> writes:
We still shouldn't be dumping core files as part of a pg_basebackup...
It'd be reasonable to skip 'em if we can identify 'em reliably. I'm
not sure how reliably we can do that though.
but then, why was the core file even IN the $PGDATA directory, shouldn't
it have been in some other dir, like the postgres daemon user's $HOME ?
No, that's standard behavior. (I know of no Unix-oid system that dumps
cores into your $HOME by default; it's normally either $PWD or some
reserved directory like /cores.)
regards, tom lane
On 11/20/2015 2:13 PM, Tom Lane wrote:
It'd be reasonable to skip 'em if we can identify 'em reliably. I'm
not sure how reliably we can do that though.
aren't they nearly always named 'core'?
--
john r pierce, recycling bits in santa cruz
John R Pierce <pierce@hogranch.com> writes:
On 11/20/2015 2:13 PM, Tom Lane wrote:
It'd be reasonable to skip 'em if we can identify 'em reliably. I'm
not sure how reliably we can do that though.
aren't they nearly always named 'core' ?
No. Modern systems more often call them something like 'core.<pid>'.
What really makes it messy is that the name is user-configurable on
most Linux kernels, see /proc/sys/kernel/core_pattern.
We could probably get away with excluding anything that matches "*core*",
but it wouldn't be bulletproof.
regards, tom lane
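(Editorial aside: the "*core*" filter Tom sketches is a one-liner with
POSIX fnmatch(); `looks_like_core_file` is a name invented for this
sketch. The test of "multicore.conf" below shows the kind of false
positive that keeps it from being bulletproof:)

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Sketch of the name-based filter: skip anything whose base name
 * matches "*core*".  Not bulletproof: kernel.core_pattern can produce
 * names this misses, and legitimate files can match by accident. */
static bool
looks_like_core_file(const char *name)
{
    return fnmatch("*core*", name, 0) == 0;
}
```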
I wrote:
Actually ... why don't we get rid of the limit? wikipedia's entry on
tar format says
... only 11 octal digits can be stored. This gives a maximum file size
of 8 gigabytes on archived files. To overcome this limitation, star in
2001 introduced a base-256 coding that is indicated by setting the
high-order bit of the leftmost byte of a numeric field. GNU-tar and
BSD-tar followed this idea.
If that extension is as widespread as this suggests, then following it
when we have a file > 8GB seems like a better answer than failing
entirely. If you try to read the dump with an old tar program, old
pg_restore, etc, it might fail ... but are you really worse off than
if you couldn't make the dump at all?
I looked into the GNU tar sources and confirmed that gtar supports this
concept. (It looks from the GNU sources like they've supported it for
a *really long time*, like since the 90's, in which case wikipedia's
credit to "star" for the idea is full of it.)
Hence, I propose something like the attached (WIP, has no doc
corrections). I've done simple testing on the pg_dump/pg_restore code
path, but not on basebackup --- anyone want to test that?
I'm not sure whether we should treat this as a back-patchable bug fix
or a new feature for HEAD only. If we don't back-patch it, there are
in any case several bugs here that we must fix. In particular, the
existing coding in ReceiveTarFile:
size_t filesz = 0;
...
sscanf(&tarhdr[124], "%11o", (unsigned int *) &filesz);
is utterly, absolutely, completely broken; it'll fail grossly on
any 64-bit big-endian hardware. There are other places with misplaced
faith that "unsigned long" is at least as wide as size_t.
Comments?
regards, tom lane
Attachment: remove-tar-size-limit.patch (text/x-diff, +145 -164)
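(Editorial note on the bug Tom flags: %11o makes sscanf store through
the pointer as a 4-byte unsigned int, so aiming it at an 8-byte size_t
fills only half the variable, and on 64-bit big-endian hardware it is
the wrong half. A hedged sketch of a cast-free parse of the 11-digit
octal field, ignoring the base-256 case the patch adds; `read_tar_size`
is an invented name:)

```c
#include <stdint.h>

/* Parse the tar header's size field without the broken cast: the
 * field is 11 octal digits at offset 124 of the 512-byte header. */
static uint64_t
read_tar_size(const char *hdr)
{
    uint64_t size = 0;

    for (int i = 0; i < 11; i++)
    {
        char c = hdr[124 + i];

        if (c < '0' || c > '7')
            break;              /* NUL or space padding ends the field */
        size = (size << 3) | (uint64_t) (c - '0');
    }
    return size;
}
```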
On Fri, 20 Nov 2015 19:11:23 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm not sure whether we should treat this as a back-patchable bug fix
or a new feature for HEAD only. If we don't back-patch it, there are
in any case several bugs here that we must fix. In particular, the
existing coding in ReceiveTarFile:size_t filesz = 0;
...
sscanf(&tarhdr[124], "%11o", (unsigned int *) &filesz);
is utterly, absolutely, completely broken; it'll fail grossly on
any 64-bit big-endian hardware. There are other places with misplaced
faith that "unsigned long" is at least as wide as size_t.
Comments?
My vote would be that it should go in 9.5. If it gets back-patched then
some dumps produced by 9.4.x would not be readable by 9.4.x-1. But no
9.5.x dump is broken by changing it now.
-dg
--
David Gould 510 282 0869 daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
David Gould <daveg@sonic.net> writes:
My vote would be that it should go in 9.5. If it gets back patched then
some dumps produced by 9.4.x would not be readable by 9.4.x-1. But no 9.5.x
dump is broken by changing it now.
The thing is, though, that without the patch 9.4.x would have failed to
produce such a dump at all. Is that really a better outcome? At least
if you have the dump, you have the option to update so you can read it.
With no dump at all, you might be screwed at a most inopportune moment.
(Think "automatic dump script was failing and nobody noticed" ...)
regards, tom lane
On Sat, Nov 21, 2015 at 7:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
John R Pierce <pierce@hogranch.com> writes:
On 11/20/2015 2:13 PM, Tom Lane wrote:
It'd be reasonable to skip 'em if we can identify 'em reliably. I'm
not sure how reliably we can do that though.
aren't they nearly always named 'core'?
No. Modern systems more often call them something like 'core.<pid>'.
What really makes it messy is that the name is user-configurable on
most Linux kernels, see /proc/sys/kernel/core_pattern.
We could probably get away with excluding anything that matches "*core*",
but it wouldn't be bulletproof.
It does not look like a good idea to me. I have no doubt that there
are deployments with configuration files using such abbreviations in
PGDATA.
--
Michael