Patch: incorrect array offset in backend replication tar header
While researching the way streaming replication works I was examining
the construction of the tar file header. By comparing documentation on
the tar header format from various sources I certain the following
patch should be applied to so the group identifier is put into thee
header properly.
While I realize that wikipedia isn't always the best source of
information, the header offsets seem to match the other documentation
I've found. The format is just easier to read on wikipedia
http://en.wikipedia.org/wiki/Tar_(file_format)#File_header
Here is the trivial patch:
diff --git a/src/backend/replication/basebackup.c
b/src/backend/replication/basebackup.c
index 4aaa9e3..524223e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -871,7 +871,7 @@ _tarWriteHeader(const char *filename, const char
*linktarget,
sprintf(&h[108], "%07o ", statbuf->st_uid);
/* Group 8 */
- sprintf(&h[117], "%07o ", statbuf->st_gid);
+ sprintf(&h[116], "%07o ", statbuf->st_gid);
/* File size 12 - 11 digits, 1 space, no NUL */
if (linktarget != NULL || S_ISDIR(statbuf->st_mode))
-- Brian
--
/* insert witty comment here */
Actually I found one other issue while continuing my investigation.
The insertion of the 'ustar' and version '00' has the '00' version at
the wrong offset. The patch is attached.
-- Brian
On Mon, Sep 24, 2012 at 7:51 PM, Brian Weaver <cmdrclueless@gmail.com> wrote:
While researching the way streaming replication works I was examining
the construction of the tar file header. By comparing documentation on
the tar header format from various sources I certain the following
patch should be applied to so the group identifier is put into thee
header properly.While I realize that wikipedia isn't always the best source of
information, the header offsets seem to match the other documentation
I've found. The format is just easier to read on wikipediahttp://en.wikipedia.org/wiki/Tar_(file_format)#File_header
Here is the trivial patch:
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 4aaa9e3..524223e 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -871,7 +871,7 @@ _tarWriteHeader(const char *filename, const char *linktarget, sprintf(&h[108], "%07o ", statbuf->st_uid);/* Group 8 */ - sprintf(&h[117], "%07o ", statbuf->st_gid); + sprintf(&h[116], "%07o ", statbuf->st_gid);/* File size 12 - 11 digits, 1 space, no NUL */
if (linktarget != NULL || S_ISDIR(statbuf->st_mode))-- Brian
--
/* insert witty comment here */
--
/* insert witty comment here */
Attachments:
pg-tar.patchapplication/octet-stream; name=pg-tar.patchDownload+2-2
Um.... I apologize for the third e-mail on the topic. It seems that my
C coding is a bit rusty from years of neglect. No sooner had I hit the
send button then I realized that trying to embed a null character in a
string might not work, especially when it's followed by two
consecutive zeros.
Here is a safer fix which is more in line with the rest of the code in
the file. I guess this is what I get for being cleaver.
Patch is attached
--
/* insert witty comment here */
Attachments:
pg-tar.patchapplication/octet-stream; name=pg-tar.patchDownload+3-2
Brian Weaver <cmdrclueless@gmail.com> writes:
While researching the way streaming replication works I was examining
the construction of the tar file header. By comparing documentation on
the tar header format from various sources I certain the following
patch should be applied to so the group identifier is put into thee
header properly.
Yeah, this is definitely wrong.
While I realize that wikipedia isn't always the best source of
information, the header offsets seem to match the other documentation
I've found. The format is just easier to read on wikipedia
The authoritative specification can be found in the "pax" page in the
POSIX spec, which is available here:
http://pubs.opengroup.org/onlinepubs/9699919799/
I agree that the 117 number is bogus, and also that the magic "ustar"
string is written incorrectly. What's more, it appears that the latter
error has been copied from pg_dump (but the 117 seems to be just a new
bug in pg_basebackup). I wonder what else might be wrong hereabouts :-(
Will sit down and take a closer look.
I believe what we need to do about this is:
1. fix pg_dump and pg_basebackup output to conform to spec.
2. make sure pg_restore will accept both conformant and
previous-generation files.
Am I right in believing that we don't have any code that's expected to
read pg_basebackup output? We just feed it to "tar", no?
I'm a bit concerned about backwards compatibility issues. It looks to
me like existing versions of pg_restore will flat out reject files that
have a spec-compliant "ustar\0" MAGIC field. Is it going to be
sufficient if we fix this in minor-version updates, or are we going to
need to have a switch that tells pg_dump to emit the incorrect old
format? (Ick.)
regards, tom lane
Tom,
I'm still investigating and I have been looking at various sources. I
have checked lots of pages on the web and I was just looking at the
libarchive source from github. I found an interesting sequence in
libarchive that implies that the 'ustar00\0' marks the header as GNU
Tar format.
Here are lines 321 through 329 of 'archive_read_support_format_tar.c'
from libarchive
321 /* Recognize POSIX formats. */
322 if ((memcmp(header->magic, "ustar\0", 6) == 0)
323 && (memcmp(header->version, "00", 2) == 0))
324 bid += 56;
325
326 /* Recognize GNU tar format. */
327 if ((memcmp(header->magic, "ustar ", 6) == 0)
328 && (memcmp(header->version, " \0", 2) == 0))
329 bid += 56;
I'm wondering if the original committer put the 'ustar00\0' string in by design?
Regardless I'll look at it more tomorrow 'cause I'm calling it a
night. I need to send a note to the libarchive folks too because I
*think* I found a potential buffer overrun in one of their octal
conversion routines.
-- Brian
On Mon, Sep 24, 2012 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brian Weaver <cmdrclueless@gmail.com> writes:
While researching the way streaming replication works I was examining
the construction of the tar file header. By comparing documentation on
the tar header format from various sources I certain the following
patch should be applied to so the group identifier is put into thee
header properly.Yeah, this is definitely wrong.
While I realize that wikipedia isn't always the best source of
information, the header offsets seem to match the other documentation
I've found. The format is just easier to read on wikipediaThe authoritative specification can be found in the "pax" page in the
POSIX spec, which is available here:
http://pubs.opengroup.org/onlinepubs/9699919799/I agree that the 117 number is bogus, and also that the magic "ustar"
string is written incorrectly. What's more, it appears that the latter
error has been copied from pg_dump (but the 117 seems to be just a new
bug in pg_basebackup). I wonder what else might be wrong hereabouts :-(
Will sit down and take a closer look.I believe what we need to do about this is:
1. fix pg_dump and pg_basebackup output to conform to spec.
2. make sure pg_restore will accept both conformant and
previous-generation files.Am I right in believing that we don't have any code that's expected to
read pg_basebackup output? We just feed it to "tar", no?I'm a bit concerned about backwards compatibility issues. It looks to
me like existing versions of pg_restore will flat out reject files that
have a spec-compliant "ustar\0" MAGIC field. Is it going to be
sufficient if we fix this in minor-version updates, or are we going to
need to have a switch that tells pg_dump to emit the incorrect old
format? (Ick.)regards, tom lane
--
/* insert witty comment here */
Brian Weaver <cmdrclueless@gmail.com> writes:
Here are lines 321 through 329 of 'archive_read_support_format_tar.c'
from libarchive
321 /* Recognize POSIX formats. */
322 if ((memcmp(header->magic, "ustar\0", 6) == 0)
323 && (memcmp(header->version, "00", 2) == 0))
324 bid += 56;
325
326 /* Recognize GNU tar format. */
327 if ((memcmp(header->magic, "ustar ", 6) == 0)
328 && (memcmp(header->version, " \0", 2) == 0))
329 bid += 56;
I'm wondering if the original committer put the 'ustar00\0' string in by design?
The second part of that looks to me like it matches "ustar \0",
not "ustar00\0". I think the pg_dump coding is just wrong. I've
already noticed that its code for writing the checksum is pretty
brain-dead too :-(
Note that according to the wikipedia page, tar programs typically
accept files as pre-POSIX format if the checksum is okay, regardless of
what is in the magic field; and the fields that were added by POSIX
are noncritical so we'd likely never notice that they were being
ignored. (In fact, looking closer, pg_dump isn't even filling those
fields anyway, so the fact that it's not producing a compliant magic
field may be a good thing ...)
regards, tom lane
Tom,
I actually plan on doing a lot of work on the frontend pg_basebackup
for my employer. pg_basebackup is 90% of the way to a solution that I
need for doing backups of *large* databases while allowing the
database to continue to work. The problem is a lack of secondary disk
space to save a replication of the original database cluster. I want
to modify pg_basebackup to include the WAL files in the tar output. I
have several ideas but I need to code and test them. That was the main
reason I was examining the backend code.
If you're willing to wait a bit on me to code and test my extensions
to pg_basebackup I will try to address some of the deficiencies as
well add new features.
I agree the checksum algorithm could definitely use some refactoring.
I was already working on that before I retired last night.
-- Brian
On Mon, Sep 24, 2012 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brian Weaver <cmdrclueless@gmail.com> writes:
Here are lines 321 through 329 of 'archive_read_support_format_tar.c'
from libarchive321 /* Recognize POSIX formats. */
322 if ((memcmp(header->magic, "ustar\0", 6) == 0)
323 && (memcmp(header->version, "00", 2) == 0))
324 bid += 56;
325
326 /* Recognize GNU tar format. */
327 if ((memcmp(header->magic, "ustar ", 6) == 0)
328 && (memcmp(header->version, " \0", 2) == 0))
329 bid += 56;I'm wondering if the original committer put the 'ustar00\0' string in by design?
The second part of that looks to me like it matches "ustar \0",
not "ustar00\0". I think the pg_dump coding is just wrong. I've
already noticed that its code for writing the checksum is pretty
brain-dead too :-(Note that according to the wikipedia page, tar programs typically
accept files as pre-POSIX format if the checksum is okay, regardless of
what is in the magic field; and the fields that were added by POSIX
are noncritical so we'd likely never notice that they were being
ignored. (In fact, looking closer, pg_dump isn't even filling those
fields anyway, so the fact that it's not producing a compliant magic
field may be a good thing ...)regards, tom lane
--
/* insert witty comment here */
On 9/25/12 3:38 PM, Brian Weaver wrote:
I want
to modify pg_basebackup to include the WAL files in the tar output.
Doesn't pg_basebackup -x do exactly that?
Regards,
Marko Tiikkaja
Brian Weaver <cmdrclueless@gmail.com> writes:
If you're willing to wait a bit on me to code and test my extensions
to pg_basebackup I will try to address some of the deficiencies as
well add new features.
I think it's a mistake to try to handle these issues in the same patch
as feature extensions. If you want to submit a patch for them, I'm
happy to let you do the legwork, but please keep it narrowly focused
on fixing file-format deficiencies.
The notes I had last night after examining pg_dump were:
magic number written incorrectly, but POSIX fields aren't filled anyway
(which is why tar tvf doesn't show them)
checksum code is brain-dead; no use in "lastSum" nor in looping
per spec, there should be 1024 zeroes not 512 at end of file;
this explains why tar whines about a "lone zero block" ...
Not sure which of these apply to pg_basebackup.
As far as the backwards compatibility issue goes, what seems like
a good idea after sleeping on it is (1) fix pg_dump in HEAD to emit
standard-compliant tar files; (2) fix pg_restore in HEAD and all back
branches to accept both the standard and the incorrect magic field.
This way, the only people with a compatibility problem would be those
trying to use by-then-ancient pg_restore versions to read 9.3 or later
pg_dump output.
regards, tom lane
Unless I misread the code, the tar format and streaming xlog are
mutually exclusive. Considering my normal state of fatigue it's not
unlikely. I don't want to have to set my wal_keep_segments
artificially high just for the backup
On Tue, Sep 25, 2012 at 10:05 AM, Marko Tiikkaja <pgmail@joh.to> wrote:
On 9/25/12 3:38 PM, Brian Weaver wrote:
I want
to modify pg_basebackup to include the WAL files in the tar output.Doesn't pg_basebackup -x do exactly that?
Regards,
Marko Tiikkaja
--
/* insert witty comment here */
Tom,
I'm fine with submitting highly focused patches first. I was just
explaining my end-goal. Still I will need time to patch, compile, and
test before submitting so you're not going to see any output from me
for a few days. That's all assuming my employer can leave me alone
long enough to focus on a single task. I'm far too interrupt driven at
work.
-- Brian
On Tue, Sep 25, 2012 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brian Weaver <cmdrclueless@gmail.com> writes:
If you're willing to wait a bit on me to code and test my extensions
to pg_basebackup I will try to address some of the deficiencies as
well add new features.I think it's a mistake to try to handle these issues in the same patch
as feature extensions. If you want to submit a patch for them, I'm
happy to let you do the legwork, but please keep it narrowly focused
on fixing file-format deficiencies.The notes I had last night after examining pg_dump were:
magic number written incorrectly, but POSIX fields aren't filled anyway
(which is why tar tvf doesn't show them)checksum code is brain-dead; no use in "lastSum" nor in looping
per spec, there should be 1024 zeroes not 512 at end of file;
this explains why tar whines about a "lone zero block" ...Not sure which of these apply to pg_basebackup.
As far as the backwards compatibility issue goes, what seems like
a good idea after sleeping on it is (1) fix pg_dump in HEAD to emit
standard-compliant tar files; (2) fix pg_restore in HEAD and all back
branches to accept both the standard and the incorrect magic field.
This way, the only people with a compatibility problem would be those
trying to use by-then-ancient pg_restore versions to read 9.3 or later
pg_dump output.regards, tom lane
--
/* insert witty comment here */
On Tue, Sep 25, 2012 at 5:08 PM, Brian Weaver <cmdrclueless@gmail.com> wrote:
Unless I misread the code, the tar format and streaming xlog are
mutually exclusive. Considering my normal state of fatigue it's not
unlikely. I don't want to have to set my wal_keep_segments
artificially high just for the backup
Correct, you can't use both of those at the same time. That can
certainly be improved - but injecting a file into the tar from a
different process is far from easy. But one idea might be to just
stream the WAL into a *separate* tarfile in this case.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Tue, Sep 25, 2012 at 4:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brian Weaver <cmdrclueless@gmail.com> writes:
While researching the way streaming replication works I was examining
the construction of the tar file header. By comparing documentation on
the tar header format from various sources I certain the following
patch should be applied to so the group identifier is put into thee
header properly.Yeah, this is definitely wrong.
While I realize that wikipedia isn't always the best source of
information, the header offsets seem to match the other documentation
I've found. The format is just easier to read on wikipediaThe authoritative specification can be found in the "pax" page in the
POSIX spec, which is available here:
http://pubs.opengroup.org/onlinepubs/9699919799/I agree that the 117 number is bogus, and also that the magic "ustar"
string is written incorrectly. What's more, it appears that the latter
error has been copied from pg_dump (but the 117 seems to be just a new
bug in pg_basebackup). I wonder what else might be wrong hereabouts :-(
Will sit down and take a closer look.I believe what we need to do about this is:
1. fix pg_dump and pg_basebackup output to conform to spec.
2. make sure pg_restore will accept both conformant and
previous-generation files.Am I right in believing that we don't have any code that's expected to
read pg_basebackup output? We just feed it to "tar", no?
Yes. We generate a .tarfile, but we have no tool that reads it
ourselves. (unlike pg_dump where we have pg_restore that actually
reads it)
I'm a bit concerned about backwards compatibility issues. It looks to
me like existing versions of pg_restore will flat out reject files that
have a spec-compliant "ustar\0" MAGIC field. Is it going to be
sufficient if we fix this in minor-version updates, or are we going to
need to have a switch that tells pg_dump to emit the incorrect old
format? (Ick.)
Do we officially support using an older pg_restore to reload a newer
dump? I think not? As long as we don't officially support that, I
think we'll be ok.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus,
I probably just did a poor job of explaining what I wanted to try. I
was going to have the base backup open two connections; one to stream
the tar archive, the second to receive the wal files like
pg_receivexlog.
The wal files received on the second connection would be streamed to a
temporary file, with tar headers. Then when the complete tar archive
from the first header was complete received simply replay the contents
from the temporary file to append them to the tar archive.
Turns out that isn't necessary..... It was an idea borne from my
misunderstanding of how the pg_basebackup worked. The archive will
include all the wal files if I make wal_keep_segments high enough.
-- Brian
On Thu, Sep 27, 2012 at 6:01 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Sep 25, 2012 at 5:08 PM, Brian Weaver <cmdrclueless@gmail.com> wrote:
Unless I misread the code, the tar format and streaming xlog are
mutually exclusive. Considering my normal state of fatigue it's not
unlikely. I don't want to have to set my wal_keep_segments
artificially high just for the backupCorrect, you can't use both of those at the same time. That can
certainly be improved - but injecting a file into the tar from a
different process is far from easy. But one idea might be to just
stream the WAL into a *separate* tarfile in this case.--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
--
/* insert witty comment here */
OK, here is my attempt at patching and correcting the issue in this
thread. I have done my best to test to ensure that hot standby,
pg_basebackup, and pg_restore of older files work without issues. I
think this might be a larger patch that expected, I took some
liberties of trying to clean up a bit.
For example the block size '512' was scattered throughout the code
regarding the tar block size. I've replace instances of that with a
defined constant TAR_BLOCK_SIZE. I've likewise created other constants
and used them in place of raw numbers in what I hope makes the code a
bit more readable.
I've also used functions like strncpy(), strnlen(), and the like in
place of sprintf() where I could. Also instead of using sscanf() I
used a custom octal conversion routine which has a hard limit on how
many character it will process like strncpy() & strnlen().
I expect comments, hopefully they'll be positive.
-- Brian
--
/* insert witty comment here */
Attachments:
postgresql-tar.patchapplication/octet-stream; name=postgresql-tar.patchDownload+552-294
On Fri, Sep 28, 2012 at 12:12 AM, Brian Weaver <cmdrclueless@gmail.com> wrote:
Magnus,
I probably just did a poor job of explaining what I wanted to try. I
was going to have the base backup open two connections; one to stream
the tar archive, the second to receive the wal files like
pg_receivexlog.
This is what --xlog=stream does.
The wal files received on the second connection would be streamed to a
temporary file, with tar headers. Then when the complete tar archive
from the first header was complete received simply replay the contents
from the temporary file to append them to the tar archive.
Ah, yeah, that should also work I guess. But you could also just leave
the the data in the temporary tarfile the whole time. IIRC, you can
just cat one tarfile to the end of another one, right?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Sep 25, 2012 at 4:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm a bit concerned about backwards compatibility issues. It looks to
me like existing versions of pg_restore will flat out reject files that
have a spec-compliant "ustar\0" MAGIC field. Is it going to be
sufficient if we fix this in minor-version updates, or are we going to
need to have a switch that tells pg_dump to emit the incorrect old
format? (Ick.)
Do we officially support using an older pg_restore to reload a newer
dump? I think not? As long as we don't officially support that, I
think we'll be ok.
Well, for the -Fc format, we have an explicit version number, and
pg_restore is supposed to be able to read anything with current or prior
version number. We don't bump the version number too often, but we've
definitely done it anytime we added new features at the file-format
level. However, since the whole point of the -Ft format is to be
standard-compliant, people might be surprised if it fell over in a
backwards-compatibility situation.
Having said all that, I don't think we have a lot of choices here.
A "tar format" output option that isn't actually tar format has hardly
any excuse to live at all.
regards, tom lane
On Fri, Sep 28, 2012 at 12:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Sep 25, 2012 at 4:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm a bit concerned about backwards compatibility issues. It looks to
me like existing versions of pg_restore will flat out reject files that
have a spec-compliant "ustar\0" MAGIC field. Is it going to be
sufficient if we fix this in minor-version updates, or are we going to
need to have a switch that tells pg_dump to emit the incorrect old
format? (Ick.)Do we officially support using an older pg_restore to reload a newer
dump? I think not? As long as we don't officially support that, I
think we'll be ok.Well, for the -Fc format, we have an explicit version number, and
pg_restore is supposed to be able to read anything with current or prior
version number. We don't bump the version number too often, but we've
definitely done it anytime we added new features at the file-format
level. However, since the whole point of the -Ft format is to be
standard-compliant, people might be surprised if it fell over in a
backwards-compatibility situation.Having said all that, I don't think we have a lot of choices here.
A "tar format" output option that isn't actually tar format has hardly
any excuse to live at all.
Yeah, that's what I'm thinking - it's really a bugfix...
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Brian Weaver <cmdrclueless@gmail.com> writes:
OK, here is my attempt at patching and correcting the issue in this
thread. I have done my best to test to ensure that hot standby,
pg_basebackup, and pg_restore of older files work without issues. I
think this might be a larger patch that expected, I took some
liberties of trying to clean up a bit.
For example the block size '512' was scattered throughout the code
regarding the tar block size. I've replace instances of that with a
defined constant TAR_BLOCK_SIZE. I've likewise created other constants
and used them in place of raw numbers in what I hope makes the code a
bit more readable.
That seems possibly reasonable ...
I've also used functions like strncpy(), strnlen(), and the like in
place of sprintf() where I could. Also instead of using sscanf() I
used a custom octal conversion routine which has a hard limit on how
many character it will process like strncpy() & strnlen().
... but I doubt that this really constitutes a readability improvement.
Or a portability improvement. strnlen for instance is not to be found
in Single Unix Spec v2 (http://pubs.opengroup.org/onlinepubs/007908799/)
which is what we usually take as our baseline assumption about which
system functions are available everywhere. By and large, I think the
more different system functions you rely on, the harder it is to read
your code, even if some unusual system function happens to exactly match
your needs in particular places. It also greatly increases the risk
of having portability problems, eg on Windows, or non-mainstream Unix
platforms.
But a larger point is that the immediate need is to fix bugs. Code
beautification is a separate activity and would be better submitted as
a separate patch. There is no way I'd consider applying most of this
patch to the back branches, for instance.
regards, tom lane
Magnus Hagander <magnus@hagander.net> writes:
Ah, yeah, that should also work I guess. But you could also just leave
the the data in the temporary tarfile the whole time. IIRC, you can
just cat one tarfile to the end of another one, right?
Not if they're written according to spec, I think. There is an EOF
marker consisting of 2 blocks of zeroes (which pg_dump is failing to
create correctly, but that's a bug not a feature). Possibly you could
strip the EOF marker though, if the file join code is allowed to be
something smarter than "cat".
regards, tom lane