[patch] pg_copy - a command for reliable WAL archiving

Started by Tsunakawa, Takayukialmost 12 years ago26 messageshackers
Jump to latest
#1Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com

Hello,

As I proposed before in the thread below, I've implemented a simple command
for reliable WAL archiving. I would appreciate it if you could review and
test the patch.

/messages/by-id/9C1EB95CA1F34DAB93DF549A51E3E874@maumau

Regards
MauMau

Attachments:

pg_copy.patchapplication/octet-stream; name=pg_copy.patchDownload+419-8
#2Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Tsunakawa, Takayuki (#1)
Re: [patch] pg_copy - a command for reliable WAL archiving

At 2014-06-17 23:26:37 +0900, maumau307@gmail.com wrote:

Hello,

As I proposed before in the thread below, I've implemented a simple
command for reliable WAL archiving. I would appreciate it if you
could review and test the patch.

Please add your patch to the next CF (i.e. 2014-08), so that it gets
review attention at the appropriate time.

-- Abhijit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Joe Conway
mail@joeconway.com
In reply to: Tsunakawa, Takayuki (#1)
Re: [patch] pg_copy - a command for reliable WAL archiving

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/17/2014 07:26 AM, MauMau wrote:

Hello,

As I proposed before in the thread below, I've implemented a simple
command for reliable WAL archiving. I would appreciate it if you could
review and test the patch.

/messages/by-id/9C1EB95CA1F34DAB93DF549A51E3E874@maumau

That first hunk refers to dblink -- I'm guessing it does not belong with
this patch.

Joe

- --
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJToFiiAAoJEDfy90M199hlseMP/Rpwneo5la7mJkHJA0BWUtj/
Hh8yNzzyBVPbKYuTEOZWi2yblFW6yA2dknrYD8RigS1qJFwLw+drFt5673Vi6jKW
Pf7Qn62Ck/U0lZTGXUU9akOhSx7BsKVwH8HvdARIp5DSV2n7/HFDtazi3hTtFq31
GHKA84lPwuynODN42eVez0YXdeRUX7K/s5rCRq154q3BrLPhEEvQwo/kfhet3F0A
utf0ymSPuX3RvpGDDPAZ1oQxTd/xA+qJhX0jrsRoJVaY40rufTgXPmAnNFOdWEds
fXs6ObRm+tHmeLhYVUXhODa1HPHBQMvElVeB3CGxhPUvhP2494sfoLOd7qs8Lblg
oUkYbIJee8Ir7NN34Gc69YW58sekPSVI4vXKu0PwF++Ubs4MYNNd7nP4Wu9N2ISw
p/GPIbx0uR3NbFGCoOLD0K3QHnX/b0FTWHzTTboZ+b69rNIDePpn3eGO2QEOLL/R
P/YkPma8SLyDvNnCzuSU3XDkFmWQsgK7xa7qpsBR1mbdF7zKPfOxCtoby/enSeuk
z7NJxtpHeUQkO7Pb3ZNk6gL+E8UAQihvdBgdBwBzj4qaZyAM5ec29aya3TtBbF+3
UoFX3m4tthR6mMWqizsdadSPvozDjMrhqSRFAjdBSX80Nfs2DVN1Hepp8NXvjRzV
b5RfV7x4yvtr92FFQboj
=+TWh
-----END PGP SIGNATURE-----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Abhijit Menon-Sen
ams@2ndQuadrant.com
In reply to: Joe Conway (#3)
Re: [patch] pg_copy - a command for reliable WAL archiving

At 2014-06-17 08:02:59 -0700, mail@joeconway.com wrote:

That first hunk refers to dblink -- I'm guessing it does not belong
with this patch.

Yes, it's a leftover of the dblink memory leak patch that's in this CF.

-- Abhijit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Joe Conway (#3)
Re: [patch] pg_copy - a command for reliable WAL archiving

From: "Joe Conway" <mail@joeconway.com>

That first hunk refers to dblink -- I'm guessing it does not belong with
this patch.

Ouch, what a careless mistake. The attached one is clean. I'll update the
CommitFest entry when I'm back home from work.

Regards
MauMau

Attachments:

pg_copy_v2.patchapplication/octet-stream; name=pg_copy_v2.patchDownload+417-8
#6Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tsunakawa, Takayuki (#5)
Re: [patch] pg_copy - a command for reliable WAL archiving

I fixed some minor mistakes.

Regards
MauMau

Attachments:

pg_copy_v3.patchapplication/octet-stream; name=pg_copy_v3.patchDownload+431-8
#7Fujii Masao
masao.fujii@gmail.com
In reply to: Tsunakawa, Takayuki (#6)
Re: [patch] pg_copy - a command for reliable WAL archiving

On Thu, Aug 14, 2014 at 1:52 PM, MauMau <maumau307@gmail.com> wrote:

I fixed some minor mistakes.

What's the main purpose of this tool? If it's for WAL archiving, the tool name
"pg_copy" sounds too generic. We already have pg_archivecleanup, so maybe
"pg_archivecopy" or something is better for the consistency?

pg_copy in the patch copies the file to the destination in a
straightforward way,
i.e., directly copies the file to the dest file with actual name. This can cause
the problem which some people reported. The problem is that, when the server
crashes while WAL file is being archived by cp command, its partially-filled
WAL file remains at the archival area. This half-baked archive file can cause
various troubles. To address this, WAL file needs to be copied to the temporary
file at first, then renamed to the actual name. I think that pg_copy should
copy the WAL file in that way.

Currently pg_copy always syncs the archive file, and there is no way to disable
that. But I'm sure that not everyone want to sync the archive file. So I think
that it's better to add the option specifying whether to sync the file
or not, into
pg_copy.

Some users might want to specify whether to call posix_fadvise or not because
they might need to re-read the archvied files just after the archiving.
For example, network copy of the archived files from the archive area to
remote site for disaster recovery.

Do you recommend to use pg_copy for restore_command? If yes, it also should
be documented. And in the WAL restore case, the restored WAL files are re-read
soon by recovery, so posix_fadvise is not good in that case.

Direct I/O and posix_fadvise are used only for destination file. But why not
source file? That might be useful especially for restore_command case.

At last, the big question is, is there really no OS command which provides
the same functionality as pg_copy does? If there is, I'd like to avoid duplicate
work basically.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Fujii Masao (#7)
Re: [patch] pg_copy - a command for reliable WAL archiving

Fujii Masao <masao.fujii@gmail.com> wrote:

Direct I/O and posix_fadvise are used only for destination file.
But why not source file? That might be useful especially for
restore_command case.

That would prevent people from piping the file through a
compression utility.  We should support piped I/O for input (and if
we want to use it on the recovery side, for the output, too), at
least as an option.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Fujii Masao (#7)
Re: [patch] pg_copy - a command for reliable WAL archiving

From: "Fujii Masao" <masao.fujii@gmail.com>

What's the main purpose of this tool? If it's for WAL archiving, the tool
name
"pg_copy" sounds too generic. We already have pg_archivecleanup, so maybe
"pg_archivecopy" or something is better for the consistency?

pg_copy in the patch copies the file to the destination in a
straightforward way,
i.e., directly copies the file to the dest file with actual name. This can
cause
the problem which some people reported. The problem is that, when the
server
crashes while WAL file is being archived by cp command, its
partially-filled
WAL file remains at the archival area. This half-baked archive file can
cause
various troubles. To address this, WAL file needs to be copied to the
temporary
file at first, then renamed to the actual name. I think that pg_copy
should
copy the WAL file in that way.

I intended to make pg_copy a straightforward replacement of cp/copy, which
complements the missing sync. Direct I/O and posix_fadvice() feature may be
convenient but not essential for this utility. cp/copy doesn't copy to a
temporary file, and the problem can be solved easily by mv/move. I wanted
to keep pg_copy as generic as cp/copy, so that it can be used by some
advanced features in the future, e.g. comprehensive backup/recovery
management like RMAN (this example may not be best) when it's integrated
into the core.

With that said, copying to a temporary file like <dest>.tmp and renaming it
to <dest> sounds worthwhile even as a basic copy utility. I want to avoid
copying to a temporary file with a fixed name like _copy.tmp, because some
advanced utility may want to run multiple instances of pg_copy to copy
several files into the same directory simultaneously. However, I'm afraid
multiple <dest>.tmp files might continue to occupy disk space after
canceling copy or power failure in some use cases, where the copy of the
same file won't be retried. That's also the reason why I chose to not use a
temporary file like cp/copy.

Currently pg_copy always syncs the archive file, and there is no way to
disable
that. But I'm sure that not everyone want to sync the archive file. So I
think
that it's better to add the option specifying whether to sync the file
or not, into
pg_copy.

pg_copy is for copying a file reliably by syncing. If sync is not
necessary, people can use cp/copy.

Some users might want to specify whether to call posix_fadvise or not
because
they might need to re-read the archvied files just after the archiving.
For example, network copy of the archived files from the archive area to
remote site for disaster recovery.

This sounds reasonable. Do you have an idea on the switch name and the
default behavior?

Do you recommend to use pg_copy for restore_command? If yes, it also
should
be documented. And in the WAL restore case, the restored WAL files are
re-read
soon by recovery, so posix_fadvise is not good in that case.

Direct I/O and posix_fadvise are used only for destination file. But why
not
source file? That might be useful especially for restore_command case.

No, I don't think it's necessary to use pg_copy for restore_command.

At last, the big question is, is there really no OS command which provides
the same functionality as pg_copy does? If there is, I'd like to avoid
duplicate
work basically.

If there exists such a command available in the standard OS installation, I
want to use it.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Tsunakawa, Takayuki (#9)
Re: [patch] pg_copy - a command for reliable WAL archiving

On 8/19/14 9:35 AM, MauMau wrote:

pg_copy is for copying a file reliably by syncing. If sync is not
necessary, people can use cp/copy.

I'm getting mixed messages from this thread.

I think there could be a fair amount of support for a new tool that can
serve as a universal plug-and-play archive_command, with a variety of
options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
atomic copy yes/no, allow overwrite yes/no, compression yes/no. That
would reduce the need for users to compose adventurous shell commands,
and it would set out the various options more clearly.

This is not that. This is cp+fsync with a hardcoded fadvise policy and
optional direct-io. That is a valid problem to address, but it is one
of many. On the other hand, I fear that the addition of this
single-and-a-half-purpose tool would make the overall landscape more
complicated than it already is. Since it's in the examples, people will
probably use it, even if they don't need to or shouldn't. And not
recommending it for the restore_command is also confusing.

Another example of how confusing all of this is: On Windows, the copy
command by default doesn't overwrite files, which is what we want
(usually). The patch changes those instances of copy to pg_copy, but it
doesn't have that behavior. Should the examples by changed to do a test
&& pg_copy on Windows (what's the Windows shell syntax for that?), or
should pg_copy have an option to not overwrite a file? How do you then
avoid inconsistencies with the Unix behavior? Or what if you want fsync
but allow overwriting on Windows?

On the technical side, I think if you fsync a new file, you also need to
fsync the directory, to make sure the file is certain to be visible
after a crash.

[*] I keep reading "directio" as a typo of "direction", so please
consider putting a hyphen or underscore in the option and variable
names. ;-)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Fujii Masao (#7)
Re: [patch] pg_copy - a command for reliable WAL archiving

On 8/15/14 10:46 AM, Fujii Masao wrote:

At last, the big question is, is there really no OS command which provides
the same functionality as pg_copy does? If there is, I'd like to avoid duplicate
work basically.

If you look hard enough, you can maybe find an OS command that can fsync
a file after it was copied. Some versions of dd can do that, and some
systems have an fsync program. But it's not clear whether all systems
have that, and it probably won't be simple and consistent.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Josh Berkus
josh@agliodbs.com
In reply to: Tsunakawa, Takayuki (#1)
Re: [patch] pg_copy - a command for reliable WAL archiving

On 08/19/2014 12:37 PM, Peter Eisentraut wrote:

On 8/19/14 9:35 AM, MauMau wrote:

pg_copy is for copying a file reliably by syncing. If sync is not
necessary, people can use cp/copy.

I'm getting mixed messages from this thread.

I think there could be a fair amount of support for a new tool that can
serve as a universal plug-and-play archive_command, with a variety of
options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
atomic copy yes/no, allow overwrite yes/no, compression yes/no. That
would reduce the need for users to compose adventurous shell commands,
and it would set out the various options more clearly.

Yes please!

Although I'm not sold on the idea of using DirectIO for this. Is there
really enough benefit to make it worth the trouble?

This is not that. This is cp+fsync with a hardcoded fadvise policy and
optional direct-io. That is a valid problem to address, but it is one
of many. On the other hand, I fear that the addition of this
single-and-a-half-purpose tool would make the overall landscape more
complicated than it already is. Since it's in the examples, people will
probably use it, even if they don't need to or shouldn't. And not
recommending it for the restore_command is also confusing.

I'm afraid that I agree with Peter here. pg_copy looks like a nice
foundation for the eventual pg_copy utility we need, but it's not there yet.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tsunakawa, Takayuki (#9)
Re: [patch] pg_copy - a command for reliable WAL archiving

MauMau wrote:

With that said, copying to a temporary file like <dest>.tmp and
renaming it to <dest> sounds worthwhile even as a basic copy
utility. I want to avoid copying to a temporary file with a fixed
name like _copy.tmp, because some advanced utility may want to run
multiple instances of pg_copy to copy several files into the same
directory simultaneously. However, I'm afraid multiple <dest>.tmp
files might continue to occupy disk space after canceling copy or
power failure in some use cases, where the copy of the same file
won't be retried. That's also the reason why I chose to not use a
temporary file like cp/copy.

Is there a way to create a link to a file which only exists as an open
file descriptor? If there was, you could create a temp file, open an
fd, then delete the file. That would remove the issue with files being
leaked due to failures of various kinds.

Also, it's been mentioned that this utility might be useful for
restore_command. That sounds good I guess, but need to keep the
RECOVERYXLOG trick in mind. I remember a case of stalled replay because
the restore command was writing to RECOVERYXLOG.gz and ungzipping, and
the unlink("RECOVERYXLOG") call failed after a partial copy and so did
the copy from the archive. (Removing the borked RECOVERYXLOG.gz fixed
it.)

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#13)
Re: [patch] pg_copy - a command for reliable WAL archiving

On Tue, Aug 19, 2014 at 10:42 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Is there a way to create a link to a file which only exists as an open
file descriptor? If there was, you could create a temp file, open an
fd, then delete the file. That would remove the issue with files being
leaked due to failures of various kinds.

Sort of. On recent Linuxen you can create a file with open(O_TMPFILE)
then use linkat(2) to create a link for it only once it's fully
written.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#14)
Re: [patch] pg_copy - a command for reliable WAL archiving

c.f.:

O_TMPFILE (since Linux 3.11)
Create an unnamed temporary file. The pathname argument
specifies a directory; an unnamed inode will be created in
that directory's filesystem. Anything written to the
resulting file will be lost when the last file descriptor is
closed, unless the file is given a name.

O_TMPFILE must be specified with one of O_RDWR or O_WRONLY
and, optionally, O_EXCL. If O_EXCL is not specified, then
linkat(2) can be used to link the temporary file into the
filesystem, making it permanent, using code like the
following:

char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
AT_SYMLINK_FOLLOW);

In this case, the open() mode argument determines the file
permission mode, as with O_CREAT.

Specifying O_EXCL in conjunction with O_TMPFILE prevents a
temporary file from being linked into the filesystem in the
above manner. (Note that the meaning of O_EXCL in this case
is different from the meaning of O_EXCL otherwise.)

There are two main use cases for O_TMPFILE:

* Improved tmpfile(3) functionality: race-free creation of
temporary files that (1) are automatically deleted when
closed; (2) can never be reached via any pathname; (3) are
not subject to symlink attacks; and (4) do not require the
caller to devise unique names.

* Creating a file that is initially invisible, which is then
populated with data and adjusted to have appropriate
filesystem attributes (chown(2), chmod(2), fsetxattr(2),
etc.) before being atomically linked into the filesystem
in a fully formed state (using linkat(2) as described
above).

O_TMPFILE requires support by the underlying filesystem; only
a subset of Linux filesystems provide that support. In the
initial implementation, support was provided in the ext2,
ext3, ext4, UDF, Minix, and shmem filesystems. XFS support
was added in Linux 3.15.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#15)
Re: [patch] pg_copy - a command for reliable WAL archiving

Greg Stark wrote:

char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
AT_SYMLINK_FOLLOW);

Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the
O_TMPFILE: you can have an open file descriptor to an "invisible" file
simply by creating a normal file and unlinking it. I looked at linkat()
yesterday but the idea of using /proc/self didn't occur to me. Nasty
trick :-( It seems linkat() is quite a bit more portable than
O_TMPFILE, fortunately ...

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#13)
Re: [patch] pg_copy - a command for reliable WAL archiving

On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:

MauMau wrote:

With that said, copying to a temporary file like <dest>.tmp and
renaming it to <dest> sounds worthwhile even as a basic copy
utility. I want to avoid copying to a temporary file with a fixed
name like _copy.tmp, because some advanced utility may want to run
multiple instances of pg_copy to copy several files into the same
directory simultaneously. However, I'm afraid multiple <dest>.tmp
files might continue to occupy disk space after canceling copy or
power failure in some use cases, where the copy of the same file
won't be retried. That's also the reason why I chose to not use a
temporary file like cp/copy.

Is there a way to create a link to a file which only exists as an open
file descriptor? If there was, you could create a temp file, open an
fd, then delete the file. That would remove the issue with files being
leaked due to failures of various kinds.

Isn't this a solution looking for a problem? We're using tempfiles in
dozens of other places and I really don't see why this is the place to
stop doing so. Just copy to <dest>.tmp and move it into place. If things
crash during that, the command will be repeated shortly afterwards again
*anyway*. Let's not get into platform specific games here.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#17)
Re: [patch] pg_copy - a command for reliable WAL archiving

Andres Freund wrote:

On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:

MauMau wrote:

With that said, copying to a temporary file like <dest>.tmp and
renaming it to <dest> sounds worthwhile even as a basic copy
utility. I want to avoid copying to a temporary file with a fixed
name like _copy.tmp, because some advanced utility may want to run
multiple instances of pg_copy to copy several files into the same
directory simultaneously. However, I'm afraid multiple <dest>.tmp
files might continue to occupy disk space after canceling copy or
power failure in some use cases, where the copy of the same file
won't be retried. That's also the reason why I chose to not use a
temporary file like cp/copy.

Is there a way to create a link to a file which only exists as an open
file descriptor? If there was, you could create a temp file, open an
fd, then delete the file. That would remove the issue with files being
leaked due to failures of various kinds.

Isn't this a solution looking for a problem? We're using tempfiles in
dozens of other places and I really don't see why this is the place to
stop doing so. Just copy to <dest>.tmp and move it into place. If things
crash during that, the command will be repeated shortly afterwards again
*anyway*. Let's not get into platform specific games here.

The issue is what happens if there's a crash while the temp file is in
the middle of being filled. I agree there are bigger problems to solve
in this area though. Yes, I agree that a fixed name such as _copy.tmp
is a really bad choice for several reasons.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#16)
Re: [patch] pg_copy - a command for reliable WAL archiving

On Wed, Aug 20, 2014 at 2:27 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Hmm, the real trick here is linkat(... "/proc/self/foobar"), not the
O_TMPFILE: you can have an open file descriptor to an "invisible" file
simply by creating a normal file and unlinking it. I looked at linkat()
yesterday but the idea of using /proc/self didn't occur to me. Nasty
trick :-( It seems linkat() is quite a bit more portable than
O_TMPFILE, fortunately ...

Supposedly linkat(2) on Linux refuses to create a link to a file that
was opened normally and had its last link removed with unlink(2) due
to concerns that this would create security holes.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#18)
Re: [patch] pg_copy - a command for reliable WAL archiving

On 2014-08-20 09:50:56 -0400, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-08-19 17:42:06 -0400, Alvaro Herrera wrote:

MauMau wrote:

With that said, copying to a temporary file like <dest>.tmp and
renaming it to <dest> sounds worthwhile even as a basic copy
utility. I want to avoid copying to a temporary file with a fixed
name like _copy.tmp, because some advanced utility may want to run
multiple instances of pg_copy to copy several files into the same
directory simultaneously. However, I'm afraid multiple <dest>.tmp
files might continue to occupy disk space after canceling copy or
power failure in some use cases, where the copy of the same file
won't be retried. That's also the reason why I chose to not use a
temporary file like cp/copy.

Is there a way to create a link to a file which only exists as an open
file descriptor? If there was, you could create a temp file, open an
fd, then delete the file. That would remove the issue with files being
leaked due to failures of various kinds.

Isn't this a solution looking for a problem? We're using tempfiles in
dozens of other places and I really don't see why this is the place to
stop doing so. Just copy to <dest>.tmp and move it into place. If things
crash during that, the command will be repeated shortly afterwards again
*anyway*. Let's not get into platform specific games here.

The issue is what happens if there's a crash while the temp file is in
the middle of being filled.

The archive command will be be run again a couple seconds and remove the
half-filled temp file.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#20)
#23Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
#26Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#25)