pg_archivecleanup bug

Started by Kevin Grittner over 12 years ago (33 messages, hackers)
#1Kevin Grittner
Kevin.Grittner@wicourts.gov

An EDB customer reported a problem with pg_archivecleanup which I
have looked into and found a likely cause.  It is, in any event, a
bug which I think should be fixed.  It has to do with our use of
the readdir() function:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html

These are the relevant bits:

| Applications wishing to check for error situations should set
| errno to 0 before calling readdir(). If errno is set to non-zero
| on return, an error occurred.

| Upon successful completion, readdir() returns a pointer to an
| object of type struct dirent. When an error is encountered, a
| null pointer is returned and errno is set to indicate the error.
| When the end of the directory is encountered, a null pointer is
| returned and errno is not changed.
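For illustration only, here is a minimal sketch of the calling pattern the quoted specification asks for, using plain POSIX calls. The helper name `count_dir_entries()` is hypothetical; this is not code from pg_archivecleanup.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the entries in a directory, excluding "." and "..", following the
 * readdir() contract quoted above: clear errno before every call, and treat
 * a NULL return as end-of-directory only if errno is still zero.
 * Returns -1 if the directory cannot be opened or read.
 */
static long
count_dir_entries(const char *path)
{
    DIR        *dir = opendir(path);
    struct dirent *de;
    long        count = 0;
    int         save_errno;

    if (dir == NULL)
        return -1;

    for (;;)
    {
        errno = 0;              /* readdir() leaves errno alone at the end */
        de = readdir(dir);
        if (de == NULL)
            break;
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            count++;
    }
    save_errno = errno;         /* capture before closedir() can change it */
    closedir(dir);

    if (save_errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(save_errno));
        return -1;
    }
    return count;
}
```

Capturing errno into `save_errno` before calling closedir() matters because closedir() is permitted to modify errno even when it succeeds.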

Here is our current usage:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/pg_archivecleanup/pg_archivecleanup.c;h=8f77998de12f95f41bb95c3e05a14de6cdf18047;hb=7800229b36d0444cf2c61f5c5895108ee5e8ee2a#l110

So an error in scanning the directory will not be reported; the
cleanup will quietly terminate the WAL deletions without processing
the remainder of the directory.  Attached is the simplest fix,
which would report the error, stop looking for WAL files, and
continue with other clean-ups. I'm not sure we should keep the fix
that simple.  We could set a flag so that we would exit with a
non-zero code, or we could try a new directory scan as long as the
last scan found and deleted at least one WAL file.  Perhaps we want
to back-patch the simple fix and do something fancier for 9.4?
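A minimal sketch of the simple fix described above, combined with the first alternative (a flag the caller can turn into a non-zero exit code). It is not the attached patch: `remove_older_wal_files()` and `looks_like_wal_file()` are hypothetical, and the 24-hex-digit name test with a plain strcmp() against the cutoff is a simplifying assumption; the real program's file-name handling is more involved.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define WAL_NAME_LEN 24         /* assumed: 24 uppercase hex digits */

static bool
looks_like_wal_file(const char *name)
{
    return strlen(name) == WAL_NAME_LEN &&
        strspn(name, "0123456789ABCDEF") == WAL_NAME_LEN;
}

/*
 * Remove WAL-like files that sort before "cutoff".  On a readdir() failure,
 * report it, stop scanning, and set *scan_failed so the caller can choose a
 * non-zero exit status while still doing its other clean-ups.
 * Returns the number of files removed.
 */
static int
remove_older_wal_files(const char *dirpath, const char *cutoff, bool *scan_failed)
{
    DIR        *dir;
    struct dirent *de;
    char        path[4096];
    int         removed = 0;

    *scan_failed = false;
    if ((dir = opendir(dirpath)) == NULL)
    {
        fprintf(stderr, "could not open directory \"%s\": %s\n",
                dirpath, strerror(errno));
        *scan_failed = true;
        return 0;
    }

    /* errno must be cleared immediately before each readdir() call */
    while (errno = 0, (de = readdir(dir)) != NULL)
    {
        if (!looks_like_wal_file(de->d_name) || strcmp(de->d_name, cutoff) >= 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
        if (unlink(path) == 0)
            removed++;
        else
            fprintf(stderr, "could not remove file \"%s\": %s\n",
                    path, strerror(errno));
    }

    if (errno != 0)             /* still the value left by readdir() */
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                dirpath, strerror(errno));
        *scan_failed = true;
    }
    closedir(dir);
    return removed;
}
```

The comma expression in the `while` condition clears errno immediately before every readdir() call, so an errno left over from unlink() or fprintf() in the loop body cannot be mistaken for a directory-read error.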

If we decide to go with the simple approach, I would also add a few
comment lines before committing; this version is for purposes of
illustration, to facilitate discussion.

Thoughts?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

archivecleanup-dir-error-v1.diff (text/x-diff, +6/-1)
#2Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#1)
Re: pg_archivecleanup bug

On Thu, Dec 5, 2013 at 3:06 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

> An EDB customer reported a problem with pg_archivecleanup which I
> have looked into and found a likely cause. It is, in any event, a
> bug which I think should be fixed. It has to do with our use of
> the readdir() function:
>
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html
>
> These are the relevant bits:
>
> | Applications wishing to check for error situations should set
> | errno to 0 before calling readdir(). If errno is set to non-zero
> | on return, an error occurred.
>
> | Upon successful completion, readdir() returns a pointer to an
> | object of type struct dirent. When an error is encountered, a
> | null pointer is returned and errno is set to indicate the error.
> | When the end of the directory is encountered, a null pointer is
> | returned and errno is not changed.
>
> Here is our current usage:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/pg_archivecleanup/pg_archivecleanup.c;h=8f77998de12f95f41bb95c3e05a14de6cdf18047;hb=7800229b36d0444cf2c61f5c5895108ee5e8ee2a#l110
>
> So an error in scanning the directory will not be reported; the
> cleanup will quietly terminate the WAL deletions without processing
> the remainder of the directory. Attached is the simplest fix,
> which would report the error, stop looking for WAL files, and
> continue with other clean-ups. I'm not sure we should keep the fix
> that simple. We could set a flag so that we would exit with a
> non-zero code, or we could try a new directory scan as long as the
> last scan found and deleted at least one WAL file. Perhaps we want
> to back-patch the simple fix and do something fancier for 9.4?

A directory that you can't read sounds like a pretty bad thing. I'd
be inclined to print an error message and exit forthwith.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#1)
Re: pg_archivecleanup bug

Kevin Grittner <kgrittn@ymail.com> writes:

> | Applications wishing to check for error situations should set
> | errno to 0 before calling readdir(). If errno is set to non-zero
> | on return, an error occurred.
>
> So an error in scanning the directory will not be reported; the
> cleanup will quietly terminate the WAL deletions without processing
> the remainder of the directory. Attached is the simplest fix,
> which would report the error, stop looking for WAL files, and
> continue with other clean-ups. I'm not sure we should keep the fix
> that simple. We could set a flag so that we would exit with a
> non-zero code, or we could try a new directory scan as long as the
> last scan found and deleted at least one WAL file. Perhaps we want
> to back-patch the simple fix and do something fancier for 9.4?

A quick grep shows about ten other readdir() usages, most of which
have a similar disease.

In general, I think there is no excuse for code in the backend to use
readdir() directly; it should be using ReadDir(), which takes care of this
as well as error reporting. It appears that src/backend/storage/ipc/dsm.c
didn't get that memo; it certainly is innocent of any error checking
concerns. But the other usages seem to be in assorted utilities, which
will need to do it right for themselves. initdb.c's walkdir() seems to
have it right and might be a reasonable model to follow. Or maybe we
should invent a frontend-friendly version of ReadDir() rather than
duplicating all the error checking code in ten-and-counting places?
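One possible shape for the frontend-friendly ReadDir() suggested here, as a sketch only: `fe_readdir()` is a hypothetical name, and unlike the backend's ReadDir() it cannot ereport(), so it prints to stderr and leaves the exit-or-continue decision to the caller through a flag.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical frontend analogue of the backend's ReadDir(): returns the
 * next entry, or NULL at end-of-directory or on error.  On error it prints
 * a message and sets *failed; what to do next (exit or carry on) is left
 * to the caller.
 */
static struct dirent *
fe_readdir(DIR *dir, const char *dirname, bool *failed)
{
    struct dirent *de;

    errno = 0;
    if ((de = readdir(dir)) != NULL)
        return de;

    if (errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                dirname, strerror(errno));
        *failed = true;
    }
    return NULL;
}
```

A caller then loops with `while ((de = fe_readdir(dir, path, &failed)) != NULL)` and checks `failed` afterwards, so every call site gets the errno handling without repeating it.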

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: pg_archivecleanup bug

On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> In general, I think there is no excuse for code in the backend to use
> readdir() directly; it should be using ReadDir(), which takes care of this
> as well as error reporting.

My understanding is that the fd.c infrastructure can't be used in the
postmaster.

I agree that sucks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: pg_archivecleanup bug

Robert Haas <robertmhaas@gmail.com> writes:

> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> In general, I think there is no excuse for code in the backend to use
>> readdir() directly; it should be using ReadDir(), which takes care of this
>> as well as error reporting.
>
> My understanding is that the fd.c infrastructure can't be used in the
> postmaster.

Say what? See ParseConfigDirectory for code that certainly runs in the
postmaster, and uses ReadDir().

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: pg_archivecleanup bug

On Fri, Dec 6, 2013 at 11:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>
>> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> In general, I think there is no excuse for code in the backend to use
>>> readdir() directly; it should be using ReadDir(), which takes care of this
>>> as well as error reporting.
>>
>> My understanding is that the fd.c infrastructure can't be used in the
>> postmaster.
>
> Say what? See ParseConfigDirectory for code that certainly runs in the
> postmaster, and uses ReadDir().

Gosh, I could have sworn that I had calls into fd.c that were crashing
and burning during development because they happened too early in
postmaster startup. But it seems to work fine now, so I've pushed a
fix for this and a few related issues. Please let me know if you
think there are remaining issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: pg_archivecleanup bug

On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> But the other usages seem to be in assorted utilities, which
> will need to do it right for themselves. initdb.c's walkdir() seems to
> have it right and might be a reasonable model to follow. Or maybe we
> should invent a frontend-friendly version of ReadDir() rather than
> duplicating all the error checking code in ten-and-counting places?

If there's enough uniformity in all of those places to make that
feasible, it certainly seems wise to do it that way. I don't know if
that's the case, though - e.g. maybe some callers want to exit and
others do not. pg_resetxlog wants to exit; pg_archivecleanup and
pg_standby most likely want to print an error and carry on.
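If callers differ in whether they want to exit or carry on, the policy could be passed in. A minimal sketch under that assumption: `DirErrorPolicy`, `scan_directory()` and the example callback `count_entry()` are hypothetical names, and the tool-to-policy mapping in the comments only restates the message above.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: how a caller wants directory errors handled. */
typedef enum
{
    DIR_ERROR_EXIT,             /* e.g. pg_resetxlog: report and exit(1) */
    DIR_ERROR_WARN              /* e.g. pg_archivecleanup: report, carry on */
} DirErrorPolicy;

typedef void (*EntryCallback) (const char *name, void *arg);

/*
 * Call cb for each entry except "." and "..".  Returns the number of
 * entries visited, or -1 after reporting an error under DIR_ERROR_WARN.
 */
static long
scan_directory(const char *path, DirErrorPolicy policy,
               EntryCallback cb, void *arg)
{
    DIR        *dir;
    struct dirent *de;
    long        visited = 0;
    int         save_errno = 0;
    const char *what = "read";

    if ((dir = opendir(path)) == NULL)
    {
        save_errno = errno;
        what = "open";
    }
    else
    {
        /* reset errno before each readdir(); cb may have changed it */
        while (errno = 0, (de = readdir(dir)) != NULL)
        {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            cb(de->d_name, arg);
            visited++;
        }
        save_errno = errno;
        closedir(dir);
    }

    if (save_errno == 0)
        return visited;

    fprintf(stderr, "could not %s directory \"%s\": %s\n",
            what, path, strerror(save_errno));
    if (policy == DIR_ERROR_EXIT)
        exit(1);
    return -1;
}

/* Example callback: count entries into an int supplied via arg. */
static void
count_entry(const char *name, void *arg)
{
    (void) name;
    (*(int *) arg)++;
}
```

Keeping the loop and errno handling in one function while letting the caller choose the policy is one way to get the uniformity discussed above without forcing every tool to exit on error.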

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#1)
Re: pg_archivecleanup bug

On Thu, Dec 5, 2013 at 12:06:07PM -0800, Kevin Grittner wrote:

> An EDB customer reported a problem with pg_archivecleanup which I
> have looked into and found a likely cause. It is, in any event, a
> bug which I think should be fixed. It has to do with our use of
> the readdir() function:
>
> http://pubs.opengroup.org/onlinepubs/7908799/xsh/readdir_r.html
>
> These are the relevant bits:
>
> | Applications wishing to check for error situations should set
> | errno to 0 before calling readdir(). If errno is set to non-zero
> | on return, an error occurred.
>
> | Upon successful completion, readdir() returns a pointer to an
> | object of type struct dirent. When an error is encountered, a
> | null pointer is returned and errno is set to indicate the error.
> | When the end of the directory is encountered, a null pointer is
> | returned and errno is not changed.

Wow, another case where errno clearing is necessary. We were just
looking at this requirement for getpwuid() last week.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#9Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#7)
Re: pg_archivecleanup bug

On Mon, Dec 9, 2013 at 11:27:28AM -0500, Robert Haas wrote:

> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> But the other usages seem to be in assorted utilities, which
>> will need to do it right for themselves. initdb.c's walkdir() seems to
>> have it right and might be a reasonable model to follow. Or maybe we
>> should invent a frontend-friendly version of ReadDir() rather than
>> duplicating all the error checking code in ten-and-counting places?
>
> If there's enough uniformity in all of those places to make that
> feasible, it certainly seems wise to do it that way. I don't know if
> that's the case, though - e.g. maybe some callers want to exit and
> others do not. pg_resetxlog wants to exit; pg_archivecleanup and
> pg_standby most likely want to print an error and carry on.

I have developed the attached patch which fixes all cases where
readdir() wasn't checking for errno, and cleaned up the syntax in other
cases to be consistent.

While I am not a fan of backpatching, we are ignoring errors in some
critical cases, so it seems the non-cosmetic parts should be backpatched.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

readdir.diff (text/x-diff; charset=us-ascii, +42/-33)
#10Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#9)
Re: pg_archivecleanup bug

On Thu, Mar 13, 2014 at 1:48 AM, Bruce Momjian <bruce@momjian.us> wrote:

> On Mon, Dec 9, 2013 at 11:27:28AM -0500, Robert Haas wrote:
>
>> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> But the other usages seem to be in assorted utilities, which
>>> will need to do it right for themselves. initdb.c's walkdir() seems to
>>> have it right and might be a reasonable model to follow. Or maybe we
>>> should invent a frontend-friendly version of ReadDir() rather than
>>> duplicating all the error checking code in ten-and-counting places?
>>
>> If there's enough uniformity in all of those places to make that
>> feasible, it certainly seems wise to do it that way. I don't know if
>> that's the case, though - e.g. maybe some callers want to exit and
>> others do not. pg_resetxlog wants to exit; pg_archivecleanup and
>> pg_standby most likely want to print an error and carry on.
>
> I have developed the attached patch which fixes all cases where
> readdir() wasn't checking for errno, and cleaned up the syntax in other
> cases to be consistent.

Thanks!

> While I am not a fan of backpatching, we are ignoring errors in some
> critical cases, so it seems the non-cosmetic parts should be backpatched.

While I haven't read the patch, I agree that this is a back-patchable bug fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Bruce Momjian (#9)
Re: pg_archivecleanup bug

On Thu, Mar 13, 2014 at 11:18 AM, Bruce Momjian <bruce@momjian.us> wrote:

> I have developed the attached patch which fixes all cases where
> readdir() wasn't checking for errno, and cleaned up the syntax in other
> cases to be consistent.

1. One thing commonly missed wherever errno handling is added is the
check below, which is present in all existing places where errno is
checked (initdb.c, pg_resetxlog.c, ReadDir, ...):

#ifdef WIN32
    /*
     * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
     * released version
     */
    if (GetLastError() == ERROR_NO_MORE_FILES)
        errno = 0;
#endif

2.
> !     if (errno || closedir(chkdir) == -1)
>           result = -1;            /* some kind of I/O error? */

Is there a special need to check the return value of closedir() in this
function, given that all other uses of it in a similar context
(initdb.c, pg_resetxlog.c, pgfnames.c) don't check it?

One change I think this code does need is to check errno before
calling closedir(), as is done in initdb.c and pg_resetxlog.c.

> While I am not a fan of backpatching, we are ignoring errors in some
> critical cases, so it seems the non-cosmetic parts should be backpatched.

+1

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#12Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#11)
Re: pg_archivecleanup bug

On Tue, Mar 18, 2014 at 11:25:46AM +0530, Amit Kapila wrote:

> On Thu, Mar 13, 2014 at 11:18 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
>> I have developed the attached patch which fixes all cases where
>> readdir() wasn't checking for errno, and cleaned up the syntax in other
>> cases to be consistent.
>
> 1. One thing commonly missed wherever errno handling is added is the
> check below, which is present in all existing places where errno is
> checked (initdb.c, pg_resetxlog.c, ReadDir, ...):
>
> #ifdef WIN32
>     /*
>      * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
>      * released version
>      */
>     if (GetLastError() == ERROR_NO_MORE_FILES)
>         errno = 0;
> #endif

Very good point. I have modified the patch to add this block in all
cases where it was missing. I started to wonder about the comment and
if the Mingw fix was released. Based on some research, I see this as
fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old.
(What I don't know is when that was paired with Msys in a bundled
release.) Here is the Mingw fixed code:

http://ftp.ntua.gr/mirror/mingw/OldFiles/mingw-runtime-3.2-src.tar.gz
{
    /* Get the next search entry. */
    if (_tfindnext (dirp->dd_handle, &(dirp->dd_dta)))
    {
        /* We are off the end or otherwise error.
           _findnext sets errno to ENOENT if no more file
           Undo this. */
        DWORD winerr = GetLastError();
        if (winerr == ERROR_NO_MORE_FILES)
            errno = 0;

The current code has a better explanation:

http://sourceforge.net/p/mingw/mingw-org-wsl/ci/master/tree/src/libcrt/tchar/dirent.c
if( dirp->dd_private.dd_stat++ > 0 )
{
    /* Otherwise...
     *
     * Get the next search entry. POSIX mandates that this must
     * return NULL after the last entry has been read, but that it
     * MUST NOT change errno in this case. MS-Windows _findnext()
     * DOES change errno (to ENOENT) after the last entry has been
     * read, so we must be prepared to restore it to its previous
     * value, when no actual error has occurred.
     */
    int prev_errno = errno;
    if( DIRENT_UPDATE( dirp->dd_private ) != 0 )
    {
        /* May be an error, or just the case described above...
         */
        if( GetLastError() == ERROR_NO_MORE_FILES )
            /*
             * ...which requires us to reset errno.
             */
            errno = prev_errno;

but it is basically doing the same thing. I am wondering if we should
back-patch the PG code block where it was missing, and remove it from
head in all places on the logic that everyone running 9.4 will have a
post-3.1 version of Mingw. Postgres 8.4 was released in 2009 and it is
possible some people are still using pre-3.2 Mingw versions with that PG
release.
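The save-and-restore approach in the current MinGW code can be exercised without Windows by substituting a mock iterator. In this sketch `MockDir`, `mock_findnext()` and `posix_style_next()` are hypothetical: the mock imitates a _findnext()-style call that sets errno to ENOENT at the end of the listing, and an integer flag stands in for GetLastError() == ERROR_NO_MORE_FILES.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a platform directory iterator. */
typedef struct
{
    const char **names;
    size_t      count;
    size_t      pos;
    size_t      fail_at;        /* position at which to fail with EIO;
                                 * SIZE_MAX means never */
    int         last_error_no_more_files;   /* plays GetLastError()'s role */
} MockDir;

/* Like _findnext(): sets errno to ENOENT when it runs off the end. */
static const char *
mock_findnext(MockDir *d)
{
    d->last_error_no_more_files = 0;
    if (d->pos == d->fail_at)
    {
        errno = EIO;            /* a genuine error */
        return NULL;
    }
    if (d->pos >= d->count)
    {
        errno = ENOENT;         /* non-POSIX behaviour at end of listing */
        d->last_error_no_more_files = 1;
        return NULL;
    }
    return d->names[d->pos++];
}

/*
 * POSIX-style wrapper: at end-of-directory, return NULL with errno restored
 * to whatever the caller had, so "errno = 0; then read" works as specified.
 */
static const char *
posix_style_next(MockDir *d)
{
    int         prev_errno = errno;
    const char *name = mock_findnext(d);

    if (name == NULL && d->last_error_no_more_files)
        errno = prev_errno;     /* end of directory, not an error */
    return name;
}
```

When the caller zeroes errno before each call, as the specification requires, restoring the previous value and PostgreSQL's older `errno = 0` workaround give the same result, which is why the two approaches are basically doing the same thing.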

> 2.
>> !     if (errno || closedir(chkdir) == -1)
>>           result = -1;            /* some kind of I/O error? */
>
> Is there a special need to check the return value of closedir() in this
> function, given that all other uses of it in a similar context
> (initdb.c, pg_resetxlog.c, pgfnames.c) don't check it?
>
> One change I think this code does need is to check errno before
> calling closedir(), as is done in initdb.c and pg_resetxlog.c.

Yes, good point. Patch adjusted to add this.

>> While I am not a fan of backpatching, we are ignoring errors in some
>> critical cases, so it seems the non-cosmetic parts should be backpatched.
>
> +1

The larger the patch gets, the more worried I am about backpatching.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#12)
Re: pg_archivecleanup bug

Bruce Momjian <bruce@momjian.us> writes:

> Very good point. I have modified the patch to add this block in all
> cases where it was missing. I started to wonder about the comment and
> if the Mingw fix was released. Based on some research, I see this as
> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old.

Yeah. I would vote for removing that code in all branches. There is no
reason to suppose somebody is going to install 8.4.22 on a machine that
they haven't updated mingw on since 2003. Or, if you prefer, just remove
it in HEAD --- but going around and *adding* more copies seems like
make-work. The fact that we've not heard complaints about the omissions
is good evidence that nobody's using the buggy mingw versions anymore.

regards, tom lane

#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: pg_archivecleanup bug

On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Bruce Momjian <bruce@momjian.us> writes:
>
>> Very good point. I have modified the patch to add this block in all
>> cases where it was missing. I started to wonder about the comment and
>> if the Mingw fix was released. Based on some research, I see this as
>> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old.
>
> Yeah. I would vote for removing that code in all branches. There is no
> reason to suppose somebody is going to install 8.4.22 on a machine that
> they haven't updated mingw on since 2003. Or, if you prefer, just remove
> it in HEAD --- but going around and *adding* more copies seems like
> make-work. The fact that we've not heard complaints about the omissions
> is good evidence that nobody's using the buggy mingw versions anymore.

I don't think it is. Right now we're not checking errno *at all* in a
bunch of these places, so we're sure not going to get complaints about
doing it incorrectly in those places. Or do I need more caffeine?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#14)
Re: pg_archivecleanup bug

On Tue, Mar 18, 2014 at 10:03:46AM -0400, Robert Haas wrote:

> On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Bruce Momjian <bruce@momjian.us> writes:
>>
>>> Very good point. I have modified the patch to add this block in all
>>> cases where it was missing. I started to wonder about the comment and
>>> if the Mingw fix was released. Based on some research, I see this as
>>> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old.
>>
>> Yeah. I would vote for removing that code in all branches. There is no
>> reason to suppose somebody is going to install 8.4.22 on a machine that
>> they haven't updated mingw on since 2003. Or, if you prefer, just remove
>> it in HEAD --- but going around and *adding* more copies seems like
>> make-work. The fact that we've not heard complaints about the omissions
>> is good evidence that nobody's using the buggy mingw versions anymore.
>
> I don't think it is. Right now we're not checking errno *at all* in a
> bunch of these places, so we're sure not going to get complaints about
> doing it incorrectly in those places. Or do I need more caffeine?

You are correct. This code is seriously broken and I am surprised we
have not gotten more complaints. Good thing readdir/closedir rarely
fail.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#15)
Re: pg_archivecleanup bug

Bruce Momjian escribió:

> On Tue, Mar 18, 2014 at 10:03:46AM -0400, Robert Haas wrote:
>
>> On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> Bruce Momjian <bruce@momjian.us> writes:
>>>
>>>> Very good point. I have modified the patch to add this block in all
>>>> cases where it was missing. I started to wonder about the comment and
>>>> if the Mingw fix was released. Based on some research, I see this as
>>>> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old.
>>>
>>> Yeah. I would vote for removing that code in all branches. There is no
>>> reason to suppose somebody is going to install 8.4.22 on a machine that
>>> they haven't updated mingw on since 2003. Or, if you prefer, just remove
>>> it in HEAD --- but going around and *adding* more copies seems like
>>> make-work. The fact that we've not heard complaints about the omissions
>>> is good evidence that nobody's using the buggy mingw versions anymore.
>>
>> I don't think it is. Right now we're not checking errno *at all* in a
>> bunch of these places, so we're sure not going to get complaints about
>> doing it incorrectly in those places. Or do I need more caffeine?
>
> You are correct. This code is seriously broken and I am surprised we
> have not gotten more complaints. Good thing readdir/closedir rarely
> fail.

I think we need to keep the check for old mingw runtime in older
branches; it seems reasonable to keep updating Postgres when new
versions come out but keep mingw the same if it doesn't break. A good
criterion here, to me, is: would we make it a runtime error if an old
mingw version is detected? If we would, then let's go and remove all
those errno checks. Then we force everyone to update to a sane mingw.
But if we're not adding such a check, then we might cause subtle trouble
just because we think running old mingw is unlikely.

On another note, please let's not make the code dissimilar in some
branches just because source code embellishments are not back-ported
out of fear. I mean, if we want them in master, let them be in older
branches as well. Otherwise we end up with slightly different versions
that make back-patching future fixes a lot harder, for no gain.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#16)
Re: pg_archivecleanup bug

On 18 March 2014 14:15, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Bruce Momjian escribió:
>
>> On Tue, Mar 18, 2014 at 10:03:46AM -0400, Robert Haas wrote:
>>
>>> On Tue, Mar 18, 2014 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>>> Bruce Momjian <bruce@momjian.us> writes:
>>>>
>>>>> Very good point. I have modified the patch to add this block in all
>>>>> cases where it was missing. I started to wonder about the comment and
>>>>> if the Mingw fix was released. Based on some research, I see this as
>>>>> fixed in mingw-runtime-3.2, released 2003-10-10. That's pretty old.
>>>>
>>>> Yeah. I would vote for removing that code in all branches. There is no
>>>> reason to suppose somebody is going to install 8.4.22 on a machine that
>>>> they haven't updated mingw on since 2003. Or, if you prefer, just remove
>>>> it in HEAD --- but going around and *adding* more copies seems like
>>>> make-work. The fact that we've not heard complaints about the omissions
>>>> is good evidence that nobody's using the buggy mingw versions anymore.
>>>
>>> I don't think it is. Right now we're not checking errno *at all* in a
>>> bunch of these places, so we're sure not going to get complaints about
>>> doing it incorrectly in those places. Or do I need more caffeine?
>>
>> You are correct. This code is seriously broken and I am surprised we
>> have not gotten more complaints. Good thing readdir/closedir rarely
>> fail.

back-patching

Some commentary on this...

Obviously, all errors are mine.

If pg_archivecleanup is a problem, then so is pg_standby.

Given the above, this means we've run for about 7 years without a
reported issue on this. If we are going to "make this better" by
actually having it throw errors in places that didn't throw errors
before, are we sure that is going to make people happier? The archive
cleanup isn't exactly critical in most cases, so dynamic errors don't
matter much.

Also, the programs were originally written to work standalone as well
as via archive_cleanup_command. So we can't use PostgreSQL
infrastructure (can we?). That aspect is needed to allow testing the
program before it goes live.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#17)
Re: pg_archivecleanup bug

On Tue, Mar 18, 2014 at 11:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Given the above, this means we've run for about 7 years without a
> reported issue on this. If we are going to "make this better" by
> actually having it throw errors in places that didn't throw errors
> before, are we sure that is going to make people happier? The archive
> cleanup isn't exactly critical in most cases, so dynamic errors don't
> matter much.

We report errors returned by system calls in many other places. I
can't see why this place should be any different.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#18)
Re: pg_archivecleanup bug

On 18 March 2014 15:50, Robert Haas <robertmhaas@gmail.com> wrote:

> On Tue, Mar 18, 2014 at 11:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Given the above, this means we've run for about 7 years without a
>> reported issue on this. If we are going to "make this better" by
>> actually having it throw errors in places that didn't throw errors
>> before, are we sure that is going to make people happier? The archive
>> cleanup isn't exactly critical in most cases, so dynamic errors don't
>> matter much.
>
> We report errors returned by system calls in many other places. I
> can't see why this place should be any different.

Sure. Just wanted to make sure it's a conscious, explicit choice to do so.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#9)
Re: pg_archivecleanup bug

On 13 March 2014 05:48, Bruce Momjian <bruce@momjian.us> wrote:

> On Mon, Dec 9, 2013 at 11:27:28AM -0500, Robert Haas wrote:
>
>> On Thu, Dec 5, 2013 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> But the other usages seem to be in assorted utilities, which
>>> will need to do it right for themselves. initdb.c's walkdir() seems to
>>> have it right and might be a reasonable model to follow. Or maybe we
>>> should invent a frontend-friendly version of ReadDir() rather than
>>> duplicating all the error checking code in ten-and-counting places?
>>
>> If there's enough uniformity in all of those places to make that
>> feasible, it certainly seems wise to do it that way. I don't know if
>> that's the case, though - e.g. maybe some callers want to exit and
>> others do not. pg_resetxlog wants to exit; pg_archivecleanup and
>> pg_standby most likely want to print an error and carry on.
>
> I have developed the attached patch which fixes all cases where
> readdir() wasn't checking for errno, and cleaned up the syntax in other
> cases to be consistent.
>
> While I am not a fan of backpatching, we are ignoring errors in some
> critical cases, so it seems the non-cosmetic parts should be backpatched.

pg_resetxlog was not an offender here; its coding was sound.

We shouldn't be discussing backpatching a patch that contains changes
to coding style.

ISTM we should change the code with missing checks to adopt the coding
style of pg_resetxlog, not the other way around.

I assume you or Kevin have this in hand and you don't want me to apply
the patch? (Since it was originally my bug)

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#21Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#20)
#22Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#22)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#23)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#23)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simon Riggs (#25)
#27Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#26)
#28Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#29)
#31Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bruce Momjian (#29)
#32Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#31)
#33Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#32)