Reversed sync check in pg_receivewal
This bug seems to have snuck in there with the introduction of walmethods.
AFAICT we are testing the result of sync() backwards, so whenever a partial
segment exists for pg_receivewal, it will fail. It will then unlink the
file, so when it retries 5 seconds later it works.
It also doesn't log the failure. Oops.
Attached patch reverses the check, and adds a failure message. I'd
appreciate a quick review in case I have the logic backwards in my head...
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Attachments:
receivelog_sync.patchtext/x-patch; charset=US-ASCII; name=receivelog_sync.patchDownload
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index f415135..8511e57 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -132,8 +132,11 @@ open_walfile(StreamCtl *stream, XLogRecPtr startpoint)
}
/* fsync file in case of a previous crash */
- if (!stream->walmethod->sync(f))
+ if (stream->walmethod->sync(f) != 0)
{
+ fprintf(stderr,
+ _("%s: could not sync existing transaction log file \"%s\": %s\n"),
+ progname, fn, stream->walmethod->getlasterror());
stream->walmethod->close(f, CLOSE_UNLINK);
return false;
}
Magnus Hagander <magnus@hagander.net> writes:
Attached patch reverses the check, and adds a failure message. I'd
appreciate a quick review in case I have the logic backwards in my head...
I think the patch is correct, but if there's any documentation of the
walmethod APIs that would allow one to assert which side of the API got
this wrong, I sure don't see it. Would it be unreasonable to insist
on some documentation around that?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 11, 2017 at 9:41 PM, Magnus Hagander <magnus@hagander.net> wrote:
This bug seems to have snuck in there with the introduction of walmethods.
AFAICT we are testing the result of sync() backwards, so whenever a partial
segment exists for pg_receivewal, it will fail. It will then unlink the
file, so when it retries 5 seconds later it works.It also doesn't log the failure. Oops.
Attached patch reverses the check, and adds a failure message. I'd
appreciate a quick review in case I have the logic backwards in my head...
This has been fat-fingered in 56c7d8d4, and looking around I am not
seeing similar mistakes. Thanks for the fix.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Magnus,
Attached patch reverses the check, and adds a failure message. I'd
appreciate a quick review in case I have the logic backwards in my head...
Well, I can state that `make check-world` passes on my laptop and that
code seems to be right. However documentation to WalWriteMethod could
probably be improved. Currently there is none.
--
Best regards,
Aleksander Alekseev
On Tue, Apr 11, 2017 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
Attached patch reverses the check, and adds a failure message. I'd
appreciate a quick review in case I have the logic backwards in myhead...
I think the patch is correct, but if there's any documentation of the
walmethod APIs that would allow one to assert which side of the API got
this wrong, I sure don't see it. Would it be unreasonable to insist
on some documentation around that?
Agreed.
Would you say comments in the struct in walmethods.h is enough, or were you
thinking actual sgml docs when you commented that?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Apr 11, 2017 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the patch is correct, but if there's any documentation of the
walmethod APIs that would allow one to assert which side of the API got
this wrong, I sure don't see it. Would it be unreasonable to insist
on some documentation around that?
Would you say comments in the struct in walmethods.h is enough, or were you
thinking actual sgml docs when you commented that?
This is just internal to pg_basebackup isn't it? I think comments in
walmethods.h would be plenty.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 11, 2017 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
On Tue, Apr 11, 2017 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the patch is correct, but if there's any documentation of the
walmethod APIs that would allow one to assert which side of the API got
this wrong, I sure don't see it. Would it be unreasonable to insist
on some documentation around that?Would you say comments in the struct in walmethods.h is enough, or were
you
thinking actual sgml docs when you commented that?
This is just internal to pg_basebackup isn't it? I think comments in
walmethods.h would be plenty.
Local to pg_basebackup and pg_receivewal, yes.
Something like the attached?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Attachments:
walmethod_comments.patchtext/x-patch; charset=US-ASCII; name=walmethod_comments.patchDownload
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 8d679da..a3316a6 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -19,18 +19,50 @@ typedef enum
CLOSE_NO_RENAME
} WalCloseMethod;
+/*
+ * A WalWriteMethod structure represents the different methods used
+ * to write the streaming WAL as it's received.
+ *
+ * All methods that have a failure path will set errno on failure.
+ */
typedef struct WalWriteMethod WalWriteMethod;
struct WalWriteMethod
{
+ /* Open a target file. Returns Walfile, or NULL if open failed. */
Walfile(*open_for_write) (const char *pathname, const char *temp_suffix, size_t pad_to_size);
+
+ /*
+ * Close an open Walfile, using one or more methods for handling
+ * automatic unlinking etc. Returns 0 on success, other values
+ * for error.
+ */
int (*close) (Walfile f, WalCloseMethod method);
+
+ /* Check if a file exist */
bool (*existsfile) (const char *pathname);
+
+ /* Return the size of a file, or -1 on failure. */
ssize_t (*get_file_size) (const char *pathname);
+ /*
+ * Write count number of bytes to the file, and return the number
+ * of bytes actually written or -1 for error.
+ */
ssize_t (*write) (Walfile f, const void *buf, size_t count);
+
+ /* Return the current position in a file or -1 on error */
off_t (*get_current_pos) (Walfile f);
+
+ /*
+ * fsync the contents of the specified file. Returns 0 on
+ * success.
+ */
int (*sync) (Walfile f);
+
+ /* Clean up the Walmethod, closing any shared resources */
bool (*finish) (void);
+
+ /* Return a text for the last error in this Walfile */
char *(*getlasterror) (void);
};
Magnus Hagander <magnus@hagander.net> writes:
Something like the attached?
Not sure about
+ * All methods that have a failure path will set errno on failure.
Given that you've got a getlasterror method, I don't think that's really
the API invariant is it? If it were, you'd just have the callers doing
strerror(errno) for themselves. I think maybe a more accurate statement
would be
* All methods that have a failure return indicator will set state
* allowing the getlasterror() method to return a suitable message.
* Commonly, errno is this state (or part of it); so callers must take
* care not to clobber errno between a failed method call and use of
* getlasterror() to retrieve the message.
Also:
* the arguments of open_for_write() could stand to be explained
* what's the return value of finish()?
* wouldn't it be better to declare getlasterror() as returning
"const char *"?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 11, 2017 at 6:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
Something like the attached?
Not sure about
+ * All methods that have a failure path will set errno on failure.
Given that you've got a getlasterror method, I don't think that's really
the API invariant is it? If it were, you'd just have the callers doing
strerror(errno) for themselves. I think maybe a more accurate statement
would be
Hmm. You're right, this is what I get for rushing a patch before plane
departure :/
* All methods that have a failure return indicator will set state
* allowing the getlasterror() method to return a suitable message.
* Commonly, errno is this state (or part of it); so callers must take
* care not to clobber errno between a failed method call and use of
* getlasterror() to retrieve the message.
Yeah, that's much better.
Also:
* the arguments of open_for_write() could stand to be explained
* what's the return value of finish()?
Both fixed.
* wouldn't it be better to declare getlasterror() as returning
"const char *"?
Yeah, probably is. Will do and push.
Thanks!
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>