Server is not getting started with log level as debug5 on master after commit 3147ac

Started by Amit Kapilaabout 12 years ago7 messages

amit.kapila16@gmail.com

about 12 years ago

1 attachment(s)

In master branch, server is not getting started with log level as debug5.

Simple steps to reproduce the problem:
a. initdb -D ..\..\Data
b. change log_min_messages = debug5 in postgresql.conf
c. start server (pg_ctl start -D ..\..\Data) -- server doesn't get started

Relevant message on server console:
DEBUG: mapped win32 error code 2 to 2
FATAL: could not open recovery command file "recovery.conf": No error

This problem occurs only in Windows, but the cause of problem is generic.

The problem occurs when during startup, code tries to open
recovery.conf in below function:
readRecoveryCommandFile(void)
{
..
fd = AllocateFile(RECOVERY_COMMAND_FILE, "r");
if (fd == NULL)
{
if (errno == ENOENT)
return; /* not there, so no archive recovery */
ereport(FATAL,
(errcode_for_file_access(),
errmsg("could not open recovery command file \"%s\": %m",
RECOVERY_COMMAND_FILE)));
..
}

It is expected that if file doesn't exist, errno should be ENOENT
which is correct and was working fine until commit:
3147acd63e0135aff9a6c4b01d861251925d97d9
Use improved vsnprintf calling logic in more places.

The reason is that after this commit, function appendStringInfoVA has
started calling pvsnprintf() which initializes errno to 0. As function
appendStringInfoVA gets called in error path
(errmsg_internal()->EVALUATE_MESSAGE->appendStringInfoVA), it can
reset previous errno if there is any, as it happens for function
_dosmaperr().

_dosmaperr(unsigned long e)
{
...
errno = doserrors[i].doserr;
#ifndef FRONTEND
ereport(DEBUG5,
(errmsg_internal("mapped win32 error code %lu to %d",
e, errno)));
..
}

Here after setting errno, it calls errmsg_internal() which internally
resets errno to zero, so the callers of this function who are
expecting proper errno can fail which is what happens in current
issue.

I could think of below possible ways to fix the problem:
a. in function pvsnprintf(), save the errno before setting it to 0 and
then before exiting function reset it back to saved errno. I think
this is sane because in function pvsnprintf, if any error occurs due
to which errno is changed, it will not return control, so errno will
not be required by callers.
b. we can change the callers like _dosmaperr() who have responsibility
to save errno for their callers.

Patch with approach a) is attached with mail, we can change code as
per approach b) or any other better method as well, but for now I have
prepared patch with approach a), as I could not see any problem with
it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pvsnprintf_issue.patchapplication/octet-stream; name=pvsnprintf_issue.patchDownload

diff --git a/src/common/psprintf.c b/src/common/psprintf.c
index 41c39c4..ee2e820 100644
--- a/src/common/psprintf.c
+++ b/src/common/psprintf.c
@@ -97,9 +97,11 @@ size_t
 pvsnprintf(char *buf, size_t len, const char *fmt, va_list args)
 {
 	int			nprinted;
+	int			saved_errno;
 
 	Assert(len > 0);
 
+	saved_errno = errno;
 	errno = 0;
 
 	/*
@@ -129,6 +131,8 @@ pvsnprintf(char *buf, size_t len, const char *fmt, va_list args)
 #endif
 	}
 
+	errno = saved_errno;
+
 	/*
 	 * Note: some versions of vsnprintf return the number of chars actually
 	 * stored, not the total space needed as C99 specifies.  And at least one

Amit Kapila

amit.kapila16@gmail.com

about 12 years ago

In reply to: Amit Kapila (#1)

1 attachment(s)

Re: Server is not getting started with log level as debug5 on master after commit 3147ac

On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

In master branch, server is not getting started with log level as debug5.

Simple steps to reproduce the problem:
a. initdb -D ..\..\Data
b. change log_min_messages = debug5 in postgresql.conf
c. start server (pg_ctl start -D ..\..\Data) -- server doesn't get started

Relevant message on server console:
DEBUG: mapped win32 error code 2 to 2
FATAL: could not open recovery command file "recovery.conf": No error

This problem occurs only in Windows, but the cause of problem is generic.

I could think of below possible ways to fix the problem:
a. in function pvsnprintf(), save the errno before setting it to 0 and
then before exiting function reset it back to saved errno. I think
this is sane because in function pvsnprintf, if any error occurs due
to which errno is changed, it will not return control, so errno will
not be required by callers.
b. we can change the callers like _dosmaperr() who have responsibility
to save errno for their callers.

Patch with approach a) is attached with mail, we can change code as
per approach b) or any other better method as well, but for now I have
prepared patch with approach a), as I could not see any problem with
it.

Again looking at it, I think better fix would be to restore 'errno'
from 'edata->saved_errno' in errfinish() incase the control returns
back to caller (elevel <= WARNING). It seems to me that this fix is
required anyway because if we return from errfinish (ereport/elog) to
caller, it should restore back 'errno', as caller might need to take
some action based on errno.
Patch to restore 'errno' in errfinish() is attached with this mail.

Any better ideas for fixing this issue?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pvsnprintf_issue_v2.patchapplication/octet-stream; name=pvsnprintf_issue_v2.patchDownload

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e648792..a1621bf 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -553,9 +553,12 @@ errfinish(int dummy,...)
 	 *
 	 * But check for cancel/die interrupt first --- this is so that the user
 	 * can stop a query emitting tons of notice or warning messages, even if
-	 * it's in a loop that otherwise fails to check for interrupts.
+	 * it's in a loop that otherwise fails to check for interrupts. Before
+	 * returning to caller restore errno back as callers might take some action
+	 * based on errno.
 	 */
 	CHECK_FOR_INTERRUPTS();
+	errno = edata->saved_errno;
 }

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Amit Kapila (#2)

Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac

Amit Kapila <amit.kapila16@gmail.com> writes:

On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I could think of below possible ways to fix the problem:
a. in function pvsnprintf(), save the errno before setting it to 0 and
then before exiting function reset it back to saved errno. I think
this is sane because in function pvsnprintf, if any error occurs due
to which errno is changed, it will not return control, so errno will
not be required by callers.
b. we can change the callers like _dosmaperr() who have responsibility
to save errno for their callers.

Patch with approach a) is attached with mail, we can change code as
per approach b) or any other better method as well, but for now I have
prepared patch with approach a), as I could not see any problem with
it.

Again looking at it, I think better fix would be to restore 'errno'
from 'edata->saved_errno' in errfinish() incase the control returns
back to caller (elevel <= WARNING). It seems to me that this fix is
required anyway because if we return from errfinish (ereport/elog) to
caller, it should restore back 'errno', as caller might need to take
some action based on errno.
Patch to restore 'errno' in errfinish() is attached with this mail.

I think this is pretty misguided. In general, there should be no
expectation that a function call doesn't stomp on errno unless it's
specifically documented not to --- which surely ereport() never has
been. So generally it's the responsibility of any code that needs
errno to be preserved to do so itself. In particular, in the case
at hand:

if (doserrors[i].winerr == e)
{
errno = doserrors[i].doserr;
#ifndef FRONTEND
ereport(DEBUG5,
(errmsg_internal("mapped win32 error code %lu to %d",
e, errno)));
#elif FRONTEND_DEBUG
fprintf(stderr, _("mapped win32 error code %lu to %d"), e, errno);
#endif
return;
}

even if we were to do what you suggest to make the ereport() call
preserve errno, the FRONTEND_DEBUG code path would still be utterly
broken, because fprintf() is certainly capable of changing errno.

So basically, _dosmaperr() is broken and always has been, and it's
surprising we've not seen evidence of that before. It needs to not
try to set the real errno variable till it's done logging about it.

Normally I avoid touching Windows-specific code for lack of ability
to test it, but in this case the needed fix seems pretty clear, so
I'll go make it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

about 12 years ago

In reply to: Tom Lane (#3)

Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac

On Sun, Nov 24, 2013 at 4:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Again looking at it, I think better fix would be to restore 'errno'
from 'edata->saved_errno' in errfinish() incase the control returns
back to caller (elevel <= WARNING). It seems to me that this fix is
required anyway because if we return from errfinish (ereport/elog) to
caller, it should restore back 'errno', as caller might need to take
some action based on errno.
Patch to restore 'errno' in errfinish() is attached with this mail.

I think this is pretty misguided. In general, there should be no
expectation that a function call doesn't stomp on errno unless it's
specifically documented not to --- which surely ereport() never has
been. So generally it's the responsibility of any code that needs
errno to be preserved to do so itself. In particular, in the case
at hand:

So basically, _dosmaperr() is broken and always has been, and it's
surprising we've not seen evidence of that before. It needs to not
try to set the real errno variable till it's done logging about it.

Normally I avoid touching Windows-specific code for lack of ability
to test it, but in this case the needed fix seems pretty clear, so
I'll go make it.

Thanks, I have verified that the problem reported above is fixed.

I think that still this kind of problems can be there at other
places in code. I checked few places and suspecting secure_read() can
also have similar problem:

secure_read()
{
..
errno = 0;
n = SSL_read(port->ssl, ptr, len);
err = SSL_get_error(port->ssl, n);
switch (err)
{
..
..

case SSL_ERROR_SSL:
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("SSL error: %s", SSLerrmessage())));
/* fall through */
..
}

In case SSL_read sets errno, ereport will reset it and caller of
secure_read() like pq_getbyte_if_available() who perform actions based
on errno, can lead to some problem.

pq_getbyte_if_available(unsigned char *c)
{
..

r = secure_read(MyProcPort, c, 1);
if (r < 0)
{
if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
r = 0;
..
}

In general it is responsibility of caller to take care of errno
handling, but I am not sure it is taken care well at all places in
code and the chances of such problems were less earlier because there
was less chance that ereport would reset errno, but now it will
definitely do so.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Amit Kapila (#4)

Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac

Amit Kapila <amit.kapila16@gmail.com> writes:

I think that still this kind of problems can be there at other
places in code. I checked few places and suspecting secure_read() can
also have similar problem:

case SSL_ERROR_SSL:
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("SSL error: %s", SSLerrmessage())));
/* fall through */

Note that what it "falls through" to is "errno = ECONNRESET", so the
caller will see a well-defined value of errno after this. Even without
the ereport call, I'd think that was necessary because SSL_get_error
isn't documented to return a meaningful value of errno except in the
SSL_ERROR_SYSCALL case.

In general it is responsibility of caller to take care of errno
handling, but I am not sure it is taken care well at all places in
code and the chances of such problems were less earlier because there
was less chance that ereport would reset errno, but now it will
definitely do so.

[ shrug... ] To my mind, this is a *good* thing, because now we will
more easily find and fix any callers that are broken. They were broken
anyway, it's just that the circumstances might've been difficult to
reproduce. As an example consider the possibility that ereport might
previously have stomped on errno only with unusual log_line_prefix
settings.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 12 years ago

In reply to: Tom Lane (#5)

Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac

I wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

In general it is responsibility of caller to take care of errno
handling, but I am not sure it is taken care well at all places in
code and the chances of such problems were less earlier because there
was less chance that ereport would reset errno, but now it will
definitely do so.

[ shrug... ] To my mind, this is a *good* thing, because now we will
more easily find and fix any callers that are broken. They were broken
anyway, it's just that the circumstances might've been difficult to
reproduce. As an example consider the possibility that ereport might
previously have stomped on errno only with unusual log_line_prefix
settings.

... and having said that, there *are* bugs here, which have got nothing to
do with ereport(). client_read_ended() for instance is not being careful
to preserve errno :-(

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

about 12 years ago

In reply to: Tom Lane (#5)

Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac

On Sun, Nov 24, 2013 at 10:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

I think that still this kind of problems can be there at other
places in code. I checked few places and suspecting secure_read() can
also have similar problem:

case SSL_ERROR_SSL:
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("SSL error: %s", SSLerrmessage())));
/* fall through */

Note that what it "falls through" to is "errno = ECONNRESET", so the
caller will see a well-defined value of errno after this. Even without
the ereport call, I'd think that was necessary because SSL_get_error
isn't documented to return a meaningful value of errno except in the
SSL_ERROR_SYSCALL case.

Yes, that is right. Idea is that there can be more occurrences which
we might need to handle and the same seems to be taken care in your
latest
commit.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers