[RFC] Shouldn't we remove annoying FATAL messages from server log?
Hello,
My customers and colleagues sometimes (or often?) ask about the following
message:
FATAL: the database system is starting up
This message is often output dozens of times during a failover or PITR. The
users seem to be worried because the message level is FATAL and they don't
know why such a severe message is output during a successful failover and
recovery. I can't blame the users for worrying: the message is merely a
by-product of pg_ctl's internal ping, which is not obvious to them.
Similarly, the message below is output when I stop the standby server
normally. Why FATAL as the result of a successful operation? I'm afraid DBAs
are annoyed by these messages, because system administration software collects
ERROR and more severe messages for daily monitoring.
FATAL: terminating walreceiver process due to administrator command
Shouldn't we lower the severity of these messages, or avoid writing them to the server log?
How about the following measures?
1. FATAL: the database system is starting up
2. FATAL: the database system is shutting down
3. FATAL: the database system is in recovery mode
4. FATAL: sorry, too many clients already
Report these as FATAL to the client because the client wants to know the
reason. But don't output them to the server log because they are not necessary
for DBAs (4 is subtle.)
5. FATAL: terminating walreceiver process due to administrator command
6. FATAL: terminating background worker \"%s\" due to administrator command
Don't output these to the server log. Why are they necessary? For
troubleshooting? If they are needed, the severity should be LOG (but then I
wonder why the termination of other background processes is not reported...)
To suppress the server log output, I think we can do the following. I guess
ereport(FATAL) is still needed, because it conveniently handles both reporting
to the client and terminating the process.
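/* Suppress server-log output in this backend only; the client still
 * receives the FATAL message and the process still terminates. */
log_min_messages = PANIC;
ereport(FATAL,
        (errcode(ERRCODE_CANNOT_CONNECT_NOW),
         errmsg("the database system is starting up")));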
May I hear your opinions?
Regards
MauMau
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12/5/13, 10:25 AM, MauMau wrote:
Report these as FATAL to the client because the client wants to know the
reason. But don't output them to server log because they are not
necessary for DBAs
Yeah, this is part of a more general problem, which you have
characterized correctly: What is fatal (or error, or warning, ...) to
the client isn't necessarily fatal (or error, or warning, ...) to the
server or DBA. Fixing this would need a larger enhancement of the
logging infrastructure. It's been discussed before, but it's a bit of work.
"MauMau" <maumau307@gmail.com> writes:
Shouldn't we lower the severity of these messages, or avoid writing them to the server log?
No. They are FATAL so far as the individual session is concerned.
Possibly some documentation effort is needed here, but I don't think
any change in the code behavior would be an improvement.
1. FATAL: the database system is starting up
2. FATAL: the database system is shutting down
3. FATAL: the database system is in recovery mode
4. FATAL: sorry, too many clients already
Report these as FATAL to the client because the client wants to know the
reason. But don't output them to the server log because they are not necessary
for DBAs (4 is subtle.)
The notion that a DBA should not be allowed to find out how often #4 is
happening is insane.
regards, tom lane
Tom Lane-2 wrote
"MauMau" <maumau307@> writes:
Shouldn't we lower the severity of these messages, or avoid writing them to the server log?
No. They are FATAL so far as the individual session is concerned.
Possibly some documentation effort is needed here, but I don't think
any change in the code behavior would be an improvement.
1. FATAL: the database system is starting up
2. FATAL: the database system is shutting down
3. FATAL: the database system is in recovery mode
4. FATAL: sorry, too many clients already
Report these as FATAL to the client because the client wants to know the
reason. But don't output them to the server log because they are not
necessary for DBAs (4 is subtle.)
The notion that a DBA should not be allowed to find out how often #4 is
happening is insane.
Agreed, #4 is definitely DBA territory.
ISTM that instituting some level of categorization for messages would be
helpful. Then logging and reporting frameworks would be able to identify
and segregate the logs in whatever way they and the configuration deem
appropriate.
FATAL: [LOGON] too many clients already
I'd have the category output disabled by default for a long while, then
eventually enable it by default while leaving the ability to disable it.
Calls that do not supply a category would get [N/A] output in category mode.
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5781925.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
* David Johnston (polobo@yahoo.com) wrote:
ISTM that instituting some level of categorization for messages would be
helpful. Then logging and reporting frameworks would be able to identify
and segregate the logs in whatever way they and the configuration deem
appropriate.
I've wanted to do that and have even discussed it with folks in the
past; the trick is finding enough tuits, which is difficult when you
start to look at the size of the task...
Thanks,
Stephen
On 12/05/2013 10:21 AM, Stephen Frost wrote:
* David Johnston (polobo@yahoo.com) wrote:
ISTM that instituting some level of categorization for messages would be
helpful. Then logging and reporting frameworks would be able to identify
and segregate the logs in whatever way they and the configuration deem
appropriate.
I've wanted to do that and have even discussed it with folks in the
past; the trick is finding enough tuits, which is difficult when you
start to look at the size of the task...
But ... if we set a firm policy on this, then we could gradually clean
up the error messages piecemeal over the next couple of major versions.
We could also make sure that any new features complied with the
categorization policy.
Right now, how to categorize errors is up to each individual patch
author, which means that things are all over the place, and get worse
with each new feature added.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
On 12/05/2013 10:21 AM, Stephen Frost wrote:
But ... if we set a firm policy on this, then we could gradually clean
up the error messages piecemeal over the next couple of major versions.
We could also make sure that any new features complied with the
categorization policy.
Right now, how to categorize errors is up to each individual patch
author, which means that things are all over the place, and get worse
with each new feature added.
I don't think there's that much randomness in is-it-an-ERROR-or-not.
What I believe Stephen is talking about is a classification that
simply doesn't exist today, namely something around how likely is the
message to be of interest to a DBA as opposed to the client application.
We currently compose messages almost entirely with the client in mind,
and that's as it should be. But we could use some new decoration that's
more DBA-oriented to help decide what goes into the postmaster log.
Before we could get very far we'd need a better understanding than we have
of what cases a DBA might be interested in. To take the specific example
that started this thread, there wouldn't be a lot of value IMO in a
classification like "connection failure messages". I think the OP is
probably right that those are often uninteresting --- but as I mentioned,
"too many clients" might become interesting if he's wondering whether he
needs to enlarge max_connections. Or password failure cases might become
interesting if he starts to suspect break-in attempts. So I'd want to see
a design that credibly covers those sorts of needs before we put any large
effort into code changes.
regards, tom lane
On 12/05/2013 10:46 AM, Tom Lane wrote:
Before we could get very far we'd need a better understanding than we have
of what cases a DBA might be interested in. To take the specific example
that started this thread, there wouldn't be a lot of value IMO in a
classification like "connection failure messages". I think the OP is
probably right that those are often uninteresting --- but as I mentioned,
"too many clients" might become interesting if he's wondering whether he
needs to enlarge max_connections. Or password failure cases might become
interesting if he starts to suspect break-in attempts. So I'd want to see
a design that credibly covers those sorts of needs before we put any large
effort into code changes.
Heck, I'd be happy just to have a class of messages which specifically
means "OMG, there's something wrong with the server", that is, a flag
for messages which only occur when PostgreSQL encounters a bug, data
corruption, or platform error. Right now, I have to suss those out by
regex.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus wrote:
Heck, I'd be happy just to have a class of messages which specifically
means "OMG, there's something wrong with the server", that is, a flag
for messages which only occur when PostgreSQL encounters a bug, data
corruption, or platform error. Right now, I have to suss those out by
regex.
My proposal was to have something separate from message severity
("criticality"). So the problems would continue to be reported as
FATAL, ERROR or WARNING, but if they are just the result of something
the user did wrong, they get marked as "non-critical"; if there is,
say, a failure to flush xlog (which currently results in an
ERROR), we could flag it as critical. Grepping the log for critical
messages, regardless of severity, would yield actual action items
for the DBA, without having to pick them out by regex.
Otherwise you would have to come up with a lot of new message levels, each
preserving the current behavior of aborting the current transaction or not,
and of terminating the process or not.
There was also the idea that this would be driven off SQLSTATE but this
seems pretty unwieldy to me.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: "Peter Eisentraut" <peter_e@gmx.net>
Yeah, this is part of a more general problem, which you have
characterized correctly: What is fatal (or error, or warning, ...) to
the client isn't necessarily fatal (or error, or warning, ...) to the
server or DBA.
Thanks. In addition, #5 and #6 in my previous mail are not needed by either
the client or the DBA, are they?
Fixing this would need a larger enhancement of the
logging infrastructure. It's been discussed before, but it's a bit of
work.
How about the easy fix I proposed? The current logging infrastructure seems
sufficient to solve the original problem with little effort and without
complicating the code. If you don't like "log_min_messages = PANIC",
SetConfigOption() can be used instead. I think we should take a step to
eliminate the problem we are facing now, and also consider a much richer
infrastructure in the long run. I'm also interested in the latter, and want
to discuss it after solving the problem in front of me.
Regards
MauMau
From: "Tom Lane" <tgl@sss.pgh.pa.us>
No. They are FATAL so far as the individual session is concerned.
Possibly some documentation effort is needed here, but I don't think
any change in the code behavior would be an improvement.
You are suggesting that we should add a note like "Don't worry about the
following message. This is a result of normal connectivity checking", aren't
you?
FATAL: the database system is starting up
But I doubt most users would notice such notes. Anyway, lots of such
messages certainly make monitoring and troubleshooting harder, because
valuable messages get buried.
4. FATAL: sorry, too many clients already
Report these as FATAL to the client because the client wants to know the
reason. But don't output them to the server log because they are not
necessary for DBAs (4 is subtle.)
The notion that a DBA should not be allowed to find out how often #4 is
happening is insane.
I thought someone would point that out. You are right, #4 is a strong hint
for the DBA about the max_connections setting or connection pool configuration.
Regards
MauMau
On 2013-12-06 22:35:21 +0900, MauMau wrote:
From: "Tom Lane" <tgl@sss.pgh.pa.us>
No. They are FATAL so far as the individual session is concerned.
Possibly some documentation effort is needed here, but I don't think
any change in the code behavior would be an improvement.
You are suggesting that we should add a note like "Don't worry about the
following message. This is a result of normal connectivity checking", aren't
you?
FATAL: the database system is starting up
Uh. An explanation of why you cannot connect to the database hardly seems
like a superfluous log message.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: "Josh Berkus" <josh@agliodbs.com>
Heck, I'd be happy just to have a class of messages which specifically
means "OMG, there's something wrong with the server", that is, a flag
for messages which only occur when PostgreSQL encounters a bug, data
corruption, or platform error. Right now, I have to suss those out by
regex.
What are the issues of using SQLSTATE XXnnn as a filter?
Regards
MauMau
From: "Alvaro Herrera" <alvherre@2ndquadrant.com>
There was also the idea that this would be driven off SQLSTATE but this
seems pretty unwieldy to me.
You are referring to this long discussion, aren't you?
/messages/by-id/19791.1335902957@sss.pgh.pa.us
I've read it before, and liked the SQLSTATE-based approach. It seems like
properly assigned SQLSTATEs could be used as message IDs, and pairs of a
SQLSTATE and its user action might be used to provide a sophisticated
database administration GUI.
That discussion sounds interesting, and I want to take more time to
consider it. But what do you think of my original suggestion to easily solve
the current issue? I'd like to remove the current annoying problem first
before spending much time on more exciting infrastructure.
Regards
MauMau
"MauMau" <maumau307@gmail.com> writes:
That discussion sounds interesting, and I want to take more time to
consider it. But what do you think of my original suggestion to easily solve
the current issue? I'd like to remove the current annoying problem first
before spending much time on more exciting infrastructure.
There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged. So no, that is not happening. You can hack your local
copy that way if you like of course, but it's not getting committed.
regards, tom lane
From: "Tom Lane" <tgl@sss.pgh.pa.us>
There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged. So no, that is not happening. You can hack your local
copy that way if you like of course, but it's not getting committed.
Oh, I may have misunderstood your previous comments. I got the impression
that you and others regard those messages (except "too many clients") as
unnecessary in the server log.
1. FATAL: the database system is starting up
2. FATAL: the database system is shutting down
3. FATAL: the database system is in recovery mode
5. FATAL: terminating walreceiver process due to administrator command
6. FATAL: terminating background worker \"%s\" due to administrator command
Could you tell me why these are necessary in the server log? My guess is as
follows. Am I correct?
* #1 through #3 are necessary for the DBA to investigate and explain to the
end user why he cannot connect to the database.
* #5 and #6 are unnecessary for the DBA. I can't find any reason why
these are useful for the DBA.
Regards
MauMau
MauMau wrote
From: "Tom Lane" <tgl@.pa>
There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged. So no, that is not happening. You can hack your local
copy that way if you like of course, but it's not getting committed.
Oh, I may have misunderstood your previous comments. I got the impression
that you and others regard those messages (except "too many clients") as
unnecessary in the server log.
1. FATAL: the database system is starting up
2. FATAL: the database system is shutting down
3. FATAL: the database system is in recovery mode
5. FATAL: terminating walreceiver process due to administrator command
6. FATAL: terminating background worker \"%s\" due to administrator command
Could you tell me why these are necessary in the server log? My guess is as
follows. Am I correct?
* #1 through #3 are necessary for the DBA to investigate and explain to the
end user why he cannot connect to the database.
* #5 and #6 are unnecessary for the DBA. I can't find any reason why
these are useful for the DBA.
For me 1-3 are unusual events in production situations and so knowing when
they occur, and confirming they occurred for a good reason, is a key job of
the DBA.
5 and 6: I don't fully understand when they would happen but likely fall
into the same "the DBA should know what is going on with their server and
confirm any startup/shutdown activity it is involved with".
They might be better categorized at "NOTICE" level if they were in response to
an administrator action, versus in response to a crashed process, but even
for the user-initiated situation, making sure they hit the log by using
FATAL is totally understandable and IMO desirable.
I'd ask in what situations are these messages occurring so frequently that
they are becoming noise instead of useful data? Sorry if I missed your
use-case explanation up-thread.
David J.
David Johnston wrote
MauMau wrote
From: "Tom Lane" <tgl@.pa>
There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged. So no, that is not happening. You can hack your local
copy that way if you like of course, but it's not getting committed.
Oh, I may have misunderstood your previous comments. I got the impression
that you and others regard those messages (except "too many clients") as
unnecessary in the server log.
1. FATAL: the database system is starting up
2. FATAL: the database system is shutting down
3. FATAL: the database system is in recovery mode
5. FATAL: terminating walreceiver process due to administrator command
6. FATAL: terminating background worker \"%s\" due to administrator command
Could you tell me why these are necessary in the server log? My guess is as
follows. Am I correct?
* #1 through #3 are necessary for the DBA to investigate and explain to the
end user why he cannot connect to the database.
* #5 and #6 are unnecessary for the DBA. I can't find any reason why
these are useful for the DBA.
For me 1-3 are unusual events in production situations and so knowing when
they occur, and confirming they occurred for a good reason, is a key job of
the DBA.
5 and 6: I don't fully understand when they would happen but likely fall
into the same "the DBA should know what is going on with their server and
confirm any startup/shutdown activity it is involved with".
They might be better categorized at "NOTICE" level if they were in response to
an administrator action, versus in response to a crashed process, but even
for the user-initiated situation, making sure they hit the log by using
FATAL is totally understandable and IMO desirable.
I'd ask in what situations are these messages occurring so frequently that
they are becoming noise instead of useful data? Sorry if I missed your
use-case explanation up-thread.
David J.
Went and scanned the thread:
PITR/Failover is not generally that frequent an occurrence but I will agree
that these events are likely common during such.
Maybe PITR/Failover mode can output something in the logs to alleviate user
angst about these frequent events? I doubt that a totally separate
mechanism can be used for this "mode", but instead of looking for things to
remove, how about adding some additional coddling to the logs at the
beginning and end of the mode change?
Thought-provoking only, as I have not actually been a user of said feature.
David J.
MauMau wrote
From: "Tom Lane" <tgl@.pa>
There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged. So no, that is not happening. You can hack your local
copy that way if you like of course, but it's not getting committed.
Oh, I may have misunderstood your previous comments. I got the impression
that you and others regard those messages (except "too many clients") as
unnecessary in the server log.
1. FATAL: the database system is starting up
How about altering the message to tone down the severity by a half-step...
FATAL: (probably) not! - the database system is starting up
David J.
From: "David Johnston" <polobo@yahoo.com>
5. FATAL: terminating walreceiver process due to administrator command
6. FATAL: terminating background worker \"%s\" due to administrator command
5 and 6: I don't fully understand when they would happen but likely fall
into the same "the DBA should know what is going on with their server and
confirm any startup/shutdown activity it is involved with".
They might be better categorized at "NOTICE" level if they were in response to
an administrator action, versus in response to a crashed process, but even
for the user-initiated situation, making sure they hit the log by using
FATAL is totally understandable and IMO desirable.
#5 is output when the DBA shuts down the replication standby server.
#6 is output when the DBA shuts down the server if he is using any custom
background worker.
These messages are output every time. What I see as a problem is that
FATAL messages are output in a normal situation, which worries the DBA, and
those messages don't help the DBA with anything. They merely worry the DBA.
Regards
MauMau