[PATCH] Log CSV by default

Started by David Fetter · about 7 years ago · 16 messages
#1 David Fetter
david@fetter.org

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 03594e77fe..21a0f4f86c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1544,7 +1544,7 @@ static struct config_bool ConfigureNamesBool[] =
 			NULL
 		},
 		&Logging_collector,
-		false,
+		true,
 		NULL, NULL, NULL
 	},
 	{
@@ -3727,13 +3727,13 @@ static struct config_string ConfigureNamesString[] =
 	{
 		{"log_destination", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Sets the destination for server log output."),
-			gettext_noop("Valid values are combinations of \"stderr\", "
-						 "\"syslog\", \"csvlog\", and \"eventlog\", "
+			gettext_noop("Valid values are combinations of \"csvlog\", "
+						 "\"stderr\", \"syslog\", and \"eventlog\", "
 						 "depending on the platform."),
 			GUC_LIST_INPUT
 		},
 		&Log_destination_string,
-		"stderr",
+		"csvlog",
 		check_log_destination, assign_log_destination, NULL
 	},
 	{
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 1fa02d2c93..c68ac6868c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -411,13 +411,13 @@

# - Where to Log -

-#log_destination = 'stderr'		# Valid values are combinations of
-					# stderr, csvlog, syslog, and eventlog,
-					# depending on platform.  csvlog
-					# requires logging_collector to be on.
+#log_destination = 'csvlog'		# Valid values are combinations of
+					# csvlog, stderr, syslog, and eventlog,
+					# depending on platform.  Options other than csvlog
+					# do not require that logging_collector be on.
 # This is used when logging to stderr:
-#logging_collector = off		# Enable capturing of stderr and csvlog
+#logging_collector = on		# Enable capturing of stderr and csvlog
 					# into log files. Required to be on for
 					# csvlogs.
 					# (change requires restart)
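
For illustration only (not part of the patch): a minimal sketch of the kind of
consumer this enables, assuming the server has written a file named
postgresql.csv. The field layout is deliberately left generic rather than
hard-coding the csvlog column list.

import csv

# Rough sketch: read a csvlog file record by record.  The csv module copes
# with quoted fields that contain commas and embedded newlines, so each row
# really is one log entry even when the logged query spans several lines.
with open("postgresql.csv", newline="") as f:
    for row in csv.reader(f):
        # Field positions are not hard-coded here; check the csvlog column
        # list in the documentation for your server version.
        print(len(row), "fields, starting with", row[:3])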

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#2 Andres Freund
andres@anarazel.de
In reply to: David Fetter (#1)
Re: [PATCH] Log CSV by default

Hi,

On 2018-11-30 19:53:18 +0100, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

While perhaps not excessively so, I think it increases the difficulty
sufficiently that I'm against such a proposal. Yes, a conversion into
something more readable can be done programmatically, but that's another
step before analyzing a problem. And besides, a lot of log fragments are
embedded in email, screenshotted etc.

Secondarily, csvlog doesn't contain all the interesting information.

Thirdly, it often increases the log volume noticeably.

I think having a bin/pg_logparse tool that can parse postgres' config
file and attempt to parse the log contents in whatever format they are
would be much much more useful. Obviously not every log_line_prefix can
be parsed unambiguously, but a lot of formats can, and a lot more
formats can be made unambiguous (e.g. adding escape logic to application
name logging would be very useful).
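
For what it's worth, a rough sketch of the kind of translation such a tool
might perform, covering only a handful of log_line_prefix escapes (%m, %t,
%p, %u, %d, %a); this is only an illustration of the idea, not a proposed
implementation:

import re

# Rough sketch: turn a log_line_prefix string into a regex with named groups.
# Only a handful of escapes are handled, and %m/%t share a group name, so a
# prefix using both would need adjusting; a real pg_logparse would have to
# cover the full escape set and deal with genuinely ambiguous prefixes.
ESCAPES = {
    "%m": r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+)",
    "%t": r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)",
    "%p": r"(?P<pid>\d+)",
    "%u": r"(?P<user>\S*)",
    "%d": r"(?P<database>\S*)",
    "%a": r"(?P<application>.*?)",
}

def prefix_to_regex(log_line_prefix):
    pattern = ""
    i = 0
    while i < len(log_line_prefix):
        token = log_line_prefix[i:i + 2]
        if token in ESCAPES:
            pattern += ESCAPES[token]
            i += 2
        elif token == "%%":
            pattern += "%"
            i += 2
        else:
            pattern += re.escape(log_line_prefix[i])
            i += 1
    return re.compile(pattern + r"(?P<message>.*)\Z", re.DOTALL)

# Example with the common '%m [%p] ' prefix:
line_re = prefix_to_regex("%m [%p] ")
m = line_re.match("2018-11-30 19:53:18.790 CET [12345] LOG:  statement: SELECT 1")
if m:
    print(m.group("timestamp"), m.group("pid"), m.group("message"))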

- Andres

#3 Isaac Morland
isaac.morland@gmail.com
In reply to: Andres Freund (#2)
Re: [PATCH] Log CSV by default

I think having a bin/pg_logparse tool that can parse postgres' config
file and attempt to parse the log contents in whatever format they are
would be much much more useful. Obviously not every log_line_prefix can
be parsed unambiguously, but a lot of formats can, and a lot more
formats can be made unambiguous (e.g. adding escape logic to application
name logging would be very useful).

Aren't application names set by the client? If they are currently logged
without escaping, couldn't a client insert arbitrary binary content such as
terminal control sequences into log files, with potentially unpleasant
results? Or is there some sort of filter already in place somewhere? Should
I investigate?
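
For illustration, the kind of escaping being discussed might look like this
(a hypothetical sanitizer, not what the server currently does -- that is
exactly the question):

# Illustration only: make sure a client-supplied value such as
# application_name cannot smuggle terminal control sequences or raw
# newlines into a text log.
def sanitize_for_log(value: str) -> str:
    out = []
    for ch in value:
        if ch == "\n":
            out.append("\\n")          # keep one log entry per line
        elif ord(ch) < 0x20 or ch == "\x7f":
            out.append("?")            # neutralize control characters
        else:
            out.append(ch)
    return "".join(out)

print(sanitize_for_log("evil\x1b[2Japp\nname"))  # -> evil?[2Japp\nname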

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: [PATCH] Log CSV by default

Andres Freund <andres@anarazel.de> writes:

On 2018-11-30 19:53:18 +0100, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

While perhaps not excessively so, I think it increases the difficulty
sufficiently that I'm against such a proposal.

I don't like this either. People who prefer CSV format can select it
trivially. Those who don't are going to be annoyed at us for changing
behavior that's stood for many years.

Also, in addition to the objections you noted, there's the problem that
this change requires changing logging_collector to default to "on".
That's an *enormous* compatibility break, because of the effects on
where the log output goes. Not to mention having performance impacts
that can be significant.

I think we should reject this out of hand.

I think having a bin/pg_logparse tool that can parse postgres' config
file and attempt to parse the log contents in whatever format they are
would be much much more useful. Obviously not every log_line_prefix can
be parsed unambiguously, but a lot of formats can, and a lot more
formats can be made unambiguous (e.g. adding escape logic to application
name logging would be very useful).

Yeah, it might be possible to make some progress in those directions.

regards, tom lane

#5 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#4)
Re: [PATCH] Log CSV by default

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-11-30 19:53:18 +0100, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

While perhaps not excessively so, I think it increases the difficulty
sufficiently that I'm against such a proposal.

I don't like this either. People who prefer CSV format can select it
trivially. Those who don't are going to be annoyed at us for changing
behavior that's stood for many years.

Agreed. Frankly, even if we made this the default, I doubt packagers
would decide to go with it because of the issues raised here.

I think we should reject this out of hand.

Agreed.

Thanks!

Stephen

#6 Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#5)
Re: [PATCH] Log CSV by default

On Fri, Nov 30, 2018 at 06:11:03PM -0500, Stephen Frost wrote:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

I think we should reject this out of hand.

Agreed.

+1.
--
Michael

#7 David Fetter
david@fetter.org
In reply to: Tom Lane (#4)
Re: [PATCH] Log CSV by default

On Fri, Nov 30, 2018 at 04:55:30PM -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-11-30 19:53:18 +0100, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

While perhaps not excessively so, I think it increases the difficulty
sufficiently that I'm against such a proposal.

I don't like this either. People who prefer CSV format can select it
trivially. Those who don't are going to be annoyed at us for changing
behavior that's stood for many years.

Also, in addition to the objections you noted, there's the problem that
this change requires changing logging_collector to default to "on".
That's an *enormous* compatibility break, because of the effects on
where the log output goes. Not to mention having performance impacts
that can be significant.

I think we should reject this out of hand.

It's been far too long since I got one of these!

I think having a bin/pg_logparse tool that can parse postgres' config
file and attempt to parse the log contents in whatever format they are
would be much much more useful. Obviously not every log_line_prefix can
be parsed unambiguously, but a lot of formats can, and a lot more
formats can be made unambiguous (e.g. adding escape logic to application
name logging would be very useful).

Yeah, it might be possible to make some progress in those directions.

So application names need better handling, and possibly reviews for
security considerations, and pg_logparse?

OK.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#8 Vik Fearing
vik.fearing@2ndquadrant.com
In reply to: Andres Freund (#2)
Re: [PATCH] Log CSV by default

On 30/11/2018 21:15, Andres Freund wrote:

Secondarily, csvlog doesn't contain all the interesting information.

It doesn't? What is it missing?
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#9 Arkhena
Arkhena@gmail.com
In reply to: David Fetter (#7)
Re: [PATCH] Log CSV by default

Hi all,

Also, in addition to the objections you noted, there's the problem that
this change requires changing logging_collector to default to "on".
That's an *enormous* compatibility break, because of the effects on
where the log output goes.

As a consultant with so many customers who simply install Postgres "out of
the box", I think that changing log_destination to 'csvlog' and
logging_collector to 'on' also requires changing at least
log_truncate_on_rotation (and probably log_filename and log_rotation_age as
well) if we don't want my customers to run out of disk space sooner or later.

So, that's a "-1" for me to change the log_destination default to 'csvlog'.

Personally, I use a Python script written by one of my colleagues that
"changes" the standard text output to CSV (and even creates the table and
inserts the rows by connecting to my database), so I can use my SQL queries
to perform log analyses.
So I think the idea of a 'pg_logparse' is wonderful.

Cheers,

Lætitia

On Sat, Dec 1, 2018 at 01:03, David Fetter <david@fetter.org> wrote:

On Fri, Nov 30, 2018 at 04:55:30PM -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-11-30 19:53:18 +0100, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

While perhaps not excessively so, I think it increases the difficulty
sufficiently that I'm against such a proposal.

I don't like this either. People who prefer CSV format can select it
trivially. Those who don't are going to be annoyed at us for changing
behavior that's stood for many years.

Also, in addition to the objections you noted, there's the problem that
this change requires changing logging_collector to default to "on".
That's an *enormous* compatibility break, because of the effects on
where the log output goes. Not to mention having performance impacts
that can be significant.

I think we should reject this out of hand.

It's been far too long since I got one of these!

I think having a bin/pg_logparse tool that can parse postgres' config
file and attempt to parse the log contents in whatever format they are
would be much much more useful. Obviously not every log_line_prefix can
be parsed unambiguously, but a lot of formats can, and a lot more
formats can be made unambiguous (e.g. adding escape logic to

application

name logging would be very useful).

Yeah, it might be possible to make some progress in those directions.

So application names need better handling, and possibly reviews for
security considerations, and pg_logparse?

OK.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Adopt the eco-attitude
Only print this email if it is really necessary

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Arkhena (#9)
Re: [PATCH] Log CSV by default

Arkhena <Arkhena@gmail.com> writes:

As a consultant with so many customers who simply install Postgres "out of
the box", I think that changing log_destination to 'csvlog' and
logging_collector to 'on' also requires changing at least
log_truncate_on_rotation (and probably log_filename and log_rotation_age as
well) if we don't want my customers to run out of disk space sooner or later.

Yeah, that's another issue that we'd have to confront if we turn
logging_collector on by default: the default settings for log
rotation aren't exactly foolproof, and we'd need to make them so.

(Now having said that, the default setup *without* logging_collector
isn't exactly foolproof either, since wherever you send stderr to
will grow without bound. But maybe people are used to dealing with
that.)

Independently of changing logging_collector's default, perhaps we could
agree on changes to the rotation-related defaults so that it's less
risky to just turn logging_collector on without other changes?

regards, tom lane

#11 David Fetter
david@fetter.org
In reply to: Tom Lane (#10)
Re: [PATCH] Log CSV by default

On Sun, Dec 02, 2018 at 12:09:57PM -0500, Tom Lane wrote:

Arkhena <Arkhena@gmail.com> writes:

As a consultant with so many customers who simply install Postgres "out of
the box", I think that changing log_destination to 'csvlog' and
logging_collector to 'on' also requires changing at least
log_truncate_on_rotation (and probably log_filename and log_rotation_age as
well) if we don't want my customers to run out of disk space sooner or later.

Yeah, that's another issue that we'd have to confront if we turn
logging_collector on by default: the default settings for log
rotation aren't exactly foolproof, and we'd need to make them so.

(Now having said that, the default setup *without* logging_collector
isn't exactly foolproof either, since wherever you send stderr to
will grow without bound. But maybe people are used to dealing with
that.)

Independently of changing logging_collector's default, perhaps we could
agree on changes to the rotation-related defaults so that it's less
risky to just turn logging_collector on without other changes?

Would rotating daily and keeping a week's worth be reasonable as
defaults? If not, what do you have in mind?
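
For concreteness, something along these lines would keep at most seven daily
files (the exact values are certainly open to debate):

logging_collector = on
log_filename = 'postgresql-%a.log'	# one file per weekday: ...-Mon.log, ...-Tue.log, ...
log_rotation_age = 1d
log_rotation_size = 0			# rotate on time only
log_truncate_on_rotation = on		# overwrite last week's file of the same name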

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#12 Gilles Darold
gilles@darold.net
In reply to: David Fetter (#1)
Re: [PATCH] Log CSV by default

On 30/11/2018 at 19:53, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

I'm also not very comfortable with this change. From pgBadger's point of
view, I don't know of an existing library that can parse CSV logs in
multi-process mode. The only way I know of is to split the file or load
chunks into memory, which is not really recommended with huge log files. I
am not saying that it is impossible to have a Perl CSV library that allows
multi-process parsing, but it would require some additional days of work.
Also, at this time I have not received any feature request for that, so
maybe the use of the csvlog format is not so common.

Regards,

--
Gilles Darold
http://www.darold.net/

#13 David Fetter
david@fetter.org
In reply to: Gilles Darold (#12)
Re: [PATCH] Log CSV by default

On Mon, Dec 03, 2018 at 07:18:31PM +0100, Gilles Darold wrote:

On 30/11/2018 at 19:53, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

I'm also not very comfortable with this change. From pgBadger's point of
view, I don't know of an existing library that can parse CSV logs in
multi-process mode. The only way I know of is to split the file or load
chunks into memory, which is not really recommended with huge log files. I
am not saying that it is impossible to have a Perl CSV library that allows
multi-process parsing, but it would require some additional days of work.

How about other structured log formats? Would one for JSON work
better?

Also, at this time I have not received any feature request for that, so
maybe the use of the csvlog format is not so common.

Part of the reason it's uncommon is that it's not the default we ship
with, and defaults influence this very much.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#14 Gilles Darold
gilles@darold.net
In reply to: David Fetter (#13)
Re: [PATCH] Log CSV by default

On 03/12/2018 at 19:25, David Fetter wrote:

On Mon, Dec 03, 2018 at 07:18:31PM +0100, Gilles Darold wrote:

On 30/11/2018 at 19:53, David Fetter wrote:

This makes it much simpler for computers to use the logs while not
making it excessively difficult for humans to use them.

I'm also not very comfortable with this change. From pgBadger's point of
view, I don't know of an existing library that can parse CSV logs in
multi-process mode. The only way I know of is to split the file or load
chunks into memory, which is not really recommended with huge log files. I
am not saying that it is impossible to have a Perl CSV library that allows
multi-process parsing, but it would require some additional days of work.

How about other structured log formats? Would one for JSON work
better?

pgBadger is able to parse jsonlog
(https://github.com/michaelpq/pg_plugins/tree/master/jsonlog) output in
multi-process mode because it does not use an external library to parse the
JSON. In this minimalist implementation I assume that each log line is a
complete JSON object. Honestly, I do not have enough feedback on this format
to be sure that my basic jsonlog parser is sufficient; maybe it is not. What
is really annoying for a log parser is the preservation of newline
characters in queries; if the format outputs each atomic log message on a
single line, that is ideal.
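
To illustrate why the one-object-per-line property matters for multi-process
parsing, a rough sketch (the keys are whatever the jsonlog output actually
provides; nothing is hard-coded here):

import json

# Rough sketch: because each jsonlog line is one complete JSON object,
# any chunk of the file that starts and ends on line boundaries can be
# parsed by a worker independently of all the others.
def parse_chunk(lines):
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

with open("postgresql.json") as f:
    for record in parse_chunk(f):
        print(sorted(record.keys()))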

Also, at this time I have not received any feature request for that, so
maybe the use of the csvlog format is not so common.

Part of the reason it's uncommon is that it's not the default we ship
with, and defaults influence this very much.

I agree. I also think that syslog and stderr are more human-readable than
csvlog or jsonlog, which has probably contributed to their wide adoption
too.

--
Gilles Darold
http://www.darold.net/

#15 Michael Paquier
michael@paquier.xyz
In reply to: Gilles Darold (#14)
Re: [PATCH] Log CSV by default

On Mon, Dec 03, 2018 at 09:20:01PM +0100, Gilles Darold wrote:

pgBadger is able to parse jsonlog
(https://github.com/michaelpq/pg_plugins/tree/master/jsonlog) output in
multi-process mode because it does not use an external library to parse the
JSON. In this minimalist implementation I assume that each log line is a
complete JSON object. Honestly, I do not have enough feedback on this format
to be sure that my basic jsonlog parser is sufficient; maybe it is not. What
is really annoying for a log parser is the preservation of newline
characters in queries; if the format outputs each atomic log message on a
single line, that is ideal.

jsonlog generates one JSON object for each elog call, one per line.
Newlines are treated as they would be for any JSON object, being replaced
with \n within the query strings, the same way the backend handles JSON
strings as text.
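
For illustration, that is just standard JSON string escaping (the keys here
are made up, not the actual jsonlog field names):

import json

# A multi-line query stays on a single physical log line once serialized,
# because the newline is encoded as the two characters \n inside the string.
entry = {"severity": "LOG", "message": "statement: SELECT 1,\n2"}
print(json.dumps(entry))
# {"severity": "LOG", "message": "statement: SELECT 1,\n2"}
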
--
Michael

#16 Gilles Darold
gilles@darold.net
In reply to: Michael Paquier (#15)
Re: [PATCH] Log CSV by default

On 04/12/2018 at 03:37, Michael Paquier wrote:

On Mon, Dec 03, 2018 at 09:20:01PM +0100, Gilles Darold wrote:

pgBadger is able to parse jsonlog
(https://github.com/michaelpq/pg_plugins/tree/master/jsonlog) output in
multi-process mode because it does not use an external library to parse the
JSON. In this minimalist implementation I assume that each log line is a
complete JSON object. Honestly, I do not have enough feedback on this format
to be sure that my basic jsonlog parser is sufficient; maybe it is not. What
is really annoying for a log parser is the preservation of newline
characters in queries; if the format outputs each atomic log message on a
single line, that is ideal.

jsonlog generates one JSON object for each elog call, one per line.
Newlines are treated as they would be for any JSON object, being replaced
with \n within the query strings, the same way the backend handles JSON
strings as text.

That is what I remembered, and it is why it is easy for pgBadger to
process these log files in multi-process mode.

--
Gilles Darold