Proposal: More structured logging

Started by Ronan Dunklaualmost 5 years ago26 messageshackers

ronan.dunklau@aiven.io

almost 5 years ago

Hello,

The logging system already captures a lot of information in the ErrorData. But
at present there is no way for a log message authors to include more metadata
about the log outside of the log message itself. For example, including the
extension name which can be useful for filtering / dispatching log messages.
This can be done in the log message itself, but that requires specialized
parsing in the log output.

Even though among the available loggers in core, only the csv logger would be
able to make sense of it, I feel it would be beneficial to add a tagging system
to logs, by adding different (tag, value) combinations to a log entry, with an
API similar to what we do for other ErrorData fields:

ereport(NOTICE,
(errmsg("My log message")),
(errtag("EMITTER", "MYEXTENSION")),
(errtag("MSG-ID", "%d", error_message_id))
);

Please find attached a very small POC patch to better demonstrate what I
propose.

Third party logging hooks could then exploit those values to output them
correctly. For example the json loggger written by Michael Paquier here:
https://github.com/michaelpq/pg_plugins/tree/master/jsonlog, or the
seeminlgy-abandonned journald hook here: https://github.com/intgr/pg_journal
could add those information in a structured way.

I think the pgaudit extension (or something similar) could also make good use
of such a feature instead of dumping a CSV entry in the log file.

As for Postgres own log messages, I'm sure we could find practical use cases
for tagging messages like this.

On a related note, the only structured logger we have in-core is the CSV
logger. Many users nowadays end up feeding the logs to journald either by
capturing the stderr output with systemd, or by having syslog implemented by
journald itself. Would there be any interest in having native journald support
as a logging_destination ?

Best regards,

--
Ronan Dunklau

Alvaro Herrera

alvherre@2ndquadrant.com

almost 5 years ago

In reply to: Ronan Dunklau (#1)

Re: Proposal: More structured logging

On 2021-Aug-13, Ronan Dunklau wrote:

ereport(NOTICE,
(errmsg("My log message")),
(errtag("EMITTER", "MYEXTENSION")),
(errtag("MSG-ID", "%d", error_message_id))
);

Interesting idea. I agree this would be useful.

Please find attached a very small POC patch to better demonstrate what I
propose.

Seems like a good start. I think a further step towards a committable
patch would include a test module that uses the new tagging
functionality.

On a related note, the only structured logger we have in-core is the CSV
logger. Many users nowadays end up feeding the logs to journald either by
capturing the stderr output with systemd, or by having syslog implemented by
journald itself. Would there be any interest in having native journald support
as a logging_destination ?

I don't know anything about journald, but I agree that we sorely need
some better way to emit structured log entries.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
(Paul Graham)

Magnus Hagander

magnus@hagander.net

almost 5 years ago

In reply to: Alvaro Herrera (#2)

Re: Proposal: More structured logging

On Thu, Aug 19, 2021 at 3:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Aug-13, Ronan Dunklau wrote:

ereport(NOTICE,
(errmsg("My log message")),
(errtag("EMITTER", "MYEXTENSION")),
(errtag("MSG-ID", "%d", error_message_id))
);

Interesting idea. I agree this would be useful.

Please find attached a very small POC patch to better demonstrate what I
propose.

Seems like a good start. I think a further step towards a committable
patch would include a test module that uses the new tagging
functionality.

On a related note, the only structured logger we have in-core is the CSV
logger. Many users nowadays end up feeding the logs to journald either by
capturing the stderr output with systemd, or by having syslog implemented by
journald itself. Would there be any interest in having native journald support
as a logging_destination ?

I don't know anything about journald, but I agree that we sorely need
some better way to emit structured log entries.

+1. I think having the ability to natively send structured data to
journald would be useful.

Another thing I've noticed in more and more other products is to be
able to log as json, which is then later thrown into a central logging
system somewhere. Basically like csv, but with the schema defined in
each row. Yes, a lot more overhead, but some systems really do like to
consume json.... So when we're on the topic of more structured
logging...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Alvaro Herrera

alvherre@2ndquadrant.com

almost 5 years ago

In reply to: Magnus Hagander (#3)

Re: Proposal: More structured logging

On 2021-Aug-19, Magnus Hagander wrote:

Another thing I've noticed in more and more other products is to be
able to log as json, which is then later thrown into a central logging
system somewhere. Basically like csv, but with the schema defined in
each row. Yes, a lot more overhead, but some systems really do like to
consume json.... So when we're on the topic of more structured
logging...

Yeah, I was thinking in json logging too -- specifically thinking about
Fluentbit and similar tools.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

Ronan Dunklau

ronan.dunklau@aiven.io

almost 5 years ago

In reply to: Alvaro Herrera (#2)

Re: Proposal: More structured logging

Le jeudi 19 août 2021, 15:04:30 CEST Alvaro Herrera a écrit :

On 2021-Aug-13, Ronan Dunklau wrote:

ereport(NOTICE,

(errmsg("My log message")),
(errtag("EMITTER", "MYEXTENSION")),
(errtag("MSG-ID", "%d", error_message_id))

);

Interesting idea. I agree this would be useful.

Please find attached a very small POC patch to better demonstrate what I
propose.

Seems like a good start. I think a further step towards a committable
patch would include a test module that uses the new tagging
functionality.

Please find attached the original patch + a new one adding a test module.
The test module exposes a function for generating logs with tags, and a log
hook which format the tags in the DETAIL field for easy regression testing.

Next step would be to emit those tags in the CSV logs. I'm not sure what
representation they should have though: a nested csv in it's own column ? A
key => value pairs list similar to hstore ? A json object ?

Also we should probably make this available to the client too, but once again
the format of the tag field needs to be defined. Any opinion ?

--
Ronan Dunklau

Ronan Dunklau

ronan.dunklau@aiven.io

almost 5 years ago

In reply to: Alvaro Herrera (#4)

Re: Proposal: More structured logging

Le jeudi 19 août 2021, 16:50:10 CEST Alvaro Herrera a écrit :

On 2021-Aug-19, Magnus Hagander wrote:

Another thing I've noticed in more and more other products is to be
able to log as json, which is then later thrown into a central logging
system somewhere. Basically like csv, but with the schema defined in
each row. Yes, a lot more overhead, but some systems really do like to
consume json.... So when we're on the topic of more structured
logging...

Yeah, I was thinking in json logging too -- specifically thinking about
Fluentbit and similar tools.

Michael, your jsonlog module already fullfills this need. Is it something that
should be merged into our tree ?

--
Ronan Dunklau

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Ronan Dunklau (#6)

Re: Proposal: More structured logging

On Fri, Aug 20, 2021 at 11:35:29AM +0200, Ronan Dunklau wrote:

Michael, your jsonlog module already fullfills this need. Is it something that
should be merged into our tree ?

Yes, there is nothing technically preventing to have this stuff in
core, of course, and that would even take care of the issues in
detecting if the piping protocol should be used or not.

Now, the last time this was proposed, there was a lot of drawback
because the presence of the JSON keys increases significantly the size
of the logs compared to CSV, and usually users care a lot about the
size of the log files. I would support the presence of JSON format
for the logs in core, FWIW.

The module is here, for reference:
https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
--
Michael

Ronan Dunklau

ronan.dunklau@aiven.io

almost 5 years ago

In reply to: Ronan Dunklau (#5)

Re: Proposal: More structured logging

Le vendredi 20 août 2021, 11:31:21 CEST Ronan Dunklau a écrit :

Le jeudi 19 août 2021, 15:04:30 CEST Alvaro Herrera a écrit :

On 2021-Aug-13, Ronan Dunklau wrote:

ereport(NOTICE,

(errmsg("My log message")),
(errtag("EMITTER", "MYEXTENSION")),
(errtag("MSG-ID", "%d", error_message_id))

);

Interesting idea. I agree this would be useful.

Please find attached a very small POC patch to better demonstrate what I
propose.

Seems like a good start. I think a further step towards a committable
patch would include a test module that uses the new tagging
functionality.

Please find attached the original patch + a new one adding a test module.
The test module exposes a function for generating logs with tags, and a log
hook which format the tags in the DETAIL field for easy regression testing.

Next step would be to emit those tags in the CSV logs. I'm not sure what
representation they should have though: a nested csv in it's own column ? A
key => value pairs list similar to hstore ? A json object ?

I opted for a JSON representation in the CSV logs, please find attached v3
which is the same thing as v2 with another patch for CSV log output.

Also we should probably make this available to the client too, but once
again the format of the tag field needs to be defined. Any opinion ?

--
Ronan Dunklau

Magnus Hagander

magnus@hagander.net

almost 5 years ago

In reply to: Michael Paquier (#7)

Re: Proposal: More structured logging

On Sat, Aug 21, 2021 at 2:37 AM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 20, 2021 at 11:35:29AM +0200, Ronan Dunklau wrote:

Michael, your jsonlog module already fullfills this need. Is it something that
should be merged into our tree ?

Yes, there is nothing technically preventing to have this stuff in
core, of course, and that would even take care of the issues in
detecting if the piping protocol should be used or not.

Now, the last time this was proposed, there was a lot of drawback
because the presence of the JSON keys increases significantly the size
of the logs compared to CSV, and usually users care a lot about the
size of the log files. I would support the presence of JSON format
for the logs in core, FWIW.

As long as it's optional, I don't think that drawback holds as an
argument. The same argument could be made against the cvs logs in the
first place -- they add information to every row that a lot of people
don't need. But it's optional. Leaving it up to the administrator to
choose whether they prefer the verbose-and-easier-to-parse-maybe
format or the more compact format seems like the right path for that.
I bet quite a few would actually choose json -- easier to integrate
with other systems (because newer systems love json), and unless
you're actually logging a lot of queries (which many people don't),
the size of the logfile is often not a concern at all.

In short, I would also support the presence of JSON log format in
core. (but as a proper log_destination of course -- or if it's time to
actually split that into a separaet thing, being one parameter for
log_destination and another for log_format)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#10

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Magnus Hagander (#9)

Re: Proposal: More structured logging

On Mon, Aug 23, 2021 at 11:33:09AM +0200, Magnus Hagander wrote:

As long as it's optional, I don't think that drawback holds as an
argument. The same argument could be made against the cvs logs in the
first place -- they add information to every row that a lot of people
don't need. But it's optional. Leaving it up to the administrator to
choose whether they prefer the verbose-and-easier-to-parse-maybe
format or the more compact format seems like the right path for that.

From a code perspective, and while on it, we could split a bit elog.c
and move the log entries generated for each format into their own
file. That would be cleaner for CSV and JSON. As a whole I don't
have an objection with moving the JSON format into core. If one
proposes a patch, feel free to reuse the code from the module I have
released.

I bet quite a few would actually choose json -- easier to integrate
with other systems (because newer systems love json), and unless
you're actually logging a lot of queries (which many people don't),
the size of the logfile is often not a concern at all.

The module I publish on github to do this work is the most popular
thing of my plugin repo, FWIW. It even gets bug reports, sometimes.

In short, I would also support the presence of JSON log format in
core. (but as a proper log_destination of course -- or if it's time to
actually split that into a separaet thing, being one parameter for
log_destination and another for log_format)

What would you gain with a new parameter here? I think that this
would be rather confusing and log_destination has been around for
years.
--
Michael

#11

Sehrope Sarkuni

sehrope@jackdb.com

over 4 years ago

In reply to: Michael Paquier (#10)

Re: Proposal: More structured logging

On Tue, Aug 24, 2021 at 7:22 PM Michael Paquier <michael@paquier.xyz> wrote:

From a code perspective, and while on it, we could split a bit elog.c
and move the log entries generated for each format into their own
file. That would be cleaner for CSV and JSON. As a whole I don't
have an objection with moving the JSON format into core. If one
proposes a patch, feel free to reuse the code from the module I have
released.

I had a patch that does exactly this from a few years ago. It started off
copying a bunch out of your json logging module and adds it as a new
"jsonlog" destination.

It needed some cleanup due to bit rot, but it now builds and works atop
master. I'll post it in its own thread.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

#12

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Sehrope Sarkuni (#11)

Re: Proposal: More structured logging

On Tue, Aug 31, 2021 at 10:46:30AM -0400, Sehrope Sarkuni wrote:

It needed some cleanup due to bit rot, but it now builds and works atop
master. I'll post it in its own thread.

Thanks.
--
Michael

#13

Peter Eisentraut

peter_e@gmx.net

over 4 years ago

In reply to: Magnus Hagander (#9)

Re: Proposal: More structured logging

On 23.08.21 11:33, Magnus Hagander wrote:

In short, I would also support the presence of JSON log format in
core. (but as a proper log_destination of course -- or if it's time to
actually split that into a separaet thing, being one parameter for
log_destination and another for log_format)

It would be useful if there was some kind of standardized schema for
JSON logging, meaning what the keys should be named etc. Then we don't
need to make this all up from scratch, and there could be some
integration with existing log analysis tools.

#14

Peter Eisentraut

peter_e@gmx.net

over 4 years ago

In reply to: Ronan Dunklau (#1)

Re: Proposal: More structured logging

On 13.08.21 15:23, Ronan Dunklau wrote:

The logging system already captures a lot of information in the ErrorData. But
at present there is no way for a log message authors to include more metadata
about the log outside of the log message itself. For example, including the
extension name which can be useful for filtering / dispatching log messages.
This can be done in the log message itself, but that requires specialized
parsing in the log output.

Even though among the available loggers in core, only the csv logger would be
able to make sense of it, I feel it would be beneficial to add a tagging system
to logs, by adding different (tag, value) combinations to a log entry, with an
API similar to what we do for other ErrorData fields:

ereport(NOTICE,
(errmsg("My log message")),
(errtag("EMITTER", "MYEXTENSION")),
(errtag("MSG-ID", "%d", error_message_id))
);

What are some more examples of where this could be used? The extension
name could be added more or less automatically to ereport() calls. I
don't know what the MSG-ID is supposed to be. Are there other use cases?

#15

Ronan Dunklau

ronan.dunklau@aiven.io

over 4 years ago

In reply to: Peter Eisentraut (#14)

Re: Proposal: More structured logging

Le mercredi 1 septembre 2021, 09:36:50 CEST Peter Eisentraut a écrit :

On 13.08.21 15:23, Ronan Dunklau wrote:

The logging system already captures a lot of information in the ErrorData.
But at present there is no way for a log message authors to include more
metadata about the log outside of the log message itself. For example,
including the extension name which can be useful for filtering /
dispatching log messages. This can be done in the log message itself, but
that requires specialized parsing in the log output.

Even though among the available loggers in core, only the csv logger would
be able to make sense of it, I feel it would be beneficial to add a
tagging system to logs, by adding different (tag, value) combinations to
a log entry, with an API similar to what we do for other ErrorData
fields:

ereport(NOTICE,

(errmsg("My log message")),
(errtag("EMITTER", "MYEXTENSION")),
(errtag("MSG-ID", "%d", error_message_id))

);

What are some more examples of where this could be used? The extension
name could be added more or less automatically to ereport() calls. I
don't know what the MSG-ID is supposed to be. Are there other use cases?

Adding it automatically would be nice, but how would you go about that ?

In-core it would open up the possibility to split log messages into different
fields, for example the different statistics reported in the logs by VACUUM /
ANALYZE VERBOSE and make it easier to consume the output without having to
parse the message. Parsing the message also means that any tool parsing it
needs to either be aware of the locale, or to force the user to use a specific
one.

Out-of-core, the same things could be done for extensions like pg_audit which
logs structured information into the message itself, leaving the message
format to be parsed at a later time.

--
Ronan Dunklau

#16

Peter Eisentraut

peter_e@gmx.net

over 4 years ago

In reply to: Ronan Dunklau (#15)

Re: Proposal: More structured logging

On 01.09.21 10:00, Ronan Dunklau wrote:

In-core it would open up the possibility to split log messages into different
fields, for example the different statistics reported in the logs by VACUUM /
ANALYZE VERBOSE and make it easier to consume the output without having to
parse the message. Parsing the message also means that any tool parsing it
needs to either be aware of the locale, or to force the user to use a specific
one.

I think those messages would themselves have substructure. For example,
the current message

"I/O timings: read: %.3f ms, write: %.3f ms\n"

might be

{"I/O timings": {"read": ..., "write": ...}}

and that in turn is already part of a larger message.

So just having a single level of tags wouldn't really work for this.

#17

Ronan Dunklau

ronan.dunklau@aiven.io

over 4 years ago

In reply to: Peter Eisentraut (#16)

Re: Proposal: More structured logging

Le mercredi 8 septembre 2021, 11:51:31 CEST Peter Eisentraut a écrit :

On 01.09.21 10:00, Ronan Dunklau wrote:

In-core it would open up the possibility to split log messages into
different fields, for example the different statistics reported in the
logs by VACUUM / ANALYZE VERBOSE and make it easier to consume the output
without having to parse the message. Parsing the message also means that
any tool parsing it needs to either be aware of the locale, or to force
the user to use a specific one.

I think those messages would themselves have substructure. For example,
the current message

"I/O timings: read: %.3f ms, write: %.3f ms\n"

might be

{"I/O timings": {"read": ..., "write": ...}}

and that in turn is already part of a larger message.

So just having a single level of tags wouldn't really work for this.

I agree having a nested structure may feel more natural, but I don't think it
would matter too much if we standardize on ini-style property names (ie,
iotimings.read, iotimings.write and so on...)

Regards,

--
Ronan Dunklau

#18

Justin Pryzby

pryzby@telsasoft.com

over 4 years ago

In reply to: Ronan Dunklau (#8)

Re: Proposal: More structured logging

Subject: [PATCH v3 2/3] Add test module for the new tag functionality.

...

+test_logging(PG_FUNCTION_ARGS)
+{

...

+	 (errmsg("%s", message),
+	  ({
+		forboth(lk, keys, lv, values)
+		{
+			(errtag(lfirst(lk), "%s", (char *) lfirst(lv)));
+		}})
+	));

The windows build fails with that syntax.
http://cfbot.cputube.org/ronan-dunklau.html
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157923

Subject: [PATCH v3 3/3] Output error tags in CSV logs
+++ b/doc/src/sgml/config.sgml
@@ -7370,6 +7371,7 @@ CREATE TABLE postgres_log
backend_type text,
leader_pid integer,
query_id bigint,
+  tags jsonb
PRIMARY KEY (session_id, session_line_num)
);
</programlisting>

That's invalid sql due to missing a trailing ",".

You should also update file-fdw.sgml - which I only think of since we forgot in
to update it before e568ed0eb and 0830d21f5. config.sgml should have a comment
as a reminder to do that.

--
Justin

#19

Ronan Dunklau

ronan.dunklau@aiven.io

over 4 years ago

In reply to: Justin Pryzby (#18)

Re: Proposal: More structured logging

Le mercredi 29 décembre 2021, 14:59:16 CET Justin Pryzby a écrit :

Subject: [PATCH v3 2/3] Add test module for the new tag functionality.

...

+test_logging(PG_FUNCTION_ARGS)
+{

...
+	 (errmsg("%s", message),
+	  ({
+		forboth(lk, keys, lv, values)
+		{
+			(errtag(lfirst(lk), "%s", (char *) 

lfirst(lv)));

+ }})
+ ));

The windows build fails with that syntax.
http://cfbot.cputube.org/ronan-dunklau.html
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157923

Thank you. I switched to an explicit sequence of errstart / errfinish to avoid
putting too much things in nested macro calls. As I don't have any Windows
knowledge, I am very grateful for the new cirrus-ci integration which allowed
me to build on Windows without hassle.

Subject: [PATCH v3 3/3] Output error tags in CSV logs
+++ b/doc/src/sgml/config.sgml
@@ -7370,6 +7371,7 @@ CREATE TABLE postgres_log
backend_type text,
leader_pid integer,
query_id bigint,

+ tags jsonb

PRIMARY KEY (session_id, session_line_num)

);
</programlisting>
That's invalid sql due to missing a trailing ",".

Thanks, fixed.

You should also update file-fdw.sgml - which I only think of since we forgot
in to update it before e568ed0eb and 0830d21f5. config.sgml should have a
comment as a reminder to do that.

Done, and I added anoher commit per your suggestion to add this comment.

Thank you for this review.

Regards,

--
Ronan Dunklau

#20

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Ronan Dunklau (#19)

Re: Proposal: More structured logging

Hi,

On Tue, Jan 11, 2022 at 11:05:26AM +0100, Ronan Dunklau wrote:

Done, and I added anoher commit per your suggestion to add this comment.

The cfbot reports that the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_3293.log
=== Applying patches on top of PostgreSQL commit ID 43c2175121c829c8591fc5117b725f1f22bfb670 ===
[...]
=== applying patch ./v4-0003-Output-error-tags-in-CSV-logs.patch
[...]
patching file src/backend/utils/error/elog.c
Hunk #2 FAILED at 3014.
1 out of 2 hunks FAILED -- saving rejects to file src/backend/utils/error/elog.c.rej

Could you send a rebased version? In the meantime I will switch the cf entry
to Waiting on Author.

#21

Ronan Dunklau

ronan.dunklau@aiven.io

over 4 years ago

In reply to: Julien Rouhaud (#20)

#22

Ronan Dunklau

ronan.dunklau@aiven.io

over 4 years ago

In reply to: Ronan Dunklau (#21)

#23