Proposal: Adding json logging
*Hello,*
I'm new here. I'm David and would describe myself as an ambitious newbie,
so please take my suggestion with a grain of salt.
*Use case:*
I find it difficult to properly parse postgres logs into some kind of log
aggregator (I use Fluent Bit). My two standard options are stderr and
csvlog.
I have reviewed some log samples and all of them DO contain some kind of
multi-line logs, which are very uncomfortable to parse reliably in a log
streamer.
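To make the pain concrete, here is a small sketch of the problem (the log text is invented for illustration, but the shape matches multi-line stderr entries):

```python
import re

# An invented two-event stderr log chunk; the STATEMENT spans several lines.
log_chunk = (
    "2018-04-14 00:00:16 UTC ERROR:  relation \"missing\" does not exist\n"
    "2018-04-14 00:00:16 UTC STATEMENT:  SELECT *\n"
    "\tFROM missing\n"
    "\tWHERE id = 1;\n"
)

# A naive line-based streamer treats every physical line as a record...
naive_records = log_chunk.splitlines()
assert len(naive_records) == 4  # ...but only 2 logical events occurred.

# ...so the parser must heuristically stitch continuation lines back
# together, e.g. by checking for a leading timestamp.
ts = re.compile(r"^\d{4}-\d{2}-\d{2} ")
events = []
for line in naive_records:
    if ts.match(line):
        events.append(line)
    else:
        events[-1] += "\n" + line
assert len(events) == 2
```

This kind of heuristic is exactly what every log streamer has to reinvent, and it breaks whenever a message line happens to look like a record start.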
I asked Michael Paquier about his solution:
https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
He suggested taking action and proposing this extension again for
inclusion in contrib:
https://github.com/michaelpq/pg_plugins/issues/24
He mentioned the argument had been raised that it takes up too much space.
This is true under the paradigm that logs are consumed by TTY or grep,
however, if those logs are to be stored in a logging solution, this is not
really of concern.
Please let me know if you need more context on my use case.
That being said, the proposal is to accept this library into postgres
contrib.
Please let me know, if I should prepare a patch.
*Best Regards,*
David A.
--
[image: XOE Solutions] <http://xoe.solutions/> DAVID ARNOLD
Gerente General
xoe.solutions
dar@xoe.solutions
+57 (315) 304 13 68
On Sat, Apr 14, 2018 at 12:00:16AM +0000, David Arnold wrote:
I'm new here. I'm David and would describe myself as an ambitious newbie,
so please take my suggestion with a grain of salt.
Welcome here.
I asked Michael Paquier about his solution:
https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
He suggested taking action and proposing this extension again for
inclusion in contrib:
https://github.com/michaelpq/pg_plugins/issues/24
He mentioned the argument had been raised that it takes up too much space.
This is true under the paradigm that logs are consumed by TTY or grep,
however, if those logs are to be stored in a logging solution, this is not
really of concern.
Here are the exact same words I used on this github thread to avoid
confusion:
"I proposed that a couple of years back, to be rejected as the key names
are too much repetitive and take too much place. I have personally plans
to work on other things, so if anybody wishes to take this code and send
a proposal upstream, feel free to! The code is under PostgreSQL license
and I am fine if a patch is proposed even with this code taken."
I am not sure that the concerns expressed back then on the community side
have changed. I cannot put my finger back on the -hackers thread where this
was discussed, by the way; the extra log volume caused by repetitive
key names was one of them.
Please let me know if you need more context on my use case.
That being said, the proposal is to accept this library into postgres
contrib.
Please let me know if I should prepare a patch.
It is better to gather opinions before delivering a patch. If there is
consensus that people would like to have an in-core option to allow logs
in json format, for which I am sure that folks would *not* want a
contrib/ plugin but something as an extension of log_destination, then
of course you could move ahead and propose a patch. Of course feel free
to reuse any code in my module if that helps! It is released under
PostgreSQL license as well.
Please note two things though:
- Patch submission follows a particular flow, be sure to read those
notes:
https://wiki.postgresql.org/wiki/Submitting_a_Patch
- Once you have a patch, you need to send it to a commit fest, for which
the next one will likely be next September (precise schedule will be
finalized at the end of May at PGCon) for the beginning of development
of Postgres 12. The development of Postgres 11 has just finished, so
the focus is to stabilize the release first, which consists of testing
and double-checking that everything which has been merged is stable. So
please do not expect immediate feedback on any patch you send.
Thanks,
--
Michael
On 14 April 2018 at 11:24, Michael Paquier <michael@paquier.xyz> wrote:
"I proposed that a couple of years back, to be rejected as the key names
are too much repetitive and take too much place.
gzip is astonishingly good at dealing with that, so I think that's
actually a bit of a silly reason to block it.
Plus it's likely only a short-lived interchange format, not something
to be retained for a long period.
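The compression point above is easy to demonstrate; a quick sketch (key names here are invented, not the actual jsonlog schema):

```python
import gzip
import json

# 1000 JSON log records with identical, repeated key names.
records = [
    json.dumps({"timestamp": f"2018-04-14T00:00:{i:02d}Z",
                "error_severity": "LOG",
                "message": f"event {i}"})
    for i in range(1000)
]
raw = ("\n".join(records)).encode()
packed = gzip.compress(raw)

# The repeated key names compress away almost entirely.
assert len(packed) < len(raw) / 5
```

On repetitive structured logs, gzip routinely achieves far better than 5:1, which is why "the key names take too much space" matters much less once logs are compressed at rest or in transit.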
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Plus it's likely only a short-lived interchange format, not something to be
retained for a long period.
Absolutely.
There might be an argument that it's not easy on the eyes, in case the logs
are consumed by a pair of them. That's absolutely valid. The Golang
community has found a solution for that called logfmt, which I personally
appreciate.
It's somewhat similar to JSON, but a lot easier on the eyes, so if logs go
to the stdout of a docker container and are forwarded afterwards, you still
can attach to the live container logs and actually understand something.
If it's for that reason, logfmt is possibly preferable and there is already
a lot of standard tooling available for it.
Any thoughts on that argument?
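For readers unfamiliar with logfmt: it has no formal specification, but the common convention can be sketched in a few lines (the quoting rules below are an assumption based on popular implementations, not a standard):

```python
def logfmt(pairs: dict) -> str:
    """Render a dict as a logfmt line: space-separated key=value pairs,
    quoting values that contain spaces, '=' or quotes."""
    parts = []
    for key, value in pairs.items():
        value = str(value)
        if " " in value or "=" in value or '"' in value:
            value = '"' + value.replace('"', '\\"') + '"'
        parts.append(f"{key}={value}")
    return " ".join(parts)

line = logfmt({"level": "error",
               "msg": "relation does not exist",
               "code": "42P01"})
assert line == 'level=error msg="relation does not exist" code=42P01'
```

The output stays readable when tailing a container's stdout, which is the whole appeal.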
Best Regards
On Sat, Apr 14, 2018 at 03:27:58PM +0000, David Arnold wrote:
Plus it's likely only a short-lived interchange format, not something to be
retained for a long period.
Absolutely.
There might be an argument that it's not easy on the eyes in the case it
would be consumed by a pair of them. It's absolutely valid. Golang
community has found a solution for that called logfmt, which I personally
appreciate.
I think a suite of json_to_* utilities would be a good bit more
helpful in this regard than changing our human-eye-consumable logs. We
already have human-eye-consumable logs by default. What we don't
have, and increasingly do want, is a log format that's really easy on
machines.
As to logfmt in particular, the fact that it's not standardized is
probably a show-stopper.
Let's go with JSON.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes:
I think a suite of json_to_* utilities would be a good bit more
helpful in this regard than changing our human-eye-consumable logs. We
already have human-eye-consumable logs by default. What we don't
have, and increasingly do want, is a log format that's really easy on
machines.
I'm dubious that JSON is "easier on machines" than CSV.
regards, tom lane
I'm dubious that JSON is "easier on machines" than CSV.
Under common paradigms you are right, but if we talk about line-by-line
streaming with subsequent processing, then multi-line records are a show
stopper. Of course, some log aggregators have buffers for that and can do
multi-line parsing on that buffer, but:
1. Not all log aggregators support it.
2. Building a parser which reliably detects multi-line logs AND is easy on
resources is probably not something a normal person can achieve quickly.
So normally CSV is fine, but for log streaming it's neither the best nor
the most standards-compliant way.
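The csvlog multi-line problem can be sketched directly (field layout simplified, not the exact csvlog column set):

```python
import csv
import io

# A CSV log record whose message field contains an embedded newline.
record = ["2018-04-14 00:00:16 UTC", "ERROR",
          "syntax error at:\nSELECT * FROM"]
buf = io.StringIO()
csv.writer(buf).writerow(record)
text = buf.getvalue()

# A proper CSV parser recovers the single logical record...
parsed = next(csv.reader(io.StringIO(text)))
assert parsed == record

# ...but a naive line-by-line streamer sees two fragments, the first of
# which ends mid-quote.
assert len(text.strip().split("\n")) == 2
```

This is why CSV is fine for batch loading (e.g. COPY into a table) but awkward for streaming: record boundaries are only known after fully parsing the quoting.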
On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
I think a suite of json_to_* utilities would be a good bit more
helpful in this regard than changing our human-eye-consumable
logs. We already have human-eye-consumable logs by default. What
we don't have, and increasingly do want, is a log format that's
really easy on machines.
I'm dubious that JSON is "easier on machines" than CSV.
I've found the opposite.
CSV is very poorly specified, which makes it at best complicated to
build correct parsing libraries. JSON, whatever gripes I have about
the format[1] is extremely well specified, and hence has excellent
parsing libraries.
Best,
David.
[1]: These are mostly the lack of comments and of some useful data
types like large integers, floats, and ISO-8601 dates. PostgreSQL
continues to share that last.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
As to logfmt in particular, the fact that it's not standardized is probably a
show-stopper.
Let's go with JSON.
I agree, though I don't want to dismiss the idea of logfmt entirely yet.
In container infrastructure it's a de facto standard and it solves a real
problem. But I'm in favor of stepping back from that idea in favor of
prioritizing JSON.
Given we have the following LOG_DESTINATION...
/* Log destination bitmap */
#define LOG_DESTINATION_STDERR 1
#define LOG_DESTINATION_SYSLOG 2
#define LOG_DESTINATION_EVENTLOG 4
#define LOG_DESTINATION_CSVLOG 8
Something confuses me about CSVLOG...
Aren't log destination and log formatting two different things? How should
we deal with that mix?
I was somewhat expecting to find a log formatting hook somewhere around,
but it seems more complicated than that.
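The conflation is easy to see in a sketch: CSVLOG is a format wearing a destination's clothes. A hypothetical orthogonal design (names invented, not a proposal for actual GUC names) would keep the two axes separate:

```python
# Today: Log_destination is a bitmap, but one bit names a format.
LOG_DESTINATION_STDERR = 1
LOG_DESTINATION_SYSLOG = 2
LOG_DESTINATION_EVENTLOG = 4
LOG_DESTINATION_CSVLOG = 8  # a format, not a sink

log_destination = LOG_DESTINATION_STDERR | LOG_DESTINATION_CSVLOG
assert log_destination & LOG_DESTINATION_CSVLOG  # "csvlog" rides the bitmap

# Hypothetical orthogonal design: one axis for sinks, one for formats.
destinations = {"stderr", "syslog"}
log_format = "json"  # or "plain", "csv", ...
assert "stderr" in destinations and log_format == "json"
```

Under the current scheme, adding jsonlog would mean yet another format bit in the destination bitmap, which is presumably why the code has no clean formatting hook.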
El sáb., 14 abr. 2018 a las 11:51, Chapman Flack (<chap@anastigmatix.net>)
escribió:
On 04/14/18 12:05, David Fetter wrote:
On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
I'm dubious that JSON is "easier on machines" than CSV.
I've found the opposite.
CSV is very poorly specified, which makes it at best complicated to
build correct parsing libraries.
I was just about to say the same thing. Based on my experience, I can infer
the history of CSV as a format was something like this:
"we'll use commas to separate the values"
- some implementations released
"but what if a value has a comma?"
- some new implementations released
"what if it has a quote?"
- some newer implementations released
"a newline?"
- ...
JSON, whatever gripes I have about
the format[1] is extremely well specified, and hence has excellent
parsing libraries.
It has, if nothing else, the benefit of coming around later and seeing
what happened with CSV.
-Chap
On 2018-04-14 18:05:18 +0200, David Fetter wrote:
On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
I think a suite of json_to_* utilities would be a good bit more
helpful in this regard than changing our human-eye-consumable
logs. We already have human-eye-consumable logs by default. What
we don't have, and increasingly do want, is a log format that's
really easy on machines.
I'm dubious that JSON is "easier on machines" than CSV.
I've found the opposite.
CSV is very poorly specified, which makes it at best complicated to
build correct parsing libraries. JSON, whatever gripes I have about
the format[1] is extremely well specified, and hence has excellent
parsing libraries.
Worth noticing that useful JSON formats for logging also kind of don't
follow standards. Either you end up with entire logfiles as one big
array, which most libraries won't parse and which makes logrotate etc.
really complicated, or you end up with some easy-to-parse format where
newlines have a non-standard record-separator meaning.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2018-04-14 18:05:18 +0200, David Fetter wrote:
CSV is very poorly specified, which makes it at best complicated to
build correct parsing libraries. JSON, whatever gripes I have about
the format[1] is extremely well specified, and hence has excellent
parsing libraries.
Worth to notice that useful json formats for logging also kinda don't
follow standards. Either you end up with entire logfiles as one big
array, which most libraries won't parse and makes logrotate etc really
complicated, or you end up with some easy to parse format where newlines
have non-standard record separator meaning.
Hmm .. that, actually, seems like a pretty serious objection. If the beef
with CSV is that it's poorly specified and inconsistently implemented
(which is surely true), then using some nonstandard variant of JSON
doesn't seem like it's going to lead to a big step forward.
"The wonderful thing about standards is there are so many to choose from."
(variously attributed to Hopper, Tanenbaum, and others)
regards, tom lane
On Sat, Apr 14, 2018 at 01:20:16PM -0700, Andres Freund wrote:
On 2018-04-14 18:05:18 +0200, David Fetter wrote:
On Sat, Apr 14, 2018 at 11:51:17AM -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
I think a suite of json_to_* utilities would be a good bit more
helpful in this regard than changing our human-eye-consumable
logs. We already have human-eye-consumable logs by default. What
we don't have, and increasingly do want, is a log format that's
really easy on machines.I'm dubious that JSON is "easier on machines" than CSV.
I've found the opposite.
CSV is very poorly specified, which makes it at best complicated to
build correct parsing libraries. JSON, whatever gripes I have about
the format[1] is extremely well specified, and hence has excellent
parsing libraries.
Worth to notice that useful json formats for logging also kinda don't
follow standards. Either you end up with entire logfiles as one big
array, which most libraries won't parse and makes logrotate etc really
complicated, or you end up with some easy to parse format where newlines
have non-standard record separator meaning.
I don't see this as a big problem. The smallest-lift thing is to put
something along the lines of:
When you log as JSON, those logs are JSON objects, one per output
event. They are not guaranteed to break on newlines.
A slightly larger lift would include escaping newlines and ensuring
that JSON output is always single lines, however long.
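A minimal sketch of that larger lift, using Python's json module as a stand-in serializer: the serializer already escapes embedded newlines inside strings, so one-object-per-line output (JSON Lines) is safe even for multi-line messages:

```python
import json

# An event whose message contains an embedded newline.
event = {"severity": "ERROR",
         "message": "syntax error at:\nSELECT * FROM"}
line = json.dumps(event)

# The serialized record is a single physical line...
assert "\n" not in line

# ...and round-trips exactly, newline included.
assert json.loads(line) == event
```

Each record is itself standard JSON; only the framing convention (one object per line) is extra, which is the compromise the JSON Lines format codifies.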
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2018-04-15 00:31:14 +0200, David Fetter wrote:
On Sat, Apr 14, 2018 at 01:20:16PM -0700, Andres Freund wrote:
On 2018-04-14 18:05:18 +0200, David Fetter wrote:
CSV is very poorly specified, which makes it at best complicated to
build correct parsing libraries. JSON, whatever gripes I have about
the format[1] is extremely well specified, and hence has excellent
parsing libraries.Worth to notice that useful json formats for logging also kinda don't
follow standards. Either you end up with entire logfiles as one big
array, which most libraries won't parse and makes logrotate etc really
complicated, or you end up with some easy to parse format where newlines
have non-standard record separator meaning.I don't see this as a big problem. The smallest-lift thing is to put
something along the lines of:When you log as JSON, those logs are JSON objects, one per output
event. They are not guaranteed to break on newlines.A slightly larger lift would include escaping newlines and ensuring
that JSON output is always single lines, however long.
Still obliterates your "standard standard standard" line of
argument. There seem to be valid arguments for adding json regardless, but
that line is just bogus.
Greetings,
Andres Freund
On Sat, Apr 14, 2018, 4:33 PM Andres Freund <andres@anarazel.de> wrote:
The format is known as JSON Lines.
http://jsonlines.org/
Ryan
I would suggest that the community consider whether postgres will log multidimensional data. That will weigh into the decision of json vs. another format quite significantly. I am a fan of the json5 spec (https://json5.org/), though adoption of this is quite poor.
---
Jordan Deitch
https://id.rsa.pub
A slightly larger lift would include escaping newlines and ensuring that JSON
output is always single lines, however long.
I think that's necessary; actually I was implicitly assuming that as a
prerequisite. I cannot imagine anything else being actually useful.
Alternatively, I'm sure logfmt has a well-thought-through solution for that
:-)
I would suggest that the community consider whether postgres will log
multidimensional data. That will weigh into the decision of json vs.
another format quite significantly. I am a fan of the json5 spec (
https://json5.org/), though adoption of this is quite poor.
What do you mean by multidimensional data? Arrays/maps?
I think there is no advantage of multidimensional over prefixed flat
logging unless the data structure gets really nastily nested.
What case were you thinking of?
I would suggest that the community consider whether postgres will log
multidimensional data. That will weigh into the decision of json vs.
another format quite significantly. I am a fan of the json5 spec (
https://json5.org/), though adoption of this is quite poor.
What do you mean by multidimensional data? Arrays/maps?
I think there is no advantage of multidimensional over prefixed flat
logging unless the data structure gets really nastily nested.
What case were you thinking of?
Exactly - arrays, maps, nested json objects. It's more organized and easier to reason about. As postgresql becomes more and more sophisticated over time, I see flat logging becoming more unwieldy. With tools like jq, reading and querying json on the command line is simple and user friendly, and using json for log capture and aggregation is widely supported and embraced.
On 15 April 2018 at 11:27, Jordan Deitch <jd@rsa.pub> wrote:
I would suggest that the community consider whether postgres will log
multidimensional data. That will weigh into the decision of json vs.
another format quite significantly. I am a fan of the json5 spec (
https://json5.org/), though adoption of this is quite poor.
What do you mean by multidimensional data? Arrays/maps?
I think there is no advantage of multidimensional vs prefixed flat
logging
unless data structure gets really nastily nested.
What case where you thinking of?
Exactly - arrays, maps, nested json objects. It's more organized and
easier to reason about. As postgresql becomes more and more sophisticated
over time, I see flat logging becoming more unwieldy. With tools like jq,
reading and querying json on the command line is simple and user friendly,
and using json for log capture and aggregation is widely supported and
embraced.
Exactly what are you logging here? Why would I need to see a
multi-dimensional array in the log?
Dave Cramer
davec@postgresintl.com
www.postgresintl.com
Exactly what are you logging here? Why would I need to see a
multi-dimensional array in the log?
If I wanted to capture the location of errors my clients are encountering on their postgres clusters in detail, I would need to parse the 'LOCATION' string in their log entries, parse out the filename by splitting on the ':' character of that same line, and parse out the line number. Essentially any programmatic analysis of logs, as it stands today, would require string parsing. I'd rather have an organized, logical representation of information which I suggest is not possible in a flat, single dimensional structure.
{
  "level": "ERROR",
  "meta": {
    "line_number": 23,
    "file": "parse_relation.c"
  },
  "detail": {
    "condition_name": "...",
    "error_code": "..."
  },
  "time": "..."
}
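For contrast, the string surgery the flat format forces today can be sketched as follows (the LOCATION line shape is assumed from verbose PostgreSQL output and may differ in detail):

```python
# Hypothetical verbose-log LOCATION line: "LOCATION:  function, file:line"
location_line = "LOCATION:  scanRTEForColumn, parse_relation.c:23"

# Split off the field label, then the function name, then file vs. line.
payload = location_line.split(":", 1)[1].strip()
function, file_and_line = payload.split(", ", 1)
filename, line_number = file_and_line.rsplit(":", 1)

assert function == "scanRTEForColumn"
assert filename == "parse_relation.c"
assert int(line_number) == 23
```

With a structured log record, all three values would simply be fields; every consumer of the flat format has to reimplement (and get right) this splitting logic.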