Add jsonlog log_destination for JSON server logs

Started by Sehrope Sarkuniover 4 years ago40 messageshackers
Jump to latest
#1Sehrope Sarkuni
sehrope@jackdb.com

Hi,

This patch adds a new log_destination, "jsonlog", that writes log entries
as lines of JSON. It was originally started by David Fetter using
the jsonlog module by Michael Paquier (
https://github.com/michaelpq/pg_plugins/blob/master/jsonlog/jsonlog.c) as a
basis for how to serialize the log messages. Thanks to both of them because
this wouldn't be possible without that starting point.

The first commit splits out the destination in log pipe messages into its
own field. Previously it would piggyback on the "is_last" field. This adds
an int to the message size but makes the rest of the code easier to follow.

The second commit adds a TAP test for log_destination "csvlog". This was
done to both confirm that the previous change didn't break anything and as
a skeleton for the test in the next commit.

The third commit adds the new log_destination "jsonlog". The output format
is one line per entry with the top level output being a JSON object keyed
with the log fields. Newlines in the output fields are escaped as \n so the
output file has exactly one line per log entry. It also includes a new test
for verifying the JSON output with some basic regex checks (similar to the
csvlog test).

Here's a sample of what the log entries look like:

{"timestamp":"2021-08-31 10:15:25.129
EDT","user":"sehrope","dbname":"postgres","pid":12012,"remote_host":"[local]","session_id":"612e397d.2eec","line_num":1,"ps":"idle","session_start":"2021-08-31
10:15:25
EDT","vxid":"3/2","txid":"0","error_severity":"LOG","application_name":"
006_jsonlog.pl","message":"statement: SELECT 1/0"}

It builds and passes "make check-world" on Linux. It also includes code to
handle Windows as well but I have not actually tried building it there.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachments:

v001-0001-Adds-separate-dest-field-to-log-protocol-PipeProtoHe.patchtext/x-patch; charset=US-ASCII; name=v001-0001-Adds-separate-dest-field-to-log-protocol-PipeProtoHe.patchDownload+11-13
v001-0002-Add-TAP-test-for-csvlog.patchtext/x-patch; charset=US-ASCII; name=v001-0002-Add-TAP-test-for-csvlog.patchDownload+118-1
v001-0003-Add-jsonlog-log_destination-for-JSON-server-logs.patchtext/x-patch; charset=US-ASCII; name=v001-0003-Add-jsonlog-log_destination-for-JSON-server-logs.patchDownload+624-23
#2Michael Paquier
michael@paquier.xyz
In reply to: Sehrope Sarkuni (#1)
Re: Add jsonlog log_destination for JSON server logs

On Tue, Aug 31, 2021 at 11:34:56AM -0400, Sehrope Sarkuni wrote:

The second commit adds a TAP test for log_destination "csvlog". This was
done to both confirm that the previous change didn't break anything and as
a skeleton for the test in the next commit.

+note "Before sleep";
+usleep(100_000);
+note "Before rotate";
+$node->logrotate();
+note "After rotate";
+usleep(100_000);

Do you really need a rotation of the log files here? Wouldn't it be
better to grab the position of the current log file with a fixed log
file name, and then slurp the file from this position with your
expected output? That would make the test faster, as well.

The third commit adds the new log_destination "jsonlog". The output format
is one line per entry with the top level output being a JSON object keyed
with the log fields. Newlines in the output fields are escaped as \n so the
output file has exactly one line per log entry. It also includes a new test
for verifying the JSON output with some basic regex checks (similar to the
csvlog test).

+ * Write logs in json format.
+ */
+static void
+write_jsonlog(ErrorData *edata)
+{
Rather than making elog.c larger, I think that we should try to split
that into more files.  Why not refactoring out the CSV part first?
You could just call that csvlog.c, then create a new jsonlog.c for the
meat of the patch.

The list of fields is not up to date. At quick glance, you are
missing:
- backend type.
- leader PID.
- query ID.
- Session start timestamp (?)
--
Michael

#3Sehrope Sarkuni
sehrope@jackdb.com
In reply to: Michael Paquier (#2)
Re: Add jsonlog log_destination for JSON server logs

On Tue, Aug 31, 2021 at 8:43 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Aug 31, 2021 at 11:34:56AM -0400, Sehrope Sarkuni wrote:

The second commit adds a TAP test for log_destination "csvlog". This was
done to both confirm that the previous change didn't break anything and

as

a skeleton for the test in the next commit.

+note "Before sleep";
+usleep(100_000);
+note "Before rotate";
+$node->logrotate();
+note "After rotate";
+usleep(100_000);

Do you really need a rotation of the log files here? Wouldn't it be
better to grab the position of the current log file with a fixed log
file name, and then slurp the file from this position with your
expected output? That would make the test faster, as well.

Yes, that was intentional. I used the log rotation tap test as a base and
kept that piece in there so it verifies that the csv log files are actually
rotated. Same for the json logs.

Rather than making elog.c larger, I think that we should try to split
that into more files. Why not refactoring out the CSV part first?
You could just call that csvlog.c, then create a new jsonlog.c for the
meat of the patch.

That's a good idea. I'll try that out.

The list of fields is not up to date. At quick glance, you are

missing:
- backend type.
- leader PID.
- query ID.
- Session start timestamp (?)

Thanks. I'll take a look.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

#4Sehrope Sarkuni
sehrope@jackdb.com
In reply to: Sehrope Sarkuni (#3)
Re: Add jsonlog log_destination for JSON server logs

Updated patch set is attached.

This version splits out the existing csvlog code into its own file and
centralizes the common helpers into a new elog-internal.h so that they're
only included by the actual write_xyz sources.

That makes the elog.c changes in the JSON logging patch minimal as all it's
really doing is invoking the new write_jsonlog(...) function.

It also adds those missing fields to the JSON logger output.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachments:

v2-0001-Adds-separate-dest-field-to-log-protocol-PipeProtoHe.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Adds-separate-dest-field-to-log-protocol-PipeProtoHe.patchDownload+11-13
v2-0002-Add-TAP-test-for-csvlog.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Add-TAP-test-for-csvlog.patchDownload+118-1
v2-0003-Split-csv-handling-in-elog-c-into-separate-csvlog-c.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Split-csv-handling-in-elog-c-into-separate-csvlog-c.patchDownload+365-298
v2-0004-Add-jsonlog-log_destination-for-JSON-server-logs.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Add-jsonlog-log_destination-for-JSON-server-logs.patchDownload+669-24
#5Michael Paquier
michael@paquier.xyz
In reply to: Sehrope Sarkuni (#4)
Re: Add jsonlog log_destination for JSON server logs

On Wed, Sep 01, 2021 at 04:39:43PM -0400, Sehrope Sarkuni wrote:

That makes the elog.c changes in the JSON logging patch minimal as all it's
really doing is invoking the new write_jsonlog(...) function.

Looking at 0001, to do things in order.

@@ -46,8 +46,8 @@ typedef struct
char		nuls[2];		/* always \0\0 */
uint16		len;			/* size of this chunk (counts data only) */
int32		pid;			/* writer's pid */
-	char		is_last;		/* last chunk of message? 't' or 'f' ('T' or
-								 * 'F' for CSV case) */
+	int32		dest;			/* log destination */
+	char		is_last;        /* last chunk of message? 't' or 'f'*/
char		data[FLEXIBLE_ARRAY_MEMBER];	/* data payload starts here */
} PipeProtoHeader;

Making PipeProtoHeader larger is not free, and that could penalize
workloads with a lot of short messages and many backends as the
syslogger relies on pipes with sync calls. Why not switching is_last
to bits8 flags instead? That should be enough for the addition of
JSON. 3 bits are enough at the end: one to know if it is the last
chunk of message, one for CSV and one for JSON.
--
Michael

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#5)
Re: Add jsonlog log_destination for JSON server logs

On 9/8/21 2:58 AM, Michael Paquier wrote:

On Wed, Sep 01, 2021 at 04:39:43PM -0400, Sehrope Sarkuni wrote:

That makes the elog.c changes in the JSON logging patch minimal as all it's
really doing is invoking the new write_jsonlog(...) function.

Looking at 0001, to do things in order.

@@ -46,8 +46,8 @@ typedef struct
char		nuls[2];		/* always \0\0 */
uint16		len;			/* size of this chunk (counts data only) */
int32		pid;			/* writer's pid */
-	char		is_last;		/* last chunk of message? 't' or 'f' ('T' or
-								 * 'F' for CSV case) */
+	int32		dest;			/* log destination */
+	char		is_last;        /* last chunk of message? 't' or 'f'*/
char		data[FLEXIBLE_ARRAY_MEMBER];	/* data payload starts here */
} PipeProtoHeader;

Making PipeProtoHeader larger is not free, and that could penalize
workloads with a lot of short messages and many backends as the
syslogger relies on pipes with sync calls. Why not switching is_last
to bits8 flags instead? That should be enough for the addition of
JSON. 3 bits are enough at the end: one to know if it is the last
chunk of message, one for CSV and one for JSON.

Yeah. A very simple change would be to use two different values for json
(say 'y' and 'n'). A slightly more principled scheme might use the top
bit for the end marker and the bottom 3 bits for the dest type (so up to
8 types possible), with the rest available for future use.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#7Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#6)
Re: Add jsonlog log_destination for JSON server logs

On Wed, Sep 08, 2021 at 08:46:44AM -0400, Andrew Dunstan wrote:

Yeah. A very simple change would be to use two different values for json
(say 'y' and 'n'). A slightly more principled scheme might use the top
bit for the end marker and the bottom 3 bits for the dest type (so up to
8 types possible), with the rest available for future use.

I was thinking to track stderr as a case where no bits are set in the
flags for the area of the destinations, but that's a bit crazy if we
have a lot of margin in what can be stored. I have looked at that and
finished with the attached which is an improvement IMO, especially
when it comes to the header validation.

FWIW, while looking at my own external module for the JSON logs, I
noticed that I used the chunk protocol when the log redirection is
enabled, but just enforced everything to be sent to stderr.
--
Michael

Attachments:

syslogger-flags.patchtext/plain; charset=us-asciiDownload+24-9
#8Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#6)
Re: Add jsonlog log_destination for JSON server logs

Fwiw I was shocked when I saw the t/f T/F kluge when I went to work on
jsonlogging. That's the kind of dead-end short-sighted hack that just
lays traps and barriers for future hackers to have to clean up before
they can do the work they want to do.

Please just put a "format" field (or "channel" field -- the logging
daemon doesn't really care what format) with a list of defined formats
that can easily be extended in the future. If you want to steal the
high bit for "is last" and only allow 128 values instead of 256 so be
it.

#9Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#8)
Re: Add jsonlog log_destination for JSON server logs

On Wed, Sep 08, 2021 at 10:58:51PM -0400, Greg Stark wrote:

Please just put a "format" field (or "channel" field -- the logging
daemon doesn't really care what format) with a list of defined formats
that can easily be extended in the future. If you want to steal the
high bit for "is last" and only allow 128 values instead of 256 so be
it.

Which is what I just posted here:
/messages/by-id/YTlunSciDRl1z7ik@paquier.xyz

Well, we could also do things so as we have two fields, as of
something like:
typedef struct
{
[...]
bits8 flags:4, format:4;
[...]
} PipeProtoHeader;

I am not sure if this is an improvement in readability though, and
that's less consistent with the recent practice we've been trying to
follow with bitmasks and flag-like options.
--
Michael

#10Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#7)
Re: Add jsonlog log_destination for JSON server logs

On Thu, Sep 09, 2021 at 11:17:01AM +0900, Michael Paquier wrote:

I was thinking to track stderr as a case where no bits are set in the
flags for the area of the destinations, but that's a bit crazy if we
have a lot of margin in what can be stored. I have looked at that and
finished with the attached which is an improvement IMO, especially
when it comes to the header validation.

One part that was a bit flacky after more consideration is that the
header validation would consider as correct the case where both stderr
and csvlog are set in the set of flags. I have finished by just using
pg_popcount() on one byte with a filter on the log destinations,
making the whole more robust. If there are any objections, please let
me know.
--
Michael

Attachments:

syslogger-flags-2.patchtext/plain; charset=us-asciiDownload+27-9
#11Michael Paquier
michael@paquier.xyz
In reply to: Sehrope Sarkuni (#4)
Re: Add jsonlog log_destination for JSON server logs

On Wed, Sep 01, 2021 at 04:39:43PM -0400, Sehrope Sarkuni wrote:

This version splits out the existing csvlog code into its own file and
centralizes the common helpers into a new elog-internal.h so that they're
only included by the actual write_xyz sources.

That makes the elog.c changes in the JSON logging patch minimal as all it's
really doing is invoking the new write_jsonlog(...) function.

It also adds those missing fields to the JSON logger output.

Forking a bit this thread while looking at 0002 that adds new tests
for csvlog. While I agree that it would be useful to have more
coverage with the syslogger message chunk protocol in this area, I
think that having a separate test is a waste of resources. Creating a
new node is not cheap either, and this adds more wait phases, making
the tests take longer. It would be much better to extend
004_logrotate.pl and update it to use log_destination = 'stderr,
csvlog', to minimize the number of nodes we create as well as the
additional amount of time we'd spend for those tests. Plugging in
JSON into that would not be complicated either once we have in place a
set of small routines that limit the code duplication between the
checks for each log destination type.
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#11)
Re: Add jsonlog log_destination for JSON server logs

On Fri, Sep 10, 2021 at 01:07:00PM +0900, Michael Paquier wrote:

Forking a bit this thread while looking at 0002 that adds new tests
for csvlog. While I agree that it would be useful to have more
coverage with the syslogger message chunk protocol in this area, I
think that having a separate test is a waste of resources. Creating a
new node is not cheap either, and this adds more wait phases, making
the tests take longer. It would be much better to extend
004_logrotate.pl and update it to use log_destination = 'stderr,
csvlog', to minimize the number of nodes we create as well as the
additional amount of time we'd spend for those tests. Plugging in
JSON into that would not be complicated either once we have in place a
set of small routines that limit the code duplication between the
checks for each log destination type.

And this part leads me to the attached, where the addition of the JSON
format would result in the addition of a couple of lines.
--
Michael

Attachments:

tap-csvlog.patchtext/plain; charset=us-asciiDownload+58-37
#13Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#12)
Re: Add jsonlog log_destination for JSON server logs

On Fri, Sep 10, 2021 at 03:56:18PM +0900, Michael Paquier wrote:

And this part leads me to the attached, where the addition of the JSON
format would result in the addition of a couple of lines.

Okay, I have worked through the first half of the patch set, and
applied the improved versions of 0001 (refactoring of the chunk
protocol) and 0002 (addition of the tests for csvlog).

I have not looked in details at 0003 and 0004 yet. Still, here are
some comments after a quick scan.

+ * elog-internal.h
I'd rather avoid the hyphen, and use elog_internal.h.

+#define FORMATTED_TS_LEN 128
+extern char formatted_start_time[FORMATTED_TS_LEN];
+extern char formatted_log_time[FORMATTED_TS_LEN];
+
+void setup_formatted_log_time(void);
+void setup_formatted_start_time(void);
We could just use a static buffer in each one of those routines, and
return back a pointer to the caller.
+   else if ((Log_destination & LOG_DESTINATION_JSONLOG) &&
+       (jsonlogFile == NULL ||
+        time_based_rotation || (size_rotation_for & LOG_DESTINATION_JSONLOG)))
[...]
+   /* Write to JSON log if enabled */
+   else if (Log_destination & LOG_DESTINATION_JSONLOG)
+   {
Those bits in 0004 are wrong.  They should be two "if" clauses.  This
is leading to issues when setting log_destination to multiple values
with jsonlog in the set of values and logging_connector = on, and the
logs are not getting redirected properly to the .json file.  We would 
get the data for the .log and .csv files though.  This issue also
happens with the original patch set applied on top of e757080.   I
think that we should also be more careful with the way we free
StringInfoData in send_message_to_server_log(), or we are going to
finish with unwelcome leaks.

The same coding pattern is now repeated three times in logfile_rotate():
- Check if a logging type is enabled.
- Optionally open new file, with logfile_open().
- Business with ENFILE and EMFILE.
- pfree() and save of the static FILE* ane file name for each type.
I think that we have enough material for a wrapper routine that does
this work, where we pass down LOG_DESTINATION_* and pointers to the
static FILE* and the static last file names. That would have avoided
the bug I found above.

The rebased patch set, that has reworked the tests to be in line with
HEAD, also fails. They compile.

Sehrope, could you adjust that?
--
Michael

Attachments:

v3-0001-Split-csv-handling-in-elog.c-into-separate-csvlog.patchtext/plain; charset=us-asciiDownload+365-298
v3-0002-Add-jsonlog-log_destination-for-JSON-server-logs.patchtext/plain; charset=us-asciiDownload+585-28
#14Sehrope Sarkuni
sehrope@jackdb.com
In reply to: Michael Paquier (#13)
Re: Add jsonlog log_destination for JSON server logs

On Sun, Sep 12, 2021 at 10:22 PM Michael Paquier <michael@paquier.xyz>
wrote:

On Fri, Sep 10, 2021 at 03:56:18PM +0900, Michael Paquier wrote:

And this part leads me to the attached, where the addition of the JSON
format would result in the addition of a couple of lines.

Okay, I have worked through the first half of the patch set, and
applied the improved versions of 0001 (refactoring of the chunk
protocol) and 0002 (addition of the tests for csvlog).

Thanks. I finally got a chance to look through those changes. I like it.
The popcount and pulling out the flags are much easier to follow than the
old hard coded letters.

I have not looked in details at 0003 and 0004 yet. Still, here are
some comments after a quick scan.

+ * elog-internal.h
I'd rather avoid the hyphen, and use elog_internal.h.

+#define FORMATTED_TS_LEN 128
+extern char formatted_start_time[FORMATTED_TS_LEN];
+extern char formatted_log_time[FORMATTED_TS_LEN];
+
+void setup_formatted_log_time(void);
+void setup_formatted_start_time(void);
We could just use a static buffer in each one of those routines, and
return back a pointer to the caller.

+1

+   else if ((Log_destination & LOG_DESTINATION_JSONLOG) &&
+       (jsonlogFile == NULL ||
+        time_based_rotation || (size_rotation_for &
LOG_DESTINATION_JSONLOG)))
[...]
+   /* Write to JSON log if enabled */
+   else if (Log_destination & LOG_DESTINATION_JSONLOG)
+   {
Those bits in 0004 are wrong.  They should be two "if" clauses.  This
is leading to issues when setting log_destination to multiple values
with jsonlog in the set of values and logging_connector = on, and the
logs are not getting redirected properly to the .json file.  We would
get the data for the .log and .csv files though.  This issue also
happens with the original patch set applied on top of e757080.   I
think that we should also be more careful with the way we free
StringInfoData in send_message_to_server_log(), or we are going to
finish with unwelcome leaks.

Good catch. Staring at that piece again, that's tricky as it tries to
aggressively free the buffer before calling write_cvslog(...). Which can't
just be duplicated for additional destinations.

I think we need to pull up the negative case (i.e. syslogger not available)
before the other destinations and if it matches, free and exit early.
Otherwise, free the buffer and call whatever destination routines are
enabled.

The same coding pattern is now repeated three times in logfile_rotate():
- Check if a logging type is enabled.
- Optionally open new file, with logfile_open().
- Business with ENFILE and EMFILE.
- pfree() and save of the static FILE* ane file name for each type.
I think that we have enough material for a wrapper routine that does
this work, where we pass down LOG_DESTINATION_* and pointers to the
static FILE* and the static last file names. That would have avoided
the bug I found above.

I started on a bit of this as well. There's so much overlap already between
the syslog_ and csvlog code that I'm going to put that together first. Then
the json addition should just fall into a couple of call sites.

I'm thinking of adding an internal struct to house the FILE* and file
names. Then all the opening, closing, and log rotation code can pass
pointers to those and centralize the pfree() and NULL checks.

The rebased patch set, that has reworked the tests to be in line with
HEAD, also fails. They compile.

Sehrope, could you adjust that?

Yes I'm looking to sync those up and address the above later this week.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

#15Michael Paquier
michael@paquier.xyz
In reply to: Sehrope Sarkuni (#14)
Re: Add jsonlog log_destination for JSON server logs

On Mon, Sep 13, 2021 at 11:56:52PM -0400, Sehrope Sarkuni wrote:

Good catch. Staring at that piece again, that's tricky as it tries to
aggressively free the buffer before calling write_cvslog(...). Which can't
just be duplicated for additional destinations.

I think we need to pull up the negative case (i.e. syslogger not available)
before the other destinations and if it matches, free and exit early.
Otherwise, free the buffer and call whatever destination routines are
enabled.

Yes, I got a similar impression.

I started on a bit of this as well. There's so much overlap already between
the syslog_ and csvlog code that I'm going to put that together first. Then
the json addition should just fall into a couple of call sites.

I'm thinking of adding an internal struct to house the FILE* and file
names. Then all the opening, closing, and log rotation code can pass
pointers to those and centralize the pfree() and NULL checks.

Agreed on both points (using a structure and doing the refactoring as
a first patch).
--
Michael

#16Sehrope Sarkuni
sehrope@jackdb.com
In reply to: Michael Paquier (#15)
Re: Add jsonlog log_destination for JSON server logs

Attached three patches refactor the syslogger handling of file based
destinations a bit ... and then a lot.

First patch adds a new constant to represent both file based destinations.
This should make it easier to ensure additional destinations are handled in
"For all file destinations..." situations (e.g. when we add the jsonlog
destination).

Second patch refactors the file descriptor serialization and re-opening in
EXEC_BACKEND forking. Previously the code was duplicated for both stderr
and csvlog. Again, this should simplify adding new destinations as they'd
just invoke the same helper. There's an existing comment about not handling
failed opens in syslogger_parseArgs(...) and this patch doesn't fix that,
but it does provide a single location to do so in the future.

The third patch adds a new internal (to syslogger.c) structure,
FileLogDestination, for file based log destinations and modifies the
existing handling for syslogFile and csvlogFile to defer to a bunch of
helper functions. It also renames "syslogFile" to "stderr_file_info".
Working through this patch, it was initially confusing that the stderr log
file was named "syslogFile", the C file is named syslogger.c, and there's
an entirely separate syslog (the message logging API) destination. I think
this clears that up a bit.

The patches pass check-world on Linux. I haven't tested it on Windows but
it does pass check-world with EXEC_BACKEND defined to try out the forking
code paths. Definitely needs some review to ensure it's functionally
correct for the different log files.

I haven't tried splitting the csvlog code out or adding the new jsonlog
atop this yet. There's enough changes here that I'd like to get this
settled first.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachments:

v4-0001-Add-constant-for-list-of-log-destinations-that-use-f.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Add-constant-for-list-of-log-destinations-that-use-f.patchDownload+5-4
v4-0002-Split-out-syslogger-EXEC_BACKEND-fd-serialization-an.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Split-out-syslogger-EXEC_BACKEND-fd-serialization-an.patchDownload+44-66
v4-0003-Refactor-syslogger-to-consolidate-common-tasks-for-f.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Refactor-syslogger-to-consolidate-common-tasks-for-f.patchDownload+199-217
#17Sehrope Sarkuni
sehrope@jackdb.com
In reply to: Sehrope Sarkuni (#16)
Re: Add jsonlog log_destination for JSON server logs

The previous patches failed on the cfbot Appveyor Windows build. That also
makes me question whether I did the EXEC_BACKEND testing correctly as it
should have caught that compile error. I'm going to look into that.

In the meantime the attached should correct that part of the build.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Show quoted text

Attachments:

v5-0001-Add-constant-for-list-of-log-destinations-that-use-f.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Add-constant-for-list-of-log-destinations-that-use-f.patchDownload+5-4
v5-0002-Split-out-syslogger-EXEC_BACKEND-fd-serialization-an.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Split-out-syslogger-EXEC_BACKEND-fd-serialization-an.patchDownload+44-66
v5-0003-Refactor-syslogger-to-consolidate-common-tasks-for-f.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Refactor-syslogger-to-consolidate-common-tasks-for-f.patchDownload+199-217
#18Michael Paquier
michael@paquier.xyz
In reply to: Sehrope Sarkuni (#16)
Re: Add jsonlog log_destination for JSON server logs

On Thu, Sep 16, 2021 at 05:27:20PM -0400, Sehrope Sarkuni wrote:

Attached three patches refactor the syslogger handling of file based
destinations a bit ... and then a lot.

First patch adds a new constant to represent both file based destinations.
This should make it easier to ensure additional destinations are handled in
"For all file destinations..." situations (e.g. when we add the jsonlog
destination).

Second patch refactors the file descriptor serialization and re-opening in
EXEC_BACKEND forking. Previously the code was duplicated for both stderr
and csvlog. Again, this should simplify adding new destinations as they'd
just invoke the same helper. There's an existing comment about not handling
failed opens in syslogger_parseArgs(...) and this patch doesn't fix that,
but it does provide a single location to do so in the future.

The third patch adds a new internal (to syslogger.c) structure,
FileLogDestination, for file based log destinations and modifies the
existing handling for syslogFile and csvlogFile to defer to a bunch of
helper functions. It also renames "syslogFile" to "stderr_file_info".
Working through this patch, it was initially confusing that the stderr log
file was named "syslogFile", the C file is named syslogger.c, and there's
an entirely separate syslog (the message logging API) destination. I think
this clears that up a bit.

The patches pass check-world on Linux. I haven't tested it on Windows but
it does pass check-world with EXEC_BACKEND defined to try out the forking
code paths. Definitely needs some review to ensure it's functionally
correct for the different log files.

Compilation with EXEC_BACKEND on Linux should work, so no need to
bother with Windows when it comes to that. There is a buildfarm
machine testing this configuration, for example.

I haven't tried splitting the csvlog code out or adding the new jsonlog
atop this yet. There's enough changes here that I'd like to get this
settled first.

Makes sense.

I am not really impressed by 0001 and the introduction of
LOG_DESTINATIONS_WITH_FILES. So I would leave that out and keep the
list of destinations listed instead. And we are talking about two
places here, only within syslogger.c.

0002 looks like an improvement, but I think that 0003 makes the
readability of the code worse with the introduction of eight static
routines to handle all those cases.

file_log_dest_should_rotate_for_size(), file_log_dest_close(),
file_log_dest_check_rotate_for_size(), get_syslogger_file() don't
bring major improvements. On the contrary,
file_log_dest_write_metadata and file_log_dest_rotate seem to reduce a
bit the noise.
--
Michael

#19Sehrope Sarkuni
sehrope@jackdb.com
In reply to: Michael Paquier (#18)
Re: Add jsonlog log_destination for JSON server logs

On Thu, Sep 16, 2021 at 9:36 PM Michael Paquier <michael@paquier.xyz> wrote:

I am not really impressed by 0001 and the introduction of
LOG_DESTINATIONS_WITH_FILES. So I would leave that out and keep the
list of destinations listed instead. And we are talking about two
places here, only within syslogger.c.

I've taken that out for now. The idea was to simplify the additions when
jsonlog is added but can circle back to that later if it makes sense.

0002 looks like an improvement,

Nice. That's left unchanged (renamed to 0001 now).

but I think that 0003 makes the
readability of the code worse with the introduction of eight static
routines to handle all those cases.

file_log_dest_should_rotate_for_size(), file_log_dest_close(),
file_log_dest_check_rotate_for_size(), get_syslogger_file() don't
bring major improvements. On the contrary,
file_log_dest_write_metadata and file_log_dest_rotate seem to reduce a
bit the noise.

It was helpful to split them out while working on the patch but I see your
point upon re-reading through the result.

Attached version (renamed to 002) adds only three static functions for
checking rotation size, performing the actual rotation, and closing. Also
one other to handle generating the logfile names with a suffix to handle
the pfree-ing (wrapping logfile_open(...)).

The rest of the changes happen in place using the new structures.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Attachments:

v6-0001-Split-out-syslogger-EXEC_BACKEND-fd-serialization-an.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Split-out-syslogger-EXEC_BACKEND-fd-serialization-an.patchDownload+44-66
v6-0002-Refactor-syslogger-to-consolidate-common-tasks-for-f.patchtext/x-patch; charset=US-ASCII; name=v6-0002-Refactor-syslogger-to-consolidate-common-tasks-for-f.patchDownload+164-182
#20Michael Paquier
michael@paquier.xyz
In reply to: Sehrope Sarkuni (#19)
Re: Add jsonlog log_destination for JSON server logs

On Fri, Sep 17, 2021 at 04:36:57PM -0400, Sehrope Sarkuni wrote:

It was helpful to split them out while working on the patch but I see your
point upon re-reading through the result.

Attached version (renamed to 002) adds only three static functions for
checking rotation size, performing the actual rotation, and closing. Also
one other to handle generating the logfile names with a suffix to handle
the pfree-ing (wrapping logfile_open(...)).

The rest of the changes happen in place using the new structures.

I have looked at that in details, and found that the introduction of
FileLogDestination makes the code harder to follow, and that the
introduction of the file extension, the destination name and the
expected target destination LOG_DESTINATION_* had a limited impact
because they are used in few places. The last two useful pieces are
the FILE* handle and the last file name for current_logfiles.

Attached are updated patches. The logic of 0001 to refactor the fd
fetch/save logic when forking the syslogger in EXEC_BACKEND builds is
unchanged. I have tweaked the patch with more comments and different
routine names though. Patch 0002 refactors the main point that
introduced FileLogDestination by refactoring the per-destination file
rotation, not forgetting the fact that the file last name and handle
for stderr can never be cleaned up even if LOG_DESTINATION_STDERR is
disabled. Grepping after LOG_DESTINATION_CSVLOG in the code tree, I'd
be fine to live with this level of abstraction as each per-destination
change are grouped with each other so they are hard to miss.

0001 is in a rather commitable shape, and I have made the code
consistent with HEAD. However, I think that its handling of
_get_osfhandle() is clunky for 64-bit compilations as long is 32b in
WIN32 but intptr_t is platform-dependent as it could be 32b or 64b, so
atoi() would overflow if the handle is larger than INT_MAX for 64b
builds:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/standard-types
This problem deserves a different thread.

It would be good for 0002 if an extra pair of eyes looks at it. While
on it, I have renamed the existing last_file_name to
last_sys_file_name in 0002 to make the naming more consistent with
syslogFile. It is independent of 0001, so it could be done first as
well.
--
Michael

Attachments:

v7-0001-Refactor-fd-handling-when-forking-syslogger-in-EX.patchtext/x-diff; charset=us-asciiDownload+62-61
v7-0002-Refactor-per-destination-file-rotation-in-syslogg.patchtext/x-diff; charset=us-asciiDownload+119-124
#21Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#21)
#23Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#23)
#25Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#22)
#26Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#26)
#28Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#28)
#30Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#29)
#31Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#30)
#32Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#31)
#33Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#32)
#34Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Nathan Bossart (#34)
#36Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#36)
#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#33)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#38)
#40Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#38)