WIP: Separate log file for extension

Started by Antonin Houskaalmost 9 years ago13 messageshackers

ah@cybertec.at

almost 9 years ago

Attached is a draft patch to allow extension to write log messages to a
separate file. It introduces a concept of a "log stream". The extension's
shared library gets its stream assigned by calling this function from
_PG_init()

my_stream_id = get_log_stream("my_extension", &my_log_stream);

Then it's supposed to change some of its attributes

adjust_log_stream_attr(&stream->filename, "my_extension.log");

and to use the stream id in ereport() calls

ereport(LOG, (errmsg("Hello world"), errstream(my_stream_id)));

The EXEC_BACKEND mechanism makes initialization of the log streams by
postmaster child processes non-trivial. I decided to extend
save_backend_variables() and restore_backend_variables() accordingly. Maybe
someone has better idea.

pgaudit seems to be the most obvious use case for this enhancement, but it
might be useful for many other extensions.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Craig Ringer

craig@2ndquadrant.com

almost 9 years ago

In reply to: Antonin Houska (#1)

Re: WIP: Separate log file for extension

On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:

Attached is a draft patch to allow extension to write log messages to a
separate file.

I like the idea a lot. I'm not so sure about the approach.

How will this play with syslog? csvlog? etc?

I wonder if a level of indirection is appropriate here, where extensions
(or postgres subsystems, even) provide a log stream label. Then the logging
backed takes care of using that appropriately for the logging mechanism in
use; for logging to file that'd generally be separate files. Same for
CSVlog. Other mechanisms could be left as stubs initially.

So the outcome would be the same, just without the assumption of specific
file name and output mechanism baked in.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Antonin Houska

ah@cybertec.at

almost 9 years ago

In reply to: Craig Ringer (#2)

Re: WIP: Separate log file for extension

Craig Ringer <craig@2ndquadrant.com> wrote:

On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:

How will this play with syslog? csvlog? etc?

There's nothing special about csvlog: the LogStream structure has a
"destination" field, so if particular extension wants this kind of output, it
simply sets the LOG_DESTINATION_CSVLOG bit here.

LOG_DESTINATION_SYSLOG is a problem because (AFAIK) a single process can
maintain no more than one connection to the syslog server. I don't like the
idea of closing the current connection and opening a different one whenever
the next ereport() is associated with a different stream than the previous
was.

As for LOG_DESTINATION_EVENTLOG, I haven't checked yet.

I wonder if a level of indirection is appropriate here, where extensions (or
postgres subsystems, even) provide a log stream label. Then the logging
backed takes care of using that appropriately for the logging mechanism in
use; for logging to file that'd generally be separate files. Same for
CSVlog. Other mechanisms could be left as stubs initially.

So the outcome would be the same, just without the assumption of specific
file name and output mechanism baked in.

The example I shown in my previous email was the simplest case I could think
of. But it does not mean that the file name has to be hard-coded in the
extension. Instead of setting the LogStream.filename field, you can pass a
pointer to this field to DefineCustomStringVariable() function, so the
specific GUC can control the value.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Masahiko Sawada

sawada.mshk@gmail.com

almost 9 years ago

In reply to: Antonin Houska (#1)

Re: WIP: Separate log file for extension

On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska <ah@cybertec.at> wrote:

Attached is a draft patch to allow extension to write log messages to a
separate file.

Does it allow postgres core code to write log messages to multiple log
files as well? I can imagine a use case where we want to write log
messages to separate log files by error level or purpose (server log
and SQL log etc).

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Antonin Houska

ah@cybertec.at

over 8 years ago

In reply to: Masahiko Sawada (#4)

Re: WIP: Separate log file for extension

Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska <ah@cybertec.at> wrote:

Attached is a draft patch to allow extension to write log messages to a
separate file.

Does it allow postgres core code to write log messages to multiple log
files as well? I can imagine a use case where we want to write log
messages to separate log files by error level or purpose (server log
and SQL log etc).

At low level it should work as long as several log streams are reserved for
the core code. However, as every single stream needs multiple variables, I
have no idea how to configure those additional streams w/o adding many new GUC
variables to the core.

Also the discussion what should (not) be logged separate would probably be
difficult.

Specifically to log SQL statements to a separate file, I think (but not
verified) that you can write an extension that reserves a stream for itself
and also uses emit_log_hook to recognize SQL statements among the logged
messages. If the hook adjusts the syslogger_stream field of ErrorData
accordingly, the message should end up in the separate file.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 8 years ago

In reply to: Antonin Houska (#3)

Re: WIP: Separate log file for extension

On 08/28/2017 11:23 AM, Antonin Houska wrote:

Craig Ringer <craig@2ndquadrant.com> wrote:

On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:

How will this play with syslog? csvlog? etc?

There's nothing special about csvlog: the LogStream structure has a
"destination" field, so if particular extension wants this kind of output, it
simply sets the LOG_DESTINATION_CSVLOG bit here.

I assume Craig's point was more about CSVLOG requiring log_collector=on.
So what will happen if the PostgreSQL is started without the collector,
and an extension attempts to use LOG_DESTINATION_CSVLOG? Or will it
start a separate collector for each such extension?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Magnus Hagander

magnus@hagander.net

over 8 years ago

In reply to: Antonin Houska (#1)

Re: WIP: Separate log file for extension

On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote:

Attached is a draft patch to allow extension to write log messages to a
separate file. It introduces a concept of a "log stream". The extension's
shared library gets its stream assigned by calling this function from
_PG_init()

my_stream_id = get_log_stream("my_extension", &my_log_stream);

I like this idea in general.

Then it's supposed to change some of its attributes

adjust_log_stream_attr(&stream->filename, "my_extension.log");

This, however, seems to be wrong.

The logfile name does not belong in the extension, it belongs in the
configuration file. I think the extension should set it's "stream id" or
whatever you want to call it, and then it should be possible to control in
postgresql.conf where that log is sent.

Also, what if this extension is loaded on demand in a session and not via
shared_preload_libraries? It looks like the syslogger only gets the list of
configured streams when it starts?

In short, I think the solution should be more generic, and not "just for
extensions".

I completely missed this thread when I did my quick-wip at
/messages/by-id/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com
-- some of the changes made were close enough that I got the two confused
:) Based on the feedback of that one, have you done any performance checks?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Antonin Houska

ah@cybertec.at

over 8 years ago

In reply to: Tomas Vondra (#6)

Re: WIP: Separate log file for extension

Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 08/28/2017 11:23 AM, Antonin Houska wrote:

Craig Ringer <craig@2ndquadrant.com> wrote:

On 25 August 2017 at 15:12, Antonin Houska <ah@cybertec.at> wrote:

How will this play with syslog? csvlog? etc?

There's nothing special about csvlog: the LogStream structure has a
"destination" field, so if particular extension wants this kind of output, it
simply sets the LOG_DESTINATION_CSVLOG bit here.

I assume Craig's point was more about CSVLOG requiring log_collector=on.
So what will happen if the PostgreSQL is started without the collector,
and an extension attempts to use LOG_DESTINATION_CSVLOG? Or will it
start a separate collector for each such extension?

The patch does not try to change this behavior. So there'll be no csvlog file
and only the plain (not csv-formatted) message will be written to console.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Antonin Houska

ah@cybertec.at

over 8 years ago

In reply to: Magnus Hagander (#7)

Re: WIP: Separate log file for extension

Magnus Hagander <magnus@hagander.net> wrote:

On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote:

I like this idea in general.

Then it's supposed to change some of its attributes

adjust_log_stream_attr(&stream->filename, "my_extension.log");

This, however, seems to be wrong.

The logfile name does not belong in the extension, it belongs in the
configuration file. I think the extension should set it's "stream id" or
whatever you want to call it, and then it should be possible to control in
postgresql.conf where that log is sent.

Doesn't the last paragraph of

/messages/by-id/11412.1503912190@localhost

address your concerns?

Also, what if this extension is loaded on demand in a session and not via
shared_preload_libraries? It looks like the syslogger only gets the list of
configured streams when it starts?

Yes, the syslogger gets the list of streams only when it starts, so the
extension that wants to use this feature needs to provide the file information
via shared_preload_libraries. I consider it sufficient because various
existing logging-related GUCs also can't be changed on-the-fly.

In short, I think the solution should be more generic, and not "just for extensions".

o.k. Any idea about dividing the streams into categories? Should they for
example correspond somehow to categories of GUC variables?

I completely missed this thread when I did my quick-wip at
/messages/by-id/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com
-- some of the changes made were close enough that I got the two confused :)
Based on the feedback of that one, have you done any performance checks?

I don't expect mere routing of messages into multiple files to bring any
overhead. I'll run some tests, just out of curiosity.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Antonin Houska

ah@cybertec.at

over 8 years ago

In reply to: Antonin Houska (#9)

Re: [HACKERS] WIP: Separate log file for extension

Antonin Houska <ah@cybertec.at> wrote:

Magnus Hagander <magnus@hagander.net> wrote:

On Fri, Aug 25, 2017 at 12:12 AM, Antonin Houska <ah@cybertec.at> wrote:

I like this idea in general.

Then it's supposed to change some of its attributes

adjust_log_stream_attr(&stream->filename, "my_extension.log");

This, however, seems to be wrong.

The logfile name does not belong in the extension, it belongs in the
configuration file. I think the extension should set it's "stream id" or
whatever you want to call it, and then it should be possible to control in
postgresql.conf where that log is sent.

Doesn't the last paragraph of

/messages/by-id/11412.1503912190@localhost

address your concerns?

Besides a new version of the patch, an example extension is attached that uses
the feature.

Also, what if this extension is loaded on demand in a session and not via
shared_preload_libraries? It looks like the syslogger only gets the list of
configured streams when it starts?

Yes, the syslogger gets the list of streams only when it starts, so the
extension that wants to use this feature needs to provide the file information
via shared_preload_libraries. I consider it sufficient because various
existing logging-related GUCs also can't be changed on-the-fly.

In short, I think the solution should be more generic, and not "just for extensions".

statement_log.diff demonstrates how the feature can be used by various
subsystems of PG core. Please consider it an example rather than part of the
"separate log patch". Even if there were no other design questions, there's
too much copy & paste in guc.c. I have no good idea right now how to improve
this part.

o.k. Any idea about dividing the streams into categories? Should they for
example correspond somehow to categories of GUC variables?

I completely missed this thread when I did my quick-wip at
/messages/by-id/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com
-- some of the changes made were close enough that I got the two confused :)
Based on the feedback of that one, have you done any performance checks?

I don't expect mere routing of messages into multiple files to bring any
overhead. I'll run some tests, just out of curiosity.

After having read the thread on your patch I think that the reason you were
asked to evaluate performance was that your patch can possibly make syslogger
a bottleneck. In contrast, my patch does not prevent user from disabling the
syslogger if it (the syslogger) seems to cause performance issues.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

#11

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Antonin Houska (#10)

Re: [HACKERS] WIP: Separate log file for extension

Antonin Houska <ah@cybertec.at> writes:

After having read the thread on your patch I think that the reason you were
asked to evaluate performance was that your patch can possibly make syslogger
a bottleneck. In contrast, my patch does not prevent user from disabling the
syslogger if it (the syslogger) seems to cause performance issues.

Just to clarify that: we know that in workloads that emit lots of log
output, the log collector *already is* a bottleneck; there are reports
that some people can't use it because it's too slow. See e.g. towards
the end of this thread:

/messages/by-id/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com

and particularly the referenced thread from 2011. (I seem to recall other
reports but didn't have much luck finding them.)

I'm quite concerned by the proposed patch, and not even so much any
performance issues; what bothers me is that it adds complexity into a
portion of the system where we can ill afford it. Bugs in the logging
mechanism compromise one's ability to have any faith in tracking down
other bugs. The difficulty of reconfiguring the logger on the fly
is another reason to not want more configuration options for it.
On the whole, therefore, I'd just as soon not go there --- especially
seeing that there's been little field demand for such features.

regards, tom lane

#12

Jing Wang

jingwangian@gmail.com

over 8 years ago

In reply to: Antonin Houska (#10)

Re: [HACKERS] WIP: Separate log file for extension

Hi Antonin,

1. check-world run error as following:

for extra in contrib/adminpack; do make -C '../..'/$extra
DESTDIR='/db/pgmaster/postgres'/tmp_install install

'/db/pgmaster/postgres'/tmp_install/log/install.log || exit; done

In file included from ../../src/include/c.h:1135:0,
from ../../src/include/postgres.h:47,
from adminpack.c:15:
adminpack.c: In function ‘convert_and_check_filename’:
adminpack.c:84:38: error: ‘Log_directory’ undeclared (first use in this
function)
(!logAllowed || !is_absolute_path(Log_directory) ||

2. In function get_log_stream

The following code
if (strcmp(id, stream->id))
ereport(ERROR,
(errmsg("log stream with id \"%s\" already exists", id)));

should be " if (strcmp(id,stream->id)==0)" ?

3. In function pipeThread

csvlog_file can't be found but referred by the following statements:

if (ftell(syslog_file) >= Log_RotationSize * 1024L ||
(csvlog_file != NULL &&

4. In logfile_open function
LogStream *stream = &log_streams[stream_id];
int file_mode = stream->file_mode;

It may be better to be as following:
int file_mode = log_streams[stream_id].file_mode

--
Regards,
Jing Wang
Fujitsu Australia

#13

David Steele

david@pgmasters.net

over 8 years ago

In reply to: Tom Lane (#11)

Re: Re: [HACKERS] WIP: Separate log file for extension

Hi Antonin,

On 1/10/18 5:38 PM, Tom Lane wrote:

Antonin Houska <ah@cybertec.at> writes:

After having read the thread on your patch I think that the reason you were
asked to evaluate performance was that your patch can possibly make syslogger
a bottleneck. In contrast, my patch does not prevent user from disabling the
syslogger if it (the syslogger) seems to cause performance issues.

Just to clarify that: we know that in workloads that emit lots of log
output, the log collector *already is* a bottleneck; there are reports
that some people can't use it because it's too slow. See e.g. towards
the end of this thread:

/messages/by-id/CABUevExztL0GORyWM9S4tR_Ft3FmJbRaxQdxj+BQZjpvmRurdw@mail.gmail.com

and particularly the referenced thread from 2011. (I seem to recall other
reports but didn't have much luck finding them.)

I'm quite concerned by the proposed patch, and not even so much any
performance issues; what bothers me is that it adds complexity into a
portion of the system where we can ill afford it. Bugs in the logging
mechanism compromise one's ability to have any faith in tracking down
other bugs. The difficulty of reconfiguring the logger on the fly
is another reason to not want more configuration options for it.
On the whole, therefore, I'd just as soon not go there --- especially
seeing that there's been little field demand for such features.

I think this feature would be useful, especially for an extension like
pgaudit. It's a request I hear fairly frequently.

However, there doesn't seem to be consensus that this is a viable
approach, so marked as Returned with Feedback for this CF.

This may be too invasive a feature to be a good fit for the last PG11 CF
in March but I hope you keep working on the idea.

Regards,
--
-David
david@pgmasters.net

WIP: Separate log file for extension

Attachments:

Attachments: