Doc update proposal for the note on log_statement in the runtime config for logging page

Started by Daniel Bauman9 months ago11 messageshackers
Jump to latest
#1Daniel Bauman
danielbaniel@gmail.com

Hey pgsql hackers
This is my first email to this list. I'm reaching out about a possible doc
update for the log_statement parameter.
Here's the current doc -
https://www.postgresql.org/docs/current/runtime-config-logging.html
The doc already has a Note bubble for the log_statement parameter with some
important callouts.

I am proposing adding something there to call out that there is no
guarantee provided that all committed statements in the DB will be logged.
The logging to the log file is not part of PGs ACID compliant DB engine but
instead a separate best-effort system. This might be obvious to some people
but I don't think it is for everyone.

In many cases people will use these logs for security, legal or compliance
reasons and I think it's relevant to users to make the consistency
guarantees explicit and clear.

Please let me know what you think and also if you think there's a different
place where this notification would be better placed. Maybe it already
exists somewhere else.

This would be very similar to the documentation that MySQL have over here -
https://dev.mysql.com/doc/refman/8.4/en/audit-log-logging-configuration.html
"Regardless of strategy, logging occurs on a best-effort basis, with no
guarantee of consistency."

I believe the same applies to the PGAudit extension but I'm starting out
here.

Thanks for your time,
Daniel

#2Bruce Momjian
bruce@momjian.us
In reply to: Daniel Bauman (#1)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Fri, Jul 25, 2025 at 03:24:48PM -0700, Daniel Bauman wrote:

Hey pgsql hackers
This is my first email to this list. I'm reaching out about a possible doc
update for the log_statement parameter.
Here's the current doc - https://www.postgresql.org/docs/current/
runtime-config-logging.html
The doc already has a Note bubble for the log_statement parameter with some
important callouts.

I am proposing adding something there to call out that there is no guarantee
provided that all committed statements in the DB will be logged. The logging to
the log file is not part of PGs ACID compliant DB engine but instead a separate
best-effort system. This might be obvious to some people but I don't think it
is for everyone.

In many cases people will use these logs for security, legal or compliance
reasons and I think it's relevant to users to make the consistency guarantees
explicit and clear.

Please let me know what you think and also if you think there's a different
place where this notification would be better placed. Maybe it already exists
somewhere else.

This would be very similar to the documentation that MySQL have over here - 
https://dev.mysql.com/doc/refman/8.4/en/audit-log-logging-configuration.html
"Regardless of strategy, logging occurs on a best-effort basis, with no
guarantee of consistency."

I believe the same applies to the PGAudit extension but I'm starting out here.

Why do you think log_statement is best effort?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#3Daniel Bauman
danielbaniel@gmail.com
In reply to: Bruce Momjian (#2)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

Here's where I think the logging is happening.
https://github.com/postgres/postgres/blob/master/src/backend/tcop/postgres.c#L1070
It seems like the log is happening before application of the transaction,
not after.

So consistency is best effort. ie - a crash after the log but before the
transaction is committed could result in a logged statement that wasn't
actually committed.

Logging is not handled in the WAL and is not guaranteed to be consistent in
the way that different statements in a transaction are, which is the nuance
that I'm interested in documenting.
Maybe the doc update should be to say that statements in the log may not
have been applied rather than saying that applied statements might not
appear in the log.

I'm not very familiar with the codebase yet. If you know this section well,
I'd be very happy to learn what inconsistencies you think are possible in
postgres and whether you think there's scope to make the docs clearer about
them. If you're saying that there is some mechanism to keep the
log consistent with which transactions are actually committed could you
please share a link to that part of the code so I can learn more?

-Daniel

On Mon, Jul 28, 2025 at 11:38 AM Bruce Momjian <bruce@momjian.us> wrote:

Show quoted text

On Fri, Jul 25, 2025 at 03:24:48PM -0700, Daniel Bauman wrote:

Hey pgsql hackers
This is my first email to this list. I'm reaching out about a possible

doc

update for the log_statement parameter.
Here's the current doc - https://www.postgresql.org/docs/current/
runtime-config-logging.html
The doc already has a Note bubble for the log_statement parameter with

some

important callouts.

I am proposing adding something there to call out that there is no

guarantee

provided that all committed statements in the DB will be logged. The

logging to

the log file is not part of PGs ACID compliant DB engine but instead a

separate

best-effort system. This might be obvious to some people but I don't

think it

is for everyone.

In many cases people will use these logs for security, legal or

compliance

reasons and I think it's relevant to users to make the consistency

guarantees

explicit and clear.

Please let me know what you think and also if you think there's a

different

place where this notification would be better placed. Maybe it already

exists

somewhere else.

This would be very similar to the documentation that MySQL have over

here -

https://dev.mysql.com/doc/refman/8.4/en/audit-log-logging-configuration.html

"Regardless of strategy, logging occurs on a best-effort basis, with no
guarantee of consistency."

I believe the same applies to the PGAudit extension but I'm starting out

here.

Why do you think log_statement is best effort?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#4Bruce Momjian
bruce@momjian.us
In reply to: Daniel Bauman (#3)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Mon, Jul 28, 2025 at 04:24:14PM -0700, Daniel Bauman wrote:

Here's where I think the logging is happening. https://github.com/postgres/
postgres/blob/master/src/backend/tcop/postgres.c#L1070
It seems like the log is happening before application of the transaction, not
after.

So consistency is best effort. ie - a crash after the log but before the
transaction is committed could result in a logged statement that wasn't
actually committed.

Logging is not handled in the WAL and is not guaranteed to be consistent in the
way that different statements in a transaction are, which is the nuance that
I'm interested in documenting.
Maybe the doc update should be to say that statements in the log may not have
been applied rather than saying that applied statements might not appear in the
log.

Yes, right, it is not that we do some "best effort", whatever that
means, but rather we just log when we start running the statement. The
docs clearly explain we log after parsing but before execution:

Statements that contain simple syntax errors are not logged even
by the <varname>log_statement</varname> = <literal>all</literal>
setting, because the log message is emitted only after basic
parsing has been done to determine the statement type.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Bruce Momjian (#4)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Monday, July 28, 2025, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Jul 28, 2025 at 04:24:14PM -0700, Daniel Bauman wrote:

Here's where I think the logging is happening. https://github.com/

postgres/

postgres/blob/master/src/backend/tcop/postgres.c#L1070
It seems like the log is happening before application of the

transaction, not

after.

So consistency is best effort. ie - a crash after the log but before the
transaction is committed could result in a logged statement that wasn't
actually committed.

Your invalid assumption that commit/transaction is at all relevant here is
not something we’ve induced.

A select query returns the results to a client regardless of whether the
transaction it is executed within is committed or rolled back. Not logging
such a query because the transaction was rolled back would be an insane
design choice.

The only true “best-effort” consideration would be if somehow the logging
itself was flaky, say under system load, such that attempted writes to disk
somehow never made it there. I suppose in the case of a crash and “sync”
patterns there may be a latent bug or active design choice to not let those
syncs bog down the system…but that hasn’t been shown and I wouldn’t expect
it to exist absent such documentation. It’s definitely not correct to call
what we have best-effort. It’s just normal logging of valid statements
presented for execution. A normal type of audit trail of what work the
system performed.

David J.

#6Daniel Bauman
danielbaniel@gmail.com
In reply to: Bruce Momjian (#4)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

The doc fragment you shared is explaining to customers that basic syntax
checks are done before postgres gets to logging the transaction.
I don't think that is the same as clearly explaining to users statement
logging happens before execution.
At least for me, as a user and reader of the docs but not someone familiar
with the code, the docs didn't make it clear to me how statement logging
corresponded to query execution.

Do you think there's room to document something like
"Statements are logged before they are executed. It's not guaranteed that
logged statements have been successfully executed."
I'd be happy to submit and iterate on a pull request if you do.

I'd also like to understand what happens if there are errors writing the
log - like the disk where the log directory is configured being full.
My understanding is the following. ereport (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/include/utils/elog.h#L163)
will end up calling errfinish and errfinish will call EmitErrorReport (
https://github.com/postgres/postgres/blob/master/src/backend/utils/error/elog.c#L543)
which will call send_message_to_server_log (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/utils/error/elog.c#L1733)
and that will call write_syslogger_file (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/utils/error/elog.c#L3443C2-L3443C16
).
write_syslogger_file looks like it handles errors by logging to stderr but
not raising an error condition that would cancel the transaction (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/postmaster/syslogger.c#L1126
)

Do I understand correctly or am I on the wrong codepath?

-Daniel

On Mon, Jul 28, 2025 at 4:35 PM Bruce Momjian <bruce@momjian.us> wrote:

Show quoted text

On Mon, Jul 28, 2025 at 04:24:14PM -0700, Daniel Bauman wrote:

Here's where I think the logging is happening.

https://github.com/postgres/

postgres/blob/master/src/backend/tcop/postgres.c#L1070
It seems like the log is happening before application of the

transaction, not

after.

So consistency is best effort. ie - a crash after the log but before the
transaction is committed could result in a logged statement that wasn't
actually committed.

Logging is not handled in the WAL and is not guaranteed to be

consistent in the

way that different statements in a transaction are, which is the nuance

that

I'm interested in documenting.
Maybe the doc update should be to say that statements in the log may not

have

been applied rather than saying that applied statements might not

appear in the

log.

Yes, right, it is not that we do some "best effort", whatever that
means, but rather we just log when we start running the statement. The
docs clearly explain we log after parsing but before execution:

Statements that contain simple syntax errors are not logged even
by the <varname>log_statement</varname> = <literal>all</literal>
setting, because the log message is emitted only after basic
parsing has been done to determine the statement type.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#7Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#4)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Mon, Jul 28, 2025 at 07:35:51PM -0400, Bruce Momjian wrote:

On Mon, Jul 28, 2025 at 04:24:14PM -0700, Daniel Bauman wrote:

Here's where I think the logging is happening. https://github.com/postgres/
postgres/blob/master/src/backend/tcop/postgres.c#L1070
It seems like the log is happening before application of the transaction, not
after.

So consistency is best effort. ie - a crash after the log but before the
transaction is committed could result in a logged statement that wasn't
actually committed.

Logging is not handled in the WAL and is not guaranteed to be consistent in the
way that different statements in a transaction are, which is the nuance that
I'm interested in documenting.
Maybe the doc update should be to say that statements in the log may not have
been applied rather than saying that applied statements might not appear in the
log.

Yes, right, it is not that we do some "best effort", whatever that
means, but rather we just log when we start running the statement. The
docs clearly explain we log after parsing but before execution:

Statements that contain simple syntax errors are not logged even
by the <varname>log_statement</varname> = <literal>all</literal>
setting, because the log message is emitted only after basic
parsing has been done to determine the statement type.

"log_min_duration_statement=0" might be closer to what you want since it
is only printed when the query completes.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#8David G. Johnston
david.g.johnston@gmail.com
In reply to: Daniel Bauman (#6)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Monday, July 28, 2025, Daniel Bauman <danielbaniel@gmail.com> wrote:

The doc fragment you shared is explaining to customers that basic syntax
checks are done before postgres gets to logging the transaction.
I don't think that is the same as clearly explaining to users statement
logging happens before execution.

He quoted the wrong sentence. This is the definitive one:

For clients using extended query protocol, logging occurs when an Execute
message is received, and values of the Bind parameters are included (with
any embedded single-quote marks doubled).
David J.

At least for me, as a user and reader of the docs but not someone familiar
with the code, the docs didn't make it clear to me how statement logging
corresponded to query execution.

Do you think there's room to document something like
"Statements are logged before they are executed. It's not guaranteed that
logged statements have been successfully executed."
I'd be happy to submit and iterate on a pull request if you do.

The first sentence maybe. If you find this important enough to submit a
patch using our submission process go ahead. On that note, please observe
that we in-line reply and trim here, not top-post.

I'd also like to understand what happens if there are errors writing the
log - like the disk where the log directory is configured being full.
My understanding is the following. ereport (https://github.com/postgres/
postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/include/utils/
elog.h#L163) will end up calling errfinish and errfinish will call
EmitErrorReport (https://github.com/postgres/postgres/blob/master/src/
backend/utils/error/elog.c#L543) which will call
send_message_to_server_log (https://github.com/postgres/postgres/blob/
71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/utils/
error/elog.c#L1733) and that will call write_syslogger_file (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f9353989
0eaa14d979/src/backend/utils/error/elog.c#L3443C2-L3443C16).
write_syslogger_file looks like it handles errors by logging to stderr but
not raising an error condition that would cancel the transaction (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f9353989
0eaa14d979/src/backend/postmaster/syslogger.c#L1126)

Do I understand correctly or am I on the wrong codepath?

That sounds right. It’s deemed overly harsh to crash the server just
because some logging doesn’t happen.

David J.

#9Daniel Bauman
danielbaniel@gmail.com
In reply to: David G. Johnston (#8)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Tue, Jul 29, 2025 at 8:46 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Monday, July 28, 2025, Daniel Bauman <danielbaniel@gmail.com> wrote:

The doc fragment you shared is explaining to customers that basic syntax
checks are done before postgres gets to logging the transaction.
I don't think that is the same as clearly explaining to users statement
logging happens before execution.

He quoted the wrong sentence. This is the definitive one:

For clients using extended query protocol, logging occurs when an Execute
message is received, and values of the Bind parameters are included (with
any embedded single-quote marks doubled).
David J.

At least for me, as a user and reader of the docs but not someone
familiar with the code, the docs didn't make it clear to me how statement
logging corresponded to query execution.

Do you think there's room to document something like
"Statements are logged before they are executed. It's not guaranteed that
logged statements have been successfully executed."
I'd be happy to submit and iterate on a pull request if you do.

The first sentence maybe. If you find this important enough to submit a
patch using our submission process go ahead. On that note, please observe
that we in-line reply and trim here, not top-post.

Thanks for the callout, in-line replying here :)

I think the first sentence would be an improvement. However, based on the
fact that errors are ignored and the fact that there is no fsync (have
looked, haven't found an fsync and don't think doing so would make sense)
on the ereport path I think there's room for making it clear that logging
of all transactions is not guaranteed.
Do you disagree with making this explicit?

The tradeoff to ignore errors and not fsync every log absolutely makes
sense to me. It's just something it would be beneficial for users to be
aware of. Particularly those concerned with auditing.

I'd also like to understand what happens if there are errors writing the
log - like the disk where the log directory is configured being full.
My understanding is the following. ereport (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/include/utils/elog.h#L163)
will end up calling errfinish and errfinish will call EmitErrorReport (
https://github.com/postgres/postgres/blob/master/src/backend/utils/error/elog.c#L543)
which will call send_message_to_server_log (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/utils/error/elog.c#L1733)
and that will call write_syslogger_file (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/utils/error/elog.c#L3443C2-L3443C16
).
write_syslogger_file looks like it handles errors by logging to stderr
but not raising an error condition that would cancel the transaction (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/postmaster/syslogger.c#L1126
)

Do I understand correctly or am I on the wrong codepath?

That sounds right. It’s deemed overly harsh to crash the server just
because some logging doesn’t happen.

I agree. I'm not suggesting a change, just making the docs explicit.

On Tue, Jul 29, 2025 at 8:46 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

Show quoted text

On Monday, July 28, 2025, Daniel Bauman <danielbaniel@gmail.com> wrote:

The doc fragment you shared is explaining to customers that basic syntax
checks are done before postgres gets to logging the transaction.
I don't think that is the same as clearly explaining to users statement
logging happens before execution.

He quoted the wrong sentence. This is the definitive one:

For clients using extended query protocol, logging occurs when an Execute
message is received, and values of the Bind parameters are included (with
any embedded single-quote marks doubled).
David J.

At least for me, as a user and reader of the docs but not someone
familiar with the code, the docs didn't make it clear to me how statement
logging corresponded to query execution.

Do you think there's room to document something like
"Statements are logged before they are executed. It's not guaranteed that
logged statements have been successfully executed."
I'd be happy to submit and iterate on a pull request if you do.

The first sentence maybe. If you find this important enough to submit a
patch using our submission process go ahead. On that note, please observe
that we in-line reply and trim here, not top-post.

I'd also like to understand what happens if there are errors writing the
log - like the disk where the log directory is configured being full.
My understanding is the following. ereport (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/include/utils/elog.h#L163)
will end up calling errfinish and errfinish will call EmitErrorReport (
https://github.com/postgres/postgres/blob/master/src/backend/utils/error/elog.c#L543)
which will call send_message_to_server_log (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/utils/error/elog.c#L1733)
and that will call write_syslogger_file (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/utils/error/elog.c#L3443C2-L3443C16
).
write_syslogger_file looks like it handles errors by logging to stderr
but not raising an error condition that would cancel the transaction (
https://github.com/postgres/postgres/blob/71c0921b649d7a800eb2d6f93539890eaa14d979/src/backend/postmaster/syslogger.c#L1126
)

Do I understand correctly or am I on the wrong codepath?

That sounds right. It’s deemed overly harsh to crash the server just
because some logging doesn’t happen.

David J.

#10David G. Johnston
david.g.johnston@gmail.com
In reply to: Daniel Bauman (#9)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Tue, Jul 29, 2025, 10:07 Daniel Bauman <danielbaniel@gmail.com> wrote:

The tradeoff to ignore errors and not fsync every log absolutely makes
sense to me. It's just something it would be beneficial for users to be
aware of. Particularly those concerned with auditing.

The main question is where to put such info. The config settings section
seems like an odd place to find that.

David J.

#11Daniel Bauman
danielbaniel@gmail.com
In reply to: David G. Johnston (#10)
Re: Doc update proposal for the note on log_statement in the runtime config for logging page

On Tue, Jul 29, 2025 at 11:46 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Tue, Jul 29, 2025, 10:07 Daniel Bauman <danielbaniel@gmail.com> wrote:

The tradeoff to ignore errors and not fsync every log absolutely makes
sense to me. It's just something it would be beneficial for users to be
aware of. Particularly those concerned with auditing.

The main question is where to put such info. The config settings section
seems like an odd place to find that.

David J.

From looking through the docs I think that adding to the note under the
log_statement is an appropriate place.
It already calls out other statement logging details like the fact that
statements are only logged after basic validation and that statements can
contain sensitive information.
I think adding a statement much like the MySQL wording is correct (
"Statement logging occurs on a best-effort basis, with no guarantee of
consistency." is descriptive without getting into unnecessary details for
the user about fsync, transactions and error handling. That could
separately be discussed in the code comments for developers.
I think the best-effort wording is justified and it probably is for most DB
engines. Error logging is not transactional, it's not part of the WAL. Even
though statement logging happens before the tx is executed it isn't fsynced
for each statement and write errors don't prevent transactions from being
processed.

What do you think?

-Daniel