Set log_lock_waits=on by default
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
Yours,
Laurenz Albe
Attachments:
v1-0001-Default-to-log_lock_waits-on.patch (text/x-patch)
From a767e69c724fbbff14114729272be5d29c3d69d8 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Thu, 21 Dec 2023 14:24:00 +0100
Subject: [PATCH v1] Default to log_lock_waits=on
If someone is stuck behind a lock for more than a second,
that is almost always a problem that is worth a log entry.
---
doc/src/sgml/config.sgml | 2 +-
src/backend/storage/lmgr/proc.c | 2 +-
src/backend/utils/misc/guc_tables.c | 2 +-
src/backend/utils/misc/postgresql.conf.sample | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 44cada2b40..f96ec5a72e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7383,7 +7383,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
Controls whether a log message is produced when a session waits
longer than <xref linkend="guc-deadlock-timeout"/> to acquire a
lock. This is useful in determining if lock waits are causing
- poor performance. The default is <literal>off</literal>.
+ poor performance. The default is <literal>on</literal>.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index b6451d9d08..4f25cebbd8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -60,7 +60,7 @@ int StatementTimeout = 0;
int LockTimeout = 0;
int IdleInTransactionSessionTimeout = 0;
int IdleSessionTimeout = 0;
-bool log_lock_waits = false;
+bool log_lock_waits = true;
/* Pointer to this process's PGPROC struct, if any */
PGPROC *MyProc = NULL;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f7c9882f7c..1f3d56e6ee 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1519,7 +1519,7 @@ struct config_bool ConfigureNamesBool[] =
NULL
},
&log_lock_waits,
- false,
+ true,
NULL, NULL, NULL
},
{
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index cf9f283cfe..451e09d3d8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -586,7 +586,7 @@
# processes
# %% = '%'
# e.g. '<%u%%%d> '
-#log_lock_waits = off # log lock waits >= deadlock_timeout
+#log_lock_waits = on # log lock waits >= deadlock_timeout
#log_recovery_conflict_waits = off # log standby recovery conflict waits
# >= deadlock_timeout
#log_parameter_max_length = -1 # when logging statements, limit logged
--
2.43.0
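For reference, this is roughly what the behavior looks like with the new default (an illustrative postgresql.conf excerpt and log output; the process IDs, table, and exact message wording are made up for illustration and vary by lock type and server version):

```
# postgresql.conf excerpt -- with this patch applied, the default becomes:
log_lock_waits = on        # log a message for waits exceeding deadlock_timeout
deadlock_timeout = 1s      # shared threshold for the deadlock check and the log message

# Illustrative server log once the threshold is exceeded:
#   LOG:  process 24601 still waiting for ShareLock on transaction 1023 after 1000.045 ms
#   DETAIL:  Process holding the lock: 24455. Wait queue: 24601.
#   STATEMENT:  UPDATE accounts SET balance = balance - 100 WHERE id = 42
```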
On Thu, Dec 21, 2023 at 8:29 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
I think it depends somewhat on the lock type, and also on your
threshold for what constitutes a problem. For example, you can wait
for 1 second for a relation extension lock pretty easily, I think,
just because the I/O system is busy. Or I think also a VXID lock held
by some transaction that has a tuple locked could be not particularly
exciting. A conflict on a relation lock seems more likely to represent
a real issue, but I guess it's all kind of a judgement call. A second
isn't really all that long on an overloaded system, and I see an awful
lot of overloaded systems (because those are the people who call me).
Just a random idea but what if we separated log_lock_waits from
deadlock_timeout? Say, it becomes time-valued rather than
Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
probably be more interested in hearing about a lock wait that was more
than say 10 seconds, but I don't necessarily want to wait 10 seconds
for the deadlock detector to trigger.
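A configuration sketch of that idea (hypothetical syntax -- log_lock_waits is Boolean today, so nothing below is an existing GUC shape):

```
deadlock_timeout = 1s     # deadlock detector still fires after 1s
log_lock_waits = 10s      # time-valued: only log waits longer than 10s,
                          # constrained to be >= deadlock_timeout
```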
In general, I do kind of like the idea of trying to log more problem
situations by default, so that when someone has a major issue, you
don't have to start by having them change all the logging settings and
then wait until they get hosed a second time before you can
troubleshoot anything. I'm just concerned that 1s might be too
sensitive for a lot of users who aren't as, let's say, diligent about
keeping the system healthy as you probably are.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Dec 21, 2023 at 05:29 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
I think it's a very good idea. On all heavily loaded systems I have
observed so far, we have always turned it on. 1s (the default
deadlock_timeout) is quite a large value for web/mobile apps, meaning that
the default frequency of logging is quite low, so any potential suffering
from the observer effect doesn't happen -- saturation from the number of
active sessions happens much, much earlier, even if you have very slow
disk IO for logging.
At the same time, I like Robert's idea to separate lock wait logging
from the deadlock_timeout logic -- the current implementation is quite
confusing for new users. I have also had cases where people wanted to
log lock waits earlier than deadlock detection. And lock wait logging
almost always lacks information about the blocking session (its state
and last query, first of all), but that is maybe an off-topic matter
warranting a separate improvement effort.
Nik
Le 21/12/2023 à 14:29, Laurenz Albe a écrit :
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
I think it's a really good idea. At Dalibo, we advise our customers to
switch it on. AFAICT, it's never been a problem.
Best regards,
Frédéric
Re: Robert Haas
On Thu, Dec 21, 2023 at 8:29 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
I think it depends somewhat on the lock type, and also on your
threshold for what constitutes a problem. For example, you can wait
for 1 second for a relation extension lock pretty easily, I think,
just because the I/O system is busy. Or I think also a VXID lock held
by some transaction that has a tuple locked could be not particularly
exciting. A conflict on a relation lock seems more likely to represent
a real issue, but I guess it's all kind of a judgement call. A second
isn't really all that long on an overloaded system, and I see an awful
lot of overloaded systems (because those are the people who call me).
If a system is so busy that it's waiting so long for the disk, I would
like PG to tell me about it. Likewise, if my transactions are slow
because they are waiting for each other, I'd also like PG to tell me.
Especially as the 2nd condition can't be seen by "it's slow because
CPU or IO is at 100%".
In any case, setting log_lock_waits=on by default helps.
In fact, everyone I talked to was wondering why log_checkpoints was
turned on by default, and not this parameter. The info provided by
log_lock_waits is much more actionable than the stream of
log_checkpoint messages.
Just a random idea but what if we separated log_lock_waits from
deadlock_timeout? Say, it becomes time-valued rather than
Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
probably be more interested in hearing about a lock wait that was more
than say 10 seconds, but I don't necessarily want to wait 10 seconds
for the deadlock detector to trigger.
That's also a good point, but I'd like to see log_lock_waits default
to 'on' independently from having this extra change.
In general, I do kind of like the idea of trying to log more problem
situations by default, so that when someone has a major issue, you
don't have to start by having them change all the logging settings and
then wait until they get hosed a second time before you can
troubleshoot anything. I'm just concerned that 1s might be too
sensitive for a lot of users who aren't as, let's say, diligent about
keeping the system healthy as you probably are.
I don't think 1s would be too sensitive by default.
Christoph
On Thu, 2023-12-21 at 09:14 -0500, Robert Haas wrote:
I think it depends somewhat on the lock type, and also on your
threshold for what constitutes a problem. For example, you can wait
for 1 second for a relation extension lock pretty easily, I think,
just because the I/O system is busy. Or I think also a VXID lock held
by some transaction that has a tuple locked could be not particularly
exciting. A conflict on a relation lock seems more likely to represent
a real issue, but I guess it's all kind of a judgement call. A second
isn't really all that long on an overloaded system, and I see an awful
lot of overloaded systems (because those are the people who call me).
Sure, you don't want "log_lock_waits = on" in all conceivable databases.
I have seen applications that use database locks to synchronize
application threads (*shudder*). If it is normal for your database
to experience long lock waits, disable the parameter.
My point is that in the vast majority of cases, long lock waits
indicate a problem that you would like to know about, so the parameter
should default to "on".
(Out of curiosity: what would ever wait for a VXID lock?)
Just a random idea but what if we separated log_lock_waits from
deadlock_timeout? Say, it becomes time-valued rather than
Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
probably be more interested in hearing about a lock wait that was more
than say 10 seconds, but I don't necessarily want to wait 10 seconds
for the deadlock detector to trigger.
That is an appealing thought, but as far as I know, "log_lock_waits"
is implemented by the deadlock detector, which is why it is tied to
"deadlock_timeout". So if we want that, we'd need a separate "live
lock detector". I don't know if we want to go there.
Yours,
Laurenz Albe
On 2023-12-22 20:00, Laurenz Albe wrote:
My point is that in the vast majority of cases, long lock waits
indicate a problem that you would like to know about, so the parameter
should default to "on".
+1.
I always set log_lock_waits=on, so I agree with this.
Just a random idea but what if we separated log_lock_waits from
deadlock_timeout? Say, it becomes time-valued rather than
Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
probably be more interested in hearing about a lock wait that was more
than say 10 seconds, but I don't necessarily want to wait 10 seconds
for the deadlock detector to trigger.
That is an appealing thought, but as far as I know, "log_lock_waits"
is implemented by the deadlock detector, which is why it is tied to
"deadlock_timeout". So if we want that, we'd need a separate "live
lock detector". I don't know if we want to go there.
Personally, I thought it was a good idea to separate log_lock_waits and
deadlock_timeout, but I have not checked how that is implemented.
--
Regards,
Shinya Kato
NTT DATA GROUP CORPORATION
On 12/21/23 6:58 AM, Nikolay Samokhvalov wrote:
On Thu, Dec 21, 2023 at 05:29 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
I think it's a very good idea. On all heavily loaded systems I have
observed so far, we have always turned it on. 1s (the default
deadlock_timeout) is quite a large value for web/mobile apps, meaning that
the default frequency of logging is quite low, so any potential suffering
from the observer effect doesn't happen -- saturation from the number of
active sessions happens much, much earlier, even if you have very slow
disk IO for logging.
FWIW, enabling this setting has also been a long-time "happiness hint"
that I've passed along to people.
What would be the worst case amount of logging that we're going to
generate at scale? I think the worst case would largely scale according
to connection count? So if someone had a couple thousand backends on a
busy top-end system, then I guess they might generate up to a couple
thousand log messages every second or two under load after this
parameter became enabled with a 1 second threshold?
I'm not aware of any cases where enabling this parameter with a 1 second
threshold overwhelmed the logging collector (unlike, for example,
log_statement=all) but I wanted to pose the question in the interest of
being careful.
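The worst case sketched above can be put into rough numbers. A back-of-envelope upper bound (an illustration, not a measurement of any real system): each backend logs at most one "still waiting" message per lock wait, and with a 1 second threshold a new qualifying wait can begin at most about once per second per backend.

```python
def worst_case_log_rate(backends: int, deadlock_timeout_s: float = 1.0) -> float:
    """Upper bound on log_lock_waits messages per second.

    Each blocked backend emits at most one message per wait, and a new
    wait can cross the threshold at most once per deadlock_timeout
    interval, so the rate is bounded by backends / deadlock_timeout.
    """
    return backends / deadlock_timeout_s

# A couple thousand backends, all continuously stuck, 1s threshold:
print(worst_case_log_rate(2000))        # 2000.0 messages/second at the absolute worst
print(worst_case_log_rate(2000, 10.0))  # 200.0 with a hypothetical 10s threshold
```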
At the same time, I like Robert's idea to separate lock wait logging
from the deadlock_timeout logic -- the current implementation is quite
confusing for new users. I have also had cases where people wanted to
log lock waits earlier than deadlock detection. And lock wait logging
almost always lacks information about the blocking session (its state
and last query, first of all), but that is maybe an off-topic matter
warranting a separate improvement effort.
I agree with this, though it's equally true that proliferation of new
GUCs is confusing for new users. I hope the project avoids too low of a
bar for adding new GUCs. But using the deadlock_timeout GUC for this
completely unrelated log threshold really doesn't make sense.
-Jeremy
Hi,
On Thu, Dec 21, 2023 at 02:29:05PM +0100, Laurenz Albe wrote:
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
I also think that this should be set to on.
I had a look at the patch and it works fine. Regarding the
documentation, maybe the back-reference at deadlock_timeout could be
made a bit more explicit then as well, as in the attached patch, but
this is mostly bikeshedding.
Michael
Attachments:
log-lock-waits-doc.patch (text/x-diff)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6d975ef7ab..7f9bdea50b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10185,11 +10185,12 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
</para>
<para>
- When <xref linkend="guc-log-lock-waits"/> is set,
- this parameter also determines the amount of time to wait before
- a log message is issued about the lock wait. If you are trying
- to investigate locking delays you might want to set a shorter than
- normal <varname>deadlock_timeout</varname>.
+ When <xref linkend="guc-log-lock-waits"/> is set to
+ <literal>on</literal> (which is the default), this parameter also
+ determines the amount of time to wait before a log message is issued
+ about the lock wait. If you are trying to investigate locking delays
+ you might want to set a shorter than normal
+ <varname>deadlock_timeout</varname>.
</para>
</listitem>
</varlistentry>
Hi,
On Thu, Jan 11, 2024 at 03:24:55PM +0100, Michael Banck wrote:
On Thu, Dec 21, 2023 at 02:29:05PM +0100, Laurenz Albe wrote:
Here is a patch to implement this.
Being stuck behind a lock for more than a second is almost
always a problem, so it is reasonable to turn this on by default.
I also think that this should be set to on.
I had a look at the patch and it works fine.
Regarding the documentation, maybe the back-reference at
deadlock_timeout could be made a bit more explicit then as well, as in
the attached patch, but this is mostly bikeshedding.
I've marked it ready for committer now, as the above really is
bikeshedding.
Michael
I saw this was marked ready-for-committer, so I took a look at the thread.
It looks like there are two ideas:
* Separate log_lock_waits from deadlock_timeout. It looks like this idea
has a decent amount of support, but I didn't see any patch for it so far.
IMHO this is arguably a prerequisite for setting log_lock_waits on by
default, as we could then easily set it higher by default to help address
concerns about introducing too much noise in the logs.
* Set log_lock_waits on by default. The folks on this thread seem to
support this idea, but given the lively discussion for enabling
log_checkpoints by default [0], I'm hesitant to commit something like
this without further community discussion.
[0]: /messages/by-id/CALj2ACX-rW_OeDcp4gqrFUAkf1f50Fnh138dmkd0JkvCNQRKGA@mail.gmail.com
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes:
It looks like there are two ideas:
* Separate log_lock_waits from deadlock_timeout. It looks like this idea
has a decent amount of support, but I didn't see any patch for it so far.
I think the support comes from people who have not actually looked at
the code. The reason they are not separate is that the same timeout
service routine does both things. To pull them apart, you would have
to (1) detangle that code and (2) incur the overhead of two timeout
events queued for every lock wait. It's not clear to me that it's
worth it. In some sense, deadlock_timeout is exactly the length of
time after which you want to get concerned.
IMHO this is arguably a prerequisite for setting log_lock_waits on by
default, as we could then easily set it higher by default to help address
concerns about introducing too much noise in the logs.
Well, that's the question --- would it be sane to enable
log_lock_waits by default if we don't separate them?
* Set log_lock_waits on by default. The folks on this thread seem to
support this idea, but given the lively discussion for enabling
log_checkpoints by default [0], I'm hesitant to commit something like
this without further community discussion.
I was, and remain, of the opinion that that was a bad idea that
we'll eventually revert, just like we previously got rid of most
inessential log chatter in the default configuration. So I doubt
you'll have much trouble guessing my opinion of this one. I think
the default logging configuration should be chosen with the
understanding that nobody ever looks at the logs of most
installations, and we should be more worried about their disk space
consumption than anything else. Admittedly, log_lock_waits is less
bad than log_checkpoints, because no such events will occur in a
well-tuned configuration ... but still, it's going to be useless
chatter in the average installation.
regards, tom lane
On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
It looks like there are two ideas:
[...]
* Set log_lock_waits on by default. The folks on this thread seem to
support this idea, but given the lively discussion for enabling
log_checkpoints by default [0], I'm hesitant to commit something like
this without further community discussion.
I was, and remain, of the opinion that that was a bad idea that
we'll eventually revert, just like we previously got rid of most
inessential log chatter in the default configuration. So I doubt
you'll have much trouble guessing my opinion of this one. I think
the default logging configuration should be chosen with the
understanding that nobody ever looks at the logs of most
installations, and we should be more worried about their disk space
consumption than anything else. Admittedly, log_lock_waits is less
bad than log_checkpoints, because no such events will occur in a
well-tuned configuration ... but still, it's going to be useless
chatter in the average installation.
Unsurprisingly, I want to argue against that.
You say that it is less bad than "log_checkpoints = on", and I agree.
I can't remember seeing any complaints by users about
"log_checkpoints", and I predict that the complaints about
"log_lock_waits = on" will be about as loud.
I am all for avoiding useless chatter in the log. In my personal
experience, that is usually "database typo does not exist" and
constraint violation errors. I always recommend people to enable
"log_lock_waits", and so far I have not seen it spam the logs.
I agree that usually nobody ever looks into the log file. The
time when people *do* look into the log file is when they encounter
trouble, and my stance is that the default configuration should be
such that the log contains clues as to what may be the problem.
If a statement sometimes takes unreasonably long, it is very
valuable corroborative information that the statement occasionally
waits more than a second for a lock.
Yours,
Laurenz Albe
Greetings,
* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
It looks like there are two ideas:
[...]
* Set log_lock_waits on by default. The folks on this thread seem to
support this idea, but given the lively discussion for enabling
log_checkpoints by default [0], I'm hesitant to commit something like
this without further community discussion.
I was, and remain, of the opinion that that was a bad idea that
we'll eventually revert, just like we previously got rid of most
inessential log chatter in the default configuration. So I doubt
you'll have much trouble guessing my opinion of this one. I think
the default logging configuration should be chosen with the
understanding that nobody ever looks at the logs of most
installations, and we should be more worried about their disk space
consumption than anything else. Admittedly, log_lock_waits is less
bad than log_checkpoints, because no such events will occur in a
well-tuned configuration ... but still, it's going to be useless
chatter in the average installation.
Unsurprisingly, I want to argue against that.
I tend to agree with this position- log_checkpoints being on has been a
recommended configuration for a very long time and is valuable
information to have about what's been happening when someone does go and
look at the log.
Having log_lock_waits on by default is likely to be less noisy and even
more useful for going back in time to figure out what happened.
You say that it is less bad than "log_checkpoints = on", and I agree.
I can't remember seeing any complaints by users about
"log_checkpoints", and I predict that the complaints about
"log_lock_waits = on" will be about as loud.
Yeah, agreed.
I am all for avoiding useless chatter in the log. In my personal
experience, that is usually "database typo does not exist" and
constraint violation errors. I always recommend people to enable
"log_lock_waits", and so far I have not seen it spam the logs.
I really wish we could separate out the messages about typos and
constraint violations from these logs about processes waiting a long
time for locks or about checkpoints or even PANIC's or other really
important messages. That said, that's a different problem and not
something this change needs to concern itself with.
As for if we want to separate out log_lock_waits from deadlock_timeout-
no, I don't think we do, for the reasons that Tom mentioned. I don't
see it as necessary either for enabling log_lock_waits by default.
Waiting deadlock_timeout amount of time for a lock certainly is a
problem already and once we've waited that amount of time, I can't see
the time spent logging about it as being a serious issue.
+1 for simply enabling log_lock_waits by default.
All that said ... if we did come up with a nice way to separate out the
timing for deadlock_timeout and log_lock_waits, I wouldn't necessarily
be against it. Perhaps one approach to that would be to set just one
timer but have it be the lower of the two, and then set another when
that fires (if there's more waiting to do) and then catch it when it
happens... Again, I'd view this as some independent improvement though
and not a requirement for just enabling log_lock_waits by default.
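That rearming scheme can be sketched abstractly (a toy model with hypothetical names; nothing here mirrors the actual timeout.c API):

```python
def plan_timer_firings(deadlock_timeout: float, log_threshold: float):
    """Plan the firings of a single timer for one lock wait, assuming the
    log threshold is constrained to be >= deadlock_timeout: fire once at
    deadlock_timeout for the deadlock check, then rearm once more if the
    log message is due later."""
    assert log_threshold >= deadlock_timeout
    firings = [(deadlock_timeout, "deadlock_check")]
    if log_threshold > deadlock_timeout:
        # second arm of the same timer, set only after the first fires
        firings.append((log_threshold, "log_lock_wait"))
    else:
        # thresholds coincide: one firing handles both jobs
        firings[0] = (deadlock_timeout, "deadlock_check+log_lock_wait")
    return firings

print(plan_timer_firings(1.0, 10.0))
# [(1.0, 'deadlock_check'), (10.0, 'log_lock_wait')]
```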
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
I was, and remain, of the opinion that that was a bad idea that
we'll eventually revert, just like we previously got rid of most
inessential log chatter in the default configuration.
Unsurprisingly, I want to argue against that.
I tend to agree with this position- log_checkpoints being on has been a
recommended configuration for a very long time and is valuable
information to have about what's been happening when someone does go and
look at the log.
We turned on default log_checkpoints in v15, which means that behavior
has been in the field for about sixteen months. I don't think that
that gives it the status of a settled issue; my bet is that most
users still have not seen it.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
I was, and remain, of the opinion that that was a bad idea that
we'll eventually revert, just like we previously got rid of most
inessential log chatter in the default configuration.
Unsurprisingly, I want to argue against that.
I tend to agree with this position- log_checkpoints being on has been a
recommended configuration for a very long time and is valuable
information to have about what's been happening when someone does go and
look at the log.
We turned on default log_checkpoints in v15, which means that behavior
has been in the field for about sixteen months. I don't think that
that gives it the status of a settled issue; my bet is that most
users still have not seen it.
Apologies for not being clear- log_checkpoints being on has been a
configuration setting that I (and many others I've run into) have been
recommending since as far back as I can remember.
I was not referring to the change in the default.
Thanks,
Stephen
Hi,
this patch is still on the table, though for v18 now.
Nathan mentioned up-thread that he was hesitant to commit this without
further discussion. Laurenz, Stephen and I are +1 on this, but when it
comes to committers having chimed in, only Tom did so far and was -1.
Are there any others who have an opinion on this?
On Tue, Feb 06, 2024 at 12:29:10PM -0500, Tom Lane wrote:
Nathan Bossart <nathandbossart@gmail.com> writes:
It looks like there are two ideas:
* Separate log_lock_waits from deadlock_timeout. It looks like this idea
has a decent amount of support, but I didn't see any patch for it so far.
I think the support comes from people who have not actually looked at
the code. The reason they are not separate is that the same timeout
service routine does both things. To pull them apart, you would have
to (1) detangle that code and (2) incur the overhead of two timeout
events queued for every lock wait. It's not clear to me that it's
worth it. In some sense, deadlock_timeout is exactly the length of
time after which you want to get concerned.
IMHO this is arguably a prerequisite for setting log_lock_waits on by
default, as we could then easily set it higher by default to help address
concerns about introducing too much noise in the logs.
Well, that's the question --- would it be sane to enable
log_lock_waits by default if we don't separate them?
I think it would be; I have not seen people change the value of
deadlock_timeout so far, and I think 1s is a reasonably long time for a
default lock wait to be reported.
* Set log_lock_waits on by default. The folks on this thread seem to
support this idea, but given the lively discussion for enabling
log_checkpoints by default [0], I'm hesitant to commit something like
this without further community discussion.
I was, and remain, of the opinion that that was a bad idea that
we'll eventually revert, just like we previously got rid of most
inessential log chatter in the default configuration.
I somewhat agree here in the sense that log_checkpoints is really only
useful for heavily-used servers, but this is a digression; since
log_checkpoints emits log lines periodically while log_lock_waits only
emits them for application conflicts (and arguably application bugs),
I do not think those messages would be in the "inessential log chatter"
group, similar to how SQL errors are not in that group either.
Michael
On Thu, 2024-07-18 at 12:25 +0200, Michael Banck wrote:
this patch is still on the table, though for v18 now.
Nathan mentioned up-thread that he was hesitant to commit this without
further discussion. Laurenz, Stephen and I are +1 on this, but when it
comes to committers having chimed in, only Tom did so far and was -1.
Are there any others who have an opinion on this?
If we want to tally up, there were also +1 votes from Christoph Berg,
Shinya Kato, Nikolay Samokhvalov, Jeremy Schneider and Frédéric Yhuel.
The major criticism is Tom's that it might unduly increase the log volume.
I'm not trying to rush things, but I see little point in kicking a trivial
patch like this through many commitfests. If no committer wants to step
up and commit this, it should be rejected.
Yours,
Laurenz Albe
Re: Laurenz Albe
The major criticism is Tom's that it might unduly increase the log volume.
I'm not trying to rush things, but I see little point in kicking a trivial
patch like this through many commitfests. If no committer wants to step
up and commit this, it should be rejected.
That would be a pity, I still think log_lock_waits=on by default would
be a good thing.
I have not seen any server yet where normal, legitimate operation
would routinely trigger the message. (If it did, people should likely
have used SKIP LOCKED or NOWAIT instead.) It would only increase the
log volume on systems that have a problem.
Christoph
On Fri, Jul 19, 2024 at 9:24 AM Christoph Berg <myon@debian.org> wrote:
I have not seen any server yet where normal, legitimate operation
would routinely trigger the message. (If it did, people should likely
have used SKIP LOCKED or NOWAIT instead.) It would only increase the
log volume on systems that have a problem.
I've definitely seen systems where this setting would have generated
regular output, because I see a lot of systems that are chronically
overloaded. I think waits of more than 1 second for tuple locks could
be pretty routine on some servers -- or XID or VXID locks. So I'm more
cautious about this than most people on this thread: log_checkpoints
won't generate more than a few lines of output per checkpoint
interval, and a checkpoint cycle will be on the order of minutes, so
it's really never that much volume. On the other hand, in theory, this
setting can generate arbitrarily many messages.
That's why I originally proposed separating deadlock_timeout from
log_lock_waits, because I think 1s might actually end up being kind of
noisy for some people. On the other hand, I take Tom's point that
separating those things would be hard to code and, probably more
importantly, require a second timer.
I'm not strongly opposed to just turning this on by default. It's not
like we can't change our minds, and it's also not like individual
customers can't change the default. I think Tom has entirely the wrong
idea about what a typical log file on a production server looks like.
In every case I've seen, it's full of application generated errors
that will never be fixed, connection logging, statement logging, and
other stuff that is there just in case but will normally be ignored.
Finding the messages that indicate real database problems is typically
quite difficult, even if they're all enabled. If they're disabled by
default, well then the useless crap is the ONLY thing you find in the
log file, and when the customer has a problem, the first thing you
have to do is tell them to turn on all the GUCs that log the actually
important stuff and wait until the problem recurs.
I have yet to run into a customer who was thrilled about receiving that message.
--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Robert Haas
I've definitely seen systems where this setting would have generated
regular output, because I see a lot of systems that are chronically
overloaded.
I'd argue this is exactly what I mean by "this system has a problem".
Telling the user about that makes sense.
cautious about this than most people on this thread: log_checkpoints
won't generate more than a few lines of output per checkpoint
interval, and a checkpoint cycle will be on the order of minutes, so
it's really never that much volume. On the other hand, in theory, this
setting can generate arbitrarily many messages.
Well, it's still limited by 1 message per second (times
max_connections). It won't suddenly fill up the server with 1000
messages per second.
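To put that bound in numbers (a back-of-the-envelope sketch, not a measurement): with deadlock_timeout at its default of 1s, each backend can emit at most one such message per second, so the worst case scales linearly with max_connections:

```python
# Back-of-the-envelope worst case for log_lock_waits volume:
# each backend can log at most one "still waiting" message per
# deadlock_timeout interval (1 s by default).
def worst_case_lines_per_hour(max_connections: int,
                              deadlock_timeout_s: float = 1.0) -> int:
    return int(max_connections * 3600 / deadlock_timeout_s)

if __name__ == "__main__":
    # Default max_connections = 100: at most 360000 lines per hour,
    # and only if every single backend is stuck the entire time.
    print(worst_case_lines_per_hour(100))
```

Even that figure assumes the pathological case where every backend is blocked continuously, which is itself the problem worth logging.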
The log volume is the lesser of the problems. Not printing the message
just because the system does have a problem isn't the right fix.
Finding the messages that indicate real database problems is typically
quite difficult, even if they're all enabled. If they're disabled by
default, well then the useless crap is the ONLY thing you find in the
log file, and when the customer has a problem, the first thing you
have to do is tell them to turn on all the GUCs that log the actually
important stuff and wait until the problem recurs.I have yet to run into a customer who was thrilled about receiving that message.
Let's fix the default. People who have a problem can still disable it,
but then everyone else gets the useful messages in the first iteration.
Christoph
Are there any others who have an opinion on this?
Big +1 to having it on by default. It's already one of the first things I
turn on by default on any system I come across. The log spam is minimal,
compared to all the other stuff that ends up in there. And unlike most of
that stuff, this is output you generally want and need, when problems start
occurring.
Cheers,
Greg
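For reference, turning it on (or back off) needs no restart; a standard sketch on any running cluster:

```sql
-- Log sessions that wait longer than deadlock_timeout (1s by default)
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();
-- Verify the effective setting:
SHOW log_lock_waits;
```

Since it is PGC_SUSET, it can also be changed per session by superusers and suitably privileged roles.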
Late to the party, but +1 from me on this default change also
On Fri, 19 Jul 2024 at 11:54, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-07-18 at 12:25 +0200, Michael Banck wrote:
this patch is still on the table, though for v18 now.
Nathan mentioned up-thread that he was hesitant to commit this without
further discussion. Laurenz, Stephen and I are +1 on this, but when it
comes to committers having chimed in only Tom did so far and was -1.
Are there any others who have an opinion on this?
If we want to tally up, there were also +1 votes from Christoph Berg,
Shinya Kato, Nikolay Samokhvalov, Jeremy Schneider and Frédéric Yhuel.
The major criticism is Tom's that it might unduly increase the log volume.
I'm not trying to rush things, but I see little point in kicking a trivial
patch like this through many commitfests. If no committer wants to step
up and commit this, it should be rejected.
Yours,
Laurenz Albe
On Fri, Jul 19, 2024 at 10:22 AM Christoph Berg <myon@debian.org> wrote:
I'd argue this is exactly what I mean by "this system has a problem".
Telling the user about that makes sense.
That's a fair position.
cautious about this than most people on this thread: log_checkpoints
won't generate more than a few lines of output per checkpoint
interval, and a checkpoint cycle will be on the order of minutes, so
it's really never that much volume. On the other hand, in theory, this
setting can generate arbitrarily many messages.
Well, it's still limited by 1 message per second (times
max_connections). It won't suddenly fill up the server with 1000
messages per second.
You make it sound like running with max_connections=5000 is a bad idea.
(That was a joke, but yes, people actually do this, and no, it doesn't go well.)
The log volume is the lesser of the problems. Not printing the message
just because the system does have a problem isn't the right fix.
Yeah.
Let's fix the default. People who have a problem can still disable it,
but then everyone else gets the useful messages in the first iteration.
Reasonable.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Jul 19, 2024 at 5:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jul 19, 2024 at 10:22 AM Christoph Berg <myon@debian.org> wrote:
[..]
Let's fix the default. People who have a problem can still disable it,
but then everyone else gets the useful messages in the first iteration.
Reasonable.
I have a feeling that we have three disconnected classes of uses:
1. dev/testing DBs: where frankly speaking nobody cares about such DBs
until they stop/crash; this also includes DBs from new users on dev
laptops too
2. production systems: where it matters to have log_lock_waits=on (and
people may start getting nervous if they don't have it when the issue
strikes)
3. PG on embedded hardware, where it would be worth keeping it
disabled so as not to consume scarce resources
I would like to +1 too the default value of log_lock_waits=on, due
to mostly working near use case #2, and because, to my surprise,
we had __ 74.7% __ of individual installations already having it
set to 'on' within last year's support reports here at EDB (that
may be biased towards the class of systems in #2).
But I could be easily convinced that it is the embedded space
(#3) that has the biggest number of default installations, so we
should stick to log_lock_waits=off by default. However, I believe that
such specialized use of PG might already require some "customizations"
anyway, e.g. to further reduce shared_buffers, right?
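For illustration, such an embedded deployment typically already ships a trimmed-down configuration along these lines (all values here are hypothetical examples, not recommendations), so opting back out of the proposed default is one more line:

```
# hypothetical postgresql.conf for a small embedded system
shared_buffers = 16MB
max_connections = 10
log_lock_waits = off        # opt back out of the proposed default
```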
I would also like to believe that if people try PostgreSQL for
the first time (use case #1), they would be much better served if
the log contained info about stuck sessions.
Also, if the default is ever changed, it should be called out in the
release notes to warn people (especially those in the embedded
space?).
-J.
Re: Jakub Wartak
1. dev/testing DBs: where frankly speaking nobody cares about such DBs
until they stop/crash; this also includes DBs from new users on dev
laptops too
2. production systems: where it matters to have log_lock_waits=on (and
people may start getting nervous if they don't have it when the issue
strikes)
3. PG on embedded hardware, where it would be worth keeping it
disabled so as not to consume scarce resources
I would like to +1 too the default value of log_lock_waits=on, due
to mostly working near use case #2, and because, to my surprise,
we had __ 74.7% __ of individual installations already having it
set to 'on' within last year's support reports here at EDB (that
may be biased towards the class of systems in #2).
Ack. Thanks for that field data.
But I could be easily convinced that it is the embedded space
(#3) that has the biggest number of default installations, so we
should stick to log_lock_waits=off by default. However, I believe that
such specialized use of PG might already require some "customizations"
anyway, e.g. to further reduce shared_buffers, right?
The ship "no log spam by default" has definitely sailed since
log_checkpoints defaults to 'on'.
I would also like to believe that if people try PostgreSQL for
the first time (use case #1), they would be much better served if
the log contained info about stuck sessions.
Definitely.
Christoph
On Fri, 2024-07-19 at 11:54 +0200, Laurenz Albe wrote:
On Thu, 2024-07-18 at 12:25 +0200, Michael Banck wrote:
this patch is still on the table, though for v18 now.
Nathan mentioned up-thread that he was hesitant to commit this without
further discussion. Laurenz, Stephen and I are +1 on this, but when it
comes to committers having chimed in only Tom did so far and was -1.
Are there any others who have an opinion on this?
If we want to tally up, there were also +1 votes from Christoph Berg,
Shinya Kato, Nikolay Samokhvalov, Jeremy Schneider and Frédéric Yhuel.The major criticism is Tom's that it might unduly increase the log volume.
I'm not trying to rush things, but I see little point in kicking a trivial
patch like this through many commitfests. If no committer wants to step
up and commit this, it should be rejected.
Well, it didn't make v18.
But this patch got enough votes in favor that I am not going to withdraw it.
Yours,
Laurenz Albe
Hi,
On Tue, Apr 8, 2025 at 11:33 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Well, it didn't make v18.
But this patch got enough votes in favor that I am not going to withdraw it.
Thank you for discussing it.
diff --git a/src/backend/utils/misc/guc_tables.c
b/src/backend/utils/misc/guc_tables.c
index f7c9882f7c..1f3d56e6ee 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1519,7 +1519,7 @@ struct config_bool ConfigureNamesBool[] =
NULL
},
&log_lock_waits,
- false,
+ true,
NULL, NULL, NULL
},
{
Due to 635998965 [0], you need to change guc_parameters.dat instead of
guc_tables.c.
[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=63599896545c7869f7dd28cd593e8b548983d613
--
Best regards,
Shinya Kato
NTT OSS Center
On Thu, 2025-09-04 at 14:12 +0900, Shinya Kato wrote:
Due to 635998965 [0], you need to change guc_parameters.dat instead of
guc_tables.c.
Thanks for pointing that out to me!
Here is an updated patch.
Yours,
Laurenz Albe
Attachments:
v2-0001-Default-to-log_lock_waits-on.patchtext/x-patch; charset=UTF-8; name=v2-0001-Default-to-log_lock_waits-on.patchDownload
From aa04f6cf6e7c89f1107ffc6eb661da65082459fe Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Sun, 7 Sep 2025 23:44:38 +0200
Subject: [PATCH v2] Default to log_lock_waits=on
If someone is stuck behind a lock for more than a second,
that is almost always a problem that is worth a log entry.
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Christoph Berg <myon@debian.org>
Reviewed-By: Stephen Frost <sfrost@snowman.net>
Discussion: https://postgr.es/m/b8b8502915e50f44deb111bc0b43a99e2733e117.camel%40cybertec.at
---
doc/src/sgml/config.sgml | 2 +-
src/backend/storage/lmgr/proc.c | 2 +-
src/backend/utils/misc/guc_parameters.dat | 2 +-
src/backend/utils/misc/postgresql.conf.sample | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2a3685f474a..3c33d5d0fbc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7929,7 +7929,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
Controls whether a log message is produced when a session waits
longer than <xref linkend="guc-deadlock-timeout"/> to acquire a
lock. This is useful in determining if lock waits are causing
- poor performance. The default is <literal>off</literal>.
+ poor performance. The default is <literal>on</literal>.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e9ef0fbfe32..96f29aafc39 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -60,7 +60,7 @@ int LockTimeout = 0;
int IdleInTransactionSessionTimeout = 0;
int TransactionTimeout = 0;
int IdleSessionTimeout = 0;
-bool log_lock_waits = false;
+bool log_lock_waits = true;
/* Pointer to this process's PGPROC struct, if any */
PGPROC *MyProc = NULL;
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0da01627cfe..6bc6be13d2a 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -566,7 +566,7 @@
{ name => 'log_lock_waits', type => 'bool', context => 'PGC_SUSET', group => 'LOGGING_WHAT',
short_desc => 'Logs long lock waits.',
variable => 'log_lock_waits',
- boot_val => 'false',
+ boot_val => 'true',
},
{ name => 'log_lock_failures', type => 'bool', context => 'PGC_SUSET', group => 'LOGGING_WHAT',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 26c08693564..c36fcb9ab61 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -624,7 +624,7 @@
# processes
# %% = '%'
# e.g. '<%u%%%d> '
-#log_lock_waits = off # log lock waits >= deadlock_timeout
+#log_lock_waits = on # log lock waits >= deadlock_timeout
#log_lock_failures = off # log lock failures
#log_recovery_conflict_waits = off # log standby recovery conflict waits
# >= deadlock_timeout
--
2.51.0
On Sep 8, 2025, at 05:46, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2025-09-04 at 14:12 +0900, Shinya Kato wrote:
Due to 635998965 [0], you need to change guc_parameters.dat instead of
guc_tables.c.
Thanks for pointing that out to me!
Here is an updated patch.
Yours,
Laurenz Albe
<v2-0001-Default-to-log_lock_waits-on.patch>
I am hesitant to support this change. I agree that log_lock_waits is a powerful tool, but it is more of a diagnostic tool. On a high-concurrency system with many transactions running at the same time, it may quickly fill up the log files and make it difficult to find other errors. So we usually only turn it on when needed, and turn it off once debugging is done.
But v2 code looks good to me.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Mon, 2025-09-08 at 10:38 +0800, Chao Li wrote:
I am hesitant to support this change. I agree that log_lock_waits is a powerful tool,
but it is more of a diagnostic tool. On a high-concurrency system with many transactions
running at the same time, it may quickly fill up the log files and make it difficult
to find other errors. So we usually only turn it on when needed, and turn it off
once debugging is done.
You still can do that - only the default changed.
I think that if your log fills up with messages indicating that sessions are stuck
behind a lock for more than one second, and it is a high concurrency system, you
have got a severe problem that you want to know about.
What I argue is that while log_lock_waits=on is not the correct setting for
everybody, it is the correct setting for the vast majority of databases.
Yours,
Laurenz Albe
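For context, the output being discussed is one LOG line per waiting session per deadlock_timeout interval, roughly of this shape (illustrative only; the exact wording comes from proc.c and the DETAIL/STATEMENT lines depend on the situation):

```
LOG:  process 12345 still waiting for ShareLock on transaction 987654 after 1000.365 ms
DETAIL:  Process holding the lock: 12340. Wait queue: 12345.
STATEMENT:  UPDATE accounts SET balance = balance - 100 WHERE id = 42;
LOG:  process 12345 acquired ShareLock on transaction 987654 after 2342.657 ms
```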
On 07.09.25 23:46, Laurenz Albe wrote:
On Thu, 2025-09-04 at 14:12 +0900, Shinya Kato wrote:
Due to 635998965 [0], you need to change guc_parameters.dat instead of
guc_tables.c.
Thanks for pointing that out to me!
Here is an updated patch.
committed