control max length of parameter values logged
Hello
Patch ba79cb5 (for full discussion see [1]/messages/by-id/0146a67b-a22a-0519-9082-bc29756b93a2@imap.cc) introduces a feature to log
bind parameter values on error,
which greatly helps to reproduce errors artificially having only server
log -- thanks everyone who
reviewed and improved it!
However, it cuts the values, as some of the reviewers were worried log
could fill up too quickly.
This applies both to the new case of logging parameter values and to the
existing ones due to
log_min_duration_statement or log_statement.
This is a backwards-incompatible change, and also ruins the idea of
reproducible errors -- sorry Tom
I failed to second this idea [2]/messages/by-id/11425.1575927321@sss.pgh.pa.us in time, before the change was pushed.
I personally don't think that we necessarily need to cut the values, we
can rely on the users
being careful when using this feature -- in the same way we trusted them
use similarly dangerous
log_min_duration_statement and especially log_statement for ages. At
least it's better than having
no option to disable it. Alvaro's opinion was different [3]/messages/by-id/20191209200531.GA19848@alvherre.pgsql. What do you
think
of adding a parameter to limit max logged parameter length? See patch
attached.
Best, Alex
[1]: /messages/by-id/0146a67b-a22a-0519-9082-bc29756b93a2@imap.cc
[2]: /messages/by-id/11425.1575927321@sss.pgh.pa.us
[3]: /messages/by-id/20191209200531.GA19848@alvherre.pgsql
Attachments:
log_parameter_max_length_v001.patchtext/x-patch; charset=UTF-8; name=log_parameter_max_length_v001.patchDownload+55-9
Alexey Bashtanov <bashtanov@imap.cc> writes:
I personally don't think that we necessarily need to cut the values, we
can rely on the users
being careful when using this feature -- in the same way we trusted them
use similarly dangerous
log_min_duration_statement and especially log_statement for ages. At
least it's better than having
no option to disable it. Alvaro's opinion was different [3]. What do you
think
of adding a parameter to limit max logged parameter length? See patch
attached.
This patch is failing to build docs (per the cfbot) and it also fails
check-world because you changed behavior tested by ba79cb5dc's test case.
Attached is an update that hopefully will make the cfbot happy.
I agree that something ought to be done here, but I'm not sure that
this is exactly what. It appears to me that there are three related
but distinct behaviors under discussion:
1. Truncation of bind parameters returned to clients in error message
detail fields.
2. Truncation of bind parameters written to the server log in logged
error messages.
3. Truncation of bind parameters written to the server log in non-error
statement logging actions (log_statement and variants).
Historically we haven't truncated any of these, I believe. As of
ba79cb5dc we forcibly truncate #1 and #2 at 64 bytes, but not #3.
Your patch proposes to provide a SUSET GUC that controls the
truncation length for all 3.
I think that the status quo as of ba79cb5dc is entirely unacceptable,
because there is no recourse if you want to find out why a statement
is failing and the reason is buried in some long parameter string.
However, this patch doesn't really fix it, because it still seems
pretty annoying that you need to be superuser to adjust what gets
sent back to the client. Maybe that isn't a problem in practice
(since the client presumably has the original parameter value laying
about), but it seems conceptually wrong.
On the other hand, that line of reasoning leads to wanting two
separate GUCs (one for #1 and one for the other two), which seems
like overkill, plus it's going to be hard/expensive to have the
outputs for #1 and #2 not be the same.
I do agree that it seems weird and random (not to say 100% backward)
that error cases provide only truncated values but routine query
logging insists on emitting full untruncated values. I should think
that the most common use-cases would want it the other way round.
So I feel like we'd better resolve these definitional questions about
what behavior we actually want. I agree that ba79cb5dc is not
terribly well thought out as it stands.
regards, tom lane
Attachments:
log_parameter_max_length_v2.patchtext/x-diff; charset=us-ascii; name=log_parameter_max_length_v2.patchDownload+60-12
On 2020-Mar-10, Tom Lane wrote:
I agree that something ought to be done here, but I'm not sure that
this is exactly what. It appears to me that there are three related
but distinct behaviors under discussion:1. Truncation of bind parameters returned to clients in error message
detail fields.
2. Truncation of bind parameters written to the server log in logged
error messages.
3. Truncation of bind parameters written to the server log in non-error
statement logging actions (log_statement and variants).Historically we haven't truncated any of these, I believe. As of
ba79cb5dc we forcibly truncate #1 and #2 at 64 bytes, but not #3.
Your patch proposes to provide a SUSET GUC that controls the
truncation length for all 3.
The reason I didn't change the other uses was precisely that it was
established behavior, but my opinion was that truncating them would
be better, now that we have code to handle doing so.
Maybe it would make sense to always log complete parameters for error
cases when that feature is enabled, and have the GUC only control the
lengths logged for non-error cases?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Maybe it would make sense to always log complete parameters for error
cases when that feature is enabled, and have the GUC only control the
lengths logged for non-error cases?
I could get behind that. It's a bit different from the original
idea here, but I think it's closer to being real-world-useful.
Another way to slice this up would be to have a USERSET GUC that
controls truncation of parameter values in errors, and a separate
SUSET GUC that controls it for the non-error statement logging
cases. I'm not sure how much that's actually worth, but if we
feel that truncation in error cases can be useful, that's how
I'd vote to expose it.
regards, tom lane
On 2020-Mar-11, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Maybe it would make sense to always log complete parameters for error
cases when that feature is enabled, and have the GUC only control the
lengths logged for non-error cases?I could get behind that. It's a bit different from the original
idea here, but I think it's closer to being real-world-useful.Another way to slice this up would be to have a USERSET GUC that
controls truncation of parameter values in errors, and a separate
SUSET GUC that controls it for the non-error statement logging
cases. I'm not sure how much that's actually worth, but if we
feel that truncation in error cases can be useful, that's how
I'd vote to expose it.
Either of these ideas work for me. I think I like the latter more,
since it allows to configure truncation in all cases. (I'm not really
sure I understand why one of them must be SUSET.)
The reason I'm so hot about parameter truncation is that we've seen
cases where customers' log files contain log lines many megabytes long
because of gigantic parameters; UUID arrays with tens of thousands of
entries, and such. Sometimes we see those in the normal "statement"
line because $customer interpolates into the query literal; normally the
"solution" is to move the params from interpolated into a parameter.
But if we log all parameters whole, that workaround no longer works, so
a way to clip is necessary.
I agree that truncating the value that can be disabled while not
truncating the values that cannot be disabled, is a bit silly.
I'm okay with the default being not to clip anything.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-Mar-11, Tom Lane wrote:
Another way to slice this up would be to have a USERSET GUC that
controls truncation of parameter values in errors, and a separate
SUSET GUC that controls it for the non-error statement logging
cases. I'm not sure how much that's actually worth, but if we
feel that truncation in error cases can be useful, that's how
I'd vote to expose it.
Either of these ideas work for me. I think I like the latter more,
since it allows to configure truncation in all cases. (I'm not really
sure I understand why one of them must be SUSET.)
We generally suppose that GUCs that control statement logging should be
SUSET, so that unprivileged users don't get to hide their activity from
the log. On the other hand, I think it's okay for error logging (as
opposed to statement tracing) to be under user control, because the user
can simply avoid or trap an error if he doesn't want it to be logged.
The reason I'm so hot about parameter truncation is that we've seen
cases where customers' log files contain log lines many megabytes long
because of gigantic parameters; UUID arrays with tens of thousands of
entries, and such. Sometimes we see those in the normal "statement"
line because $customer interpolates into the query literal; normally the
"solution" is to move the params from interpolated into a parameter.
But if we log all parameters whole, that workaround no longer works, so
a way to clip is necessary.
Agreed, it seems like there's a fairly compelling case for being
able to clip.
I'm okay with the default being not to clip anything.
Also agreed. It's been like it is for a long time with not that
many complaints, so the case for changing the default behavior
seems a bit weak.
Barring other opinions, I think we have consensus here on what
to do. Alexey, will you update your patch?
regards, tom lane
On Thu, Mar 12, 2020 at 12:01:24PM -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
The reason I'm so hot about parameter truncation is that we've seen
cases where customers' log files contain log lines many megabytes long
because of gigantic parameters; UUID arrays with tens of thousands of
entries, and such. Sometimes we see those in the normal "statement"
line because $customer interpolates into the query literal; normally the
"solution" is to move the params from interpolated into a parameter.
But if we log all parameters whole, that workaround no longer works, so
a way to clip is necessary.Agreed, it seems like there's a fairly compelling case for being
able to clip.I'm okay with the default being not to clip anything.
Also agreed. It's been like it is for a long time with not that
many complaints, so the case for changing the default behavior
seems a bit weak.Barring other opinions, I think we have consensus here on what
to do. Alexey, will you update your patch?
I am sorry --- I am confused. Why are we truncating or allowing control
of truncation of BIND parameter values, but have no such facility for
queries. Do we assume queries are shorter than BIND parameters, or is
it just that it is easier to trim BIND parameters than values embedded
in non-EXECUTE queries.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Bruce Momjian <bruce@momjian.us> writes:
I am sorry --- I am confused. Why are we truncating or allowing control
of truncation of BIND parameter values, but have no such facility for
queries. Do we assume queries are shorter than BIND parameters, or is
it just that it is easier to trim BIND parameters than values embedded
in non-EXECUTE queries.
The cases that Alvaro was worried about were enormous values supplied
via bind parameters. We haven't heard comparable complaints about
the statement text. Also, from a security standpoint, the contents
of the statement text are way more critical than the contents of
an out-of-line parameter; you can't do SQL injection from the latter.
So I think the audience for trimming would be a lot smaller for
statement-text trimming.
regards, tom lane
On 2020-Mar-14, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
I am sorry --- I am confused. Why are we truncating or allowing control
of truncation of BIND parameter values, but have no such facility for
queries. Do we assume queries are shorter than BIND parameters, or is
it just that it is easier to trim BIND parameters than values embedded
in non-EXECUTE queries.The cases that Alvaro was worried about were enormous values supplied
via bind parameters. We haven't heard comparable complaints about
the statement text.
To be more precise, I have seen cases of enormous statement text, but
those are fixed precisely by moving the bulk to parameters. So the
ability to trim the parameter is important. I've never seen a very
large query without the bulk being parameterizable.
Also, from a security standpoint, the contents
of the statement text are way more critical than the contents of
an out-of-line parameter; you can't do SQL injection from the latter.
That's a good point too.
So I think the audience for trimming would be a lot smaller for
statement-text trimming.
Nod. (I think if we really wanted to trim queries, it would have to be
something semantically sensible, not just trim whatever is at the end of
the statement literal. Say, only trim parts of the where clause that
are of the form "something op constant", and rules like that, plus put
placeholders to show that they were there. This sounds a lot of work to
figure out usefully ...)
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
BIND parameter truncation is good to me.
Logs could be very hard to read due to
the very long parameters recorded.
Now, parameter is extracuted from left.
e.g. "AAAA-BBBB-CCCC-DDDD-EEEE" to "AAAA-BBBB-CCCC..."
Is not necessary from right?
e.g. "AAAA-BBBB-CCCC-DDDD-EEEE" to "...CCCC-DDDD-EEEE"
It is sometimes nice to be able to check
the last of strings. For example, if there
are difference only in the last of strings
of many parameters.
Best Regards,
Keisuke Kuroda
On Sun, Mar 15, 2020 at 08:48:33PM -0300, Alvaro Herrera wrote:
On 2020-Mar-14, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
I am sorry --- I am confused. Why are we truncating or allowing control
of truncation of BIND parameter values, but have no such facility for
queries. Do we assume queries are shorter than BIND parameters, or is
it just that it is easier to trim BIND parameters than values embedded
in non-EXECUTE queries.The cases that Alvaro was worried about were enormous values supplied
via bind parameters. We haven't heard comparable complaints about
the statement text.To be more precise, I have seen cases of enormous statement text, but
those are fixed precisely by moving the bulk to parameters. So the
ability to trim the parameter is important. I've never seen a very
large query without the bulk being parameterizable.
I don't claim our use is a common case or a good example but I'm going to offer
a data point.
We have very long query strings, even while using bind parameters.
Our loader process uses upsert and prepared statements.
So we might run: INSERT INTO t (k1,k2,a,b,...) VALUES($1,$2,$3,$4)
ON CONFLICT(k1,k2) DO UPDATE SET a=excluded.a,b=excluded.b
..which is fine, but we also have large number of columns - historically up to
1600. If a query fails, the error might be a query string 2+ pages long.
Looks like we have common cases (and executed many times) with:
24k long message and 86k long param string
70k long message and 10k long param string
Having full log on error is important, more to the client but also in the
server log. But it would be nice if we could reduce the server logs. Most of
the prepare string is of little value if there's no error: VALUES ($1,$2,$3,)
(but prepared query is at least better than repeating the query string).
Related: a year ago, I wrote about the repetition of the "PREPARE" statement in
query logs.
/messages/by-id/20190208132953.GF29720@telsasoft.com
Ultimately I withdrew that patch and switched to log_statement_min_duration.
Nod. (I think if we really wanted to trim queries, it would have to be
something semantically sensible, not just trim whatever is at the end of
the statement literal.
If it were easy, I would truncate query strings to a few hundred bytes.
--
Justin
Also agreed. It's been like it is for a long time with not that
many complaints, so the case for changing the default behavior
seems a bit weak.Barring other opinions, I think we have consensus here on what
to do. Alexey, will you update your patch?
Sorry for the delay, please could you have a look?
Best, Alex
Attachments:
log_parameter_max_length_v3.patchtext/x-patch; charset=UTF-8; name=log_parameter_max_length_v3.patchDownload+134-12
Alexey Bashtanov <bashtanov@imap.cc> writes:
Sorry for the delay, please could you have a look?
Got it, will look tomorrow. (I think it's important to get this
done for v13, before we ship this behavior.)
regards, tom lane
Hi,
On Wed, Apr 01, 2020 at 01:52:48AM +0100, Alexey Bashtanov wrote:
+++ b/doc/src/sgml/config.sgml + <varlistentry id="guc-log-parameter-max-length" xreflabel="log_parameter_max_length"> + <term><varname>log_parameter_max_length</varname> (<type>integer</type>) + <indexterm> + <primary><varname>log_parameter_max_length</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + If greater than zero, bind parameter values reported in non-error + statement-logging messages are trimmed to no more than this many bytes.
Can I suggest to say:
"Limit bind parameter values reported by non-error statement-logging messages
to this many bytes". Or,
"The maximum length of bind parameter values to log with non-error
statement-logging messages".
--- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2855,6 +2857,28 @@ static struct config_int ConfigureNamesInt[] = NULL, NULL, NULL },+ { + {"log_parameter_max_length", PGC_SUSET, LOGGING_WHAT, + gettext_noop("When logging statements, limit logged parameter values to first N bytes."), + gettext_noop("Zero to print values in full."),
Could you make zero a normal value and -1 the "special" value to disable
trimming ?
Setting to zero will avoid displaying parameters at all, setting to -1 wil
display values in full.
Cheers,
--
Justin
Hi,
+ If greater than zero, bind parameter values reported in non-error + statement-logging messages are trimmed to no more than this many bytes.Can I suggest to say:
"Limit bind parameter values reported by non-error statement-logging messages
to this many bytes". Or,"The maximum length of bind parameter values to log with non-error
statement-logging messages".
Okay I'll rephrase.
Could you make zero a normal value and -1 the "special" value to disable
trimming ?Setting to zero will avoid displaying parameters at all, setting to -1 wil
display values in full.
I can, but then for the sake of consistency I'll have to do the same for
log_parameter_max_length.
Then we'll end up with two ways to enable/disable parameter logging on
error:
using the primary boolean setting and setting length to 0.
One of them will require superuser privileges, the other one won't.
Do you think it's okay? I have no objections but I'm a bit worried
someone may find it clumsy.
Best, Alex
On Wed, Apr 01, 2020 at 10:10:55AM +0100, Alexey Bashtanov wrote:
Hi,
+ If greater than zero, bind parameter values reported in non-error + statement-logging messages are trimmed to no more than this many bytes.Can I suggest to say:
"Limit bind parameter values reported by non-error statement-logging messages
to this many bytes". Or,"The maximum length of bind parameter values to log with non-error
statement-logging messages".Okay I'll rephrase.
Could you make zero a normal value and -1 the "special" value to disable
trimming ?Setting to zero will avoid displaying parameters at all, setting to -1 wil
display values in full.I can, but then for the sake of consistency I'll have to do the same for
log_parameter_max_length.
Then we'll end up with two ways to enable/disable parameter logging on
error:
using the primary boolean setting and setting length to 0.
One of them will require superuser privileges, the other one won't.
I guess you're referring to log_parameters_on_error.
Does it have to be SUSET ?
Or maybe log_parameters_on_error doesn't need to exist at all, and setting
log_parameter_max_length=0 can be used to disable parameter logging.
I showed up late to this thread, so let's see what others think.
--
Justin
Justin Pryzby <pryzby@telsasoft.com> writes:
On Wed, Apr 01, 2020 at 10:10:55AM +0100, Alexey Bashtanov wrote:
Could you make zero a normal value and -1 the "special" value to disable
trimming ?
I can, but then for the sake of consistency I'll have to do the same for
log_parameter_max_length.
Then we'll end up with two ways to enable/disable parameter logging on
error:
using the primary boolean setting and setting length to 0.
One of them will require superuser privileges, the other one won't.
I guess you're referring to log_parameters_on_error.
Does it have to be SUSET ?
Or maybe log_parameters_on_error doesn't need to exist at all, and setting
log_parameter_max_length=0 can be used to disable parameter logging.
I showed up late to this thread, so let's see what others think.
I think Justin's got a point: defining zero this way is weirdly
inconsistent. -1, being clearly outside the domain of possible
length limits, makes more sense as a marker for "don't trim".
Alexey's right that having a separate boolean flag is pointless, but
I think we could just drop the boolean; we haven't shipped that yet.
The privilege argument seems irrelevant to me. We already decided
that the plan is (a) SUSET for non-error statement logging purposes and
(b) USERSET for logging caused by errors, and that would have to apply
to length limits as well as enable/disable ability. Otherwise a user
could pretty effectively disable logging by setting the length to 1.
regards, tom lane
Hi,
The privilege argument seems irrelevant to me. We already decided
that the plan is (a) SUSET for non-error statement logging purposes and
(b) USERSET for logging caused by errors, and that would have to apply
to length limits as well as enable/disable ability. Otherwise a user
could pretty effectively disable logging by setting the length to 1.
The only privilege that user can gain if we drop the boolean is to
*enable* logging parameters on error.
That gives user a little bit easier way to fill up the disk with logs,
but they anyway can do that if they want to.
If that's okay with everyone, please see the new version attached.
Best, Alex
Attachments:
log_parameter_max_length_v4.patchtext/x-patch; charset=UTF-8; name=log_parameter_max_length_v4.patchDownload+171-44
Thanks for updating the patch.
On Thu, Apr 02, 2020 at 01:29:04AM +0100, Alexey Bashtanov wrote:
+ If greater than zero, bind parameter values reported in non-error + statement-logging messages are trimmed to no more than this many bytes. + If this value is specified without units, it is taken as bytes. + Zero disables logging parameters with statements. + <literal>-1</literal> (the default) makes parameters logged in full.
say: "..causes parameters to be logged in full".
Also..I just realized that this truncates *each* parameter, rather than
truncating the parameter list.
So say: "
|If greater than zero, each bind parameter value reported in a non-error
|statement-logging messages is trimmed to no more than this number of bytes.
+ Non-zero values add some overhead, as + <productname>PostgreSQL</productname> will compute and store textual + representations of parameter values in memory for all statements, + even if they eventually do not get logged.
say: "even if they are ultimately not logged"
+++ b/src/backend/nodes/params.c @@ -280,6 +280,7 @@ BuildParamLogString(ParamListInfo params, char **knownTextValues, int maxlen) oldCxt; StringInfoData buf;+ Assert(maxlen == -1 || maxlen > 0);
You can write: >= -1
- if (log_parameters_on_error) + if (log_parameter_max_length_on_error != 0)
I would write this as >= 0
+ if (log_parameter_max_length_on_error > 0) + { + /* + * We can trim the saved string, knowing that we + * won't print all of it. But we must copy at + * least two more full characters than + * BuildParamLogString wants to use; otherwise it + * might fail to include the trailing ellipsis. + */ + knownTextValues[paramno] = + pnstrdup(pstring, + log_parameter_max_length_on_error + + 2 * MAX_MULTIBYTE_CHAR_LEN);
The comment says we need at least 2 chars, but
log_parameter_max_length_on_error might be 1, so I think you can either add 64
byte fudge factor, like before, or do Max(log_parameter_max_length_on_error, 2).
+ } + else + knownTextValues[paramno] = pstrdup(pstring);
I suggest to handle the "== -1" case first and the > 0 case as "else".
Thanks,
--
Justin
Hi,
Please see the new version attached.
+ If greater than zero, bind parameter values reported in non-error + statement-logging messages are trimmed to no more than this many bytes. + If this value is specified without units, it is taken as bytes. + Zero disables logging parameters with statements. + <literal>-1</literal> (the default) makes parameters logged in full.say: "..causes parameters to be logged in full".
Also..I just realized that this truncates *each* parameter, rather than
truncating the parameter list.So say: "
|If greater than zero, each bind parameter value reported in a non-error
|statement-logging messages is trimmed to no more than this number of bytes.
okay
I also changed "trimmed to no more than this many bytes" to "trimmed to
this many bytes".
It's not that pedantic any more but hopefully less awkward.
+ Non-zero values add some overhead, as + <productname>PostgreSQL</productname> will compute and store textual + representations of parameter values in memory for all statements, + even if they eventually do not get logged.say: "even if they are ultimately not logged"
okay
+++ b/src/backend/nodes/params.c @@ -280,6 +280,7 @@ BuildParamLogString(ParamListInfo params, char **knownTextValues, int maxlen) oldCxt; StringInfoData buf;+ Assert(maxlen == -1 || maxlen > 0);
You can write: >= -1
no, I'm specifically checking they don't pass 0
- if (log_parameters_on_error) + if (log_parameter_max_length_on_error != 0)I would write this as >= 0
no, I'm specifically checking for both positives and -1
+ if (log_parameter_max_length_on_error > 0) + { + /* + * We can trim the saved string, knowing that we + * won't print all of it. But we must copy at + * least two more full characters than + * BuildParamLogString wants to use; otherwise it + * might fail to include the trailing ellipsis. + */ + knownTextValues[paramno] = + pnstrdup(pstring, + log_parameter_max_length_on_error + + 2 * MAX_MULTIBYTE_CHAR_LEN);The comment says we need at least 2 chars, but
log_parameter_max_length_on_error might be 1, so I think you can either add 64
byte fudge factor, like before, or do Max(log_parameter_max_length_on_error, 2).
That's the code I reused without deep analysis TBH.
I think it's mostly for to allocate the space for the ellipsis in case
it needs to be added,
not to copy any actual characters, that's why we add.
+ } + else + knownTextValues[paramno] = pstrdup(pstring);I suggest to handle the "== -1" case first and the > 0 case as "else".
Good, as long as I thought of this too, I'm changing that.
Best, Alex