Log timestamps at higher resolution

Started by David Fetterabout 7 years ago16 messages
#1David Fetter
david@fetter.org
1 attachment(s)

Folks,

Per gripes I've been hearing with increasing frequency, please find
attached a patch that implements $Subject. It's microsecond resolution
because at least at the moment, nanosecond resolution doesn't appear
to be helpful in this context.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

0001-log_line_prefix-was-milliseconds-now-microseconds.patchtext/x-diff; charset=us-asciiDownload
From 29e620bbae9d458e5789350ed19d63c48b8459ff Mon Sep 17 00:00:00 2001
From: David Fetter <david@fetter.org>
Date: Tue, 23 Oct 2018 11:35:34 -0700
Subject: [PATCH] log_line_prefix was milliseconds: now microseconds
To: pgsql-hackers@postgresql.org

---
 doc/src/sgml/config.sgml                      |  6 +++---
 src/backend/utils/error/elog.c                | 14 +++++++-------
 src/backend/utils/misc/postgresql.conf.sample |  6 +++---
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7554cba3f9..8faaa35b86 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5593,17 +5593,17 @@ local0.*    /var/log/postgresql
             </row>
             <row>
              <entry><literal>%t</literal></entry>
-             <entry>Time stamp without milliseconds</entry>
+             <entry>Time stamp without microseconds</entry>
              <entry>no</entry>
             </row>
             <row>
              <entry><literal>%m</literal></entry>
-             <entry>Time stamp with milliseconds</entry>
+             <entry>Time stamp with microseconds</entry>
              <entry>no</entry>
             </row>
             <row>
              <entry><literal>%n</literal></entry>
-             <entry>Time stamp with milliseconds (as a Unix epoch)</entry>
+             <entry>Time stamp with microseconds (as a Unix epoch)</entry>
              <entry>no</entry>
             </row>
             <row>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index b9c11301ca..70a61fde4c 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2222,7 +2222,7 @@ static void
 setup_formatted_log_time(void)
 {
 	pg_time_t	stamp_time;
-	char		msbuf[13];
+	char		msbuf[16];
 
 	if (!saved_timeval_set)
 	{
@@ -2238,13 +2238,13 @@ setup_formatted_log_time(void)
 	 * nonempty or CSV mode can be selected.
 	 */
 	pg_strftime(formatted_log_time, FORMATTED_TS_LEN,
-	/* leave room for milliseconds... */
-				"%Y-%m-%d %H:%M:%S     %Z",
+	/* leave room for microseconds... */
+				"%Y-%m-%d %H:%M:%S        %Z",
 				pg_localtime(&stamp_time, log_timezone));
 
-	/* 'paste' milliseconds into place... */
-	sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000));
-	memcpy(formatted_log_time + 19, msbuf, 4);
+	/* 'paste' microseconds into place... */
+	sprintf(msbuf, ".%06d", saved_timeval.tv_usec);
+	memcpy(formatted_log_time + 19, msbuf, 7);
 }
 
 /*
@@ -2665,7 +2665,7 @@ write_csvlog(ErrorData *edata)
 	initStringInfo(&buf);
 
 	/*
-	 * timestamp with milliseconds
+	 * timestamp with microseconds
 	 *
 	 * Check if the timestamp is already calculated for the syslog message,
 	 * and use it if so.  Otherwise, get the current timestamp.  This is done
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4e61bc6521..dcea840b8f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -470,9 +470,9 @@
 					#   %r = remote host and port
 					#   %h = remote host
 					#   %p = process ID
-					#   %t = timestamp without milliseconds
-					#   %m = timestamp with milliseconds
-					#   %n = timestamp with milliseconds (as a Unix epoch)
+					#   %t = timestamp without microseconds
+					#   %m = timestamp with microseconds
+					#   %n = timestamp with microseconds (as a Unix epoch)
 					#   %i = command tag
 					#   %e = SQL state
 					#   %c = session ID
-- 
2.17.2

#2Thomas Munro
thomas.munro@enterprisedb.com
In reply to: David Fetter (#1)
Re: Log timestamps at higher resolution

On Wed, Oct 24, 2018 at 7:51 AM David Fetter <david@fetter.org> wrote:

Per gripes I've been hearing with increasing frequency, please find
attached a patch that implements $Subject. It's microsecond resolution
because at least at the moment, nanosecond resolution doesn't appear
to be helpful in this context.

Wouldn't you want to choose a new letter or some other way to make
existing format control strings do what they always did?

--
Thomas Munro
http://www.enterprisedb.com

#3David Fetter
david@fetter.org
In reply to: Thomas Munro (#2)
Re: Log timestamps at higher resolution

On Wed, Oct 24, 2018 at 08:00:24AM +1300, Thomas Munro wrote:

On Wed, Oct 24, 2018 at 7:51 AM David Fetter <david@fetter.org> wrote:

Per gripes I've been hearing with increasing frequency, please find
attached a patch that implements $Subject. It's microsecond resolution
because at least at the moment, nanosecond resolution doesn't appear
to be helpful in this context.

Wouldn't you want to choose a new letter or some other way to make
existing format control strings do what they always did?

I hadn't because I'd looked at the existing format as merely buggy in
lacking precision, although I guess with really fussy log processors,
this change could break things.

Have you seen processors like that in the wild?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Fetter (#3)
Re: Log timestamps at higher resolution

On 2018-Oct-23, David Fetter wrote:

On Wed, Oct 24, 2018 at 08:00:24AM +1300, Thomas Munro wrote:

On Wed, Oct 24, 2018 at 7:51 AM David Fetter <david@fetter.org> wrote:

Per gripes I've been hearing with increasing frequency, please find
attached a patch that implements $Subject. It's microsecond resolution
because at least at the moment, nanosecond resolution doesn't appear
to be helpful in this context.

Wouldn't you want to choose a new letter or some other way to make
existing format control strings do what they always did?

I hadn't because I'd looked at the existing format as merely buggy in
lacking precision, although I guess with really fussy log processors,
this change could break things.

Have you seen processors like that in the wild?

pgbadger does this:
'%m' => [('t_mtimestamp', '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+(?: [A-Z\+\-\d]{3,6})?')], # timestamp with milliseconds

which should cope with however many digits there are (\d+).
But I would expect others to be less forgiving ...

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5David Fetter
david@fetter.org
In reply to: Alvaro Herrera (#4)
Re: Log timestamps at higher resolution

On Tue, Oct 23, 2018 at 04:14:50PM -0300, Alvaro Herrera wrote:

On 2018-Oct-23, David Fetter wrote:

On Wed, Oct 24, 2018 at 08:00:24AM +1300, Thomas Munro wrote:

On Wed, Oct 24, 2018 at 7:51 AM David Fetter <david@fetter.org> wrote:

Per gripes I've been hearing with increasing frequency, please find
attached a patch that implements $Subject. It's microsecond resolution
because at least at the moment, nanosecond resolution doesn't appear
to be helpful in this context.

Wouldn't you want to choose a new letter or some other way to make
existing format control strings do what they always did?

I hadn't because I'd looked at the existing format as merely buggy in
lacking precision, although I guess with really fussy log processors,
this change could break things.

Have you seen processors like that in the wild?

pgbadger does this:
'%m' => [('t_mtimestamp', '(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+(?: [A-Z\+\-\d]{3,6})?')], # timestamp with milliseconds

which should cope with however many digits there are (\d+).
But I would expect others to be less forgiving ...

That's an interesting point. pgbadger is the only one I recall using
that's still maintained. Which others would it be useful to test?

Also, do we have tests--or at least ideas of how to
test--functionality relating to logging? I was a little bit taken
aback by the fact that `make check-world` passed after the change.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#6Michael Paquier
michael@paquier.xyz
In reply to: David Fetter (#5)
Re: Log timestamps at higher resolution

On Wed, Oct 24, 2018 at 12:11:18AM +0200, David Fetter wrote:

That's an interesting point. pgbadger is the only one I recall using
that's still maintained. Which others would it be useful to test?

There could be private solutions as well. My take is that we should use
separate letters and not change the existing ones or we'll get
complains.

Also, do we have tests--or at least ideas of how to
test--functionality relating to logging? I was a little bit taken
aback by the fact that `make check-world` passed after the change.

This requires server-level changes where a TAP test is usually adapted,
and there is no test for logging yet.
--
Michael

#7David Fetter
david@fetter.org
In reply to: Michael Paquier (#6)
Re: Log timestamps at higher resolution

On Wed, Oct 24, 2018 at 08:46:47AM +0900, Michael Paquier wrote:

On Wed, Oct 24, 2018 at 12:11:18AM +0200, David Fetter wrote:

That's an interesting point. pgbadger is the only one I recall using
that's still maintained. Which others would it be useful to test?

There could be private solutions as well. My take is that we should use
separate letters and not change the existing ones or we'll get
complains.

I believe that the utility of having finer time resolution outweighs
the inconvenience of changing processing systems, to the extent that
that's a consideration.

Adding yet more random letter options to log_line_prefix isn't free
either.

For one thing, there are a limited number of single-letter options we
can use, and we need to make sure they can continue to make sense, or
at least have reasonably consistent meanings.

For another, having separate letter rather than number modifiers as
printf("%03d") does, is just lousy API design. Baroque is OK for
music if you're in the mood, but not so great for log line prefix
format codes. If we do go the number option route, we can only not
break people's fussy parsers by having a default format of length 3, a
decision which will get harder and harder to justify as time goes on.

I'm happy to code up a number option if that's what people want, but
that'll break things the same way the current patch does, modulo The
Inexplicable Decision To Default To Three Decimal Places, which code
archaeologists will love and system implementers will hate.

Also, do we have tests--or at least ideas of how to
test--functionality relating to logging? I was a little bit taken
aback by the fact that `make check-world` passed after the change.

This requires server-level changes where a TAP test is usually adapted,
and there is no test for logging yet.

Is that worth a separate patch?

I suspect that as things Cloud™ (a.k.a. Renting Other People's Data
Centers) get even more popular, we'll want to improve our logging
game, and that will sooner rather than later have features complicated
enough to be worth testing.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#3)
Re: Log timestamps at higher resolution

David Fetter <david@fetter.org> writes:

On Wed, Oct 24, 2018 at 08:00:24AM +1300, Thomas Munro wrote:

On Wed, Oct 24, 2018 at 7:51 AM David Fetter <david@fetter.org> wrote:

Per gripes I've been hearing with increasing frequency, please find
attached a patch that implements $Subject. It's microsecond resolution
because at least at the moment, nanosecond resolution doesn't appear
to be helpful in this context.

Wouldn't you want to choose a new letter or some other way to make
existing format control strings do what they always did?

I hadn't because I'd looked at the existing format as merely buggy in
lacking precision, although I guess with really fussy log processors,
this change could break things.

Aside from breaking things, this would increase log volume, which is
already a pain point for many people. So I think your odds of getting
it committed as-is are nil. It might be OK as a different letter, or
with some other way of controlling the precision.

regards, tom lane

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Fetter (#7)
Re: Log timestamps at higher resolution

On 2018-Oct-24, David Fetter wrote:

For another, having separate letter rather than number modifiers as
printf("%03d") does, is just lousy API design.

I don't think the API is lousy as all that, but a further improvement to
allow a precision specifier might be a worthy feature addition -- say
%.6t or such (where %m would have the same meaning as %.3t).

Baroque is OK for music if you're in the mood,

Haha!

I suspect that as things Cloud™ (a.k.a. Renting Other People's Data
Centers) get even more popular, we'll want to improve our logging
game, and that will sooner rather than later have features complicated
enough to be worth testing.

Improved logging is already a sore need we have, regardless of weather
conditions.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: Log timestamps at higher resolution

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Oct-24, David Fetter wrote:

For another, having separate letter rather than number modifiers as
printf("%03d") does, is just lousy API design.

I don't think the API is lousy as all that, but a further improvement to
allow a precision specifier might be a worthy feature addition -- say
%.6t or such (where %m would have the same meaning as %.3t).

+1. I think the patch as it stands is way too influenced by the purely
chance factor that "milli" and "micro" have the same initial letter in
English. David would never have submitted it in this form if that weren't
true; but that doesn't make either resolution the right answer for
everyone. And I do not want to have this discussion again in ten years
when somebody starts moaning that microsecond resolution is so last
century. Let's put in a number so that the format-string API can stay the
same as the required resolution moves.

regards, tom lane

#11David Fetter
david@fetter.org
In reply to: Tom Lane (#8)
Re: Log timestamps at higher resolution

On Wed, Oct 24, 2018 at 11:10:02AM +0100, Tom Lane wrote:

David Fetter <david@fetter.org> writes:

On Wed, Oct 24, 2018 at 08:00:24AM +1300, Thomas Munro wrote:

On Wed, Oct 24, 2018 at 7:51 AM David Fetter <david@fetter.org> wrote:

Per gripes I've been hearing with increasing frequency, please find
attached a patch that implements $Subject. It's microsecond resolution
because at least at the moment, nanosecond resolution doesn't appear
to be helpful in this context.

Wouldn't you want to choose a new letter or some other way to make
existing format control strings do what they always did?

I hadn't because I'd looked at the existing format as merely buggy in
lacking precision, although I guess with really fussy log processors,
this change could break things.

Aside from breaking things, this would increase log volume, which is
already a pain point for many people.

As to log volume, this may or may not be in our ambit to address.
There are Linux distros that "handle" this by automatically using
compression so slow that it completely tanks performance, as in
shrinks TPS by an order of magnitude or more. For me, the
lesson there is to make those trade-offs explicit and controllable by
our users rather than magically imposed by whatever means.

So I think your odds of getting it committed as-is are nil. It
might be OK as a different letter, or with some other way of
controlling the precision.

Would you be OK with a way to control the precision that defaults to
more precise and comes with a big loud release note about how to get
the previous behavior?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#12David Fetter
david@fetter.org
In reply to: Tom Lane (#10)
Re: Log timestamps at higher resolution

On Wed, Oct 24, 2018 at 03:17:09PM +0100, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2018-Oct-24, David Fetter wrote:

For another, having separate letter rather than number modifiers as
printf("%03d") does, is just lousy API design.

I don't think the API is lousy as all that, but a further improvement to
allow a precision specifier might be a worthy feature addition -- say
%.6t or such (where %m would have the same meaning as %.3t).

+1. I think the patch as it stands is way too influenced by the purely
chance factor that "milli" and "micro" have the same initial letter in
English. David would never have submitted it in this form if that weren't
true; but that doesn't make either resolution the right answer for
everyone. And I do not want to have this discussion again in ten years
when somebody starts moaning that microsecond resolution is so last
century. Let's put in a number so that the format-string API can stay the
same as the required resolution moves.

Digging a teensy bit deeper, I noticed that there's already a
"padding" (space padding, if I understand correctly) system for parts
of the log_line_prefix specification including %m. Did we get painted
into a corner here back in 9.4, when the padding feature was added?
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4334639f4bb9fb88c13b8dd5faca22b207248504

Strangely, there were no tests that came with that either. David, did
you mean to expand it past space padding, or...?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13David Rowley
david.rowley@2ndquadrant.com
In reply to: David Fetter (#12)
Re: Log timestamps at higher resolution

On 25 October 2018 at 11:25, David Fetter <david@fetter.org> wrote:

Digging a teensy bit deeper, I noticed that there's already a
"padding" (space padding, if I understand correctly) system for parts
of the log_line_prefix specification including %m. Did we get painted
into a corner here back in 9.4, when the padding feature was added?
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4334639f4bb9fb88c13b8dd5faca22b207248504

I'm not sure "painted into a corner" is the right way to think about
this. Numerics immediately followed by the '%' are simply already used
by the space padding feature. If you want some other modifier then
you'll need to invent some other syntax to identify that. It looks
like using a '.' after the '%' was mentioned by Alvaro.

I think the best way to implement this is to rename
process_log_prefix_padding into process_log_prefix_modifiers and add
an "int *precision" variable and add some code to parse and populate
that based on if there's a '.' found in the prefix. Then in
log_line_prefix() just take into account the precision for any prefix
variable that it should apply to. So, %s likely should be altered too,
although that might require some additional code to NULLify the string
if the server config is reloaded, or perhaps you can just remember how
much precision that string was made with and only rebuild it if the
required precision changes. We don't seem to have any other floating
point type prefix variables, so I don't think any of the others need
to be altered.

Strangely, there were no tests that came with that either. David, did
you mean to expand it past space padding, or...?

Unsure what infrastructure existed then for testing this.

I didn't intend to add anything else later. I don't think we have any
variables that zero padding would be that useful for, but maybe
someone might want to zero pad their pids or xids... ?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14Michael Paquier
michael@paquier.xyz
In reply to: David Rowley (#13)
Re: Log timestamps at higher resolution

On Thu, Oct 25, 2018 at 01:00:08PM +1300, David Rowley wrote:

On 25 October 2018 at 11:25, David Fetter <david@fetter.org> wrote:

Strangely, there were no tests that came with that either. David, did
you mean to expand it past space padding, or...?

Unsure what infrastructure existed then for testing this.

Close to none in-core for such things, the first set of TAP tests
included was 9.6, and at the time of 9.4 it was not even clear what we
were lookng for. You could have done things similar to what
test_decoding or pg_stat_statements are doing to enforce some
configuration parameters, but that would not have scaled with different
configurations. Another possibility would have been a buildfarm module.

Anyway, what has been done in this case looks fine to me with what was
present.
--
Michael

#15David Fetter
david@fetter.org
In reply to: David Rowley (#13)
Re: Log timestamps at higher resolution

On Thu, Oct 25, 2018 at 01:00:08PM +1300, David Rowley wrote:

On 25 October 2018 at 11:25, David Fetter <david@fetter.org> wrote:

Digging a teensy bit deeper, I noticed that there's already a
"padding" (space padding, if I understand correctly) system for parts
of the log_line_prefix specification including %m. Did we get painted
into a corner here back in 9.4, when the padding feature was added?
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4334639f4bb9fb88c13b8dd5faca22b207248504

I'm not sure "painted into a corner" is the right way to think about
this.

Excellent!

Numerics immediately followed by the '%' are simply already used by
the space padding feature. If you want some other modifier then
you'll need to invent some other syntax to identify that. It looks
like using a '.' after the '%' was mentioned by Alvaro.

I think the best way to implement this is to rename
process_log_prefix_padding into process_log_prefix_modifiers and add
an "int *precision" variable and add some code to parse and populate
that based on if there's a '.' found in the prefix. Then in
log_line_prefix() just take into account the precision for any
prefix variable that it should apply to. So, %s likely should be
altered too, although that might require some additional code to
NULLify the string if the server config is reloaded, or perhaps you
can just remember how much precision that string was made with and
only rebuild it if the required precision changes. We don't seem to
have any other floating point type prefix variables, so I don't
think any of the others need to be altered.

Thanks for the detailed roadmap on this.

Strangely, there were no tests that came with that either. David,
did you mean to expand it past space padding, or...?

Unsure what infrastructure existed then for testing this.

I didn't intend to add anything else later. I don't think we have
any variables that zero padding would be that useful for, but maybe
someone might want to zero pad their pids or xids... ?

They might, so I'll either put in a comment where that would go or
just implement it.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Fetter (#15)
Re: Log timestamps at higher resolution

On 2018-Oct-25, David Fetter wrote:

I didn't intend to add anything else later. I don't think we have
any variables that zero padding would be that useful for, but maybe
someone might want to zero pad their pids or xids... ?

They might, so I'll either put in a comment where that would go or
just implement it.

I've never seen a need for zero-padding XIDs or PIDs, frankly. Can you
imagine the mess if we ever grew XIDs to 64 bits? And what about
OS-level configurability of PIDs? I don't know that there's a good way
to learn their widths.

I don't think it's worth of a comment even.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services