logging stuff

Started by Andrew Dunstanover 22 years ago20 messages
#1Andrew Dunstan
andrew@dunslane.net

(Responding to the deafening silence regarding my posts a couple of days
ago about logging dbnames and disconnections) ;-)

The dbname patch is now done. If nobody objects to the format
("[db:yourdbname]") I'll submit it - I did it that way to make it fairly
easy to split a log file based on it, although you would have to be
careful with multiline log entries such as query strings. It is
intentionally minimalist.

I had some thoughts about logging disconnections - I can see a way to do
it via an on_proc_exit handler, I think. Then I started wondering if it
might be useful to log session times instead of just noting a disconnect
and letting the user have to calculate the time.

But I won't bother with this if there's no interest. *I* have no current
use for it, but I could well imagine others might. (I might too in the
future if I wanted to debug my connection pooling app).

cheers

andrew

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: logging stuff

Andrew Dunstan <andrew@dunslane.net> writes:

The dbname patch is now done. If nobody objects to the format
("[db:yourdbname]") I'll submit it - I did it that way to make it fairly
easy to split a log file based on it, although you would have to be
careful with multiline log entries such as query strings.

I'd tend to just put the dbname in a known column, and not bother with
the decoration --- decoration adds up fast when it's on every log line,
and I don't think it helps make the log more parseable. Compare what we
do with timestamps and pids.

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: logging stuff

Not quite the same - timestamps and pids have known formats, while db
names are almost arbitrary. I know including spaces in names is horrible
to my *nix way of thinking, but others might not have my prejudices.
(interesting question - what characters are NOT allowed in a database
name?).

BTW, we're talking about 5 extra chars per line here. I know it adds up,
but these days disk space is cheap and plentiful - for a million log
lines we're still only talking 5Mb (says he whose first machine had a
whopping 20Mb of disk space, and who paid hundreds of (aussie) dollars
for the extra 10 Mb.)

Syslog puts [] around pids, and typically has a lot of redundancy.

I'm prepared to be guided by concensus, though.

cheers

andrew

Tom Lane wrote:

Show quoted text

Andrew Dunstan <andrew@dunslane.net> writes:

The dbname patch is now done. If nobody objects to the format
("[db:yourdbname]") I'll submit it - I did it that way to make it fairly
easy to split a log file based on it, although you would have to be
careful with multiline log entries such as query strings.

I'd tend to just put the dbname in a known column, and not bother with
the decoration --- decoration adds up fast when it's on every log line,
and I don't think it helps make the log more parseable. Compare what we
do with timestamps and pids.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: logging stuff

Andrew Dunstan <andrew@dunslane.net> writes:

I'm prepared to be guided by concensus, though.

I'm not dead set on it either, just wanted to raise a flag. Who else
has an opinion?

regards, tom lane

#5Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#4)
Re: logging stuff

Tom,

I'm prepared to be guided by concensus, though.

I'm not dead set on it either, just wanted to raise a flag. Who else
has an opinion?

From my perspective, we could really use a "delimiter" between the "fields" of
log output which is unlikely to appear within those fields instead of parsing
by character count, rather than making dbname a special case.

Or do we already do this and I'm not parsing my log right?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#5)
Re: logging stuff

Josh Berkus <josh@agliodbs.com> writes:

From my perspective, we could really use a "delimiter" between the
"fields" of log output which is unlikely to appear within those fields
instead of parsing by character count, rather than making dbname a
special case.

Well, Andrew was arguing that space is a likely character in dbnames,
but then again it's possible to put "]" into dbnames. I think the only
way to make this completely unambiguous would be to introduce a quoting
convention for dbnames (and usernames when those get added). But if
the goal is to allow simple parsing of log entries, that won't improve
matters at all.

My feeling about it is that DBAs who want to automatically parse log
entries can just forbid spaces in the dbnames and usernames they allow.
The KISS principle applies here.

A variant (which'd be okay with me) is to separate these fields with
tabs instead of spaces; then the rule for DBAs would be "don't allow
tabs in db/user names".

regards, tom lane

#7Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#6)
Re: logging stuff

--On Tuesday, August 05, 2003 16:27:55 -0400 Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Josh Berkus <josh@agliodbs.com> writes:

From my perspective, we could really use a "delimiter" between the
"fields" of log output which is unlikely to appear within those fields
instead of parsing by character count, rather than making dbname a
special case.

Well, Andrew was arguing that space is a likely character in dbnames,
but then again it's possible to put "]" into dbnames. I think the only
way to make this completely unambiguous would be to introduce a quoting
convention for dbnames (and usernames when those get added). But if
the goal is to allow simple parsing of log entries, that won't improve
matters at all.

My feeling about it is that DBAs who want to automatically parse log
entries can just forbid spaces in the dbnames and usernames they allow.
The KISS principle applies here.

A variant (which'd be okay with me) is to separate these fields with
tabs instead of spaces; then the rule for DBAs would be "don't allow
tabs in db/user names".

Tabs cause issues in syslog output if I remember correctly.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#7)
Re: logging stuff

Larry Rosenman <ler@lerctr.org> writes:

--On Tuesday, August 05, 2003 16:27:55 -0400 Tom Lane <tgl@sss.pgh.pa.us>

A variant (which'd be okay with me) is to separate these fields with
tabs instead of spaces; then the rule for DBAs would be "don't allow
tabs in db/user names".

Tabs cause issues in syslog output if I remember correctly.

Really? What kind of issues? We're already quite likely to be sticking
tabs into the log output, e.g. if log_statement is turned on.

regards, tom lane

#9Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#8)
Re: logging stuff

--On Tuesday, August 05, 2003 16:41:34 -0400 Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Larry Rosenman <ler@lerctr.org> writes:

--On Tuesday, August 05, 2003 16:27:55 -0400 Tom Lane
<tgl@sss.pgh.pa.us>

A variant (which'd be okay with me) is to separate these fields with
tabs instead of spaces; then the rule for DBAs would be "don't allow
tabs in db/user names".

Tabs cause issues in syslog output if I remember correctly.

Really? What kind of issues? We're already quite likely to be sticking
tabs into the log output, e.g. if log_statement is turned on.

I remember dealing with \n's, and don't recall if \t is a problem as well
in SCO's syslogd. I'll play tonite, and see what it does/doesn't deal
with.

LER

regards, tom lane

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Andrew Dunstan (#3)
Re: logging stuff

My issue is that I think there are reasonable people might want
username, dbname, hostname, host ip, and host port on the log lines.
This is the information that we currently report during a connection, if
enabled.

I have no problem adding those five booleans if people prefer booleans
--- my big point is that dbname isn't the only possible thing folks
would want.  And of course, we already have pid and timestamp, so once
we are done, we will have seven possible data items on each line, and
with booleans there will be no control over their order on the line.

---------------------------------------------------------------------------

Andrew Dunstan wrote:

Not quite the same - timestamps and pids have known formats, while db
names are almost arbitrary. I know including spaces in names is horrible
to my *nix way of thinking, but others might not have my prejudices.
(interesting question - what characters are NOT allowed in a database
name?).

BTW, we're talking about 5 extra chars per line here. I know it adds up,
but these days disk space is cheap and plentiful - for a million log
lines we're still only talking 5Mb (says he whose first machine had a
whopping 20Mb of disk space, and who paid hundreds of (aussie) dollars
for the extra 10 Mb.)

Syslog puts [] around pids, and typically has a lot of redundancy.

I'm prepared to be guided by concensus, though.

cheers

andrew

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The dbname patch is now done. If nobody objects to the format
("[db:yourdbname]") I'll submit it - I did it that way to make it fairly
easy to split a log file based on it, although you would have to be
careful with multiline log entries such as query strings.

I'd tend to just put the dbname in a known column, and not bother with
the decoration --- decoration adds up fast when it's on every log line,
and I don't think it helps make the log more parseable. Compare what we
do with timestamps and pids.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#11Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Josh Berkus (#5)
Re: logging stuff

Another interesting idea, if we do a single log variable with parameters
separated by commas, is to allow some syntax where you could specify the
delimiter between fields, so it could be:

log_line: "|", dbname, username

or

log_line: " ", dbname, username

---------------------------------------------------------------------------

Josh Berkus wrote:

Tom,

I'm prepared to be guided by concensus, though.

I'm not dead set on it either, just wanted to raise a flag. Who else
has an opinion?

From my perspective, we could really use a "delimiter" between the "fields" of

log output which is unlikely to appear within those fields instead of parsing
by character count, rather than making dbname a special case.

Or do we already do this and I'm not parsing my log right?

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#12scott.marlowe
scott.marlowe@ihs.com
In reply to: Bruce Momjian (#11)
Re: logging stuff

If we're looking at this, we might want to look at how apache does it with
it's customlog feature. This allows you to first define custom log types,
then set them according to which virtual server you're setting up.

I could see that being nice so you could create a couple of different
custom log types, then use one for one database, and another for a
different database.

For people running large numbers of databases on a single cluster, it
might be quite useful to be able to log each database differently.

On Tue, 5 Aug 2003, Bruce Momjian wrote:

Show quoted text

Another interesting idea, if we do a single log variable with parameters
separated by commas, is to allow some syntax where you could specify the
delimiter between fields, so it could be:

log_line: "|", dbname, username

or

log_line: " ", dbname, username

---------------------------------------------------------------------------

Josh Berkus wrote:

Tom,

I'm prepared to be guided by concensus, though.

I'm not dead set on it either, just wanted to raise a flag. Who else
has an opinion?

From my perspective, we could really use a "delimiter" between the "fields" of

log output which is unlikely to appear within those fields instead of parsing
by character count, rather than making dbname a special case.

Or do we already do this and I'm not parsing my log right?

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: logging stuff

Bruce Momjian <pgman@candle.pha.pa.us> writes:

... And of course, we already have pid and timestamp, so once
we are done, we will have seven possible data items on each line, and
with booleans there will be no control over their order on the line.

Which is exactly the way I want it ;-). I can't see any use that would
justify the amount of extra logic needed to allow user-specified
ordering of the entries. This feature discussion seems to be
degenerating into a gild-the-lily contest ...

regards, tom lane

#14Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#13)
Re: logging stuff

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

... And of course, we already have pid and timestamp, so once
we are done, we will have seven possible data items on each line, and
with booleans there will be no control over their order on the line.

Which is exactly the way I want it ;-). I can't see any use that would
justify the amount of extra logic needed to allow user-specified
ordering of the entries. This feature discussion seems to be
degenerating into a gild-the-lily contest ...

Depends if someone needs a lilly, though I have not heard anyone say
they do. ;-)

Adding several new variables is fine, but what do we call the hostname
option if we already have log_hostname?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#15Rod Taylor
rbt@rbt.ca
In reply to: scott.marlowe (#12)
Re: logging stuff

I could see that being nice so you could create a couple of different
custom log types, then use one for one database, and another for a
different database.

Point them to different log files and you've got yourself a great
feature.

For people running large numbers of databases on a single cluster, it
might be quite useful to be able to log each database differently.

You bet. Hosting companies give clients the option to pick up their
weblogs. This would be a good feature for those. Make the log come out
similarly as apache, and you've got yourself some webalizer screens with
funny URLs (queries ;).

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#14)
Re: logging stuff

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Adding several new variables is fine, but what do we call the hostname
option if we already have log_hostname?

<shrug> We've renamed GUC variables before for consistency. I'd opt
for picking names that show the common purpose, maybe log_line_FOO?

regards, tom lane

#17Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#16)
Re: logging stuff

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Adding several new variables is fine, but what do we call the hostname
option if we already have log_hostname?

<shrug> We've renamed GUC variables before for consistency. I'd opt
for picking names that show the common purpose, maybe log_line_FOO?

That's what I was thinking, that we might rename log_pid and
log_timestamp to log_line_pid and log_line_timestamp to indicate it is
printed on every line, and have the new booleans follow that pattern.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#18Mendola Gaetano
mendola@bigfoot.com
In reply to: Andrew Dunstan (#1)
Re: logging stuff

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Josh Berkus <josh@agliodbs.com> writes:

From my perspective, we could really use a "delimiter" between the
"fields" of log output which is unlikely to appear within those fields
instead of parsing by character count, rather than making dbname a
special case.

Well, Andrew was arguing that space is a likely character in dbnames,
but then again it's possible to put "]" into dbnames. I think the only
way to make this completely unambiguous would be to introduce a quoting
convention for dbnames (and usernames when those get added). But if
the goal is to allow simple parsing of log entries, that won't improve
matters at all.

My feeling about it is that DBAs who want to automatically parse log
entries can just forbid spaces in the dbnames and usernames they allow.
The KISS principle applies here.

A variant (which'd be okay with me) is to separate these fields with
tabs instead of spaces; then the rule for DBAs would be "don't allow
tabs in db/user names".

The tabs are not properly treated by syslogd, look this piece of log:

Aug 6 01:02:16 dell-01 postgres[19490]: [157296] LOG: connection received:
host=[local]
Aug 6 01:02:16 dell-01 postgres[19490]: [157297] LOG: connection
authorized: user=kalman database=kalman
Aug 6 01:02:16 dell-01 postgres[19490]: [157298] DEBUG:
/usr/bin/postmaster child[19490]: starting with (
Aug 6 01:02:16 dell-01 postgres[19490]: [157299] DEBUG: ^Ipostgres
Aug 6 01:02:16 dell-01 postgres[19490]: [157300] DEBUG: ^I-v131072
Aug 6 01:02:16 dell-01 postgres[19490]: [157301] DEBUG: ^I-p
Aug 6 01:02:16 dell-01 postgres[19490]: [157302] DEBUG: ^Ikalman
Aug 6 01:02:16 dell-01 postgres[19490]: [157303] DEBUG: )

do you see: ^I that are "chars" that syslogd hate.

Regards
Gaeatano Mendola

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mendola Gaetano (#18)
Re: logging stuff

"Mendola Gaetano" <mendola@bigfoot.com> writes:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

A variant (which'd be okay with me) is to separate these fields with
tabs instead of spaces; then the rule for DBAs would be "don't allow
tabs in db/user names".

do you see: ^I that are "chars" that syslogd hate.

Ugh. Okay, scratch that idea ...

regards, tom lane

#20Andrew Dunstan
andrew@dunslane.net
In reply to: scott.marlowe (#12)
Re: logging stuff

Gathering some of the threads, here's what I see:

1. There is some demand for per database logging. Doing that is rather
beyond the scope of what I intended - in fact what I intended was to enable
sensible log splitting out of band. I'll even write a perl script to do it
for you if you like :-)

2. There is a suggestion of custom log formats, beyond the current on/off
flags used, similar to Apache's logging mechanism. Of course, there are some
problems - web logs cover one thing - http transactions. Even there, custom
logging has led people to grief, and led to the common logging format that
web log analysers usually use. And Pg logs cover various kinds of events.
Some don't have a sensible db name, for example (like server startup logs
messages). Writing code that would allow administrators to specify the log
format almost arbitrarily, or even just to order the fields arbitrarily,
would be quite a bit of work, and I suspect a constant source of irritating
bugs, and I have no interest in doing the work. There are far more valuable
things that need to be done.

3. There is debate about field delimiters in the logs. Currently, timestamp
when used is a fixed length field at the start of the record, and pid when
used is inside [ ]. This is complicated in the case of dbname by the facts
that a) dbnames can be almost anything and b) not all records will have a
(sensible) dbname to report. I chose [ ] originally to conform to what was
done with pid, but now I'm leaning towards something else - too many M$
people have used [ ] in db work for years, so it is likely to crop up in
names somehow. Maybe < > or | | - whatever seems least liketo to occur
inside a dbname. I want to logs to be readable as well as parseable, or I'd
use something like ^ or ~ or `. The other reason to move away from [ ] is
that Tom (quite reasonably) wants to keep the char count down, and that
would let me drop the "db:" prefix, so the fixed overhead would be 3 chars
per line instead of 6 (if you count the trailing space).

The bottom line for me is to get the info needed in the logs, and then make
them look pretty, split them, or whatever, out of band - that isn't (and
shouldn't, IMNSHO) be the server's job.

andrew