Increasing the length of pg_stat_activity.current_query...

Started by Sean Chittenden about 21 years ago, 44 messages
#1Sean Chittenden
sean@chittenden.org

Is there any reason the length of
pg_catalog.pg_stat_activity.current_query is capped at 255 characters?
Why can't it be a pointer to the currently running query?

Seems silly to me and is a PITA to try and use as a debugging tool, only
to find out that the query in question has a logical break right at
character 255, so the query in pg_stat_query looks like it's the
complete query, but it's not (extra foo at the end of the query is
causing it to run dog slow, but it's difficult to see that without
going to the logs and digging through them to find the problem
statement).

Anyway, is there any good reason for this or can this be increased
somehow? -sc

--
Sean Chittenden

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Sean Chittenden (#1)
Re: Increasing the length of pg_stat_activity.current_query...

Sean Chittenden wrote:

Is there any reason the length of
pg_catalog.pg_stat_activity.current_query is capped at 255 characters?
Why can't it be a pointer to the currently running query?

Seems silly to me and is a PITA to try and use as a debugging tool, only
to find out that the query in question has a logical break right at
character 255, so the query in pg_stat_query looks like it's the
complete query, but it's not (extra foo at the end of the query is
causing it to run dog slow, but it's difficult to see that without
going to the logs and digging through them to find the problem
statement).

Anyway, is there any good reason for this or can this be increased
somehow? -sc

I think it is limited because the queries are stored in shared memory,
maybe.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#1)
Re: Increasing the length of pg_stat_activity.current_query...

Sean Chittenden <sean@chittenden.org> writes:

Is there any reason the length of
pg_catalog.pg_stat_activity.current_query is capped at 255 characters?

The reason for a limit is to avoid fragmentation of UDP messages.
I believe we've set it at 1K for 8.0, though, and if you are on
a platform with a higher message size limit you could raise it more.
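
For reference, the cap comes from a define in src/include/pgstat.h; the
8.0-era form, quoted verbatim in the patch later in this thread, is:

#define PGSTAT_MSG_PAYLOAD	(1000 - sizeof(PgStat_MsgHdr))

and the query text shown in pg_stat_activity is truncated to
PGSTAT_ACTIVITY_SIZE, which is presumably derived from that payload size.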

regards, tom lane

#4Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#3)
Re: Increasing the length of pg_stat_activity.current_query...

Is there any reason the length of
pg_catalog.pg_stat_activity.current_query is capped at 255 characters?

The reason for a limit is to avoid fragmentation of UDP messages.
I believe we've set it at 1K for 8.0, though, and if you are on
a platform with a higher message size limit you could raise it more.

I'm confused... UDP as in the UDP/IP? RPC caps UDP messages at 8K and
NFS over UDP often runs at 32K... where is UDP used in the backend?
-sc

--
Sean Chittenden

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#4)
Re: Increasing the length of pg_stat_activity.current_query...

Sean Chittenden <sean@chittenden.org> writes:

I'm confused... UDP as in the UDP/IP? RPC caps UDP messages at 8K and
NFS over UDP often runs at 32K... where is UDP used in the backend?

pgstat messages travel over UDP/IP.

regards, tom lane

#6Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#5)
Re: Increasing the length of pg_stat_activity.current_query...

I'm confused... UDP as in the UDP/IP? RPC caps UDP messages at 8K and
NFS over UDP often runs at 32K... where is UDP used in the backend?

pgstat messages travel over UDP/IP.

Over the loopback interface, right? Then why worry about
fragmentation? This seems like premature optimization/prevention. A
lost packet over lo0 is a symptom of a bigger problem. The contents of
pgstat messages are probably the least of an admin's concerns if that's
happening.

Having a 1K query isn't uncommon on some of the stuff I work on, an 8K
query... that's a tad different and would stick out like a sore thumb.
Would you be open to increasing this further after the 8.0 release? I
haven't heard of anyone complaining about dropped/fragmented pgstat
messages. :) -sc

--
Sean Chittenden

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#6)
Re: Increasing the length of pg_stat_activity.current_query...

Sean Chittenden <sean@chittenden.org> writes:

Would you be open to increasing this further after the 8.0 release?

Nope.

I haven't heard of anyone complaining about dropped/fragmented pgstat
messages. :) -sc

That's because we keep 'em small enough to not fragment.

regards, tom lane

#8Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Increasing the length of

On Sat, 2004-11-06 at 19:44, Tom Lane wrote:

Sean Chittenden <sean@chittenden.org> writes:

Is there any reason the length of
pg_catalog.pg_stat_activity.current_query is capped at 255 characters?

The reason for a limit is to avoid fragmentation of UDP messages.
I believe we've set it at 1K for 8.0, though, and if you are on
a platform with a higher message size limit you could raise it more.

Hopefully, that is an explanation and not a justification.

Assuming there is no actual barrier to change on that point...

Not having the whole query is painful. Raising it to 1K doesn't get
round the fact that it's the longer queries that tend to be the more
painful ones, and so they are the ones you want to trap in full and
EXPLAIN, so you can find out if they are *ever* coming back.

The assumption that we are also logging queries is less likely to be
true. pg_stat_activity is a window into the dynamic activity of the
server and is used for to-the-minute administration. The query volume
may be too high to enable full query logging all of the time.

If somebody is bothered by UDP fragmentation, then they should ask for
only the first 255 chars, rather than doing a select * from
pg_stat_activity. If it's a backward compatibility issue, perhaps we can
set up a view to fake it for those people.

I'd vote in favour of relaxing the limit entirely, as Sean suggests.

--
Best Regards, Simon Riggs

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#8)
Re: Increasing the length of pg_stat_activity.current_query...

Simon Riggs <simon@2ndquadrant.com> writes:

Not having the whole query is painful. Raising it to 1K doesn't get
round the fact that it's the longer queries that tend to be the more
painful ones, and so they are the ones you want to trap in full and
EXPLAIN, so you can find out if they are *ever* coming back.

... so look in the postmaster log ...

I'd vote in favour of relaxing the limit entirely, as Sean suggests.

The choice is not between "limit" and "no limit", it is between
"limit" and "broken".

regards, tom lane

#10Greg Stark
gsstark@mit.edu
In reply to: Sean Chittenden (#6)
Re: Increasing the length of pg_stat_activity.current_query...

Sean Chittenden <sean@chittenden.org> writes:

Having a 1K query isn't uncommon on some of the stuff I work on, an 8K query...
that's a tad different and would stick out like a sore thumb.

Just as a point of reference, I've been processing my logs to see how large my
queries work out to. They seem to max out at just over 5k (5330 bytes, to be
exact).

This is excluding "CREATE FUNCTION" calls where the body of the function can
of course be much larger but isn't interesting for stats.

Would you be open to increasing this further after the 8.0 release? I
haven't heard of anyone complaining about dropped/fragmented pgstat
messages. :) -sc

From my own experience I would suggest 8k. If it's good enough for NFS
defaults it ought to be good enough for Postgres.

Realistically, you shouldn't be expecting any real quantities of dropped
packets on a local area network, so fragmented UDP packets aren't really a
problem. Anyone running their stats collector over a long haul internet
connection with dropped packets is probably doing something pretty unusual.

I think historically implementations didn't handle fragmented UDP packets at
all, or perhaps not over 32k. But any platform today ought to be happy with
packets at least up to 32k and any modern platform quite a bit larger.

--
greg

#11Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#9)
Re: Increasing the length of pg_stat_activity.current_query...

Tom Lane <tgl@sss.pgh.pa.us> writes:

I'd vote in favour of relaxing the limit entirely, as Sean suggests.

The choice is not between "limit" and "no limit", it is between
"limit" and "broken".

What do you think is broken about fragmented UDP packets?

Once Upon a Time fragmented UDP packets basically didn't work at all. But
that's going on 20 years now. These days you can reliably send large packets
up to 32k, certainly over local connections and even over long-haul connections
when you don't have packet loss problems. Even when you do, the worst-case
scenario is that your packet doesn't make it, so what's the harm in at least
trying to send it?

I'm assuming the packets aren't a fixed size. As long as we aren't bumping up
the other packets to 8k then there's no danger to sending the occasional 8k
packet.

The reason people don't like fragmented UDP packets is that there's no
retransmission facility and a packet is lost if a single fragment is lost. So
if you're sending an 8k packet with an MTU of 1500 you'll have about six
fragments. With 10% packet loss that gives your 8k fragmented packet roughly a
50/50 chance (0.9^6 ~= 0.53) of getting through.

But if you're having 10% packet loss on your local area network you already
have a problem. Even then you're losing 10% of your smaller queries and 50% of
your larger queries, whereas currently you would be losing 10% of your smaller
queries and 100% of your larger queries...

--
greg

#12Simon Riggs
simon@2ndquadrant.com
In reply to: Greg Stark (#11)
Re: Increasing the length of

On Sun, 2004-11-07 at 20:59, Greg Stark wrote:

Tom Lane <tgl@sss.pgh.pa.us> writes:

I'd vote in favour of relaxing the limit entirely, as Sean suggests.

The choice is not between "limit" and "no limit", it is between
"limit" and "broken".

What do you think is broken about fragmented UDP packets?

...probably that pgstat.c doesn't handle them at all, so if they occur
then you've lost data. Until that is fixed, we have a limit.

--
Best Regards, Simon Riggs

#13Greg Stark
gsstark@mit.edu
In reply to: Simon Riggs (#12)
Re: Increasing the length of pg_stat_activity.current_query...

Simon Riggs <simon@2ndquadrant.com> writes:

On Sun, 2004-11-07 at 20:59, Greg Stark wrote:

What do you think is broken about fragmented UDP packets?

...probably that pgstat.c doesn't handle them at all, so if they occur
then you've lost data. Until that is fixed, we have a limit.

Fragmentation happens at the IP protocol level, the kernel is responsible for
reassembly. There's nothing for the application level to handle.

--
greg

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#13)
Re: Increasing the length of pg_stat_activity.current_query...

Greg Stark <gsstark@mit.edu> writes:

What do you think is broken about fragmented UDP packets?

Fragmentation happens at the IP protocol level, the kernel is responsible for
reassembly. There's nothing for the application level to handle.

And, by the same token, on platforms where it is broken there is nothing
we can do about it.

regards, tom lane

#15Gaetano Mendola
mendola@bigfoot.com
In reply to: Tom Lane (#14)
Re: Increasing the length of pg_stat_activity.current_query...

Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

What do you think is broken about fragmented UDP packets?

Fragmentation happens at the IP protocol level, the kernel is responsible for
reassembly. There's nothing for the application level to handle.

And, by the same token, on platforms where it is broken there is nothing
we can do about it.

Like what? If the OS cannot handle UDP reassembly then we have some other
problems around. I think the OS breakage is a non-issue here.

Regards
Gaetano Mendola

#16Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#14)
Re: Increasing the length of pg_stat_activity.current_query...

Tom Lane <tgl@sss.pgh.pa.us> writes:

Greg Stark <gsstark@mit.edu> writes:

What do you think is broken about fragmented UDP packets?

Fragmentation happens at the IP protocol level, the kernel is responsible for
reassembly. There's nothing for the application level to handle.

And, by the same token, on platforms where it is broken there is nothing
we can do about it.

Ok having read the source I think I see why we're talking past each other.

The pgstat messages are indeed fixed size. So bumping the size up to 8k would
mean *every* UDP packet would be bumped up to 8k.

However there's no good reason for that to be the case. recv(2) always returns
exactly one packet and tells you how large it was. And in fact the
PgStat_MsgHdr even has a redundant message size field that could serve the
same purpose. So we actually have all the machinery needed twice over to avoid
the fixed size messages.

In fact looking over the code I think it would be simple to change this. I
think it would be reasonable to make just the PgStat_MsgActivity variable
sized. I'll look at it a bit more, I think it's well worth the slight code
complexity, especially if it were only done for the one message type.

This has the potential to reduce the average size of these messages quite a
lot, and with it the amount of data being pumped through UDP and the pipe
buffer (?!).
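
A minimal sketch of the variable-length scheme described above; the struct and
function names are illustrative stand-ins, not the actual pgstat.c code:

#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef struct
{
	int		m_type;		/* message type */
	int		m_size;		/* bytes actually sent, header included */
} SketchMsgHdr;

typedef struct
{
	SketchMsgHdr m_hdr;
	char		 m_query[8192];	/* maximum; usually only partly sent */
} SketchMsgActivity;

static void
sketch_report_activity(int sock, const char *query)
{
	SketchMsgActivity msg;
	size_t		qlen = strlen(query) + 1;

	if (qlen > sizeof(msg.m_query))
		qlen = sizeof(msg.m_query);
	memcpy(msg.m_query, query, qlen);
	msg.m_query[qlen - 1] = '\0';	/* ensure termination after truncation */

	msg.m_hdr.m_type = 1;			/* hypothetical "activity" message type */
	msg.m_hdr.m_size = (int) (offsetof(SketchMsgActivity, m_query) + qlen);

	/* send only the bytes that are used, not sizeof(msg) */
	send(sock, &msg, msg.m_hdr.m_size, 0);
}

static void
sketch_collector_read(int sock)
{
	SketchMsgActivity buf;
	ssize_t		len;

	/* recv(2) returns exactly one datagram and reports its real length */
	len = recv(sock, &buf, sizeof(buf), 0);
	if (len >= (ssize_t) sizeof(SketchMsgHdr) && len == buf.m_hdr.m_size)
	{
		/* process the (possibly short) activity message ... */
	}
}

The redundant size field in the real PgStat_MsgHdr could play the role of
m_size here.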

--
greg

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#16)
Re: Increasing the length of pg_stat_activity.current_query...

Greg Stark <gsstark@mit.edu> writes:

The pgstat messages are indeed fixed size.

No, there's a fixed maximum size.

regards, tom lane

#18Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#17)
Re: Increasing the length of pg_stat_activity.current_query...

Tom Lane <tgl@sss.pgh.pa.us> writes:

Greg Stark <gsstark@mit.edu> writes:

The pgstat messages are indeed fixed size.

No, there's a fixed maximum size.

Hm. *rereads source*

It's true, pgstat_report_activity only sends the actual size of the query, not
the full payload size.

The only problem I see in raising the size of PGSTAT_MSG_PAYLOAD is that it
also governs the size of PGSTAT_NUM_TABPURGE and PGSTAT_NUM_TABENTRIES.
There's no need to grow those arrays and risk losing them. But these message
sizes could just be left based on the 1k value while boosting the maximum size
of PGSTAT_ACTIVITY_SIZE.

That would have no downside and only benefits. The worst case is that a
machine that didn't handle UDP fragment reassembly would drop the packets that
postgres is currently dropping preemptively. Shorter queries and other packets
would be unaffected.

--
greg

#19Greg Stark
gsstark@mit.edu
In reply to: Greg Stark (#18)
Re: Increasing the length of pg_stat_activity.current_query...

Greg Stark <gsstark@MIT.EDU> writes:

The only problem I see in raising the size of PGSTAT_MSG_PAYLOAD is that it
also governs the size of PGSTAT_NUM_TABPURGE and PGSTAT_NUM_TABENTRIES.
There's no need to grow those arrays and risk losing them. But these message
sizes could just be left based on the 1k value while boosting the maximum size
of PGSTAT_ACTIVITY_SIZE.

Just to be clear, I'm talking about something as simple as this:
(Haven't finished compiling it yet)

--- pgstat.h.~1.26.~	2004-08-29 00:13:03.000000000 -0400
+++ pgstat.h	2004-11-08 17:17:17.000000000 -0500
@@ -57,12 +57,13 @@
 } PgStat_MsgHdr;
 /* ----------
- * Space available in a message.  This will keep the UDP packets below 1K,
- * which should fit unfragmented into the MTU of the lo interface on most
- * platforms. Does anybody care for platforms where it doesn't?
+ * Space used by a message ideally and maximum space used. We try not to go
+ * over 1k unless necessary to avoid UDP packets that don't fit into the MTU
+ * of the loopback interface on very old systems and need to be fragmented.
  * ----------
  */
-#define PGSTAT_MSG_PAYLOAD	(1000 - sizeof(PgStat_MsgHdr))
+#define PGSTAT_MSG_PAYLOAD_IDEAL	(1000  - sizeof(PgStat_MsgHdr))
+#define PGSTAT_MSG_PAYLOAD			(32740 - sizeof(PgStat_MsgHdr))
 /* ----------
  * PgStat_TableEntry			Per-table info in a MsgTabstat
@@ -131,7 +132,7 @@
  *								and buffer access statistics.
  * ----------
  */
-#define PGSTAT_NUM_TABENTRIES	((PGSTAT_MSG_PAYLOAD - 3 * sizeof(int))		\
+#define PGSTAT_NUM_TABENTRIES	((PGSTAT_MSG_PAYLOAD_IDEAL - 3 * sizeof(int))		\
 								/ sizeof(PgStat_TableEntry))
 typedef struct PgStat_MsgTabstat
@@ -148,7 +149,7 @@
  *								about dead tables.
  * ----------
  */
-#define PGSTAT_NUM_TABPURGE		((PGSTAT_MSG_PAYLOAD - sizeof(int))		\
+#define PGSTAT_NUM_TABPURGE		((PGSTAT_MSG_PAYLOAD_IDEAL - sizeof(int))		\
 								/ sizeof(Oid))

typedef struct PgStat_MsgTabpurge

--
greg

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#18)
Re: Increasing the length of pg_stat_activity.current_query...

Greg Stark <gsstark@mit.edu> writes:

That would have no downside and only benefits. The worst case is that a
machine that didn't handle UDP fragment reassembly would drop the packets that
postgres is currently dropping preemptively.

Huh? We're not dropping the query *entirely*, which is what I would
expect to happen if the kernel doesn't want to deal with UDP packet
fragmentation.

However, after rereading the RFCs I think this discussion may be based
on false premises. In a network stack designed per the RFCs, both TCP
and UDP use the same IP-level fragmentation logic, and so it's unlikely
that there would be no fragmentation support at all. It's really a
performance issue: do you want to pay the penalty associated with
reassembling messages that exceed the loopback MTU, and do you want to
risk the possibility that the kernel will drop stuff on the floor rather
than fragment or reassemble it? Remember that UDP is non-guaranteed
delivery, and the cases you are most interested in are likely to be
exactly the same cases where the kernel is under stress and may decide
to shed load that way.

BTW, although the transmitted packets might not be fixed-size, the
per-backend entries written to the stats file are. Cranking
PGSTAT_ACTIVITY_SIZE up to the moon without loss of performance
will take more than just changing one #define.
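
Roughly, the issue is that each per-backend entry in that file carries the full
activity buffer whether the query is long or not. The names below are
simplified illustrations, not the exact 8.0 structures:

#define SKETCH_ACTIVITY_SIZE	1000	/* stand-in for PGSTAT_ACTIVITY_SIZE */

typedef struct
{
	int		procpid;
	/* ... other per-backend fields ... */
	char	activity[SKETCH_ACTIVITY_SIZE];	/* current query, truncated */
} SketchBackendEntry;

/*
 * Every rewrite of the stats file writes roughly
 * max_connections * sizeof(SketchBackendEntry) bytes, so raising the
 * activity size inflates the file even when most queries are short.
 */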

Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.
I don't know about you, but I don't have any tools that are designed to
cope nicely with looking at tables that have columns that might be many
K wide. Looking in the log seems a much nicer way of examining the full
text of extremely long queries. So I think it's actually a good thing
that pgstats truncates the queries at some reasonable width.

regards, tom lane

#21Oliver Jowett
oliver@opencloud.com
In reply to: Tom Lane (#20)
Re: Increasing the length of pg_stat_activity.current_query...

Tom Lane wrote:

It's really a
performance issue: do you want to pay the penalty associated with
reassembling messages that exceed the loopback MTU [...]

BTW, the loopback MTU here is quite large:

oliver@flood:~$ /sbin/ifconfig lo | grep MTU
UP LOOPBACK RUNNING MTU:16436 Metric:1
oliver@flood:~$ uname -a
Linux flood 2.6.8.1-flood #1 Wed Sep 29 21:58:09 NZST 2004 i686 GNU/Linux

so at least on Linux 2.6 it seems like the risk of fragmentation is minimal.

-O

#22Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#20)
Re: Increasing the length of

On Mon, 2004-11-08 at 22:32, Tom Lane wrote:

Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.

This is only available if you log all queries, which isn't normally done
while you are in production. When you hit a long running query, you do
wish you had that enabled, and if it was you could look there.

It would be best to leave the postmaster logging turned off, then allow
dynamic inspection of the query iff you have a rogue query.

This is an important admin consideration for data warehousing.

I don't know about you, but I don't have any tools that are designed to
cope nicely with looking at tables that have columns that might be many
K wide. Looking in the log seems a much nicer way of examining the full
text of extremely long queries. So I think it's actually a good thing
that pgstats truncates the queries at some reasonable width.

You're right...if the query was in the log, thats where I'd look.

In general, I'm not bothered whether I can see the whole query or not.
But it would be good to have a mode of operation that allows the whole
query to be seen via pg_stat_activity, when required.

Could this allow some dynamic behaviour? i.e., set it low, as Tom
suggests, for most of the time, then set it higher, as Greg suggests,
on demand for a short period only? [Not sure, but I think the query text may
only be sent once at the start of execution, then never again... so there may
be a hole in this thinking]

--
Best Regards, Simon Riggs

#23Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#20)
Re: Increasing the length of pg_stat_activity.current_query...

On 11/8/2004 5:32 PM, Tom Lane wrote:

Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.

The pgstats were originally designed to give "hints" for tuning. That's
why they cover cache hits vs. misses per table and numbers that can be
used to point out missing as well as obsolete indexes. That was what led
to the design of the pgstats file, the UDP communication and those fixed
sizes. The goal was to let it have as little impact on the server
performance as possible. The whole "current query" stuff was added later
on request.

In my opinion it is quite pointless to attempt to transmit the last byte
of every single query sent to the backend, when all you can get out of
that view is a random query every 500 milliseconds.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#24Simon Riggs
simon@2ndquadrant.com
In reply to: Jan Wieck (#23)
Re: Increasing the length of

On Wed, 2004-11-10 at 20:25, Jan Wieck wrote:

On 11/8/2004 5:32 PM, Tom Lane wrote:

Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.

The pgstats were originally designed to give "hints" for tuning. That's
why they cover cache hits vs. misses per table and numbers that can be
used to point out missing as well as obsolete indexes. That was what led
to the design of the pgstats file, the UDP communication and those fixed
sizes. The goal was to let it have as little impact on the server
performance as possible. The whole "current query" stuff was added later
on request.

OK, and thanks for writing it.

Evolution is a wonderful thing... our original design point was slightly
away from where we are now.

In my opinion it is quite pointless to attempt to transmit the last byte
of every single query sent to the backend, when all you can get out of
that view is a random query every 500 milliseconds.

If you are certain you have no queries whose text is > 1 KB, or you have
no SQL that lasts > a few seconds, then increasing the UDP limit would
just be a painful waste, I agree.

My intention was towards a data warehouse situation, and my comments are
only relevant in that context. Possibly 25+% of the user base use this
style of processing. In that case, I expect queries to run for minutes
or hours.

What are the alternatives when a query is still running when you return
from lunch? Kill it? Hope? These rogue queries can be a problem, using
up much of the capacity of the system for hours.

Many such queries are generated by applications and hard to recreate.
The server is running it, so we should be able to access the SQL and
diagnose.

--
Best Regards, Simon Riggs

#25Richard Huxton
dev@archonet.com
In reply to: Simon Riggs (#22)
Re: Increasing the length of

Simon Riggs wrote:

On Mon, 2004-11-08 at 22:32, Tom Lane wrote:

Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.

This is only available if you log all queries, which isn't normally done
while you are in production. When you hit a long running query, you do
wish you had that enabled, and if it was you could look there.

It would be best to leave the postmaster logging turned off, then allow
dynamic inspection of the query iff you have a rogue query.

Isn't that:
log_min_duration_statement (integer)

Sets a minimum statement execution time (in milliseconds) for
statement to be logged. All SQL statements that run in the time
specified or longer will be logged with their duration. Setting this to
zero will print all queries and their durations. Minus-one (the default)
disables this. For example, if you set it to 250 then all SQL statements
that run 250ms or longer will be logged. Enabling this option can be
useful in tracking down unoptimized queries in your applications. Only
superusers can increase this or set it to minus-one if this option is
set by the administrator.

--
Richard Huxton
Archonet Ltd

#26Simon Riggs
simon@2ndquadrant.com
In reply to: Richard Huxton (#25)
Re: Increasing the length of

Hi Richard

On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:

Simon Riggs wrote:

On Mon, 2004-11-08 at 22:32, Tom Lane wrote:

Another relevant question is why you are expecting to get this
information through pgstats and not by looking in the postmaster log.

This is only available if you log all queries, which isn't normally done
while you are in production. When you hit a long running query, you do
wish you had that enabled, and if it was you could look there.

It would be best to leave the postmaster logging turned off, then allow
dynamic inspection of the query iff you have a rogue query.

Isn't that:
log_min_duration_statement (integer)

Sets a minimum statement execution time (in milliseconds) for
statement to be logged. All SQL statements that run in the time
specified or longer will be logged with their duration. Setting this to
zero will print all queries and their durations. Minus-one (the default)
disables this. For example, if you set it to 250 then all SQL statements
that run 250ms or longer will be logged. Enabling this option can be
useful in tracking down unoptimized queries in your applications. Only
superusers can increase this or set it to minus-one if this option is
set by the administrator.

That gets written when a statement completes, not during execution.

It's great for finding out if you have them AFTER the fact, but no good
for telling what's going on in the middle of execution.

(It's great BTW, thanks to whoever wrote it)

--
Best Regards, Simon Riggs

#27Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Simon Riggs (#26)
Re: Increasing the length of

On Wed, Nov 10, 2004 at 09:52:17PM +0000, Simon Riggs wrote:

On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:

Isn't that:
log_min_duration_statement (integer)

That gets written when a statement completes, not during execution.

I've been following this thread, and I was thinking the same thing.
I wonder how much work it'd be to have another log setting -- say
log_statement_after_min_duration (integer) -- which did what Simon
wants. That'd more than satisfy my need, for sure. Might the cost
of that be too high, though?

A

--
Andrew Sullivan | ajs@crankycanuck.ca
When my information changes, I alter my conclusions. What do you do sir?
--attr. John Maynard Keynes

#28Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Andrew Sullivan (#27)
Re: Increasing the length of

On Wed, Nov 10, 2004 at 05:51:01PM -0500, Andrew Sullivan wrote:

log_statement_after_min_duration (integer) -- which did what Simon
wants.

Uh, well, not what Simon wants, of course, but which gave us a useful
capability anyway. I agree that the full-bore profiling for the DBA
would be awful nice. But in its absence, if you could see your
long-running query in the log after a minute, and then go do an
EXPLAIN and realise "uh-oh, that's gonna take 3 days to complete" and
kill it, it would be a big help.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton

#29Greg Stark
gsstark@mit.edu
In reply to: Simon Riggs (#24)
Re: Increasing the length of pg_stat_activity.current_query...

Simon Riggs <simon@2ndquadrant.com> writes:

My intention was towards a data warehouse situation, and my comments are
only relevant in that context. Possibly 25+% of the user base use this
style of processing. In that case, I expect queries to run for minutes
or hours.

I come from the opposite angle but have also ended up with the same
conclusion.

In an OLTP environment you can't be trying to save every single SQL query in
the log file. And saving only queries that take longer than some arbitrary
amount of time might not be capturing enough to give a good picture of what's
going on.

I like the idea of a stats daemon that's isolated from the server by something
like UDP and keeps statistics. It would let me turn off logging while still
being able to peek into what queries are running, which take the longest,
which are being executed the most often, and which are taking the most
cumulative time (which isn't necessarily the same thing as either of the other
two).

The idea of tracking cache misses is great, though in the current design a
postgres buffer cache miss doesn't necessarily mean a cache miss. If Postgres
moves to O_DIRECT then it would be a valuable statistic, or if instrumentation
to test for timing of cache hits and misses is added then it could be a good
statistic to have.

I can say that with Oracle it was *incredibly* useful to have the queries
being executed and cached queryable in the cache. The ora_explain tool that
comes with DBD::Oracle makes it extremely easy to identify queries consuming
resources, experiment with rewrites, and then copy the new query into the
application.

It would be great to have something equivalent for Postgres. It would be
extremely kludgy by comparison to have to dig through the log files for
queries. Much better would be to have an interface to access the data pgstats
gathers. But that only works if the entire query is there.

--
greg

#30Robert Treat
xzilla@users.sourceforge.net
In reply to: Andrew Sullivan (#28)
Re: Increasing the length of

On Wed, 2004-11-10 at 17:57, Andrew Sullivan wrote:

On Wed, Nov 10, 2004 at 05:51:01PM -0500, Andrew Sullivan wrote:

log_statement_after_min_duration (integer) -- which did what Simon
wants.

Uh, well, not what Simon wants, of course, but which gave us a useful
capability anyway. I agree that the full-bore profiling for the DBA
would be awful nice. But in its absence, if you could see your
long-running query in the log after a minute, and then go do an
EXPLAIN and realise "uh-oh, that's gonna take 3 days to complete" and
kill it, it would be a big help.

I believe the geeky non-helpful answer is to attach to the process with
gdb and do p debug_query_string, which I believe will show you said
long-running query. (Someone who actually hacks C can correct me on that, but
I believe I've done it that way before.)
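
Concretely, that looks something like this (the pid here is hypothetical;
use whatever pg_stat_activity or ps reports for the backend in question):

$ gdb -p 12345
(gdb) p debug_query_string
(gdb) detach
(gdb) quit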

Of course that idea led me to wondering why we couldn't have a function
that could look at a connection (well, either by way of pid or possibly
transaction id) and show the current query being executed.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#31Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Robert Treat (#30)
Re: Increasing the length of

On Thu, Nov 11, 2004 at 10:20:43AM -0500, Robert Treat wrote:

I believe the geeky non-helpful answer is to attach to the process with
gdb and do p debug_query_string which I believe will show you said long
running query.

Yes, this will work, & I've used it. But of course, you don't
actually _know_ it's a long-running query. All you know is that the
back end which has that pid has a query now. So you have to connect
several times and look at the query to see this. This strategy is
also what pgmonitor and friends do.

Of course that idea lead me to wondering why we couldn't have a function
that could look at a connection (well, either by way of pid or possibly
transaction id) and show the current query being executed.

I think that's what's being requested. I've no clue how to do it,
though.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the marketplace.
--Philip Greenspun

#32Simon Riggs
simon@2ndquadrant.com
In reply to: Andrew Sullivan (#27)
Re: Increasing the length of

On Wed, 2004-11-10 at 22:51, Andrew Sullivan wrote:

On Wed, Nov 10, 2004 at 09:52:17PM +0000, Simon Riggs wrote:

On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:

Isn't that:
log_min_duration_statement (integer)

That gets written when a statement completes, not during execution.

I've been following this thread, and I was thinking the same thing.
I wonder how much work it'd be to have another log setting -- say
log_statement_after_min_duration (integer) -- which did what Simon
wants. That'd more than satisfy my need, for sure. Might the cost
of that be too high, though?

I think this is a great idea.

...Rather than invent a new GUC, I think this is the solution:

log_min_duration_statement writes to log at end of execution, if
execution time is greater than that threshold. Let's move that piece of
code so it is executed as the query progresses. That way, you get
notified that a problem query is occurring NOW, rather than "it has
occurred".

The code already has such a timer check, for statement_timeout, in
backend/storage/lmgr/proc.c. We could reuse this timer to go off at
log_min_duration_statement and then log query if still executing. (If
log_min_duration_statement >= statement_timeout, we would skip that
step.) We would then reset the timer so that it then goes off at where
it does now, at statement_timeout. So, same piece of code, used twice...

That way you can set up 2 limits, with three bands of actions:

Between 0 and log_min_duration_statement
- logs nothing

Between log_min_duration_statement and statement_timeout
- statement written to log, though execution continues...

At statement_timeout
- statement cancelled

We'd just need a small piece of code to set timer correctly first, then
another piece to record state change and reset timer again. Lift and
drop the existing code from end-of-execution.
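
A rough sketch of that two-stage timer; all names below are illustrative
placeholders rather than the actual proc.c code, and it assumes the timer is
armed at statement start for min(log_min_duration_statement,
statement_timeout), with the flag reset at each statement start:

#include <stdbool.h>

extern int	LogMinDurationMs;		/* would map to log_min_duration_statement */
extern int	StatementTimeoutMs;		/* would map to statement_timeout */
extern const char *debug_query_string;	/* the backend's current query text */

extern void log_still_running(const char *query);	/* stand-in for elog(LOG, ...) */
extern void arm_statement_timer(int ms);		/* stand-in for the proc.c timer */
extern void cancel_current_statement(void);		/* what statement_timeout does now */

static bool logged_long_running = false;	/* reset at each statement start */

static void
statement_timer_fired(void)
{
	if (!logged_long_running &&
		LogMinDurationMs >= 0 &&
		LogMinDurationMs < StatementTimeoutMs)
	{
		/* first expiry: log the still-running statement, keep executing */
		log_still_running(debug_query_string);
		logged_long_running = true;

		/* re-arm for the remaining time up to statement_timeout */
		arm_statement_timer(StatementTimeoutMs - LogMinDurationMs);
	}
	else
	{
		/* second (or only) expiry: cancel the statement, as today */
		cancel_current_statement();
	}
}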

This then:
- solves the *problem query* diagnosis issue, as originally raised by
Sean and seconded by myself and Greg
- makes the answer exactly what Tom proposed - look in the logs
- doesn't make any changes to the technical innards of UDP and pgstats.c
- no administrative interface changes, just slightly changed behaviour -
existing users mostly wouldn't even notice we'd done it...

Thoughts?

Easy enough change to be included as a hot fix for 8.0: no new system
code, no new interface code, just same behaviour at different time.

--
Best Regards, Simon Riggs

#33Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#32)
Re: Increasing the length of

Great idea. Added to TODO:

* Make log_min_duration_statement output when the duration is reached rather
than when the statement completes

This prints long queries while they are running, making troubleshooting
easier. Also, it eliminates the need for log_statement because it
would now be the same as a log_min_duration_statement of zero.

---------------------------------------------------------------------------

Simon Riggs wrote:

On Wed, 2004-11-10 at 22:51, Andrew Sullivan wrote:

On Wed, Nov 10, 2004 at 09:52:17PM +0000, Simon Riggs wrote:

On Wed, 2004-11-10 at 21:48, Richard Huxton wrote:

Isn't that:
log_min_duration_statement (integer)

That gets written when a statement completes, not during execution.

I've been following this thread, and I was thinking the same thing.
I wonder how much work it'd be to have another log setting -- say
log_statement_after_min_duration (integer) -- which did what Simon
wants. That'd more than satisfy my need, for sure. Might the cost
of that be too high, though?

I think this is a great idea.

...Rather than invent a new GUC, I think this is the solution:

log_min_duration_statement writes to log at end of execution, if
execution time is greater than that threshold. Let's move that piece of
code so it is executed as the query progresses. That way, you get
notified that a problem query is occurring NOW, rather than "it has
occurred".

The code already has such a timer check, for statement_timeout, in
backend/storage/lmgr/proc.c. We could reuse this timer to go off at
log_min_duration_statement and then log query if still executing. (If
log_min_duration_statement >= statement_timeout, we would skip that
step.) We would then reset the timer so that it then goes off at where
it does now, at statement_timeout. So, same piece of code, used twice...

That way you can set up 2 limits, with three bands of actions:

Between 0 and log_min_duration_statement
- logs nothing

Between log_min_duration_statement and statement_timeout
- statement written to log, though execution continues...

At statement_timeout
- statement cancelled

We'd just need a small piece of code to set timer correctly first, then
another piece to record state change and reset timer again. Lift and
drop the existing code from end-of-execution.

This then:
- solves the *problem query* diagnosis issue, as originally raised by
Sean and seconded by myself and Greg
- makes the answer exactly what Tom proposed - look in the logs
- doesn't make any changes to the technical innards of UDP and pgstats.c
- no administrative interface changes, just slightly changed behaviour -
existing users mostly wouldn't even notice we'd done it...

Thoughts?

Easy enough change to be included as a hot fix for 8.0: no new system
code, no new interface code, just same behaviour at different time.

--
Best Regards, Simon Riggs


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#34David Parker
dparker@tazznetworks.com
In reply to: Bruce Momjian (#33)
Re: Increasing the length of

I've been using "log_min_duration_statement = 0" to get durations on all
SQL statements for the purposes of performance tuning, because this logs
the duration on the same line as the statement. My reading of this TODO
is that now log_min_duration_statement = 0 would give me the statements
but no total duration?

- DAP


#35Bruce Momjian
pgman@candle.pha.pa.us
In reply to: David Parker (#34)
Re: Increasing the length of

David Parker wrote:

I've been using "log_min_duration_statement = 0" to get durations on all
SQL statements for the purposes of performance tuning, because this logs
the duration on the same line as the statement. My reading of this TODO
is that now log_min_duration_statement = 0 would give me the statements
but no total duration?

Oh, sorry, you are right. I forgot about the duration part! I got so
excited I forgot.

TODO item removed.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#36Jim C. Nasby
decibel@decibel.org
In reply to: Bruce Momjian (#35)
Re: Increasing the length of

Could we come up with a compromise then? I guess maybe another setting
that says log any query when it runs for more than x amount of time. (I'd
also argue you should get a NOTICE or WARNING when this exceeds the
query timeout time.)

A perhaps more friendly alternative would be a way to query to get this
information in real-time, but that probably goes back into the
discussion about the length of data made available in pg_stat_activity.

On Tue, Nov 30, 2004 at 02:32:05PM -0500, Bruce Momjian wrote:

David Parker wrote:

I've been using "log_min_duration_statement = 0" to get durations on all
SQL statements for the purposes of performance tuning, because this logs
the duration on the same line as the statement. My reading of this TODO
is that now log_min_duration_statement = 0 would give me the statements
but no total duration?

Oh, sorry, you are right. I forgot about the duration part! I got so
excited I forgot.

TODO item removed.

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#37Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Jim C. Nasby (#36)
Re: Increasing the length of

Jim C. Nasby wrote:

Could we come up with a compromise then? I guess maybe another setting
that says log any query when it hits more than x amount of time. (I'd
also argue you should get a NOTICE or WARNING when this exceeds the
query timeout time).

A perhapse more friendly alternative would be a way to query to get this
information in real-time, but that probably goes back into the
discussion about the length of data made available in pg_stat_activity.

Yes. I don't see a huge win for adding another GUC variable.

---------------------------------------------------------------------------

On Tue, Nov 30, 2004 at 02:32:05PM -0500, Bruce Momjian wrote:

David Parker wrote:

I've been using "log_min_duration_statement = 0" to get durations on all
SQL statements for the purposes of performance tuning, because this logs
the duration on the same line as the statement. My reading of this TODO
is that now log_min_duration_statement = 0 would give me the statements
but no total duration?

Oh, sorry, you are right. I forgot about the duration part! I got so
excited I forgot.

TODO item removed.

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


--
Jim C. Nasby, Database Consultant decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#38Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#35)
Re: Increasing the length of

On Tue, 2004-11-30 at 19:32, Bruce Momjian wrote:

David Parker wrote:

I've been using "log_min_duration_statement = 0" to get durations on all
SQL statements for the purposes of performance tuning, because this logs
the duration on the same line as the statement. My reading of this TODO
is that now log_min_duration_statement = 0 would give me the statements
but no total duration?

Oh, sorry, you are right. I forgot about the duration part! I got so
excited I forgot.

TODO item removed.

David's objection was noted, and why I hadn't coded it (yet).

There are currently two ways of getting statement and duration output,
which is confusing....

You can either
1. Individual statements
- log_statement = all
- log_duration = true
- log_line_prefix includes processid

which produces 2 log lines like
statement: xxxxxxxxx
duration: yyyyyyyyyy

2. log_min_duration
log_min_duration_statement=0
which produces 1 log line like
duration: yyyyyyy statement: xxxxxxxxxx

These two things do exactly the same thing, apart from the way the
output is presented to the user in the log line.

I'd like to change log_min_duration_statement as suggested, but this
side-effect behaviour of being a better log_statement than log_statement
kind of gets in the way. It makes me wonder why we have log_statement at
all.

We all want to do performance tracing. I'd also like to be able to
dynamically monitor what is actually happening *now* on the system.
There is no way right now to monitor for rogue queries, other than to
cancel anything that runs longer than statement_timeout. That's not good
either, even if it does keep the current behaviour.

My preference would be to do the following:
- add a script to contrib to process the log file
- always add processid to log_statement_prefix when both log_statement
and log_duration are specified, so you can always tie up the data

Anybody?

--
Best Regards, Simon Riggs

#39Robert Treat
xzilla@users.sourceforge.net
In reply to: Simon Riggs (#38)
Re: Increasing the length of

On Wednesday 01 December 2004 03:38, Simon Riggs wrote:

On Tue, 2004-11-30 at 19:32, Bruce Momjian wrote:

David Parker wrote:

I've been using "log_min_duration_statement = 0" to get durations on
all SQL statements for the purposes of performance tuning, because this
logs the duration on the same line as the statement. My reading of this
TODO is that now log_min_duration_statement = 0 would give me the
statements but no total duration?

Oh, sorry, you are right. I forgot about the duration part! I got so
excited I forgot.

TODO item removed.

David's objection was noted, and why I hadn't coded it (yet).

There are currently two ways of getting statement and duration output,
which is confusing....

You can either
1. Individual statements
- log_statement = all
- log_duration = true
- log_line_prefix includes processid

which produces 2 log lines like
statement: xxxxxxxxx
duration: yyyyyyyyyy

2. log_min_duration
log_min_duration_statement=0
which produces 1 log line like
duration: yyyyyyy statement: xxxxxxxxxx

These two things do exactly the same thing, apart from the way the
output is presented to the user in the log line.

I'd like to change log_min_duration_statement as suggested, but this
side-effect behaviour of being a better log_statement than log_statement
kindof gets in the way. It makes me wonder why we have log_statement at
all.

It's what we started with, and it has some use cases that
log_min_duration_statement doesn't. Besides, if you change
log_min_duration_statement to print at the time the duration is exceeded,
you'll need something to enable printing out durations of completed statements.

We all want to do performance tracing. I'd also like to be able to
dynamically monitor what is actually happening *now* on the system.
There is no way right now to monitor for rogue queries, other than to
cancel anything that runs more than statement_timeout. Thats not good
either, even if it does keep the current behaviour.

pg_stat_activity shows query_start; you could poll that to look for rogue
queries. You can also use log_min_duration_statement to watch for rogue
queries.

My preference would be to do the following:
- add a script to contrib to process the log file

Check out the pqa project on pgfoundry.

- always add processid to log_statement_prefix when both log_statement
and log_duration are specified, so you can always tie up the data

We have the option to add the process id now; I don't see why we should force
it into the line. Besides, some logging tools will do this for you, so it
would be duplicate data.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#40Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Simon Riggs (#38)
Re: Increasing the length of

Simon Riggs wrote:

On Tue, 2004-11-30 at 19:32, Bruce Momjian wrote:

David Parker wrote:

I've been using "log_min_duration_statement = 0" to get durations on all
SQL statements for the purposes of performance tuning, because this logs
the duration on the same line as the statement. My reading of this TODO
is that now log_min_duration_statement = 0 would give me the statements
but no total duration?

Oh, sorry, you are right. I forgot about the duration part! I got so
excited I forgot.

TODO item removed.

David's objection was noted, and why I hadn't coded it (yet).

There are currently two ways of getting statement and duration output,
which is confusing....

You can either
1. Individual statements
- log_statement = all
- log_duration = true
- log_line_prefix includes processid

which produces 2 log lines like
statement: xxxxxxxxx
duration: yyyyyyyyyy

2. log_min_duration
log_min_duration_statement=0
which produces 1 log line like
duration: yyyyyyy statement: xxxxxxxxxx

These two things do exactly the same thing, apart from the way the
output is presented to the user in the log line.

I'd like to change log_min_duration_statement as suggested, but this
side-effect behaviour of being a better log_statement than log_statement
kindof gets in the way. It makes me wonder why we have log_statement at
all.

We have it so you can look in the log to see currently running queries
rather than just completed queries.

We all want to do performance tracing. I'd also like to be able to
dynamically monitor what is actually happening *now* on the system.
There is no way right now to monitor for rogue queries, other than to
cancel anything that runs more than statement_timeout. Thats not good
either, even if it does keep the current behaviour.

My preference would be to do the following:
- add a script to contrib to process the log file
- always add processid to log_statement_prefix when both log_statement
and log_duration are specified, so you can always tie up the data

I think our setup is confusing enough. Adding "automatic" additions
seems even more confusing than what we have now. We could throw a log
warning message somehow though.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#41David Parker
dparker@tazznetworks.com
In reply to: Bruce Momjian (#40)
Re: Increasing the length of

How difficult would it be to make the combination

log_statement = all
log_duration = true

just put the duration on the same line as the statement? Then
log_min_duration_statement could be used to
do the desired log-at-threshold behavior, which certainly seems
valuable. You'd need a way to visually/scriptually (?) distinguish those
log records, though, I guess.

Note that my original post on this was more of a question than an
objection - it's entirely possible to sed around having duration and
statement on separate lines - I just wanted clarification. Having them
on the same line IS handy, however....

- DAP

#42Bruce Momjian
pgman@candle.pha.pa.us
In reply to: David Parker (#41)
Re: Increasing the length of

David Parker wrote:

How difficult would it be to make the combination

log_statement = all
log_duration = true

just put the duration on the same line as the statement? Then
log_min_duration_statement could be used to
do the desired log-at-threshold behavior, which certainly seems
valuable. You'd need a way to visually/scriptually (?) distinguish those
log records, though, I guess.

Note that my original post on this was more of a question than an
objection - it's entirely possible to sed around having duration and
statement on separate lines - I just wanted clarification. Having them
on the same line IS handy, however....

Many people want the statement printed when it starts (for real-time
server monitoring), not when it finishes, meaning we don't know the
duration at start time.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#43Robert Treat
xzilla@users.sourceforge.net
In reply to: Bruce Momjian (#42)
Re: Increasing the length of

On Wed, 2004-12-01 at 11:47, Bruce Momjian wrote:

David Parker wrote:

How difficult would it be to make the combination

log_statement = all
log_duration = true

just put the duration on the same line as the statement? Then
log_min_duration_statement could be used to
do the desired log-at-threshold behavior, which certainly seems
valuable. You'd need a way to visually/scriptually (?) distinguish those
log records, though, I guess.

Note that my original post on this was more of a question than an
objection - it's entirely possible to sed around having duration and
statement on separate lines - I just wanted clarification. Having them
on the same line IS handy, however....

Many people want the statement printed when it starts (for real-time
server monitoring), not when it finishes, meaning we don't know the
duration at start time.

I think what David is asking for is log_statement and log_duration to
print on the same line at query completion time. We could then change
log_min_duration_statement to print DurationExceeded <duration> rather
than just Duration <time> like it does now, also on the same line, but
with a way to differentiate the two.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#44Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Robert Treat (#43)
Re: Increasing the length of

Robert Treat wrote:

On Wed, 2004-12-01 at 11:47, Bruce Momjian wrote:

David Parker wrote:

How difficult would it be to make the combination

log_statement = all
log_duration = true

just put the duration on the same line as the statement? Then
log_min_duration_statement could be used to
do the desired log-at-threshold behavior, which certainly seems
valuable. You'd need a way to visually/scriptually (?) distinguish those
log records, though, I guess.

Note that my original post on this was more of a question than an
objection - it's entirely possible to sed around having duration and
statement on separate lines - I just wanted clarification. Having them
on the same line IS handy, however....

Many people want the statement printed when it starts (for real-time
server monitoring), not when it finishes, meaning we don't know the
duration at start time.

I think what David is asking for is log_statement and log_duration to
print on the same line at query completion time. We could then change
log_min_duration_statement to print DurationExceeded <duration> rather
than just Duration <time> like it does now, also on the same line, but
with a way to differentiate the two.

I see. That just seems too confusing to me.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073