pg_stat_get_backen_last_activity() ???

Started by Sean Chittendenabout 23 years ago8 messagesgeneral
Jump to latest
#1Sean Chittenden
sean@chittenden.org

Is there any notion of pg_stat_get_backend_last_activity()? I've got
a rogue process someplace and I'm having a devil of a time tracking
down which process is not committing its transaction and letting the
rest of the world continue to access the tables. Based off of the
last query performed, I'd have some idea as to where to hunt and could
figure this out, but as best as I can tell, the only option available
is to setup a query log and cruize through there with grep/less. Am I
missing more programatic and obvious? -sc

PS, the archives didn't turn up anything for 'idle in transaction' ???

--
Sean Chittenden

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#1)
Re: pg_stat_get_backen_last_activity() ???

Sean Chittenden <sean@chittenden.org> writes:

Is there any notion of pg_stat_get_backend_last_activity()? I've got
a rogue process someplace and I'm having a devil of a time tracking
down which process is not committing its transaction and letting the
rest of the world continue to access the tables.

Perhaps look in pg_locks to see who's holding the oldest open
transaction number, or who's got the most locks that other people
are waiting for? That should get you at least the PID of the offender.
Then use ps and/or pg_stat_activity to figure out what the offender
is doing.

regards, tom lane

#3Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#2)
Re: pg_stat_get_backen_last_activity() ???

Is there any notion of pg_stat_get_backend_last_activity()? I've
got a rogue process someplace and I'm having a devil of a time
tracking down which process is not committing its transaction and
letting the rest of the world continue to access the tables.

Perhaps look in pg_locks to see who's holding the oldest open
transaction number, or who's got the most locks that other people
are waiting for? That should get you at least the PID of the
offender. Then use ps and/or pg_stat_activity to figure out what
the offender is doing.

I can already identify that. Using the PID of the backend + sockstat
on both the DB server and on the remote www server I can get the PID
of the webserver process causing the problem, but I have no clue which
customer, which page, which query, etc. The www process is doing
something along the lines of:

BEGIN;
SELECT....
[waits forever]
END;

I could easily grep and figure out what the query is if I knew what
the prior query was because that'd give me some indication as to which
page it was that was sitting there pouting inside of a transaction.
Attaching GDB to random httpd processes isn't my idea of fun. :-/ The
other problem is that with the frequency being around once every week
or so, it's one of those largely annoying things that isn't too
critical in its frequency, but just critical enough to spend some
cycles on here and there. -sc

--
Sean Chittenden

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#3)
Re: pg_stat_get_backen_last_activity() ???

Sean Chittenden <sean@chittenden.org> writes:

I can already identify that. Using the PID of the backend + sockstat
on both the DB server and on the remote www server I can get the PID
of the webserver process causing the problem, but I have no clue which
customer, which page, which query, etc. The www process is doing
something along the lines of:

BEGIN;
SELECT....
[waits forever]
END;

Oh, you want to know the actual current query. Turn on
stats_command_string or whatever it's called. See the statistics
collector documentation.

If this is after-the-fact (ie, you want to know about a process that's
hung right now), attach to it with a debugger and print out
debug_query_string.

regards, tom lane

#5Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#4)
Re: pg_stat_get_backen_last_activity() ???

I can already identify that. Using the PID of the backend +
sockstat on both the DB server and on the remote www server I can
get the PID of the webserver process causing the problem, but I
have no clue which customer, which page, which query, etc. The
www process is doing something along the lines of:

BEGIN;
SELECT....
[waits forever]
END;

Oh, you want to know the actual current query. Turn on
stats_command_string or whatever it's called. See the statistics
collector documentation.

If this is after-the-fact (ie, you want to know about a process
that's hung right now), attach to it with a debugger and print out
debug_query_string.

Alright, turned on stats_command_string and am seeing nothing in any
of the stats tables and debug_query_string is null.

(gdb) p debug_query_string
$1 = 0
(gdb) bt
#0 0x283d0c58 in recvfrom () from /usr/lib/libc.so.4
#1 0x283c092b in recv () from /usr/lib/libc.so.4
#2 0x80f11c6 in secure_read ()
#3 0x80f4557 in StreamConnection ()
#4 0x80f4935 in pq_getbyte ()
#5 0x81377b0 in PostgresMain ()
#6 0x811a63c in PostmasterMain ()
#7 0x8119e5b in PostmasterMain ()
#8 0x811934b in PostmasterMain ()
#9 0x8118e94 in PostmasterMain ()
#10 0x80f53d7 in main ()
#11 0x8069b3e in _start ()

Any ideas on how I can figure out what the mystery query/app/lib is
that's causing my backend to hang with a zillion connections that are
all idle?

-sc

--
Sean Chittenden

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#5)
Re: pg_stat_get_backen_last_activity() ???

Sean Chittenden <sean@chittenden.org> writes:

(gdb) bt
#0 0x283d0c58 in recvfrom () from /usr/lib/libc.so.4
#1 0x283c092b in recv () from /usr/lib/libc.so.4
#2 0x80f11c6 in secure_read ()
#3 0x80f4557 in StreamConnection ()
#4 0x80f4935 in pq_getbyte ()
#5 0x81377b0 in PostgresMain ()
#6 0x811a63c in PostmasterMain ()
#7 0x8119e5b in PostmasterMain ()
#8 0x811934b in PostmasterMain ()
#9 0x8118e94 in PostmasterMain ()
#10 0x80f53d7 in main ()
#11 0x8069b3e in _start ()

Any ideas on how I can figure out what the mystery query/app/lib is
that's causing my backend to hang with a zillion connections that are
all idle?

Well, that particular backend is not hung: the stack trace proves it's
waiting for a command from its frontend. Perhaps you need to look
through your other Postgres processes.

regards, tom lane

#7Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#6)
Re: pg_stat_get_backen_last_activity() ???

(gdb) bt
#0 0x283d0c58 in recvfrom () from /usr/lib/libc.so.4
#1 0x283c092b in recv () from /usr/lib/libc.so.4
#2 0x80f11c6 in secure_read ()
#3 0x80f4557 in StreamConnection ()
#4 0x80f4935 in pq_getbyte ()
#5 0x81377b0 in PostgresMain ()
#6 0x811a63c in PostmasterMain ()
#7 0x8119e5b in PostmasterMain ()
#8 0x811934b in PostmasterMain ()
#9 0x8118e94 in PostmasterMain ()
#10 0x80f53d7 in main ()
#11 0x8069b3e in _start ()

Any ideas on how I can figure out what the mystery query/app/lib
is that's causing my backend to hang with a zillion connections
that are all idle?

Well, that particular backend is not hung: the stack trace proves it's
waiting for a command from its frontend. Perhaps you need to look
through your other Postgres processes.

Sorry, hang is the wrong word. Block is more correct. I know it's
waiting for commands from the front end, but I'm trying to figure out
what front end/query/lib/page is causing the problem. It's only once
in few thousand queries that it happens and I'm going crazy trying to
figure out what it is that's causing this. I've set
stats_command_string to true and attached gdb to the backend, but
that's null. Is there another var in the backend that contains a
pointer to the last query performed? I haven't found it if there
is... -sc

--
Sean Chittenden

#8Sean Chittenden
sean@chittenden.org
In reply to: Sean Chittenden (#7)
Re: pg_stat_get_backen_last_activity() ???

Any ideas on how I can figure out what the mystery query/app/lib
is that's causing my backend to hang with a zillion connections
that are all idle?

Well, that particular backend is not hung: the stack trace proves it's
waiting for a command from its frontend. Perhaps you need to look
through your other Postgres processes.

Sorry, hang is the wrong word. Block is more correct. I know it's
waiting for commands from the front end, but I'm trying to figure out
what front end/query/lib/page is causing the problem. It's only once
in few thousand queries that it happens and I'm going crazy trying to
figure out what it is that's causing this. I've set
stats_command_string to true and attached gdb to the backend, but
that's null. Is there another var in the backend that contains a
pointer to the last query performed? I haven't found it if there
is... -sc

Hrm, guess this'll work:

log_pid = true
log_statement = true
log_duration = true
log_timestamp = true

Lower tech, but seems to work well effective. I changed pg_ctl to zip
my error logs though, otherwise it grows too quickly. :-/

-        "$po_path" ${1+"$@"} ${PGDATAOPTS+$PGDATAOPTS} </dev/null >>$logfile 2>& 1 &
+        "$po_path" ${1+"$@"} ${PGDATAOPTS+$PGDATAOPTS} </dev/null | $gzip -${ziplevel} -c >>$logfile 2>& 1 &

-sc

--
Sean Chittenden