vacuum vs open transactions

Started by Ed L.over 21 years ago16 messagesgeneral
Jump to latest
#1Ed L.
pgsql@bluepolka.net

I'm looking at some 7.3.4 vacuum output, and at first glance it does not
appear that vacuum is reclaiming any dead tuple space if there is even a
single open transaction, even if the open transaction does not in any way
reference the table being vacuumed. Is that correct? Is the behavior
different in later versions?

Ed

#2Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Ed L. (#1)
Re: vacuum vs open transactions

On Wed, 2005-01-12 at 11:59, Ed L. wrote:

I'm looking at some 7.3.4 vacuum output, and at first glance it does not
appear that vacuum is reclaiming any dead tuple space if there is even a
single open transaction, even if the open transaction does not in any way
reference the table being vacuumed. Is that correct? Is the behavior
different in later versions?

I believe the problem is occurring if the open transaction is older than
the tuples that could be vacuumed. The MVCC system means that as long
as a transaction that started X hours ago is still open, the tuples that
have been freed since then can't vacuumed because they need to stay
visible for that transaction.

#3Ed L.
pgsql@bluepolka.net
In reply to: Scott Marlowe (#2)
Re: vacuum vs open transactions

On Wednesday January 12 2005 11:10, Scott Marlowe wrote:

I believe the problem is occurring if the open transaction is older than
the tuples that could be vacuumed. The MVCC system means that as long
as a transaction that started X hours ago is still open, the tuples that
have been freed since then can't vacuumed because they need to stay
visible for that transaction.

Is it possible via SQL query to tell how long a transaction has been open?

Ed

#4Michael Fuhr
mike@fuhr.org
In reply to: Ed L. (#3)
Re: vacuum vs open transactions

On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote:

Is it possible via SQL query to tell how long a transaction has been open?

I'm not aware of a way to find out when a transaction started, but
if you have stats_command_string enabled then you can query
pg_stat_activity to see when a session's current query started.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#5Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Michael Fuhr (#4)
Re: vacuum vs open transactions

On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote:

On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote:

Is it possible via SQL query to tell how long a transaction has been open?

I'm not aware of a way to find out when a transaction started, but
if you have stats_command_string enabled then you can query
pg_stat_activity to see when a session's current query started.

now() returns the current transaction's start time.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"No necesitamos banderas
No reconocemos fronteras" (Jorge Gonz�lez)

#6Ed L.
pgsql@bluepolka.net
In reply to: Michael Fuhr (#4)
Re: vacuum vs open transactions

On Wednesday January 12 2005 11:30, Michael Fuhr wrote:

On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote:

Is it possible via SQL query to tell how long a transaction has been
open?

I'm not aware of a way to find out when a transaction started, but
if you have stats_command_string enabled then you can query
pg_stat_activity to see when a session's current query started.

Yes, I see that in 7.4 (not in 7.3). But my purpose would be to remotely
identify long-open transactions that are causing table bloat by making
vacuum fail to reclaim space, so it seems I need the transaction start
time, not query start time. Most likely this situation occurs when 1)
someone starts a transaction in psql and then leaves it there, or 2) an
application opens a transaction prior to getting user input.

Ed

#7Ed L.
pgsql@bluepolka.net
In reply to: Alvaro Herrera (#5)
Re: vacuum vs open transactions

On Thursday January 13 2005 5:50, Alvaro Herrera wrote:

On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote:

On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote:

Is it possible via SQL query to tell how long a transaction has been
open?

I'm not aware of a way to find out when a transaction started, but
if you have stats_command_string enabled then you can query
pg_stat_activity to see when a session's current query started.

now() returns the current transaction's start time.

That would be perfect if I could see that for transactions other than my
own.

Ed

#8Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Ed L. (#7)
Re: vacuum vs open transactions

On Thu, Jan 13, 2005 at 08:45:38AM -0700, Ed L. wrote:

On Thursday January 13 2005 5:50, Alvaro Herrera wrote:

On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote:

On Wed, Jan 12, 2005 at 11:49:12AM -0700, Ed L. wrote:

Is it possible via SQL query to tell how long a transaction has been
open?

now() returns the current transaction's start time.

That would be perfect if I could see that for transactions other than my
own.

No, there's no way to know that. Unless, of course, you save it
yourself somewhere at transaction start (though because of isolation,
using a regular table would not help you any. Perhaps some PL's shared
data can help you there.)

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
Este mail se entrega garantizadamente 100% libre de sarcasmo.

#9Michael Fuhr
mike@fuhr.org
In reply to: Alvaro Herrera (#5)
Re: vacuum vs open transactions

On Thu, Jan 13, 2005 at 09:50:38AM -0300, Alvaro Herrera wrote:

On Wed, Jan 12, 2005 at 11:30:50PM -0700, Michael Fuhr wrote:

I'm not aware of a way to find out when a transaction started, but
if you have stats_command_string enabled then you can query
pg_stat_activity to see when a session's current query started.

now() returns the current transaction's start time.

I meant when any transaction started, particularly a transaction
other than the current one.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#10Michael Fuhr
mike@fuhr.org
In reply to: Ed L. (#6)
Re: vacuum vs open transactions

On Thu, Jan 13, 2005 at 08:44:56AM -0700, Ed L. wrote:

On Wednesday January 12 2005 11:30, Michael Fuhr wrote:

I'm not aware of a way to find out when a transaction started, but
if you have stats_command_string enabled then you can query
pg_stat_activity to see when a session's current query started.

Yes, I see that in 7.4 (not in 7.3). But my purpose would be to remotely
identify long-open transactions that are causing table bloat by making
vacuum fail to reclaim space, so it seems I need the transaction start
time, not query start time. Most likely this situation occurs when 1)
someone starts a transaction in psql and then leaves it there, or 2) an
application opens a transaction prior to getting user input.

For idle transactions pg_stat_activity shows "<IDLE> in transaction"
and the query_start column shows when the transaction became idle
(i.e., when the last statement completed). So if long-lived idle
transactions are the problem, then at least you can find out how
long they've been idle.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#11Ed L.
pgsql@bluepolka.net
In reply to: Michael Fuhr (#10)
Re: vacuum vs open transactions

On Thursday January 13 2005 10:09, Michael Fuhr wrote:

For idle transactions pg_stat_activity shows "<IDLE> in transaction"
and the query_start column shows when the transaction became idle
(i.e., when the last statement completed). So if long-lived idle
transactions are the problem, then at least you can find out how
long they've been idle.

That will help, thanks. Unfortunately, that doesn't appear to work for many
of our pre-7.4 clusters, of which we have 60, but that will definitely help
as we migrate forward.

Ed

#12Martijn van Oosterhout
kleptog@svana.org
In reply to: Ed L. (#6)
Re: vacuum vs open transactions

On Thu, Jan 13, 2005 at 08:44:56AM -0700, Ed L. wrote:

Yes, I see that in 7.4 (not in 7.3). But my purpose would be to remotely
identify long-open transactions that are causing table bloat by making
vacuum fail to reclaim space, so it seems I need the transaction start
time, not query start time. Most likely this situation occurs when 1)
someone starts a transaction in psql and then leaves it there, or 2) an
application opens a transaction prior to getting user input.

Wouldn't the transaction ID be more useful. An earlier transaction ID
obviously started earlier. So you should be able to identify the oldest
transaction. Would the transaction ID field in pg_locks do?

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#13Michael Fuhr
mike@fuhr.org
In reply to: Martijn van Oosterhout (#12)
Re: vacuum vs open transactions

On Thu, Jan 13, 2005 at 07:11:09PM +0100, Martijn van Oosterhout wrote:

Wouldn't the transaction ID be more useful. An earlier transaction ID
obviously started earlier. So you should be able to identify the oldest
transaction. Would the transaction ID field in pg_locks do?

That'll show which transaction is oldest but not how long it's been
open or idle, i.e., whether it's "long-open" or not. I assumed,
perhaps incorrectly, that he was already looking at pg_locks and
wanted to find out which of those transactions had been open for a
long time. Since pg_locks has a pid column, you can join (visually
or via a join query) with pg_stat_activity's procpid column.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#14Ed L.
pgsql@bluepolka.net
In reply to: Michael Fuhr (#13)
Re: vacuum vs open transactions

On Thursday January 13 2005 11:37, Michael Fuhr wrote:

That'll show which transaction is oldest but not how long it's been
open or idle, i.e., whether it's "long-open" or not. I assumed,
perhaps incorrectly, that he was already looking at pg_locks and
wanted to find out which of those transactions had been open for a
long time. Since pg_locks has a pid column, you can join (visually
or via a join query) with pg_stat_activity's procpid column.

What I'm after is a simple way to automatically tell via cron query if there
is a transaction staying open long enough (probably an hour) to cause bloat
but that doesn't require any visual inspection. While not fool-proof,
pg_stat_activity.query_start looks like a 90% answer.

It'd be nice if pg_stat_activity.transaction_start were added in the future
for a 100% answer, but I'm not sure there's much interest in this apart
from our needs.

Ed

#15Michael Fuhr
mike@fuhr.org
In reply to: Ed L. (#11)
Re: vacuum vs open transactions

On Thu, Jan 13, 2005 at 10:33:50AM -0700, Ed L. wrote:

On Thursday January 13 2005 10:09, Michael Fuhr wrote:

For idle transactions pg_stat_activity shows "<IDLE> in transaction"
and the query_start column shows when the transaction became idle

That will help, thanks. Unfortunately, that doesn't appear to work for many
of our pre-7.4 clusters, of which we have 60, but that will definitely help
as we migrate forward.

Ah yes, I see in the Relase Notes that the query start time was
added in 7.4.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#16Michael Fuhr
mike@fuhr.org
In reply to: Ed L. (#14)
Re: vacuum vs open transactions

On Thu, Jan 13, 2005 at 12:04:28PM -0700, Ed L. wrote:

It'd be nice if pg_stat_activity.transaction_start were added in the future
for a 100% answer, but I'm not sure there's much interest in this apart
from our needs.

I wouldn't expect that to be hard to add. Consider submitting a
patch or proposing it to the developers.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/