Additional stats for Relations

Started by NikhilSabout 19 years ago10 messages
#1NikhilS
nikkhils@gmail.com

Hi,

Currently a "select * from pg_statio_user_tables;" displays only
heap_blks_read, heap_blks_hit stats amongst others for the main relation. It
would be good to have the following stats collected too. I think these stats
can be used to better statistically analyze/understand the block I/O
activity on the relation:

heap_blks_reused: The number of buffers returned by the FSM for use to store
a new tuple in

heap_blks_extend: The number of times file extend was invoked on the
relation

heap_blks_truncate: The total number of blocks that have been truncated due
to vacuum activity e.g.

As an addendum to the truncate stats above, we can also have the additional
following stats:

heap_blks_maxtruncate: The max block of buffers truncated in one go

heap_blks_ntruncate: The number of times truncate was called on this
relation

I can come up with a patch (already have one) for the above. Any
thought/comments?

Regards,
Nikhils
(www.enterprisedb.com)
--
All the world's a stage, and most of us are desperately unrehearsed.

#2Jim C. Nasby
jim@nasby.net
In reply to: NikhilS (#1)
Re: Additional stats for Relations

On Fri, Oct 13, 2006 at 06:17:47PM +0530, NikhilS wrote:

Currently a "select * from pg_statio_user_tables;" displays only
heap_blks_read, heap_blks_hit stats amongst others for the main relation. It
would be good to have the following stats collected too. I think these stats
can be used to better statistically analyze/understand the block I/O
activity on the relation:

heap_blks_reused: The number of buffers returned by the FSM for use to store
a new tuple in

The description on this is misleading... FSM doesn't return buffers, it
returns pages that have free space on them.

I'm also not sure if this metric is what you actually want, since a
single page can be returned many times from the FSM even between
vacuums. Tracking how many pages for a relation have been put into the
FSM might be more useful...

heap_blks_extend: The number of times file extend was invoked on the
relation

heap_blks_truncate: The total number of blocks that have been truncated due
to vacuum activity e.g.

As an addendum to the truncate stats above, we can also have the additional
following stats:

heap_blks_maxtruncate: The max block of buffers truncated in one go

heap_blks_ntruncate: The number of times truncate was called on this
relation

I can come up with a patch (already have one) for the above. Any
thought/comments?

Do you have a use-case for this info? I can see where it might be neat
to know, but I'm not sure how you'd actually use it in the real world.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#3NikhilS
nikkhils@gmail.com
In reply to: Jim C. Nasby (#2)
Re: Additional stats for Relations

Hi Jim,

On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:

On Fri, Oct 13, 2006 at 06:17:47PM +0530, NikhilS wrote:

Currently a "select * from pg_statio_user_tables;" displays only
heap_blks_read, heap_blks_hit stats amongst others for the main

relation. It

would be good to have the following stats collected too. I think these

stats

can be used to better statistically analyze/understand the block I/O
activity on the relation:

heap_blks_reused: The number of buffers returned by the FSM for use to

store

a new tuple in

The description on this is misleading... FSM doesn't return buffers, it
returns pages that have free space on them.

<Nikhils>
FSM returns the block number from which we fetch the buffer. This is similar
to the way we track buffer_read stats in ReadBuffer.
<Nikhils>

I'm also not sure if this metric is what you actually want, since a
single page can be returned many times from the FSM even between
vacuums. Tracking how many pages for a relation have been put into the
FSM might be more useful...

<Nikhils>
Pages might be put into the FSM, but by this metric don't we get the actual
usage of the pages from the FSM? Agreed a single page can be returned
multiple times, but since it serves a new tuple, shouldn't we track it?
<Nikhils>

heap_blks_extend: The number of times file extend was invoked on the
relation

heap_blks_truncate: The total number of blocks that have been truncated

due

to vacuum activity e.g.

As an addendum to the truncate stats above, we can also have the

additional

following stats:

heap_blks_maxtruncate: The max block of buffers truncated in one go

heap_blks_ntruncate: The number of times truncate was called on this
relation

I can come up with a patch (already have one) for the above. Any
thought/comments?

Do you have a use-case for this info? I can see where it might be neat
to know, but I'm not sure how you'd actually use it in the real world.

<Nikhils>
The use-case according to me is that these stats help prove the
effectiveness of autovacuum/vacuum operations. By varying some autovac guc
variables, and doing subsequent (pgbench e.g.) runs, one can find out the
optimum values for these variables using these stats.
<Nikhils>

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
--
All the world's a stage, and most of us are desperately unrehearsed.

#4Simon Riggs
simon@2ndquadrant.com
In reply to: NikhilS (#3)
Re: Additional stats for Relations

On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:

On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:

I'm also not sure if this metric is what you actually want,
since a
single page can be returned many times from the FSM even
between
vacuums. Tracking how many pages for a relation have been put
into the
FSM might be more useful...

<Nikhils>
Pages might be put into the FSM, but by this metric don't we get the
actual usage of the pages from the FSM? Agreed a single page can be
returned multiple times, but since it serves a new tuple, shouldn't we
track it?
<Nikhils>

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

heap_blks_extend: The number of times file extend was

invoked on the

relation

Sounds good

heap_blks_truncate: The total number of blocks that have

been truncated due

to vacuum activity e.g.

Sounds good

As an addendum to the truncate stats above, we can also have

the additional

following stats:

heap_blks_maxtruncate: The max block of buffers truncated in

one go

heap_blks_ntruncate: The number of times truncate was called

on this

relation

Those last 2 sound too complex for normal use and ntruncate is most
likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
a more interesting metric? We've got last vacuum date, but no indication
of how frequently a vacuum has run.

Do you have a use-case for this info? I can see where it might
be neat
to know, but I'm not sure how you'd actually use it in the
real world.

<Nikhils>
The use-case according to me is that these stats help prove the
effectiveness of autovacuum/vacuum operations. By varying some autovac
guc variables, and doing subsequent (pgbench e.g.) runs, one can find
out the optimum values for these variables using these stats.
<Nikhils>

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#5NikhilS
nikkhils@gmail.com
In reply to: Simon Riggs (#4)
Re: Additional stats for Relations

Hi,

So:
heap_blks_reused (with Jim's semantics), heap_blks_extend,
heap_blks_truncate are the "interesting" stats? Will try to work up a patch
for this.

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
On 10/15/06, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:

On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:

I'm also not sure if this metric is what you actually want,
since a
single page can be returned many times from the FSM even
between
vacuums. Tracking how many pages for a relation have been put
into the
FSM might be more useful...

<Nikhils>
Pages might be put into the FSM, but by this metric don't we get the
actual usage of the pages from the FSM? Agreed a single page can be
returned multiple times, but since it serves a new tuple, shouldn't we
track it?
<Nikhils>

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

heap_blks_extend: The number of times file extend was

invoked on the

relation

Sounds good

heap_blks_truncate: The total number of blocks that have

been truncated due

to vacuum activity e.g.

Sounds good

As an addendum to the truncate stats above, we can also have

the additional

following stats:

heap_blks_maxtruncate: The max block of buffers truncated in

one go

heap_blks_ntruncate: The number of times truncate was called

on this

relation

Those last 2 sound too complex for normal use and ntruncate is most
likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
a more interesting metric? We've got last vacuum date, but no indication
of how frequently a vacuum has run.

Do you have a use-case for this info? I can see where it might
be neat
to know, but I'm not sure how you'd actually use it in the
real world.

<Nikhils>
The use-case according to me is that these stats help prove the
effectiveness of autovacuum/vacuum operations. By varying some autovac
guc variables, and doing subsequent (pgbench e.g.) runs, one can find
out the optimum values for these variables using these stats.
<Nikhils>

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.

#6Jim C. Nasby
jim@nasby.net
In reply to: NikhilS (#5)
Re: Additional stats for Relations

Also how many times a relation has been vacuumed (which puts all the
other numbers in more perspective... good catch Simon). And I think
number of pages that could not be added to the FSM would also be
extremely valuable.

On Wed, Oct 18, 2006 at 11:27:39AM +0530, NikhilS wrote:

Hi,

So:
heap_blks_reused (with Jim's semantics), heap_blks_extend,
heap_blks_truncate are the "interesting" stats? Will try to work up a patch
for this.

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
On 10/15/06, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:

On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:

I'm also not sure if this metric is what you actually want,
since a
single page can be returned many times from the FSM even
between
vacuums. Tracking how many pages for a relation have been put
into the
FSM might be more useful...

<Nikhils>
Pages might be put into the FSM, but by this metric don't we get the
actual usage of the pages from the FSM? Agreed a single page can be
returned multiple times, but since it serves a new tuple, shouldn't we
track it?
<Nikhils>

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

heap_blks_extend: The number of times file extend was

invoked on the

relation

Sounds good

heap_blks_truncate: The total number of blocks that have

been truncated due

to vacuum activity e.g.

Sounds good

As an addendum to the truncate stats above, we can also have

the additional

following stats:

heap_blks_maxtruncate: The max block of buffers truncated in

one go

heap_blks_ntruncate: The number of times truncate was called

on this

relation

Those last 2 sound too complex for normal use and ntruncate is most
likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
a more interesting metric? We've got last vacuum date, but no indication
of how frequently a vacuum has run.

Do you have a use-case for this info? I can see where it might
be neat
to know, but I'm not sure how you'd actually use it in the
real world.

<Nikhils>
The use-case according to me is that these stats help prove the
effectiveness of autovacuum/vacuum operations. By varying some autovac
guc variables, and doing subsequent (pgbench e.g.) runs, one can find
out the optimum values for these variables using these stats.
<Nikhils>

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#7NikhilS
nikkhils@gmail.com
In reply to: Jim C. Nasby (#6)
Re: Additional stats for Relations

Hi Jim,

On 10/18/06, Jim C. Nasby <jim@nasby.net> wrote:

Also how many times a relation has been vacuumed (which puts all the
other numbers in more perspective... good catch Simon). And I think
number of pages that could not be added to the FSM would also be
extremely valuable.

By the above, do you mean the number of pages that could not be added to the
FSM because they had freespace which was less than the threshold for this
particular relation?

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com

On Wed, Oct 18, 2006 at 11:27:39AM +0530, NikhilS wrote:

Hi,

So:
heap_blks_reused (with Jim's semantics), heap_blks_extend,
heap_blks_truncate are the "interesting" stats? Will try to work up a

patch

for this.

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com
On 10/15/06, Simon Riggs <simon@2ndquadrant.com> wrote:

On Sat, 2006-10-14 at 11:32 +0530, NikhilS wrote:

On 10/13/06, Jim C. Nasby <jim@nasby.net> wrote:

I'm also not sure if this metric is what you actually want,
since a
single page can be returned many times from the FSM even
between
vacuums. Tracking how many pages for a relation have been put
into the
FSM might be more useful...

<Nikhils>
Pages might be put into the FSM, but by this metric don't we get the
actual usage of the pages from the FSM? Agreed a single page can be
returned multiple times, but since it serves a new tuple, shouldn't

we

track it?
<Nikhils>

This makes sense for indexes, but only makes sense for heaps when we
know that the backend will keep re-accessing the block until it is full
- so only of interest in steady-state workloads.

IMHO Jim's proposal makes more sense for general use.

heap_blks_extend: The number of times file extend was

invoked on the

relation

Sounds good

heap_blks_truncate: The total number of blocks that have

been truncated due

to vacuum activity e.g.

Sounds good

As an addendum to the truncate stats above, we can also

have

the additional

following stats:

heap_blks_maxtruncate: The max block of buffers truncated

in

one go

heap_blks_ntruncate: The number of times truncate was

called

on this

relation

Those last 2 sound too complex for normal use and ntruncate is most
likely the same as number of vacuums anyway. Hmmm...Perhaps nvacuums is
a more interesting metric? We've got last vacuum date, but no

indication

of how frequently a vacuum has run.

Do you have a use-case for this info? I can see where it

might

be neat
to know, but I'm not sure how you'd actually use it in the
real world.

<Nikhils>
The use-case according to me is that these stats help prove the
effectiveness of autovacuum/vacuum operations. By varying some

autovac

guc variables, and doing subsequent (pgbench e.g.) runs, one can find
out the optimum values for these variables using these stats.
<Nikhils>

This should be useful for tuning space allocation/deallocation. If we
get this patch in early it should help get feedback on this area.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

--
All the world's a stage, and most of us are desperately unrehearsed.

#8Jim C. Nasby
jim@nasby.net
In reply to: NikhilS (#7)
Re: Additional stats for Relations

On Thu, Oct 19, 2006 at 04:10:46PM +0530, NikhilS wrote:

Hi Jim,

On 10/18/06, Jim C. Nasby <jim@nasby.net> wrote:

Also how many times a relation has been vacuumed (which puts all the
other numbers in more perspective... good catch Simon). And I think
number of pages that could not be added to the FSM would also be
extremely valuable.

By the above, do you mean the number of pages that could not be added to the
FSM because they had freespace which was less than the threshold for this
particular relation?

Yes... but... :)

We want to ignore pages that have less than the average request size,
because vacuum will never try and put them in the FSM anyway. We only
care about pages that were dropped because MaxFSMPages was less than
DesiredFSMPages (see freespace.c for more info).

It would also be useful to keep track of what relations have been bumped
out of the FSM (or never got recorded, though I'm not sure if that's
possible) because we've run into the MaxFSMRelations limit.

BTW, if we add these counters we'll be up to 7 stats dealing with vacuum
and analyze, and NikhilS has a patch we're finalizing that would add 3
more. Right now there's 4 slated to go into pg_stat_* in 8.2, but maybe
we should have a separate view for vacuum/FSM statistics?
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim C. Nasby (#8)
Re: Additional stats for Relations

"Jim C. Nasby" <jim@nasby.net> writes:

BTW, if we add these counters we'll be up to 7 stats dealing with vacuum
and analyze, and NikhilS has a patch we're finalizing that would add 3
more. Right now there's 4 slated to go into pg_stat_* in 8.2, but maybe
we should have a separate view for vacuum/FSM statistics?

I've seen no demonstration of a need for *any* of them, actually,
and am pretty dubious that we want to add so much collection overhead.
At least not without a major redesign of the stats reporting mechanism.
If we just drop in another seven counters, we'll create an immediate 50%
increase in the stats-file I/O volume, even when no vacuuming is
happening at all.

regards, tom lane

#10Jim C. Nasby
jim@nasby.net
In reply to: Tom Lane (#9)
Re: Additional stats for Relations

On Thu, Oct 19, 2006 at 11:47:53AM -0400, Tom Lane wrote:

"Jim C. Nasby" <jim@nasby.net> writes:

BTW, if we add these counters we'll be up to 7 stats dealing with vacuum
and analyze, and NikhilS has a patch we're finalizing that would add 3
more. Right now there's 4 slated to go into pg_stat_* in 8.2, but maybe
we should have a separate view for vacuum/FSM statistics?

I've seen no demonstration of a need for *any* of them, actually,
and am pretty dubious that we want to add so much collection overhead.
At least not without a major redesign of the stats reporting mechanism.
If we just drop in another seven counters, we'll create an immediate 50%
increase in the stats-file I/O volume, even when no vacuuming is
happening at all.

Yeah, for stuff like vacuuming the current stats system may not make any
sense. Almost anything dealing with vacuum can really just be put into a
table, because it doesn't happen all that often.

My concern is that there's enough useful data to collect about vacuuming
and the FSM that it should probably get it's own set of tables/views,
rather than piggy-backing on pg_stat_*. But that ship has pretty much
sailed, so we're probably stuck with at least the last_* stuff in
pg_stat_* for the immediate future.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)