autoanalyze criteria

Started by Stefan Andreattaalmost 13 years ago14 messages
#1Stefan Andreatta
s.andreatta@synedra.com

Hi,

If I understand
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
correctly, the autovacuum threshold in could be estimated like this in
PostgreSQL 9.1:

SELECT pg_stat_user_tables.relname,
pg_stat_user_tables.n_dead_tup,
CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
+ (CAST(current_setting('autovacuum_vacuum_scale_factor') AS
numeric)
* pg_class.reltuples) AS av_threshold
FROM pg_stat_user_tables
JOIN pg_class on pg_stat_user_tables.relid = pg_class.oid
ORDER BY 1;

If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum should
kick in. Obviously, that does rely on up-to-date statistics. Is that how
it is actually done?

2nd question: because pg_stat_user_tables.n_dead_tup is itself estimated
by ANALYZE it cannot be used as a criterion for the next autoanalyze
run, I think. Is there any way to query, whether a table currently
qualifies for autoanalyze?

Thanks and Regards,
Stefan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Jeff Janes
jeff.janes@gmail.com
In reply to: Stefan Andreatta (#1)
Re: autoanalyze criteria

On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
<s.andreatta@synedra.com>wrote:

Hi,

If I understand http://www.postgresql.org/**docs/9.1/static/routine-**
vacuuming.html<http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html&gt;correctly, the autovacuum threshold in could be estimated like this in
PostgreSQL 9.1:

SELECT pg_stat_user_tables.relname,
pg_stat_user_tables.n_dead_**tup,
CAST(current_setting('**autovacuum_vacuum_threshold') AS bigint)
+ (CAST(current_setting('**autovacuum_vacuum_scale_**factor') AS
numeric)
* pg_class.reltuples) AS av_threshold
FROM pg_stat_user_tables
JOIN pg_class on pg_stat_user_tables.relid = pg_class.oid
ORDER BY 1;

If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum should
kick in. Obviously, that does rely on up-to-date statistics. Is that how it
is actually done?

Pretty much, yes. With the caveat that table storage settings can override
the global settings.

2nd question: because pg_stat_user_tables.n_dead_tup is itself estimated
by ANALYZE it cannot be used as a criterion for the next autoanalyze run, I
think.

n_dead_tup is updated by the stats collector, not by ANALYZE.

Cheers,

Jeff

#3Stefan Andreatta
s.andreatta@synedra.com
In reply to: Jeff Janes (#2)
Re: autoanalyze criteria

On 02/22/2013 06:27 PM, Jeff Janes wrote:

On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
<s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:

Hi,

If I understand
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
correctly, the autovacuum threshold in could be estimated like
this in PostgreSQL 9.1:

SELECT pg_stat_user_tables.relname,
pg_stat_user_tables.n_dead_tup,
CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
+ (CAST(current_setting('autovacuum_vacuum_scale_factor')
AS numeric)
* pg_class.reltuples) AS av_threshold
FROM pg_stat_user_tables
JOIN pg_class on pg_stat_user_tables.relid = pg_class.oid
ORDER BY 1;

If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum
should kick in. Obviously, that does rely on up-to-date
statistics. Is that how it is actually done?

Pretty much, yes. With the caveat that table storage settings can
override the global settings.

2nd question: because pg_stat_user_tables.n_dead_tup is itself
estimated by ANALYZE it cannot be used as a criterion for the next
autoanalyze run, I think.

n_dead_tup is updated by the stats collector, not by ANALYZE.

Thanks Jeff, that helped a lot (as did a careful rereading of
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)

However, to estimate whether autoanalyze should be triggered, I am still
missing something: the analyze threshold is compared to the "total
number of tuples inserted, updated, or deleted since the last ANALYZE."
(according to
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).

pg_stat_user_tables.n_live tup - pg_class.reltuples should give
something like the sum of rows inserted minus rows deleted since the
last ANALYZE. But according to the documentation we would need the sum
of those values. And we are still missing a number for rows updated
since the last analyze. pg_stat_usert_tables. n_dead_tup, on the other
hand, is only set back by successful VACUUM. autoanalyzing a table with
more than 10% dead rows would therefore keep autoanalyze in a loop until
the ratio rises beyond 20% (default configuration) and autovacuum kicks
in. So that wouldn't make a lot of sense.

Regards,
Stefan

#4Jeff Janes
jeff.janes@gmail.com
In reply to: Stefan Andreatta (#3)
Re: autoanalyze criteria

On Saturday, February 23, 2013, Stefan Andreatta wrote:

Thanks Jeff, that helped a lot (as did a careful rereading of
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)

However, to estimate whether autoanalyze should be triggered, I am still
missing something: the analyze threshold is compared to the "total number
of tuples inserted, updated, or deleted since the last ANALYZE."
(according to
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).

pg_stat_user_tables.n_live tup - pg_class.reltuples should give something
like the sum of rows inserted minus rows deleted since the last ANALYZE.
But according to the documentation we would need the sum of those values.
And we are still missing a number for rows updated since the last analyze.
pg_stat_usert_tables. n_dead_tup, on the other hand, is only set back by
successful VACUUM. autoanalyzing a table with more than 10% dead rows would
therefore keep autoanalyze in a loop until the ratio rises beyond 20%
(default configuration) and autovacuum kicks in. So that wouldn't make a
lot of sense.

Hi Stefan,

Sorry, I got tunnel vision about the how the threshold was computed, and
forgot about the thing it was compared to. There is a "secret" data point
in the stats collector called changes_since_analyze. This is not exposed
in the pg_stat_user_tables. But I think it should be as I often have
wanted to see it.

Cheers,

Jeff

#5Stefan Andreatta
s.andreatta@synedra.com
In reply to: Jeff Janes (#4)
Re: autoanalyze criteria

On 02/23/2013 05:10 PM, Jeff Janes wrote:

On Saturday, February 23, 2013, Stefan Andreatta wrote:

Thanks Jeff, that helped a lot (as did a careful rereading of
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html
and
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)

However, to estimate whether autoanalyze should be triggered, I am
still missing something: the analyze threshold is compared to the
"total number of tuples inserted, updated, or deleted since the
last ANALYZE." (according to
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).

pg_stat_user_tables.n_live tup - pg_class.reltuples should give
something like the sum of rows inserted minus rows deleted since
the last ANALYZE. But according to the documentation we would need
the sum of those values. And we are still missing a number for
rows updated since the last analyze. pg_stat_usert_tables.
n_dead_tup, on the other hand, is only set back by successful
VACUUM. autoanalyzing a table with more than 10% dead rows would
therefore keep autoanalyze in a loop until the ratio rises beyond
20% (default configuration) and autovacuum kicks in. So that
wouldn't make a lot of sense.

Hi Stefan,

Sorry, I got tunnel vision about the how the threshold was computed,
and forgot about the thing it was compared to. There is a "secret"
data point in the stats collector called changes_since_analyze. This
is not exposed in the pg_stat_user_tables. But I think it should be
as I often have wanted to see it.

Cheers,

Jeff

Sounds like a very good idea to me - any way I could help to make such a
thing happen?

Stefan

#6Jeff Janes
jeff.janes@gmail.com
In reply to: Stefan Andreatta (#5)
Re: [GENERAL] autoanalyze criteria

Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
<s.andreatta@synedra.com>wrote:

On 02/23/2013 05:10 PM, Jeff Janes wrote:

Sorry, I got tunnel vision about the how the threshold was computed, and
forgot about the thing it was compared to. There is a "secret" data point
in the stats collector called changes_since_analyze. This is not exposed
in the pg_stat_user_tables. But I think it should be as I often have
wanted to see it.

Sounds like a very good idea to me - any way I could help to make such a
thing happen?

It should be fairly easy to implement because the other columns are already
there to show you the way, and if you want to try your hand at hacking
pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and
pg_stat_get_dead_tuples, and those are the places where
changes_since_analyze also need to be addressed, in an analogous manner
(assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the data,
then incorporate that function into the view definitions that make up the
pg_stat_user_tables etc. views. and of course update the regression test
and the documentation.

Other than implementing it, we would need to convince other hackers that
this is desirable to have. I'm not sure how hard that would be. I've
looked in the archives to see if this idea was already considered but
rejected, but I don't see any indication that it was previously considered.

(/messages/by-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff

#7Stefan Andreatta
s.andreatta@synedra.com
In reply to: Jeff Janes (#6)
Re: [GENERAL] autoanalyze criteria

On 02/23/2013 09:30 PM, Jeff Janes wrote:

Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
<s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:

On 02/23/2013 05:10 PM, Jeff Janes wrote:

Sorry, I got tunnel vision about the how the threshold was
computed, and forgot about the thing it was compared to. There
is a "secret" data point in the stats collector
called changes_since_analyze. This is not exposed in the
pg_stat_user_tables. But I think it should be as I often have
wanted to see it.

Sounds like a very good idea to me - any way I could help to make
such a thing happen?

It should be fairly easy to implement because the other columns are
already there to show you the way, and if you want to try your hand at
hacking pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and
pg_stat_get_dead_tuples, and those are the places where
changes_since_analyze also need to be addressed, in an analogous
manner (assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the
data, then incorporate that function into the view definitions that
make up the pg_stat_user_tables etc. views. and of course update the
regression test and the documentation.

Other than implementing it, we would need to convince other hackers
that this is desirable to have. I'm not sure how hard that would be.
I've looked in the archives to see if this idea was already considered
but rejected, but I don't see any indication that it was previously
considered.

(/messages/by-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff

Not being a developer, I am afraid, I will not be going to implement it
myself - nor would anybody wish so ;-)

I also searched the archives, but the closest I found is a discussion on
the Admin List starting here:
/messages/by-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com

On the other hand, there is quite a lot of discussion about making
autoanalyze more (or less) aggressive - which seems a difficult task to
me, when you cannot even check what's triggering your autoanalyze.

Anybody else interested?

Regards,
Stefan

#8Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Stefan Andreatta (#7)
Re: [GENERAL] autoanalyze criteria

On 24/02/13 10:12, Stefan Andreatta wrote:

On 02/23/2013 09:30 PM, Jeff Janes wrote:

Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
<s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:

On 02/23/2013 05:10 PM, Jeff Janes wrote:

Sorry, I got tunnel vision about the how the threshold was
computed, and forgot about the thing it was compared to. There
is a "secret" data point in the stats collector
called changes_since_analyze. This is not exposed in the
pg_stat_user_tables. But I think it should be as I often have
wanted to see it.

Sounds like a very good idea to me - any way I could help to make
such a thing happen?

It should be fairly easy to implement because the other columns are
already there to show you the way, and if you want to try your hand at
hacking pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and
pg_stat_get_dead_tuples, and those are the places where
changes_since_analyze also need to be addressed, in an analogous
manner (assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the
data, then incorporate that function into the view definitions that
make up the pg_stat_user_tables etc. views. and of course update the
regression test and the documentation.

Other than implementing it, we would need to convince other hackers
that this is desirable to have. I'm not sure how hard that would be.
I've looked in the archives to see if this idea was already considered
but rejected, but I don't see any indication that it was previously
considered.

(/messages/by-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff

Not being a developer, I am afraid, I will not be going to implement it
myself - nor would anybody wish so ;-)

I also searched the archives, but the closest I found is a discussion on
the Admin List starting here:
/messages/by-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com

On the other hand, there is quite a lot of discussion about making
autoanalyze more (or less) aggressive - which seems a difficult task to
me, when you cannot even check what's triggering your autoanalyze.

Anybody else interested?

I was asked about this exact thing the other day - it would be very nice
to have the information visible. I may take a look at doing it (I've
done some hacking on the stats system previously). However don't let
that put anyone else off - as I'll have to find the time to start :-)

Regards

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Alban Hertroys
haramrae@gmail.com
In reply to: Stefan Andreatta (#3)
Re: autoanalyze criteria

On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com> wrote:

And we are still missing a number for rows updated since the last analyse.

In MVCC an update is an insert + delete, so you already got those numbers.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

#10Stefan Andreatta
s.andreatta@synedra.com
In reply to: Alban Hertroys (#9)
Re: autoanalyze criteria

On 02/24/2013 12:52 PM, Alban Hertroys wrote:

On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com
<mailto:s.andreatta@synedra.com>> wrote:

And we are still missing a number for rows updated since the last
analyse.

In MVCC an update is an insert + delete, so you already got those numbers.

Good point. But because they are an update and a delete, they cancel
each other out and do not show up in pg_stat_user_tables.n_live_tup -
and that's the only value for which we have a reference value from the
time of the last analyze (pg_class.reltuples).

On the other hand, I might again miss something out. I would be most
happy if anybody could come up with a query to estimate autoanalyze
trigger conditions from the values we have available now.

Stefan

#11Alban Hertroys
haramrae@gmail.com
In reply to: Stefan Andreatta (#10)
Re: autoanalyze criteria

On Feb 25, 2013, at 7:23, Stefan Andreatta <s.andreatta@synedra.com> wrote:

On 02/24/2013 12:52 PM, Alban Hertroys wrote:

On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com> wrote:

And we are still missing a number for rows updated since the last analyse.

In MVCC an update is an insert + delete, so you already got those numbers.

Good point. But because they are an update and a delete, they cancel each other out and do not show up in pg_stat_user_tables.n_live_tup - and that's the only value for which we have a reference value from the time of the last analyze (pg_class.reltuples).

I'm pretty sure that an update results in 1 live + 1 dead tuple, so they don't cancel each other out - they end up adding to different statistics. Assuming those statistics are both since last vacuum, added together they are the total number of changed records since last vacuum.
What gain do you expect from a number of updated tuples?

And it seems to me those numbers are since last vacuum, not since last analyse - analyse doesn't change the amount of dead tuples (it just updates them to closer match reality), but vacuum does.

Disclaimer: I'm not intimately familiar with the planner statistics, but knowing what vacuum and analyse do in an MVCC database, like I described above it makes sense to me. I might be wrong though.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

#12Stefan Andreatta
s.andreatta@synedra.com
In reply to: Alban Hertroys (#11)
Re: autoanalyze criteria

On 02/25/2013 09:00 AM, Alban Hertroys wrote:

On Feb 25, 2013, at 7:23, Stefan Andreatta <s.andreatta@synedra.com
<mailto:s.andreatta@synedra.com>> wrote:

On 02/24/2013 12:52 PM, Alban Hertroys wrote:

On Feb 23, 2013, at 14:11, Stefan Andreatta <s.andreatta@synedra.com
<mailto:s.andreatta@synedra.com>> wrote:

And we are still missing a number for rows updated since the last
analyse.

In MVCC an update is an insert + delete, so you already got those
numbers.

Good point. But because they are an update and a delete, they cancel
each other out and do not show up in pg_stat_user_tables.n_live_tup -
and that's the only value for which we have a reference value from
the time of the last analyze (pg_class.reltuples).

I'm pretty sure that an update results in 1 live + 1 dead tuple, so
they don't cancel each other out - they end up adding to different
statistics. Assuming those statistics are both since last vacuum,
added together they are the total number of changed records since last
vacuum.
What gain do you expect from a number of updated tuples?

And it seems to me those numbers are since last vacuum, not since last
analyse - analyse doesn't change the amount of dead tuples (it just
updates them to closer match reality), but vacuum does.

Disclaimer: I'm not intimately familiar with the planner statistics,
but knowing what vacuum and analyse do in an MVCC database, like I
described above it makes sense to me. I might be wrong though.

1 update = 1 insert + 1 delete cancel each other out with respect to
pg_stat_user_tables.n_live_tup. Naturally, they dont't cancel each other
out with pg_stat_user_tables.n_tup_ins or n_tup_del - they don't even
show up in those values, presumably because that's what n_tup_upd is
there for. However the update adds to n_dead_tup.

VACUUM does not reset *any* of the statistics values that can be
accessed via pg_stat_user_tables, apart from n_dead_tup (hopefully ;-)
Anyway, to estimate the autoanalyze trigger, I would need statistics
that get reset by autoanalyze not autovacuum.

I wrote a test script to show the behaviour. Be sure to wait a second
each time before accessing pg_stat_user_tables as there is a delay in
getting those data:

CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
INSERT INTO test_stat (SELECT generate_series(1,10000) AS i, random() AS r);
SELECT count(*) FROM test_stat;
ANALYZE test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

DELETE FROM test_stat WHERE id > 10000;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

UPDATE test_stat set some_number = 1 where id > 9100;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

ANALYZE test_stat;

SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

VACUUM test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

DROP TABLE test_stat;

Output from a postgres 9.2 database:
--------------------------------------------------

test=# CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
CREATE TABLE
test=# INSERT INTO test_stat (SELECT generate_series(1,10000) AS i,
random() AS r);
INSERT 0 10000
test=# SELECT count(*) FROM test_stat;
count
-------
10000
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 0 | 10000 | 0 |
0 | 0
(1 row)

test=# INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);
INSERT 0 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10900 | 0 | 10900 | 0 |
0 | 0
(1 row)

test=# DELETE FROM test_stat WHERE id > 10000;
DELETE 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 900 | 10900 | 0 | 900
| 0
(1 row)

test=# UPDATE test_stat set some_number = 1 where id > 9100;
UPDATE 900
test=# -- wait here (0.5 s) for statistics collector to catch up
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 1800 | 10900 | 900 | 900
| 10
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=#
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 1800 | 10900 | 900 | 900
| 10
(1 row)

test=# VACUUM test_stat;
VACUUM
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 0 | 10900 | 900 | 900
| 10
(1 row)

test=# DROP TABLE test_stat;
DROP TABLE

Regards,
Stefan

#13Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#8)
1 attachment(s)
Re: [GENERAL] autoanalyze criteria

On 24/02/13 10:51, Mark Kirkwood wrote:

On 24/02/13 10:12, Stefan Andreatta wrote:

On 02/23/2013 09:30 PM, Jeff Janes wrote:

Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
<s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:

On 02/23/2013 05:10 PM, Jeff Janes wrote:

Sorry, I got tunnel vision about the how the threshold was
computed, and forgot about the thing it was compared to. There
is a "secret" data point in the stats collector
called changes_since_analyze. This is not exposed in the
pg_stat_user_tables. But I think it should be as I often have
wanted to see it.

Sounds like a very good idea to me - any way I could help to make
such a thing happen?

It should be fairly easy to implement because the other columns are
already there to show you the way, and if you want to try your hand at
hacking pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and
pg_stat_get_dead_tuples, and those are the places where
changes_since_analyze also need to be addressed, in an analogous
manner (assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the
data, then incorporate that function into the view definitions that
make up the pg_stat_user_tables etc. views. and of course update the
regression test and the documentation.

Other than implementing it, we would need to convince other hackers
that this is desirable to have. I'm not sure how hard that would be.
I've looked in the archives to see if this idea was already considered
but rejected, but I don't see any indication that it was previously
considered.

(/messages/by-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff

Not being a developer, I am afraid, I will not be going to implement it
myself - nor would anybody wish so ;-)

I also searched the archives, but the closest I found is a discussion on
the Admin List starting here:
/messages/by-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com

On the other hand, there is quite a lot of discussion about making
autoanalyze more (or less) aggressive - which seems a difficult task to
me, when you cannot even check what's triggering your autoanalyze.

Anybody else interested?

I was asked about this exact thing the other day - it would be very
nice to have the information visible. I may take a look at doing it
(I've done some hacking on the stats system previously). However don't
let that put anyone else off - as I'll have to find the time to start :-)

I happened to be looking at the whole autovacuum/analyze setup in
another context - which reminded me about volunteering to take a look at
a patch for adding changes_since_analyze. So with probably impeccably
poor timing (smack in the middle of 9.3 beta), here is a patch that does
that (so it is probably an early 9.4 addition).

I've called the column "n_changes_since_analyze" - I can sense that
there might be discussion about how to maybe shorten that :-) , and
added a doc line for the view + updated the regression test expected input.

Regards

Mark

Attachments:

pgstat-changes-since-analyze.1.patchtext/x-patch; name=pgstat-changes-since-analyze.1.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..0ebce4e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -951,6 +951,11 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
      <entry>Estimated number of dead rows</entry>
     </row>
     <row>
+     <entry><structfield>n_changes_since_analyze</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Estimated number of row changes (inserts + updates + deletes) since the last analyze</entry>
+    </row>
+    <row>
      <entry><structfield>last_vacuum</></entry>
      <entry><type>timestamp with time zone</></entry>
      <entry>Last time at which this table was manually vacuumed
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a03bfa6..05bba74 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -405,6 +405,7 @@ CREATE VIEW pg_stat_all_tables AS
             pg_stat_get_tuples_hot_updated(C.oid) AS n_tup_hot_upd,
             pg_stat_get_live_tuples(C.oid) AS n_live_tup,
             pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
+            pg_stat_get_changes_since_analyze(C.oid) AS n_changes_since_analyze,
             pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
             pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
             pg_stat_get_last_analyze_time(C.oid) as last_analyze,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 8c1a767..8803996 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -34,6 +34,7 @@ extern Datum pg_stat_get_tuples_deleted(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_tuples_hot_updated(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_live_tuples(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_dead_tuples(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_changes_since_analyze(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_blocks_hit(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_last_vacuum_time(PG_FUNCTION_ARGS);
@@ -266,6 +267,22 @@ pg_stat_get_dead_tuples(PG_FUNCTION_ARGS)
 
 
 Datum
+pg_stat_get_changes_since_analyze(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	int64		result;
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		result = 0;
+	else
+		result = (int64) (tabentry->changes_since_analyze);
+
+	PG_RETURN_INT64(result);
+}
+
+
+Datum
 pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 392649c..0a77fcb 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	201305061
+#define CATALOG_VERSION_NO	201305151
 
 #endif
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index feecbf9..a7b7adb 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2593,6 +2593,8 @@ DATA(insert OID = 2878 (  pg_stat_get_live_tuples	PGNSP PGUID 12 1 0 0 0 f f f f
 DESCR("statistics: number of live tuples");
 DATA(insert OID = 2879 (  pg_stat_get_dead_tuples	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_dead_tuples _null_ _null_ _null_ ));
 DESCR("statistics: number of dead tuples");
+DATA(insert OID = 3177 (  pg_stat_get_changes_since_analyze	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_changes_since_analyze _null_ _null_ _null_ ));
+DESCR("statistics: number of tuples changed since last analyze");
 DATA(insert OID = 1934 (  pg_stat_get_blocks_fetched	PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_blocks_fetched _null_ _null_ _null_ ));
 DESCR("statistics: number of blocks fetched");
 DATA(insert OID = 1935 (  pg_stat_get_blocks_hit		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_blocks_hit _null_ _null_ _null_ ));
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index dc3c8a8..d3e132a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1626,6 +1626,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
                                  |     pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd,                                                                                                                                                    +
                                  |     pg_stat_get_live_tuples(c.oid) AS n_live_tup,                                                                                                                                                              +
                                  |     pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,                                                                                                                                                              +
+                                 |     pg_stat_get_changes_since_analyze(c.oid) AS n_changes_since_analyze,                                                                                                                                       +
                                  |     pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,                                                                                                                                                        +
                                  |     pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,                                                                                                                                                +
                                  |     pg_stat_get_last_analyze_time(c.oid) AS last_analyze,                                                                                                                                                      +
@@ -1720,6 +1721,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
                                  |     pg_stat_all_tables.n_tup_hot_upd,                                                                                                                                                                          +
                                  |     pg_stat_all_tables.n_live_tup,                                                                                                                                                                             +
                                  |     pg_stat_all_tables.n_dead_tup,                                                                                                                                                                             +
+                                 |     pg_stat_all_tables.n_changes_since_analyze,                                                                                                                                                                +
                                  |     pg_stat_all_tables.last_vacuum,                                                                                                                                                                            +
                                  |     pg_stat_all_tables.last_autovacuum,                                                                                                                                                                        +
                                  |     pg_stat_all_tables.last_analyze,                                                                                                                                                                           +
@@ -1762,6 +1764,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
                                  |     pg_stat_all_tables.n_tup_hot_upd,                                                                                                                                                                          +
                                  |     pg_stat_all_tables.n_live_tup,                                                                                                                                                                             +
                                  |     pg_stat_all_tables.n_dead_tup,                                                                                                                                                                             +
+                                 |     pg_stat_all_tables.n_changes_since_analyze,                                                                                                                                                                +
                                  |     pg_stat_all_tables.last_vacuum,                                                                                                                                                                            +
                                  |     pg_stat_all_tables.last_autovacuum,                                                                                                                                                                        +
                                  |     pg_stat_all_tables.last_analyze,                                                                                                                                                                           +
#14Magnus Hagander
magnus@hagander.net
In reply to: Mark Kirkwood (#13)
Re: [GENERAL] autoanalyze criteria

On Wed, May 15, 2013 at 2:33 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

On 24/02/13 10:51, Mark Kirkwood wrote:

On 24/02/13 10:12, Stefan Andreatta wrote:

On 02/23/2013 09:30 PM, Jeff Janes wrote:

Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
<s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:

On 02/23/2013 05:10 PM, Jeff Janes wrote:

Sorry, I got tunnel vision about the how the threshold was
computed, and forgot about the thing it was compared to. There
is a "secret" data point in the stats collector
called changes_since_analyze. This is not exposed in the
pg_stat_user_tables. But I think it should be as I often have
wanted to see it.

Sounds like a very good idea to me - any way I could help to make
such a thing happen?

It should be fairly easy to implement because the other columns are
already there to show you the way, and if you want to try your hand at
hacking pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and
pg_stat_get_dead_tuples, and those are the places where
changes_since_analyze also need to be addressed, in an analogous
manner (assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the
data, then incorporate that function into the view definitions that
make up the pg_stat_user_tables etc. views. and of course update the
regression test and the documentation.

Other than implementing it, we would need to convince other hackers
that this is desirable to have. I'm not sure how hard that would be.
I've looked in the archives to see if this idea was already considered
but rejected, but I don't see any indication that it was previously
considered.

(/messages/by-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff

Not being a developer, I am afraid, I will not be going to implement it
myself - nor would anybody wish so ;-)

I also searched the archives, but the closest I found is a discussion on
the Admin List starting here:

/messages/by-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com

On the other hand, there is quite a lot of discussion about making
autoanalyze more (or less) aggressive - which seems a difficult task to
me, when you cannot even check what's triggering your autoanalyze.

Anybody else interested?

I was asked about this exact thing the other day - it would be very nice
to have the information visible. I may take a look at doing it (I've done
some hacking on the stats system previously). However don't let that put
anyone else off - as I'll have to find the time to start :-)

I happened to be looking at the whole autovacuum/analyze setup in another
context - which reminded me about volunteering to take a look at a patch for
adding changes_since_analyze. So with probably impeccably poor timing (smack
in the middle of 9.3 beta), here is a patch that does that (so it is
probably an early 9.4 addition).

I've called the column "n_changes_since_analyze" - I can sense that there
might be discussion about how to maybe shorten that :-) , and added a doc
line for the view + updated the regression test expected input.

Applied, with the changs suggested by Laurenz Albe in his review.

Thanks!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers