statistics for shared catalogs not updated when autovacuum is off

Started by Peter Eisentrautabout 10 years ago12 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

When autovacuum is off, the statistics in pg_stat_sys_tables for shared
catalogs (e.g., pg_authid, pg_database) never update. So seq_scan
doesn't update when you read the table, last_analyze doesn't update when
you run analyze, etc.

But when you shut down the server and restart it with autovacuum on, the
updated statistics magically appear right away. So seq_scan is updated
with the number of reads you did before the shutdown, last_analyze
updates with the time of the analyze you did before the shutdown, etc.
So the data is saved, just not propagated to the view properly.

I can reproduce this back to 9.3, but not 9.2. Any ideas?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#1)
Re: statistics for shared catalogs not updated when autovacuum is off

On 1/30/16 5:05 PM, Peter Eisentraut wrote:

When autovacuum is off, the statistics in pg_stat_sys_tables for shared
catalogs (e.g., pg_authid, pg_database) never update. So seq_scan
doesn't update when you read the table, last_analyze doesn't update when
you run analyze, etc.

But when you shut down the server and restart it with autovacuum on, the

What about with autovacuum still off?

updated statistics magically appear right away. So seq_scan is updated
with the number of reads you did before the shutdown, last_analyze
updates with the time of the analyze you did before the shutdown, etc.
So the data is saved, just not propagated to the view properly.

I can reproduce this back to 9.3, but not 9.2. Any ideas?

ISTR that there's some code in the autovac launcher that ensures certain
stats have been loaded from the file into memory; I'm guessing that the
functions implementing the shared catalog views need something similar.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Jim Nasby (#2)
Re: statistics for shared catalogs not updated when autovacuum is off

On 2/1/16 9:49 AM, Jim Nasby wrote:

On 1/30/16 5:05 PM, Peter Eisentraut wrote:

When autovacuum is off, the statistics in pg_stat_sys_tables for shared
catalogs (e.g., pg_authid, pg_database) never update. So seq_scan
doesn't update when you read the table, last_analyze doesn't update when
you run analyze, etc.

But when you shut down the server and restart it with autovacuum on, the

What about with autovacuum still off?

nothing

updated statistics magically appear right away. So seq_scan is updated
with the number of reads you did before the shutdown, last_analyze
updates with the time of the analyze you did before the shutdown, etc.
So the data is saved, just not propagated to the view properly.

I can reproduce this back to 9.3, but not 9.2. Any ideas?

ISTR that there's some code in the autovac launcher that ensures certain
stats have been loaded from the file into memory; I'm guessing that the
functions implementing the shared catalog views need something similar.

That's probably right. Even with autovacuum on, the statistics for
shared catalogs do not appear as updated right away. That is, if you
run VACUUM and then look at pg_stat_sys_tables right away, you will see
the stats for shared catalogs to be slightly out of date until the
minutely autovacuum check causes them to update.

So the problem exists in general, but the autovacuum launcher papers
over it every minute.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#3)
Re: statistics for shared catalogs not updated when autovacuum is off

On 2/1/16 7:20 PM, Peter Eisentraut wrote:

That's probably right. Even with autovacuum on, the statistics for
shared catalogs do not appear as updated right away. That is, if you
run VACUUM and then look at pg_stat_sys_tables right away, you will see
the stats for shared catalogs to be slightly out of date until the
minutely autovacuum check causes them to update.

So the problem exists in general, but the autovacuum launcher papers
over it every minute.

I suspect the issue is in backend_read_statsfile(). Presumably the if
just needs a call to AutoVacuumingActive() added:

/*
* Autovacuum launcher wants stats about all databases, but a shallow read
* is sufficient.
*/
if (IsAutoVacuumLauncherProcess())
pgStatDBHash = pgstat_read_statsfiles(InvalidOid, false, false);
else
pgStatDBHash = pgstat_read_statsfiles(MyDatabaseId, false, true);

The interesting thing is that we always start the launcher one time, to
protect against wraparound, but apparently that path doesn't call
anything that calls backend_read_statsfile() (which is static).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Michael Paquier
michael@paquier.xyz
In reply to: Jim Nasby (#4)
Re: statistics for shared catalogs not updated when autovacuum is off

On Tue, Feb 2, 2016 at 10:38 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 2/1/16 7:20 PM, Peter Eisentraut wrote:

That's probably right. Even with autovacuum on, the statistics for
shared catalogs do not appear as updated right away. That is, if you
run VACUUM and then look at pg_stat_sys_tables right away, you will see
the stats for shared catalogs to be slightly out of date until the
minutely autovacuum check causes them to update.

So the problem exists in general, but the autovacuum launcher papers
over it every minute.

I suspect the issue is in backend_read_statsfile(). Presumably the if just
needs a call to AutoVacuumingActive() added:

The interesting thing is that we always start the launcher one time, to
protect against wraparound, but apparently that path doesn't call anything
that calls backend_read_statsfile() (which is static).

The problem is different I think. Since 9.3, database-related
statistics are located on separate files. And the statistics of shared
tables is visibly located in a file with database name set as
InvalidOid, leading to the presence of db_0.stat in pg_stat_tmp. So
the fix for shared relations is to make sure that
backend_read_statsfile can load the file dedicated to shared objects
when data from it is needed, like pg_database stats. So making
backend_read_statsfile a bit smarter looks like the good answer to me.

At the same time I am not getting why pgstat_fetch_stat_tabentry needs
to be that complicated. Based on the relation OID we can know if it is
a shared relation or not, there is no point in doing two times the
same lookup in the pgstat hash table.

Attached is a patch that fixes the issue here:
=# show autovacuum;
autovacuum
------------
off
(1 row)
=# select seq_scan from pg_stat_sys_tables where relname = 'pg_database';
seq_scan
----------
2
(1 row)
=# select count(*) from pg_database;
count
-------
4
(1 row)
=# select seq_scan from pg_stat_sys_tables where relname = 'pg_database';
seq_scan
----------
3
(1 row)
--
Michael

Attachments:

pgstat-shared-catalogs.patchapplication/x-download; name=pgstat-shared-catalogs.patchDownload+22-31
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#5)
Re: statistics for shared catalogs not updated when autovacuum is off

Michael Paquier <michael.paquier@gmail.com> writes:

The problem is different I think. Since 9.3, database-related
statistics are located on separate files. And the statistics of shared
tables is visibly located in a file with database name set as
InvalidOid, leading to the presence of db_0.stat in pg_stat_tmp. So
the fix for shared relations is to make sure that
backend_read_statsfile can load the file dedicated to shared objects
when data from it is needed, like pg_database stats. So making
backend_read_statsfile a bit smarter looks like the good answer to me.

I don't think that's quite it. The problem basically is that the stats
collector will only write out the shared-catalog stats file when it's
prompted to by a PGSTAT_MTYPE_INQUIRY message for database OID 0.
Ordinary backends will never do that; they always send their own DB OID.
The autovac launcher accidentally does that, because it calls
pgstat_send_inquiry with MyDatabaseId which it has never set nonzero.
So once per autovac launcher cycle, an update of the shared-catalog stats
file will happen on disk, but with autovac off it never happens at all.

Meanwhile, backends look only at the timestamp on the non-shared stats
file corresponding to their DB. When that's sufficiently up to date,
they read in both that file and the shared-stats file and call it good.

At the same time I am not getting why pgstat_fetch_stat_tabentry needs
to be that complicated. Based on the relation OID we can know if it is
a shared relation or not, there is no point in doing two times the
same lookup in the pgstat hash table.

True, although I'm not sure if adding a dependency on IsSharedRelation()
here is a good thing. In any case, you took the dependency too far ...

Attached is a patch that fixes the issue here:

I have not tested it, but I would bet a lot that this patch is broken:
what will happen if the first request in a transaction is for a shared
catalog is that we'll read just the shared-catalog data, and then
subsequent requests for stats for an unshared table will find nothing.
Moreover, even if you undid the change to the pgstat_read_statsfiles()
call so that we still did read the appropriate unshared stats, this
would have the effect of reversing the problem: if the first request
is for a shared catalog, then we'd check to ensure the shared stats
are up-to-date, but fail to ensure that the unshared ones are.

We could possibly fix this by having backends perform the poll/request
logic in backend_read_statsfile() for *both* their own DB OID and OID 0.
(They would always have to do both, regardless of which type of table is
initially asked about, since there's no way to predict what other requests
will be made in the same transaction.) This seems pretty inefficient to
me, though. I think it would be better to redefine the behavior in
pgstat_write_statsfiles() so that the shared-catalog stats file is always
written if any DB statsfile write has been requested. Then backends could
continue to poll just their own DB OID and expect to see up-to-date shared
stats as well.

I initially thought there might be a race condition with that, but there's
not really because pgstat_read_db_statsfile_timestamp() is not actually
looking at on-disk file timestamps: it's reading the global stats file and
seeing whether the per-db last write time recorded in that is new enough.
So all of the relevant data (global stats file, shared-catalog table stats
file, and per-database table stats file) will appear to update atomically.

In short, I think the attached is enough to fix it. There's some cosmetic
cleanup that could be done here: a lot of these functions could use better
comments, and I'm not happy about the autovac launcher depending on
MyDatabaseId being zero. But those are not functional problems.

regards, tom lane

Attachments:

pgstat-shared-catalogs-fix-2.patchtext/x-diff; charset=us-ascii; name=pgstat-shared-catalogs-fix-2.patchDownload+9-0
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: statistics for shared catalogs not updated when autovacuum is off

I wrote:

In short, I think the attached is enough to fix it. There's some cosmetic
cleanup that could be done here: a lot of these functions could use better
comments, and I'm not happy about the autovac launcher depending on
MyDatabaseId being zero. But those are not functional problems.

While re-reading this, I realized that there's a second serious bug in the
same area: commit 187492b6c basically broke the mechanism that prevents
duplicated writes when two backends send inquiries for the same DB at
about the same time. The intention is that if an inquiry arrives with a
cutoff_time older than when we last wrote the DB's stats, we don't have to
do anything. But as the code stands we will make a DBWriteRequest
unconditionally, because the previous request has been removed from the
list. It would only succeed in merging requests if the second one arrives
before we have executed a write in response to the first one, which looks
to me to be impossible given the coding of the message-processing loop in
PgstatCollectorMain.

In the attached expanded patch, I fixed pgstat_recv_inquiry() to not make
a write request unless the cutoff-time restriction is met. I also removed
DBWriteRequest.request_time, which was not being consulted at all. At
this point the DBWriteRequest data structure is just overkill and could be
replaced by an OID list at very substantial notational savings, but I've
not done that yet.

Comments/objections?

regards, tom lane

Attachments:

pgstat-fixes.patchtext/x-diff; charset=us-ascii; name=pgstat-fixes.patchDownload+81-70
#8Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#6)
Re: statistics for shared catalogs not updated when autovacuum is off

On Tue, May 24, 2016 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

At the same time I am not getting why pgstat_fetch_stat_tabentry needs
to be that complicated. Based on the relation OID we can know if it is
a shared relation or not, there is no point in doing two times the
same lookup in the pgstat hash table.

True, although I'm not sure if adding a dependency on IsSharedRelation()
here is a good thing. In any case, you took the dependency too far ...

OK. It seemed more simple to me to have the dependency on catalog.h to
avoid two lookups. But I won't fight for it either.

Attached is a patch that fixes the issue here:

I have not tested it, but I would bet a lot that this patch is broken:
what will happen if the first request in a transaction is for a shared
catalog is that we'll read just the shared-catalog data, and then
subsequent requests for stats for an unshared table will find nothing.
Moreover, even if you undid the change to the pgstat_read_statsfiles()
call so that we still did read the appropriate unshared stats, this
would have the effect of reversing the problem: if the first request
is for a shared catalog, then we'd check to ensure the shared stats
are up-to-date, but fail to ensure that the unshared ones are.

Your bet is right. I forgot the transactional behavior of this code
path. The stats obtained at the first request would not have been
correct for the database of the backend.

In short, I think the attached is enough to fix it. There's some cosmetic
cleanup that could be done here: a lot of these functions could use better
comments, and I'm not happy about the autovac launcher depending on
MyDatabaseId being zero. But those are not functional problems.

I am going to look at the second patch you sent.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#8)
Re: statistics for shared catalogs not updated when autovacuum is off

Michael Paquier <michael.paquier@gmail.com> writes:

On Tue, May 24, 2016 at 3:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

True, although I'm not sure if adding a dependency on IsSharedRelation()
here is a good thing. In any case, you took the dependency too far ...

OK. It seemed more simple to me to have the dependency on catalog.h to
avoid two lookups. But I won't fight for it either.

Well, it might be worth doing, but I think it's a distinct patch.

I am going to look at the second patch you sent.

OK. I'm going to work on replacing the DBWriteRequest structure with
an OID list, and on improving the comments some more, but I don't
anticipate further functional changes.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#9)
Re: statistics for shared catalogs not updated when autovacuum is off

I wrote:

OK. I'm going to work on replacing the DBWriteRequest structure with
an OID list, and on improving the comments some more, but I don't
anticipate further functional changes.

Here's the end result of that. I went ahead and pushed the original
small patch, concluding that that was fixing a different bug and so
should be a different commit.

regards, tom lane

Attachments:

pgstat-avoid-redundant-writes-again.patchtext/x-diff; charset=us-ascii; name=pgstat-avoid-redundant-writes-again.patchDownload+208-189
#11Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#10)
Re: statistics for shared catalogs not updated when autovacuum is off

On Wed, May 25, 2016 at 05:50:41PM -0400, Tom Lane wrote:

I wrote:

OK. I'm going to work on replacing the DBWriteRequest structure with
an OID list, and on improving the comments some more, but I don't
anticipate further functional changes.

Here's the end result of that.

You may want to compare your patch to this pending patch for the same bug:
/messages/by-id/24f09c2d-e5bf-1f73-db54-8255c1280d32@2ndquadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#11)
Re: statistics for shared catalogs not updated when autovacuum is off

Noah Misch <noah@leadboat.com> writes:

You may want to compare your patch to this pending patch for the same bug:
/messages/by-id/24f09c2d-e5bf-1f73-db54-8255c1280d32@2ndquadrant.com

Oh, interesting. I had not been paying any attention to that thread.
I'll go respond over there.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers