Re: query plan question
What I think is happening with the missing pg_statistic entries:
The install of our application involves a lot of data importing (via
JDBC) in one large transaction, which can take up to 30 minutes. (I
realize I left out this key piece of info in my original post...)
The pg_autovacuum logic is relying on data from pg_stat_all_tables to
make the decision about running analyze. As far as I can tell, the data
in this view gets updated outside of the transaction, because I saw the
numbers growing while I was importing. I saw pg_autovacuum log messages
for running analyze on several tables, but no statistics data showed up
for these, I assume because the actual data in the table wasn't yet
visible to pg_autovacuum because the import transaction had not finished
yet.
When the import finished, not all of the tables affected by the import
were re-visited because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.
Am I making the correct assumptions about the way the various pieces
work? Does this scenario make sense?
It's easy enough for us to kick off a vacuum/analyze at the end of a
long import - but this "mysterious" behavior was bugging me!
Thanks.
- DAP
Show quoted text
-----Original Message-----
From: Matthew T. O'Connor [mailto:matthew@zeut.net]
Sent: Wednesday, November 17, 2004 2:02 PM
To: David Parker
Cc: Tom Lane; Jeff; Russell Smith; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query plan questionWell based on the autovacuum log that you attached, all of
those tables
are insert only (at least during the time period included in
the log.
Is that correct? If so, autovacuum will never do a vacuum
(unless required by xid wraparound issues) on those tables.
So this doesn't appear to be an autovacuum problem. I'm not
sure about the missing pg_statistic entries anyone else care
to field that one?Matthew
David Parker wrote:
Thanks. The tables I'm concerned with are named: 'schema', 'usage',
'usageparameter', and 'flow'. It looks like autovacuum is performing
analyzes:% grep "Performing: " logs/.db.tazz.vacuum.log
[2004-11-17 12:05:58 PM] Performing: ANALYZE
"public"."scriptlibrary_library"
[2004-11-17 12:15:59 PM] Performing: ANALYZE
"public"."scriptlibraryparm"
[2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageproperty"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."route"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:21:00 PM] Performing: ANALYZE
"public"."scriptlibrary_library"
[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usage"
[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:31:04 PM] Performing: ANALYZE "public"."usageproperty"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."route"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."service_usage"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."usageparameter"But when I run the following:
select * from pg_statistic where starelid in (select oid from
pg_class
where relname in
('schema','usageparameter','flow','usage'))it returns no records. Shouldn't it? It doesn't appear to be doing a
vacuum anywhere, which makes sense because none of these tables have
over the default threshold of 1000. Are there statisticswhich only get
generated by vacuum?
I've attached a gzip of the pg_autovacuum log file, with -d 3.
Thanks again.
- DAP
-----Original Message-----
From: Matthew T. O'Connor [mailto:matthew@zeut.net]
Sent: Wednesday, November 17, 2004 11:41 AM
To: David Parker
Cc: Tom Lane; Jeff; Russell Smith; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query plan questionDavid Parker wrote:
We're using postgresql 7.4.5. I've only recently put
pg_autovacuum in
place as part of our installation, and I'm basically taking the
defaults. I doubt it's a problem with autovacuum itself, but rather
with my configuration of it. I have some reading to do, soany pointers
to existing autovacuum threads would be greatly appreciated!
Well the first thing to do is increase the verbosity of the
pg_autovacuum logging output. If you use -d2 or higher,pg_autovacuum
will print out a lot of detail on what it thinks the thresholds are
and
why it is or isn't performing vacuums and analyzes. Attach
some of the
log and I'll take a look at it.
"David Parker" <dparker@tazznetworks.com> writes:
What I think is happening with the missing pg_statistic entries:
The install of our application involves a lot of data importing (via
JDBC) in one large transaction, which can take up to 30 minutes. (I
realize I left out this key piece of info in my original post...)
The pg_autovacuum logic is relying on data from pg_stat_all_tables to
make the decision about running analyze. As far as I can tell, the data
in this view gets updated outside of the transaction, because I saw the
numbers growing while I was importing. I saw pg_autovacuum log messages
for running analyze on several tables, but no statistics data showed up
for these, I assume because the actual data in the table wasn't yet
visible to pg_autovacuum because the import transaction had not finished
yet.
When the import finished, not all of the tables affected by the import
were re-visited because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.
Bingo. The per-table activity stats are sent to the collector whenever
the backend waits for a client command. Given a moderately long
transaction block doing updates, it's not hard at all to imagine that
autovacuum would kick off vacuum and/or analyze while the updating
transaction is still in progress. The resulting operation is of course
a waste of time.
It'd be trivial to adjust postgres.c so that per-table stats are
only transmitted when we exit the transaction (basically move the
pgstat_report_tabstat call down a couple lines so it's not called if
IsTransactionOrTransactionBlock).
This seems like a good change to me. Does anyone not like it?
regards, tom lane
Tom,
This seems like a good change to me. Does anyone not like it?
+1
--
Josh Berkus
Aglio Database Solutions
San Francisco
Import Notes
Reply to msg id not found: 20041118172519.36F443A472B@svr1.postgresql.orgReference msg id not found: 20041118172519.36F443A472B@svr1.postgresql.org | Resolved by subject fallback
On 11/18/2004 11:43 AM, Tom Lane wrote:
"David Parker" <dparker@tazznetworks.com> writes:
What I think is happening with the missing pg_statistic entries:
The install of our application involves a lot of data importing (via
JDBC) in one large transaction, which can take up to 30 minutes. (I
realize I left out this key piece of info in my original post...)The pg_autovacuum logic is relying on data from pg_stat_all_tables to
make the decision about running analyze. As far as I can tell, the data
in this view gets updated outside of the transaction, because I saw the
numbers growing while I was importing. I saw pg_autovacuum log messages
for running analyze on several tables, but no statistics data showed up
for these, I assume because the actual data in the table wasn't yet
visible to pg_autovacuum because the import transaction had not finished
yet.When the import finished, not all of the tables affected by the import
were re-visited because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.Bingo. The per-table activity stats are sent to the collector whenever
the backend waits for a client command. Given a moderately long
transaction block doing updates, it's not hard at all to imagine that
autovacuum would kick off vacuum and/or analyze while the updating
transaction is still in progress. The resulting operation is of course
a waste of time.It'd be trivial to adjust postgres.c so that per-table stats are
only transmitted when we exit the transaction (basically move the
pgstat_report_tabstat call down a couple lines so it's not called if
IsTransactionOrTransactionBlock).This seems like a good change to me. Does anyone not like it?
regards, tom lane
Sounds reasonable here.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #