Re: query plan question

Started by David Parker about 21 years ago · 4 messages
#1 David Parker
dparker@tazznetworks.com

What I think is happening with the missing pg_statistic entries:

The install of our application involves a lot of data importing (via
JDBC) in one large transaction, which can take up to 30 minutes. (I
realize I left out this key piece of info in my original post...)

The pg_autovacuum logic is relying on data from pg_stat_all_tables to
make the decision about running analyze. As far as I can tell, the data
in this view gets updated outside of the transaction, because I saw the
numbers growing while I was importing. I saw pg_autovacuum log messages
for running analyze on several tables, but no statistics data showed up
for these, I assume because the actual data in the tables wasn't yet
visible to pg_autovacuum, since the import transaction had not yet
finished.
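
(For the record, a query along these lines, run from a separate session, is
enough to watch those numbers move even while the import transaction is
still open; the table list is just the tables from our import:)

    -- per-table activity counters kept by the stats collector; on 7.4
    -- they can already reflect rows inserted by a still-open transaction
    SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_all_tables
     WHERE relname IN ('schema', 'usage', 'usageparameter', 'flow')
     ORDER BY relname;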

When the import finished, not all of the tables affected by the import
were re-visited because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.

Am I making the correct assumptions about the way the various pieces
work? Does this scenario make sense?

It's easy enough for us to kick off a vacuum/analyze at the end of a
long import - but this "mysterious" behavior was bugging me!
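
Concretely, I'm thinking of something like this right after the final
commit; the per-table list is just illustrative, and a single
database-wide ANALYZE (or VACUUM ANALYZE) would do as well:

    -- run only after the import transaction has committed,
    -- so ANALYZE can actually see the newly loaded rows
    ANALYZE "public"."schema";
    ANALYZE "public"."usage";
    ANALYZE "public"."usageparameter";
    ANALYZE "public"."flow";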

Thanks.

- DAP

-----Original Message-----
From: Matthew T. O'Connor [mailto:matthew@zeut.net]
Sent: Wednesday, November 17, 2004 2:02 PM
To: David Parker
Cc: Tom Lane; Jeff; Russell Smith; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query plan question

Well, based on the autovacuum log that you attached, all of those tables
are insert-only (at least during the time period included in the log).
Is that correct? If so, autovacuum will never do a vacuum (unless
required by xid wraparound issues) on those tables. So this doesn't
appear to be an autovacuum problem. I'm not sure about the missing
pg_statistic entries; anyone else care to field that one?

Matthew

David Parker wrote:

Thanks. The tables I'm concerned with are named: 'schema', 'usage',
'usageparameter', and 'flow'. It looks like autovacuum is performing
analyzes:

% grep "Performing: " logs/.db.tazz.vacuum.log
[2004-11-17 12:05:58 PM] Performing: ANALYZE
"public"."scriptlibrary_library"
[2004-11-17 12:15:59 PM] Performing: ANALYZE
"public"."scriptlibraryparm"
[2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageproperty"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."route"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:21:00 PM] Performing: ANALYZE
"public"."scriptlibrary_library"
[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usage"
[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:31:04 PM] Performing: ANALYZE "public"."usageproperty"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."route"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."service_usage"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."usageparameter"

But when I run the following:

select * from pg_statistic
 where starelid in (select oid from pg_class
                     where relname in ('schema','usageparameter','flow','usage'))

it returns no records. Shouldn't it? It doesn't appear to be doing a
vacuum anywhere, which makes sense because none of these tables are over
the default threshold of 1000. Are there statistics which only get
generated by vacuum?

I've attached a gzip of the pg_autovacuum log file, with -d 3.

Thanks again.

- DAP

-----Original Message-----
From: Matthew T. O'Connor [mailto:matthew@zeut.net]
Sent: Wednesday, November 17, 2004 11:41 AM
To: David Parker
Cc: Tom Lane; Jeff; Russell Smith; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query plan question

David Parker wrote:

We're using postgresql 7.4.5. I've only recently put pg_autovacuum in
place as part of our installation, and I'm basically taking the
defaults. I doubt it's a problem with autovacuum itself, but rather
with my configuration of it. I have some reading to do, so any pointers
to existing autovacuum threads would be greatly appreciated!

Well, the first thing to do is increase the verbosity of the
pg_autovacuum logging output. If you use -d2 or higher, pg_autovacuum
will print out a lot of detail on what it thinks the thresholds are and
why it is or isn't performing vacuums and analyzes. Attach some of the
log and I'll take a look at it.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Parker (#1)
Timing of pgstats updates

"David Parker" <dparker@tazznetworks.com> writes:

What I think is happening with the missing pg_statistic entries:
The install of our application involves a lot of data importing (via
JDBC) in one large transaction, which can take up to 30 minutes. (I
realize I left out this key piece of info in my original post...)

The pg_autovacuum logic is relying on data from pg_stat_all_tables to
make the decision about running analyze. As far as I can tell, the data
in this view gets updated outside of the transaction, because I saw the
numbers growing while I was importing. I saw pg_autovacuum log messages
for running analyze on several tables, but no statistics data showed up
for these, I assume because the actual data in the tables wasn't yet
visible to pg_autovacuum, since the import transaction had not yet
finished.

When the import finished, not all of the tables affected by the import
were re-visited because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.

Bingo. The per-table activity stats are sent to the collector whenever
the backend waits for a client command. Given a moderately long
transaction block doing updates, it's not hard at all to imagine that
autovacuum would kick off vacuum and/or analyze while the updating
transaction is still in progress. The resulting operation is of course
a waste of time.
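
(To make the timing concrete, a sketch with a throwaway table and two psql
sessions; the names are invented, and the numbers only show up after the
stats collector's next update:)

Session 1:
    CREATE TABLE import_demo (id integer);
    BEGIN;
    -- cross join of two catalogs, just to generate a pile of rows
    INSERT INTO import_demo SELECT 1 FROM pg_class, pg_attribute;
    -- transaction left open; the backend now waits for the next command,
    -- which is the point at which its per-table counts are sent

Session 2, a moment later:
    SELECT relname, n_tup_ins
      FROM pg_stat_all_tables
     WHERE relname = 'import_demo';
    -- n_tup_ins already reflects the uncommitted inserts, so autovacuum
    -- can decide to ANALYZE rows that the ANALYZE cannot yet see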

It'd be trivial to adjust postgres.c so that per-table stats are
only transmitted when we exit the transaction (basically move the
pgstat_report_tabstat call down a couple lines so it's not called if
IsTransactionOrTransactionBlock).

This seems like a good change to me. Does anyone not like it?

regards, tom lane

#3 Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#2)
Re: [pgsql-hackers] Timing of pgstats updates

Tom,

This seems like a good change to me.  Does anyone not like it?

+1

--
Josh Berkus
Aglio Database Solutions
San Francisco

#4 Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#2)
Re: Timing of pgstats updates

On 11/18/2004 11:43 AM, Tom Lane wrote:

"David Parker" <dparker@tazznetworks.com> writes:

What I think is happening with the missing pg_statistic entries:
The install of our application involves a lot of data importing (via
JDBC) in one large transaction, which can take up to 30 minutes. (I
realize I left out this key piece of info in my original post...)

The pg_autovacuum logic is relying on data from pg_stat_all_tables to
make the decision about running analyze. As far as I can tell, the data
in this view gets updated outside of the transaction, because I saw the
numbers growing while I was importing. I saw pg_autovacuum log messages
for running analyze on several tables, but no statistics data showed up
for these, I assume because the actual data in the table wasn't yet
visible to pg_autovacuum because the import transaction had not finished
yet.

When the import finished, not all of the tables affected by the import
were re-visited because they had not bumped up over the threshold again,
even though the analyze run for those tables had not generated any stats
because of the still-open transaction.

Bingo. The per-table activity stats are sent to the collector whenever
the backend waits for a client command. Given a moderately long
transaction block doing updates, it's not hard at all to imagine that
autovacuum would kick off vacuum and/or analyze while the updating
transaction is still in progress. The resulting operation is of course
a waste of time.

It'd be trivial to adjust postgres.c so that per-table stats are
only transmitted when we exit the transaction (basically move the
pgstat_report_tabstat call down a couple lines so it's not called if
IsTransactionOrTransactionBlock).

This seems like a good change to me. Does anyone not like it?

regards, tom lane

Sounds reasonable here.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #