autovacuum stress-testing our system

Started by Tomas Vondra over 13 years ago, 95 messages
#1Tomas Vondra
tv@fuzzy.cz

Hi,

I've been struggling with autovacuum generating a lot of I/O and CPU on
some of our systems - after a night spent analyzing this behavior, I believe
the current autovacuum accidentally behaves a bit like a stress-test in some
corner cases (but I may be seriously wrong, after all it was a long night).

First - our system really is not a "common" one - we have ~1000 databases
of various sizes, each containing up to several thousand tables (several
user-defined tables, the rest serve as caches for a reporting application -
yes, it's a bit of a weird design but that's life). This all leads to a
pgstat.stat significantly larger than 60 MB.

Now, the two main pieces of information from pgstat.c are the timer
definitions

---------------------------------- pgstat.c : 80 ----------------------------------

#define PGSTAT_STAT_INTERVAL    500     /* Minimum time between stats file
                                         * updates; in milliseconds. */

#define PGSTAT_RETRY_DELAY      10      /* How long to wait between checks for
                                         * a new file; in milliseconds. */

#define PGSTAT_MAX_WAIT_TIME    10000   /* Maximum time to wait for a stats
                                         * file update; in milliseconds. */

#define PGSTAT_INQ_INTERVAL     640     /* How often to ping the collector for
                                         * a new file; in milliseconds. */

#define PGSTAT_RESTART_INTERVAL 60      /* How often to attempt to restart a
                                         * failed statistics collector; in
                                         * seconds. */

#define PGSTAT_POLL_LOOP_COUNT  (PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
#define PGSTAT_INQ_LOOP_COUNT   (PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)

-----------------------------------------------------------------------------------

and then this loop (the current HEAD does this a bit differently, but the
9.2 code is a bit more readable and suffers from the same issue):

---------------------------------- pgstat.c : 3560 --------------------------------

    /*
     * Loop until fresh enough stats file is available or we ran out of time.
     * The stats inquiry message is sent repeatedly in case collector drops
     * it; but not every single time, as that just swamps the collector.
     */
    for (count = 0; count < PGSTAT_POLL_LOOP_COUNT; count++)
    {
        TimestampTz file_ts = 0;

        CHECK_FOR_INTERRUPTS();

        if (pgstat_read_statsfile_timestamp(false, &file_ts) &&
            file_ts >= min_ts)
            break;

        /* Not there or too old, so kick the collector and wait a bit */
        if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
            pgstat_send_inquiry(min_ts);

        pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
    }

    if (count >= PGSTAT_POLL_LOOP_COUNT)
        elog(WARNING, "pgstat wait timeout");

    /* Autovacuum launcher wants stats about all databases */
    if (IsAutoVacuumLauncherProcess())
        pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
    else
        pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);

-----------------------------------------------------------------------------------

What this code does is check the stats file, and if it's not stale (the
timestamp of the write start is not older than PGSTAT_RETRY_DELAY
milliseconds), the loop is terminated and the file is read.
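
A minimal sketch of the freshness test described above, not the actual
pgstat.c code. The helper name stats_file_is_fresh and the use of plain
millisecond integers instead of TimestampTz are assumptions made for
illustration; only the 10 ms value comes from the defines quoted earlier.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGSTAT_RETRY_DELAY 10   /* milliseconds, as in the defines above */

/*
 * Hypothetical helper: timestamps are plain milliseconds here, not
 * PostgreSQL's TimestampTz. min_ts_ms is the oldest write-start
 * timestamp the reader is willing to accept; it is derived once from
 * the time of the request.
 */
static bool
stats_file_is_fresh(int64_t file_ts_ms, int64_t request_ts_ms)
{
    int64_t min_ts_ms = request_ts_ms - PGSTAT_RETRY_DELAY;

    return file_ts_ms >= min_ts_ms;
}
```

With a request at 1000 ms, a file stamped 995 ms passes and a file stamped
900 ms does not.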

Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY
value. With our current pgstat.stat file size / number of relations, this is
quite common. Actually the common write time in our case is ~100 ms, even if
we move the file into tmpfs. That means that almost all the calls to
backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry calls)
result in a continuous stream of inquiries from the autovacuum workers, and
of writes/reads of the file.

We're not getting 'pgstat wait timeout' though, because it finally gets
written before PGSTAT_MAX_WAIT_TIME.

By moving the file to a tmpfs we've minimized the I/O impact, but now the
collector and autovacuum launcher consume ~75% of CPU (i.e. ~ one core) and
do nothing except burn power, because the database is almost read-only. Not
a good thing in the "green computing" era I guess.

First, I'm interested in feedback - did I get all the details right, or am I
missing something important?

Next, I'm thinking about ways to solve this:

1) turning off autovacuum, doing regular VACUUM ANALYZE from cron - certainly
   an option, but it's rather a workaround than a solution and I'm not very
   fond of it. Moreover it fixes only one side of the problem - triggering
   the statfile writes over and over. The file will be written anyway,
   although not that frequently.

2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and
   so on to consider several seconds to be fresh enough - it would be nice
   to have these as GUC variables, although we can do another private patch
   on our own. But more knobs is not always better.

3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the
   time it takes to write the file (e.g. 10x the write time or something).

4) keeping some sort of "dirty flag" in stat entries - and then writing only
   info about objects that were modified enough to be eligible for
   vacuum/analyze (e.g. an increasing number of index scans can't trigger
   autovacuum while inserting rows can). Also, I'm not worried about getting
   a slightly older number of index scans, so 'clean' records might be
   written less frequently than 'dirty' ones.

5) splitting the single stat file into multiple pieces - e.g. per database,
   written separately, so that the autovacuum workers don't need to read all
   the data even for databases that don't need to be vacuumed. This might be
   combined with (4).
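
The "dirty flag" idea from (4) can be pictured with a small sketch.
Everything in it is hypothetical: StatEntrySketch, its fields and the helper
functions are invented for illustration and do not correspond to the real
pgstat structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-table entry; not the real pgstat table entry. */
typedef struct StatEntrySketch
{
    int64_t tuples_inserted;    /* can make a table eligible for vacuum/analyze */
    int64_t index_scans;        /* cannot trigger autovacuum */
    bool    dirty;              /* set only by changes autovacuum cares about */
} StatEntrySketch;

static void
record_insert(StatEntrySketch *entry, int64_t ntuples)
{
    entry->tuples_inserted += ntuples;
    entry->dirty = true;
}

static void
record_index_scan(StatEntrySketch *entry)
{
    entry->index_scans++;       /* the entry stays clean */
}

/* Count the entries that a frequent, autovacuum-driven write would cover. */
static int
count_dirty(const StatEntrySketch *entries, int nentries)
{
    int ndirty = 0;

    for (int i = 0; i < nentries; i++)
        if (entries[i].dirty)
            ndirty++;
    return ndirty;
}
```

In this sketch an index scan leaves the entry clean, so only entries touched
by inserts would need to go into the frequently written part of the stats.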

Ideas? Objections? Preferred options?

I kinda like (4+5), although that'd be a pretty big patch and I'm not
entirely sure it can be done without breaking other things.

regards
Tomas

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#1)
Re: autovacuum stress-testing our system

Really, as far as autovacuum is concerned, it would be much more useful
to be able to reliably detect that a table has been recently vacuumed,
without having to request a 10ms-recent pgstat snapshot. That would
greatly reduce the amount of time autovac spends on pgstat requests.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Jeff Janes
jeff.janes@gmail.com
In reply to: Tomas Vondra (#1)
Re: autovacuum stress-testing our system

On Wed, Sep 26, 2012 at 5:43 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

First - our system really is not a "common" one - we do have ~1000 of
databases of various size, each containing up to several thousands of tables
(several user-defined tables, the rest serve as caches for a reporting
application - yes, it's a bit weird design but that's life). This all leads
to pgstat.stat significantly larger than 60 MB.

...

Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY
values. With our current pgstat.stat filesize/num of relations, this is
quite common. Actually the common write time in our case is ~100 ms, even if
we move the file into tmpfs. That means that almost all the calls to
backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry calls)
result in continuous stream of inquiries from the autovacuum workers,
writing/reading of the file.

I don't think it actually does. What you are missing is the same
thing I was missing a few weeks ago when I also looked into something
like this.

3962:

* We don't recompute min_ts after sleeping, except in the
* unlikely case that cur_ts went backwards.

That means the file must have been written within 10 ms of when we
*first* asked for it.

What is generating the endless stream you are seeing is that you have
1000 databases so if naptime is one minute you are vacuuming 16 per
second. Since every database gets a new process, that process needs
to read the file as it doesn't inherit one.
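
The "16 per second" figure is just the database count divided by the
naptime. A minimal sketch of that arithmetic, assuming the launcher visits
every database once per naptime; worker_launches_per_second is a
hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>

/*
 * Hypothetical helper: average number of worker start-ups per second when
 * every database is visited once per naptime. Each start-up implies one
 * read of the stats file.
 */
static double
worker_launches_per_second(int ndatabases, double naptime_seconds)
{
    assert(naptime_seconds > 0.0);
    return ndatabases / naptime_seconds;
}
```

For 1000 databases and a 60 second naptime this gives roughly 16.7 start-ups
per second.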

...

First, I'm interested in feedback - did I get all the details right, or am I
missing something important?

Next, I'm thinking about ways to solve this:

1) turning off autovacuum, doing regular VACUUM ANALYZE from cron

Increasing autovacuum_naptime seems like a far better way to do
effectively the same thing.

2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and
so on to consider several seconds to be fresh enough - Would be nice to have
this as a GUC variables, although we can do another private patch on our
own. But more knobs is not always better.

I think forking it off to another value would be better. If you
are an autovacuum worker which is just starting up and so getting its
initial stats, you can tolerate a stats file up to "autovacuum_naptime
/ 5.0" stale. If you are already started up and are just about to
vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
currently is, so as not to redundantly vacuum a table.
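
A minimal sketch of the two staleness limits proposed here. The function
allowed_staleness_ms and its parameters are hypothetical; only
PGSTAT_RETRY_DELAY and the "autovacuum_naptime / 5.0" rule come from the
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PGSTAT_RETRY_DELAY 10   /* milliseconds, from the defines quoted in #1 */

/*
 * Hypothetical helper: a worker that is just starting up accepts stats up
 * to naptime/5 old; a worker about to vacuum a table keeps the strict
 * PGSTAT_RETRY_DELAY limit so it does not redundantly vacuum a table.
 */
static int64_t
allowed_staleness_ms(bool worker_starting_up, int naptime_seconds)
{
    assert(naptime_seconds > 0);
    if (worker_starting_up)
        return (int64_t) (naptime_seconds * 1000.0 / 5.0);
    return PGSTAT_RETRY_DELAY;
}
```

With a 60 second naptime, a starting worker would tolerate a file up to
12 seconds old, while the per-table recheck stays at 10 ms.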

3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the
time it takes to write the file (e.g. 10x the write time or something).

This is already in place.

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

I think this needs to happen eventually.

Cheers,

Jeff

#4Euler Taveira
euler@timbira.com
In reply to: Tomas Vondra (#1)
Re: autovacuum stress-testing our system

On 26-09-2012 09:43, Tomas Vondra wrote:

I've been struggling with autovacuum generating a lot of I/O and CPU on some
of our systems - after a night spent analyzing this behavior, I believe the
current autovacuum accidentally behaves a bit like a stress-test in some
corner cases (but I may be seriously wrong, after all it was a long night).

It is known that the statistics collector doesn't scale to a large number of
databases. It wouldn't be a problem if we didn't have automatic maintenance
(aka autovacuum).

Next, I'm thinking about ways to solve this:

1) turning off autovacuum, doing regular VACUUM ANALYZE from cron - certainly an
option, but it's rather a workaround than a solution and I'm not very fond of
it. Moreover it fixes only one side of the problem - triggering the statfile
writes over and over. The file will be written anyway, although not that
frequently.

It solves your problem if you combine scheduled VACUUM ANALYZE with
pgstat.stat in a tmpfs. I don't see it as a definitive solution if we want
to scale automatic maintenance to several hundred or even thousands of
databases in a single cluster. (Someone could think that is not that common,
but in hosting scenarios it is. DBAs don't want to run several VMs or pg
servers just to minimize the auto maintenance scalability problem.)

2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and so on
to consider several seconds to be fresh enough - Would be nice to have this
as a GUC variables, although we can do another private patch on our own. But
more knobs is not always better.

It doesn't solve the problem. Also, it could be a problem for autovacuum
(which makes assumptions based on those fixed values).

3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the time
it takes to write the file (e.g. 10x the write time or something).

Such adaptive logic would be good only if it takes a small fraction of the
time to execute. It has to pay attention to the limits. It appears to be a
candidate for exploration.

4) keeping some sort of "dirty flag" in stat entries - and then writing only info
about objects were modified enough to be eligible for vacuum/analyze (e.g.
increasing number of index scans can't trigger autovacuum while inserting
rows can). Also, I'm not worried about getting a bit older num of index scans,
so 'clean' records might be written less frequently than 'dirty' ones.

It minimizes your problem but harms collector tools (that want fresh
statistics about databases).

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a
global one. That way, the check would only read the global.stat and process
those databases that were modified. Also, an in-memory map could store that
information to speed up the checks. The only downside I can see is that you
will increase the number of open file descriptors.

Ideas? Objections? Preferred options?

I prefer to attack 3, sort of 4 (explained in 5 -- in-memory map) and 5.

Out of curiosity, did you run perf (or some other performance analyzer) to
verify if some (stats and/or autovac) functions pop up in the report?

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#5Tomas Vondra
tv@fuzzy.cz
In reply to: Jeff Janes (#3)
Re: autovacuum stress-testing our system

On 26.09.2012 16:51, Jeff Janes wrote:

On Wed, Sep 26, 2012 at 5:43 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

First - our system really is not a "common" one - we do have ~1000 of
databases of various size, each containing up to several thousands of tables
(several user-defined tables, the rest serve as caches for a reporting
application - yes, it's a bit weird design but that's life). This all leads
to pgstat.stat significantly larger than 60 MB.

...

Now, let's suppose the write takes >10 ms, which is the PGSTAT_RETRY_DELAY
values. With our current pgstat.stat filesize/num of relations, this is
quite common. Actually the common write time in our case is ~100 ms, even if
we move the file into tmpfs. That means that almost all the calls to
backend_read_statsfile (which happen in all pgstat_fetch_stat_*entry calls)
result in continuous stream of inquiries from the autovacuum workers,
writing/reading of the file.

I don't think it actually does. What you are missing is the same
thing I was missing a few weeks ago when I also looked into something
like this.

3962:

* We don't recompute min_ts after sleeping, except in the
* unlikely case that cur_ts went backwards.

That means the file must have been written within 10 ms of when we
*first* asked for it.

Yeah, right - I've missed the first "if (pgStatDBHash)" check right at
the beginning.

What is generating the endless stream you are seeing is that you have
1000 databases so if naptime is one minute you are vacuuming 16 per
second. Since every database gets a new process, that process needs
to read the file as it doesn't inherit one.

Right. But that makes the 10ms timeout even more strange, because the
worker is then using the data for very long time (even minutes).

...

First, I'm interested in feedback - did I get all the details right, or am I
missing something important?

Next, I'm thinking about ways to solve this:

1) turning off autovacuum, doing regular VACUUM ANALYZE from cron

Increasing autovacuum_naptime seems like a far better way to do
effectively the same thing.

Agreed. One of my colleagues turned autovacuum off a few years back and that
was a nice lesson in how not to solve this kind of issue.

2) tweaking the timer values, especially increasing PGSTAT_RETRY_DELAY and
so on to consider several seconds to be fresh enough - Would be nice to have
this as a GUC variables, although we can do another private patch on our
own. But more knobs is not always better.

I think forking it off to another value would be better. If you
are an autovacuum worker which is just starting up and so getting its
initial stats, you can tolerate a stats file up to "autovacuum_naptime
/ 5.0" stale. If you are already started up and are just about to
vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
currently is, so as not to redundantly vacuum a table.

I always thought there's a "no more than one worker per database" limit,
and that the file is always reloaded when switching to another database.
So I'm not sure how a worker could see such stale table info? Or are
the workers keeping the stats across multiple databases?

3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the
time it takes to write the file (e.g. 10x the write time or something).

This is already in place.

Really? Where?

I've checked the current master, and the only thing I see in
pgstat_write_statsfile is this (line 3558):

last_statwrite = globalStats.stats_timestamp;

https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558

I don't think that's doing what I meant. That really doesn't scale the
timeout according to write time. What happens right now is that when the
stats file is written at time 0 (starts at zero, write finishes at 100 ms),
and a worker asks for the file at 99 ms (i.e. 1 ms before the write
finishes), the collector will set last_statrequest to the time of the
inquiry and then do this

    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

i.e. comparing it to the start of the write. So another write will start
right after the file is written out. And over and over.
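
The scenario can be checked with a tiny sketch that mirrors the quoted
condition. The helper collector_must_rewrite and the millisecond integers
are assumptions for illustration; the real code compares TimestampTz values
inside the collector.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the quoted test: the collector rewrites the file whenever the
 * timestamp recorded for the last write is older than the newest request.
 */
static bool
collector_must_rewrite(int64_t last_statwrite_ms, int64_t last_statrequest_ms)
{
    return last_statwrite_ms < last_statrequest_ms;
}
```

With the write stamped at its start (0 ms) and a request arriving at 99 ms,
the condition is true and a second write follows immediately. Had the file
carried the time the write finished (100 ms), which is a hypothetical
alternative and not what the code does, the same request would not have
triggered a rewrite.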

Moreover there's the 'rename' step making the new file invisible for the
worker processes, which makes the thing a bit more complicated.

What I'm suggesting is that there should be some sort of tracking of the
write time, and then deciding whether the file is fresh enough using 10x
that value. So when a file is written in 100 ms, it'd be considered OK for
the next 900 ms, i.e. 1 sec in total. Sure, we could use 5x or another
coefficient, it doesn't really matter.
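
A minimal sketch of that suggestion, assuming the collector can measure how
long the last write took. FRESHNESS_FACTOR and file_is_fresh_adaptive are
hypothetical names, and timestamps are plain milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRESHNESS_FACTOR 10     /* hypothetical coefficient; 5 would work too */

/*
 * Hypothetical check: a file whose write started at write_start_ms and took
 * write_duration_ms counts as fresh for FRESHNESS_FACTOR times the write
 * duration, measured from the start of the write.
 */
static bool
file_is_fresh_adaptive(int64_t write_start_ms, int64_t write_duration_ms,
                       int64_t now_ms)
{
    assert(write_duration_ms >= 0);
    return now_ms - write_start_ms <= FRESHNESS_FACTOR * write_duration_ms;
}
```

For a write that starts at 0 ms and takes 100 ms, the file is accepted up to
the 1000 ms mark and rejected afterwards.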

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

I think this needs to happen eventually.

Yes, a nice patch idea ;-)

thanks for the feedback

Tomas

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Euler Taveira (#4)
Re: autovacuum stress-testing our system

Excerpts from Euler Taveira's message of Wed Sep 26 11:53:27 -0300 2012:

On 26-09-2012 09:43, Tomas Vondra wrote:

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a
global one. That way, the check would only read the global.stat and process
those database that were modified. Also, an in-memory map could store that
information to speed up the checks.

+1

The only downside I can see is that you
will increase the number of opened file descriptors.

Note that most users of pgstat will only have two files open (instead of
one as currently) -- one for shared, one for their own database. Only
pgstat itself and autovac launcher would need to open pgstat files for
all databases; but both do not have a need to open other files
(arbitrary tables) so this shouldn't be a major problem.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#5)
Re: autovacuum stress-testing our system

Excerpts from Tomas Vondra's message of Wed Sep 26 12:25:58 -0300 2012:

On 26.09.2012 16:51, Jeff Janes wrote:

I think forking it off to another value would be better. If you
are an autovacuum worker which is just starting up and so getting its
initial stats, you can tolerate a stats file up to "autovacuum_naptime
/ 5.0" stale. If you are already started up and are just about to
vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
currently is, so as not to redundantly vacuum a table.

I always thought there's a "no more than one worker per database"
limit,

There is no such limitation.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#7)
Re: autovacuum stress-testing our system

On 26.09.2012 17:29, Alvaro Herrera wrote:

Excerpts from Tomas Vondra's message of Wed Sep 26 12:25:58 -0300 2012:

On 26.09.2012 16:51, Jeff Janes wrote:

I think forking it off to another value would be better. If you
are an autovacuum worker which is just starting up and so getting its
initial stats, you can tolerate a stats file up to "autovacuum_naptime
/ 5.0" stale. If you are already started up and are just about to
vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
currently is, so as not to redundantly vacuum a table.

I always thought there's a "no more than one worker per database"
limit,

There is no such limitation.

OK, thanks. Still, reading/writing the small (per-database) files would be
much faster so it would be easy to read/write them more often on demand.

Tomas

#9Jeff Janes
jeff.janes@gmail.com
In reply to: Tomas Vondra (#5)
Re: autovacuum stress-testing our system

On Wed, Sep 26, 2012 at 8:25 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 26.09.2012 16:51, Jeff Janes wrote:

What is generating the endless stream you are seeing is that you have
1000 databases so if naptime is one minute you are vacuuming 16 per
second. Since every database gets a new process, that process needs
to read the file as it doesn't inherit one.

Right. But that makes the 10ms timeout even more strange, because the
worker is then using the data for very long time (even minutes).

On average that can't happen, or else your vacuuming would fall way
behind. But I agree, there is no reason to have very fresh statistics
to start with. naptime/5 seems like a good cutoff for me for the
start up reading. If a table only becomes eligible for vacuuming in
the last 20% of the naptime, I see no reason that it can't wait
another round. But that just means the statistics collector needs to
write the file less often; the workers still need to read it once per
database, since each one only vacuums one database and doesn't inherit
the data from the launcher.

I think forking it off to another value would be better. If you
are an autovacuum worker which is just starting up and so getting its
initial stats, you can tolerate a stats file up to "autovacuum_naptime
/ 5.0" stale. If you are already started up and are just about to
vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
currently is, so as not to redundantly vacuum a table.

I always thought there's a "no more than one worker per database" limit,
and that the file is always reloaded when switching to another database.
So I'm not sure how could a worker see such a stale table info? Or are
the workers keeping the stats across multiple databases?

If you only have one "active" database, then all the workers will be
in it. I don't know how likely it is that they will leapfrog each other
and collide. But anyway, if you have 1000s of databases, then each one
will generally require zero vacuums per naptime (as you say, it is
mostly read only), so it is the reads upon start-up, not the reads per
table that needs vacuuming, which generate most of the traffic. Once
you separate those two parameters out, playing around with the
PGSTAT_RETRY_DELAY one seems like a needless risk.

3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the
time it takes to write the file (e.g. 10x the write time or something).

This is already in place.

Really? Where?

I had thought that this part was effectively the same thing:

* We don't recompute min_ts after sleeping, except in the
* unlikely case that cur_ts went backwards.

But I think I did not understand your proposal.

I've checked the current master, and the only thing I see in
pgstat_write_statsfile is this (line 3558):

last_statwrite = globalStats.stats_timestamp;

https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558

I don't think that's doing what I meant. That really doesn't scale the
timeout according to write time. What happens right now is that when the
stats file is written at time 0 (starts at zero, write finishes at 100 ms),
and a worker asks for the file at 99 ms (i.e. 1ms before the write
finishes), it will set the time of the inquiry to last_statrequest and then
do this

    if (last_statwrite < last_statrequest)
        pgstat_write_statsfile(false);

i.e. comparing it to the start of the write. So another write will start
right after the file is written out. And over and over.

Ah. I had wondered about this before too, and wondered if it would be
a good idea to have it go back to the beginning of the stats file, and
overwrite the timestamp with the current time (rather than the time it
started writing it), as the last action it does before the rename. I
think that would automatically make it adaptive to the time it takes
to write out the file, in a fairly simple way.
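
A rough sketch of that idea using only standard C file I/O. It is not the
pgstat_write_statsfile code: the function name, the raw int64_t header, the
callback clock and the coarse wall_clock_ms stand-in are all assumptions
made for illustration, and replacing an existing file with rename() relies
on POSIX behavior.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical clock callback, so the sketch is not tied to one timer API. */
typedef int64_t (*clock_ms_fn) (void);

/* Coarse stand-in clock with one-second resolution; illustration only. */
static int64_t
wall_clock_ms(void)
{
    return (int64_t) time(NULL) * 1000;
}

/*
 * Write a timestamp header plus body to tmp_path, then go back and stamp
 * the header with the time the write finished, and only then rename the
 * file into place. Returns 0 on success, -1 on failure.
 */
static int
write_stats_file_sketch(const char *tmp_path, const char *final_path,
                        const char *body, size_t body_len, clock_ms_fn now_ms)
{
    FILE   *fp = fopen(tmp_path, "wb");
    int64_t ts;
    int     ok;

    if (fp == NULL)
        return -1;

    /* Header first holds the time the write started. */
    ts = now_ms();
    ok = fwrite(&ts, sizeof(ts), 1, fp) == 1 &&
        fwrite(body, 1, body_len, fp) == body_len;

    /* Last action before the rename: overwrite it with the finish time. */
    if (ok)
    {
        ts = now_ms();
        ok = fseek(fp, 0L, SEEK_SET) == 0 &&
            fwrite(&ts, sizeof(ts), 1, fp) == 1;
    }

    if (fclose(fp) != 0)
        ok = 0;

    if (!ok || rename(tmp_path, final_path) != 0)
    {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

A reader comparing the header against its own min_ts would then see the
completion time, so a request that arrived while the write was in progress
would be satisfied by the file that results from it.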

Moreover there's the 'rename' step making the new file invisible for the
worker processes, which makes the thing a bit more complicated.

I think renames are assumed to be atomic. Either it sees the old one,
or the new one, but never sees neither.

Cheers,

Jeff

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: autovacuum stress-testing our system

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Excerpts from Euler Taveira's message of Wed Sep 26 11:53:27 -0300 2012:

On 26-09-2012 09:43, Tomas Vondra wrote:

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a
global one. That way, the check would only read the global.stat and process
those database that were modified. Also, an in-memory map could store that
information to speed up the checks.

+1

That would help for the case of hundreds of databases, but how much
does it help for lots of tables in a single database?

I'm a bit suspicious of the idea that we should encourage people to use
hundreds of databases per installation anyway: the duplicated system
catalogs are going to be mighty expensive, both in disk space and in
their cache footprint in shared buffers. There was some speculation
at the last PGCon about how we might avoid the duplication, but I think
we're years away from any such thing actually happening.

What seems to me like it could help more is fixing things so that the
autovac launcher needn't even launch a child process for databases that
haven't had any updates lately. I'm not sure how to do that, but it
probably involves getting the stats collector to produce some kind of
summary file.

regards, tom lane

#11Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#10)
Re: autovacuum stress-testing our system

On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Excerpts from Euler Taveira's message of Wed Sep 26 11:53:27 -0300 2012:

On 26-09-2012 09:43, Tomas Vondra wrote:

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a
global one. That way, the check would only read the global.stat and process
those database that were modified. Also, an in-memory map could store that
information to speed up the checks.

+1

That would help for the case of hundreds of databases, but how much
does it help for lots of tables in a single database?

It doesn't help that case, but that case doesn't need much help. If
you have N statistics-kept objects in total spread over M databases,
of which T objects need vacuuming per naptime, the stats file traffic
is proportional to N*(M+T). If T is low, then there is generally
no problem if M is also low. Or at least, the problem is much smaller
than when M is high for a fixed value of N.
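
The proportionality can be made concrete with a small sketch.
stats_traffic_units is a hypothetical helper that returns N*(M+T) in
arbitrary units; it is not a measurement of real I/O.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: relative stats file traffic per naptime for N
 * statistics-kept objects spread over M databases, of which T objects need
 * vacuuming. Every worker start-up (M) and every per-table recheck (T)
 * reads stats covering all N objects.
 */
static int64_t
stats_traffic_units(int64_t n_objects, int64_t m_databases, int64_t t_vacuumed)
{
    assert(n_objects >= 0 && m_databases >= 0 && t_vacuumed >= 0);
    return n_objects * (m_databases + t_vacuumed);
}
```

For a fixed N of one million objects and T = 0, going from one database to
1000 databases multiplies the figure by 1000.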

I'm a bit suspicious of the idea that we should encourage people to use
hundreds of databases per installation anyway:

I agree with that, but we could still do a better job of tolerating
it, without encouraging it. If someone volunteers to write the code
to do this, what trade-offs would there be?

Cheers,

Jeff

#12Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#10)
Re: autovacuum stress-testing our system

On 26.9.2012 18:29, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Excerpts from Euler Taveira's message of Wed Sep 26 11:53:27 -0300 2012:

On 26-09-2012 09:43, Tomas Vondra wrote:

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a
global one. That way, the check would only read the global.stat and process
those database that were modified. Also, an in-memory map could store that
information to speed up the checks.

+1

That would help for the case of hundreds of databases, but how much
does it help for lots of tables in a single database?

Well, it wouldn't, but it wouldn't make it worse either. Or at least
that's how I understand it.

I'm a bit suspicious of the idea that we should encourage people to use
hundreds of databases per installation anyway: the duplicated system
catalogs are going to be mighty expensive, both in disk space and in
their cache footprint in shared buffers. There was some speculation
at the last PGCon about how we might avoid the duplication, but I think
we're years away from any such thing actually happening.

You don't need to encourage us to do that ;-) We know it's not perfect
and are considering a good alternative - e.g. several databases (~10) with
schemas inside, replacing the current database-only approach. This way
we'd get multiple stat files (thus gaining the benefits) with less
overhead (shared catalogs).

And yes, using tens of thousands of tables (serving as "caches") for a
reporting solution is "interesting" (as in the old Chinese curse) too.

What seems to me like it could help more is fixing things so that the
autovac launcher needn't even launch a child process for databases that
haven't had any updates lately. I'm not sure how to do that, but it
probably involves getting the stats collector to produce some kind of
summary file.

Yes, I've proposed something like this in my original mail - setting a
"dirty" flag on objects (a database in this case) whenever a table in it
gets eligible for vacuum/analyze.

Tomas

#13Tomas Vondra
tv@fuzzy.cz
In reply to: Jeff Janes (#9)
Re: autovacuum stress-testing our system

On 26.9.2012 18:14, Jeff Janes wrote:

On Wed, Sep 26, 2012 at 8:25 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 26.09.2012 16:51, Jeff Janes wrote:

What is generating the endless stream you are seeing is that you have
1000 databases so if naptime is one minute you are vacuuming 16 per
second. Since every database gets a new process, that process needs
to read the file as it doesn't inherit one.

Right. But that makes the 10ms timeout even more strange, because the
worker is then using the data for very long time (even minutes).

On average that can't happen, or else your vacuuming would fall way
behind. But I agree, there is no reason to have very fresh statistics
to start with. naptime/5 seems like a good cutoff for me for the
start up reading. If a table only becomes eligible for vacuuming in
the last 20% of the naptime, I see no reason that it can't wait
another round. But that just means the statistics collector needs to
write the file less often, the workers still need to read it once per
database since each one only vacuums one database and doesn't inherit
the data from the launcher.

So what happens if there are two workers vacuuming the same database?
Wouldn't that make it more probable that tables were already vacuumed by the
other worker?

See the comment at the beginning of autovacuum.c, where it also states
that the statfile is reloaded before each table (probably because of the
calls to autovac_refresh_stats which in turn calls clear_snapshot).

I think forking it off to another value would be better. If you
are an autovacuum worker which is just starting up and so getting its
initial stats, you can tolerate a stats file up to "autovacuum_naptime
/ 5.0" stale. If you are already started up and are just about to
vacuum a table, then keep the staleness at PGSTAT_RETRY_DELAY as it
currently is, so as not to redundantly vacuum a table.
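The two staleness limits proposed above can be sketched as a single helper. This is only an illustration: min_acceptable_ts_ms and its parameters are hypothetical names, not anything in pgstat.c, and times are plain milliseconds rather than TimestampTz.

```c
#include <stdbool.h>
#include <stdint.h>

/* Milliseconds; same value as PGSTAT_RETRY_DELAY quoted from pgstat.c. */
#define PGSTAT_RETRY_DELAY_MS 10

/*
 * Hypothetical helper (not part of PostgreSQL): return the oldest
 * stats-file timestamp, in ms, that the caller is willing to accept.
 */
static int64_t
min_acceptable_ts_ms(int64_t now_ms, int64_t naptime_ms, bool worker_starting_up)
{
    int64_t max_staleness_ms;

    if (worker_starting_up)
        max_staleness_ms = naptime_ms / 5;          /* initial load: naptime / 5 */
    else
        max_staleness_ms = PGSTAT_RETRY_DELAY_MS;   /* per-table recheck stays strict */

    return now_ms - max_staleness_ms;
}
```

With a 60 s naptime, a worker starting at t = 100000 ms would accept a file written at or after 88000 ms, while the per-table recheck would still require 99990 ms or newer.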

I always thought there's a "no more than one worker per database" limit,
and that the file is always reloaded when switching to another database.
So I'm not sure how a worker could see such stale table info? Or are
the workers keeping the stats across multiple databases?

If you only have one "active" database, then all the workers will be
in it. I don't know how likely it is that they will leapfrog each other
and collide. But anyway, if you have 1000s of databases, then each one
will generally require zero vacuums per naptime (as you say, it is
mostly read only), so it is the reads upon start-up, not the reads per
table that needs vacuuming, which generates most of the traffic. Once
you separate those two parameters out, playing around with the
PGSTAT_RETRY_DELAY one seems like a needless risk.

OK, right. My fault.

Yes, our databases are mostly read-only - more precisely whenever we load
data, we immediately do VACUUM ANALYZE on the tables, so autovacuum
never kicks in on them. The only thing it works on are system catalogs
and such stuff.

3) logic detecting the proper PGSTAT_RETRY_DELAY value - based mostly on the
time it takes to write the file (e.g. 10x the write time or something).

This is already in place.

Really? Where?

I had thought that this part was effectively the same thing:

* We don't recompute min_ts after sleeping, except in the
* unlikely case that cur_ts went backwards.

But I think I did not understand your proposal.

I've checked the current master, and the only thing I see in
pgstat_write_statsfile is this (line 3558):

last_statwrite = globalStats.stats_timestamp;

https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3558

I don't think that's doing what I meant. That really doesn't scale the
timeout according to write time. What happens right now is that when the
stats file is written at time 0 (starts at zero, write finishes at 100 ms),
and a worker asks for the file at 99 ms (i.e. 1 ms before the write
finishes), the collector sets last_statrequest to the time of the inquiry
and then does this

if (last_statwrite < last_statrequest)
pgstat_write_statsfile(false);

i.e. comparing it to the start of the write. So another write will start
right after the file is written out. And over and over.

Ah. I had wondered about this before too, and wondered if it would be
a good idea to have it go back to the beginning of the stats file, and
overwrite the timestamp with the current time (rather than the time it
started writing it), as the last action it does before the rename. I
think that would automatically make it adaptive to the time it takes
to write out the file, in a fairly simple way.
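The difference between stamping the file with the start time and with the finish time of the write can be shown with a toy model. This is a sketch of the scenario as described above, not the real collector logic: CollectorModel, model_write and model_needs_rewrite are made-up names, times are plain milliseconds, and the min_ts cutoff carried by real inquiries is deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the collector's bookkeeping; all times in ms. */
typedef struct
{
    int64_t last_statwrite;    /* timestamp stored with the last file */
    int64_t last_statrequest;  /* time of the newest inquiry */
} CollectorModel;

/*
 * Simulate one write that starts at start_ms and takes duration_ms.
 * stamp_at_end = true models overwriting the timestamp just before
 * the rename; false models the existing start-of-write timestamp.
 */
static void
model_write(CollectorModel *c, int64_t start_ms, int64_t duration_ms,
            bool stamp_at_end)
{
    c->last_statwrite = stamp_at_end ? start_ms + duration_ms : start_ms;
}

/* Same comparison as the snippet quoted from pgstat.c. */
static bool
model_needs_rewrite(const CollectorModel *c)
{
    return c->last_statwrite < c->last_statrequest;
}
```

For a write covering 0..100 ms and a request at 99 ms, the start-time stamp (0) is older than the request, so the model asks for another write; the finish-time stamp (100) is not, so it does not.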

Yeah, I was thinking about that too.

Moreover there's the 'rename' step making the new file invisible for the
worker processes, which makes the thing a bit more complicated.

I think renames are assumed to be atomic. Either it sees the old one,
or the new one, but never sees neither.

I'm not quite sure what I meant, but not this - I know the renames are
atomic. I probably haven't noticed that inquiries are using min_ts, so I
thought that an inquiry sent right after the write starts (with min_ts
before the write) would trigger another write, but that's not the case.

regards
Tomas

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#2)
Re: autovacuum stress-testing our system

On 26 September 2012 15:47, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Really, as far as autovacuum is concerned, it would be much more useful
to be able to reliably detect that a table has been recently vacuumed,
without having to request a 10ms-recent pgstat snapshot. That would
greatly reduce the amount of time autovac spends on pgstat requests.

VACUUMing generates a relcache invalidation. Can we arrange for those
invalidations to be received by autovac launcher, so it gets immediate
feedback of recent activity without polling?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Simon Riggs (#14)
Re: autovacuum stress-testing our system

Excerpts from Simon Riggs's message of Thu Sep 27 06:51:28 -0300 2012:

On 26 September 2012 15:47, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Really, as far as autovacuum is concerned, it would be much more useful
to be able to reliably detect that a table has been recently vacuumed,
without having to request a 10ms-recent pgstat snapshot. That would
greatly reduce the amount of time autovac spends on pgstat requests.

VACUUMing generates a relcache invalidation. Can we arrange for those
invalidations to be received by autovac launcher, so it gets immediate
feedback of recent activity without polling?

Hmm, this is an interesting idea worth exploring, I think. Maybe we
should sort tables in the autovac worker to-do list by age of last
invalidation messages received, or something like that. Totally unclear
on the details, but as I said, worth exploring.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Alvaro Herrera (#15)
Re: autovacuum stress-testing our system

On 27 September 2012 15:57, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Excerpts from Simon Riggs's message of Thu Sep 27 06:51:28 -0300 2012:

On 26 September 2012 15:47, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Really, as far as autovacuum is concerned, it would be much more useful
to be able to reliably detect that a table has been recently vacuumed,
without having to request a 10ms-recent pgstat snapshot. That would
greatly reduce the amount of time autovac spends on pgstat requests.

VACUUMing generates a relcache invalidation. Can we arrange for those
invalidations to be received by autovac launcher, so it gets immediate
feedback of recent activity without polling?

Hmm, this is an interesting idea worth exploring, I think. Maybe we
should sort tables in the autovac worker to-do list by age of last
invalidation messages received, or something like that. Totally unclear
on the details, but as I said, worth exploring.

Just put them at the back of the queue if an inval is received.

There is already support for listening to relcache inval messages and yet
never generating them.
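The "put them at the back of the queue" idea amounts to a simple reordering of the worker's to-do list. The sketch below shows only that reordering on a plain array; todo_move_to_back is a hypothetical name, and how an autovacuum worker would actually receive the invalidation is left open, as it is in the discussion.

```c
#include <stddef.h>
#include <string.h>

/* Oid is an unsigned 32-bit integer in PostgreSQL; redefined here so the
 * sketch compiles on its own. */
typedef unsigned int Oid;

/*
 * Hypothetical helper: when an invalidation for relid arrives, move that
 * table to the end of the to-do list. Returns 1 if the table was found
 * and moved, 0 if it was not in the list.
 */
static int
todo_move_to_back(Oid *todo, size_t n, Oid relid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (todo[i] == relid)
        {
            /* shift the tail one slot left, then append relid */
            memmove(&todo[i], &todo[i + 1], (n - i - 1) * sizeof(Oid));
            todo[n - 1] = relid;
            return 1;
        }
    }
    return 0;
}
```

A list {10, 20, 30, 40} that receives an invalidation for table 20 becomes {10, 30, 40, 20}, so the recently touched table is looked at last.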

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17Tomas Vondra
tv@fuzzy.cz
In reply to: Jeff Janes (#11)
1 attachment(s)
Re: autovacuum stress-testing our system

Hi!

On 26.9.2012 19:18, Jeff Janes wrote:

On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Excerpts from Euler Taveira's message of Wed Sep 26 11:53:27 -0300 2012:

On 26-09-2012 09:43, Tomas Vondra wrote:

5) splitting the single stat file into multiple pieces - e.g. per database,
written separately, so that the autovacuum workers don't need to read all
the data even for databases that don't need to be vacuumed. This might be
combined with (4).

IMHO that's the definitive solution. It would be one file per database plus a
global one. That way, the check would only read the global.stat and process
those database that were modified. Also, an in-memory map could store that
information to speed up the checks.

+1

That would help for the case of hundreds of databases, but how much
does it help for lots of tables in a single database?

It doesn't help that case, but that case doesn't need much help. If
you have N statistics-kept objects in total spread over M databases,
of which T objects need vacuuming per naptime, the stats file traffic
is proportional to N*(M+T). If T is low, then there is generally
no problem if M is also low. Or at least, the problem is much smaller
than when M is high for a fixed value of N.
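To put numbers on the N*(M+T) estimate, here is a tiny calculation. The function name and the sample figures are illustrative only; the result is in abstract "stats entries read per naptime", not bytes.

```c
#include <stdint.h>

/*
 * Relative stats-file traffic per naptime according to the N*(M+T)
 * estimate: N objects in total, M databases, T objects needing vacuum.
 */
static uint64_t
stats_traffic_units(uint64_t n_objects, uint64_t m_databases,
                    uint64_t t_vacuumed)
{
    return n_objects * (m_databases + t_vacuumed);
}
```

With N = 1,000,000 and T = 10, spreading the objects over M = 1000 databases gives 1,010,000,000 units, while a single database (M = 1) gives 11,000,000, roughly 90 times less.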

I've done some initial hacking on splitting the stat file into multiple
smaller pieces over the weekend, and it seems promising (at least with
respect to the issues we're having).

See the patch attached, but be aware that this is a very early WIP (or
rather a proof of concept), so it has many rough edges (read "sloppy
coding"). I haven't even added it to the commitfest yet ...

The two main changes are these:

(1) The stats file is split into a common "db" file, containing all the
DB Entries, and per-database files with tables/functions. The common
file is still called "pgstat.stat", the per-db files have the
database OID appended, so for example "pgstat.stat.12345" etc.

This was a trivial hack of the pgstat_read_statsfile/pgstat_write_statsfile
functions, introducing two new functions:

pgstat_read_db_statsfile
pgstat_write_db_statsfile

that do the trick of reading/writing stat file for one database.

(2) The pgstat_read_statsfile has an additional parameter "onlydbs" that
says that you don't need table/func stats - just the list of db
entries. This is used for autovacuum launcher, which does not need
to read the table/stats (if I'm reading the code in autovacuum.c
correctly - it seems to be working as expected).
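The per-database naming scheme ("pgstat.stat" plus the database OID) can be sketched as a small standalone helper. build_db_statfile_name is a hypothetical name, not a function from the patch; it uses %u on the assumption that Oid is an unsigned 32-bit integer, and it reports truncation instead of silently cutting the name.

```c
#include <stdio.h>

/* Oid is an unsigned 32-bit integer in PostgreSQL; redefined here so the
 * sketch compiles on its own. */
typedef unsigned int Oid;

/*
 * Hypothetical helper: build "<base>.<oid>" into dst.
 * Returns 0 on success, -1 if the name did not fit or formatting failed.
 */
static int
build_db_statfile_name(char *dst, size_t dstlen, const char *base,
                       Oid databaseid)
{
    int n = snprintf(dst, dstlen, "%s.%u", base, databaseid);

    return (n >= 0 && (size_t) n < dstlen) ? 0 : -1;
}
```

Called with base "pgstat.stat" and OID 12345 it produces "pgstat.stat.12345"; with a buffer that is too small it returns -1 rather than a truncated file name.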

So what are the benefits?

(a) When a launcher asks for info about databases, something like this
is called in the end:

pgstat_read_db_statsfile(InvalidOid, false, true)

which means all databases (InvalidOid) and only db info (true). So
it reads only the one common file with db entries, not the
table/func stats.

(b) When a worker asks for stats for a given DB, something like this is
called in the end:

pgstat_read_db_statsfile(MyDatabaseId, false, false)

which reads only the common stats file (with db entries) and only
one file for the one database.

In the current implementation (with the single pgstat.stat file), all
the data has to be read (and skipped silently) in both cases.
That's a lot of CPU time, and we're seeing ~60% of CPU spent on
doing just this (writing/reading huge statsfile).

So with a lot of databases/objects, this "pgstat.stat split" saves
us a lot of CPU ...

(c) This should lower the space requirements too - with a single file,
you actually need at least 2x the disk space (or RAM, if you're
using tmpfs as we are), because you need to keep two versions of
the file at the same time (pgstat.stat and pgstat.tmp).

Thanks to this split you only need additional space for a copy of
the largest piece (with some reasonable safety reserve).

Well, it's a very early patch, so there are rough edges too:

(a) It does not solve the "many-schema" scenario at all - that'll need
a completely new approach I guess :-(

(b) It does not solve the writing part at all - the current code uses a
single timestamp (last_statwrite) to decide if a new file needs to
be written.

That clearly is not enough for multiple files - there should be one
timestamp for each database/file. I'm thinking about how to solve
this and how to integrate it with pgstat_send_inquiry etc.

One way might be adding the timestamp(s) into PgStat_StatDBEntry
and the other one is using an array of inquiries for each database.

And yet another one I'm thinking about is using a fixed-length
array of timestamps (e.g. 256), indexed by mod(dboid,256). That
would mean stats for all databases with the same mod(oid,256) would
be written at the same time. Seems like an over-engineering though.

(c) I'm a bit worried about the number of files - right now there's one
for each database and I'm thinking about splitting them by type
(one for tables, one for functions) which might make it even faster
for some apps with a lot of stored procedures etc.

But is the large number of files actually a problem? After all,
we're using one file per relation fork in the "base" directory, so
this seems like a minor issue.

And if really an issue, this might be solved by the mod(oid,256) to
combine multiple files into one (which would work neatly with the
fixed-length array of timestamps).
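The mod(oid,256) variant mentioned in (b) and (c) can be sketched as follows. All names here (STAT_SLOTS, stat_slot_for and friends) are hypothetical and timestamps are plain milliseconds; the point is only that databases whose OIDs collide modulo 256 share one timestamp, and would share one file.

```c
#include <stdint.h>

/* Oid is an unsigned 32-bit integer in PostgreSQL; redefined here so the
 * sketch compiles on its own. */
typedef unsigned int Oid;

#define STAT_SLOTS 256

/* One "last written" timestamp (ms) per slot, zero-initialized. */
static int64_t slot_last_write_ms[STAT_SLOTS];

/* Map a database OID to its slot. */
static unsigned int
stat_slot_for(Oid dboid)
{
    return dboid % STAT_SLOTS;
}

/* Record that the slot holding dboid was written at now_ms. */
static void
slot_mark_written(Oid dboid, int64_t now_ms)
{
    slot_last_write_ms[stat_slot_for(dboid)] = now_ms;
}

/* A slot is stale if it was last written before cutoff_ms. */
static int
slot_is_stale(Oid dboid, int64_t cutoff_ms)
{
    return slot_last_write_ms[stat_slot_for(dboid)] < cutoff_ms;
}
```

OIDs 12345 and 12601 both land in slot 57, so writing one refreshes the other as well; that coupling is the price of the fixed-size array.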

kind regards
Tomas

Attachments:

stats-split.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..226311c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -253,7 +253,9 @@ static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
 static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
@@ -1408,13 +1410,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
 {
 	PgStat_MsgInquiry msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
 	msg.clock_time = clock_time;
 	msg.cutoff_time = cutoff_time;
+	msg.databaseid = databaseid;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3063,7 +3066,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	 * zero.
 	 */
 	pgStatRunningInCollector = true;
-	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
 
 	/*
 	 * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3435,11 +3438,7 @@ static void
 pgstat_write_statsfile(bool permanent)
 {
 	HASH_SEQ_STATUS hstat;
-	HASH_SEQ_STATUS tstat;
-	HASH_SEQ_STATUS fstat;
 	PgStat_StatDBEntry *dbentry;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatFuncEntry *funcentry;
 	FILE	   *fpout;
 	int32		format_id;
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
@@ -3493,29 +3492,15 @@ pgstat_write_statsfile(bool permanent)
 		(void) rc;				/* we'll check for error with ferror */
 
 		/*
-		 * Walk through the database's access stats per table.
+		 * Write out the tables and functions into a separate file.
 		 */
-		hash_seq_init(&tstat, dbentry->tables);
-		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
-		{
-			fputc('T', fpout);
-			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
-		 * Walk through the database's function stats table.
-		 */
-		hash_seq_init(&fstat, dbentry->functions);
-		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
-		{
-			fputc('F', fpout);
-			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
+		pgstat_write_db_statsfile(dbentry, permanent);
 
 		/*
 		 * Mark the end of this DB
+		 * 
+		 * FIXME does it really make much sense, when the tables/functions
+		 * are moved to a separate file (using those chars?)
 		 */
 		fputc('d', fpout);
 	}
@@ -3587,6 +3572,111 @@ pgstat_write_statsfile(bool permanent)
 }
 
 
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *	Tell the news.
+ *	If writing to the permanent file (happens when the collector is
+ *	shutting down only), remove the temporary file so that backends
+ *	starting up under a new postmaster can't read the old data before
+ *	the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+	HASH_SEQ_STATUS tstat;
+	HASH_SEQ_STATUS fstat;
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpout;
+	int			rc;
+
+	/* FIXME Disgusting. Handle properly ... */
+	const char *tmpfile_x = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
+	const char *statfile_x = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+
+	char tmpfile[255];
+	char statfile[255];
+
+	/* FIXME Do some kind of reduction (e.g. mod(oid,255)) not to end with thousands of files,
+	 *one for each database */
+	snprintf(tmpfile, 255, "%s.%d", tmpfile_x, dbentry->databaseid);
+	snprintf(statfile, 255, "%s.%d", statfile_x, dbentry->databaseid);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						tmpfile)));
+		return;
+	}
+
+	/*
+	 * Walk through the database's access stats per table.
+	 */
+	hash_seq_init(&tstat, dbentry->tables);
+	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+	{
+		fputc('T', fpout);
+		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * Walk through the database's function stats table.
+	 */
+	hash_seq_init(&fstat, dbentry->functions);
+	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+	{
+		fputc('F', fpout);
+		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * pgstat.stat with it.  The ferror() check replaces testing for error
+	 * after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary statistics file \"%s\": %m",
+					  tmpfile)));
+		FreeFile(fpout);
+		unlink(tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary statistics file \"%s\": %m",
+					  tmpfile)));
+		unlink(tmpfile);
+	}
+	else if (rename(tmpfile, statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+						tmpfile, statfile)));
+		unlink(tmpfile);
+	}
+	
+	// if (permanent)
+	//	unlink(pgstat_stat_filename);
+}
+
 /* ----------
  * pgstat_read_statsfile() -
  *
@@ -3595,14 +3685,10 @@ pgstat_write_statsfile(bool permanent)
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
 {
 	PgStat_StatDBEntry *dbentry;
 	PgStat_StatDBEntry dbbuf;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatTabEntry tabbuf;
-	PgStat_StatFuncEntry funcbuf;
-	PgStat_StatFuncEntry *funcentry;
 	HASHCTL		hash_ctl;
 	HTAB	   *dbhash;
 	HTAB	   *tabhash = NULL;
@@ -3758,6 +3844,16 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				 */
 				tabhash = dbentry->tables;
 				funchash = dbentry->functions;
+
+				/*
+				 * Read the data from the file for this database. If there was
+				 * onlydb specified (!= InvalidOid), we would not get here because
+				 * of a break above. So we don't need to recheck.
+				 */
+				if (! onlydbs)
+					pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+											permanent);
+
 				break;
 
 				/*
@@ -3768,6 +3864,79 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				funchash = NULL;
 				break;
 
+			case 'E':
+				goto done;
+
+			default:
+				ereport(pgStatRunningInCollector ? LOG : WARNING,
+						(errmsg("corrupted statistics file \"%s\"",
+								statfile)));
+				goto done;
+		}
+	}
+
+done:
+	FreeFile(fpin);
+
+	if (permanent)
+		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+	return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *	Reads in an existing statistics collector db file and initializes the
+ *	tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatTabEntry tabbuf;
+	PgStat_StatFuncEntry funcbuf;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpin;
+	bool		found;
+
+	/* FIXME Disgusting. Handle properly ... */
+	const char *statfile_x = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+	char statfile[255];
+
+	/* FIXME Do some kind of reduction (e.g. mod(oid,255)) not to end with thousands of files,
+	 *one for each database */
+	snprintf(statfile, 255, "%s.%d", statfile_x, databaseid);
+
+	/*
+	 * Try to open the status file. If it doesn't exist, the backends simply
+	 * return zero for anything and the collector simply starts from scratch
+	 * with empty counters.
+	 *
+	 * ENOENT is a possibility if the stats collector is not running or has
+	 * not yet written the stats file the first time.  Any other failure
+	 * condition is suspicious.
+	 */
+	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+	{
+		if (errno != ENOENT)
+			ereport(pgStatRunningInCollector ? LOG : WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not open statistics file \"%s\": %m",
+							statfile)));
+		return;
+	}
+
+	/*
+	 * We found an existing collector stats file. Read it and put all the
+	 * hashtable entries into place.
+	 */
+	for (;;)
+	{
+		switch (fgetc(fpin))
+		{
 				/*
 				 * 'T'	A PgStat_StatTabEntry follows.
 				 */
@@ -3853,10 +4022,11 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 done:
 	FreeFile(fpin);
 
-	if (permanent)
-		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+// FIXME unlink permanent filename (with the proper Oid appended)
+// 	if (permanent)
+// 		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
 
-	return dbhash;
+	return;
 }
 
 /* ----------
@@ -4006,7 +4176,7 @@ backend_read_statsfile(void)
 				pfree(mytime);
 			}
 
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, InvalidOid);
 			break;
 		}
 
@@ -4016,7 +4186,7 @@ backend_read_statsfile(void)
 
 		/* Not there or too old, so kick the collector and wait a bit */
 		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, InvalidOid);
 
 		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
 	}
@@ -4026,9 +4196,16 @@ backend_read_statsfile(void)
 
 	/* Autovacuum launcher wants stats about all databases */
 	if (IsAutoVacuumLauncherProcess())
-		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+		/* 
+		 * FIXME Does it really need info including tables/functions? Or is it enough to read
+		 * database-level stats? It seems to me the launcher needs PgStat_StatDBEntry only
+		 * (at least that's how I understand the rebuild_database_list() in autovacuum.c),
+		 * because pgstat_stattabentries are used in do_autovacuum() only, and that's what's
+		 * executed in workers ... So maybe we'd be just fine by reading in the dbentries?
+		 */
+		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
 	else
-		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, false);
 }
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..8971002 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
 	PgStat_MsgHdr m_hdr;
 	TimestampTz clock_time;		/* observed local clock time */
 	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
#18Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#10)
Re: autovacuum stress-testing our system

On 26.9.2012 18:29, Tom Lane wrote:

What seems to me like it could help more is fixing things so that the
autovac launcher needn't even launch a child process for databases that
haven't had any updates lately. I'm not sure how to do that, but it
probably involves getting the stats collector to produce some kind of
summary file.

Couldn't we use the PgStat_StatDBEntry for this? By splitting the
pgstat.stat file into multiple pieces (see my other post in this thread)
there's a file with StatDBEntry items only, so maybe it could be used as
the summary file ...

I've been thinking about this:

(a) add "needs_autovacuuming" flag to PgStat_(TableEntry|StatDBEntry)

(b) when table stats are updated, run quick check to decide whether
the table needs to be processed by autovacuum (vacuumed or
analyzed), and if yes then set needs_autovacuuming=true for both
the table and the database

The worker may read the DB entries from the file and act only on those
that need to be processed (those with needs_autovacuuming=true).

Maybe the DB-level field might be a counter of tables that need to be
processed, and the autovacuum daemon might act on those first? Although
the simpler the better I guess.
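A minimal sketch of the flag-plus-counter idea, assuming the threshold check itself is done elsewhere and passed in as a boolean. The struct and function names are invented for illustration and do not correspond to PgStat_StatTabEntry or PgStat_StatDBEntry.

```c
#include <stdbool.h>

/* Hypothetical per-table and per-database bookkeeping. */
typedef struct
{
    bool needs_autovacuuming;
} TableStatsSketch;

typedef struct
{
    int tables_needing_autovacuum;  /* DB-level counter variant */
} DbStatsSketch;

/*
 * Called whenever table stats are updated. over_threshold is the result
 * of the quick "does this table need vacuum/analyze?" check. The DB
 * counter changes only when the table's flag actually flips.
 */
static void
update_autovac_flag(DbStatsSketch *db, TableStatsSketch *tab,
                    bool over_threshold)
{
    if (over_threshold && !tab->needs_autovacuuming)
    {
        tab->needs_autovacuuming = true;
        db->tables_needing_autovacuum++;
    }
    else if (!over_threshold && tab->needs_autovacuuming)
    {
        tab->needs_autovacuuming = false;
        db->tables_needing_autovacuum--;
    }
}

/* The launcher would only start a worker for databases where this holds. */
static bool
db_needs_worker(const DbStatsSketch *db)
{
    return db->tables_needing_autovacuum > 0;
}
```

Counting only on flag transitions keeps the database-level value correct even when the same table crosses the threshold check many times between vacuums.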

Or did you mean something else?

regards
Tomas

#19Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#17)
Re: autovacuum stress-testing our system

On Sun, Nov 18, 2012 at 5:49 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

The two main changes are these:

(1) The stats file is split into a common "db" file, containing all the
DB Entries, and per-database files with tables/functions. The common
file is still called "pgstat.stat", the per-db files have the
database OID appended, so for example "pgstat.stat.12345" etc.

This was a trivial hack of the pgstat_read_statsfile/pgstat_write_statsfile
functions, introducing two new functions:

pgstat_read_db_statsfile
pgstat_write_db_statsfile

that do the trick of reading/writing stat file for one database.

(2) The pgstat_read_statsfile has an additional parameter "onlydbs" that
says that you don't need table/func stats - just the list of db
entries. This is used for autovacuum launcher, which does not need
to read the table/stats (if I'm reading the code in autovacuum.c
correctly - it seems to be working as expected).

I'm not an expert on the stats system, but this seems like a promising
approach to me.

(a) It does not solve the "many-schema" scenario at all - that'll need
a completely new approach I guess :-(

We don't need to solve every problem in the first patch. I've got no
problem kicking this one down the road.

(b) It does not solve the writing part at all - the current code uses a
single timestamp (last_statwrite) to decide if a new file needs to
be written.

That clearly is not enough for multiple files - there should be one
timestamp for each database/file. I'm thinking about how to solve
this and how to integrate it with pgstat_send_inquiry etc.

Presumably you need a last_statwrite for each file, in a hash table or
something, and requests need to specify which file is needed.

And yet another one I'm thinking about is using a fixed-length
array of timestamps (e.g. 256), indexed by mod(dboid,256). That
would mean stats for all databases with the same mod(oid,256) would
be written at the same time. Seems like an over-engineering though.

That seems like an unnecessary kludge.

(c) I'm a bit worried about the number of files - right now there's one
for each database and I'm thinking about splitting them by type
(one for tables, one for functions) which might make it even faster
for some apps with a lot of stored procedures etc.

But is the large number of files actually a problem? After all,
we're using one file per relation fork in the "base" directory, so
this seems like a minor issue.

I don't see why one file per database would be a problem. After all,
we already have one directory per database inside base/. If the user
has so many databases that dirent lookups in a directory of that size
are a problem, they're already hosed, and this will probably still
work out to a net win.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Tomas Vondra
tv@fuzzy.cz
In reply to: Robert Haas (#19)
1 attachment(s)
Re: autovacuum stress-testing our system

On 21.11.2012 19:02, Robert Haas wrote:

On Sun, Nov 18, 2012 at 5:49 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

The two main changes are these:

(1) The stats file is split into a common "db" file, containing all the
DB Entries, and per-database files with tables/functions. The common
file is still called "pgstat.stat", the per-db files have the
database OID appended, so for example "pgstat.stat.12345" etc.

This was a trivial hack of the pgstat_read_statsfile/pgstat_write_statsfile
functions, introducing two new functions:

pgstat_read_db_statsfile
pgstat_write_db_statsfile

that do the trick of reading/writing stat file for one database.

(2) The pgstat_read_statsfile has an additional parameter "onlydbs" that
says that you don't need table/func stats - just the list of db
entries. This is used for autovacuum launcher, which does not need
to read the table/stats (if I'm reading the code in autovacuum.c
correctly - it seems to be working as expected).

I'm not an expert on the stats system, but this seems like a promising
approach to me.

(a) It does not solve the "many-schema" scenario at all - that'll need
a completely new approach I guess :-(

We don't need to solve every problem in the first patch. I've got no
problem kicking this one down the road.

(b) It does not solve the writing part at all - the current code uses a
single timestamp (last_statwrite) to decide if a new file needs to
be written.

That clearly is not enough for multiple files - there should be one
timestamp for each database/file. I'm thinking about how to solve
this and how to integrate it with pgstat_send_inquiry etc.

Presumably you need a last_statwrite for each file, in a hash table or
something, and requests need to specify which file is needed.

And yet another one I'm thinking about is using a fixed-length
array of timestamps (e.g. 256), indexed by mod(dboid,256). That
would mean stats for all databases with the same mod(oid,256) would
be written at the same time. Seems like an over-engineering though.

That seems like an unnecessary kludge.

(c) I'm a bit worried about the number of files - right now there's one
for each database and I'm thinking about splitting them by type
(one for tables, one for functions) which might make it even faster
for some apps with a lot of stored procedures etc.

But is the large number of files actually a problem? After all,
we're using one file per relation fork in the "base" directory, so
this seems like a minor issue.

I don't see why one file per database would be a problem. After all,
we already have one directory per database inside base/. If the user
has so many databases that dirent lookups in a directory of that size
are a problem, they're already hosed, and this will probably still
work out to a net win.

Attached is a v2 of the patch, fixing some of the issues and unclear
points from the initial version.

The main improvement is that it implements writing only stats for the
requested database (set when sending inquiry). There's a dynamic array
of requests - for each DB only the last request is kept.

I've done a number of changes - most importantly:

- added a stats_timestamp field to PgStat_StatDBEntry, keeping the last
write of the database (i.e. a per-database last_statwrite), which is
used to decide whether the file is stale or not

- correct handling of the 'permanent' flag (used when starting or
stopping the cluster) for per-db files

- added a very simple header to the per-db files (basically just a
format ID and a timestamp) - this is needed for checking of the
timestamp of the last write from workers (although maybe we could
just read the pgstat.stat, which is now rather small)

- a 'force' parameter (true - write all databases, even if they weren't
specifically requested)
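
To make the "very simple header" item concrete, here is a minimal standalone sketch of writing and reading a format ID followed by a timestamp. It is not the patch code: SKETCH_FORMAT_ID is a made-up value, the function names are hypothetical, and the timestamp is a plain int64_t standing in for TimestampTz.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_FORMAT_ID 0x53544154	/* made-up magic number, not PGSTAT_FILE_FORMAT_ID */

/* Write the header: format ID, then the timestamp of this write.
 * Returns 1 on success, 0 on failure. */
static int
write_db_header(FILE *fp, int64_t ts)
{
	int32_t		format_id = SKETCH_FORMAT_ID;

	if (fwrite(&format_id, sizeof(format_id), 1, fp) != 1)
		return 0;
	if (fwrite(&ts, sizeof(ts), 1, fp) != 1)
		return 0;
	return 1;
}

/* Read only the header, so a reader can check freshness without
 * parsing the rest of the file. Returns 1 on success, 0 on failure. */
static int
read_db_header_timestamp(FILE *fp, int64_t *ts)
{
	int32_t		format_id;

	if (fread(&format_id, sizeof(format_id), 1, fp) != 1 ||
		format_id != SKETCH_FORMAT_ID)
		return 0;
	if (fread(ts, sizeof(*ts), 1, fp) != 1)
		return 0;
	return 1;
}
```

A worker that only wants to know whether the file is fresh enough can stop after the header and skip the table and function entries entirely.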

So with the exception of the 'multi-schema' case (which was not the aim
of this effort), it should solve all the issues of the initial version.

There are two blocks of code dealing with clock glitches. I haven't
fixed those yet, but that can wait I guess. I've also left in some
logging I used during development (printing inquiries and which file
is written and when).

The main unsolved problem I'm struggling with is what to do when a
database is dropped. Right now, the statfile remains in pg_stat_tmp
forever (or until the restart) - is there a good way to remove the
file? I'm thinking about adding a message to be sent to the collector
from the code that handles DROP DATABASE.

I've done some very simple performance testing - I've created 1000
databases with 1000 tables each, done ANALYZE on all of them. With only
autovacuum running, I've seen this:

Without the patch
-----------------

%CPU  %MEM  TIME+    COMMAND
  18   3.0  0:10.10  postgres: autovacuum launcher process
  17   2.6  0:11.44  postgres: stats collector process

The I/O was seriously bogged down, doing ~150 MB/s (basically what the
drive can handle) - with fewer dbs, or when the statfiles are placed on
a tmpfs filesystem, we usually see ~70% of one core doing just this.

With the patch
--------------

Then, the typical "top" for PostgreSQL processes looked like this:

%CPU  %MEM  TIME+    COMMAND
   2   0.3  1:16.57  postgres: autovacuum launcher process
   2   3.1  0:25.34  postgres: stats collector process

and the average write speed from the stats collector was ~3.5MB/s
(measured using iotop), and even when running the ANALYZE etc. I was
getting rather light IO usage (like ~15 MB/s or something).

In both cases the total size was ~150MB, but without the patch the space
requirements are actually 2x that (because of writing a copy and then
renaming).

I'd like to put this into 2013-01 commit fest, but if we can do some
prior testing / comments, that'd be great.

regards
Tomas

Attachments:

stats-split-v2.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..63b9e14 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -222,8 +222,16 @@ static PgStat_GlobalStats globalStats;
 /* Last time the collector successfully wrote the stats file */
 static TimestampTz last_statwrite;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+} DBWriteRequest;
+
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,8 +260,10 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
@@ -285,6 +295,8 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
+static bool pgstat_write_statsfile_needed(void);
+static bool pgstat_db_requested(Oid databaseid);
 
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
@@ -1408,13 +1420,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
 {
 	PgStat_MsgInquiry msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
 	msg.clock_time = clock_time;
 	msg.cutoff_time = cutoff_time;
+	msg.databaseid = databaseid;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3004,6 +3017,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	int			len;
 	PgStat_Msg	msg;
 	int			wr;
+	bool		first_write = true;
 
 	IsUnderPostmaster = true;	/* we are a postmaster subprocess now */
 
@@ -3055,15 +3069,15 @@ PgstatCollectorMain(int argc, char *argv[])
 	/*
 	 * Arrange to write the initial status file right away
 	 */
-	last_statrequest = GetCurrentTimestamp();
-	last_statwrite = last_statrequest - 1;
-
+	// last_statrequest = GetCurrentTimestamp();
+	// last_statwrite = GetCurrentTimestamp() - 1;
+	
 	/*
 	 * Read in an existing statistics stats file or initialize the stats to
 	 * zero.
 	 */
 	pgStatRunningInCollector = true;
-	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
 
 	/*
 	 * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3109,8 +3123,11 @@ PgstatCollectorMain(int argc, char *argv[])
 			 * Write the stats file if a new request has arrived that is not
 			 * satisfied by existing file.
 			 */
-			if (last_statwrite < last_statrequest)
-				pgstat_write_statsfile(false);
+			if (first_write || pgstat_write_statsfile_needed())
+			{
+				pgstat_write_statsfile(false, first_write);
+				first_write = false;
+			}
 
 			/*
 			 * Try to receive and process a message.  This will not block,
@@ -3269,7 +3286,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	/*
 	 * Save the final stats to reuse at next startup.
 	 */
-	pgstat_write_statsfile(true);
+	pgstat_write_statsfile(true, true);
 
 	exit(0);
 }
@@ -3432,20 +3449,18 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
  * ----------
  */
 static void
-pgstat_write_statsfile(bool permanent)
+pgstat_write_statsfile(bool permanent, bool force)
 {
 	HASH_SEQ_STATUS hstat;
-	HASH_SEQ_STATUS tstat;
-	HASH_SEQ_STATUS fstat;
 	PgStat_StatDBEntry *dbentry;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatFuncEntry *funcentry;
 	FILE	   *fpout;
 	int32		format_id;
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
 
+	elog(WARNING, "writing statsfile '%s'", statfile);
+	
 	/*
 	 * Open the statistics temp file to write out the current values.
 	 */
@@ -3489,36 +3504,36 @@ pgstat_write_statsfile(bool permanent)
 		 * use to any other process.
 		 */
 		fputc('D', fpout);
+		dbentry->stats_timestamp = globalStats.stats_timestamp;
 		rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
 		(void) rc;				/* we'll check for error with ferror */
 
 		/*
-		 * Walk through the database's access stats per table.
+		 * Write out the tables and functions into a separate file, but only
+		 * if the database is in the requests or if it's a forced write (then
+		 * all the DBs need to be written - e.g. at the shutdown).
 		 */
-		hash_seq_init(&tstat, dbentry->tables);
-		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
-		{
-			fputc('T', fpout);
-			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
-		 * Walk through the database's function stats table.
-		 */
-		hash_seq_init(&fstat, dbentry->functions);
-		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
-		{
-			fputc('F', fpout);
-			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
+		if (force || pgstat_db_requested(dbentry->databaseid)) {
+			elog(WARNING, "writing statsfile for DB %u", dbentry->databaseid);
+			pgstat_write_db_statsfile(dbentry, permanent);
 		}
 
 		/*
 		 * Mark the end of this DB
+		 * 
+		 * FIXME does it really make much sense, when the tables/functions
+		 * are moved to a separate file (using those chars?)
 		 */
 		fputc('d', fpout);
 	}
+	
+	/* In any case, we can just throw away all the db requests. */
+	if (last_statrequests != NULL)
+	{
+		pfree(last_statrequests);
+		last_statrequests = NULL;
+		num_statrequests = 0;
+	}
 
 	/*
 	 * No more output to be done. Close the temp file and replace the old
@@ -3559,27 +3574,28 @@ pgstat_write_statsfile(bool permanent)
 		 */
 		last_statwrite = globalStats.stats_timestamp;
 
+		/* FIXME Update to the per-db request times. */
 		/*
 		 * If there is clock skew between backends and the collector, we could
 		 * receive a stats request time that's in the future.  If so, complain
 		 * and reset last_statrequest.	Resetting ensures that no inquiry
 		 * message can cause more than one stats file write to occur.
 		 */
-		if (last_statrequest > last_statwrite)
-		{
-			char	   *reqtime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
-			mytime = pstrdup(timestamptz_to_str(last_statwrite));
-			elog(LOG, "last_statrequest %s is later than collector's time %s",
-				 reqtime, mytime);
-			pfree(reqtime);
-			pfree(mytime);
-
-			last_statrequest = last_statwrite;
-		}
+// 		if (last_statrequest > last_statwrite)
+// 		{
+// 			char	   *reqtime;
+// 			char	   *mytime;
+// 
+// 			/* Copy because timestamptz_to_str returns a static buffer */
+// 			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
+// 			mytime = pstrdup(timestamptz_to_str(last_statwrite));
+// 			elog(LOG, "last_statrequest %s is later than collector's time %s",
+// 				 reqtime, mytime);
+// 			pfree(reqtime);
+// 			pfree(mytime);
+// 
+// 			last_statrequest = last_statwrite;
+// 		}
 	}
 
 	if (permanent)
@@ -3587,6 +3603,137 @@ pgstat_write_statsfile(bool permanent)
 }
 
 
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *	Write the table and function stats of a single database to its own file.
+ *	If writing to the permanent file (happens when the collector is
+ *	shutting down only), remove the temporary file so that backends
+ *	starting up under a new postmaster can't read the old data before
+ *	the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+	HASH_SEQ_STATUS tstat;
+	HASH_SEQ_STATUS fstat;
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpout;
+	int32		format_id;
+	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+	int			rc;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +2 for the dot and \0 byte
+	 */
+	char db_tmpfile[strlen(tmpfile) + 12];
+	char db_statfile[strlen(statfile) + 12];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_tmpfile, strlen(tmpfile) + 12, "%s.%u", tmpfile, dbentry->databaseid);
+	snprintf(db_statfile, strlen(statfile) + 12, "%s.%u", statfile, dbentry->databaseid);
+
+	elog(WARNING, "writing statsfile '%s'", db_statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						db_tmpfile)));
+		return;
+	}
+
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Write the timestamp.
+	 */
+	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Walk through the database's access stats per table.
+	 */
+	hash_seq_init(&tstat, dbentry->tables);
+	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+	{
+		fputc('T', fpout);
+		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * Walk through the database's function stats table.
+	 */
+	hash_seq_init(&fstat, dbentry->functions);
+	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+	{
+		fputc('F', fpout);
+		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * pgstat.stat with it.  The ferror() check replaces testing for error
+	 * after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		FreeFile(fpout);
+		unlink(db_tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		unlink(db_tmpfile);
+	}
+	else if (rename(db_tmpfile, db_statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+						db_tmpfile, db_statfile)));
+		unlink(db_tmpfile);
+	}
+	
+	if (permanent)
+	{
+		/* FIXME This aliases the existing db_statfile variable (might have different
+		 * length). */
+		char db_statfile[strlen(pgstat_stat_filename) + 12];
+		snprintf(db_statfile, strlen(pgstat_stat_filename) + 12, "%s.%u",
+				 pgstat_stat_filename, dbentry->databaseid);
+		elog(DEBUG1, "removing stat file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
+}
+
 /* ----------
  * pgstat_read_statsfile() -
  *
@@ -3595,14 +3742,10 @@ pgstat_write_statsfile(bool permanent)
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
 {
 	PgStat_StatDBEntry *dbentry;
 	PgStat_StatDBEntry dbbuf;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatTabEntry tabbuf;
-	PgStat_StatFuncEntry funcbuf;
-	PgStat_StatFuncEntry *funcentry;
 	HASHCTL		hash_ctl;
 	HTAB	   *dbhash;
 	HTAB	   *tabhash = NULL;
@@ -3758,6 +3901,16 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				 */
 				tabhash = dbentry->tables;
 				funchash = dbentry->functions;
+
+				/*
+				 * Read the data from the file for this database. If there was
+				 * onlydb specified (!= InvalidOid), we would not get here because
+				 * of a break above. So we don't need to recheck.
+				 */
+				if (! onlydbs)
+					pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+											permanent);
+
 				break;
 
 				/*
@@ -3768,6 +3921,105 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				funchash = NULL;
 				break;
 
+			case 'E':
+				goto done;
+
+			default:
+				ereport(pgStatRunningInCollector ? LOG : WARNING,
+						(errmsg("corrupted statistics file \"%s\"",
+								statfile)));
+				goto done;
+		}
+	}
+
+done:
+	FreeFile(fpin);
+
+	if (permanent)
+		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+	return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *	Reads in an existing statistics collector db file and initializes the
+ *	tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatTabEntry tabbuf;
+	PgStat_StatFuncEntry funcbuf;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpin;
+	int32		format_id;
+	TimestampTz timestamp;
+	bool		found;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +2 for the dot and \0 byte
+	 */
+	char db_statfile[strlen(statfile) + 12];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_statfile, strlen(statfile) + 12, "%s.%u", statfile, databaseid);
+
+	/*
+	 * Try to open the status file. If it doesn't exist, the backends simply
+	 * return zero for anything and the collector simply starts from scratch
+	 * with empty counters.
+	 *
+	 * ENOENT is a possibility if the stats collector is not running or has
+	 * not yet written the stats file the first time.  Any other failure
+	 * condition is suspicious.
+	 */
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
+	{
+		if (errno != ENOENT)
+			ereport(pgStatRunningInCollector ? LOG : WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not open statistics file \"%s\": %m",
+							db_statfile)));
+		return;
+	}
+
+	/*
+	 * Verify it's of the expected format.
+	 */
+	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+		|| format_id != PGSTAT_FILE_FORMAT_ID)
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * Read the timestamp of the last write
+	 */
+	if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * We found an existing collector stats file. Read it and put all the
+	 * hashtable entries into place.
+	 */
+	for (;;)
+	{
+		switch (fgetc(fpin))
+		{
 				/*
 				 * 'T'	A PgStat_StatTabEntry follows.
 				 */
@@ -3777,7 +4029,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3795,7 +4047,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3811,7 +4063,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3829,7 +4081,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3845,7 +4097,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 			default:
 				ereport(pgStatRunningInCollector ? LOG : WARNING,
 						(errmsg("corrupted statistics file \"%s\"",
-								statfile)));
+								db_statfile)));
 				goto done;
 		}
 	}
@@ -3854,37 +4106,49 @@ done:
 	FreeFile(fpin);
 
 	if (permanent)
-		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	{
+		/* FIXME This aliases the existing db_statfile variable (might have different
+		 * length). */
+		char db_statfile[strlen(PGSTAT_STAT_PERMANENT_FILENAME) + 12];
+		snprintf(db_statfile, strlen(PGSTAT_STAT_PERMANENT_FILENAME) + 12, "%s.%u",
+				 PGSTAT_STAT_PERMANENT_FILENAME, databaseid);
+		elog(DEBUG1, "removing permanent stats file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
 
-	return dbhash;
+	return;
 }
 
 /* ----------
- * pgstat_read_statsfile_timestamp() -
+ * pgstat_read_db_statsfile_timestamp() -
  *
- *	Attempt to fetch the timestamp of an existing stats file.
+ *	Attempt to fetch the timestamp of an existing stats file (for a DB).
  *	Returns TRUE if successful (timestamp is stored at *ts).
  * ----------
  */
 static bool
-pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
+pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz *ts)
 {
-	PgStat_GlobalStats myGlobalStats;
+	TimestampTz timestamp;
 	FILE	   *fpin;
 	int32		format_id;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+	char db_statfile[strlen(statfile) + 12];
+
+	/* format the db statfile filename */
+	snprintf(db_statfile, strlen(statfile) + 12, "%s.%u", statfile, databaseid);
 
 	/*
 	 * Try to open the status file.  As above, anything but ENOENT is worthy
 	 * of complaining about.
 	 */
-	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
 	{
 		if (errno != ENOENT)
 			ereport(pgStatRunningInCollector ? LOG : WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not open statistics file \"%s\": %m",
-							statfile)));
+							db_statfile)));
 		return false;
 	}
 
@@ -3895,7 +4159,7 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 		|| format_id != PGSTAT_FILE_FORMAT_ID)
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
@@ -3903,15 +4167,15 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 	/*
 	 * Read global stats struct
 	 */
-	if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != sizeof(myGlobalStats))
+	if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != sizeof(TimestampTz))
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
 
-	*ts = myGlobalStats.stats_timestamp;
+	*ts = timestamp;
 
 	FreeFile(fpin);
 	return true;
@@ -3947,7 +4211,7 @@ backend_read_statsfile(void)
 
 		CHECK_FOR_INTERRUPTS();
 
-		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
+		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
 
 		cur_ts = GetCurrentTimestamp();
 		/* Calculate min acceptable timestamp, if we didn't already */
@@ -4006,7 +4270,7 @@ backend_read_statsfile(void)
 				pfree(mytime);
 			}
 
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 			break;
 		}
 
@@ -4016,7 +4280,7 @@ backend_read_statsfile(void)
 
 		/* Not there or too old, so kick the collector and wait a bit */
 		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 
 		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
 	}
@@ -4026,9 +4290,16 @@ backend_read_statsfile(void)
 
 	/* Autovacuum launcher wants stats about all databases */
 	if (IsAutoVacuumLauncherProcess())
-		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+		/* 
+		 * FIXME Does it really need info including tables/functions? Or is it enough to read
+		 * database-level stats? It seems to me the launcher needs PgStat_StatDBEntry only
+		 * (at least that's how I understand the rebuild_database_list() in autovacuum.c),
+		 * because pgstat_stattabentries are used in do_autovacuum() only, and that's what's
+		 * executed in workers ... So maybe we'd be just fine by reading in the dbentries?
+		 */
+		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
 	else
-		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, false);
 }
 
 
@@ -4084,13 +4355,53 @@ pgstat_clear_snapshot(void)
 static void
 pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 {
+	int i = 0;
+	bool found = false;
+
+	elog(WARNING, "received inquiry for %u", msg->databaseid);
+
 	/*
-	 * Advance last_statrequest if this requestor has a newer cutoff time
-	 * than any previous request.
+	 * Find the last write request for this DB (found=true in that case). Plain
+	 * linear search, not really worth doing any magic here (probably).
 	 */
-	if (msg->cutoff_time > last_statrequest)
-		last_statrequest = msg->cutoff_time;
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == msg->databaseid)
+		{
+			found = true;
+			break;
+		}
+	}
+	
+	if (found)
+	{
+		/*
+		 * There already is a request for this DB, so let's advance the
+		 * request time if this requestor has a newer cutoff time
+		 * than any previous request.
+		 */
+		if (msg->cutoff_time > last_statrequests[i].request_time)
+			last_statrequests[i].request_time = msg->cutoff_time;
+	}
+	else
+	{
+		/*
+		 * There's no request for this DB yet, so let's create it (allocate
+		 * space for it, set the values).
+		 */
+		if (last_statrequests == NULL)
+			last_statrequests = palloc(sizeof(DBWriteRequest));
+		else
+			last_statrequests = repalloc(last_statrequests,
+								(num_statrequests + 1)*sizeof(DBWriteRequest));
+		
+		last_statrequests[num_statrequests].databaseid = msg->databaseid;
+		last_statrequests[num_statrequests].request_time = msg->clock_time;
+		num_statrequests += 1;
+	}
 
+	/* FIXME Do we need to update this to work with per-db stats? This should
+	 * be moved to the "else" branch I guess. */
 	/*
 	 * If the requestor's local clock time is older than last_statwrite, we
 	 * should suspect a clock glitch, ie system time going backwards; though
@@ -4099,31 +4410,31 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	 * retreat in the system clock reading could otherwise cause us to neglect
 	 * to update the stats file for a long time.
 	 */
-	if (msg->clock_time < last_statwrite)
-	{
-		TimestampTz cur_ts = GetCurrentTimestamp();
-
-		if (cur_ts < last_statwrite)
-		{
-			/*
-			 * Sure enough, time went backwards.  Force a new stats file write
-			 * to get back in sync; but first, log a complaint.
-			 */
-			char	   *writetime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			writetime = pstrdup(timestamptz_to_str(last_statwrite));
-			mytime = pstrdup(timestamptz_to_str(cur_ts));
-			elog(LOG, "last_statwrite %s is later than collector's time %s",
-				 writetime, mytime);
-			pfree(writetime);
-			pfree(mytime);
-
-			last_statrequest = cur_ts;
-			last_statwrite = last_statrequest - 1;
-		}
-	}
+// 	if (msg->clock_time < last_statwrite)
+// 	{
+// 		TimestampTz cur_ts = GetCurrentTimestamp();
+// 
+// 		if (cur_ts < last_statwrite)
+// 		{
+// 			/*
+// 			 * Sure enough, time went backwards.  Force a new stats file write
+// 			 * to get back in sync; but first, log a complaint.
+// 			 */
+// 			char	   *writetime;
+// 			char	   *mytime;
+// 
+// 			/* Copy because timestamptz_to_str returns a static buffer */
+// 			writetime = pstrdup(timestamptz_to_str(last_statwrite));
+// 			mytime = pstrdup(timestamptz_to_str(cur_ts));
+// 			elog(LOG, "last_statwrite %s is later than collector's time %s",
+// 				 writetime, mytime);
+// 			pfree(writetime);
+// 			pfree(mytime);
+// 
+// 			last_statrequest = cur_ts;
+// 			last_statwrite = last_statrequest - 1;
+// 		}
+// 	}
 }
 
 
@@ -4687,3 +4998,54 @@ pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
 						   HASH_REMOVE, NULL);
 	}
 }
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *	Checks whether there's a db stats request, requiring a file write.
+ * ----------
+ */
+
+static bool pgstat_write_statsfile_needed(void)
+{
+	int i = 0;
+	PgStat_StatDBEntry *dbentry;
+	
+	/* Check the databases if they need to refresh the stats. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		dbentry = pgstat_get_db_entry(last_statrequests[i].databaseid, false);
+		
+		/* No dbentry yet or too old. */
+		if ((! dbentry) ||
+			(dbentry->stats_timestamp < last_statrequests[i].request_time)) {
+			return true;
+		}
+		
+	}
+	
+	/* Well, everything was written recently ... */
+	return false;
+}
+
+/* ----------
+ * pgstat_db_requested() -
+ *
+ *	Checks whether stats for a particular DB need to be written to a file.
+ * ----------
+ */
+
+static bool
+pgstat_db_requested(Oid databaseid)
+{
+	int i = 0;
+	
+	/* Check the databases if they need to refresh the stats. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == databaseid)
+			return true;
+	}
+	
+	return false;
+}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..bdb1bbc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
 	PgStat_MsgHdr m_hdr;
 	TimestampTz clock_time;		/* observed local clock time */
 	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
@@ -545,6 +546,7 @@ typedef struct PgStat_StatDBEntry
 	PgStat_Counter n_block_write_time;
 
 	TimestampTz stat_reset_timestamp;
+	TimestampTz stats_timestamp;		/* time of db stats file update */
 
 	/*
 	 * tables and functions must be last in the struct, because we don't write
#21Tomas Vondra
tv@fuzzy.cz
In reply to: Tomas Vondra (#1)
1 attachment(s)
PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

attached is a version of the patch that I believe is ready for the
commitfest. As the patch was discussed over a large number of messages,
I've prepared a brief summary for those who did not follow the thread.

Issue
=====

The patch aims to improve the situation in deployments with many tables
in many databases (think for example 1000 tables in 1000 databases).
Currently all the stats for all the objects (dbs, tables and functions)
are written in a single file (pgstat.stat), which may get quite large
and that consequently leads to various issues:

1) I/O load - the file is frequently written / read, which may use a
significant part of the I/O bandwidth. For example, we have to deal with
cases where the pgstat.stat size is >150MB, and it's written (and read)
continuously (as soon as one write finishes, a new one starts) and
utilizes 100% of the bandwidth on that device.

2) CPU load - a common solution to the previous issue is moving the file
into RAM, using a tmpfs filesystem. That "fixes" the I/O bottleneck but
causes high CPU load because the system is serializing and deserializing
large amounts of data. We often see ~1 CPU core "lost" due to this (and
causing higher power consumption, but that's Amazon's problem ;-)

3) disk space utilization - the pgstat.stat file is updated in two
steps, i.e. a new version is written to another file (pgstat.tmp) and
then it's renamed to pgstat.stat, which means the device (amount of RAM,
if using tmpfs device) needs to be >2x the actual size of the file.
(Actually more, because there may be descriptors open to multiple
versions of the file.)
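
The two-step update described in (3) can be sketched as follows. This is a simplified standalone illustration, not the collector's code: the function name and paths are made up, and it assumes POSIX rename() semantics (an existing target is replaced atomically).

```c
#include <stdio.h>

/*
 * Write "len" bytes to tmp_path, then rename it over final_path.
 * Readers see either the old or the new file, never a partial one.
 * Until rename() succeeds both copies exist, hence the ~2x space need.
 * Returns 0 on success, -1 on failure (the temp file is removed).
 */
static int
replace_file_atomically(const char *tmp_path, const char *final_path,
						const void *data, size_t len)
{
	FILE	   *fp = fopen(tmp_path, "wb");

	if (fp == NULL)
		return -1;

	if (fwrite(data, 1, len, fp) != len || ferror(fp))
	{
		fclose(fp);
		remove(tmp_path);
		return -1;
	}

	if (fclose(fp) != 0)
	{
		remove(tmp_path);
		return -1;
	}

	if (rename(tmp_path, final_path) != 0)
	{
		remove(tmp_path);
		return -1;
	}

	return 0;
}
```

With per-database files the same pattern still applies, but the temporary copy is only as large as one database's stats rather than the whole cluster's.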

This patch does not attempt to fix a "single DB with multiple schemas"
scenario, although it should not have a negative impact on it.

What the patch does
===================

1) split into global and per-db files
-------------------------------------

The patch "splits" the huge pgstat.stat file into smaller pieces - one
"global" one (global.stat) with database stats, and one file for each of
the databases (oid.stat) with table and function stats.

This makes it possible to write/read much smaller amounts of data, because

a) autovacuum launcher does not need to read the whole file - it needs
just the list of databases (and not the table/func stats)

b) autovacuum workers do request a fresh copy of a single database, so
the stats collector may write just the global.stat + one of the per-db files

and that consequently leads to much lower I/O and CPU load. During our
tests we've seen the I/O drop from ~150MB/s to less than 4MB/s, and
much lower CPU utilization.
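
A minimal sketch (not taken from the patch) of how a per-database file
name can be derived from the database OID. The helper name
build_db_statfile and the Oid typedef are stand-ins for illustration only;
the patch itself uses printf-style templates such as
PGSTAT_STAT_PERMANENT_DB_FILENAME.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PostgreSQL's Oid (an unsigned 32-bit value); assumption. */
typedef unsigned int Oid;

/*
 * Build "<dir>/<oid>.stat" into buf.
 * Returns 0 on success, -1 if the result did not fit into buflen bytes.
 */
static int
build_db_statfile(char *buf, size_t buflen, const char *dir, Oid dboid)
{
	int			n;

	assert(buf != NULL && dir != NULL);

	/* %u, because an OID is unsigned and may exceed INT_MAX */
	n = snprintf(buf, buflen, "%s/%u.stat", dir, dboid);

	return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

Checking the snprintf return value catches truncated names instead of
silently opening the wrong file.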

2) a new global/stat directory
------------------------------

The pgstat.stat file was originally saved into the "global" directory,
but with so many files that would get rather messy so I've created a new
global/stat directory and all the files are stored there.

This also means we can do a simple "delete files in the dir" when
pgstat_reset_all is called.
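
A rough sketch of the "delete files in the dir" idea, written against
plain POSIX calls (opendir/readdir/unlink) rather than the backend's
AllocateDir/ReadDir wrappers, so it can be compiled on its own. The
function name remove_all_files is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Remove every entry in 'dirpath' except "." and "..".
 * Returns the number of entries removed, or -1 if the directory
 * could not be opened.
 */
static int
remove_all_files(const char *dirpath)
{
	DIR		   *dir;
	struct dirent *entry;
	int			removed = 0;

	assert(dirpath != NULL);

	dir = opendir(dirpath);
	if (dir == NULL)
		return -1;

	while ((entry = readdir(dir)) != NULL)
	{
		size_t		len;
		char	   *fname;

		if (strcmp(entry->d_name, ".") == 0 ||
			strcmp(entry->d_name, "..") == 0)
			continue;

		/* +2: one byte for the '/' separator, one for the trailing '\0' */
		len = strlen(dirpath) + strlen(entry->d_name) + 2;
		fname = malloc(len);
		if (fname == NULL)
			break;

		snprintf(fname, len, "%s/%s", dirpath, entry->d_name);
		if (unlink(fname) == 0)
			removed++;
		free(fname);
	}

	closedir(dir);
	return removed;
}
```

The buffer has to hold the directory, the separator, the entry name and
the terminating NUL byte, hence the "+ 2".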

3) pgstat_(read|write)_statsfile split
--------------------------------------

These two functions were each split into a global and a per-db function, so
now there's

pgstat_write_statsfile -- global.stat
pgstat_write_db_statsfile -- oid.stat

pgstat_read_statsfile -- global.stat
pgstat_read_db_statsfile -- oid.stat

There's logic to read/write only those files that are actually needed.

4) list of (OID, timestamp) inquiries, last db-write
----------------------------------------------------

Originally there was a single pair of request/write timestamps for the
whole file, updated whenever a worker requested a fresh file or when the
file was written.

With the split, this had to be replaced by two lists - a timestamp of
the last inquiry (per DB), and a timestamp when each database file was
written for the last time.

The timestamp of the last DB write was added to the PgStat_StatDBEntry
and the list of inquiries is kept in last_statrequests. The fields are
used at several places, so it's probably best to see the code.

Handling the timestamps is rather complex because of possible clock
skew. One of those checks is not needed as the list of inquiries is
freed right after writing all the databases. But I wouldn't be surprised
if there was something I missed, as splitting the file into multiple
pieces made this part more complex.

So please, if you're going to review this patch this is one of the
tricky places.
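
To illustrate the per-database inquiry list described in this section,
here is a simplified, standalone sketch. It is not the patch code: the
RequestList wrapper and the request_* helpers are hypothetical, realloc
stands in for palloc/repalloc, and the clock-skew checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid */
typedef int64_t TimestampTz;	/* stand-in for PostgreSQL's TimestampTz */

typedef struct DBWriteRequest
{
	Oid			databaseid;		/* OID of the database to write */
	TimestampTz request_time;	/* newest cutoff time requested so far */
} DBWriteRequest;

typedef struct RequestList
{
	DBWriteRequest *items;
	int			count;
} RequestList;

/* Record an inquiry for a database; returns false if out of memory. */
static bool
request_add(RequestList *list, Oid dbid, TimestampTz cutoff_time)
{
	DBWriteRequest *grown;
	int			i;

	assert(list != NULL);

	/* Plain linear search; keep only the newest cutoff per database. */
	for (i = 0; i < list->count; i++)
	{
		if (list->items[i].databaseid == dbid)
		{
			if (cutoff_time > list->items[i].request_time)
				list->items[i].request_time = cutoff_time;
			return true;
		}
	}

	grown = realloc(list->items,
					(size_t) (list->count + 1) * sizeof(DBWriteRequest));
	if (grown == NULL)
		return false;

	list->items = grown;
	list->items[list->count].databaseid = dbid;
	list->items[list->count].request_time = cutoff_time;
	list->count++;
	return true;
}

/* Is there a pending write request for this database? */
static bool
request_pending(const RequestList *list, Oid dbid)
{
	int			i;

	for (i = 0; i < list->count; i++)
	{
		if (list->items[i].databaseid == dbid)
			return true;
	}
	return false;
}

/* Forget all requests, e.g. right after the files were written. */
static void
request_clear(RequestList *list)
{
	free(list->items);
	list->items = NULL;
	list->count = 0;
}
```

A collector loop built on this would write only the databases for which
request_pending() returns true and then call request_clear().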

5) dummy file
-------------

Special handling is necessary when an inquiry arrives for a database
without a PgStat_StatDBEntry - this happens for example right after
initdb, when there are no stats for template0 and template1, yet the
autovacuum workers do send inquiries for them.

The backend_read_statsfile now uses the timestamp stored in the header
of the per-db file (not in the global one), and the easiest way to handle
this for new databases is writing an empty 'dummy file' (just a header
with timestamp). Without it, the backends would run into 'pgstat wait
timeout' errors.

This is what pgstat_write_db_dummyfile (used in pgstat_write_statsfile)
is for.
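
The dummy file idea can be sketched in isolation as follows. This is an
illustration under assumptions, not the patch code: it uses plain stdio
instead of AllocateFile/FreeFile, SKETCH_FORMAT_ID is a made-up value
(not PGSTAT_FILE_FORMAT_ID), and both function names are hypothetical.
The file is written to a temporary name and renamed, the same two-step
scheme the stats files use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_FORMAT_ID 0x12345678	/* made-up value for this sketch */

typedef int64_t TimestampTz;	/* stand-in for PostgreSQL's TimestampTz */

/* Write a header-only stats file: format ID, timestamp, 'E' end marker. */
static bool
write_dummy_statsfile(const char *tmppath, const char *path, TimestampTz ts)
{
	int32_t		format_id = SKETCH_FORMAT_ID;
	FILE	   *fp;
	bool		ok;

	assert(tmppath != NULL && path != NULL);

	fp = fopen(tmppath, "wb");
	if (fp == NULL)
		return false;

	ok = fwrite(&format_id, sizeof(format_id), 1, fp) == 1 &&
		fwrite(&ts, sizeof(ts), 1, fp) == 1 &&
		fputc('E', fp) != EOF;

	if (fclose(fp) != 0)
		ok = false;

	/* Readers never see a partial file: it appears only via rename(). */
	if (ok && rename(tmppath, path) != 0)
		ok = false;

	if (!ok)
		remove(tmppath);
	return ok;
}

/* Read just the timestamp from the header; false if missing or corrupted. */
static bool
read_statsfile_timestamp(const char *path, TimestampTz *ts)
{
	int32_t		format_id;
	FILE	   *fp;
	bool		ok;

	fp = fopen(path, "rb");
	if (fp == NULL)
		return false;

	ok = fread(&format_id, sizeof(format_id), 1, fp) == 1 &&
		format_id == SKETCH_FORMAT_ID &&
		fread(ts, sizeof(*ts), 1, fp) == 1;

	fclose(fp);
	return ok;
}
```

A backend polling for fresh stats only needs read_statsfile_timestamp()
to succeed with a recent enough value, which is why a header-only file is
sufficient for databases that have no stats yet.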

6) format ID
------------

I've bumped PGSTAT_FILE_FORMAT_ID to a new random value, although the
filenames changed too, so we could live with the old ID just fine.

We've done a fair amount of testing so far, and if everything goes fine
we plan to deploy a back-ported version of this patch (to 9.1) in
production in ~2 weeks.

Then I'll be able to provide some numbers from a real-world workload
(although our deployment and workload are not quite usual I guess).

regards

Attachments:

stats-split-v4.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..37b85e6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@
 
 /* ----------
  * Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temporary, global and per-database files.
  * ----------
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"global/stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"global/stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	"global/stat/%u.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	"global/stat/%u.tmp"
 
 /* ----------
  * Timer definitions.
@@ -115,8 +119,11 @@ int			pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * ----------
  */
+char	   *pgstat_stat_directory = NULL;
 char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
+char	   *pgstat_stat_db_filename = NULL;
+char	   *pgstat_stat_db_tmpname = NULL;
 
 /*
  * BgWriter global statistics counters (unused in other processes).
@@ -219,11 +226,16 @@ static int	localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static void pgstat_write_db_dummyfile(Oid databaseid);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed();
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +303,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
@@ -549,8 +566,34 @@ startup_failed:
 void
 pgstat_reset_all(void)
 {
-	unlink(pgstat_stat_filename);
-	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	DIR * dir;
+	struct dirent * entry;
+
+	dir = AllocateDir(pgstat_stat_directory);
+	while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+	{
+		char fname[strlen(pgstat_stat_directory) + strlen(entry->d_name) + 2];	/* '/' and '\0' */
+
+		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		sprintf(fname, "%s/%s", pgstat_stat_directory, entry->d_name);
+		unlink(fname);
+	}
+	FreeDir(dir);
+
+	dir = AllocateDir(PGSTAT_STAT_PERMANENT_DIRECTORY);
+	while ((entry = ReadDir(dir, PGSTAT_STAT_PERMANENT_DIRECTORY)) != NULL)
+	{
+		char fname[strlen(PGSTAT_STAT_PERMANENT_DIRECTORY) + strlen(entry->d_name) + 2];
+
+		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		sprintf(fname, "%s/%s", PGSTAT_STAT_PERMANENT_DIRECTORY, entry->d_name);
+		unlink(fname);
+	}
+	FreeDir(dir);
 }
 
 #ifdef EXEC_BACKEND
@@ -1408,13 +1451,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
 {
 	PgStat_MsgInquiry msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
 	msg.clock_time = clock_time;
 	msg.cutoff_time = cutoff_time;
+	msg.databaseid = databaseid;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3004,6 +3048,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	int			len;
 	PgStat_Msg	msg;
 	int			wr;
+	bool		first_write = true;
 
 	IsUnderPostmaster = true;	/* we are a postmaster subprocess now */
 
@@ -3053,17 +3098,11 @@ PgstatCollectorMain(int argc, char *argv[])
 	init_ps_display("stats collector process", "", "", "");
 
 	/*
-	 * Arrange to write the initial status file right away
-	 */
-	last_statrequest = GetCurrentTimestamp();
-	last_statwrite = last_statrequest - 1;
-
-	/*
 	 * Read in an existing statistics stats file or initialize the stats to
-	 * zero.
+	 * zero (read data for all databases, including table/func stats).
 	 */
 	pgStatRunningInCollector = true;
-	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
 
 	/*
 	 * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3107,10 +3146,14 @@ PgstatCollectorMain(int argc, char *argv[])
 
 			/*
 			 * Write the stats file if a new request has arrived that is not
-			 * satisfied by existing file.
+			 * satisfied by existing file (force writing all files if it's
+			 * the first write after startup).
 			 */
-			if (last_statwrite < last_statrequest)
-				pgstat_write_statsfile(false);
+			if (first_write || pgstat_write_statsfile_needed())
+			{
+				pgstat_write_statsfile(false, first_write);
+				first_write = false;
+			}
 
 			/*
 			 * Try to receive and process a message.  This will not block,
@@ -3269,7 +3312,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	/*
 	 * Save the final stats to reuse at next startup.
 	 */
-	pgstat_write_statsfile(true);
+	pgstat_write_statsfile(true, true);
 
 	exit(0);
 }
@@ -3429,23 +3472,25 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
  *	shutting down only), remove the temporary file so that backends
  *	starting up under a new postmaster can't read the old data before
  *	the new collector is ready.
+ * 
+ *	When the 'force' is false, only the requested databases (listed in
+ * 	last_statrequests) will be written. If 'force' is true, all databases
+ * 	will be written (this is used e.g. at shutdown).
  * ----------
  */
 static void
-pgstat_write_statsfile(bool permanent)
+pgstat_write_statsfile(bool permanent, bool force)
 {
 	HASH_SEQ_STATUS hstat;
-	HASH_SEQ_STATUS tstat;
-	HASH_SEQ_STATUS fstat;
 	PgStat_StatDBEntry *dbentry;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatFuncEntry *funcentry;
 	FILE	   *fpout;
 	int32		format_id;
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
 
+	elog(DEBUG1, "writing statsfile '%s'", statfile);
+	
 	/*
 	 * Open the statistics temp file to write out the current values.
 	 */
@@ -3484,6 +3529,20 @@ pgstat_write_statsfile(bool permanent)
 	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
 	{
 		/*
+		 * Write out the tables and functions into a separate file, but only
+		 * if the database is in the requests or if it's a forced write (then
+		 * all the DBs need to be written - e.g. at the shutdown).
+		 * 
+		 * We need to do this before the dbentry write to write the proper
+		 * timestamp to the global file.
+		 */
+		if (force || pgstat_db_requested(dbentry->databaseid)) {
+			elog(DEBUG1, "writing statsfile for DB %u", dbentry->databaseid);
+			dbentry->stats_timestamp = globalStats.stats_timestamp;
+			pgstat_write_db_statsfile(dbentry, permanent);
+		}
+
+		/*
 		 * Write out the DB entry including the number of live backends. We
 		 * don't write the tables or functions pointers, since they're of no
 		 * use to any other process.
@@ -3493,29 +3552,10 @@ pgstat_write_statsfile(bool permanent)
 		(void) rc;				/* we'll check for error with ferror */
 
 		/*
-		 * Walk through the database's access stats per table.
-		 */
-		hash_seq_init(&tstat, dbentry->tables);
-		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
-		{
-			fputc('T', fpout);
-			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
-		 * Walk through the database's function stats table.
-		 */
-		hash_seq_init(&fstat, dbentry->functions);
-		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
-		{
-			fputc('F', fpout);
-			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
 		 * Mark the end of this DB
+		 * 
+		 * TODO Does using these chars still make sense, when the tables/func
+		 * stats are moved to a separate file?
 		 */
 		fputc('d', fpout);
 	}
@@ -3527,6 +3567,28 @@ pgstat_write_statsfile(bool permanent)
 	 */
 	fputc('E', fpout);
 
+	/* In any case, we can just throw away all the db requests, but we need to
+	 * write dummy files for databases without a stat entry (it would cause
+	 * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
+	 * This may happen e.g. for shared DB (oid = 0) right after initdb.
+	 */
+	if (last_statrequests != NULL)
+	{
+		int i = 0;
+		for (i = 0; i < num_statrequests; i++)
+		{
+			/* Create dummy files for requested databases without a proper
+			 * dbentry. It's much easier this way than dealing with multiple
+			 * timestamps, possibly existing but not yet written DBs etc. */
+			if (! pgstat_get_db_entry(last_statrequests[i].databaseid, false))
+				pgstat_write_db_dummyfile(last_statrequests[i].databaseid);
+		}
+
+		pfree(last_statrequests);
+		last_statrequests = NULL;
+		num_statrequests = 0;
+	}
+
 	if (ferror(fpout))
 	{
 		ereport(LOG,
@@ -3552,57 +3614,247 @@ pgstat_write_statsfile(bool permanent)
 						tmpfile, statfile)));
 		unlink(tmpfile);
 	}
-	else
+
+	if (permanent)
+		unlink(pgstat_stat_filename);
+}
+
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *	Tell the news. This writes stats file for a single database.
+ *
+ *	If writing to the permanent file (happens when the collector is
+ *	shutting down only), remove the temporary file so that backends
+ *	starting up under a new postmaster can't read the old data before
+ *	the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+	HASH_SEQ_STATUS tstat;
+	HASH_SEQ_STATUS fstat;
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpout;
+	int32		format_id;
+	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_DB_TMPFILE : pgstat_stat_db_tmpname;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+	int			rc;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_tmpfile[strlen(tmpfile) + 11];
+	char db_statfile[strlen(statfile) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_tmpfile, strlen(tmpfile) + 11, tmpfile, dbentry->databaseid);
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, dbentry->databaseid);
+
+	elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
 	{
-		/*
-		 * Successful write, so update last_statwrite.
-		 */
-		last_statwrite = globalStats.stats_timestamp;
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						db_tmpfile)));
+		return;
+	}
 
-		/*
-		 * If there is clock skew between backends and the collector, we could
-		 * receive a stats request time that's in the future.  If so, complain
-		 * and reset last_statrequest.	Resetting ensures that no inquiry
-		 * message can cause more than one stats file write to occur.
-		 */
-		if (last_statrequest > last_statwrite)
-		{
-			char	   *reqtime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
-			mytime = pstrdup(timestamptz_to_str(last_statwrite));
-			elog(LOG, "last_statrequest %s is later than collector's time %s",
-				 reqtime, mytime);
-			pfree(reqtime);
-			pfree(mytime);
-
-			last_statrequest = last_statwrite;
-		}
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Write the timestamp.
+	 */
+	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Walk through the database's access stats per table.
+	 */
+	hash_seq_init(&tstat, dbentry->tables);
+	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+	{
+		fputc('T', fpout);
+		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
 	}
 
+	/*
+	 * Walk through the database's function stats table.
+	 */
+	hash_seq_init(&fstat, dbentry->functions);
+	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+	{
+		fputc('F', fpout);
+		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * per-database stats file with it.  The ferror() check replaces testing
+	 * for error after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		FreeFile(fpout);
+		unlink(db_tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		unlink(db_tmpfile);
+	}
+	else if (rename(db_tmpfile, db_statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+						db_tmpfile, db_statfile)));
+		unlink(db_tmpfile);
+	}
+	
 	if (permanent)
-		unlink(pgstat_stat_filename);
+	{
+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_db_filename, dbentry->databaseid);
+		elog(DEBUG1, "removing temporary stat file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
 }
 
 
 /* ----------
+ * pgstat_write_db_dummyfile() -
+ *
+ *	All this does is writing a dummy stat file for databases without dbentry
+ *	yet. It basically writes just a file header - format ID and a timestamp.
+ * ----------
+ */
+static void
+pgstat_write_db_dummyfile(Oid databaseid)
+{
+	FILE	   *fpout;
+	int32		format_id;
+	int			rc;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_tmpfile[strlen(pgstat_stat_db_tmpname) + 11];
+	char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_tmpfile, strlen(pgstat_stat_db_tmpname) + 11, pgstat_stat_db_tmpname, databaseid);
+	snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11, pgstat_stat_db_filename, databaseid);
+
+	elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						db_tmpfile)));
+		return;
+	}
+
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Write the timestamp.
+	 */
+	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * per-database stats file with it.  The ferror() check replaces testing
+	 * for error after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary dummy statistics file \"%s\": %m",
+					  db_tmpfile)));
+		FreeFile(fpout);
+		unlink(db_tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary dummy statistics file \"%s\": %m",
+					  db_tmpfile)));
+		unlink(db_tmpfile);
+	}
+	else if (rename(db_tmpfile, db_statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary dummy statistics file \"%s\" to \"%s\": %m",
+						db_tmpfile, db_statfile)));
+		unlink(db_tmpfile);
+	}
+
+}
+
+/* ----------
  * pgstat_read_statsfile() -
  *
  *	Reads in an existing statistics collector file and initializes the
  *	databases' hash table (whose entries point to the tables' hash tables).
+ * 
+ *	Allows reading only the global stats (at database level), which is just
+ *	enough for many purposes (e.g. autovacuum launcher etc.). If this is
+ *	sufficient for you, use onlydbs=true.
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
 {
 	PgStat_StatDBEntry *dbentry;
 	PgStat_StatDBEntry dbbuf;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatTabEntry tabbuf;
-	PgStat_StatFuncEntry funcbuf;
-	PgStat_StatFuncEntry *funcentry;
 	HASHCTL		hash_ctl;
 	HTAB	   *dbhash;
 	HTAB	   *tabhash = NULL;
@@ -3613,6 +3865,11 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 
 	/*
+	 * If we want a db-level stats only, we don't want a particular db.
+	 */
+	Assert(!((onlydb != InvalidOid) && onlydbs));
+
+	/*
 	 * The tables will live in pgStatLocalContext.
 	 */
 	pgstat_setup_memcxt();
@@ -3758,6 +4015,16 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				 */
 				tabhash = dbentry->tables;
 				funchash = dbentry->functions;
+
+				/*
+				 * Read the data from the file for this database. If there was
+				 * onlydb specified (!= InvalidOid), we would not get here because
+				 * of a break above. So we don't need to recheck.
+				 */
+				if (! onlydbs)
+					pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+											permanent);
+
 				break;
 
 				/*
@@ -3768,6 +4035,105 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				funchash = NULL;
 				break;
 
+			case 'E':
+				goto done;
+
+			default:
+				ereport(pgStatRunningInCollector ? LOG : WARNING,
+						(errmsg("corrupted statistics file \"%s\"",
+								statfile)));
+				goto done;
+		}
+	}
+
+done:
+	FreeFile(fpin);
+
+	if (permanent)
+		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+	return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *	Reads in an existing statistics collector db file and initializes the
+ *	tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatTabEntry tabbuf;
+	PgStat_StatFuncEntry funcbuf;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpin;
+	int32		format_id;
+	TimestampTz timestamp;
+	bool		found;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_statfile[strlen(statfile) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
+
+	/*
+	 * Try to open the status file. If it doesn't exist, the backends simply
+	 * return zero for anything and the collector simply starts from scratch
+	 * with empty counters.
+	 *
+	 * ENOENT is a possibility if the stats collector is not running or has
+	 * not yet written the stats file the first time.  Any other failure
+	 * condition is suspicious.
+	 */
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
+	{
+		if (errno != ENOENT)
+			ereport(pgStatRunningInCollector ? LOG : WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not open statistics file \"%s\": %m",
+							db_statfile)));
+		return;
+	}
+
+	/*
+	 * Verify it's of the expected format.
+	 */
+	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+		|| format_id != PGSTAT_FILE_FORMAT_ID)
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * Read the timestamp stored in the file header
+	 */
+	if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * We found an existing collector stats file. Read it and put all the
+	 * hashtable entries into place.
+	 */
+	for (;;)
+	{
+		switch (fgetc(fpin))
+		{
 				/*
 				 * 'T'	A PgStat_StatTabEntry follows.
 				 */
@@ -3777,7 +4143,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3795,7 +4161,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3811,7 +4177,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3829,7 +4195,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3845,7 +4211,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 			default:
 				ereport(pgStatRunningInCollector ? LOG : WARNING,
 						(errmsg("corrupted statistics file \"%s\"",
-								statfile)));
+								db_statfile)));
 				goto done;
 		}
 	}
@@ -3854,37 +4220,47 @@ done:
 	FreeFile(fpin);
 
 	if (permanent)
-		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	{
+		char db_statfile[strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) + 11];
+		snprintf(db_statfile, strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) + 11,
+				 PGSTAT_STAT_PERMANENT_DB_FILENAME, databaseid);
+		elog(DEBUG1, "removing permanent stats file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
 
-	return dbhash;
+	return;
 }
 
 /* ----------
- * pgstat_read_statsfile_timestamp() -
+ * pgstat_read_db_statsfile_timestamp() -
  *
- *	Attempt to fetch the timestamp of an existing stats file.
+ *	Attempt to fetch the timestamp of an existing stats file (for a DB).
  *	Returns TRUE if successful (timestamp is stored at *ts).
  * ----------
  */
 static bool
-pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
+pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz *ts)
 {
-	PgStat_GlobalStats myGlobalStats;
+	TimestampTz timestamp;
 	FILE	   *fpin;
 	int32		format_id;
-	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+	char db_statfile[strlen(statfile) + 11];
+
+	/* format the db statfile filename */
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
 
 	/*
 	 * Try to open the status file.  As above, anything but ENOENT is worthy
 	 * of complaining about.
 	 */
-	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
 	{
 		if (errno != ENOENT)
 			ereport(pgStatRunningInCollector ? LOG : WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not open statistics file \"%s\": %m",
-							statfile)));
+							db_statfile)));
 		return false;
 	}
 
@@ -3895,7 +4271,7 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 		|| format_id != PGSTAT_FILE_FORMAT_ID)
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
@@ -3903,15 +4279,15 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 	/*
 	 * Read global stats struct
 	 */
-	if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != sizeof(myGlobalStats))
+	if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != sizeof(TimestampTz))
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
 
-	*ts = myGlobalStats.stats_timestamp;
+	*ts = timestamp;
 
 	FreeFile(fpin);
 	return true;
@@ -3947,7 +4323,7 @@ backend_read_statsfile(void)
 
 		CHECK_FOR_INTERRUPTS();
 
-		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
+		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
 
 		cur_ts = GetCurrentTimestamp();
 		/* Calculate min acceptable timestamp, if we didn't already */
@@ -4006,7 +4382,7 @@ backend_read_statsfile(void)
 				pfree(mytime);
 			}
 
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 			break;
 		}
 
@@ -4016,7 +4392,7 @@ backend_read_statsfile(void)
 
 		/* Not there or too old, so kick the collector and wait a bit */
 		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 
 		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
 	}
@@ -4026,9 +4402,16 @@ backend_read_statsfile(void)
 
 	/* Autovacuum launcher wants stats about all databases */
 	if (IsAutoVacuumLauncherProcess())
-		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+		/* 
+		 * FIXME Does it really need info including tables/functions? Or is it enough to read
+		 * database-level stats? It seems to me the launcher needs PgStat_StatDBEntry only
+		 * (at least that's how I understand the rebuild_database_list() in autovacuum.c),
+		 * because pgstat_stattabentries are used in do_autovacuum() only, and that's what's
+		 * executed in workers ... So maybe we'd be just fine by reading in the dbentries?
+		 */
+		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
 	else
-		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, false);
 }
 
 
@@ -4084,44 +4467,84 @@ pgstat_clear_snapshot(void)
 static void
 pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 {
-	/*
-	 * Advance last_statrequest if this requestor has a newer cutoff time
-	 * than any previous request.
-	 */
-	if (msg->cutoff_time > last_statrequest)
-		last_statrequest = msg->cutoff_time;
+	int i = 0;
+	bool found = false;
+	PgStat_StatDBEntry *dbentry;
+
+	elog(DEBUG1, "received inquiry for %d", msg->databaseid);
 
 	/*
-	 * If the requestor's local clock time is older than last_statwrite, we
-	 * should suspect a clock glitch, ie system time going backwards; though
-	 * the more likely explanation is just delayed message receipt.  It is
-	 * worth expending a GetCurrentTimestamp call to be sure, since a large
-	 * retreat in the system clock reading could otherwise cause us to neglect
-	 * to update the stats file for a long time.
+	 * Find the last write request for this DB (found=true in that case). Plain
+	 * linear search, not really worth doing any magic here (probably).
 	 */
-	if (msg->clock_time < last_statwrite)
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == msg->databaseid)
+		{
+			found = true;
+			break;
+		}
+	}
+	
+	if (found)
+	{
+		/*
+		 * There already is a request for this DB, so let's advance the
+		 * request time if this requestor has a newer cutoff time
+		 * than any previous request.
+		 */
+		if (msg->cutoff_time > last_statrequests[i].request_time)
+			last_statrequests[i].request_time = msg->cutoff_time;
+	}
+	else
 	{
-		TimestampTz cur_ts = GetCurrentTimestamp();
+		/*
+		 * There's no request for this DB yet, so let's create it (allocate a
+		 * space for it, set the values).
+		 */
+		if (last_statrequests == NULL)
+			last_statrequests = palloc(sizeof(DBWriteRequest));
+		else
+			last_statrequests = repalloc(last_statrequests,
+								(num_statrequests + 1)*sizeof(DBWriteRequest));
+		
+		last_statrequests[num_statrequests].databaseid = msg->databaseid;
+		last_statrequests[num_statrequests].request_time = msg->clock_time;
+		num_statrequests += 1;
 
-		if (cur_ts < last_statwrite)
+		/*
+		* If the requestor's local clock time is older than last_statwrite, we
+		* should suspect a clock glitch, ie system time going backwards; though
+		* the more likely explanation is just delayed message receipt.  It is
+		* worth expending a GetCurrentTimestamp call to be sure, since a large
+		* retreat in the system clock reading could otherwise cause us to neglect
+		* to update the stats file for a long time.
+		*/
+		dbentry = pgstat_get_db_entry(msg->databaseid, false);
+		if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
 		{
-			/*
-			 * Sure enough, time went backwards.  Force a new stats file write
-			 * to get back in sync; but first, log a complaint.
-			 */
-			char	   *writetime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			writetime = pstrdup(timestamptz_to_str(last_statwrite));
-			mytime = pstrdup(timestamptz_to_str(cur_ts));
-			elog(LOG, "last_statwrite %s is later than collector's time %s",
-				 writetime, mytime);
-			pfree(writetime);
-			pfree(mytime);
-
-			last_statrequest = cur_ts;
-			last_statwrite = last_statrequest - 1;
+			TimestampTz cur_ts = GetCurrentTimestamp();
+
+			if (cur_ts < dbentry->stats_timestamp)
+			{
+				/*
+				* Sure enough, time went backwards.  Force a new stats file write
+				* to get back in sync; but first, log a complaint.
+				*/
+				char	   *writetime;
+				char	   *mytime;
+
+				/* Copy because timestamptz_to_str returns a static buffer */
+				writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
+				mytime = pstrdup(timestamptz_to_str(cur_ts));
+				elog(LOG, "last_statwrite %s is later than collector's time %s for "
+					"db %d", writetime, mytime, dbentry->databaseid);
+				pfree(writetime);
+				pfree(mytime);
+
+				last_statrequests[num_statrequests - 1].request_time = cur_ts;
+				dbentry->stats_timestamp = cur_ts - 1;
+			}
 		}
 	}
 }
@@ -4278,10 +4701,17 @@ pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
 
 	/*
-	 * If found, remove it.
+	 * If found, remove it (along with the db statfile).
 	 */
 	if (dbentry)
 	{
+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_db_filename, dbentry->databaseid);
+		
+		elog(DEBUG1, "removing %s", db_statfile);
+		unlink(db_statfile);
+		
 		if (dbentry->tables != NULL)
 			hash_destroy(dbentry->tables);
 		if (dbentry->functions != NULL)
@@ -4687,3 +5117,58 @@ pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
 						   HASH_REMOVE, NULL);
 	}
 }
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *	Checks whether there's a db stats request, requiring a file write.
+ * 
+ *	TODO Seems that thanks to the way we handle last_statrequests (erase after
+ *	a write), this is unnecessary. Just check that there's at least one
+ *	request and you're done. Although there might be delayed requests ...
+ * ----------
+ */
+
+static bool pgstat_write_statsfile_needed()
+{
+	int i = 0;
+	PgStat_StatDBEntry *dbentry;
+	
+	/* Check the databases if they need to refresh the stats. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		dbentry = pgstat_get_db_entry(last_statrequests[i].databaseid, false);
+		
+		/* No dbentry yet or too old. */
+		if ((! dbentry) ||
+			(dbentry->stats_timestamp < last_statrequests[i].request_time)) {
+			return true;
+		}
+		
+	}
+	
+	/* Well, everything was written recently ... */
+	return false;
+}
+
+/* ----------
+ * pgstat_db_requested() -
+ *
+ *	Checks whether stats for a particular DB need to be written to a file.
+ * ----------
+ */
+
+static bool
+pgstat_db_requested(Oid databaseid)
+{
+	int i = 0;
+	
+	/* Check whether there is a pending write request for this database. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == databaseid)
+			return true;
+	}
+	
+	return false;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2cf34ce..e3e432b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8730,20 +8730,43 @@ static void
 assign_pgstat_temp_directory(const char *newval, void *extra)
 {
 	/* check_canonical_path already canonicalized newval for us */
+	char	   *dname;
 	char	   *tname;
 	char	   *fname;
-
-	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /pgstat.tmp */
-	sprintf(tname, "%s/pgstat.tmp", newval);
-	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /pgstat.stat */
-	sprintf(fname, "%s/pgstat.stat", newval);
-
+	char	   *tname_db;
+	char	   *fname_db;
+
+	/* directory */
+	dname = guc_malloc(ERROR, strlen(newval) + 1);		/* runtime dir */
+	sprintf(dname, "%s", newval);
+
+	/* global stats */
+	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /global.tmp */
+	sprintf(tname, "%s/global.tmp", newval);
+	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /global.stat */
+	sprintf(fname, "%s/global.stat", newval);
+
+	/* per-db stats */
+	tname_db = guc_malloc(ERROR, strlen(newval) + 8);		/* /%d.tmp */
+	sprintf(tname_db, "%s/%%d.tmp", newval);
+	fname_db = guc_malloc(ERROR, strlen(newval) + 9);		/* /%d.stat */
+	sprintf(fname_db, "%s/%%d.stat", newval);
+
+	if (pgstat_stat_directory)
+		free(pgstat_stat_directory);
+	pgstat_stat_directory = dname;
 	if (pgstat_stat_tmpname)
 		free(pgstat_stat_tmpname);
 	pgstat_stat_tmpname = tname;
 	if (pgstat_stat_filename)
 		free(pgstat_stat_filename);
 	pgstat_stat_filename = fname;
+	if (pgstat_stat_db_tmpname)
+		free(pgstat_stat_db_tmpname);
+	pgstat_stat_db_tmpname = tname_db;
+	if (pgstat_stat_db_filename)
+		free(pgstat_stat_db_filename);
+	pgstat_stat_db_filename = fname_db;
 }
 
 static bool
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3e05ac3..8c86301 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -179,6 +179,7 @@ char	   *restrict_env;
 #endif
 const char *subdirs[] = {
 	"global",
+	"global/stat",
 	"pg_xlog",
 	"pg_xlog/archive_status",
 	"pg_clog",
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..b3467d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
 	PgStat_MsgHdr m_hdr;
 	TimestampTz clock_time;		/* observed local clock time */
 	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
@@ -514,7 +515,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9A
+#define PGSTAT_FILE_FORMAT_ID	0xA240CA47
 
 /* ----------
  * PgStat_StatDBEntry			The collector's data per database
@@ -545,6 +546,7 @@ typedef struct PgStat_StatDBEntry
 	PgStat_Counter n_block_write_time;
 
 	TimestampTz stat_reset_timestamp;
+	TimestampTz stats_timestamp;		/* time of db stats file update */
 
 	/*
 	 * tables and functions must be last in the struct, because we don't write
@@ -722,8 +724,11 @@ extern bool pgstat_track_activities;
 extern bool pgstat_track_counts;
 extern int	pgstat_track_functions;
 extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern char *pgstat_stat_directory;
 extern char *pgstat_stat_tmpname;
 extern char *pgstat_stat_filename;
+extern char *pgstat_stat_db_tmpname;
+extern char *pgstat_stat_db_filename;
 
 /*
  * BgWriter statistics counters are updated directly by bgwriter and bufmgr
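
The per-database request tracking that the patch adds to pgstat_recv_inquiry(), pgstat_db_requested() and pgstat_write_statsfile() can be sketched outside the backend roughly as follows. This is only a minimal standalone illustration, not the patch code: DbId, Timestamp, RequestList, record_request(), is_requested() and clear_requests() are hypothetical names, plain realloc()/free() stand in for palloc()/repalloc()/pfree(), and the sketch stores the cutoff time in both branches, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Oid and TimestampTz; not the PostgreSQL types. */
typedef uint32_t DbId;
typedef int64_t Timestamp;

typedef struct WriteRequest
{
	DbId		databaseid;		/* database whose stats file is wanted */
	Timestamp	request_time;	/* newest cutoff requested so far */
} WriteRequest;

typedef struct RequestList
{
	WriteRequest *items;		/* NULL while the list is empty */
	size_t		count;
} RequestList;

/*
 * Record a request for one database.  Returns the index of the entry,
 * or -1 if memory could not be allocated.
 */
long
record_request(RequestList *list, DbId databaseid, Timestamp cutoff_time)
{
	size_t		i;
	WriteRequest *grown;

	assert(list != NULL);

	/* Plain linear search; the list is emptied after every write. */
	for (i = 0; i < list->count; i++)
	{
		if (list->items[i].databaseid == databaseid)
		{
			/* Only ever move the requested time forward. */
			if (cutoff_time > list->items[i].request_time)
				list->items[i].request_time = cutoff_time;
			return (long) i;
		}
	}

	/* Not found: grow by one slot and fill in the new last element. */
	grown = realloc(list->items, (list->count + 1) * sizeof(WriteRequest));
	if (grown == NULL)
		return -1;
	list->items = grown;
	list->items[list->count].databaseid = databaseid;
	list->items[list->count].request_time = cutoff_time;
	list->count += 1;

	/* The entry just added lives at count - 1, not at count. */
	return (long) (list->count - 1);
}

/* Is a write pending for this database? */
int
is_requested(const RequestList *list, DbId databaseid)
{
	size_t		i;

	for (i = 0; i < list->count; i++)
	{
		if (list->items[i].databaseid == databaseid)
			return 1;
	}
	return 0;
}

/* Drop all requests, as the collector does once the files are written. */
void
clear_requests(RequestList *list)
{
	free(list->items);
	list->items = NULL;
	list->count = 0;
}
```

A caller would start from an empty list (items = NULL, count = 0), call record_request() for each inquiry, consult is_requested() while walking the databases, and call clear_requests() after the write.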
#22Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Tomas Vondra (#21)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 03.01.2013 01:15, Tomas Vondra wrote:

2) a new global/stat directory
------------------------------

The pgstat.stat file was originally saved into the "global" directory,
but with so many files that would get rather messy so I've created a new
global/stat directory and all the files are stored there.

This also means we can do a simple "delete files in the dir" when
pgstat_reset_all is called.

How about creating the new directory as a direct subdir of $PGDATA,
rather than buried in global? "global" is supposed to contain data
related to shared catalog relations (plus pg_control), so it doesn't
seem like the right location for per-database stat files. Also, if we're
going to have admins manually zapping the directory (hopefully when the
system is offline), that's less scary if the directory is not buried as
deep.
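
For reference, the "delete files in the dir" step discussed above might look roughly like this with plain POSIX calls. It is a sketch under assumptions, not what the patch does: remove_files_in_dir() is a hypothetical name, and the patch itself goes through AllocateDir()/ReadDir()/FreeDir() instead of opendir()/readdir()/closedir().

```c
/* Ask for POSIX.1-2008 declarations even under a strict -std= setting. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Remove every entry in 'dirpath' that unlink() can remove.
 * Returns the number of files removed, or -1 if the directory
 * could not be opened or memory ran out.
 */
int
remove_files_in_dir(const char *dirpath)
{
	DIR		   *dir;
	struct dirent *entry;
	int			removed = 0;

	assert(dirpath != NULL);

	dir = opendir(dirpath);
	if (dir == NULL)
		return -1;

	while ((entry = readdir(dir)) != NULL)
	{
		size_t		len;
		char	   *path;

		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
			continue;

		/* directory + '/' + name + terminating NUL */
		len = strlen(dirpath) + 1 + strlen(entry->d_name) + 1;
		path = malloc(len);
		if (path == NULL)
		{
			closedir(dir);
			return -1;
		}
		snprintf(path, len, "%s/%s", dirpath, entry->d_name);

		/* unlink() fails on subdirectories; those are simply skipped. */
		if (unlink(path) == 0)
			removed++;
		free(path);
	}

	closedir(dir);
	return removed;
}
```

Sizing the buffer as directory length plus name length plus two bytes (one for the slash, one for the terminator) is the detail that is easy to get wrong here.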

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Tomas Vondra
tv@fuzzy.cz
In reply to: Heikki Linnakangas (#22)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 3.1.2013 18:47, Heikki Linnakangas wrote:

On 03.01.2013 01:15, Tomas Vondra wrote:

2) a new global/stat directory
------------------------------

The pgstat.stat file was originally saved into the "global" directory,
but with so many files that would get rather messy so I've created a new
global/stat directory and all the files are stored there.

This also means we can do a simple "delete files in the dir" when
pgstat_reset_all is called.

How about creating the new directory as a direct subdir of $PGDATA,
rather than buried in global? "global" is supposed to contain data
related to shared catalog relations (plus pg_control), so it doesn't
seem like the right location for per-database stat files. Also, if we're
going to have admins manually zapping the directory (hopefully when the
system is offline), that's less scary if the directory is not buried as
deep.

That's clearly possible and it's a trivial change. I was thinking about
that actually, but then I placed the directory into "global" because
that's where the "pgstat.stat" originally was.

Tomas


#24Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#23)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 18:47, Heikki Linnakangas wrote:

On 03.01.2013 01:15, Tomas Vondra wrote:

2) a new global/stat directory
------------------------------

The pgstat.stat file was originally saved into the "global" directory,
but with so many files that would get rather messy so I've created a new
global/stat directory and all the files are stored there.

This also means we can do a simple "delete files in the dir" when
pgstat_reset_all is called.

How about creating the new directory as a direct subdir of $PGDATA,
rather than buried in global? "global" is supposed to contain data
related to shared catalog relations (plus pg_control), so it doesn't
seem like the right location for per-database stat files. Also, if we're
going to have admins manually zapping the directory (hopefully when the
system is offline), that's less scary if the directory is not buried as
deep.

That's clearly possible and it's a trivial change. I was thinking about
that actually, but then I placed the directory into "global" because
that's where the "pgstat.stat" originally was.

Yeah, +1 for a separate directory not in global.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


#25Tomas Vondra
tv@fuzzy.cz
In reply to: Magnus Hagander (#24)
1 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 3.1.2013 20:33, Magnus Hagander wrote:

On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 18:47, Heikki Linnakangas wrote:

How about creating the new directory as a direct subdir of $PGDATA,
rather than buried in global? "global" is supposed to contain data
related to shared catalog relations (plus pg_control), so it doesn't
seem like the right location for per-database stat files. Also, if we're
going to have admins manually zapping the directory (hopefully when the
system is offline), that's less scary if the directory is not buried as
deep.

That's clearly possible and it's a trivial change. I was thinking about
that actually, but then I placed the directory into "global" because
that's where the "pgstat.stat" originally was.

Yeah, +1 for a separate directory not in global.

OK, I moved the files from "global/stat" to "stat".
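
With this layout the global file is "stat/global.stat" and each database gets "stat/<oid>.stat". As a rough illustration of how such a per-database name can be built safely, here is a small standalone sketch; db_stat_path() is a hypothetical helper, not a function from the patch, and it formats the OID as an unsigned value and reports truncation instead of relying on a fixed "+ 11" buffer margin.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<dir>/<oid>.stat" into buf.  Returns 0 on success, or -1 if the
 * result did not fit, in which case buf must not be used.
 */
int
db_stat_path(char *buf, size_t buflen, const char *dir, uint32_t dboid)
{
	int			n;

	assert(buf != NULL && dir != NULL);

	/* OIDs are unsigned 32-bit values, so use an unsigned conversion. */
	n = snprintf(buf, buflen, "%s/%" PRIu32 ".stat", dir, dboid);
	if (n < 0 || (size_t) n >= buflen)
		return -1;
	return 0;
}
```

For example, db_stat_path(buf, sizeof(buf), "stat", 16384) fills buf with "stat/16384.stat" when buf is large enough.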

Tomas

Attachments:

stats-split-v5.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..4ec485e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@
 
 /* ----------
  * Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temporary, global and per-database files.
  * ----------
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	"stat/%d.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	"stat/%d.tmp"
 
 /* ----------
  * Timer definitions.
@@ -115,8 +119,11 @@ int			pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * ----------
  */
+char	   *pgstat_stat_directory = NULL;
 char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
+char	   *pgstat_stat_db_filename = NULL;
+char	   *pgstat_stat_db_tmpname = NULL;
 
 /*
  * BgWriter global statistics counters (unused in other processes).
@@ -219,11 +226,16 @@ static int	localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static void pgstat_write_db_dummyfile(Oid databaseid);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed();
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +303,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
@@ -549,8 +566,34 @@ startup_failed:
 void
 pgstat_reset_all(void)
 {
-	unlink(pgstat_stat_filename);
-	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	DIR * dir;
+	struct dirent * entry;
+
+	dir = AllocateDir(pgstat_stat_directory);
+	while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+	{
+		char fname[strlen(pgstat_stat_directory) + strlen(entry->d_name) + 2];
+
+		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		sprintf(fname, "%s/%s", pgstat_stat_directory, entry->d_name);
+		unlink(fname);
+	}
+	FreeDir(dir);
+
+	dir = AllocateDir(PGSTAT_STAT_PERMANENT_DIRECTORY);
+	while ((entry = ReadDir(dir, PGSTAT_STAT_PERMANENT_DIRECTORY)) != NULL)
+	{
+		char fname[strlen(PGSTAT_STAT_PERMANENT_DIRECTORY) + strlen(entry->d_name) + 2];
+
+		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		sprintf(fname, "%s/%s", PGSTAT_STAT_PERMANENT_DIRECTORY, entry->d_name);
+		unlink(fname);
+	}
+	FreeDir(dir);
 }
 
 #ifdef EXEC_BACKEND
@@ -1408,13 +1451,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
 {
 	PgStat_MsgInquiry msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
 	msg.clock_time = clock_time;
 	msg.cutoff_time = cutoff_time;
+	msg.databaseid = databaseid;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3004,6 +3048,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	int			len;
 	PgStat_Msg	msg;
 	int			wr;
+	bool		first_write = true;
 
 	IsUnderPostmaster = true;	/* we are a postmaster subprocess now */
 
@@ -3053,17 +3098,11 @@ PgstatCollectorMain(int argc, char *argv[])
 	init_ps_display("stats collector process", "", "", "");
 
 	/*
-	 * Arrange to write the initial status file right away
-	 */
-	last_statrequest = GetCurrentTimestamp();
-	last_statwrite = last_statrequest - 1;
-
-	/*
 	 * Read in an existing statistics stats file or initialize the stats to
-	 * zero.
+	 * zero (read data for all databases, including table/func stats).
 	 */
 	pgStatRunningInCollector = true;
-	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
 
 	/*
 	 * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3107,10 +3146,14 @@ PgstatCollectorMain(int argc, char *argv[])
 
 			/*
 			 * Write the stats file if a new request has arrived that is not
-			 * satisfied by existing file.
+			 * satisfied by existing file (force writing all files if it's
+			 * the first write after startup).
 			 */
-			if (last_statwrite < last_statrequest)
-				pgstat_write_statsfile(false);
+			if (first_write || pgstat_write_statsfile_needed())
+			{
+				pgstat_write_statsfile(false, first_write);
+				first_write = false;
+			}
 
 			/*
 			 * Try to receive and process a message.  This will not block,
@@ -3269,7 +3312,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	/*
 	 * Save the final stats to reuse at next startup.
 	 */
-	pgstat_write_statsfile(true);
+	pgstat_write_statsfile(true, true);
 
 	exit(0);
 }
@@ -3429,23 +3472,25 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
  *	shutting down only), remove the temporary file so that backends
  *	starting up under a new postmaster can't read the old data before
  *	the new collector is ready.
+ * 
+ *	When the 'force' is false, only the requested databases (listed in
+ * 	last_statrequests) will be written. If 'force' is true, all databases
+ * 	will be written (this is used e.g. at shutdown).
  * ----------
  */
 static void
-pgstat_write_statsfile(bool permanent)
+pgstat_write_statsfile(bool permanent, bool force)
 {
 	HASH_SEQ_STATUS hstat;
-	HASH_SEQ_STATUS tstat;
-	HASH_SEQ_STATUS fstat;
 	PgStat_StatDBEntry *dbentry;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatFuncEntry *funcentry;
 	FILE	   *fpout;
 	int32		format_id;
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
 
+	elog(DEBUG1, "writing statsfile '%s'", statfile);
+	
 	/*
 	 * Open the statistics temp file to write out the current values.
 	 */
@@ -3484,6 +3529,20 @@ pgstat_write_statsfile(bool permanent)
 	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
 	{
 		/*
+		 * Write out the tables and functions into a separate file, but only
+		 * if the database is in the requests or if it's a forced write (then
+		 * all the DBs need to be written - e.g. at the shutdown).
+		 * 
+		 * We need to do this before the dbentry write to write the proper
+		 * timestamp to the global file.
+		 */
+		if (force || pgstat_db_requested(dbentry->databaseid)) {
+			elog(DEBUG1, "writing statsfile for DB %d", dbentry->databaseid);
+			dbentry->stats_timestamp = globalStats.stats_timestamp;
+			pgstat_write_db_statsfile(dbentry, permanent);
+		}
+
+		/*
 		 * Write out the DB entry including the number of live backends. We
 		 * don't write the tables or functions pointers, since they're of no
 		 * use to any other process.
@@ -3493,29 +3552,10 @@ pgstat_write_statsfile(bool permanent)
 		(void) rc;				/* we'll check for error with ferror */
 
 		/*
-		 * Walk through the database's access stats per table.
-		 */
-		hash_seq_init(&tstat, dbentry->tables);
-		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
-		{
-			fputc('T', fpout);
-			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
-		 * Walk through the database's function stats table.
-		 */
-		hash_seq_init(&fstat, dbentry->functions);
-		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
-		{
-			fputc('F', fpout);
-			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
 		 * Mark the end of this DB
+		 * 
+		 * TODO Does using these chars still make sense, when the tables/func
+		 * stats are moved to a separate file?
 		 */
 		fputc('d', fpout);
 	}
@@ -3527,6 +3567,28 @@ pgstat_write_statsfile(bool permanent)
 	 */
 	fputc('E', fpout);
 
+	/* In any case, we can just throw away all the db requests, but we need to
+	 * write dummy files for databases without a stat entry (it would cause
+	 * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
+	 * This may happen e.g. for shared DB (oid = 0) right after initdb.
+	 */
+	if (last_statrequests != NULL)
+	{
+		int i = 0;
+		for (i = 0; i < num_statrequests; i++)
+		{
+			/* Create dummy files for requested databases without a proper
+			 * dbentry. It's much easier this way than dealing with multiple
+			 * timestamps, possibly existing but not yet written DBs etc. */
+			if (! pgstat_get_db_entry(last_statrequests[i].databaseid, false))
+				pgstat_write_db_dummyfile(last_statrequests[i].databaseid);
+		}
+
+		pfree(last_statrequests);
+		last_statrequests = NULL;
+		num_statrequests = 0;
+	}
+
 	if (ferror(fpout))
 	{
 		ereport(LOG,
@@ -3552,57 +3614,247 @@ pgstat_write_statsfile(bool permanent)
 						tmpfile, statfile)));
 		unlink(tmpfile);
 	}
-	else
+
+	if (permanent)
+		unlink(pgstat_stat_filename);
+}
+
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *	Tell the news. This writes stats file for a single database.
+ *
+ *	If writing to the permanent file (happens when the collector is
+ *	shutting down only), remove the temporary file so that backends
+ *	starting up under a new postmaster can't read the old data before
+ *	the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+	HASH_SEQ_STATUS tstat;
+	HASH_SEQ_STATUS fstat;
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpout;
+	int32		format_id;
+	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_DB_TMPFILE : pgstat_stat_db_tmpname;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+	int			rc;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_tmpfile[strlen(tmpfile) + 11];
+	char db_statfile[strlen(statfile) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_tmpfile, strlen(tmpfile) + 11, tmpfile, dbentry->databaseid);
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, dbentry->databaseid);
+
+	elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
 	{
-		/*
-		 * Successful write, so update last_statwrite.
-		 */
-		last_statwrite = globalStats.stats_timestamp;
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						db_tmpfile)));
+		return;
+	}
 
-		/*
-		 * If there is clock skew between backends and the collector, we could
-		 * receive a stats request time that's in the future.  If so, complain
-		 * and reset last_statrequest.	Resetting ensures that no inquiry
-		 * message can cause more than one stats file write to occur.
-		 */
-		if (last_statrequest > last_statwrite)
-		{
-			char	   *reqtime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
-			mytime = pstrdup(timestamptz_to_str(last_statwrite));
-			elog(LOG, "last_statrequest %s is later than collector's time %s",
-				 reqtime, mytime);
-			pfree(reqtime);
-			pfree(mytime);
-
-			last_statrequest = last_statwrite;
-		}
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Write the timestamp.
+	 */
+	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Walk through the database's access stats per table.
+	 */
+	hash_seq_init(&tstat, dbentry->tables);
+	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+	{
+		fputc('T', fpout);
+		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
 	}
 
+	/*
+	 * Walk through the database's function stats table.
+	 */
+	hash_seq_init(&fstat, dbentry->functions);
+	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+	{
+		fputc('F', fpout);
+		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * per-database stats file with it.  The ferror() check replaces testing for error
+	 * after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		FreeFile(fpout);
+		unlink(db_tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		unlink(db_tmpfile);
+	}
+	else if (rename(db_tmpfile, db_statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+						db_tmpfile, db_statfile)));
+		unlink(db_tmpfile);
+	}
+	
 	if (permanent)
-		unlink(pgstat_stat_filename);
+	{
+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_db_filename, dbentry->databaseid);
+		elog(DEBUG1, "removing temporary stat file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
 }
 
 
 /* ----------
+ * pgstat_write_db_dummyfile() -
+ *
+ *	All this does is write a dummy stats file for databases that have no
+ *	dbentry yet. It basically writes just a file header - format ID and a timestamp.
+ * ----------
+ */
+static void
+pgstat_write_db_dummyfile(Oid databaseid)
+{
+	FILE	   *fpout;
+	int32		format_id;
+	int			rc;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_tmpfile[strlen(pgstat_stat_db_tmpname) + 11];
+	char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_tmpfile, strlen(pgstat_stat_db_tmpname) + 11, pgstat_stat_db_tmpname, databaseid);
+	snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11, pgstat_stat_db_filename, databaseid);
+
+	elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						db_tmpfile)));
+		return;
+	}
+
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Write the timestamp.
+	 */
+	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * pgstat.stat with it.  The ferror() check replaces testing for error
+	 * after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary dummy statistics file \"%s\": %m",
+					  db_tmpfile)));
+		FreeFile(fpout);
+		unlink(db_tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary dummy statistics file \"%s\": %m",
+					  db_tmpfile)));
+		unlink(db_tmpfile);
+	}
+	else if (rename(db_tmpfile, db_statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary dummy statistics file \"%s\" to \"%s\": %m",
+						db_tmpfile, db_statfile)));
+		unlink(db_tmpfile);
+	}
+
+}
+
+/* ----------
  * pgstat_read_statsfile() -
  *
  *	Reads in an existing statistics collector file and initializes the
  *	databases' hash table (whose entries point to the tables' hash tables).
+ * 
+ *	Allows reading only the global stats (at database level), which is just
+ *	enough for many purposes (e.g. autovacuum launcher etc.). If this is
+ *	sufficient for you, use onlydbs=true.
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
 {
 	PgStat_StatDBEntry *dbentry;
 	PgStat_StatDBEntry dbbuf;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatTabEntry tabbuf;
-	PgStat_StatFuncEntry funcbuf;
-	PgStat_StatFuncEntry *funcentry;
 	HASHCTL		hash_ctl;
 	HTAB	   *dbhash;
 	HTAB	   *tabhash = NULL;
@@ -3613,6 +3865,11 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 
 	/*
+	 * If we want only db-level stats, a particular db must not be requested.
+	 */
+	Assert(!((onlydb != InvalidOid) && onlydbs));
+
+	/*
 	 * The tables will live in pgStatLocalContext.
 	 */
 	pgstat_setup_memcxt();
@@ -3758,6 +4015,16 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				 */
 				tabhash = dbentry->tables;
 				funchash = dbentry->functions;
+
+				/*
+				 * Read the data from the file for this database. If there was
+				 * onlydb specified (!= InvalidOid), we would not get here because
+				 * of a break above. So we don't need to recheck.
+				 */
+				if (! onlydbs)
+					pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+											permanent);
+
 				break;
 
 				/*
@@ -3768,6 +4035,105 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				funchash = NULL;
 				break;
 
+			case 'E':
+				goto done;
+
+			default:
+				ereport(pgStatRunningInCollector ? LOG : WARNING,
+						(errmsg("corrupted statistics file \"%s\"",
+								statfile)));
+				goto done;
+		}
+	}
+
+done:
+	FreeFile(fpin);
+
+	if (permanent)
+		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+	return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *	Reads in an existing statistics collector db file and initializes the
+ *	tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatTabEntry tabbuf;
+	PgStat_StatFuncEntry funcbuf;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpin;
+	int32		format_id;
+	TimestampTz timestamp;
+	bool		found;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_statfile[strlen(statfile) + 11];
+
+	/*
+	 * Format the database OID into the stats file name.
+	 */
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
+
+	/*
+	 * Try to open the status file. If it doesn't exist, the backends simply
+	 * return zero for anything and the collector simply starts from scratch
+	 * with empty counters.
+	 *
+	 * ENOENT is a possibility if the stats collector is not running or has
+	 * not yet written the stats file the first time.  Any other failure
+	 * condition is suspicious.
+	 */
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
+	{
+		if (errno != ENOENT)
+			ereport(pgStatRunningInCollector ? LOG : WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not open statistics file \"%s\": %m",
+							db_statfile)));
+		return;
+	}
+
+	/*
+	 * Verify it's of the expected format.
+	 */
+	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+		|| format_id != PGSTAT_FILE_FORMAT_ID)
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * Read the timestamp of the last write.
+	 */
+	if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * We found an existing collector stats file. Read it and put all the
+	 * hashtable entries into place.
+	 */
+	for (;;)
+	{
+		switch (fgetc(fpin))
+		{
 				/*
 				 * 'T'	A PgStat_StatTabEntry follows.
 				 */
@@ -3777,7 +4143,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3795,7 +4161,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3811,7 +4177,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3829,7 +4195,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3845,7 +4211,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 			default:
 				ereport(pgStatRunningInCollector ? LOG : WARNING,
 						(errmsg("corrupted statistics file \"%s\"",
-								statfile)));
+								db_statfile)));
 				goto done;
 		}
 	}
@@ -3854,37 +4220,47 @@ done:
 	FreeFile(fpin);
 
 	if (permanent)
-		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	{
+		char db_statfile[strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) + 11];
+		snprintf(db_statfile, strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) + 11,
+				 PGSTAT_STAT_PERMANENT_DB_FILENAME, databaseid);
+		elog(DEBUG1, "removing permanent stats file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
 
-	return dbhash;
+	return;
 }
 
 /* ----------
- * pgstat_read_statsfile_timestamp() -
+ * pgstat_read_db_statsfile_timestamp() -
  *
- *	Attempt to fetch the timestamp of an existing stats file.
+ *	Attempt to fetch the timestamp of an existing stats file (for a DB).
  *	Returns TRUE if successful (timestamp is stored at *ts).
  * ----------
  */
 static bool
-pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
+pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz *ts)
 {
-	PgStat_GlobalStats myGlobalStats;
+	TimestampTz timestamp;
 	FILE	   *fpin;
 	int32		format_id;
-	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+	char db_statfile[strlen(statfile) + 11];
+
+	/* format the db statfile filename */
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
 
 	/*
 	 * Try to open the status file.  As above, anything but ENOENT is worthy
 	 * of complaining about.
 	 */
-	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
 	{
 		if (errno != ENOENT)
 			ereport(pgStatRunningInCollector ? LOG : WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not open statistics file \"%s\": %m",
-							statfile)));
+							db_statfile)));
 		return false;
 	}
 
@@ -3895,7 +4271,7 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 		|| format_id != PGSTAT_FILE_FORMAT_ID)
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
@@ -3903,15 +4279,15 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 	/*
 	 * Read global stats struct
 	 */
-	if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != sizeof(myGlobalStats))
+	if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != sizeof(TimestampTz))
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
 
-	*ts = myGlobalStats.stats_timestamp;
+	*ts = timestamp;
 
 	FreeFile(fpin);
 	return true;
@@ -3947,7 +4323,7 @@ backend_read_statsfile(void)
 
 		CHECK_FOR_INTERRUPTS();
 
-		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
+		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
 
 		cur_ts = GetCurrentTimestamp();
 		/* Calculate min acceptable timestamp, if we didn't already */
@@ -4006,7 +4382,7 @@ backend_read_statsfile(void)
 				pfree(mytime);
 			}
 
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 			break;
 		}
 
@@ -4016,7 +4392,7 @@ backend_read_statsfile(void)
 
 		/* Not there or too old, so kick the collector and wait a bit */
 		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 
 		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
 	}
@@ -4026,9 +4402,16 @@ backend_read_statsfile(void)
 
 	/* Autovacuum launcher wants stats about all databases */
 	if (IsAutoVacuumLauncherProcess())
-		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+		/* 
+		 * FIXME Does it really need info including tables/functions? Or is it enough to read
+		 * database-level stats? It seems to me the launcher needs PgStat_StatDBEntry only
+		 * (at least that's how I understand the rebuild_database_list() in autovacuum.c),
+		 * because pgstat_stattabentries are used in do_autovacuum() only, and that's what's
+		 * executed in workers ... So maybe we'd be just fine by reading in the dbentries?
+		 */
+		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
 	else
-		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, false);
 }
 
 
@@ -4084,44 +4467,84 @@ pgstat_clear_snapshot(void)
 static void
 pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 {
-	/*
-	 * Advance last_statrequest if this requestor has a newer cutoff time
-	 * than any previous request.
-	 */
-	if (msg->cutoff_time > last_statrequest)
-		last_statrequest = msg->cutoff_time;
+	int i = 0;
+	bool found = false;
+	PgStat_StatDBEntry *dbentry;
+
+	elog(DEBUG1, "received inquiry for %d", msg->databaseid);
 
 	/*
-	 * If the requestor's local clock time is older than last_statwrite, we
-	 * should suspect a clock glitch, ie system time going backwards; though
-	 * the more likely explanation is just delayed message receipt.  It is
-	 * worth expending a GetCurrentTimestamp call to be sure, since a large
-	 * retreat in the system clock reading could otherwise cause us to neglect
-	 * to update the stats file for a long time.
+	 * Find the last write request for this DB (found=true in that case). Plain
+	 * linear search, not really worth doing any magic here (probably).
 	 */
-	if (msg->clock_time < last_statwrite)
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == msg->databaseid)
+		{
+			found = true;
+			break;
+		}
+	}
+	
+	if (found)
+	{
+		/*
+		 * There already is a request for this DB, so let's advance the
+		 * request time if this requestor has a newer cutoff time
+		 * than any previous request.
+		 */
+		if (msg->cutoff_time > last_statrequests[i].request_time)
+			last_statrequests[i].request_time = msg->cutoff_time;
+	}
+	else
 	{
-		TimestampTz cur_ts = GetCurrentTimestamp();
+		/*
+		 * There's no request for this DB yet, so let's create it (allocate a
+		 * space for it, set the values).
+		 */
+		if (last_statrequests == NULL)
+			last_statrequests = palloc(sizeof(DBWriteRequest));
+		else
+			last_statrequests = repalloc(last_statrequests,
+								(num_statrequests + 1)*sizeof(DBWriteRequest));
+		
+		last_statrequests[num_statrequests].databaseid = msg->databaseid;
+		last_statrequests[num_statrequests].request_time = msg->clock_time;
+		num_statrequests += 1;
 
-		if (cur_ts < last_statwrite)
+		/*
+		* If the requestor's local clock time is older than last_statwrite, we
+		* should suspect a clock glitch, ie system time going backwards; though
+		* the more likely explanation is just delayed message receipt.  It is
+		* worth expending a GetCurrentTimestamp call to be sure, since a large
+		* retreat in the system clock reading could otherwise cause us to neglect
+		* to update the stats file for a long time.
+		*/
+		dbentry = pgstat_get_db_entry(msg->databaseid, false);
+		if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
 		{
-			/*
-			 * Sure enough, time went backwards.  Force a new stats file write
-			 * to get back in sync; but first, log a complaint.
-			 */
-			char	   *writetime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			writetime = pstrdup(timestamptz_to_str(last_statwrite));
-			mytime = pstrdup(timestamptz_to_str(cur_ts));
-			elog(LOG, "last_statwrite %s is later than collector's time %s",
-				 writetime, mytime);
-			pfree(writetime);
-			pfree(mytime);
-
-			last_statrequest = cur_ts;
-			last_statwrite = last_statrequest - 1;
+			TimestampTz cur_ts = GetCurrentTimestamp();
+
+			if (cur_ts < dbentry->stats_timestamp)
+			{
+				/*
+				* Sure enough, time went backwards.  Force a new stats file write
+				* to get back in sync; but first, log a complaint.
+				*/
+				char	   *writetime;
+				char	   *mytime;
+
+				/* Copy because timestamptz_to_str returns a static buffer */
+				writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
+				mytime = pstrdup(timestamptz_to_str(cur_ts));
+				elog(LOG, "last_statwrite %s is later than collector's time %s for "
+					"db %d", writetime, mytime, dbentry->databaseid);
+				pfree(writetime);
+				pfree(mytime);
+
+				last_statrequests[num_statrequests - 1].request_time = cur_ts;
+				dbentry->stats_timestamp = cur_ts - 1;
+			}
 		}
 	}
 }
@@ -4278,10 +4701,17 @@ pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
 
 	/*
-	 * If found, remove it.
+	 * If found, remove it (along with the db statfile).
 	 */
 	if (dbentry)
 	{
+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_db_filename, dbentry->databaseid);
+		
+		elog(DEBUG1, "removing %s", db_statfile);
+		unlink(db_statfile);
+		
 		if (dbentry->tables != NULL)
 			hash_destroy(dbentry->tables);
 		if (dbentry->functions != NULL)
@@ -4687,3 +5117,58 @@ pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
 						   HASH_REMOVE, NULL);
 	}
 }
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *	Checks whether there's a db stats request, requiring a file write.
+ * 
+ *	TODO Seems that thanks to the way we handle last_statrequests (erase after
+ *	a write), this is unnecessary. Just check that there's at least one
+ *	request and you're done. Although there might be delayed requests ...
+ * ----------
+ */
+
+static bool pgstat_write_statsfile_needed()
+{
+	int i = 0;
+	PgStat_StatDBEntry *dbentry;
+	
+	/* Check the databases if they need to refresh the stats. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		dbentry = pgstat_get_db_entry(last_statrequests[i].databaseid, false);
+		
+		/* No dbentry yet or too old. */
+		if ((! dbentry) ||
+			(dbentry->stats_timestamp < last_statrequests[i].request_time)) {
+			return true;
+		}
+		
+	}
+	
+	/* Well, everything was written recently ... */
+	return false;
+}
+
+/* ----------
+ * pgstat_db_requested() -
+ *
+ *	Checks whether stats for a particular DB need to be written to a file.
+ * ----------
+ */
+
+static bool
+pgstat_db_requested(Oid databaseid)
+{
+	int i = 0;
+	
+	/* Check the databases if they need to refresh the stats. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == databaseid)
+			return true;
+	}
+	
+	return false;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2cf34ce..e3e432b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8730,20 +8730,43 @@ static void
 assign_pgstat_temp_directory(const char *newval, void *extra)
 {
 	/* check_canonical_path already canonicalized newval for us */
+	char	   *dname;
 	char	   *tname;
 	char	   *fname;
-
-	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /pgstat.tmp */
-	sprintf(tname, "%s/pgstat.tmp", newval);
-	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /pgstat.stat */
-	sprintf(fname, "%s/pgstat.stat", newval);
-
+	char	   *tname_db;
+	char	   *fname_db;
+
+	/* directory */
+	dname = guc_malloc(ERROR, strlen(newval) + 1);		/* runtime dir */
+	sprintf(dname, "%s", newval);
+
+	/* global stats */
+	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /global.tmp */
+	sprintf(tname, "%s/global.tmp", newval);
+	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /global.stat */
+	sprintf(fname, "%s/global.stat", newval);
+
+	/* per-db stats */
+	tname_db = guc_malloc(ERROR, strlen(newval) + 8);		/* /%d.tmp */
+	sprintf(tname_db, "%s/%%d.tmp", newval);
+	fname_db = guc_malloc(ERROR, strlen(newval) + 9);		/* /%d.stat */
+	sprintf(fname_db, "%s/%%d.stat", newval);
+
+	if (pgstat_stat_directory)
+		free(pgstat_stat_directory);
+	pgstat_stat_directory = dname;
 	if (pgstat_stat_tmpname)
 		free(pgstat_stat_tmpname);
 	pgstat_stat_tmpname = tname;
 	if (pgstat_stat_filename)
 		free(pgstat_stat_filename);
 	pgstat_stat_filename = fname;
+	if (pgstat_stat_db_tmpname)
+		free(pgstat_stat_db_tmpname);
+	pgstat_stat_db_tmpname = tname_db;
+	if (pgstat_stat_db_filename)
+		free(pgstat_stat_db_filename);
+	pgstat_stat_db_filename = fname_db;
 }
 
 static bool
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 3e05ac3..a8a2639 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -179,6 +179,7 @@ char	   *restrict_env;
 #endif
 const char *subdirs[] = {
 	"global",
+	"stat",
 	"pg_xlog",
 	"pg_xlog/archive_status",
 	"pg_clog",
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..b3467d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
 	PgStat_MsgHdr m_hdr;
 	TimestampTz clock_time;		/* observed local clock time */
 	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
@@ -514,7 +515,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9A
+#define PGSTAT_FILE_FORMAT_ID	0xA240CA47
 
 /* ----------
  * PgStat_StatDBEntry			The collector's data per database
@@ -545,6 +546,7 @@ typedef struct PgStat_StatDBEntry
 	PgStat_Counter n_block_write_time;
 
 	TimestampTz stat_reset_timestamp;
+	TimestampTz stats_timestamp;		/* time of db stats file update */
 
 	/*
 	 * tables and functions must be last in the struct, because we don't write
@@ -722,8 +724,11 @@ extern bool pgstat_track_activities;
 extern bool pgstat_track_counts;
 extern int	pgstat_track_functions;
 extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern char *pgstat_stat_directory;
 extern char *pgstat_stat_tmpname;
 extern char *pgstat_stat_filename;
+extern char *pgstat_stat_db_tmpname;
+extern char *pgstat_stat_db_filename;
 
 /*
  * BgWriter statistics counters are updated directly by bgwriter and bufmgr
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#25)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra wrote:

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..4ec485e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@

/* ----------
* Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temprorary, global and per-database files.

Note typo in the line above.

-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	"stat/%d.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	"stat/%d.tmp"
+char	   *pgstat_stat_directory = NULL;
char	   *pgstat_stat_filename = NULL;
char	   *pgstat_stat_tmpname = NULL;
+char	   *pgstat_stat_db_filename = NULL;
+char	   *pgstat_stat_db_tmpname = NULL;

I don't like the quoted parts very much; it seems awkward to have the
snprintf patterns in one place and have them be used in very distant
places. Is there a way to improve that? Also, if I understand clearly,
the pgstat_stat_db_filename value needs to be an snprintf pattern too,
right? What if it doesn't contain the required % specifier?
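
One way to address the missing-specifier question is to validate the pattern once, at the point where it is built or assigned. The following is only a sketch under assumptions: stat_pattern_is_valid() is a hypothetical helper, not part of the patch, and it assumes a valid pattern contains exactly one %d or %u conversion (a literal %% is allowed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): returns true only if
 * "pattern" contains exactly one conversion and that conversion is
 * "%d" or "%u".  An escaped "%%" is skipped.
 */
static bool
stat_pattern_is_valid(const char *pattern)
{
	int			nconv = 0;
	const char *p;

	assert(pattern != NULL);

	for (p = pattern; *p != '\0'; p++)
	{
		if (*p != '%')
			continue;

		p++;					/* look at the character after '%' */
		if (*p == '%')
			continue;			/* escaped percent sign */
		if (*p != 'd' && *p != 'u')
			return false;		/* unsupported or truncated conversion */
		nconv++;
	}

	return nconv == 1;
}
```

With such a check, a pattern like "pg_stat_tmp/%d.stat" passes, while "pg_stat_tmp/global.stat" (no specifier) or a pattern with two specifiers is rejected before it ever reaches snprintf.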

Also, if you can filter this through pgindent, that would be best. Make
sure to add DBWriteRequest to src/tools/pgindent/typedefs_list.

+		/*
+		 * There's no request for this DB yet, so lets create it (allocate a
+		 * space for it, set the values).
+		 */
+		if (last_statrequests == NULL)
+			last_statrequests = palloc(sizeof(DBWriteRequest));
+		else
+			last_statrequests = repalloc(last_statrequests,
+								(num_statrequests + 1)*sizeof(DBWriteRequest));
+		
+		last_statrequests[num_statrequests].databaseid = msg->databaseid;
+		last_statrequests[num_statrequests].request_time = msg->clock_time;
+		num_statrequests += 1;

Having to repalloc this array each time seems wrong. Would a list
instead of an array help? see ilist.c/h; I vote for a dlist because you
can easily delete elements from the middle of it, if required (I think
you'd need that.)
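
To illustrate the list idea without depending on the backend headers, here is a standalone sketch of a doubly linked list of write requests. It is not the ilist.h API; the Oid and TimestampTz typedefs are stand-ins, and RequestList, request_find(), request_upsert() and request_remove() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;			/* stand-in for the PostgreSQL typedef */
typedef int64_t TimestampTz;	/* stand-in for the PostgreSQL typedef */

typedef struct DBWriteRequest
{
	Oid			databaseid;		/* OID of the database to write */
	TimestampTz request_time;	/* newest requested cutoff */
	struct DBWriteRequest *prev;
	struct DBWriteRequest *next;
} DBWriteRequest;

typedef struct RequestList
{
	DBWriteRequest *head;
	DBWriteRequest *tail;
	int			count;
} RequestList;

/* Linear search for the request belonging to one database. */
static DBWriteRequest *
request_find(const RequestList *list, Oid databaseid)
{
	DBWriteRequest *req;

	for (req = list->head; req != NULL; req = req->next)
		if (req->databaseid == databaseid)
			return req;
	return NULL;
}

/* Advance an existing request, or append a new node (no array regrowth). */
static DBWriteRequest *
request_upsert(RequestList *list, Oid databaseid, TimestampTz request_time)
{
	DBWriteRequest *req = request_find(list, databaseid);

	if (req != NULL)
	{
		if (request_time > req->request_time)
			req->request_time = request_time;
		return req;
	}

	req = malloc(sizeof(DBWriteRequest));
	if (req == NULL)
		return NULL;
	req->databaseid = databaseid;
	req->request_time = request_time;
	req->next = NULL;
	req->prev = list->tail;
	if (list->tail != NULL)
		list->tail->next = req;
	else
		list->head = req;
	list->tail = req;
	list->count++;
	return req;
}

/* Unlink and free one node, wherever it sits in the list. */
static void
request_remove(RequestList *list, DBWriteRequest *req)
{
	assert(req != NULL && list->count > 0);

	if (req->prev != NULL)
		req->prev->next = req->next;
	else
		list->head = req->next;
	if (req->next != NULL)
		req->next->prev = req->prev;
	else
		list->tail = req->prev;
	list->count--;
	free(req);
}
```

Adding a request never moves existing entries, and a request can be dropped from the middle once its database has been written, which is the property argued for above.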

+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_filename, dbentry->databaseid);

This pattern seems rather frequent. Can we use a macro or similar here?
Encapsulating the "11" better would be good. Magic numbers are evil.
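
A sketch of what such an encapsulation could look like, written as standalone C so it can be compiled outside the backend. DB_OID_MAXLEN and build_db_statfile_name() are hypothetical names, malloc stands in for whatever allocator the real code would use, and the example assumes a %u pattern with an unsigned OID rather than the %d used in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical constant: the largest 32-bit OID (4294967295) has 10
 * decimal digits, plus one byte for the terminating NUL.
 */
#define DB_OID_MAXLEN 11

/*
 * Hypothetical helper: expands a pattern such as "pg_stat/%u.stat" for
 * one database.  Returns a malloc'd string the caller must free, or
 * NULL if the allocation fails.
 */
static char *
build_db_statfile_name(const char *pattern, unsigned int databaseid)
{
	size_t		len;
	char	   *buf;

	assert(pattern != NULL);

	/* The two-character "%u" is replaced by at most 10 digits. */
	len = strlen(pattern) + DB_OID_MAXLEN;
	buf = malloc(len);
	if (buf == NULL)
		return NULL;

	snprintf(buf, len, pattern, databaseid);
	return buf;
}
```

Callers then pass only the pattern and the OID, so the buffer-size arithmetic lives in one place instead of being repeated at every call site.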

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..b3467d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
PgStat_MsgHdr m_hdr;
TimestampTz clock_time;		/* observed local clock time */
TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
} PgStat_MsgInquiry;

Do we need to support the case that somebody requests stuff from the
"shared" DB? IIRC that's what InvalidOid means in pgstat ...

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Jeff Janes
jeff.janes@gmail.com
In reply to: Tomas Vondra (#25)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 20:33, Magnus Hagander wrote:

Yeah, +1 for a separate directory not in global.

OK, I moved the files from "global/stat" to "stat".

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I plan to do some performance testing, but that will take a while so I
wanted to post this before I get distracted.

Cheers,

Jeff


#28Jeff Janes
jeff.janes@gmail.com
In reply to: Tomas Vondra (#25)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 20:33, Magnus Hagander wrote:

Yeah, +1 for a separate directory not in global.

OK, I moved the files from "global/stat" to "stat".

Why "stat" rather than "pg_stat"?

The existence of "global" and "base" as exceptions already annoys me.
(Especially when I do a tar -xf in my home directory without
remembering the -C flag). Unless there is some unstated rule behind
what gets a pg_ and what doesn't, I think we should have the "pg_".

Cheers,

Jeff


#29Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#27)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 20:33, Magnus Hagander wrote:

Yeah, +1 for a separate directory not in global.

OK, I moved the files from "global/stat" to "stat".

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I plan to do some performance testing, but that will take a while so I
wanted to post this before I get distracted.

Running "vacuumdb -a" on a cluster with 1000 db with 200 tables (x
serial primary key) in each, I get log messages like this:

last_statwrite 23682-06-18 22:36:52.960194-07 is later than
collector's time 2013-02-03 12:49:19.700629-08 for db 16387

Note the bizarre year in the first time stamp.

If it matters, I got this after shutting down the cluster, blowing
away $DATA/stat/*, then restarting it and invoking vacuumdb.

Cheers,

Jeff


#30Tomas Vondra
tv@fuzzy.cz
In reply to: Jeff Janes (#28)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 3.2.2013 20:46, Jeff Janes wrote:

On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 20:33, Magnus Hagander wrote:

Yeah, +1 for a separate directory not in global.

OK, I moved the files from "global/stat" to "stat".

Why "stat" rather than "pg_stat"?

The existence of "global" and "base" as exceptions already annoys me.
(Especially when I do a tar -xf in my home directory without
remembering the -C flag). Unless there is some unstated rule behind
what gets a pg_ and what doesn't, I think we should have the "pg_".

I don't think there's a clear naming rule. But I think your suggestion
makes perfect sense, especially because we have pg_stat_tmp directory.
So now we'd have pg_stat and pg_stat_tmp, which is quite elegant.

Tomas


#31Tomas Vondra
tv@fuzzy.cz
In reply to: Jeff Janes (#27)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 2.2.2013 23:33, Jeff Janes wrote:

On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 20:33, Magnus Hagander wrote:

Yeah, +1 for a separate directory not in global.

OK, I moved the files from "global/stat" to "stat".

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I forgot to add "void" into the method prototype ... Thanks!
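
For reference, a minimal standalone illustration of the difference: in C dialects before C23, an empty parameter list declares a function without a prototype, while "(void)" declares one that takes no arguments. The names below (pending_requests, write_needed, collector_should_write) are hypothetical and only loosely mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>

static int	pending_requests = 0;	/* hypothetical stand-in for num_statrequests */

/* A real prototype: "(void)" states that the function takes no arguments. */
static bool write_needed(void);

/* The function is used here, before its definition further down. */
static bool
collector_should_write(void)
{
	return write_needed();
}

static bool
write_needed(void)
{
	assert(pending_requests >= 0);
	return pending_requests > 0;
}
```

With the prototype in place, the compiler can check every call against the declared parameter list.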

Tomas


#32Tomas Vondra
tv@fuzzy.cz
In reply to: Jeff Janes (#29)
1 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 3.2.2013 21:54, Jeff Janes wrote:

On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 3.1.2013 20:33, Magnus Hagander wrote:

Yeah, +1 for a separate directory not in global.

OK, I moved the files from "global/stat" to "stat".

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I plan to do some performance testing, but that will take a while so I
wanted to post this before I get distracted.

Running "vacuumdb -a" on a cluster with 1000 db with 200 tables (x
serial primary key) in each, I get log messages like this:

last_statwrite 23682-06-18 22:36:52.960194-07 is later than
collector's time 2013-02-03 12:49:19.700629-08 for db 16387

Note the bizarre year in the first time stamp.

If it matters, I got this after shutting down the cluster, blowing
away $DATA/stat/*, then restarting it and invoking vacuumdb.

I somehow expected that hash_search zeroes all the fields of a new
entry, but looking at pgstat_get_db_entry that obviously is not the
case. So stats_timestamp (which tracks timestamp of the last write for a
DB) was random - that's where the bizarre year values came from.

I've added a proper initialization (to 0), and now it works as expected.
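
The pitfall can be shown in a standalone sketch. Here lookup_or_create() mimics a find-or-insert lookup that fills in only the key of a new entry, and get_db_entry() is the wrapper that must initialize the remaining fields. All names, the fixed-size table, and the use of malloc are assumptions made for this example; it is not the pgstat code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DBEntry
{
	unsigned int databaseid;	/* lookup key */
	int64_t		stats_timestamp;	/* time of the last stats file write */
} DBEntry;

#define MAX_DBS 8

static DBEntry *entries[MAX_DBS];
static int	nentries = 0;

/*
 * Find-or-insert.  For a new entry only the key is set; every other
 * field holds whatever malloc returned, i.e. indeterminate bytes.
 */
static DBEntry *
lookup_or_create(unsigned int databaseid, bool *found)
{
	DBEntry    *e;
	int			i;

	for (i = 0; i < nentries; i++)
	{
		if (entries[i]->databaseid == databaseid)
		{
			*found = true;
			return entries[i];
		}
	}

	assert(nentries < MAX_DBS);
	*found = false;
	e = malloc(sizeof(DBEntry));
	if (e == NULL)
		return NULL;
	e->databaseid = databaseid;
	entries[nentries++] = e;
	return e;
}

/* Wrapper that gives every non-key field of a new entry a defined value. */
static DBEntry *
get_db_entry(unsigned int databaseid)
{
	bool		found;
	DBEntry    *e = lookup_or_create(databaseid, &found);

	if (e != NULL && !found)
		e->stats_timestamp = 0; /* "never written" */
	return e;
}
```

Without the explicit assignment in get_db_entry(), the first comparison against stats_timestamp would read garbage, which matches the random far-future years in the log messages.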

Although the whole sequence of errors I was getting was this:

LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than
collector's time 2013-02-04 00:54:21.113439+01 for db 19093
WARNING: pgstat wait timeout
LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than
collector's time 2013-02-04 00:54:31.424681+01 for db 46494
FATAL: could not find block containing chunk 0x2af4a60
LOG: statistics collector process (PID 10063) exited with exit code 1
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout

I'm not entirely sure where the FATAL came from, but it seems it was
somehow related to the issue - it was quite reproducible, although I
don't see how exactly this could happen. The relevant block of code
looks like this:

char *writetime;
char *mytime;

/* Copy because timestamptz_to_str returns a static buffer */
writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
mytime = pstrdup(timestamptz_to_str(cur_ts));
elog(LOG, "last_statwrite %s is later than collector's time %s for "
"db %d", writetime, mytime, dbentry->databaseid);
pfree(writetime);
pfree(mytime);

which seems quite fine to me. I'm not sure how one of the pfree calls
could fail?
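
One possibility that may be worth checking, offered as a guess rather than a diagnosis: in the attached patch, pgstat_recv_inquiry() fills the new slot, increments num_statrequests, and then the clock-skew branch stores into last_statrequests[num_statrequests], which is one element past the end of the array. A write past the end of an allocated chunk can damage allocator bookkeeping, and that would be consistent with a later pfree failing. The sketch below reproduces only the indexing pattern, using plain realloc; Request, append_request and last_index are illustrative names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Request
{
    uint32_t    databaseid;
    int64_t     request_time;
} Request;

static Request *requests = NULL;
static int  num_requests = 0;

/* Append one request; returns the index of the new slot, or -1 on failure. */
static int
append_request(uint32_t databaseid, int64_t request_time)
{
    Request    *grown;

    grown = realloc(requests, ((size_t) num_requests + 1) * sizeof(Request));
    if (grown == NULL)
        return -1;
    requests = grown;

    requests[num_requests].databaseid = databaseid;
    requests[num_requests].request_time = request_time;
    num_requests += 1;

    return num_requests - 1;
}

/*
 * After an append, the newest element lives at num_requests - 1.
 * Indexing with num_requests itself would touch memory past the array.
 */
static int
last_index(void)
{
    return num_requests - 1;
}
```

Any later update of the entry just added should go through last_index() (or the index returned by append_request()), never through the already incremented counter.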

Anyway, attached is a patch that fixes all three issues, i.e.

1) the un-initialized timestamp
2) the "void" omitted from the signature
3) rename to "pg_stat" instead of just "stat"
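
For item 2, a small C illustration of why the "void" matters. Before C23, a declaration with empty parentheses says nothing about the parameters, so it is not a prototype and the compiler cannot check calls made before the definition. Writing "(void)" makes it a full prototype. The function names here are illustrative and not taken from pgstat.c.

```c
#include <assert.h>
#include <stdbool.h>

static int  pending_requests = 0;

/* Full prototype: "(void)" states that the function takes no arguments. */
static bool write_needed(void);

static bool
any_work(void)
{
    /* Called before the definition; the prototype above makes it checked. */
    return write_needed();
}

/* The definition repeats "(void)" so it matches the prototype exactly. */
static bool
write_needed(void)
{
    return pending_requests > 0;
}
```

With both the forward declaration and the definition spelled "(void)", a call such as write_needed(1) is rejected at compile time instead of being silently accepted.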

Tomas

Attachments:

stats-split-v6.patch (text/x-diff)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d318db9..6d0efe9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@
 
 /* ----------
  * Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temporary, global and per-database files.
  * ----------
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	"pg_stat/%d.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	"pg_stat/%d.tmp"
 
 /* ----------
  * Timer definitions.
@@ -115,8 +119,11 @@ int			pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * ----------
  */
+char	   *pgstat_stat_directory = NULL;
 char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
+char	   *pgstat_stat_db_filename = NULL;
+char	   *pgstat_stat_db_tmpname = NULL;
 
 /*
  * BgWriter global statistics counters (unused in other processes).
@@ -219,11 +226,16 @@ static int	localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static void pgstat_write_db_dummyfile(Oid databaseid);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed(void);
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +303,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
@@ -549,8 +566,34 @@ startup_failed:
 void
 pgstat_reset_all(void)
 {
-	unlink(pgstat_stat_filename);
-	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	DIR * dir;
+	struct dirent * entry;
+
+	dir = AllocateDir(pgstat_stat_directory);
+	while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+	{
+		char fname[strlen(pgstat_stat_directory) + strlen(entry->d_name) + 1];
+
+		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		sprintf(fname, "%s/%s", pgstat_stat_directory, entry->d_name);
+		unlink(fname);
+	}
+	FreeDir(dir);
+
+	dir = AllocateDir(PGSTAT_STAT_PERMANENT_DIRECTORY);
+	while ((entry = ReadDir(dir, PGSTAT_STAT_PERMANENT_DIRECTORY)) != NULL)
+	{
+		char fname[strlen(PGSTAT_STAT_PERMANENT_FILENAME) + strlen(entry->d_name) + 1];
+
+		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		sprintf(fname, "%s/%s", PGSTAT_STAT_PERMANENT_FILENAME, entry->d_name);
+		unlink(fname);
+	}
+	FreeDir(dir);
 }
 
 #ifdef EXEC_BACKEND
@@ -1408,13 +1451,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
 {
 	PgStat_MsgInquiry msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
 	msg.clock_time = clock_time;
 	msg.cutoff_time = cutoff_time;
+	msg.databaseid = databaseid;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3004,6 +3048,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	int			len;
 	PgStat_Msg	msg;
 	int			wr;
+	bool		first_write = true;
 
 	IsUnderPostmaster = true;	/* we are a postmaster subprocess now */
 
@@ -3053,17 +3098,11 @@ PgstatCollectorMain(int argc, char *argv[])
 	init_ps_display("stats collector process", "", "", "");
 
 	/*
-	 * Arrange to write the initial status file right away
-	 */
-	last_statrequest = GetCurrentTimestamp();
-	last_statwrite = last_statrequest - 1;
-
-	/*
 	 * Read in an existing statistics stats file or initialize the stats to
-	 * zero.
+	 * zero (read data for all databases, including table/func stats).
 	 */
 	pgStatRunningInCollector = true;
-	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
 
 	/*
 	 * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3107,10 +3146,14 @@ PgstatCollectorMain(int argc, char *argv[])
 
 			/*
 			 * Write the stats file if a new request has arrived that is not
-			 * satisfied by existing file.
+			 * satisfied by existing file (force writing all files if it's
+			 * the first write after startup).
 			 */
-			if (last_statwrite < last_statrequest)
-				pgstat_write_statsfile(false);
+			if (first_write || pgstat_write_statsfile_needed())
+			{
+				pgstat_write_statsfile(false, first_write);
+				first_write = false;
+			}
 
 			/*
 			 * Try to receive and process a message.  This will not block,
@@ -3269,7 +3312,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	/*
 	 * Save the final stats to reuse at next startup.
 	 */
-	pgstat_write_statsfile(true);
+	pgstat_write_statsfile(true, true);
 
 	exit(0);
 }
@@ -3349,6 +3392,7 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 		result->n_block_write_time = 0;
 
 		result->stat_reset_timestamp = GetCurrentTimestamp();
+		result->stats_timestamp = 0;
 
 		memset(&hash_ctl, 0, sizeof(hash_ctl));
 		hash_ctl.keysize = sizeof(Oid);
@@ -3429,23 +3473,25 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
  *	shutting down only), remove the temporary file so that backends
  *	starting up under a new postmaster can't read the old data before
  *	the new collector is ready.
+ * 
+ *	When the 'force' is false, only the requested databases (listed in
+ * 	last_statrequests) will be written. If 'force' is true, all databases
+ * 	will be written (this is used e.g. at shutdown).
  * ----------
  */
 static void
-pgstat_write_statsfile(bool permanent)
+pgstat_write_statsfile(bool permanent, bool force)
 {
 	HASH_SEQ_STATUS hstat;
-	HASH_SEQ_STATUS tstat;
-	HASH_SEQ_STATUS fstat;
 	PgStat_StatDBEntry *dbentry;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatFuncEntry *funcentry;
 	FILE	   *fpout;
 	int32		format_id;
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
 
+	elog(DEBUG1, "writing statsfile '%s'", statfile);
+	
 	/*
 	 * Open the statistics temp file to write out the current values.
 	 */
@@ -3484,6 +3530,20 @@ pgstat_write_statsfile(bool permanent)
 	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
 	{
 		/*
+		 * Write out the tables and functions into a separate file, but only
+		 * if the database is in the requests or if it's a forced write (then
+		 * all the DBs need to be written - e.g. at the shutdown).
+		 * 
+		 * We need to do this before the dbentry write to write the proper
+		 * timestamp to the global file.
+		 */
+		if (force || pgstat_db_requested(dbentry->databaseid)) {
+			elog(DEBUG1, "writing statsfile for DB %d", dbentry->databaseid);
+			dbentry->stats_timestamp = globalStats.stats_timestamp;
+			pgstat_write_db_statsfile(dbentry, permanent);
+		}
+
+		/*
 		 * Write out the DB entry including the number of live backends. We
 		 * don't write the tables or functions pointers, since they're of no
 		 * use to any other process.
@@ -3493,29 +3553,10 @@ pgstat_write_statsfile(bool permanent)
 		(void) rc;				/* we'll check for error with ferror */
 
 		/*
-		 * Walk through the database's access stats per table.
-		 */
-		hash_seq_init(&tstat, dbentry->tables);
-		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
-		{
-			fputc('T', fpout);
-			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
-		 * Walk through the database's function stats table.
-		 */
-		hash_seq_init(&fstat, dbentry->functions);
-		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
-		{
-			fputc('F', fpout);
-			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
 		 * Mark the end of this DB
+		 * 
+		 * TODO Does using these chars still make sense, when the tables/func
+		 * stats are moved to a separate file?
 		 */
 		fputc('d', fpout);
 	}
@@ -3527,6 +3568,28 @@ pgstat_write_statsfile(bool permanent)
 	 */
 	fputc('E', fpout);
 
+	/* In any case, we can just throw away all the db requests, but we need to
+	 * write dummy files for databases without a stat entry (it would cause
+	 * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
+	 * This may happen e.g. for shared DB (oid = 0) right after initdb.
+	 */
+	if (last_statrequests != NULL)
+	{
+		int i = 0;
+		for (i = 0; i < num_statrequests; i++)
+		{
+			/* Create dummy files for requested databases without a proper
+			 * dbentry. It's much easier this way than dealing with multiple
+			 * timestamps, possibly existing but not yet written DBs etc. */
+			if (! pgstat_get_db_entry(last_statrequests[i].databaseid, false))
+				pgstat_write_db_dummyfile(last_statrequests[i].databaseid);
+		}
+
+		pfree(last_statrequests);
+		last_statrequests = NULL;
+		num_statrequests = 0;
+	}
+
 	if (ferror(fpout))
 	{
 		ereport(LOG,
@@ -3552,57 +3615,247 @@ pgstat_write_statsfile(bool permanent)
 						tmpfile, statfile)));
 		unlink(tmpfile);
 	}
-	else
+
+	if (permanent)
+		unlink(pgstat_stat_filename);
+}
+
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *	Tell the news. This writes stats file for a single database.
+ *
+ *	If writing to the permanent file (happens when the collector is
+ *	shutting down only), remove the temporary file so that backends
+ *	starting up under a new postmaster can't read the old data before
+ *	the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+	HASH_SEQ_STATUS tstat;
+	HASH_SEQ_STATUS fstat;
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpout;
+	int32		format_id;
+	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_DB_TMPFILE : pgstat_stat_db_tmpname;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+	int			rc;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_tmpfile[strlen(tmpfile) + 11];
+	char db_statfile[strlen(statfile) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_tmpfile, strlen(tmpfile) + 11, tmpfile, dbentry->databaseid);
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, dbentry->databaseid);
+
+	elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
 	{
-		/*
-		 * Successful write, so update last_statwrite.
-		 */
-		last_statwrite = globalStats.stats_timestamp;
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						db_tmpfile)));
+		return;
+	}
 
-		/*
-		 * If there is clock skew between backends and the collector, we could
-		 * receive a stats request time that's in the future.  If so, complain
-		 * and reset last_statrequest.	Resetting ensures that no inquiry
-		 * message can cause more than one stats file write to occur.
-		 */
-		if (last_statrequest > last_statwrite)
-		{
-			char	   *reqtime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
-			mytime = pstrdup(timestamptz_to_str(last_statwrite));
-			elog(LOG, "last_statrequest %s is later than collector's time %s",
-				 reqtime, mytime);
-			pfree(reqtime);
-			pfree(mytime);
-
-			last_statrequest = last_statwrite;
-		}
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Write the timestamp.
+	 */
+	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Walk through the database's access stats per table.
+	 */
+	hash_seq_init(&tstat, dbentry->tables);
+	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+	{
+		fputc('T', fpout);
+		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
 	}
 
+	/*
+	 * Walk through the database's function stats table.
+	 */
+	hash_seq_init(&fstat, dbentry->functions);
+	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+	{
+		fputc('F', fpout);
+		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * pgstat.stat with it.  The ferror() check replaces testing for error
+	 * after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		FreeFile(fpout);
+		unlink(db_tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary statistics file \"%s\": %m",
+					  db_tmpfile)));
+		unlink(db_tmpfile);
+	}
+	else if (rename(db_tmpfile, db_statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+						db_tmpfile, db_statfile)));
+		unlink(db_tmpfile);
+	}
+	
 	if (permanent)
-		unlink(pgstat_stat_filename);
+	{
+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_db_filename, dbentry->databaseid);
+		elog(DEBUG1, "removing temporary stat file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
 }
 
 
 /* ----------
+ * pgstat_write_db_dummyfile() -
+ *
+ *	All this does is writing a dummy stat file for databases without dbentry
+ *	yet. It basically writes just a file header - format ID and a timestamp.
+ * ----------
+ */
+static void
+pgstat_write_db_dummyfile(Oid databaseid)
+{
+	FILE	   *fpout;
+	int32		format_id;
+	int			rc;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_tmpfile[strlen(pgstat_stat_db_tmpname) + 11];
+	char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_tmpfile, strlen(pgstat_stat_db_tmpname) + 11, pgstat_stat_db_tmpname, databaseid);
+	snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11, pgstat_stat_db_filename, databaseid);
+
+	elog(DEBUG1, "writing statsfile '%s'", db_statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(db_tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						db_tmpfile)));
+		return;
+	}
+
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * Write the timestamp.
+	 */
+	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * pgstat.stat with it.  The ferror() check replaces testing for error
+	 * after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary dummy statistics file \"%s\": %m",
+					  db_tmpfile)));
+		FreeFile(fpout);
+		unlink(db_tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary dummy statistics file \"%s\": %m",
+					  db_tmpfile)));
+		unlink(db_tmpfile);
+	}
+	else if (rename(db_tmpfile, db_statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary dummy statistics file \"%s\" to \"%s\": %m",
+						db_tmpfile, db_statfile)));
+		unlink(db_tmpfile);
+	}
+
+}
+
+/* ----------
  * pgstat_read_statsfile() -
  *
  *	Reads in an existing statistics collector file and initializes the
  *	databases' hash table (whose entries point to the tables' hash tables).
+ * 
+ *	Allows reading only the global stats (at database level), which is just
+ *	enough for many purposes (e.g. autovacuum launcher etc.). If this is
+ *	sufficient for you, use onlydbs=true.
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
 {
 	PgStat_StatDBEntry *dbentry;
 	PgStat_StatDBEntry dbbuf;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatTabEntry tabbuf;
-	PgStat_StatFuncEntry funcbuf;
-	PgStat_StatFuncEntry *funcentry;
 	HASHCTL		hash_ctl;
 	HTAB	   *dbhash;
 	HTAB	   *tabhash = NULL;
@@ -3613,6 +3866,11 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 
 	/*
+	 * If we want a db-level stats only, we don't want a particular db.
+	 */
+	Assert(!((onlydb != InvalidOid) && onlydbs));
+
+	/*
 	 * The tables will live in pgStatLocalContext.
 	 */
 	pgstat_setup_memcxt();
@@ -3758,6 +4016,16 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				 */
 				tabhash = dbentry->tables;
 				funchash = dbentry->functions;
+
+				/*
+				 * Read the data from the file for this database. If there was
+				 * onlydb specified (!= InvalidOid), we would not get here because
+				 * of a break above. So we don't need to recheck.
+				 */
+				if (! onlydbs)
+					pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+											permanent);
+
 				break;
 
 				/*
@@ -3768,6 +4036,105 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				funchash = NULL;
 				break;
 
+			case 'E':
+				goto done;
+
+			default:
+				ereport(pgStatRunningInCollector ? LOG : WARNING,
+						(errmsg("corrupted statistics file \"%s\"",
+								statfile)));
+				goto done;
+		}
+	}
+
+done:
+	FreeFile(fpin);
+
+	if (permanent)
+		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+	return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *	Reads in an existing statistics collector db file and initializes the
+ *	tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatTabEntry tabbuf;
+	PgStat_StatFuncEntry funcbuf;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpin;
+	int32		format_id;
+	TimestampTz timestamp;
+	bool		found;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+
+	/*
+	 * OIDs are 32-bit values, so 10 chars should be safe, +1 for the \0 byte
+	 */
+	char db_statfile[strlen(statfile) + 11];
+
+	/*
+	 * Append database OID at the end of the basic filename (both for tmp and target file).
+	 */
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
+
+	/*
+	 * Try to open the status file. If it doesn't exist, the backends simply
+	 * return zero for anything and the collector simply starts from scratch
+	 * with empty counters.
+	 *
+	 * ENOENT is a possibility if the stats collector is not running or has
+	 * not yet written the stats file the first time.  Any other failure
+	 * condition is suspicious.
+	 */
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
+	{
+		if (errno != ENOENT)
+			ereport(pgStatRunningInCollector ? LOG : WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not open statistics file \"%s\": %m",
+							db_statfile)));
+		return;
+	}
+
+	/*
+	 * Verify it's of the expected format.
+	 */
+	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+		|| format_id != PGSTAT_FILE_FORMAT_ID)
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * Read global stats struct
+	 */
+	if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
+		goto done;
+	}
+
+	/*
+	 * We found an existing collector stats file. Read it and put all the
+	 * hashtable entries into place.
+	 */
+	for (;;)
+	{
+		switch (fgetc(fpin))
+		{
 				/*
 				 * 'T'	A PgStat_StatTabEntry follows.
 				 */
@@ -3777,7 +4144,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3795,7 +4162,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3811,7 +4178,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3829,7 +4196,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
+									db_statfile)));
 					goto done;
 				}
 
@@ -3845,7 +4212,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 			default:
 				ereport(pgStatRunningInCollector ? LOG : WARNING,
 						(errmsg("corrupted statistics file \"%s\"",
-								statfile)));
+								db_statfile)));
 				goto done;
 		}
 	}
@@ -3854,37 +4221,47 @@ done:
 	FreeFile(fpin);
 
 	if (permanent)
-		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	{
+		char db_statfile[strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) + 11];
+		snprintf(db_statfile, strlen(PGSTAT_STAT_PERMANENT_DB_FILENAME) + 11,
+				 PGSTAT_STAT_PERMANENT_DB_FILENAME, databaseid);
+		elog(DEBUG1, "removing permanent stats file '%s'", db_statfile);
+		unlink(db_statfile);
+	}
 
-	return dbhash;
+	return;
 }
 
 /* ----------
- * pgstat_read_statsfile_timestamp() -
+ * pgstat_read_db_statsfile_timestamp() -
  *
- *	Attempt to fetch the timestamp of an existing stats file.
+ *	Attempt to fetch the timestamp of an existing stats file (for a DB).
  *	Returns TRUE if successful (timestamp is stored at *ts).
  * ----------
  */
 static bool
-pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
+pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz *ts)
 {
-	PgStat_GlobalStats myGlobalStats;
+	TimestampTz timestamp;
 	FILE	   *fpin;
 	int32		format_id;
-	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
+	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_DB_FILENAME : pgstat_stat_db_filename;
+	char db_statfile[strlen(statfile) + 11];
+
+	/* format the db statfile filename */
+	snprintf(db_statfile, strlen(statfile) + 11, statfile, databaseid);
 
 	/*
 	 * Try to open the status file.  As above, anything but ENOENT is worthy
 	 * of complaining about.
 	 */
-	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+	if ((fpin = AllocateFile(db_statfile, PG_BINARY_R)) == NULL)
 	{
 		if (errno != ENOENT)
 			ereport(pgStatRunningInCollector ? LOG : WARNING,
 					(errcode_for_file_access(),
 					 errmsg("could not open statistics file \"%s\": %m",
-							statfile)));
+							db_statfile)));
 		return false;
 	}
 
@@ -3895,7 +4272,7 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 		|| format_id != PGSTAT_FILE_FORMAT_ID)
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
@@ -3903,15 +4280,15 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 	/*
 	 * Read global stats struct
 	 */
-	if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != sizeof(myGlobalStats))
+	if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != sizeof(TimestampTz))
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
-				(errmsg("corrupted statistics file \"%s\"", statfile)));
+				(errmsg("corrupted statistics file \"%s\"", db_statfile)));
 		FreeFile(fpin);
 		return false;
 	}
 
-	*ts = myGlobalStats.stats_timestamp;
+	*ts = timestamp;
 
 	FreeFile(fpin);
 	return true;
@@ -3947,7 +4324,7 @@ backend_read_statsfile(void)
 
 		CHECK_FOR_INTERRUPTS();
 
-		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
+		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
 
 		cur_ts = GetCurrentTimestamp();
 		/* Calculate min acceptable timestamp, if we didn't already */
@@ -4006,7 +4383,7 @@ backend_read_statsfile(void)
 				pfree(mytime);
 			}
 
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 			break;
 		}
 
@@ -4016,7 +4393,7 @@ backend_read_statsfile(void)
 
 		/* Not there or too old, so kick the collector and wait a bit */
 		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 
 		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
 	}
@@ -4026,9 +4403,16 @@ backend_read_statsfile(void)
 
 	/* Autovacuum launcher wants stats about all databases */
 	if (IsAutoVacuumLauncherProcess())
-		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+		/* 
+		 * FIXME Does it really need info including tables/functions? Or is it enough to read
+		 * database-level stats? It seems to me the launcher needs PgStat_StatDBEntry only
+		 * (at least that's how I understand the rebuild_database_list() in autovacuum.c),
+		 * because pgstat_stattabentries are used in do_autovacuum() only, and that's what's
+		 * executed in workers ... So maybe we'd be just fine by reading in the dbentries?
+		 */
+		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
 	else
-		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, false);
 }
 
 
@@ -4084,44 +4468,84 @@ pgstat_clear_snapshot(void)
 static void
 pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 {
-	/*
-	 * Advance last_statrequest if this requestor has a newer cutoff time
-	 * than any previous request.
-	 */
-	if (msg->cutoff_time > last_statrequest)
-		last_statrequest = msg->cutoff_time;
+	int i = 0;
+	bool found = false;
+	PgStat_StatDBEntry *dbentry;
+
+	elog(DEBUG1, "received inquiry for %d", msg->databaseid);
 
 	/*
-	 * If the requestor's local clock time is older than last_statwrite, we
-	 * should suspect a clock glitch, ie system time going backwards; though
-	 * the more likely explanation is just delayed message receipt.  It is
-	 * worth expending a GetCurrentTimestamp call to be sure, since a large
-	 * retreat in the system clock reading could otherwise cause us to neglect
-	 * to update the stats file for a long time.
+	 * Find the last write request for this DB (found=true in that case). Plain
+	 * linear search, not really worth doing any magic here (probably).
 	 */
-	if (msg->clock_time < last_statwrite)
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == msg->databaseid)
+		{
+			found = true;
+			break;
+		}
+	}
+	
+	if (found)
+	{
+		/*
+		 * There already is a request for this DB, so lets advance the
+		 * request time	 if this requestor has a newer cutoff time
+		 * than any previous request.
+		 */
+		if (msg->cutoff_time > last_statrequests[i].request_time)
+			last_statrequests[i].request_time = msg->cutoff_time;
+	}
+	else
 	{
-		TimestampTz cur_ts = GetCurrentTimestamp();
+		/*
+		 * There's no request for this DB yet, so lets create it (allocate a
+		 * space for it, set the values).
+		 */
+		if (last_statrequests == NULL)
+			last_statrequests = palloc(sizeof(DBWriteRequest));
+		else
+			last_statrequests = repalloc(last_statrequests,
+								(num_statrequests + 1)*sizeof(DBWriteRequest));
+		
+		last_statrequests[num_statrequests].databaseid = msg->databaseid;
+		last_statrequests[num_statrequests].request_time = msg->clock_time;
+		num_statrequests += 1;
 
-		if (cur_ts < last_statwrite)
+		/*
+		* If the requestor's local clock time is older than last_statwrite, we
+		* should suspect a clock glitch, ie system time going backwards; though
+		* the more likely explanation is just delayed message receipt.  It is
+		* worth expending a GetCurrentTimestamp call to be sure, since a large
+		* retreat in the system clock reading could otherwise cause us to neglect
+		* to update the stats file for a long time.
+		*/
+		dbentry = pgstat_get_db_entry(msg->databaseid, false);
+		if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
 		{
-			/*
-			 * Sure enough, time went backwards.  Force a new stats file write
-			 * to get back in sync; but first, log a complaint.
-			 */
-			char	   *writetime;
-			char	   *mytime;
-
-			/* Copy because timestamptz_to_str returns a static buffer */
-			writetime = pstrdup(timestamptz_to_str(last_statwrite));
-			mytime = pstrdup(timestamptz_to_str(cur_ts));
-			elog(LOG, "last_statwrite %s is later than collector's time %s",
-				 writetime, mytime);
-			pfree(writetime);
-			pfree(mytime);
-
-			last_statrequest = cur_ts;
-			last_statwrite = last_statrequest - 1;
+			TimestampTz cur_ts = GetCurrentTimestamp();
+
+			if (cur_ts < dbentry->stats_timestamp)
+			{
+				/*
+				* Sure enough, time went backwards.  Force a new stats file write
+				* to get back in sync; but first, log a complaint.
+				*/
+				char	   *writetime;
+				char	   *mytime;
+
+				/* Copy because timestamptz_to_str returns a static buffer */
+				writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
+				mytime = pstrdup(timestamptz_to_str(cur_ts));
+				elog(LOG, "last_statwrite %s is later than collector's time %s for "
+					"db %d", writetime, mytime, dbentry->databaseid);
+				pfree(writetime);
+				pfree(mytime);
+
+				last_statrequests[num_statrequests - 1].request_time = cur_ts;
+				dbentry->stats_timestamp = cur_ts - 1;
+			}
 		}
 	}
 }
@@ -4278,10 +4702,17 @@ pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
 
 	/*
-	 * If found, remove it.
+	 * If found, remove it (along with the db statfile).
 	 */
 	if (dbentry)
 	{
+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_db_filename, dbentry->databaseid);
+		
+		elog(DEBUG1, "removing %s", db_statfile);
+		unlink(db_statfile);
+		
 		if (dbentry->tables != NULL)
 			hash_destroy(dbentry->tables);
 		if (dbentry->functions != NULL)
@@ -4687,3 +5118,58 @@ pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
 						   HASH_REMOVE, NULL);
 	}
 }
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *	Checks whether there's a db stats request, requiring a file write.
+ * 
+ *	TODO Seems that thanks to the way we handle last_statrequests (erase after
+ *	a write), this is unnecessary. Just check that there's at least one
+ *	request and you're done. Although there might be delayed requests ...
+ * ----------
+ */
+
+static bool pgstat_write_statsfile_needed(void)
+{
+	int i = 0;
+	PgStat_StatDBEntry *dbentry;
+	
+	/* Check the databases if they need to refresh the stats. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		dbentry = pgstat_get_db_entry(last_statrequests[i].databaseid, false);
+		
+		/* No dbentry yet or too old. */
+		if ((! dbentry) ||
+			(dbentry->stats_timestamp < last_statrequests[i].request_time)) {
+			return true;
+		}
+		
+	}
+	
+	/* Well, everything was written recently ... */
+	return false;
+}
+
+/* ----------
+ * pgstat_db_requested() -
+ *
+ *	Checks whether stats for a particular DB need to be written to a file.
+ * ----------
+ */
+
+static bool
+pgstat_db_requested(Oid databaseid)
+{
+	int i = 0;
+	
+	/* Check the databases if they need to refresh the stats. */
+	for (i = 0; i < num_statrequests; i++)
+	{
+		if (last_statrequests[i].databaseid == databaseid)
+			return true;
+	}
+	
+	return false;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index b0af9f5..08ef324 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8709,20 +8709,43 @@ static void
 assign_pgstat_temp_directory(const char *newval, void *extra)
 {
 	/* check_canonical_path already canonicalized newval for us */
+	char	   *dname;
 	char	   *tname;
 	char	   *fname;
-
-	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /pgstat.tmp */
-	sprintf(tname, "%s/pgstat.tmp", newval);
-	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /pgstat.stat */
-	sprintf(fname, "%s/pgstat.stat", newval);
-
+	char	   *tname_db;
+	char	   *fname_db;
+
+	/* directory */
+	dname = guc_malloc(ERROR, strlen(newval) + 1);		/* runtime dir */
+	sprintf(dname, "%s", newval);
+
+	/* global stats */
+	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /global.tmp */
+	sprintf(tname, "%s/global.tmp", newval);
+	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /global.stat */
+	sprintf(fname, "%s/global.stat", newval);
+
+	/* per-db stats */
+	tname_db = guc_malloc(ERROR, strlen(newval) + 8);		/* /%d.tmp */
+	sprintf(tname_db, "%s/%%d.tmp", newval);
+	fname_db = guc_malloc(ERROR, strlen(newval) + 9);		/* /%d.stat */
+	sprintf(fname_db, "%s/%%d.stat", newval);
+
+	if (pgstat_stat_directory)
+		free(pgstat_stat_directory);
+	pgstat_stat_directory = dname;
 	if (pgstat_stat_tmpname)
 		free(pgstat_stat_tmpname);
 	pgstat_stat_tmpname = tname;
 	if (pgstat_stat_filename)
 		free(pgstat_stat_filename);
 	pgstat_stat_filename = fname;
+	if (pgstat_stat_db_tmpname)
+		free(pgstat_stat_db_tmpname);
+	pgstat_stat_db_tmpname = tname_db;
+	if (pgstat_stat_db_filename)
+		free(pgstat_stat_db_filename);
+	pgstat_stat_db_filename = fname_db;
 }
 
 static bool
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 1bba426..da1e19f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -192,6 +192,7 @@ const char *subdirs[] = {
 	"base",
 	"base/1",
 	"pg_tblspc",
+	"pg_stat",
 	"pg_stat_tmp"
 };
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 03c0174..d7d4ad9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
 	PgStat_MsgHdr m_hdr;
 	TimestampTz clock_time;		/* observed local clock time */
 	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
@@ -514,7 +515,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9A
+#define PGSTAT_FILE_FORMAT_ID	0xA240CA47
 
 /* ----------
  * PgStat_StatDBEntry			The collector's data per database
@@ -545,6 +546,7 @@ typedef struct PgStat_StatDBEntry
 	PgStat_Counter n_block_write_time;
 
 	TimestampTz stat_reset_timestamp;
+	TimestampTz stats_timestamp;		/* time of db stats file update */
 
 	/*
 	 * tables and functions must be last in the struct, because we don't write
@@ -722,8 +724,11 @@ extern bool pgstat_track_activities;
 extern bool pgstat_track_counts;
 extern int	pgstat_track_functions;
 extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern char *pgstat_stat_directory;
 extern char *pgstat_stat_tmpname;
 extern char *pgstat_stat_filename;
+extern char *pgstat_stat_db_tmpname;
+extern char *pgstat_stat_db_filename;
 
 /*
  * BgWriter statistics counters are updated directly by bgwriter and bufmgr
#33Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#26)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 1.2.2013 17:19, Alvaro Herrera wrote:

Tomas Vondra wrote:

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..4ec485e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@

/* ----------
* Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temprorary, global and per-database files.

Note typo in the line above.

-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	"stat/%d.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	"stat/%d.tmp"
+char	   *pgstat_stat_directory = NULL;
char	   *pgstat_stat_filename = NULL;
char	   *pgstat_stat_tmpname = NULL;
+char	   *pgstat_stat_db_filename = NULL;
+char	   *pgstat_stat_db_tmpname = NULL;

I don't like the quoted parts very much; it seems awkward to have the
snprintf patterns in one place and have them be used in very distant

I don't see that as particularly awkward, but that's a matter of taste.
I still see that as a bunch of constants that are sprintf patterns at
the same time.

places. Is there a way to improve that? Also, if I understand clearly,
the pgstat_stat_db_filename value needs to be an snprintf pattern too,
right? What if it doesn't contain the required % specifier?

Ummmm, yes - it needs to be a pattern too, but the user specifies the
directory (stats_temp_directory) and this is used to derive all the
other values - see assign_pgstat_temp_directory() in guc.c.
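
To illustrate the two-stage formatting discussed here, a minimal standalone sketch (not code from the patch). The first pass turns the directory into a pattern, where "%%d" survives as a literal "%d"; the second pass fills in the database OID. The helper names make_db_pattern, pattern_is_valid and format_db_path are hypothetical, and the validity check is only one possible answer to the "what if the % specifier is missing" question. It also rejects a directory name that itself contains '%'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: derive the per-database pattern from a directory,
 * in the same spirit as assign_pgstat_temp_directory() in the patch.
 * The caller frees the result.
 */
char *
make_db_pattern(const char *dir)
{
	size_t		len;
	char	   *pattern;

	assert(dir != NULL);

	/* sizeof() of the literal already counts the terminating NUL */
	len = strlen(dir) + sizeof("/%d.stat");
	pattern = malloc(len);
	if (pattern != NULL)
		snprintf(pattern, len, "%s/%%d.stat", dir);
	return pattern;
}

/*
 * Hypothetical check: nonzero only if the pattern contains exactly one
 * '%' and it introduces "%d".
 */
int
pattern_is_valid(const char *pattern)
{
	const char *p = strchr(pattern, '%');

	return p != NULL && p[1] == 'd' && strchr(p + 1, '%') == NULL;
}

/*
 * Second pass: substitute the database OID into a validated pattern.
 * Returns the snprintf() result, or -1 if the pattern is rejected.
 */
int
format_db_path(char *buf, size_t buflen, const char *pattern, int dboid)
{
	if (!pattern_is_valid(pattern))
		return -1;
	return snprintf(buf, buflen, pattern, dboid);
}
```

With a directory of "pg_stat_tmp" and OID 16384 this yields "pg_stat_tmp/16384.stat"; a directory such as "/tmp/100%" is refused instead of being passed to snprintf() as a format string.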

Also, if you can filter this through pgindent, that would be best. Make
sure to add DBWriteRequest to src/tools/pgindent/typedefs.list.

Will do.

+		/*
+		 * There's no request for this DB yet, so lets create it (allocate a
+		 * space for it, set the values).
+		 */
+		if (last_statrequests == NULL)
+			last_statrequests = palloc(sizeof(DBWriteRequest));
+		else
+			last_statrequests = repalloc(last_statrequests,
+								(num_statrequests + 1)*sizeof(DBWriteRequest));
+		
+		last_statrequests[num_statrequests].databaseid = msg->databaseid;
+		last_statrequests[num_statrequests].request_time = msg->clock_time;
+		num_statrequests += 1;

Having to repalloc this array each time seems wrong. Would a list
instead of an array help? see ilist.c/h; I vote for a dlist because you
can easily delete elements from the middle of it, if required (I think
you'd need that.)

Thanks. I'm not very familiar with the list interface, so I've used
a plain array. But yes, there are better ways than doing repalloc all the
time.
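
For illustration only, a standalone sketch of the list idea in plain C. It deliberately does not use the ilist.h API; the WriteRequest struct and the request_write / clear_requests functions are hypothetical names. It also simplifies the patch by taking a single timestamp, whereas the patch stores clock_time for a new request and advances with cutoff_time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One pending "please write the stats file" request per database. */
typedef struct WriteRequest
{
	unsigned int databaseid;
	int64_t		request_time;
	struct WriteRequest *next;
} WriteRequest;

/*
 * Find the request for databaseid, or add one at the head of the list.
 * An existing request only ever moves forward in time.  Returns the
 * node, or NULL if allocation fails.
 */
WriteRequest *
request_write(WriteRequest **head, unsigned int databaseid, int64_t when)
{
	WriteRequest *req;

	assert(head != NULL);

	for (req = *head; req != NULL; req = req->next)
	{
		if (req->databaseid == databaseid)
		{
			if (when > req->request_time)
				req->request_time = when;
			return req;
		}
	}

	/* Not found: one small allocation, no reallocation of older entries */
	req = malloc(sizeof(WriteRequest));
	if (req == NULL)
		return NULL;
	req->databaseid = databaseid;
	req->request_time = when;
	req->next = *head;
	*head = req;
	return req;
}

/* Drop all requests, e.g. after the files have been written. */
void
clear_requests(WriteRequest **head)
{
	assert(head != NULL);

	while (*head != NULL)
	{
		WriteRequest *next = (*head)->next;

		free(*head);
		*head = next;
	}
}
```

Adding a request never touches the existing nodes, which is the property the repalloc-based array lacks. A doubly linked list, as suggested, would additionally make removal from the middle cheap.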

+		char db_statfile[strlen(pgstat_stat_db_filename) + 11];
+		snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
+				 pgstat_stat_db_filename, dbentry->databaseid);

This pattern seems rather frequent. Can we use a macro or similar here?
Encapsulating the "11" better would be good. Magic numbers are evil.

Yes, this needs to be cleaned / improved.
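
One possible shape for such a cleanup, as a standalone sketch rather than the patch's code: let snprintf() compute the needed length so that no hand-counted constant like 11 is required. The function name db_statfile_path is hypothetical, and it uses malloc() where backend code would presumably use palloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/<oid>.stat" in a freshly allocated
 * buffer.  snprintf(NULL, 0, ...) returns the number of characters the
 * full output needs (C99), so the buffer is sized exactly.
 * The caller frees the result.
 */
char *
db_statfile_path(const char *dir, unsigned int dboid)
{
	int			len;
	char	   *path;

	assert(dir != NULL);

	len = snprintf(NULL, 0, "%s/%u.stat", dir, dboid);
	if (len < 0)
		return NULL;

	path = malloc((size_t) len + 1);	/* +1 for the terminating NUL */
	if (path != NULL)
		snprintf(path, (size_t) len + 1, "%s/%u.stat", dir, dboid);
	return path;
}
```

Every call site then shrinks to one call plus a free, and the length computation lives in a single place.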

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 613c1c2..b3467d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
PgStat_MsgHdr m_hdr;
TimestampTz clock_time;		/* observed local clock time */
TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
} PgStat_MsgInquiry;

Do we need to support the case that somebody requests stuff from the
"shared" DB? IIRC that's what InvalidOid means in pgstat ...

Frankly, I don't know, but I guess we do because it was in the original
code, and there are such inquiries right after the database starts
(that's why I had to add pgstat_write_db_dummyfile).

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34Jeff Janes
jeff.janes@gmail.com
In reply to: Tomas Vondra (#32)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Sun, Feb 3, 2013 at 4:51 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than
collector's time 2013-02-04 00:54:21.113439+01 for db 19093
WARNING: pgstat wait timeout
LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than
collector's time 2013-02-04 00:54:31.424681+01 for db 46494
FATAL: could not find block containing chunk 0x2af4a60
LOG: statistics collector process (PID 10063) exited with exit code 1
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout

I'm not entirely sure where the FATAL came from, but it seems it was
somehow related to the issue - it was quite reproducible, although I
don't see how exactly this could happen. The relevant block of code
looks like this:

char *writetime;
char *mytime;

/* Copy because timestamptz_to_str returns a static buffer */
writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
mytime = pstrdup(timestamptz_to_str(cur_ts));
elog(LOG, "last_statwrite %s is later than collector's time %s for "
"db %d", writetime, mytime, dbentry->databaseid);
pfree(writetime);
pfree(mytime);

which seems quite fine to me. I'm not sure how one of the pfree calls
could fail?

I don't recall seeing the FATAL errors myself, but didn't keep the
logfile around. (I do recall seeing the pgstat wait timeout).

Are you using windows? pstrdup seems to be different there.

I'm afraid I don't have much to say on the code. Indeed, I never even
looked at it (other than grepping for pstrdup just now). I am taking a
purely experimental approach, since Alvaro and others have looked at
the code.

Anyway, attached is a patch that fixes all three issues, i.e.

1) the un-initialized timestamp
2) the "void" omitted from the signature
3) rename to "pg_stat" instead of just "stat"

Thanks.

If I shut down the server and blow away the stats with "rm
data/pg_stat/*", it recovers gracefully when I start it back up. If I
do "rm -r data/pg_stat" then it has problems the next time I shut it
down, but I have no right to do that in the first place. If I initdb
a database without this patch, then shut it down and restart with
binaries that include this patch, I need to manually make the
pg_stat directory. Does that mean it needs a catalog bump in order to
force an initdb?

A review:

It applies cleanly (some offsets, no fuzz), builds without warnings,
and passes make check including with cassert.

The final test done in "make check" inherently tests this code, and it
passes. If I intentionally break the patch by making
pgstat_read_db_statsfile add one to the oid it opens, then the test
fails. So the existing test is at least plausible as a test.

doc/src/sgml/monitoring.sgml needs to be changed: "a permanent copy of
the statistics data is stored in the global subdirectory". I'm not
aware of any other needed changes to the docs.

The big question is whether we want this. I think we do. While
having hundreds of databases in a cluster is not recommended, that is
no reason not to handle it better than we do. I don't see any
down-sides, other than possibly some code uglification. Some file
systems might not deal well with having lots of small stats files
being rapidly written and rewritten, but it is hard to see how the
current behavior would be more favorable for those systems.

We do not already have this. There is no relevant spec. I can't see
how this could need pg_dump support (but what about pg_upgrade?)

I am not aware of any dangers.

I have a question about its completeness. When I first start up the
cluster and have not yet touched it, there is very little stats
collector activity, either with or without this patch. When I kick
the cluster sufficiently (I've been using vacuumdb -a to do that) then
there is a lot of stats collector activity. Even once the vacuumdb
has long finished, this high level of activity continues even though
the database is otherwise completely idle, and this seems to happen
for every. This patch makes that high level of activity much more
efficient, but it does not reduce the activity. I don't understand
why an idle database cannot get back into the state right after
start-up.

I do not think that the patch needs to solve this problem in order to
be accepted, but if it can be addressed while the author and reviewers
are paying attention to this part of the system, that would be ideal.
And if not, then we should at least remember that there is future work
that could be done here.

I created 1000 databases each with 200 single column tables (x serial
primary key).

After vacuumdb -a, I let it idle for a long time to see what steady
state was reached.

without the patch:
vacuumdb -a real 11m2.624s
idle steady state: 48.17% user, 39.24% system, 11.78% iowait, 0.81% idle.

with the patch:
vacuumdb -a real 6m41.306s
idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle.

I also ran pgbench on a scale that fits in memory with fsync=off, on a
single CPU machine, with the same above-mentioned 1000 databases as
unused decoys to bloat the stats file.

pgbench_tellers and pgbench_branches undergo enough turnover that they
should get vacuumed every minute (nap time).

Without the patch, they only get vacuumed every 40 minutes or so, as
the autovac workers are so distracted by reading the bloated stats
file, and the TPS is ~680.

With the patch, they get vacuumed every 1 to 2 minutes and TPS is ~940

So this seems to be effective at its intended goal.

I have not done a review of the code itself, only the performance.

Cheers,

Jeff


#35Tomas Vondra
tv@fuzzy.cz
In reply to: Jeff Janes (#34)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 5.2.2013 19:23, Jeff Janes wrote:

On Sun, Feb 3, 2013 at 4:51 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than
collector's time 2013-02-04 00:54:21.113439+01 for db 19093
WARNING: pgstat wait timeout
LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than
collector's time 2013-02-04 00:54:31.424681+01 for db 46494
FATAL: could not find block containing chunk 0x2af4a60
LOG: statistics collector process (PID 10063) exited with exit code 1
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout

I'm not entirely sure where the FATAL came from, but it seems it was
somehow related to the issue - it was quite reproducible, although I
don't see how exactly this could happen. The relevant block of code
looks like this:

char *writetime;
char *mytime;

/* Copy because timestamptz_to_str returns a static buffer */
writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
mytime = pstrdup(timestamptz_to_str(cur_ts));
elog(LOG, "last_statwrite %s is later than collector's time %s for "
"db %d", writetime, mytime, dbentry->databaseid);
pfree(writetime);
pfree(mytime);

which seems quite fine to me. I'm not sure how one of the pfree calls
could fail?

I don't recall seeing the FATAL errors myself, but didn't keep the
logfile around. (I do recall seeing the pgstat wait timeout).

Are you using windows? pstrdup seems to be different there.

Nope. I'll repeat the test with the original patch to find out what went
wrong, just to be sure it was fixed.

I'm afraid I don't have much to say on the code. Indeed, I never even
looked at it (other than grepping for pstrdup just now). I am taking a
purely experimental approach, since Alvaro and others have looked at
the code.

Thanks for finding the issue with the uninitialized timestamp!

If I shut down the server and blow away the stats with "rm
data/pg_stat/*", it recovers gracefully when I start it back up. If I
do "rm -r data/pg_stat" then it has problems the next time I shut it
down, but I have no right to do that in the first place. If I initdb
a database without this patch, then shut it down and restart with
binaries that include this patch, I need to manually make the
pg_stat directory. Does that mean it needs a catalog bump in order to
force an initdb?

Ummmm, what do you mean by "catalog bump"?

Anyway, messing with files in the "base" directory is a bad idea in
general, and I don't think that's a reason to treat the pg_stat
directory differently. If you remove it by hand, you'll be rightfully
punished by various errors.

A review:

It applies cleanly (some offsets, no fuzz), builds without warnings,
and passes make check including with cassert.

The final test done in "make check" inherently tests this code, and it
passes. If I intentionally break the patch by making
pgstat_read_db_statsfile add one to the oid it opens, then the test
fails. So the existing test is at least plausible as a test.

doc/src/sgml/monitoring.sgml needs to be changed: "a permanent copy of
the statistics data is stored in the global subdirectory". I'm not
aware of any other needed changes to the docs.

Yeah, that should be "in the global/pg_stat subdirectory".

The big question is whether we want this. I think we do. While
having hundreds of databases in a cluster is not recommended, that is
no reason not to handle it better than we do. I don't see any
down-sides, other than possibly some code uglification. Some file
systems might not deal well with having lots of small stats files
being rapidly written and rewritten, but it is hard to see how the
current behavior would be more favorable for those systems.

If the filesystem has issues with that many entries, it's already hosed
by the contents of the "base" directory (one directory per database) or
of the database directories (multiple files per table).

Moreover, it's still possible to use tmpfs to handle this at runtime
(which is often the recommended solution with the current code), and use
the actual filesystem only for keeping the data across restarts.

We do not already have this. There is no relevant spec. I can't see
how this could need pg_dump support (but what about pg_upgrade?)

pg_dump - no

pg_upgrade - IMHO it should create the pg_stat directory. I don't think
it could "convert" the statfile into the new format (by splitting it into
the pieces). I haven't checked but I believe the default behavior is to
delete it as there might be new fields / slight changes of meaning etc.

I am not aware of any dangers.

I have a question about its completeness. When I first start up the
cluster and have not yet touched it, there is very little stats
collector activity, either with or without this patch. When I kick
the cluster sufficiently (I've been using vacuumdb -a to do that) then
there is a lot of stats collector activity. Even once the vacuumdb
has long finished, this high level of activity continues even though
the database is otherwise completely idle, and this seems to happen
for every. This patch makes that high level of activity much more
efficient, but it does not reduce the activity. I don't understand
why an idle database cannot get back into the state right after
start-up.

What do you mean by "stats collector activity"? Is it reading/writing a
lot of data, or is it just using a lot of CPU?

Isn't that just natural and expected behavior, because the database
needs to actually perform ANALYZE to collect the data? Although
the tables are empty, it costs some CPU / IO and there are a lot of them
(1000 dbs, each with 200 tables).

I don't think there's a way around this. You may increase the autovacuum
naptime, but that's about all.

I do not think that the patch needs to solve this problem in order to
be accepted, but if it can be addressed while the author and reviewers
are paying attention to this part of the system, that would be ideal.
And if not, then we should at least remember that there is future work
that could be done here.

If I understand that correctly, you see the same behaviour even without
the patch, right? In that case I'd vote not to make the patch more
complex, and try to improve that separately (if it's even possible).

I created 1000 databases each with 200 single column tables (x serial
primary key).

After vacuumdb -a, I let it idle for a long time to see what steady
state was reached.

without the patch:
vacuumdb -a real 11m2.624s
idle steady state: 48.17% user, 39.24% system, 11.78% iowait, 0.81% idle.

with the patch:
vacuumdb -a real 6m41.306s
idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle.

Nice. Other interesting numbers would be device utilization, average
I/O speed and required space (which should be ~2x the pgstat.stat size
without the patch).

I also ran pgbench on a scale that fits in memory with fsync=off, on a
single CPU machine, with the same above-mentioned 1000 databases as
unused decoys to bloat the stats file.

pgbench_tellers and pgbench_branches undergo enough turnover that they
should get vacuumed every minute (nap time).

Without the patch, they only get vacuumed every 40 minutes or so, as
the autovac workers are so distracted by reading the bloated stats
file, and the TPS is ~680.

With the patch, they get vacuumed every 1 to 2 minutes and TPS is ~940

Great, I haven't really aimed to improve pgbench results, but it seems
natural that the decreased CPU utilization can go somewhere else. Not bad.

Have you moved the stats somewhere to tmpfs, or have you used the
default location (on disk)?

Tomas


#36Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tomas Vondra (#35)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

with the patch:
vacuumdb -a real 6m41.306s
idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle.

Nice. Other interesting numbers would be device utilization, average
I/O speed and required space (which should be ~2x the pgstat.stat size
without the patch).

This point is important: with a large warehouse with lots of databases
and tables you have to move the stats file to a ramdisk. Without it you
lose a lot of I/O capacity, and it matters a lot if you need only a
half-sized ramdisk.

Regards

Pavel


#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#36)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Pavel Stehule <pavel.stehule@gmail.com> writes:

Nice. Other interesting numbers would be device utilization, average
I/O speed and required space (which should be ~2x the pgstat.stat size
without the patch).

This point is important: with a large warehouse with lots of databases
and tables you have to move the stats file to a ramdisk. Without it you
lose a lot of I/O capacity, and it matters a lot if you need only a
half-sized ramdisk.

[ blink... ] I confess I'd not been paying close attention to this
thread, but if that's true I'd say the patch is DOA. Why should we
accept 2x bloat in the already-far-too-large stats file? I thought
the idea was just to split up the existing data into multiple files.

regards, tom lane


#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#37)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tom Lane wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

Nice. Other interesting numbers would be device utilization, average
I/O speed and required space (which should be ~2x the pgstat.stat size
without the patch).

This point is important: with a large warehouse with lots of databases
and tables you have to move the stats file to a ramdisk. Without it you
lose a lot of I/O capacity, and it matters a lot if you need only a
half-sized ramdisk.

[ blink... ] I confess I'd not been paying close attention to this
thread, but if that's true I'd say the patch is DOA. Why should we
accept 2x bloat in the already-far-too-large stats file? I thought
the idea was just to split up the existing data into multiple files.

I think they are saying just the opposite: maximum disk space
utilization is now half of the unpatched code. This is because when we
need to write the temporary file to rename on top of the other one, the
temporary file is not of the size of the complete pgstat data collation,
but just that for the requested database.
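
The write-then-rename step can be sketched as below. This is a standalone illustration, not the collector's code: write_file_atomically is a hypothetical name, and replacing an existing file via rename() is assumed to behave as on POSIX systems. While the temporary file exists, the old and new copies are on disk together, so the transient extra space equals the size of whatever one file holds: the whole stats collection with a single file, a single database's stats with per-database files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "len" bytes of "data" to tmp_path, then rename it over
 * final_path.  Readers see either the old file or the complete new
 * one, never a partial write.  Returns 0 on success, -1 on failure
 * (the temporary file is removed on failure).
 */
int
write_file_atomically(const char *tmp_path, const char *final_path,
					  const void *data, size_t len)
{
	FILE	   *fp;

	assert(tmp_path != NULL && final_path != NULL);

	fp = fopen(tmp_path, "wb");
	if (fp == NULL)
		return -1;

	if (fwrite(data, 1, len, fp) != len)
	{
		fclose(fp);
		remove(tmp_path);
		return -1;
	}

	/* fclose() flushes buffered data and can itself report a write error */
	if (fclose(fp) != 0)
	{
		remove(tmp_path);
		return -1;
	}

	if (rename(tmp_path, final_path) != 0)
	{
		remove(tmp_path);
		return -1;
	}
	return 0;
}
```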

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#39Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#38)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2013/2/6 Alvaro Herrera <alvherre@2ndquadrant.com>:

Tom Lane wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

Nice. Other interesting numbers would be device utilization, average
I/O speed and required space (which should be ~2x the pgstat.stat size
without the patch).

This point is important: with a large warehouse with lots of databases
and tables you have to move the stats file to a ramdisk. Without it you
lose a lot of I/O capacity, and it matters a lot if you need only a
half-sized ramdisk.

[ blink... ] I confess I'd not been paying close attention to this
thread, but if that's true I'd say the patch is DOA. Why should we
accept 2x bloat in the already-far-too-large stats file? I thought
the idea was just to split up the existing data into multiple files.

I think they are saying just the opposite: maximum disk space
utilization is now half of the unpatched code. This is because when we
need to write the temporary file to rename on top of the other one, the
temporary file is not of the size of the complete pgstat data collation,
but just that for the requested database.

+1

Pavel

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#40Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#38)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 06.02.2013 16:53, Alvaro Herrera wrote:

Tom Lane wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

Nice. Other interesting numbers would be device utilization, average
I/O speed and required space (which should be ~2x the pgstat.stat size
without the patch).

This point is important: with a large warehouse with lots of databases
and tables you have to move the stats file to a ramdisk. Without it you
lose a lot of I/O capacity, and it matters a lot if you need only a
half-sized ramdisk.

[ blink... ] I confess I'd not been paying close attention to this
thread, but if that's true I'd say the patch is DOA. Why should we
accept 2x bloat in the already-far-too-large stats file? I thought
the idea was just to split up the existing data into multiple files.

I think they are saying just the opposite: maximum disk space
utilization is now half of the unpatched code. This is because when we
need to write the temporary file to rename on top of the other one, the
temporary file is not of the size of the complete pgstat data collation,
but just that for the requested database.

Exactly. And I suspect the current (unpatched) code often requires more
than twice the space because of open file descriptors to already deleted
files.

Tomas


#41Jeff Janes
jeff.janes@gmail.com
In reply to: Tomas Vondra (#35)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 5.2.2013 19:23, Jeff Janes wrote:

If I shut down the server and blow away the stats with "rm
data/pg_stat/*", it recovers gracefully when I start it back up. If I
do "rm -r data/pg_stat" then it has problems the next time I shut it
down, but I have no right to do that in the first place. If I initdb
a database without this patch, then shut it down and restart with
binaries that include this patch, I need to manually make the
pg_stat directory. Does that mean it needs a catalog bump in order to
force an initdb?

Ummmm, what do you mean by "catalog bump"?

There is a catalog number in src/include/catalog/catversion.h, which
when changed forces one to redo initdb.

Formally I guess it is only for system catalog changes, but I thought
it was used for any on-disk changes during development cycles. I like
it the way it is, as I can use the same data directory for both
versions of the binary (patched and unpatched), and just manually
create or remove the pg_stat directory when changing modes.
That is ideal for testing this patch, but probably not ideal for being
committed into the tree along with all the other ongoing devel work.
But I think this is something the committer has to worry about.

I have a question about its completeness. When I first start up the
cluster and have not yet touched it, there is very little stats
collector activity, either with or without this patch. When I kick
the cluster sufficiently (I've been using vacuumdb -a to do that) then
there is a lot of stats collector activity. Even once the vacuumdb
has long finished, this high level of activity continues even though
the database is otherwise completely idle, and this seems to happen
for every. This patch makes that high level of activity much more
efficient, but it does not reduce the activity. I don't understand
why an idle database cannot get back into the state right after
start-up.

What do you mean by "stats collector activity"? Is it reading/writing a
lot of data, or is it just using a lot of CPU?

Basically, the launching of new autovac workers and the work that that
entails. Your patch reduces the size of data that needs to be
written, read, and parsed for every launch, but not the number of
times that that happens.

Isn't that just natural and expected behavior, because the database
needs to actually perform ANALYZE to collect the data? Although
the tables are empty, it costs some CPU / IO and there's a lot of them
(1000 dbs, each with 200 tables).

It isn't touching the tables at all, just the stats files.

I was wrong about the cluster opening quietly. It only does that if,
while the cluster was shut down, you remove the statistics files, which
I was doing, as I was switching back and forth between patched and
unpatched.

When the cluster opens, any databases that don't have statistics in
the stat file(s) will not get an autovacuum worker process spawned.
They only start getting spawned once someone asks for statistics for
that database. But then once that happens, that database then gets a
worker spawned for it every naptime (or, at least, as close to that as
the server can keep up with) for eternity, even if that database is
never used again. The only way to stop this is the unsupported way of
blowing away the permanent stats files.
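
A back-of-the-envelope sketch of what "a worker for every database every naptime" implies is below. It assumes launches are spread evenly over the naptime and that each launch reads the stats once; both are simplifying assumptions, and the function names are made up for the example.

```c
#include <assert.h>

/*
 * Average gap, in milliseconds, between worker launches if each of
 * 'ndatabases' databases gets one worker per naptime. Illustrative only.
 */
static double
launch_interval_ms(double naptime_seconds, int ndatabases)
{
    assert(naptime_seconds > 0.0 && ndatabases > 0);
    return naptime_seconds * 1000.0 / ndatabases;
}

/*
 * Number of stats reads per hour under the same assumption, counting one
 * read per worker launch.
 */
static double
stats_reads_per_hour(double naptime_seconds, int ndatabases)
{
    assert(naptime_seconds > 0.0 && ndatabases > 0);
    return 3600.0 / naptime_seconds * ndatabases;
}
```

With a one-minute naptime and 1000 databases this works out to a launch roughly every 60 ms, or about 60000 stats reads per hour, even when every database is idle. The patch shrinks the cost of each read but not their number.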

I don't think there's a way around this. You may increase the autovacuum
naptime, but that's about all.

I do not think that the patch needs to solve this problem in order to
be accepted, but if it can be addressed while the author and reviewers
are paying attention to this part of the system, that would be ideal.
And if not, then we should at least remember that there is future work
that could be done here.

If I understand that correctly, you see the same behaviour even without
the patch, right? In that case I'd vote not to make the patch more
complex, and try to improve that separately (if it's even possible).

OK. I just thought that while digging through the code, you might
have a good idea for fixing this part as well. If so, it would be a
shame for that idea to be lost when you move on to other things.

I created 1000 databases each with 200 single column tables (x serial
primary key).

After vacuumdb -a, I let it idle for a long time to see what steady
state was reached.

without the patch:
vacuumdb -a real 11m2.624s
idle steady state: 48.17% user, 39.24% system, 11.78% iowait, 0.81% idle.

with the patch:
vacuumdb -a real 6m41.306s
idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle.

Nice. Other interesting numbers would be device utilization, average
I/O speed

I didn't gather that data, as I never figured out how to interpret
those numbers and so don't have much faith in them. (But I am pretty
impressed with the numbers I do understand)

and required space (which should be ~2x the pgstat.stat size
without the patch).

I didn't study this in depth, but the patch seems to do what it should
(that is, take less space, not more). If I fill the device up so
that there is less than 3x the size of the stats file available for
use (i.e. space for the file itself and for 1 temp copy of it
but not space for a complete second temp copy), I occasionally get
out-of-space warnings with unpatched, but never get those warnings with
patched. Indeed, with the patch I never get warnings even with only 1.04
times the aggregate size of the stats files available for use. (That
is, size for all the files, plus just 1/25 that amount to spare.
Obviously this limit is specific to having 1000 databases of equal
size.)
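
The space figures above can be sanity-checked with a small sketch. It models only the simple case: one temp copy alive at a time, no global file, and no deleted-but-still-open files (which, as noted earlier in the thread, may push the unpatched code past 2x). The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Peak space when one file holding all databases is rewritten through a
 * temporary copy: the old file and a full-size temp file coexist.
 */
static double
peak_space_single_file(const double *db_sizes, size_t ndbs)
{
    double total = 0.0;

    for (size_t i = 0; i < ndbs; i++)
        total += db_sizes[i];
    return 2.0 * total;
}

/*
 * Peak space with one file per database, assuming a single per-database
 * temp file exists at a time: all files plus the largest temp copy.
 */
static double
peak_space_per_db_files(const double *db_sizes, size_t ndbs)
{
    double total = 0.0;
    double largest = 0.0;

    for (size_t i = 0; i < ndbs; i++)
    {
        total += db_sizes[i];
        if (db_sizes[i] > largest)
            largest = db_sizes[i];
    }
    return total + largest;
}
```

For 1000 databases of equal size this model gives 2x the aggregate size for the single-file layout and about 1.001x for the per-database layout, which is consistent with seeing no warnings at 1.04x with the patch.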

I also ran pgbench on a scale that fits in memory with fsync=off, on a
single-CPU machine, with the same above-mentioned 1000 databases as
unused decoys to bloat the stats file.

pgbench_tellers and branches undergo enough turnover that they should
get vacuumed every minute (naptime).

Without the patch, they only get vacuumed every 40 minutes or so as
the autovac workers are so distracted by reading the bloated stats
file, and the TPS is ~680.

With the patch, they get vacuumed every 1 to 2 minutes and TPS is ~940.

Great, I haven't really aimed to improve pgbench results, but it seems
natural that the decreased CPU utilization can go somewhere else. Not bad.

My goal there was to prove to myself that the correct tables were
getting vacuumed. The TPS measurements were just a by-product of
that, but since I had them I figured I'd post them.

Have you moved the stats somewhere to tmpfs, or have you used the
default location (on disk)?

All the specific work I reported was with them on disk, except the
part about running out of space, which was done on /dev/shm. But even
with the data theoretically going to disk, the kernel caches it well
enough that I wouldn't expect things to change very much.

Two more questions I've come up with:

If I invoke pg_stat_reset() from a database, the corresponding file
does not get removed from the pg_stat_tmp directory. And when the
database is shut down, a file for the reset database does get created
in pg_stat. Is this OK?

Cheers,

Jeff


#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#41)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Jeff Janes <jeff.janes@gmail.com> writes:

On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

Ummmm, what do you mean by "catalog bump"?

There is a catalog number in src/include/catalog/catversion.h, which
when changed forces one to redo initdb.

Formally I guess it is only for system catalog changes, but I thought
it was used for any on-disk changes during development cycles.

Yeah, it would be appropriate to bump the catversion if we're creating a
new PGDATA subdirectory.

I'm not excited about keeping code to take care of the lack of such a
subdirectory at runtime, as I gather there is in the current state of
the patch. Formally, if there were such code, we'd not need a
catversion bump --- the rule of thumb is to change catversion if the new
postgres executable would fail regression tests without a run of the new
initdb. But it's pretty dumb to keep such code indefinitely, when it
would have no more possible use after the next catversion bump (which is
seldom more than a week or two away during devel phase).

What do you mean by "stats collector activity"? Is it reading/writing a
lot of data, or is it just using a lot of CPU?

Basically, the launching of new autovac workers and the work that that
entails. Your patch reduces the size of data that needs to be
written, read, and parsed for every launch, but not the number of
times that that happens.

It doesn't seem very reasonable to ask this patch to redesign the
autovacuum algorithms, which is essentially what it'll take to improve
that. That's a completely separate layer of code.

regards, tom lane


#43Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#42)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 7.2.2013 00:40, Tom Lane wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

Ummmm, what do you mean by "catalog bump"?

There is a catalog number in src/include/catalog/catversion.h, which
when changed forces one to redo initdb.

Formally I guess it is only for system catalog changes, but I thought
it was used for any on-disk changes during development cycles.

Yeah, it would be appropriate to bump the catversion if we're creating a
new PGDATA subdirectory.

I'm not excited about keeping code to take care of the lack of such a
subdirectory at runtime, as I gather there is in the current state of
the patch. Formally, if there were such code, we'd not need a

No, there is nothing to handle that at runtime. The directory is created
at initdb and the patch expects that (and fails if it's gone).

catversion bump --- the rule of thumb is to change catversion if the new
postgres executable would fail regression tests without a run of the new
initdb. But it's pretty dumb to keep such code indefinitely, when it
would have no more possible use after the next catversion bump (which is
seldom more than a week or two away during devel phase).

What do you mean by "stats collector activity"? Is it reading/writing a
lot of data, or is it just using a lot of CPU?

Basically, the launching of new autovac workers and the work that that
entails. Your patch reduces the size of data that needs to be
written, read, and parsed for every launch, but not the number of
times that that happens.

It doesn't seem very reasonable to ask this patch to redesign the
autovacuum algorithms, which is essentially what it'll take to improve
that. That's a completely separate layer of code.

My opinion, exactly.

Tomas


#44Jeff Janes
jeff.janes@gmail.com
In reply to: Tomas Vondra (#35)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

We do not already have this. There is no relevant spec. I can't see
how this could need pg_dump support (but what about pg_upgrade?)

pg_dump - no

pg_upgrade - IMHO it should create the pg_stat directory. I don't think
it could "convert" statfile into the new format (by splitting it into
the pieces). I haven't checked but I believe the default behavior is to
delete it as there might be new fields / slight changes of meaning etc.

Right, I have no concerns with pg_upgrade any more. The pg_stat directory
will inherently get created by the initdb of the new cluster (because the
initdb will be done with the new binaries with your patch in place).

pg_upgrade currently doesn't copy over global/pgstat.stat. So that
means the new cluster doesn't have the activity stats either way,
patch or unpatched. So if it is not currently a problem it will not
become one under the proposed patch.

Cheers,

Jeff


#45Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#32)
1 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Here's an updated version of this patch that takes care of the issues I
reported previously: no more repalloc() of the requests array; it's now
an slist, which makes the code much more natural IMV. And no more
messing around with doing sprintf to create a separate sprintf pattern
for the per-db stats file; instead have a function to return the name
that uses just the pgstat dir as stored by GUC. I think this can be
further simplified still.
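
To illustrate why a singly linked list is a natural fit for the pending write requests, here is a self-contained sketch in plain C. It does not use PostgreSQL's lib/ilist.h or palloc; the type and function names are invented for the example, and malloc/free stand in for the backend allocator. Adding a request never moves existing entries, so there is nothing to resize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One pending write request; loosely modeled on the patch's DBWriteRequest. */
typedef struct WriteRequest
{
    unsigned int db_id;             /* database to write */
    long long    request_time;      /* latest request time seen */
    struct WriteRequest *next;
} WriteRequest;

/* Record a request; keeps one entry per database. Returns false on OOM. */
static bool
request_add(WriteRequest **head, unsigned int db_id, long long request_time)
{
    WriteRequest *req;

    assert(head != NULL);

    for (req = *head; req != NULL; req = req->next)
    {
        if (req->db_id == db_id)
        {
            if (request_time > req->request_time)
                req->request_time = request_time;
            return true;
        }
    }

    req = malloc(sizeof(*req));
    if (req == NULL)
        return false;
    req->db_id = db_id;
    req->request_time = request_time;
    req->next = *head;              /* push at the front, O(1) */
    *head = req;
    return true;
}

/* Is there a pending request for this database? */
static bool
request_is_pending(const WriteRequest *head, unsigned int db_id)
{
    for (; head != NULL; head = head->next)
        if (head->db_id == db_id)
            return true;
    return false;
}

/* Drop all requests, e.g. after the stats files have been written. */
static void
requests_clear(WriteRequest **head)
{
    assert(head != NULL);

    while (*head != NULL)
    {
        WriteRequest *next = (*head)->next;

        free(*head);
        *head = next;
    }
}
```

The lookup is linear, which seems acceptable as long as the list is emptied after every write and therefore stays short.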

I haven't reviewed the rest yet; please do give this a try to confirm
that the speedups previously reported are still there (i.e. I didn't
completely blow it).

Thanks

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

stats-split-v7.patch (text/x-diff; charset=us-ascii)
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 38,43 ****
--- 38,44 ----
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***************
*** 66,73 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
--- 67,75 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* ----------
   * Timer definitions.
***************
*** 115,120 **** int			pgstat_track_activity_query_size = 1024;
--- 117,123 ----
   * Built from GUC parameter
   * ----------
   */
+ char	   *pgstat_stat_directory = NULL;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
  
***************
*** 219,229 **** static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
  
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 222,237 ----
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request time from backends for each DB */
! static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***************
*** 252,262 **** static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 260,276 ----
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent, bool force);
! static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
! static void pgstat_write_db_dummyfile(Oid databaseid);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
! static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
+ static bool pgstat_write_statsfile_needed(void);
+ static bool pgstat_db_requested(Oid databaseid);
+ 
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
***************
*** 285,291 **** static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
  static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
  static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
  
- 
  /* ------------------------------------------------------------
   * Public functions called from postmaster follow
   * ------------------------------------------------------------
--- 299,304 ----
***************
*** 549,556 **** startup_failed:
  void
  pgstat_reset_all(void)
  {
! 	unlink(pgstat_stat_filename);
! 	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
  
  #ifdef EXEC_BACKEND
--- 562,605 ----
  void
  pgstat_reset_all(void)
  {
! 	DIR * dir;
! 	struct dirent * entry;
! 
! 	dir = AllocateDir(pgstat_stat_directory);
! 	while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
! 	{
! 		char   *fname;
! 		int		totlen;
! 
! 		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
! 			continue;
! 
! 		totlen = strlen(pgstat_stat_directory) + strlen(entry->d_name) + 2;
! 		fname = palloc(totlen);
! 
! 		snprintf(fname, totlen, "%s/%s", pgstat_stat_directory, entry->d_name);
! 		unlink(fname);
! 		pfree(fname);
! 	}
! 	FreeDir(dir);
! 
! 	dir = AllocateDir(PGSTAT_STAT_PERMANENT_DIRECTORY);
! 	while ((entry = ReadDir(dir, PGSTAT_STAT_PERMANENT_DIRECTORY)) != NULL)
! 	{
! 		char   *fname;
! 		int		totlen;
! 
! 		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
! 			continue;
! 
! 		totlen = strlen(PGSTAT_STAT_PERMANENT_DIRECTORY) + strlen(entry->d_name) + 2;
! 		fname = palloc(totlen);
! 
! 		snprintf(fname, totlen, "%s/%s", PGSTAT_STAT_PERMANENT_DIRECTORY, entry->d_name);
! 		unlink(fname);
! 		pfree(fname);
! 	}
! 	FreeDir(dir);
  }
  
  #ifdef EXEC_BACKEND
***************
*** 1408,1420 **** pgstat_ping(void)
   * ----------
   */
  static void
! pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
  {
  	PgStat_MsgInquiry msg;
  
  	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
  	msg.clock_time = clock_time;
  	msg.cutoff_time = cutoff_time;
  	pgstat_send(&msg, sizeof(msg));
  }
  
--- 1457,1470 ----
   * ----------
   */
  static void
! pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
  {
  	PgStat_MsgInquiry msg;
  
  	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
  	msg.clock_time = clock_time;
  	msg.cutoff_time = cutoff_time;
+ 	msg.databaseid = databaseid;
  	pgstat_send(&msg, sizeof(msg));
  }
  
***************
*** 3004,3009 **** PgstatCollectorMain(int argc, char *argv[])
--- 3054,3060 ----
  	int			len;
  	PgStat_Msg	msg;
  	int			wr;
+ 	bool		first_write = true;
  
  	IsUnderPostmaster = true;	/* we are a postmaster subprocess now */
  
***************
*** 3053,3069 **** PgstatCollectorMain(int argc, char *argv[])
  	init_ps_display("stats collector process", "", "", "");
  
  	/*
- 	 * Arrange to write the initial status file right away
- 	 */
- 	last_statrequest = GetCurrentTimestamp();
- 	last_statwrite = last_statrequest - 1;
- 
- 	/*
  	 * Read in an existing statistics stats file or initialize the stats to
! 	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
  
  	/*
  	 * Loop to process messages until we get SIGQUIT or detect ungraceful
--- 3104,3114 ----
  	init_ps_display("stats collector process", "", "", "");
  
  	/*
  	 * Read in an existing statistics stats file or initialize the stats to
! 	 * zero (read data for all databases, including table/func stats).
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, false);
  
  	/*
  	 * Loop to process messages until we get SIGQUIT or detect ungraceful
***************
*** 3107,3116 **** PgstatCollectorMain(int argc, char *argv[])
  
  			/*
  			 * Write the stats file if a new request has arrived that is not
! 			 * satisfied by existing file.
  			 */
! 			if (last_statwrite < last_statrequest)
! 				pgstat_write_statsfile(false);
  
  			/*
  			 * Try to receive and process a message.  This will not block,
--- 3152,3165 ----
  
  			/*
  			 * Write the stats file if a new request has arrived that is not
! 			 * satisfied by existing file (force writing all files if it's
! 			 * the first write after startup).
  			 */
! 			if (first_write || pgstat_write_statsfile_needed())
! 			{
! 				pgstat_write_statsfile(false, first_write);
! 				first_write = false;
! 			}
  
  			/*
  			 * Try to receive and process a message.  This will not block,
***************
*** 3269,3275 **** PgstatCollectorMain(int argc, char *argv[])
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile(true);
  
  	exit(0);
  }
--- 3318,3324 ----
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile(true, true);
  
  	exit(0);
  }
***************
*** 3349,3354 **** pgstat_get_db_entry(Oid databaseid, bool create)
--- 3398,3404 ----
  		result->n_block_write_time = 0;
  
  		result->stat_reset_timestamp = GetCurrentTimestamp();
+ 		result->stats_timestamp = 0;
  
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
***************
*** 3429,3451 **** pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
   *	shutting down only), remove the temporary file so that backends
   *	starting up under a new postmaster can't read the old data before
   *	the new collector is ready.
   * ----------
   */
  static void
! pgstat_write_statsfile(bool permanent)
  {
  	HASH_SEQ_STATUS hstat;
- 	HASH_SEQ_STATUS tstat;
- 	HASH_SEQ_STATUS fstat;
  	PgStat_StatDBEntry *dbentry;
- 	PgStat_StatTabEntry *tabentry;
- 	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
  	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  	int			rc;
  
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
--- 3479,3503 ----
   *	shutting down only), remove the temporary file so that backends
   *	starting up under a new postmaster can't read the old data before
   *	the new collector is ready.
+  *
+  *	When 'allDbs' is false, only the requested databases (listed in
+  * 	last_statrequests) will be written. If 'allDbs' is true, all databases
+  * 	will be written.
   * ----------
   */
  static void
! pgstat_write_statsfile(bool permanent, bool allDbs)
  {
  	HASH_SEQ_STATUS hstat;
  	PgStat_StatDBEntry *dbentry;
  	FILE	   *fpout;
  	int32		format_id;
  	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  	int			rc;
  
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
***************
*** 3484,3489 **** pgstat_write_statsfile(bool permanent)
--- 3536,3555 ----
  	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
  	{
  		/*
+ 		 * Write out the tables and functions into a separate file, but only
+ 		 * if the database is in the requests or if all DBs are to be written.
+ 		 *
+ 		 * We need to do this before the dbentry write to write the proper
+ 		 * timestamp to the global file.
+ 		 */
+ 		if (allDbs || pgstat_db_requested(dbentry->databaseid))
+ 		{
+ 			elog(DEBUG1, "writing statsfile for DB %u", dbentry->databaseid);
+ 			dbentry->stats_timestamp = globalStats.stats_timestamp;
+ 			pgstat_write_db_statsfile(dbentry, permanent);
+ 		}
+ 
+ 		/*
  		 * Write out the DB entry including the number of live backends. We
  		 * don't write the tables or functions pointers, since they're of no
  		 * use to any other process.
***************
*** 3493,3521 **** pgstat_write_statsfile(bool permanent)
  		(void) rc;				/* we'll check for error with ferror */
  
  		/*
- 		 * Walk through the database's access stats per table.
- 		 */
- 		hash_seq_init(&tstat, dbentry->tables);
- 		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
- 		{
- 			fputc('T', fpout);
- 			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
- 			(void) rc;			/* we'll check for error with ferror */
- 		}
- 
- 		/*
- 		 * Walk through the database's function stats table.
- 		 */
- 		hash_seq_init(&fstat, dbentry->functions);
- 		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
- 		{
- 			fputc('F', fpout);
- 			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
- 			(void) rc;			/* we'll check for error with ferror */
- 		}
- 
- 		/*
  		 * Mark the end of this DB
  		 */
  		fputc('d', fpout);
  	}
--- 3559,3568 ----
  		(void) rc;				/* we'll check for error with ferror */
  
  		/*
  		 * Mark the end of this DB
+ 		 *
+ 		 * TODO Does using these chars still make sense, when the tables/func
+ 		 * stats are moved to a separate file?
  		 */
  		fputc('d', fpout);
  	}
***************
*** 3527,3532 **** pgstat_write_statsfile(bool permanent)
--- 3574,3607 ----
  	 */
  	fputc('E', fpout);
  
+ 	/* In any case, we can just throw away all the db requests, but we need to
+ 	 * write dummy files for databases without a stat entry (it would cause
+ 	 * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
+ 	 * This may happen e.g. for shared DB (oid = 0) right after initdb.
+ 	 */
+ 	if (!slist_is_empty(&last_statrequests))
+ 	{
+ 		slist_mutable_iter	iter;
+ 
+ 		slist_foreach_modify(iter, &last_statrequests)
+ 		{
+ 			DBWriteRequest *req = slist_container(DBWriteRequest, next,
+ 												  iter.cur);
+ 
+ 			/*
+ 			 * Create dummy files for requested databases without a proper
+ 			 * dbentry. It's much easier this way than dealing with multiple
+ 			 * timestamps, possibly existing but not yet written DBs etc.
+ 			 * */
+ 			if (!pgstat_get_db_entry(req->databaseid, false))
+ 				pgstat_write_db_dummyfile(req->databaseid);
+ 
+ 			pfree(req);
+ 		}
+ 
+ 		slist_init(&last_statrequests);
+ 	}
+ 
  	if (ferror(fpout))
  	{
  		ereport(LOG,
***************
*** 3552,3608 **** pgstat_write_statsfile(bool permanent)
  						tmpfile, statfile)));
  		unlink(tmpfile);
  	}
- 	else
- 	{
- 		/*
- 		 * Successful write, so update last_statwrite.
- 		 */
- 		last_statwrite = globalStats.stats_timestamp;
- 
- 		/*
- 		 * If there is clock skew between backends and the collector, we could
- 		 * receive a stats request time that's in the future.  If so, complain
- 		 * and reset last_statrequest.	Resetting ensures that no inquiry
- 		 * message can cause more than one stats file write to occur.
- 		 */
- 		if (last_statrequest > last_statwrite)
- 		{
- 			char	   *reqtime;
- 			char	   *mytime;
- 
- 			/* Copy because timestamptz_to_str returns a static buffer */
- 			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
- 			mytime = pstrdup(timestamptz_to_str(last_statwrite));
- 			elog(LOG, "last_statrequest %s is later than collector's time %s",
- 				 reqtime, mytime);
- 			pfree(reqtime);
- 			pfree(mytime);
- 
- 			last_statrequest = last_statwrite;
- 		}
- 	}
  
  	if (permanent)
  		unlink(pgstat_stat_filename);
  }
  
  
  /* ----------
   * pgstat_read_statsfile() -
   *
   *	Reads in an existing statistics collector file and initializes the
   *	databases' hash table (whose entries point to the tables' hash tables).
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb, bool permanent)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
- 	PgStat_StatTabEntry *tabentry;
- 	PgStat_StatTabEntry tabbuf;
- 	PgStat_StatFuncEntry funcbuf;
- 	PgStat_StatFuncEntry *funcentry;
  	HASHCTL		hash_ctl;
  	HTAB	   *dbhash;
  	HTAB	   *tabhash = NULL;
--- 3627,3905 ----
  						tmpfile, statfile)));
  		unlink(tmpfile);
  	}
  
  	if (permanent)
  		unlink(pgstat_stat_filename);
  }
  
+ /*
+  * return the length that a DB stat file would have (including terminating \0)
+  *
+  * XXX We could avoid this overhead by caching a maximum length in
+  * assign_pgstat_temp_directory; also the distinctions on "permanent" and
+  * "tempname" seem pointless (what do you mean to save one byte of stack
+  * space!?)
+  */
+ static int
+ get_dbstat_file_len(bool permanent, bool tempname, Oid databaseid)
+ {
+ 	char	tmp[1];
+ 	int		len;
+ 
+ 	/* don't actually print, but return how many chars would be used */
+ 	len = snprintf(tmp, 1, "%s/db_%u.%s",
+ 				   permanent ? "pg_stat" : pgstat_stat_directory,
+ 				   databaseid,
+ 				   tempname ? "tmp" : "stat");
+ 	/* XXX pointless? */
+ 	if (len >= MAXPGPATH)
+ 		elog(PANIC, "pgstat path too long");
+ 
+ 	/* count terminating \0 */
+ 	return len + 1;
+ }
+ 
+ /*
+  * return the filename for a DB stat file; filename is the output buffer,
+  * and len is its length.
+  */
+ static void
+ get_dbstat_filename(bool permanent, bool tempname, Oid databaseid,
+ 					char *filename, int len)
+ {
+ #ifdef USE_ASSERT_CHECKING
+ 	int		printed;
+ 
+ 	printed =
+ #endif
+ 		snprintf(filename, len, "%s/db_%u.%s",
+ 				 permanent ? "pg_stat" : pgstat_stat_directory,
+ 				 databaseid,
+ 				 tempname ? "tmp" : "stat");
+ 	Assert(printed <= len);
+ }
+ 
+ /* ----------
+  * pgstat_write_db_statsfile() -
+  *
+  *	Tell the news. This writes stats file for a single database.
+  *
+  *	If writing to the permanent file (happens when the collector is
+  *	shutting down only), remove the temporary file so that backends
+  *	starting up under a new postmaster can't read the old data before
+  *	the new collector is ready.
+  * ----------
+  */
+ static void
+ pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+ {
+ 	HASH_SEQ_STATUS tstat;
+ 	HASH_SEQ_STATUS fstat;
+ 	PgStat_StatTabEntry *tabentry;
+ 	PgStat_StatFuncEntry *funcentry;
+ 	FILE	   *fpout;
+ 	int32		format_id;
+ 	Oid			dbid = dbentry->databaseid;
+ 	int			rc;
+ 	int			tmpfilelen = get_dbstat_file_len(permanent, true, dbid);
+ 	char		tmpfile[tmpfilelen];
+ 	int			statfilelen = get_dbstat_file_len(permanent, false, dbid);
+ 	char		statfile[statfilelen];
+ 
+ 	get_dbstat_filename(permanent, true, dbid, tmpfile, tmpfilelen);
+ 	get_dbstat_filename(permanent, false, dbid, statfile, statfilelen);
+ 
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
+ 	/*
+ 	 * Open the statistics temp file to write out the current values.
+ 	 */
+ 	fpout = AllocateFile(tmpfile, PG_BINARY_W);
+ 	if (fpout == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not open temporary statistics file \"%s\": %m",
+ 						tmpfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Write the file header --- currently just a format ID.
+ 	 */
+ 	format_id = PGSTAT_FILE_FORMAT_ID;
+ 	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * Write the timestamp.
+ 	 */
+ 	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * Walk through the database's access stats per table.
+ 	 */
+ 	hash_seq_init(&tstat, dbentry->tables);
+ 	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+ 	{
+ 		fputc('T', fpout);
+ 		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+ 		(void) rc;			/* we'll check for error with ferror */
+ 	}
+ 
+ 	/*
+ 	 * Walk through the database's function stats table.
+ 	 */
+ 	hash_seq_init(&fstat, dbentry->functions);
+ 	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+ 	{
+ 		fputc('F', fpout);
+ 		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+ 		(void) rc;			/* we'll check for error with ferror */
+ 	}
+ 
+ 	/*
+ 	 * No more output to be done. Close the temp file and replace the old
+ 	 * pgstat.stat with it.  The ferror() check replaces testing for error
+ 	 * after each individual fputc or fwrite above.
+ 	 */
+ 	fputc('E', fpout);
+ 
+ 	if (ferror(fpout))
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not write temporary statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		FreeFile(fpout);
+ 		unlink(tmpfile);
+ 	}
+ 	else if (FreeFile(fpout) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not close temporary statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		unlink(tmpfile);
+ 	}
+ 	else if (rename(tmpfile, statfile) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+ 						tmpfile, statfile)));
+ 		unlink(tmpfile);
+ 	}
+ 
+ 	if (permanent)
+ 	{
+ 		elog(DEBUG1, "removing temporary stat file '%s'", tmpfile);
+ 		unlink(tmpfile);
+ 	}
+ }
+ 
+ 
+ /* ----------
+  * pgstat_write_db_dummyfile() -
+  *
+  *	All this does is write a dummy stat file for databases without a dbentry
+  *	yet. It basically writes just a file header - format ID and a timestamp.
+  * ----------
+  */
+ static void
+ pgstat_write_db_dummyfile(Oid databaseid)
+ {
+ 	FILE	   *fpout;
+ 	int32		format_id;
+ 	int			rc;
+ 	int			tmpfilelen = get_dbstat_file_len(false, true, databaseid);
+ 	char		tmpfile[tmpfilelen];
+ 	int			statfilelen = get_dbstat_file_len(false, false, databaseid);
+ 	char		statfile[statfilelen];
+ 
+ 	get_dbstat_filename(false, true, databaseid, tmpfile, tmpfilelen);
+ 	get_dbstat_filename(false, false, databaseid, statfile, statfilelen);
+ 
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
+ 	/*
+ 	 * Open the statistics temp file to write out the current values.
+ 	 */
+ 	fpout = AllocateFile(tmpfile, PG_BINARY_W);
+ 	if (fpout == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not open temporary statistics file \"%s\": %m",
+ 						tmpfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Write the file header --- currently just a format ID.
+ 	 */
+ 	format_id = PGSTAT_FILE_FORMAT_ID;
+ 	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * Write the timestamp.
+ 	 */
+ 	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * No more output to be done. Close the temp file and replace the old
+ 	 * pgstat.stat with it.  The ferror() check replaces testing for error
+ 	 * after each individual fputc or fwrite above.
+ 	 */
+ 	fputc('E', fpout);
+ 
+ 	if (ferror(fpout))
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not write temporary dummy statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		FreeFile(fpout);
+ 		unlink(tmpfile);
+ 	}
+ 	else if (FreeFile(fpout) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not close temporary dummy statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		unlink(tmpfile);
+ 	}
+ 	else if (rename(tmpfile, statfile) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not rename temporary dummy statistics file \"%s\" to \"%s\": %m",
+ 						tmpfile, statfile)));
+ 		unlink(tmpfile);
+ 	}
+ 
+ }
  
  /* ----------
   * pgstat_read_statsfile() -
   *
   *	Reads in an existing statistics collector file and initializes the
   *	databases' hash table (whose entries point to the tables' hash tables).
+  *
+  *	Allows reading only the global stats (at database level), which is just
+  *	enough for many purposes (e.g. autovacuum launcher etc.). If this is
+  *	sufficient for you, use onlydbs=true.
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
  	HASHCTL		hash_ctl;
  	HTAB	   *dbhash;
  	HTAB	   *tabhash = NULL;
***************
*** 3613,3618 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
--- 3910,3920 ----
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  
  	/*
+ 	 * If we want db-level stats only, we must not ask for a particular db.
+ 	 */
+ 	Assert(!((onlydb != InvalidOid) && onlydbs));
+ 
+ 	/*
  	 * The tables will live in pgStatLocalContext.
  	 */
  	pgstat_setup_memcxt();
***************
*** 3758,3763 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
--- 4060,4075 ----
  				 */
  				tabhash = dbentry->tables;
  				funchash = dbentry->functions;
+ 
+ 				/*
+ 				 * Read the data from the file for this database. If there was
+ 				 * onlydb specified (!= InvalidOid), we would not get here because
+ 				 * of a break above. So we don't need to recheck.
+ 				 */
+ 				if (!onlydbs)
+ 					pgstat_read_db_statsfile(dbentry->databaseid, tabhash, funchash,
+ 											permanent);
+ 
  				break;
  
  				/*
***************
*** 3768,3773 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
--- 4080,4177 ----
  				funchash = NULL;
  				break;
  
+ 			case 'E':
+ 				goto done;
+ 
+ 			default:
+ 				ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 						(errmsg("corrupted statistics file \"%s\"",
+ 								statfile)));
+ 				goto done;
+ 		}
+ 	}
+ 
+ done:
+ 	FreeFile(fpin);
+ 
+ 	if (permanent)
+ 		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+ 
+ 	return dbhash;
+ }
+ 
+ 
+ /* ----------
+  * pgstat_read_db_statsfile() -
+  *
+  *	Reads in an existing statistics collector db file and initializes the
+  *	tables and functions hash tables (for the database identified by Oid).
+  * ----------
+  */
+ static void
+ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+ {
+ 	PgStat_StatTabEntry *tabentry;
+ 	PgStat_StatTabEntry tabbuf;
+ 	PgStat_StatFuncEntry funcbuf;
+ 	PgStat_StatFuncEntry *funcentry;
+ 	FILE	   *fpin;
+ 	int32		format_id;
+ 	TimestampTz timestamp;
+ 	bool		found;
+ 	int			statfilelen = get_dbstat_file_len(permanent, false, databaseid);
+ 	char		statfile[statfilelen];
+ 
+ 	get_dbstat_filename(permanent, false, databaseid, statfile, statfilelen);
+ 
+ 	/*
+ 	 * Try to open the status file. If it doesn't exist, the backends simply
+ 	 * return zero for anything and the collector simply starts from scratch
+ 	 * with empty counters.
+ 	 *
+ 	 * ENOENT is a possibility if the stats collector is not running or has
+ 	 * not yet written the stats file the first time.  Any other failure
+ 	 * condition is suspicious.
+ 	 */
+ 	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+ 	{
+ 		if (errno != ENOENT)
+ 			ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 					(errcode_for_file_access(),
+ 					 errmsg("could not open statistics file \"%s\": %m",
+ 							statfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Verify it's of the expected format.
+ 	 */
+ 	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+ 		|| format_id != PGSTAT_FILE_FORMAT_ID)
+ 	{
+ 		ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 				(errmsg("corrupted statistics file \"%s\"", statfile)));
+ 		goto done;
+ 	}
+ 
+ 	/*
+ 	 * Read the stats file timestamp
+ 	 */
+ 	if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+ 	{
+ 		ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 				(errmsg("corrupted statistics file \"%s\"", statfile)));
+ 		goto done;
+ 	}
+ 
+ 	/*
+ 	 * We found an existing collector stats file. Read it and put all the
+ 	 * hashtable entries into place.
+ 	 */
+ 	for (;;)
+ 	{
+ 		switch (fgetc(fpin))
+ 		{
  				/*
  				 * 'T'	A PgStat_StatTabEntry follows.
  				 */
***************
*** 3854,3878 **** done:
  	FreeFile(fpin);
  
  	if (permanent)
! 		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  
! 	return dbhash;
  }
  
  /* ----------
!  * pgstat_read_statsfile_timestamp() -
   *
!  *	Attempt to fetch the timestamp of an existing stats file.
   *	Returns TRUE if successful (timestamp is stored at *ts).
   * ----------
   */
  static bool
! pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  {
! 	PgStat_GlobalStats myGlobalStats;
  	FILE	   *fpin;
  	int32		format_id;
! 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  
  	/*
  	 * Try to open the status file.  As above, anything but ENOENT is worthy
--- 4258,4294 ----
  	FreeFile(fpin);
  
  	if (permanent)
! 	{
! 		int		statfilelen = get_dbstat_file_len(permanent, false, databaseid);
! 		char	statfile[statfilelen];
  
! 		get_dbstat_filename(permanent, false, databaseid, statfile, statfilelen);
! 
! 		elog(DEBUG1, "removing permanent stats file '%s'", statfile);
! 		unlink(statfile);
! 	}
! 
! 	return;
  }
  
+ 
  /* ----------
!  * pgstat_read_db_statsfile_timestamp() -
   *
!  *	Attempt to fetch the timestamp of an existing stats file (for a DB).
   *	Returns TRUE if successful (timestamp is stored at *ts).
   * ----------
   */
  static bool
! pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz *ts)
  {
! 	TimestampTz timestamp;
  	FILE	   *fpin;
  	int32		format_id;
! 	int			filenamelen = get_dbstat_file_len(permanent, false, databaseid);
! 	char		statfile[filenamelen];
! 
! 	get_dbstat_filename(permanent, false, databaseid, statfile, filenamelen);
  
  	/*
  	 * Try to open the status file.  As above, anything but ENOENT is worthy
***************
*** 3903,3909 **** pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  	/*
  	 * Read global stats struct
  	 */
! 	if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != sizeof(myGlobalStats))
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
--- 4319,4325 ----
  	/*
  	 * Read global stats struct
  	 */
! 	if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != sizeof(TimestampTz))
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
***************
*** 3911,3917 **** pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  		return false;
  	}
  
! 	*ts = myGlobalStats.stats_timestamp;
  
  	FreeFile(fpin);
  	return true;
--- 4327,4333 ----
  		return false;
  	}
  
! 	*ts = timestamp;
  
  	FreeFile(fpin);
  	return true;
***************
*** 3947,3953 **** backend_read_statsfile(void)
  
  		CHECK_FOR_INTERRUPTS();
  
! 		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
  
  		cur_ts = GetCurrentTimestamp();
  		/* Calculate min acceptable timestamp, if we didn't already */
--- 4363,4369 ----
  
  		CHECK_FOR_INTERRUPTS();
  
! 		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
  
  		cur_ts = GetCurrentTimestamp();
  		/* Calculate min acceptable timestamp, if we didn't already */
***************
*** 4006,4012 **** backend_read_statsfile(void)
  				pfree(mytime);
  			}
  
! 			pgstat_send_inquiry(cur_ts, min_ts);
  			break;
  		}
  
--- 4422,4428 ----
  				pfree(mytime);
  			}
  
! 			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
  			break;
  		}
  
***************
*** 4016,4022 **** backend_read_statsfile(void)
  
  		/* Not there or too old, so kick the collector and wait a bit */
  		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
! 			pgstat_send_inquiry(cur_ts, min_ts);
  
  		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
--- 4432,4438 ----
  
  		/* Not there or too old, so kick the collector and wait a bit */
  		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
! 			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
  
  		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
***************
*** 4026,4034 **** backend_read_statsfile(void)
  
  	/* Autovacuum launcher wants stats about all databases */
  	if (IsAutoVacuumLauncherProcess())
! 		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
  	else
! 		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
  }
  
  
--- 4442,4457 ----
  
  	/* Autovacuum launcher wants stats about all databases */
  	if (IsAutoVacuumLauncherProcess())
! 		/*
! 		 * FIXME Does it really need info including tables/functions? Or is it enough to read
! 		 * database-level stats? It seems to me the launcher needs PgStat_StatDBEntry only
! 		 * (at least that's how I understand the rebuild_database_list() in autovacuum.c),
! 		 * because pgstat_stattabentries are used in do_autovacuum() only, and that's what's
! 		 * executed in workers ... So maybe we'd be just fine by reading in the dbentries?
! 		 */
! 		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, true);
  	else
! 		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, false);
  }
  
  
***************
*** 4084,4109 **** pgstat_clear_snapshot(void)
  static void
  pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  {
  	/*
! 	 * Advance last_statrequest if this requestor has a newer cutoff time
! 	 * than any previous request.
  	 */
! 	if (msg->cutoff_time > last_statrequest)
! 		last_statrequest = msg->cutoff_time;
  
  	/*
! 	 * If the requestor's local clock time is older than last_statwrite, we
  	 * should suspect a clock glitch, ie system time going backwards; though
  	 * the more likely explanation is just delayed message receipt.  It is
  	 * worth expending a GetCurrentTimestamp call to be sure, since a large
  	 * retreat in the system clock reading could otherwise cause us to neglect
  	 * to update the stats file for a long time.
  	 */
! 	if (msg->clock_time < last_statwrite)
  	{
  		TimestampTz cur_ts = GetCurrentTimestamp();
  
! 		if (cur_ts < last_statwrite)
  		{
  			/*
  			 * Sure enough, time went backwards.  Force a new stats file write
--- 4507,4559 ----
  static void
  pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  {
+ 	slist_iter	iter;
+ 	bool		found = false;
+ 	DBWriteRequest *newreq;
+ 	PgStat_StatDBEntry *dbentry;
+ 
+ 	elog(DEBUG1, "received inquiry for %u", msg->databaseid);
+ 
+ 	/*
+ 	 * Find the last write request for this DB (found=true in that case). Plain
+ 	 * linear search, not really worth doing any magic here (probably).
+ 	 */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest *req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		if (req->databaseid != msg->databaseid)
+ 			continue;
+ 
+ 		if (msg->cutoff_time > req->request_time)
+ 			req->request_time = msg->cutoff_time;
+ 		found = true;
+ 		return;
+ 	}
+ 
  	/*
! 	 * There's no request for this DB yet, so create one.
  	 */
! 	newreq = palloc(sizeof(DBWriteRequest));
! 
! 	newreq->databaseid = msg->databaseid;
! 	newreq->request_time = msg->clock_time;
! 	slist_push_head(&last_statrequests, &newreq->next);
  
  	/*
! 	 * If the requestor's local clock time is older than stats_timestamp, we
  	 * should suspect a clock glitch, ie system time going backwards; though
  	 * the more likely explanation is just delayed message receipt.  It is
  	 * worth expending a GetCurrentTimestamp call to be sure, since a large
  	 * retreat in the system clock reading could otherwise cause us to neglect
  	 * to update the stats file for a long time.
  	 */
! 	dbentry = pgstat_get_db_entry(msg->databaseid, false);
! 	if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
  	{
  		TimestampTz cur_ts = GetCurrentTimestamp();
  
! 		if (cur_ts < dbentry->stats_timestamp)
  		{
  			/*
  			 * Sure enough, time went backwards.  Force a new stats file write
***************
*** 4113,4127 **** pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  			char	   *mytime;
  
  			/* Copy because timestamptz_to_str returns a static buffer */
! 			writetime = pstrdup(timestamptz_to_str(last_statwrite));
  			mytime = pstrdup(timestamptz_to_str(cur_ts));
! 			elog(LOG, "last_statwrite %s is later than collector's time %s",
! 				 writetime, mytime);
  			pfree(writetime);
  			pfree(mytime);
  
! 			last_statrequest = cur_ts;
! 			last_statwrite = last_statrequest - 1;
  		}
  	}
  }
--- 4563,4578 ----
  			char	   *mytime;
  
  			/* Copy because timestamptz_to_str returns a static buffer */
! 			writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
  			mytime = pstrdup(timestamptz_to_str(cur_ts));
! 			elog(LOG,
! 				 "stats_timestamp %s is later than collector's time %s for db %u",
! 				 writetime, mytime, dbentry->databaseid);
  			pfree(writetime);
  			pfree(mytime);
  
! 			newreq->request_time = cur_ts;
! 			dbentry->stats_timestamp = cur_ts - 1;
  		}
  	}
  }
***************
*** 4270,4298 **** pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len)
  static void
  pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
  {
  	PgStat_StatDBEntry *dbentry;
  
  	/*
  	 * Lookup the database in the hashtable.
  	 */
! 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
  
  	/*
! 	 * If found, remove it.
  	 */
  	if (dbentry)
  	{
  		if (dbentry->tables != NULL)
  			hash_destroy(dbentry->tables);
  		if (dbentry->functions != NULL)
  			hash_destroy(dbentry->functions);
  
  		if (hash_search(pgStatDBHash,
! 						(void *) &(dbentry->databaseid),
  						HASH_REMOVE, NULL) == NULL)
  			ereport(ERROR,
! 					(errmsg("database hash table corrupted "
! 							"during cleanup --- abort")));
  	}
  }
  
--- 4721,4757 ----
  static void
  pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
  {
+ 	Oid			dbid = msg->m_databaseid;
  	PgStat_StatDBEntry *dbentry;
  
  	/*
  	 * Lookup the database in the hashtable.
  	 */
! 	dbentry = pgstat_get_db_entry(dbid, false);
  
  	/*
! 	 * If found, remove it (along with the db statfile).
  	 */
  	if (dbentry)
  	{
+ 		int			statfilelen = get_dbstat_file_len(true, false, dbid);
+ 		char		statfile[statfilelen];
+ 
+ 		get_dbstat_filename(true, false, dbid, statfile, statfilelen);
+ 
+ 		elog(DEBUG1, "removing %s", statfile);
+ 		unlink(statfile);
+ 
  		if (dbentry->tables != NULL)
  			hash_destroy(dbentry->tables);
  		if (dbentry->functions != NULL)
  			hash_destroy(dbentry->functions);
  
  		if (hash_search(pgStatDBHash,
! 						(void *) &dbid,
  						HASH_REMOVE, NULL) == NULL)
  			ereport(ERROR,
! 					(errmsg("database hash table corrupted during cleanup --- abort")));
  	}
  }
  
***************
*** 4687,4689 **** pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
--- 5146,5206 ----
  						   HASH_REMOVE, NULL);
  	}
  }
+ 
+ /* ----------
+  * pgstat_write_statsfile_needed() -
+  *
+  *	Checks whether there's a db stats request, requiring a file write.
+  *
+  *	TODO It seems that, thanks to the way we handle last_statrequests (erased
+  *	after a write), this is unnecessary. Just check that there's at least one
+  *	request and you're done. Although there might be delayed requests ...
+  * ----------
+  */
+ static bool
+ pgstat_write_statsfile_needed(void)
+ {
+ 	PgStat_StatDBEntry *dbentry;
+ 	slist_iter	iter;
+ 
+ 	/* Check the databases if they need to refresh the stats. */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest *req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		dbentry = pgstat_get_db_entry(req->databaseid, false);
+ 
+ 		/* No dbentry yet or too old. */
+ 		if (!dbentry || (dbentry->stats_timestamp < req->request_time))
+ 		{
+ 			return true;
+ 		}
+ 	}
+ 
+ 	/* Well, everything was written recently ... */
+ 	return false;
+ }
+ 
+ /* ----------
+  * pgstat_db_requested() -
+  *
+  *	Checks whether stats for a particular DB need to be written to a file.
+  * ----------
+  */
+ 
+ static bool
+ pgstat_db_requested(Oid databaseid)
+ {
+ 	slist_iter	iter;
+ 
+ 	/* Check the databases if they need to refresh the stats. */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest	*req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		if (req->databaseid == databaseid)
+ 			return true;
+ 	}
+ 
+ 	return false;
+ }
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 8704,8717 **** static void
  assign_pgstat_temp_directory(const char *newval, void *extra)
  {
  	/* check_canonical_path already canonicalized newval for us */
  	char	   *tname;
  	char	   *fname;
  
! 	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /pgstat.tmp */
! 	sprintf(tname, "%s/pgstat.tmp", newval);
! 	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /pgstat.stat */
! 	sprintf(fname, "%s/pgstat.stat", newval);
  
  	if (pgstat_stat_tmpname)
  		free(pgstat_stat_tmpname);
  	pgstat_stat_tmpname = tname;
--- 8704,8726 ----
  assign_pgstat_temp_directory(const char *newval, void *extra)
  {
  	/* check_canonical_path already canonicalized newval for us */
+ 	char	   *dname;
  	char	   *tname;
  	char	   *fname;
  
! 	/* directory */
! 	dname = guc_malloc(ERROR, strlen(newval) + 1);		/* runtime dir */
! 	sprintf(dname, "%s", newval);
  
+ 	/* global stats */
+ 	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /global.tmp */
+ 	sprintf(tname, "%s/global.tmp", newval);
+ 	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /global.stat */
+ 	sprintf(fname, "%s/global.stat", newval);
+ 
+ 	if (pgstat_stat_directory)
+ 		free(pgstat_stat_directory);
+ 	pgstat_stat_directory = dname;
  	if (pgstat_stat_tmpname)
  		free(pgstat_stat_tmpname);
  	pgstat_stat_tmpname = tname;
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
***************
*** 192,197 **** const char *subdirs[] = {
--- 192,198 ----
  	"base",
  	"base/1",
  	"pg_tblspc",
+ 	"pg_stat",
  	"pg_stat_tmp"
  };
  
*** a/src/include/pgstat.h
--- b/src/include/pgstat.h
***************
*** 205,210 **** typedef struct PgStat_MsgInquiry
--- 205,211 ----
  	PgStat_MsgHdr m_hdr;
  	TimestampTz clock_time;		/* observed local clock time */
  	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+ 	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
  } PgStat_MsgInquiry;
  
  
***************
*** 514,520 **** typedef union PgStat_Msg
   * ------------------------------------------------------------
   */
  
! #define PGSTAT_FILE_FORMAT_ID	0x01A5BC9A
  
  /* ----------
   * PgStat_StatDBEntry			The collector's data per database
--- 515,521 ----
   * ------------------------------------------------------------
   */
  
! #define PGSTAT_FILE_FORMAT_ID	0xA240CA47
  
  /* ----------
   * PgStat_StatDBEntry			The collector's data per database
***************
*** 545,550 **** typedef struct PgStat_StatDBEntry
--- 546,552 ----
  	PgStat_Counter n_block_write_time;
  
  	TimestampTz stat_reset_timestamp;
+ 	TimestampTz stats_timestamp;		/* time of db stats file update */
  
  	/*
  	 * tables and functions must be last in the struct, because we don't write
***************
*** 722,727 **** extern bool pgstat_track_activities;
--- 724,730 ----
  extern bool pgstat_track_counts;
  extern int	pgstat_track_functions;
  extern PGDLLIMPORT int pgstat_track_activity_query_size;
+ extern char *pgstat_stat_directory;
  extern char *pgstat_stat_tmpname;
  extern char *pgstat_stat_filename;
  
#46Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#45)
1 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Here's a ninth version of this patch (version 8 went unpublished). I
have simplified a lot of things and improved some comments; I think I
understand much of it now. I think this patch is fairly close to
committable, but one issue remains, which is this bit in
pgstat_write_statsfiles():

    /* In any case, we can just throw away all the db requests, but we need to
     * write dummy files for databases without a stat entry (it would cause
     * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
     * This may happen e.g. for shared DB (oid = 0) right after initdb.
     */
    if (!slist_is_empty(&last_statrequests))
    {
        slist_mutable_iter iter;

        slist_foreach_modify(iter, &last_statrequests)
        {
            DBWriteRequest *req = slist_container(DBWriteRequest, next,
                                                  iter.cur);

            /*
             * Create dummy files for requested databases without a proper
             * dbentry. It's much easier this way than dealing with multiple
             * timestamps, possibly existing but not yet written DBs etc.
             */
            if (!pgstat_get_db_entry(req->databaseid, false))
                pgstat_write_db_dummyfile(req->databaseid);

            pfree(req);
        }

        slist_init(&last_statrequests);
    }

The problem here is that creating these dummy entries will cause a
difference in autovacuum behavior. Autovacuum will skip processing
databases with no pgstat entry, and the intended reason is that if
there's no pgstat entry it's because the database doesn't have enough
activity. Now perhaps we want to change that, but it should be an
explicit decision taken after discussion and thought, not a side effect
of an unrelated patch.
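
To make the concern concrete, here is a minimal sketch of the skipping rule described above. It is a simplified model, not the code in autovacuum.c: SketchDbEntry, has_stats_entry and sketch_count_candidates are hypothetical names invented for illustration, and the only assumption modelled is that a database without a stats entry is never considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for what the launcher sees per database. */
typedef struct SketchDbEntry
{
	unsigned int databaseid;		/* database OID */
	bool		has_stats_entry;	/* does a stats entry exist for this DB? */
} SketchDbEntry;

/*
 * Count the databases a launcher-like scheduler would consider, assuming it
 * skips every database that has no stats entry.
 */
static size_t
sketch_count_candidates(const SketchDbEntry *dbs, size_t ndbs)
{
	size_t		n = 0;
	size_t		i;

	assert(dbs != NULL || ndbs == 0);

	for (i = 0; i < ndbs; i++)
	{
		if (dbs[i].has_stats_entry)
			n++;
	}

	return n;
}
```

In this model, anything that makes an idle database look as if it had an entry raises the candidate count, which is the kind of behavior change that should not slip in unnoticed.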

Hm, and I now also realize another bug in this patch: the global stats
only include database entries for requested databases; but perhaps the
existing files can serve later requestors just fine for databases that
already had files; so the global stats file should continue to carry
entries for them, with the old timestamps.
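
A minimal sketch of the rule I have in mind, assuming a per-database file can serve a requestor whenever its timestamp is not older than the requestor's cutoff. SketchTimestamp and both function names are hypothetical stand-ins for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t SketchTimestamp;	/* stand-in for TimestampTz */

/*
 * An existing per-database file is good enough for a requestor if it was
 * written at or after the requestor's cutoff time.
 */
static bool
sketch_file_is_fresh_enough(SketchTimestamp file_ts, SketchTimestamp cutoff_time)
{
	return file_ts >= cutoff_time;
}

/*
 * Timestamp the global file should carry for a database after a write pass:
 * the current time if the per-database file was rewritten, otherwise the
 * timestamp of the file that is already on disk.
 */
static SketchTimestamp
sketch_global_entry_timestamp(SketchTimestamp old_file_ts, bool rewritten,
							  SketchTimestamp now)
{
	assert(now >= old_file_ts);
	return rewritten ? now : old_file_ts;
}
```

With this rule, a database that was not requested in the current pass keeps its entry in the global file, and a later requestor only triggers a rewrite when the old timestamp falls short of its cutoff.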

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

stats-split-v9.patch (text/x-diff; charset=us-ascii)
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 38,43 ****
--- 38,44 ----
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***************
*** 66,73 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
--- 67,75 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* ----------
   * Timer definitions.
***************
*** 115,120 **** int			pgstat_track_activity_query_size = 1024;
--- 117,124 ----
   * Built from GUC parameter
   * ----------
   */
+ char	   *pgstat_stat_directory = NULL;
+ int			pgstat_stat_dbfile_maxlen = 0;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
  
***************
*** 219,229 **** static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
  
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 223,238 ----
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request times from backends */
! static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***************
*** 252,262 **** static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 261,277 ----
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfiles(bool permanent, bool allDbs);
! static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
! static void pgstat_write_db_dummyfile(Oid databaseid);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool deep);
! static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
+ static bool pgstat_write_statsfile_needed(void);
+ static bool pgstat_db_requested(Oid databaseid);
+ 
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
***************
*** 285,291 **** static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
  static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
  static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
  
- 
  /* ------------------------------------------------------------
   * Public functions called from postmaster follow
   * ------------------------------------------------------------
--- 300,305 ----
***************
*** 541,556 **** startup_failed:
  }
  
  /*
   * pgstat_reset_all() -
   *
!  * Remove the stats file.  This is currently used only if WAL
   * recovery is needed after a crash.
   */
  void
  pgstat_reset_all(void)
  {
! 	unlink(pgstat_stat_filename);
! 	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
  
  #ifdef EXEC_BACKEND
--- 555,594 ----
  }
  
  /*
+  * subroutine for pgstat_reset_all
+  */
+ static void
+ pgstat_reset_remove_files(const char *directory)
+ {
+ 	DIR * dir;
+ 	struct dirent * entry;
+ 	char	fname[MAXPGPATH];
+ 
+ 	dir = AllocateDir(pgstat_stat_directory);
+ 	while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+ 	{
+ 		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+ 			continue;
+ 
+ 		snprintf(fname, MAXPGPATH, "%s/%s", pgstat_stat_directory,
+ 				 entry->d_name);
+ 		unlink(fname);
+ 	}
+ 	FreeDir(dir);
+ }
+ 
+ /*
   * pgstat_reset_all() -
   *
!  * Remove the stats files.  This is currently used only if WAL
   * recovery is needed after a crash.
   */
  void
  pgstat_reset_all(void)
  {
! 
! 	pgstat_reset_remove_files(pgstat_stat_directory);
! 	pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
  }
  
  #ifdef EXEC_BACKEND
***************
*** 1408,1420 **** pgstat_ping(void)
   * ----------
   */
  static void
! pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
  {
  	PgStat_MsgInquiry msg;
  
  	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
  	msg.clock_time = clock_time;
  	msg.cutoff_time = cutoff_time;
  	pgstat_send(&msg, sizeof(msg));
  }
  
--- 1446,1459 ----
   * ----------
   */
  static void
! pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
  {
  	PgStat_MsgInquiry msg;
  
  	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
  	msg.clock_time = clock_time;
  	msg.cutoff_time = cutoff_time;
+ 	msg.databaseid = databaseid;
  	pgstat_send(&msg, sizeof(msg));
  }
  
***************
*** 3004,3009 **** PgstatCollectorMain(int argc, char *argv[])
--- 3043,3049 ----
  	int			len;
  	PgStat_Msg	msg;
  	int			wr;
+ 	bool		first_write = true;
  
  	IsUnderPostmaster = true;	/* we are a postmaster subprocess now */
  
***************
*** 3053,3069 **** PgstatCollectorMain(int argc, char *argv[])
  	init_ps_display("stats collector process", "", "", "");
  
  	/*
- 	 * Arrange to write the initial status file right away
- 	 */
- 	last_statrequest = GetCurrentTimestamp();
- 	last_statwrite = last_statrequest - 1;
- 
- 	/*
  	 * Read in an existing statistics stats file or initialize the stats to
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
  
  	/*
  	 * Loop to process messages until we get SIGQUIT or detect ungraceful
--- 3093,3103 ----
  	init_ps_display("stats collector process", "", "", "");
  
  	/*
  	 * Read in an existing statistics stats file or initialize the stats to
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, true);
  
  	/*
  	 * Loop to process messages until we get SIGQUIT or detect ungraceful
***************
*** 3107,3116 **** PgstatCollectorMain(int argc, char *argv[])
  
  			/*
  			 * Write the stats file if a new request has arrived that is not
! 			 * satisfied by existing file.
  			 */
! 			if (last_statwrite < last_statrequest)
! 				pgstat_write_statsfile(false);
  
  			/*
  			 * Try to receive and process a message.  This will not block,
--- 3141,3154 ----
  
  			/*
  			 * Write the stats file if a new request has arrived that is not
! 			 * satisfied by existing file (force writing all files if it's
! 			 * the first write after startup).
  			 */
! 			if (first_write || pgstat_write_statsfile_needed())
! 			{
! 				pgstat_write_statsfiles(false, first_write);
! 				first_write = false;
! 			}
  
  			/*
  			 * Try to receive and process a message.  This will not block,
***************
*** 3269,3275 **** PgstatCollectorMain(int argc, char *argv[])
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile(true);
  
  	exit(0);
  }
--- 3307,3313 ----
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfiles(true, true);
  
  	exit(0);
  }
***************
*** 3349,3354 **** pgstat_get_db_entry(Oid databaseid, bool create)
--- 3387,3393 ----
  		result->n_block_write_time = 0;
  
  		result->stat_reset_timestamp = GetCurrentTimestamp();
+ 		result->stats_timestamp = 0;
  
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
***************
*** 3422,3451 **** pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
  
  
  /* ----------
!  * pgstat_write_statsfile() -
   *
   *	Tell the news.
!  *	If writing to the permanent file (happens when the collector is
!  *	shutting down only), remove the temporary file so that backends
   *	starting up under a new postmaster can't read the old data before
   *	the new collector is ready.
   * ----------
   */
  static void
! pgstat_write_statsfile(bool permanent)
  {
  	HASH_SEQ_STATUS hstat;
- 	HASH_SEQ_STATUS tstat;
- 	HASH_SEQ_STATUS fstat;
  	PgStat_StatDBEntry *dbentry;
- 	PgStat_StatTabEntry *tabentry;
- 	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
  	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  	int			rc;
  
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
--- 3461,3492 ----
  
  
  /* ----------
!  * pgstat_write_statsfiles() -
   *
   *	Tell the news.
!  *	If writing to the permanent files (happens when the collector is
!  *	shutting down only), remove the temporary files so that backends
   *	starting up under a new postmaster can't read the old data before
   *	the new collector is ready.
+  *
+  *	When 'allDbs' is false, only the requested databases (listed in
+  *	last_statrequests) will be written. If 'allDbs' is true, all databases
+  *	will be written.
   * ----------
   */
  static void
! pgstat_write_statsfiles(bool permanent, bool allDbs)
  {
  	HASH_SEQ_STATUS hstat;
  	PgStat_StatDBEntry *dbentry;
  	FILE	   *fpout;
  	int32		format_id;
  	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  	int			rc;
  
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
***************
*** 3484,3523 **** pgstat_write_statsfile(bool permanent)
  	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
  	{
  		/*
! 		 * Write out the DB entry including the number of live backends. We
! 		 * don't write the tables or functions pointers, since they're of no
! 		 * use to any other process.
  		 */
  		fputc('D', fpout);
  		rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
  		(void) rc;				/* we'll check for error with ferror */
- 
- 		/*
- 		 * Walk through the database's access stats per table.
- 		 */
- 		hash_seq_init(&tstat, dbentry->tables);
- 		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
- 		{
- 			fputc('T', fpout);
- 			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
- 			(void) rc;			/* we'll check for error with ferror */
- 		}
- 
- 		/*
- 		 * Walk through the database's function stats table.
- 		 */
- 		hash_seq_init(&fstat, dbentry->functions);
- 		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
- 		{
- 			fputc('F', fpout);
- 			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
- 			(void) rc;			/* we'll check for error with ferror */
- 		}
- 
- 		/*
- 		 * Mark the end of this DB
- 		 */
- 		fputc('d', fpout);
  	}
  
  	/*
--- 3525,3550 ----
  	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
  	{
  		/*
! 		 * Write out the tables and functions into a separate file, if
! 		 * required.
! 		 *
! 		 * We need to do this before the dbentry write, to ensure the
! 		 * timestamps written to both are consistent.
! 		 */
! 		if (allDbs || pgstat_db_requested(dbentry->databaseid))
! 		{
! 			elog(DEBUG1, "writing statsfile for DB %u", dbentry->databaseid);
! 			dbentry->stats_timestamp = globalStats.stats_timestamp;
! 			pgstat_write_db_statsfile(dbentry, permanent);
! 		}
! 
! 		/*
! 		 * Write out the DB entry. We don't write the tables or functions
! 		 * pointers, since they're of no use to any other process.
  		 */
  		fputc('D', fpout);
  		rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
  		(void) rc;				/* we'll check for error with ferror */
  	}
  
  	/*
***************
*** 3527,3532 **** pgstat_write_statsfile(bool permanent)
--- 3554,3587 ----
  	 */
  	fputc('E', fpout);
  
+ 	/* In any case, we can just throw away all the db requests, but we need to
+ 	 * write dummy files for databases without a stat entry (it would cause
+ 	 * issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
+ 	 * This may happen e.g. for shared DB (oid = 0) right after initdb.
+ 	 */
+ 	if (!slist_is_empty(&last_statrequests))
+ 	{
+ 		slist_mutable_iter	iter;
+ 
+ 		slist_foreach_modify(iter, &last_statrequests)
+ 		{
+ 			DBWriteRequest *req = slist_container(DBWriteRequest, next,
+ 												  iter.cur);
+ 
+ 			/*
+ 			 * Create dummy files for requested databases without a proper
+ 			 * dbentry. It's much easier this way than dealing with multiple
+ 			 * timestamps, possibly existing but not yet written DBs etc.
+ 			 */
+ 			if (!pgstat_get_db_entry(req->databaseid, false))
+ 				pgstat_write_db_dummyfile(req->databaseid);
+ 
+ 			pfree(req);
+ 		}
+ 
+ 		slist_init(&last_statrequests);
+ 	}
+ 
  	if (ferror(fpout))
  	{
  		ereport(LOG,
***************
*** 3552,3612 **** pgstat_write_statsfile(bool permanent)
  						tmpfile, statfile)));
  		unlink(tmpfile);
  	}
- 	else
- 	{
- 		/*
- 		 * Successful write, so update last_statwrite.
- 		 */
- 		last_statwrite = globalStats.stats_timestamp;
- 
- 		/*
- 		 * If there is clock skew between backends and the collector, we could
- 		 * receive a stats request time that's in the future.  If so, complain
- 		 * and reset last_statrequest.	Resetting ensures that no inquiry
- 		 * message can cause more than one stats file write to occur.
- 		 */
- 		if (last_statrequest > last_statwrite)
- 		{
- 			char	   *reqtime;
- 			char	   *mytime;
- 
- 			/* Copy because timestamptz_to_str returns a static buffer */
- 			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
- 			mytime = pstrdup(timestamptz_to_str(last_statwrite));
- 			elog(LOG, "last_statrequest %s is later than collector's time %s",
- 				 reqtime, mytime);
- 			pfree(reqtime);
- 			pfree(mytime);
- 
- 			last_statrequest = last_statwrite;
- 		}
- 	}
  
  	if (permanent)
  		unlink(pgstat_stat_filename);
  }
  
  
  /* ----------
   * pgstat_read_statsfile() -
   *
   *	Reads in an existing statistics collector file and initializes the
!  *	databases' hash table (whose entries point to the tables' hash tables).
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb, bool permanent)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
- 	PgStat_StatTabEntry *tabentry;
- 	PgStat_StatTabEntry tabbuf;
- 	PgStat_StatFuncEntry funcbuf;
- 	PgStat_StatFuncEntry *funcentry;
  	HASHCTL		hash_ctl;
  	HTAB	   *dbhash;
- 	HTAB	   *tabhash = NULL;
- 	HTAB	   *funchash = NULL;
  	FILE	   *fpin;
  	int32		format_id;
  	bool		found;
--- 3607,3855 ----
  						tmpfile, statfile)));
  		unlink(tmpfile);
  	}
  
  	if (permanent)
  		unlink(pgstat_stat_filename);
  }
  
+ /*
+  * return the filename for a DB stat file; filename is the output buffer,
+  * of length len.
+  */
+ static void
+ get_dbstat_filename(bool permanent, bool tempname, Oid databaseid,
+ 					char *filename, int len)
+ {
+ 	int		printed;
+ 
+ 	printed = snprintf(filename, len, "%s/db_%u.%s",
+ 					   permanent ? "pg_stat" : pgstat_stat_directory,
+ 					   databaseid,
+ 					   tempname ? "tmp" : "stat");
+ 	if (printed >= len)
+ 		elog(ERROR, "overlength pgstat path");
+ }
+ 
+ /* ----------
+  * pgstat_write_db_statsfile() -
+  *
+  *	Tell the news. This writes the stats file for a single database.
+  *
+  *	If writing to the permanent file (happens when the collector is
+  *	shutting down only), remove the temporary file so that backends
+  *	starting up under a new postmaster can't read the old data before
+  *	the new collector is ready.
+  * ----------
+  */
+ static void
+ pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+ {
+ 	HASH_SEQ_STATUS tstat;
+ 	HASH_SEQ_STATUS fstat;
+ 	PgStat_StatTabEntry *tabentry;
+ 	PgStat_StatFuncEntry *funcentry;
+ 	FILE	   *fpout;
+ 	int32		format_id;
+ 	Oid			dbid = dbentry->databaseid;
+ 	int			rc;
+ 	char		tmpfile[MAXPGPATH];
+ 	char		statfile[MAXPGPATH];
+ 
+ 	get_dbstat_filename(permanent, true, dbid, tmpfile, MAXPGPATH);
+ 	get_dbstat_filename(permanent, false, dbid, statfile, MAXPGPATH);
+ 
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
+ 	/*
+ 	 * Open the statistics temp file to write out the current values.
+ 	 */
+ 	fpout = AllocateFile(tmpfile, PG_BINARY_W);
+ 	if (fpout == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not open temporary statistics file \"%s\": %m",
+ 						tmpfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Write the file header --- currently just a format ID.
+ 	 */
+ 	format_id = PGSTAT_FILE_FORMAT_ID;
+ 	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * Write the timestamp.
+ 	 */
+ 	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * Walk through the database's access stats per table.
+ 	 */
+ 	hash_seq_init(&tstat, dbentry->tables);
+ 	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+ 	{
+ 		fputc('T', fpout);
+ 		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+ 		(void) rc;			/* we'll check for error with ferror */
+ 	}
+ 
+ 	/*
+ 	 * Walk through the database's function stats table.
+ 	 */
+ 	hash_seq_init(&fstat, dbentry->functions);
+ 	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+ 	{
+ 		fputc('F', fpout);
+ 		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+ 		(void) rc;			/* we'll check for error with ferror */
+ 	}
+ 
+ 	/*
+ 	 * No more output to be done. Close the temp file and replace the old
+ 	 * per-database stats file with it.  The ferror() check replaces testing
+ 	 * for error after each individual fputc or fwrite above.
+ 	 */
+ 	fputc('E', fpout);
+ 
+ 	if (ferror(fpout))
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not write temporary statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		FreeFile(fpout);
+ 		unlink(tmpfile);
+ 	}
+ 	else if (FreeFile(fpout) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not close temporary statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		unlink(tmpfile);
+ 	}
+ 	else if (rename(tmpfile, statfile) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+ 						tmpfile, statfile)));
+ 		unlink(tmpfile);
+ 	}
+ 
+ 	if (permanent)
+ 	{
+ 		get_dbstat_filename(false, false, dbid, tmpfile, MAXPGPATH);
+ 
+ 		elog(DEBUG1, "removing temporary stat file '%s'", tmpfile);
+ 		unlink(tmpfile);
+ 	}
+ }
+ 
+ 
+ /* ----------
+  * pgstat_write_db_dummyfile() -
+  *
+  *	All this does is write a dummy stats file for databases that have no
+  *	dbentry yet. It writes just a file header: the format ID and a timestamp.
+  * ----------
+  */
+ static void
+ pgstat_write_db_dummyfile(Oid databaseid)
+ {
+ 	FILE	   *fpout;
+ 	int32		format_id;
+ 	int			rc;
+ 	char		tmpfile[MAXPGPATH];
+ 	char		statfile[MAXPGPATH];
+ 
+ 	get_dbstat_filename(false, true, databaseid, tmpfile, MAXPGPATH);
+ 	get_dbstat_filename(false, false, databaseid, statfile, MAXPGPATH);
+ 
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
+ 	/*
+ 	 * Open the statistics temp file to write out the current values.
+ 	 */
+ 	fpout = AllocateFile(tmpfile, PG_BINARY_W);
+ 	if (fpout == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not open temporary statistics file \"%s\": %m",
+ 						tmpfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Write the file header --- currently just a format ID.
+ 	 */
+ 	format_id = PGSTAT_FILE_FORMAT_ID;
+ 	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * Write the timestamp.
+ 	 */
+ 	rc = fwrite(&(globalStats.stats_timestamp), sizeof(globalStats.stats_timestamp), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * No more output to be done. Close the temp file and replace the old
+ 	 * per-database stats file with it.  The ferror() check replaces testing
+ 	 * for error after each individual fputc or fwrite above.
+ 	 */
+ 	fputc('E', fpout);
+ 
+ 	if (ferror(fpout))
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not write temporary dummy statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		FreeFile(fpout);
+ 		unlink(tmpfile);
+ 	}
+ 	else if (FreeFile(fpout) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not close temporary dummy statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		unlink(tmpfile);
+ 	}
+ 	else if (rename(tmpfile, statfile) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not rename temporary dummy statistics file \"%s\" to \"%s\": %m",
+ 						tmpfile, statfile)));
+ 		unlink(tmpfile);
+ 	}
+ }
  
  /* ----------
   * pgstat_read_statsfile() -
   *
   *	Reads in an existing statistics collector file and initializes the
!  *	databases' hash table.  If the permanent file name is requested, also
!  *	remove it after reading.
!  *
!  *  If a deep read is requested, table/function stats are read also, otherwise
!  *  the table/function hash tables remain empty.
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb, bool permanent, bool deep)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
  	HASHCTL		hash_ctl;
  	HTAB	   *dbhash;
  	FILE	   *fpin;
  	int32		format_id;
  	bool		found;
***************
*** 3690,3697 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
  		{
  				/*
  				 * 'D'	A PgStat_StatDBEntry struct describing a database
! 				 * follows. Subsequently, zero to many 'T' and 'F' entries
! 				 * will follow until a 'd' is encountered.
  				 */
  			case 'D':
  				if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),
--- 3933,3939 ----
  		{
  				/*
  				 * 'D'	A PgStat_StatDBEntry struct describing a database
! 				 * follows.
  				 */
  			case 'D':
  				if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),
***************
*** 3753,3773 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
  								   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  				/*
! 				 * Arrange that following records add entries to this
! 				 * database's hash tables.
  				 */
! 				tabhash = dbentry->tables;
! 				funchash = dbentry->functions;
! 				break;
  
- 				/*
- 				 * 'd'	End of this database.
- 				 */
- 			case 'd':
- 				tabhash = NULL;
- 				funchash = NULL;
  				break;
  
  				/*
  				 * 'T'	A PgStat_StatTabEntry follows.
  				 */
--- 3995,4111 ----
  								   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  				/*
! 				 * If requested, read the data from the database-specific file.
! 				 * If there was onlydb specified (!= InvalidOid), we would not
! 				 * get here because of a break above. So we don't need to
! 				 * recheck.
  				 */
! 				if (deep)
! 					pgstat_read_db_statsfile(dbentry->databaseid,
! 											 dbentry->tables,
! 											 dbentry->functions,
! 											 permanent);
  
  				break;
  
+ 			case 'E':
+ 				goto done;
+ 
+ 			default:
+ 				ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 						(errmsg("corrupted statistics file \"%s\"",
+ 								statfile)));
+ 				goto done;
+ 		}
+ 	}
+ 
+ done:
+ 	FreeFile(fpin);
+ 
+ 	if (permanent)
+ 	{
+ 		/*
+ 		 * If requested to read the permanent file, also get rid of it; the
+ 		 * in-memory status is now authoritative, and the permanent file would
+ 		 * be out of date in case somebody else reads it.
+ 		 */
+ 		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+ 	}
+ 
+ 	return dbhash;
+ }
+ 
+ 
+ /* ----------
+  * pgstat_read_db_statsfile() -
+  *
+  *	Reads in an existing statistics collector db file and initializes the
+  *	tables and functions hash tables (for the database identified by Oid).
+  * ----------
+  */
+ static void
+ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+ {
+ 	PgStat_StatTabEntry *tabentry;
+ 	PgStat_StatTabEntry tabbuf;
+ 	PgStat_StatFuncEntry funcbuf;
+ 	PgStat_StatFuncEntry *funcentry;
+ 	FILE	   *fpin;
+ 	int32		format_id;
+ 	TimestampTz timestamp;
+ 	bool		found;
+ 	char		statfile[MAXPGPATH];
+ 
+ 	get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
+ 
+ 	/*
+ 	 * Try to open the status file. If it doesn't exist, the backends simply
+ 	 * return zero for anything and the collector simply starts from scratch
+ 	 * with empty counters.
+ 	 *
+ 	 * ENOENT is a possibility if the stats collector is not running or has
+ 	 * not yet written the stats file the first time.  Any other failure
+ 	 * condition is suspicious.
+ 	 */
+ 	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+ 	{
+ 		if (errno != ENOENT)
+ 			ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 					(errcode_for_file_access(),
+ 					 errmsg("could not open statistics file \"%s\": %m",
+ 							statfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Verify it's of the expected format.
+ 	 */
+ 	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+ 		|| format_id != PGSTAT_FILE_FORMAT_ID)
+ 	{
+ 		ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 				(errmsg("corrupted statistics file \"%s\"", statfile)));
+ 		goto done;
+ 	}
+ 
+ 	/*
+ 	 * Read the stats timestamp.
+ 	 */
+ 	if (fread(&timestamp, 1, sizeof(timestamp), fpin) != sizeof(timestamp))
+ 	{
+ 		ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 				(errmsg("corrupted statistics file \"%s\"", statfile)));
+ 		goto done;
+ 	}
+ 
+ 	/*
+ 	 * We found an existing collector stats file. Read it and put all the
+ 	 * hashtable entries into place.
+ 	 */
+ 	for (;;)
+ 	{
+ 		switch (fgetc(fpin))
+ 		{
  				/*
  				 * 'T'	A PgStat_StatTabEntry follows.
  				 */
***************
*** 3854,3878 **** done:
  	FreeFile(fpin);
  
  	if (permanent)
! 		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  
! 	return dbhash;
  }
  
  /* ----------
!  * pgstat_read_statsfile_timestamp() -
   *
!  *	Attempt to fetch the timestamp of an existing stats file.
   *	Returns TRUE if successful (timestamp is stored at *ts).
   * ----------
   */
  static bool
! pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  {
! 	PgStat_GlobalStats myGlobalStats;
  	FILE	   *fpin;
  	int32		format_id;
! 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  
  	/*
  	 * Try to open the status file.  As above, anything but ENOENT is worthy
--- 4192,4224 ----
  	FreeFile(fpin);
  
  	if (permanent)
! 	{
! 		get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
  
! 		elog(DEBUG1, "removing permanent stats file '%s'", statfile);
! 		unlink(statfile);
! 	}
! 
! 	return;
  }
  
+ 
  /* ----------
!  * pgstat_read_db_statsfile_timestamp() -
   *
!  *	Attempt to fetch the timestamp of an existing stats file (for a DB).
   *	Returns TRUE if successful (timestamp is stored at *ts).
   * ----------
   */
  static bool
! pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz *ts)
  {
! 	TimestampTz timestamp;
  	FILE	   *fpin;
  	int32		format_id;
! 	char		statfile[MAXPGPATH];
! 
! 	get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
  
  	/*
  	 * Try to open the status file.  As above, anything but ENOENT is worthy
***************
*** 3891,3898 **** pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  	/*
  	 * Verify it's of the expected format.
  	 */
! 	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
! 		|| format_id != PGSTAT_FILE_FORMAT_ID)
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
--- 4237,4244 ----
  	/*
  	 * Verify it's of the expected format.
  	 */
! 	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
! 		format_id != PGSTAT_FILE_FORMAT_ID)
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
***************
*** 3903,3909 **** pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  	/*
  	 * Read global stats struct
  	 */
! 	if (fread(&myGlobalStats, 1, sizeof(myGlobalStats), fpin) != sizeof(myGlobalStats))
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
--- 4249,4255 ----
  	/*
  	 * Read global stats struct
  	 */
! 	if (fread(&timestamp, 1, sizeof(TimestampTz), fpin) != sizeof(TimestampTz))
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
***************
*** 3911,3917 **** pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  		return false;
  	}
  
! 	*ts = myGlobalStats.stats_timestamp;
  
  	FreeFile(fpin);
  	return true;
--- 4257,4263 ----
  		return false;
  	}
  
! 	*ts = timestamp;
  
  	FreeFile(fpin);
  	return true;
***************
*** 3947,3953 **** backend_read_statsfile(void)
  
  		CHECK_FOR_INTERRUPTS();
  
! 		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
  
  		cur_ts = GetCurrentTimestamp();
  		/* Calculate min acceptable timestamp, if we didn't already */
--- 4293,4299 ----
  
  		CHECK_FOR_INTERRUPTS();
  
! 		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
  
  		cur_ts = GetCurrentTimestamp();
  		/* Calculate min acceptable timestamp, if we didn't already */
***************
*** 4006,4012 **** backend_read_statsfile(void)
  				pfree(mytime);
  			}
  
! 			pgstat_send_inquiry(cur_ts, min_ts);
  			break;
  		}
  
--- 4352,4358 ----
  				pfree(mytime);
  			}
  
! 			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
  			break;
  		}
  
***************
*** 4016,4022 **** backend_read_statsfile(void)
  
  		/* Not there or too old, so kick the collector and wait a bit */
  		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
! 			pgstat_send_inquiry(cur_ts, min_ts);
  
  		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
--- 4362,4368 ----
  
  		/* Not there or too old, so kick the collector and wait a bit */
  		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
! 			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
  
  		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
***************
*** 4024,4034 **** backend_read_statsfile(void)
  	if (count >= PGSTAT_POLL_LOOP_COUNT)
  		elog(WARNING, "pgstat wait timeout");
  
! 	/* Autovacuum launcher wants stats about all databases */
  	if (IsAutoVacuumLauncherProcess())
! 		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
  	else
! 		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
  }
  
  
--- 4370,4383 ----
  	if (count >= PGSTAT_POLL_LOOP_COUNT)
  		elog(WARNING, "pgstat wait timeout");
  
! 	/*
! 	 * Autovacuum launcher wants stats about all databases, but a shallow
! 	 * read is sufficient.
! 	 */
  	if (IsAutoVacuumLauncherProcess())
! 		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, false);
  	else
! 		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, true);
  }
  
  
***************
*** 4084,4109 **** pgstat_clear_snapshot(void)
  static void
  pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  {
  	/*
! 	 * Advance last_statrequest if this requestor has a newer cutoff time
! 	 * than any previous request.
  	 */
! 	if (msg->cutoff_time > last_statrequest)
! 		last_statrequest = msg->cutoff_time;
  
  	/*
! 	 * If the requestor's local clock time is older than last_statwrite, we
  	 * should suspect a clock glitch, ie system time going backwards; though
  	 * the more likely explanation is just delayed message receipt.  It is
  	 * worth expending a GetCurrentTimestamp call to be sure, since a large
  	 * retreat in the system clock reading could otherwise cause us to neglect
  	 * to update the stats file for a long time.
  	 */
! 	if (msg->clock_time < last_statwrite)
  	{
  		TimestampTz cur_ts = GetCurrentTimestamp();
  
! 		if (cur_ts < last_statwrite)
  		{
  			/*
  			 * Sure enough, time went backwards.  Force a new stats file write
--- 4433,4485 ----
  static void
  pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  {
+ 	slist_iter	iter;
+ 	bool		found = false;
+ 	DBWriteRequest *newreq;
+ 	PgStat_StatDBEntry *dbentry;
+ 
+ 	elog(DEBUG1, "received inquiry for %u", msg->databaseid);
+ 
+ 	/*
+ 	 * Find the last write request for this DB (found=true in that case). Plain
+ 	 * linear search, not really worth doing any magic here (probably).
+ 	 */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest *req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		if (req->databaseid != msg->databaseid)
+ 			continue;
+ 
+ 		if (msg->cutoff_time > req->request_time)
+ 			req->request_time = msg->cutoff_time;
+ 		found = true;
+ 		return;
+ 	}
+ 
  	/*
! 	 * There's no request for this DB yet, so create one.
  	 */
! 	newreq = palloc(sizeof(DBWriteRequest));
! 
! 	newreq->databaseid = msg->databaseid;
! 	newreq->request_time = msg->clock_time;
! 	slist_push_head(&last_statrequests, &newreq->next);
  
  	/*
! 	 * If the requestor's local clock time is older than stats_timestamp, we
  	 * should suspect a clock glitch, ie system time going backwards; though
  	 * the more likely explanation is just delayed message receipt.  It is
  	 * worth expending a GetCurrentTimestamp call to be sure, since a large
  	 * retreat in the system clock reading could otherwise cause us to neglect
  	 * to update the stats file for a long time.
  	 */
! 	dbentry = pgstat_get_db_entry(msg->databaseid, false);
! 	if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
  	{
  		TimestampTz cur_ts = GetCurrentTimestamp();
  
! 		if (cur_ts < dbentry->stats_timestamp)
  		{
  			/*
  			 * Sure enough, time went backwards.  Force a new stats file write
***************
*** 4113,4127 **** pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  			char	   *mytime;
  
  			/* Copy because timestamptz_to_str returns a static buffer */
! 			writetime = pstrdup(timestamptz_to_str(last_statwrite));
  			mytime = pstrdup(timestamptz_to_str(cur_ts));
! 			elog(LOG, "last_statwrite %s is later than collector's time %s",
! 				 writetime, mytime);
  			pfree(writetime);
  			pfree(mytime);
  
! 			last_statrequest = cur_ts;
! 			last_statwrite = last_statrequest - 1;
  		}
  	}
  }
--- 4489,4504 ----
  			char	   *mytime;
  
  			/* Copy because timestamptz_to_str returns a static buffer */
! 			writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
  			mytime = pstrdup(timestamptz_to_str(cur_ts));
! 			elog(LOG,
! 				 "stats_timestamp %s is later than collector's time %s for db %u",
! 				 writetime, mytime, dbentry->databaseid);
  			pfree(writetime);
  			pfree(mytime);
  
! 			newreq->request_time = cur_ts;
! 			dbentry->stats_timestamp = cur_ts - 1;
  		}
  	}
  }
***************
*** 4270,4298 **** pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len)
  static void
  pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
  {
  	PgStat_StatDBEntry *dbentry;
  
  	/*
  	 * Lookup the database in the hashtable.
  	 */
! 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
  
  	/*
! 	 * If found, remove it.
  	 */
  	if (dbentry)
  	{
  		if (dbentry->tables != NULL)
  			hash_destroy(dbentry->tables);
  		if (dbentry->functions != NULL)
  			hash_destroy(dbentry->functions);
  
  		if (hash_search(pgStatDBHash,
! 						(void *) &(dbentry->databaseid),
  						HASH_REMOVE, NULL) == NULL)
  			ereport(ERROR,
! 					(errmsg("database hash table corrupted "
! 							"during cleanup --- abort")));
  	}
  }
  
--- 4647,4682 ----
  static void
  pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
  {
+ 	Oid			dbid = msg->m_databaseid;
  	PgStat_StatDBEntry *dbentry;
  
  	/*
  	 * Lookup the database in the hashtable.
  	 */
! 	dbentry = pgstat_get_db_entry(dbid, false);
  
  	/*
! 	 * If found, remove it (along with the db statfile).
  	 */
  	if (dbentry)
  	{
+ 		char		statfile[MAXPGPATH];
+ 
+ 		get_dbstat_filename(true, false, dbid, statfile, MAXPGPATH);
+ 
+ 		elog(DEBUG1, "removing %s", statfile);
+ 		unlink(statfile);
+ 
  		if (dbentry->tables != NULL)
  			hash_destroy(dbentry->tables);
  		if (dbentry->functions != NULL)
  			hash_destroy(dbentry->functions);
  
  		if (hash_search(pgStatDBHash,
! 						(void *) &dbid,
  						HASH_REMOVE, NULL) == NULL)
  			ereport(ERROR,
! 					(errmsg("database hash table corrupted during cleanup --- abort")));
  	}
  }
  
***************
*** 4687,4689 **** pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
--- 5071,5131 ----
  						   HASH_REMOVE, NULL);
  	}
  }
+ 
+ /* ----------
+  * pgstat_write_statsfile_needed() -
+  *
+  *	Checks whether there's a db stats request, requiring a file write.
+  *
+  *	TODO Seems that thanks to the way we handle last_statrequests (erase after
+  *	a write), this is unnecessary. Just check that there's at least one
+  *	request and you're done. Although there might be delayed requests ...
+  * ----------
+  */
+ static bool
+ pgstat_write_statsfile_needed(void)
+ {
+ 	PgStat_StatDBEntry *dbentry;
+ 	slist_iter	iter;
+ 
+ 	/* Check the databases if they need to refresh the stats. */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest *req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		dbentry = pgstat_get_db_entry(req->databaseid, false);
+ 
+ 		/* No dbentry yet or too old. */
+ 		if (!dbentry || (dbentry->stats_timestamp < req->request_time))
+ 		{
+ 			return true;
+ 		}
+ 	}
+ 
+ 	/* Well, everything was written recently ... */
+ 	return false;
+ }
+ 
+ /* ----------
+  * pgstat_db_requested() -
+  *
+  *	Checks whether stats for a particular DB need to be written to a file.
+  * ----------
+  */
+ 
+ static bool
+ pgstat_db_requested(Oid databaseid)
+ {
+ 	slist_iter	iter;
+ 
+ 	/* Check the databases if they need to refresh the stats. */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest	*req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		if (req->databaseid == databaseid)
+ 			return true;
+ 	}
+ 
+ 	return false;
+ }
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 8704,8717 **** static void
  assign_pgstat_temp_directory(const char *newval, void *extra)
  {
  	/* check_canonical_path already canonicalized newval for us */
  	char	   *tname;
  	char	   *fname;
  
! 	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /pgstat.tmp */
! 	sprintf(tname, "%s/pgstat.tmp", newval);
! 	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /pgstat.stat */
! 	sprintf(fname, "%s/pgstat.stat", newval);
  
  	if (pgstat_stat_tmpname)
  		free(pgstat_stat_tmpname);
  	pgstat_stat_tmpname = tname;
--- 8704,8728 ----
  assign_pgstat_temp_directory(const char *newval, void *extra)
  {
  	/* check_canonical_path already canonicalized newval for us */
+ 	char	   *dname;
  	char	   *tname;
  	char	   *fname;
  
! 	/* directory */
! 	dname = guc_malloc(ERROR, strlen(newval) + 1);		/* runtime dir */
! 	sprintf(dname, "%s", newval);
  
+ 	/* global stats */
+ 	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /global.tmp */
+ 	sprintf(tname, "%s/global.tmp", newval);
+ 	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /global.stat */
+ 	sprintf(fname, "%s/global.stat", newval);
+ 
+ 	if (pgstat_stat_directory)
+ 		free(pgstat_stat_directory);
+ 	pgstat_stat_directory = dname;
+ 	/* invalidate cached length in pgstat.c */
+ 	pgstat_stat_dbfile_maxlen = 0;
  	if (pgstat_stat_tmpname)
  		free(pgstat_stat_tmpname);
  	pgstat_stat_tmpname = tname;
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
***************
*** 192,197 **** const char *subdirs[] = {
--- 192,198 ----
  	"base",
  	"base/1",
  	"pg_tblspc",
+ 	"pg_stat",
  	"pg_stat_tmp"
  };
  
*** a/src/include/pgstat.h
--- b/src/include/pgstat.h
***************
*** 205,210 **** typedef struct PgStat_MsgInquiry
--- 205,211 ----
  	PgStat_MsgHdr m_hdr;
  	TimestampTz clock_time;		/* observed local clock time */
  	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+ 	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
  } PgStat_MsgInquiry;
  
  
***************
*** 514,520 **** typedef union PgStat_Msg
   * ------------------------------------------------------------
   */
  
! #define PGSTAT_FILE_FORMAT_ID	0x01A5BC9A
  
  /* ----------
   * PgStat_StatDBEntry			The collector's data per database
--- 515,521 ----
   * ------------------------------------------------------------
   */
  
! #define PGSTAT_FILE_FORMAT_ID	0xA240CA47
  
  /* ----------
   * PgStat_StatDBEntry			The collector's data per database
***************
*** 545,550 **** typedef struct PgStat_StatDBEntry
--- 546,552 ----
  	PgStat_Counter n_block_write_time;
  
  	TimestampTz stat_reset_timestamp;
+ 	TimestampTz stats_timestamp;		/* time of db stats file update */
  
  	/*
  	 * tables and functions must be last in the struct, because we don't write
***************
*** 722,727 **** extern bool pgstat_track_activities;
--- 724,731 ----
  extern bool pgstat_track_counts;
  extern int	pgstat_track_functions;
  extern PGDLLIMPORT int pgstat_track_activity_query_size;
+ extern char *pgstat_stat_directory;
+ extern int	pgstat_stat_dbfile_maxlen;
  extern char *pgstat_stat_tmpname;
  extern char *pgstat_stat_filename;
  
#47Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#46)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Alvaro Herrera wrote:

Hm, and I now also realize another bug in this patch: the global stats
only include database entries for requested databases; but perhaps the
existing files can serve later requestors just fine for databases that
already had files; so the global stats file should continue to carry
entries for them, with the old timestamps.

Actually, the code already does things that way -- apologies.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48Josh Berkus
josh@agliodbs.com
In reply to: Alvaro Herrera (#47)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

I saw discussion about this on this thread, but I'm not able to figure
out what the answer is: how does this work with moving the stats file,
for example to a RAMdisk? Specifically, if the user sets
stats_temp_directory, does it continue to work the way it does now?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Josh Berkus (#48)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Josh Berkus wrote:

I saw discussion about this on this thread, but I'm not able to figure
out what the answer is: how does this work with moving the stats file,
for example to a RAMdisk? Specifically, if the user sets
stats_temp_directory, does it continue to work the way it does now?

Of course. You get more files than previously, but yes.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#50Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#46)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Alvaro Herrera wrote:

Here's a ninth version of this patch. (version 8 went unpublished). I
have simplified a lot of things and improved some comments; I think I
understand much of it now. I think this patch is fairly close to
committable, but one issue remains, which is this bit in
pgstat_write_statsfiles():

/* In any case, we can just throw away all the db requests, but we need to
* write dummy files for databases without a stat entry (it would cause
* issues in pgstat_read_db_statsfile_timestamp and pgstat wait timeouts).
* This may happen e.g. for shared DB (oid = 0) right after initdb.
*/

I think the real way to handle this is to fix backend_read_statsfile().
It's using the old logic of considering the existence of the file, but of
course now the file might not exist at all and that doesn't mean we need
to continue kicking the collector to write it. We need a mechanism to
figure that the collector is just not going to write the file no matter
how hard we kick it ...

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#51Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#46)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Alvaro Herrera wrote:

Here's a ninth version of this patch. (version 8 went unpublished). I
have simplified a lot of things and improved some comments; I think I
understand much of it now. I think this patch is fairly close to
committable, but one issue remains, which is this bit in
pgstat_write_statsfiles():

I've marked this as Waiting on author for the time being. I'm going to
review/work on other patches now, hoping that Tomas will post an updated
version in time for it to be considered for 9.3.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#52Tomas Vondra
tv@fuzzy.cz
In reply to: Josh Berkus (#48)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 14.2.2013 20:43, Josh Berkus wrote:

I saw discussion about this on this thread, but I'm not able to figure
out what the answer is: how does this work with moving the stats file,
for example to a RAMdisk? Specifically, if the user sets
stats_temp_directory, does it continue to work the way it does now?

No change in this respect - you can still use RAMdisk, and you'll
actually need less space because the space requirements decreased due to
breaking the single file into multiple pieces.

We're using it this way (on a tmpfs filesystem) and it works like a charm.

regards
Tomas

#53Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#46)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

First of all, big thanks for working on this patch and not only
identifying the issues but actually fixing them.

On 14.2.2013 20:23, Alvaro Herrera wrote:

Here's a ninth version of this patch. (version 8 went unpublished). I
have simplified a lot of things and improved some comments; I think I
understand much of it now. I think this patch is fairly close to
committable, but one issue remains, which is this bit in
pgstat_write_statsfiles():

...

The problem here is that creating these dummy entries will cause a
difference in autovacuum behavior. Autovacuum will skip processing
databases with no pgstat entry, and the intended reason is that if
there's no pgstat entry it's because the database doesn't have enough
activity. Now perhaps we want to change that, but it should be an
explicit decision taken after discussion and thought, not side effect
from an unrelated patch.

I don't see how that changes the autovacuum behavior. Can you explain
that a bit more?

As I see it, with the old (single-file) version the autovacuum worker
would get exactly the same thing, i.e. no stats at all.

That is exactly what the autovacuum worker gets with the new code,
except that the check for the last statfile timestamp uses the per-db
file, so we need to write it. This way the worker can read the timestamp
and is satisfied because the file is fresh, even though it gets no stats
from it afterwards.

Where is the behavior change? Can you provide an example?

kind regards
Tomas

#54Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#51)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 14.2.2013 22:24, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Here's a ninth version of this patch. (version 8 went unpublished). I
have simplified a lot of things and improved some comments; I think I
understand much of it now. I think this patch is fairly close to
committable, but one issue remains, which is this bit in
pgstat_write_statsfiles():

I've marked this as Waiting on author for the time being. I'm going to
review/work on other patches now, hoping that Tomas will post an updated
version in time for it to be considered for 9.3.

Sadly I have no idea how to fix that, and I think the solution you
suggested in the previous messages does not actually do the trick :-(

Tomas

#55Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#53)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra wrote:

I don't see how that changes the autovacuum behavior. Can you explain
that a bit more?

It might be that I'm all wet on this. I'll poke at it some more.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#56Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#53)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra wrote:

On 14.2.2013 20:23, Alvaro Herrera wrote:

The problem here is that creating these dummy entries will cause a
difference in autovacuum behavior. Autovacuum will skip processing
databases with no pgstat entry, and the intended reason is that if
there's no pgstat entry it's because the database doesn't have enough
activity. Now perhaps we want to change that, but it should be an
explicit decision taken after discussion and thought, not side effect
from an unrelated patch.

I don't see how that changes the autovacuum behavior. Can you explain
that a bit more?

As I see it, with the old (single-file) version the autovacuum worker
would get exactly the same thing, i.e. no stats at all.

See in autovacuum.c the calls to pgstat_fetch_stat_dbentry(). Most of
them check for NULL result and act differently depending on that.
Returning a valid (not NULL) entry full of zeroes is not the same.
I didn't actually try to reproduce a problem.
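
To make the concern concrete, here is a minimal sketch of why a
zero-filled entry is not the same as a NULL one. This is not the actual
autovacuum.c logic: DbEntry and should_consider_db() are hypothetical
names made up for this example, standing in for PgStat_StatDBEntry and
for the callers of pgstat_fetch_stat_dbentry().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for PgStat_StatDBEntry. */
typedef struct DbEntry
{
	uint32_t	databaseid;
	int64_t		n_xact_commit;
} DbEntry;

/*
 * Simplified caller-side decision (an assumption for illustration):
 * a database with no stats entry at all is skipped, anything else is
 * considered, regardless of what the counters say.
 */
static bool
should_consider_db(const DbEntry *entry)
{
	return entry != NULL;
}
```

With a check of this shape, a dummy entry whose counters are all zero
would still pass, whereas a missing entry would not. Whether the patch
can actually produce such an entry is a separate question.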

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#57Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#56)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 15.2.2013 16:38, Alvaro Herrera wrote:

Tomas Vondra wrote:

On 14.2.2013 20:23, Alvaro Herrera wrote:

The problem here is that creating these dummy entries will cause a
difference in autovacuum behavior. Autovacuum will skip processing
databases with no pgstat entry, and the intended reason is that if
there's no pgstat entry it's because the database doesn't have enough
activity. Now perhaps we want to change that, but it should be an
explicit decision taken after discussion and thought, not side effect
from an unrelated patch.

I don't see how that changes the autovacuum behavior. Can you explain
that a bit more?

As I see it, with the old (single-file) version the autovacuum worker
would get exactly the same thing, i.e. no stats at all.

See in autovacuum.c the calls to pgstat_fetch_stat_dbentry(). Most of
them check for NULL result and act differently depending on that.
Returning a valid (not NULL) entry full of zeroes is not the same.
I didn't actually try to reproduce a problem.

Errrr, but why would the patched code return an entry full of zeroes and
not NULL as before? The dummy files serve a single purpose - to confirm
that the collector attempted to write info for the particular database
(and did not find any data for it).

All it contains is a timestamp of the write - nothing else. So the
worker will read the global file (containing list of stats for dbs) and
then will get NULL just like the old code. Because the database is not
there and the patch does not change that at all.
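
A rough sketch of what such a dummy file amounts to. The layout used
here (format ID followed by the write timestamp) is an assumption made
for illustration, and so are the function names; only the format ID
value is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Value of PGSTAT_FILE_FORMAT_ID in the patch. */
#define SKETCH_FORMAT_ID 0xA240CA47u

/* Write a "dummy" per-db file: just a header, no table/function stats. */
static bool
write_dummy_statsfile(FILE *fp, int64_t write_time)
{
	uint32_t	id = SKETCH_FORMAT_ID;

	return fwrite(&id, sizeof(id), 1, fp) == 1 &&
		fwrite(&write_time, sizeof(write_time), 1, fp) == 1;
}

/* Peek at the header only; returns false on a short or foreign file. */
static bool
read_statsfile_timestamp(FILE *fp, int64_t *write_time)
{
	uint32_t	id;

	if (fread(&id, sizeof(id), 1, fp) != 1 || id != SKETCH_FORMAT_ID)
		return false;
	return fread(write_time, sizeof(*write_time), 1, fp) == 1;
}
```

The reader learns when the collector last handled the database and
nothing more; the entry lookup itself still goes through the global
file.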

Tomas

#58Tomas Vondra
tv@fuzzy.cz
In reply to: Tomas Vondra (#54)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 15.2.2013 01:02, Tomas Vondra wrote:

On 14.2.2013 22:24, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Here's a ninth version of this patch. (version 8 went unpublished). I
have simplified a lot of things and improved some comments; I think I
understand much of it now. I think this patch is fairly close to
committable, but one issue remains, which is this bit in
pgstat_write_statsfiles():

I've marked this as Waiting on author for the time being. I'm going to
review/work on other patches now, hoping that Tomas will post an updated
version in time for it to be considered for 9.3.

Sadly I have no idea how to fix that, and I think the solution you
suggested in the previous messages does not actually do the trick :-(

I've been thinking about this (actually I had a really weird dream about
it this night) and I think it might work like this:

(1) check the timestamp of the global file -> if it's too old, we need
to send an inquiry or wait a bit longer

(2) if it's new enough, we need to read it and look for that particular
database - if it's not found, we have no info about it yet (this is
the case handled by the dummy files)

(3) if there's a database stat entry, we need to check the timestamp
when it was written for the last time -> if it's too old, send an
inquiry and wait a bit longer

(4) well, we have a recent global file, it contains the database stat
entry and it's fresh enough -> tadaaaaaa, we're done

At least that's my idea - I haven't tried to implement it yet.
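
The four steps above can be sketched roughly like this. It is only an
illustration under stated assumptions, not the patch: the types are
reduced stand-ins for the pgstat structures, the global file is
modelled as an in-memory array, and check_db_stats(), StatsCheck and
the field names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real pgstat types. */
typedef int64_t TimestampTz;
typedef uint32_t Oid;

typedef struct DbStatEntry
{
	Oid			databaseid;
	TimestampTz stats_timestamp;	/* last write of this db's stats */
} DbStatEntry;

typedef struct GlobalStats
{
	TimestampTz stats_timestamp;	/* last write of the global file */
	const DbStatEntry *entries;
	size_t		nentries;
} GlobalStats;

typedef enum StatsCheck
{
	STATS_NEED_INQUIRY,			/* too old: send inquiry / keep waiting */
	STATS_NO_ENTRY,				/* global file fresh, db not in it */
	STATS_FRESH					/* global file and db entry both fresh */
} StatsCheck;

static StatsCheck
check_db_stats(const GlobalStats *global, Oid dbid, TimestampTz min_ts)
{
	size_t		i;

	/* (1) global file too old */
	if (global->stats_timestamp < min_ts)
		return STATS_NEED_INQUIRY;

	/* (2) look for the database in the global file */
	for (i = 0; i < global->nentries; i++)
	{
		const DbStatEntry *e = &global->entries[i];

		if (e->databaseid != dbid)
			continue;

		/* (3) entry exists but was written too long ago */
		if (e->stats_timestamp < min_ts)
			return STATS_NEED_INQUIRY;

		/* (4) everything is fresh enough */
		return STATS_FRESH;
	}

	/* (2) not found: no info about this database, no dummy file needed */
	return STATS_NO_ENTRY;
}
```

The STATS_NO_ENTRY result is the case the dummy files used to cover:
the caller can stop waiting without the collector having to write
anything for that database.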

I see a few pros and cons of this approach:

pros:

* no dummy files
* no timestamps in the per-db files (and thus no sync issues)

cons:

* the backends / workers will have to re-read the global file just to
check that the per-db file is there and is fresh enough

So far it was sufficient just to peek at the timestamp at the beginning
of the per-db stat file - minimum data read, no CPU-expensive processing
etc. Sadly the more DBs there are, the larger the file gets (thus more
overhead to read it).

OTOH it's not that much data (~180B per entry, so with 1000 dbs
it's just ~180kB) so I don't expect this to be a tremendous issue. And
the pros seem to be quite compelling.
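
For the record, the estimate is plain multiplication. The sketch below
just spells it out; the 180-byte figure is the approximate entry size
quoted above, and the macro and function names are made up for the
example.

```c
#include <stddef.h>

/* Approximate size of one database entry in the global file (see above). */
#define APPROX_DB_ENTRY_BYTES 180

/* Rough size of the global file for a given number of databases. */
static size_t
approx_global_file_bytes(size_t ndatabases)
{
	return ndatabases * APPROX_DB_ENTRY_BYTES;
}
```

For 1000 databases this gives 180000 bytes, i.e. roughly 180kB to
re-read per check, compared to the 60+ MB single pgstat.stat file
mentioned at the start of the thread.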

Tomas

#59Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#58)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra wrote:

I've been thinking about this (actually I had a really weird dream about
it this night) and I think it might work like this:

(1) check the timestamp of the global file -> if it's too old, we need
to send an inquiry or wait a bit longer

(2) if it's new enough, we need to read it and look for that particular
database - if it's not found, we have no info about it yet (this is
the case handled by the dummy files)

(3) if there's a database stat entry, we need to check the timestamp
when it was written for the last time -> if it's too old, send an
inquiry and wait a bit longer

(4) well, we have a recent global file, it contains the database stat
entry and it's fresh enough -> tadaaaaaa, we're done

Hmm, yes, I think this is what I was imagining. I had even considered
that the timestamp would be removed from the per-db file as you suggest
here.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#60Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#59)
1 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 17.2.2013 06:46, Alvaro Herrera wrote:

Tomas Vondra wrote:

I've been thinking about this (actually I had a really weird dream about
it this night) and I think it might work like this:

(1) check the timestamp of the global file -> if it's too old, we need
to send an inquiry or wait a bit longer

(2) if it's new enough, we need to read it and look for that particular
database - if it's not found, we have no info about it yet (this is
the case handled by the dummy files)

(3) if there's a database stat entry, we need to check the timestamp
when it was written for the last time -> if it's too old, send an
inquiry and wait a bit longer

(4) well, we have a recent global file, it contains the database stat
entry and it's fresh enough -> tadaaaaaa, we're done

Hmm, yes, I think this is what I was imagining. I had even considered
that the timestamp would be removed from the per-db file as you suggest
here.

So, here's v10 of the patch (based on the v9+v9a), that implements the
approach described above.

It turned out to be much easier than I expected (basically just a
rewrite of the pgstat_read_db_statsfile_timestamp() function).

I've done a fair amount of testing (and will do some more next week) but
it seems to work just fine - no errors, no measurable decrease of
performance etc.

regards
Tomas Vondra

Attachments:

stats-split-v10.patch (text/x-diff)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9b92ebb..36c0d8b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -38,6 +38,7 @@
 #include "access/xact.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
+#include "lib/ilist.h"
 #include "libpq/ip.h"
 #include "libpq/libpq.h"
 #include "libpq/pqsignal.h"
@@ -66,8 +67,9 @@
  * Paths for the statistics files (relative to installation's $PGDATA).
  * ----------
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
 
 /* ----------
  * Timer definitions.
@@ -115,6 +117,8 @@ int			pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * ----------
  */
+char	   *pgstat_stat_directory = NULL;
+int			pgstat_stat_dbfile_maxlen = 0;
 char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
 
@@ -219,11 +223,16 @@ static int	localNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
+	slist_node	next;
+} DBWriteRequest;
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request times from backends */
+static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +261,16 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 					 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfiles(bool permanent, bool allDbs);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool deep);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed(void);
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +299,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* ------------------------------------------------------------
  * Public functions called from postmaster follow
  * ------------------------------------------------------------
@@ -541,16 +554,40 @@ startup_failed:
 }
 
 /*
+ * subroutine for pgstat_reset_all
+ */
+static void
+pgstat_reset_remove_files(const char *directory)
+{
+	DIR * dir;
+	struct dirent * entry;
+	char	fname[MAXPGPATH];
+
+	dir = AllocateDir(directory);
+	while ((entry = ReadDir(dir, directory)) != NULL)
+	{
+		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+			continue;
+
+		snprintf(fname, MAXPGPATH, "%s/%s", directory,
+				 entry->d_name);
+		unlink(fname);
+	}
+	FreeDir(dir);
+}
+
+/*
  * pgstat_reset_all() -
  *
- * Remove the stats file.  This is currently used only if WAL
+ * Remove the stats files.  This is currently used only if WAL
  * recovery is needed after a crash.
  */
 void
 pgstat_reset_all(void)
 {
-	unlink(pgstat_stat_filename);
-	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+
+	pgstat_reset_remove_files(pgstat_stat_directory);
+	pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
 }
 
 #ifdef EXEC_BACKEND
@@ -1408,13 +1445,14 @@ pgstat_ping(void)
  * ----------
  */
 static void
-pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
+pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
 {
 	PgStat_MsgInquiry msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
 	msg.clock_time = clock_time;
 	msg.cutoff_time = cutoff_time;
+	msg.databaseid = databaseid;
 	pgstat_send(&msg, sizeof(msg));
 }
 
@@ -3004,6 +3042,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	int			len;
 	PgStat_Msg	msg;
 	int			wr;
+	bool		first_write = true;
 
 	IsUnderPostmaster = true;	/* we are a postmaster subprocess now */
 
@@ -3053,17 +3092,11 @@ PgstatCollectorMain(int argc, char *argv[])
 	init_ps_display("stats collector process", "", "", "");
 
 	/*
-	 * Arrange to write the initial status file right away
-	 */
-	last_statrequest = GetCurrentTimestamp();
-	last_statwrite = last_statrequest - 1;
-
-	/*
 	 * Read in an existing statistics stats file or initialize the stats to
 	 * zero.
 	 */
 	pgStatRunningInCollector = true;
-	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
+	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true, true);
 
 	/*
 	 * Loop to process messages until we get SIGQUIT or detect ungraceful
@@ -3107,10 +3140,14 @@ PgstatCollectorMain(int argc, char *argv[])
 
 			/*
 			 * Write the stats file if a new request has arrived that is not
-			 * satisfied by existing file.
+			 * satisfied by existing file (force writing all files if it's
+			 * the first write after startup).
 			 */
-			if (last_statwrite < last_statrequest)
-				pgstat_write_statsfile(false);
+			if (first_write || pgstat_write_statsfile_needed())
+			{
+				pgstat_write_statsfiles(false, first_write);
+				first_write = false;
+			}
 
 			/*
 			 * Try to receive and process a message.  This will not block,
@@ -3269,7 +3306,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	/*
 	 * Save the final stats to reuse at next startup.
 	 */
-	pgstat_write_statsfile(true);
+	pgstat_write_statsfiles(true, true);
 
 	exit(0);
 }
@@ -3349,6 +3386,7 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 		result->n_block_write_time = 0;
 
 		result->stat_reset_timestamp = GetCurrentTimestamp();
+		result->stats_timestamp = 0;
 
 		memset(&hash_ctl, 0, sizeof(hash_ctl));
 		hash_ctl.keysize = sizeof(Oid);
@@ -3422,30 +3460,32 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
 
 
 /* ----------
- * pgstat_write_statsfile() -
+ * pgstat_write_statsfiles() -
  *
  *	Tell the news.
- *	If writing to the permanent file (happens when the collector is
- *	shutting down only), remove the temporary file so that backends
+ *	If writing to the permanent files (happens when the collector is
+ *	shutting down only), remove the temporary files so that backends
  *	starting up under a new postmaster can't read the old data before
  *	the new collector is ready.
+ *
+ *	When 'allDbs' is false, only the requested databases (listed in
+ *	last_statrequests) will be written; otherwise, all databases will be
+ *	written.
  * ----------
  */
 static void
-pgstat_write_statsfile(bool permanent)
+pgstat_write_statsfiles(bool permanent, bool allDbs)
 {
 	HASH_SEQ_STATUS hstat;
-	HASH_SEQ_STATUS tstat;
-	HASH_SEQ_STATUS fstat;
 	PgStat_StatDBEntry *dbentry;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatFuncEntry *funcentry;
 	FILE	   *fpout;
 	int32		format_id;
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
 
+	elog(DEBUG1, "writing statsfile '%s'", statfile);
+
 	/*
 	 * Open the statistics temp file to write out the current values.
 	 */
@@ -3484,40 +3524,26 @@ pgstat_write_statsfile(bool permanent)
 	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
 	{
 		/*
-		 * Write out the DB entry including the number of live backends. We
-		 * don't write the tables or functions pointers, since they're of no
-		 * use to any other process.
+		 * Write out the tables and functions into a separate file, if
+		 * required.
+		 *
+		 * We need to do this before the dbentry write, to ensure the
+		 * timestamps written to both are consistent.
 		 */
-		fputc('D', fpout);
-		rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
-
-		/*
-		 * Walk through the database's access stats per table.
-		 */
-		hash_seq_init(&tstat, dbentry->tables);
-		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
-		{
-			fputc('T', fpout);
-			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
-		}
-
-		/*
-		 * Walk through the database's function stats table.
-		 */
-		hash_seq_init(&fstat, dbentry->functions);
-		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+		if (allDbs || pgstat_db_requested(dbentry->databaseid))
 		{
-			fputc('F', fpout);
-			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
-			(void) rc;			/* we'll check for error with ferror */
+			elog(DEBUG1, "writing statsfile for DB %u", dbentry->databaseid);
+			dbentry->stats_timestamp = globalStats.stats_timestamp;
+			pgstat_write_db_statsfile(dbentry, permanent);
 		}
 
 		/*
-		 * Mark the end of this DB
+		 * Write out the DB entry. We don't write the tables or functions
+		 * pointers, since they're of no use to any other process.
 		 */
-		fputc('d', fpout);
+		fputc('D', fpout);
+		rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
+		(void) rc;				/* we'll check for error with ferror */
 	}
 
 	/*
@@ -3527,6 +3553,25 @@ pgstat_write_statsfile(bool permanent)
 	 */
 	fputc('E', fpout);
 
+	/*
+	 * Now throw away the list of requests.  Note that requests sent after we
+	 * started the write are still waiting on the network socket.
+	 */
+	if (!slist_is_empty(&last_statrequests))
+	{
+		slist_mutable_iter	iter;
+
+		slist_foreach_modify(iter, &last_statrequests)
+		{
+			DBWriteRequest *req = slist_container(DBWriteRequest, next,
+												  iter.cur);
+
+			pfree(req);
+		}
+
+		slist_init(&last_statrequests);
+	}
+
 	if (ferror(fpout))
 	{
 		ereport(LOG,
@@ -3552,61 +3597,161 @@ pgstat_write_statsfile(bool permanent)
 						tmpfile, statfile)));
 		unlink(tmpfile);
 	}
-	else
+
+	if (permanent)
+		unlink(pgstat_stat_filename);
+}
+
+/*
+ * return the filename for a DB stat file; filename is the output buffer,
+ * of length len.
+ */
+static void
+get_dbstat_filename(bool permanent, bool tempname, Oid databaseid,
+					char *filename, int len)
+{
+	int		printed;
+
+	printed = snprintf(filename, len, "%s/db_%u.%s",
+					   permanent ? "pg_stat" : pgstat_stat_directory,
+					   databaseid,
+					   tempname ? "tmp" : "stat");
+	if (printed >= len)
+		elog(ERROR, "overlength pgstat path");
+}
+
+/* ----------
+ * pgstat_write_db_statsfile() -
+ *
+ *	Tell the news. This writes stats file for a single database.
+ *
+ *	If writing to the permanent file (happens when the collector is
+ *	shutting down only), remove the temporary file so that backends
+ *	starting up under a new postmaster can't read the old data before
+ *	the new collector is ready.
+ * ----------
+ */
+static void
+pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+{
+	HASH_SEQ_STATUS tstat;
+	HASH_SEQ_STATUS fstat;
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpout;
+	int32		format_id;
+	Oid			dbid = dbentry->databaseid;
+	int			rc;
+	char		tmpfile[MAXPGPATH];
+	char		statfile[MAXPGPATH];
+
+	get_dbstat_filename(permanent, true, dbid, tmpfile, MAXPGPATH);
+	get_dbstat_filename(permanent, false, dbid, statfile, MAXPGPATH);
+
+	elog(DEBUG1, "writing statsfile '%s'", statfile);
+
+	/*
+	 * Open the statistics temp file to write out the current values.
+	 */
+	fpout = AllocateFile(tmpfile, PG_BINARY_W);
+	if (fpout == NULL)
 	{
-		/*
-		 * Successful write, so update last_statwrite.
-		 */
-		last_statwrite = globalStats.stats_timestamp;
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not open temporary statistics file \"%s\": %m",
+						tmpfile)));
+		return;
+	}
 
-		/*
-		 * If there is clock skew between backends and the collector, we could
-		 * receive a stats request time that's in the future.  If so, complain
-		 * and reset last_statrequest.	Resetting ensures that no inquiry
-		 * message can cause more than one stats file write to occur.
-		 */
-		if (last_statrequest > last_statwrite)
-		{
-			char	   *reqtime;
-			char	   *mytime;
+	/*
+	 * Write the file header --- currently just a format ID.
+	 */
+	format_id = PGSTAT_FILE_FORMAT_ID;
+	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
 
-			/* Copy because timestamptz_to_str returns a static buffer */
-			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
-			mytime = pstrdup(timestamptz_to_str(last_statwrite));
-			elog(LOG, "last_statrequest %s is later than collector's time %s",
-				 reqtime, mytime);
-			pfree(reqtime);
-			pfree(mytime);
+	/*
+	 * Walk through the database's access stats per table.
+	 */
+	hash_seq_init(&tstat, dbentry->tables);
+	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+	{
+		fputc('T', fpout);
+		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
 
-			last_statrequest = last_statwrite;
-		}
+	/*
+	 * Walk through the database's function stats table.
+	 */
+	hash_seq_init(&fstat, dbentry->functions);
+	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+	{
+		fputc('F', fpout);
+		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+		(void) rc;			/* we'll check for error with ferror */
+	}
+
+	/*
+	 * No more output to be done. Close the temp file and replace the old
+	 * db stat file with it.  The ferror() check replaces testing for error
+	 * after each individual fputc or fwrite above.
+	 */
+	fputc('E', fpout);
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not write temporary statistics file \"%s\": %m",
+					  tmpfile)));
+		FreeFile(fpout);
+		unlink(tmpfile);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+			   errmsg("could not close temporary statistics file \"%s\": %m",
+					  tmpfile)));
+		unlink(tmpfile);
+	}
+	else if (rename(tmpfile, statfile) < 0)
+	{
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+						tmpfile, statfile)));
+		unlink(tmpfile);
 	}
 
 	if (permanent)
-		unlink(pgstat_stat_filename);
-}
+	{
+		get_dbstat_filename(false, false, dbid, tmpfile, MAXPGPATH);
 
+		elog(DEBUG1, "removing temporary stat file '%s'", tmpfile);
+		unlink(tmpfile);
+	}
+}
 
 /* ----------
  * pgstat_read_statsfile() -
  *
  *	Reads in an existing statistics collector file and initializes the
- *	databases' hash table (whose entries point to the tables' hash tables).
+ *	databases' hash table.  If the permanent file name is requested, also
+ *	remove it after reading.
+ *
+ *  If a deep read is requested, table/function stats are read also, otherwise
+ *  the table/function hash tables remain empty.
  * ----------
  */
 static HTAB *
-pgstat_read_statsfile(Oid onlydb, bool permanent)
+pgstat_read_statsfile(Oid onlydb, bool permanent, bool deep)
 {
 	PgStat_StatDBEntry *dbentry;
 	PgStat_StatDBEntry dbbuf;
-	PgStat_StatTabEntry *tabentry;
-	PgStat_StatTabEntry tabbuf;
-	PgStat_StatFuncEntry funcbuf;
-	PgStat_StatFuncEntry *funcentry;
 	HASHCTL		hash_ctl;
 	HTAB	   *dbhash;
-	HTAB	   *tabhash = NULL;
-	HTAB	   *funchash = NULL;
 	FILE	   *fpin;
 	int32		format_id;
 	bool		found;
@@ -3662,8 +3807,8 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 	/*
 	 * Verify it's of the expected format.
 	 */
-	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
-		|| format_id != PGSTAT_FILE_FORMAT_ID)
+	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
+		format_id != PGSTAT_FILE_FORMAT_ID)
 	{
 		ereport(pgStatRunningInCollector ? LOG : WARNING,
 				(errmsg("corrupted statistics file \"%s\"", statfile)));
@@ -3690,8 +3835,7 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 		{
 				/*
 				 * 'D'	A PgStat_StatDBEntry struct describing a database
-				 * follows. Subsequently, zero to many 'T' and 'F' entries
-				 * will follow until a 'd' is encountered.
+				 * follows.
 				 */
 			case 'D':
 				if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),
@@ -3753,21 +3897,106 @@ pgstat_read_statsfile(Oid onlydb, bool permanent)
 								   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
 
 				/*
-				 * Arrange that following records add entries to this
-				 * database's hash tables.
+				 * If requested, read the data from the database-specific file.
+				 * If there was onlydb specified (!= InvalidOid), we would not
+				 * get here because of a break above. So we don't need to
+				 * recheck.
 				 */
-				tabhash = dbentry->tables;
-				funchash = dbentry->functions;
-				break;
+				if (deep)
+					pgstat_read_db_statsfile(dbentry->databaseid,
+											 dbentry->tables,
+											 dbentry->functions,
+											 permanent);
 
-				/*
-				 * 'd'	End of this database.
-				 */
-			case 'd':
-				tabhash = NULL;
-				funchash = NULL;
 				break;
 
+			case 'E':
+				goto done;
+
+			default:
+				ereport(pgStatRunningInCollector ? LOG : WARNING,
+						(errmsg("corrupted statistics file \"%s\"",
+								statfile)));
+				goto done;
+		}
+	}
+
+done:
+	FreeFile(fpin);
+
+	if (permanent)
+	{
+		/*
+		 * If requested to read the permanent file, also get rid of it; the
+		 * in-memory status is now authoritative, and the permanent file would
+		 * be out of date in case somebody else reads it.
+		 */
+		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	}
+
+	return dbhash;
+}
+
+
+/* ----------
+ * pgstat_read_db_statsfile() -
+ *
+ *	Reads in an existing statistics collector db file and initializes the
+ *	tables and functions hash tables (for the database identified by Oid).
+ * ----------
+ */
+static void
+pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent)
+{
+	PgStat_StatTabEntry *tabentry;
+	PgStat_StatTabEntry tabbuf;
+	PgStat_StatFuncEntry funcbuf;
+	PgStat_StatFuncEntry *funcentry;
+	FILE	   *fpin;
+	int32		format_id;
+	bool		found;
+	char		statfile[MAXPGPATH];
+
+	get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
+
+	/*
+	 * Try to open the status file. If it doesn't exist, the backends simply
+	 * return zero for anything and the collector simply starts from scratch
+	 * with empty counters.
+	 *
+	 * ENOENT is a possibility if the stats collector is not running or has
+	 * not yet written the stats file the first time.  Any other failure
+	 * condition is suspicious.
+	 */
+	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+	{
+		if (errno != ENOENT)
+			ereport(pgStatRunningInCollector ? LOG : WARNING,
+					(errcode_for_file_access(),
+					 errmsg("could not open statistics file \"%s\": %m",
+							statfile)));
+		return;
+	}
+
+	/*
+	 * Verify it's of the expected format.
+	 */
+	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+		|| format_id != PGSTAT_FILE_FORMAT_ID)
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		goto done;
+	}
+
+	/*
+	 * We found an existing collector stats file. Read it and put all the
+	 * hashtable entries into place.
+	 */
+	for (;;)
+	{
+		switch (fgetc(fpin))
+		{
 				/*
 				 * 'T'	A PgStat_StatTabEntry follows.
 				 */
@@ -3854,24 +4083,41 @@ done:
 	FreeFile(fpin);
 
 	if (permanent)
-		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+	{
+		get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
 
-	return dbhash;
+		elog(DEBUG1, "removing permanent stats file '%s'", statfile);
+		unlink(statfile);
+	}
+
+	return;
 }
 
 /* ----------
- * pgstat_read_statsfile_timestamp() -
+ * pgstat_read_db_statsfile_timestamp() -
  *
- *	Attempt to fetch the timestamp of an existing stats file.
+ *	Attempt to determine the timestamp of the last db statfile write.
  *	Returns TRUE if successful (timestamp is stored at *ts).
+ *
+ *	This needs to be careful about handling databases without statfiles,
+ *	i.e. databases without stat entry or not yet written:
+ *
+ *	- if there's a db stat entry, return the corresponding stats_timestamp
+ *	(which may be 0 if it was not yet written, which results in writing it)
+ *
+ *	- if there's no db stat entry (e.g. for a new or inactive database), there's
+ *	no stats_timestamp but also nothing to write, so we return the timestamp
+ *	of the global statfile
  * ----------
  */
 static bool
-pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
+pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent, TimestampTz *ts)
 {
+	PgStat_StatDBEntry dbentry;
 	PgStat_GlobalStats myGlobalStats;
 	FILE	   *fpin;
 	int32		format_id;
+
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 
 	/*
@@ -3911,12 +4157,58 @@ pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
 		return false;
 	}
 
+	/* By default, we're going to return the timestamp of the global file. */
 	*ts = myGlobalStats.stats_timestamp;
 
+	/*
+	 * We found an existing collector stats file. Read it and look for a record
+	 * for database with (OID = databaseid) - if found, use its timestamp.
+	 */
+	for (;;)
+	{
+		switch (fgetc(fpin))
+		{
+				/*
+				 * 'D'	A PgStat_StatDBEntry struct describing a database
+				 * follows.
+				 */
+			case 'D':
+				
+				if (fread(&dbentry, 1, offsetof(PgStat_StatDBEntry, tables),
+						  fpin) != offsetof(PgStat_StatDBEntry, tables))
+				{
+					ereport(pgStatRunningInCollector ? LOG : WARNING,
+							(errmsg("corrupted statistics file \"%s\"",
+									statfile)));
+					goto done;
+				}
+
+				/* Is this the DB we're looking for? */
+				if (dbentry.databaseid == databaseid) {
+					*ts = dbentry.stats_timestamp;
+					goto done;
+				}
+
+				break;
+
+			case 'E':
+				goto done;
+
+			default:
+				ereport(pgStatRunningInCollector ? LOG : WARNING,
+						(errmsg("corrupted statistics file \"%s\"",
+								statfile)));
+				goto done;
+		}
+	}
+
+
+done:
 	FreeFile(fpin);
 	return true;
 }
 
+
 /*
  * If not already done, read the statistics collector stats file into
  * some hash tables.  The results will be kept until pgstat_clear_snapshot()
@@ -3947,7 +4239,19 @@ backend_read_statsfile(void)
 
 		CHECK_FOR_INTERRUPTS();
 
-		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
+		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
+
+		if (!ok)
+		{
+			/*
+			 * see if the global file exists; if it does, then failure to read
+			 * the db-specific file only means that there's no entry in the
+			 * collector for it.  If so, break out of here, because the file is
+			 * not going to magically show up.
+			 */
+
+
+		}
 
 		cur_ts = GetCurrentTimestamp();
 		/* Calculate min acceptable timestamp, if we didn't already */
@@ -4006,7 +4310,7 @@ backend_read_statsfile(void)
 				pfree(mytime);
 			}
 
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 			break;
 		}
 
@@ -4016,7 +4320,7 @@ backend_read_statsfile(void)
 
 		/* Not there or too old, so kick the collector and wait a bit */
 		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
-			pgstat_send_inquiry(cur_ts, min_ts);
+			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
 
 		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
 	}
@@ -4024,11 +4328,14 @@ backend_read_statsfile(void)
 	if (count >= PGSTAT_POLL_LOOP_COUNT)
 		elog(WARNING, "pgstat wait timeout");
 
-	/* Autovacuum launcher wants stats about all databases */
+	/*
+	 * Autovacuum launcher wants stats about all databases, but a shallow
+	 * read is sufficient.
+	 */
 	if (IsAutoVacuumLauncherProcess())
-		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
+		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false, false);
 	else
-		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
+		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false, true);
 }
 
 
@@ -4084,26 +4391,53 @@ pgstat_clear_snapshot(void)
 static void
 pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 {
+	slist_iter	iter;
+	bool		found = false;
+	DBWriteRequest *newreq;
+	PgStat_StatDBEntry *dbentry;
+
+	elog(DEBUG1, "received inquiry for %u", msg->databaseid);
+
 	/*
-	 * Advance last_statrequest if this requestor has a newer cutoff time
-	 * than any previous request.
+	 * Find the last write request for this DB (found=true in that case). Plain
+	 * linear search, not really worth doing any magic here (probably).
 	 */
-	if (msg->cutoff_time > last_statrequest)
-		last_statrequest = msg->cutoff_time;
+	slist_foreach(iter, &last_statrequests)
+	{
+		DBWriteRequest *req = slist_container(DBWriteRequest, next, iter.cur);
+
+		if (req->databaseid != msg->databaseid)
+			continue;
+
+		if (msg->cutoff_time > req->request_time)
+			req->request_time = msg->cutoff_time;
+		found = true;
+		return;
+	}
 
 	/*
-	 * If the requestor's local clock time is older than last_statwrite, we
+	 * There's no request for this DB yet, so create one.
+	 */
+	newreq = palloc(sizeof(DBWriteRequest));
+
+	newreq->databaseid = msg->databaseid;
+	newreq->request_time = msg->clock_time;
+	slist_push_head(&last_statrequests, &newreq->next);
+
+	/*
+	 * If the requestor's local clock time is older than stats_timestamp, we
 	 * should suspect a clock glitch, ie system time going backwards; though
 	 * the more likely explanation is just delayed message receipt.  It is
 	 * worth expending a GetCurrentTimestamp call to be sure, since a large
 	 * retreat in the system clock reading could otherwise cause us to neglect
 	 * to update the stats file for a long time.
 	 */
-	if (msg->clock_time < last_statwrite)
+	dbentry = pgstat_get_db_entry(msg->databaseid, false);
+	if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
 	{
 		TimestampTz cur_ts = GetCurrentTimestamp();
 
-		if (cur_ts < last_statwrite)
+		if (cur_ts < dbentry->stats_timestamp)
 		{
 			/*
 			 * Sure enough, time went backwards.  Force a new stats file write
@@ -4113,15 +4447,16 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 			char	   *mytime;
 
 			/* Copy because timestamptz_to_str returns a static buffer */
-			writetime = pstrdup(timestamptz_to_str(last_statwrite));
+			writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
 			mytime = pstrdup(timestamptz_to_str(cur_ts));
-			elog(LOG, "last_statwrite %s is later than collector's time %s",
-				 writetime, mytime);
+			elog(LOG,
+				 "stats_timestamp %s is later than collector's time %s for db %u",
+				 writetime, mytime, dbentry->databaseid);
 			pfree(writetime);
 			pfree(mytime);
 
-			last_statrequest = cur_ts;
-			last_statwrite = last_statrequest - 1;
+			newreq->request_time = cur_ts;
+			dbentry->stats_timestamp = cur_ts - 1;
 		}
 	}
 }
@@ -4270,29 +4605,36 @@ pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len)
 static void
 pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
 {
+	Oid			dbid = msg->m_databaseid;
 	PgStat_StatDBEntry *dbentry;
 
 	/*
 	 * Lookup the database in the hashtable.
 	 */
-	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
+	dbentry = pgstat_get_db_entry(dbid, false);
 
 	/*
-	 * If found, remove it.
+	 * If found, remove it (along with the db statfile).
 	 */
 	if (dbentry)
 	{
+		char		statfile[MAXPGPATH];
+
+		get_dbstat_filename(true, false, dbid, statfile, MAXPGPATH);
+
+		elog(DEBUG1, "removing %s", statfile);
+		unlink(statfile);
+
 		if (dbentry->tables != NULL)
 			hash_destroy(dbentry->tables);
 		if (dbentry->functions != NULL)
 			hash_destroy(dbentry->functions);
 
 		if (hash_search(pgStatDBHash,
-						(void *) &(dbentry->databaseid),
+						(void *) &dbid,
 						HASH_REMOVE, NULL) == NULL)
 			ereport(ERROR,
-					(errmsg("database hash table corrupted "
-							"during cleanup --- abort")));
+					(errmsg("database hash table corrupted during cleanup --- abort")));
 	}
 }
 
@@ -4687,3 +5029,43 @@ pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
 						   HASH_REMOVE, NULL);
 	}
 }
+
+/* ----------
+ * pgstat_write_statsfile_needed() -
+ *
+ *	Do we need to write out the files?
+ * ----------
+ */
+static bool
+pgstat_write_statsfile_needed(void)
+{
+	if (!slist_is_empty(&last_statrequests))
+		return true;
+
+	/* Everything was written recently */
+	return false;
+}
+
+/* ----------
+ * pgstat_db_requested() -
+ *
+ *	Checks whether stats for a particular DB need to be written to a file.
+ * ----------
+ */
+
+static bool
+pgstat_db_requested(Oid databaseid)
+{
+	slist_iter	iter;
+
+	/* Check the databases if they need to refresh the stats. */
+	slist_foreach(iter, &last_statrequests)
+	{
+		DBWriteRequest	*req = slist_container(DBWriteRequest, next, iter.cur);
+
+		if (req->databaseid == databaseid)
+			return true;
+	}
+
+	return false;
+}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6128694..0a53bb7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8704,14 +8704,25 @@ static void
 assign_pgstat_temp_directory(const char *newval, void *extra)
 {
 	/* check_canonical_path already canonicalized newval for us */
+	char	   *dname;
 	char	   *tname;
 	char	   *fname;
 
-	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /pgstat.tmp */
-	sprintf(tname, "%s/pgstat.tmp", newval);
-	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /pgstat.stat */
-	sprintf(fname, "%s/pgstat.stat", newval);
-
+	/* directory */
+	dname = guc_malloc(ERROR, strlen(newval) + 1);		/* runtime dir */
+	sprintf(dname, "%s", newval);
+
+	/* global stats */
+	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /global.tmp */
+	sprintf(tname, "%s/global.tmp", newval);
+	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /global.stat */
+	sprintf(fname, "%s/global.stat", newval);
+
+	if (pgstat_stat_directory)
+		free(pgstat_stat_directory);
+	pgstat_stat_directory = dname;
+	/* invalidate cached length in pgstat.c */
+	pgstat_stat_dbfile_maxlen = 0;
 	if (pgstat_stat_tmpname)
 		free(pgstat_stat_tmpname);
 	pgstat_stat_tmpname = tname;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index b8faf9c..b501132 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -192,6 +192,7 @@ const char *subdirs[] = {
 	"base",
 	"base/1",
 	"pg_tblspc",
+	"pg_stat",
 	"pg_stat_tmp"
 };
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 03c0174..1248f47 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
 	PgStat_MsgHdr m_hdr;
 	TimestampTz clock_time;		/* observed local clock time */
 	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
 } PgStat_MsgInquiry;
 
 
@@ -514,7 +515,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9A
+#define PGSTAT_FILE_FORMAT_ID	0xA240CA47
 
 /* ----------
  * PgStat_StatDBEntry			The collector's data per database
@@ -545,6 +546,7 @@ typedef struct PgStat_StatDBEntry
 	PgStat_Counter n_block_write_time;
 
 	TimestampTz stat_reset_timestamp;
+	TimestampTz stats_timestamp;		/* time of db stats file update */
 
 	/*
 	 * tables and functions must be last in the struct, because we don't write
@@ -722,6 +724,8 @@ extern bool pgstat_track_activities;
 extern bool pgstat_track_counts;
 extern int	pgstat_track_functions;
 extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern char *pgstat_stat_directory;
+extern int	pgstat_stat_dbfile_maxlen;
 extern char *pgstat_stat_tmpname;
 extern char *pgstat_stat_filename;
 
#61Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#59)
4 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

just a few charts from our production system, illustrating the impact of
this patch. We've deployed this patch on January 12 on our systems
running 9.1 (i.e. a backpatch), so we do have enough data to do some
nice pics from it. I don't expect the changes made to the patch since
then to affect the impact significantly.

I've chosen two systems with large numbers of databases (over 1000 on
each); each database contains multiple tables (possibly hundreds or more).

The first charts ("cpu") show quantiles of CPU usage, and assuming that
the system usage did not change, there's a clear drop of about 15% in
both cases. This is an 8-core system (on AWS), so this means a saving of
about 120% of one core.
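
The conversion from a whole-machine percentage to a single-core percentage can be written out as a tiny sketch. The helper name single_core_percent is hypothetical, and the sketch assumes the 15% figure is measured against all cores of the machine together.

```c
#include <assert.h>

/*
 * Convert a drop in overall CPU usage (percent of the whole machine)
 * into the equivalent percentage of one core.
 * Hypothetical helper, for illustration only.
 */
int
single_core_percent(int overall_drop_percent, int ncores)
{
	assert(ncores > 0);
	return overall_drop_percent * ncores;
}
```

With the numbers from the charts, single_core_percent(15, 8) gives 120, i.e. a bit more than one full core.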

The "disk usage" charts show how much space was needed for the stats
(placed on a tmpfs filesystem, mounted at /mnt/pg_tmp). The filesystem max
size is 400MB and the files require ~100MB. With the unpatched code the
space required was actually ~200MB because of the copying.
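
The "copying" refers to the write-to-temp-file-then-rename pattern the collector uses. A minimal standalone sketch of that pattern follows, assuming a POSIX system where rename() replaces an existing target; the helper name write_then_rename and the ".tmp" suffix are illustrative assumptions, not the server code. Until the rename happens, the old file and the complete new copy both occupy space, which would explain roughly 2x usage for a single large stats file. With per-database files, presumably only one much smaller file is duplicated at any moment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write data to "<path>.tmp" first, then rename it over path.
 * Hypothetical helper illustrating the temp-file-then-rename pattern.
 * Returns 0 on success, -1 on any failure.
 */
int
write_then_rename(const char *path, const char *data)
{
	char		tmppath[1024];
	FILE	   *fp;
	int			n;

	assert(path != NULL && data != NULL);

	n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
	if (n < 0 || (size_t) n >= sizeof(tmppath))
		return -1;

	fp = fopen(tmppath, "wb");
	if (fp == NULL)
		return -1;

	/* the old file (if any) still exists while this copy is written */
	fwrite(data, 1, strlen(data), fp);

	if (ferror(fp))
	{
		fclose(fp);
		remove(tmppath);
		return -1;
	}
	if (fclose(fp) != 0 || rename(tmppath, path) != 0)
	{
		remove(tmppath);
		return -1;
	}
	return 0;
}
```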

regards
Tomas

Attachments:

system-a-cpu.pngimage/png; name=system-a-cpu.pngDownload
�PNG


IHDR;}����sBIT��O�	pHYs���+ IDATx���y|T����!	$\��� 
.UQk���Vj[�j}��Z��?��RZ��vQT
J�U�.�
E��MQ����I d����L&�����3���9g&3�IfB��'���9����<�s��}[TUAAADV�#� � � ��	)� � � �R,AAAd/�X� � ��^H�AA���b!� � "{!�BAAD�B�� � � ���AAA�)� � � �R,AAAd/�X� � ��^H�AA���b!� � "{!�BAAD��N���z���c���l!� � �H�v)����|���x<.��s���>k�ZM�AAA$En����c�[o���*��y�����n���;v���a�����[�4f� � � ���X�o���O���0a��c�VUU�n!� � �H��c,�?���nccc����������tAAADR��Xt���<999�,�n������@�CAAD�(((�9sf��=m�%//O���|Q{��m�E��e����AAAt�
-�M������:�'��hy���w�������5�S����q��ez]:Q�C�*A�D%����8t��NT���J���z+�H[?��C�:t@II�\`�� � � ")�c���/^|��������g�n!� � �H��*�W^y��5j��9sN�81s�L�f�BAA�i��8p����o!� � �H���X� � � �)� � � �R,AAAd/�X� � ��^H�AA���b!� � "{Igu�N���4�Che��q�AAA��t1�`�������dzAAq�CYaAAAd/]/��I6��%�� � � �+$#B(�� � � �"]>+�$/��?���������	]�����z�l�����O������&M��z�*���}�����FAADW������~�������;>\���E������[������;����[B�z�������E��~���� � � �+]>�������;


���n��!�ny��������c�w�	� � ����e��q��x/~�f��n���}���^}�����n��g�������4y��b�6����� � � �)�T��������I�L��K/�u�Yl�����Y�n��{.x��3f������Y���dt�AA�M���e��o�����G�\����'����,)..��OR^^���7w��1��<}��AA�itS�=���CO<�����`wC�Pnn����B�e����e��m�����gn�AA�����e��MO<���f�~��Ql�#?|������{����nV�������������� � "�t_����%���[o��o�f��g�^8|���_x��`���������37^� � ���t_����+u[��5��Y����K��w��q���72� � � "ty�2^3=� � � :���XJJJ2=� � � :�.�X����!AA��P�+� � � �R,AAAd/�X� � ��^���%Y�=�^� � �����!�����k��q�_0b����� � � ��������>j�������;n�u�E>��/�^/����[nY���;��={v�d��+/�l����W��~��G~��N~uAA�����%5����g�u��]�����^\���7��y��e��:�x��W��;w�+������'���-��`A�NAA��n�X�=z�w|��?��q���6}�{S\q���d������7w���=V\\��1AAD7�k�XRf��q��xo����x����f�<x��\.�������w���L�<�b�L�~S�MAA��nc�������
�.����^<����|�^�����H���3��������V+��AADF���e��o�����G�\����'���W 
���=���;f�y7O��-�AAAt�4+�q���'���Vp���uu��c��
�vX�n��b��}�?_�� �~��C��~{F�LAA��nc�y���?��c�yb��Q.���/��`���L�@[�[�	� � ��L�o����K�����v�{������{�o�Vz���>��>���q���� � � ��W����J����\���k8���gXV��%K����1��320D� � ���ty�"�w��PC�
:lX��FAAD�������$�C � � ����e����AAAK��FAAD�C�� � � ���AAA�K��$k�'�AAAt9��bA2"���AADW�k+yy�m�#�}u[F������=���n�?���u�?_�����v�����]�v�K��G��LAAD�ty����'��/�l��Z���������-�_=����\���Eo��^�� � � �;��y/����~��Y�L����[�_`����>�Xqqq��� � � �;�4��Y��������gOv��W\��������4i��f�~���ssr&M�l�X�M�)�C%� � �nH����B�?���?��{���k���9����p��{��_��gV�u��)AAAtS:V�x��={����}��y���o�����p�i��%���?��O����>���3����O�4q��U�24R� � ����-+�����w���l���<f�������2����~���G����J� ����|��
�rss-��-����X,;�o���n�=C�%� � ��H���������k���x��7�z����7_w�uS�N��c��
�M��^���W_UPX��<����^}���'\<���f��n��
y��9=�uAAAd��)Q���rss�
�<x��0v����W������Z��_���r��)�>�p�����"4�������gh�AA�MI�b�������N�4i��}��v����������Oss�};�U�]��r��Y�2���,a7��1��3:odAAADH�b9���z��y���������yyyrrrdY���l�2�AJKK�|���F��������}`ZHd�Gf��A'*q�\%����s� t���U���J:W�O�����o���	&|�����/�3gN^^� ����(���[��<�����q��?Kii)��������J|���i������v���|�.����s� t���U���J:W	B'*q�\%���{3;��)����_p�.���+V(**r��g�uOK/���O�1	� � ��*�V�j��A[�l	�B[�l2d��C�:t@II	3AAAI���O������^��������{�p�
�/>v�Xuu������DAAAt��XF���?�Q�e��Qs��9q����3;"+� � � �S��)S8p��:xIIIR���� � � ��X:��EH"���t���c���1�}�"� � "=tm� ��ym�#>!7�1���?a��������������-���I��_����EA�;o�m���~��uk�����o�AAA�A�W,��z�����7g�3�>���_{��K/�4o����oA��������-z���27X� � ������qWg��M���TW\q�������������}������39D� � ��~t�����</7���=7i�d��~��A�p�\~7���f����3i�d��2m�M�/AAAt;�o���i���]7s�=<	������ ''G�ew���g��u�~f�Z�/X���D�@in��� ��.��bY�dIqq������2�z�
B�P��"����s��9����&M��z���
� �j(<|8�� � "���Ya�P(77�KAA!���^_Ww��1v�m���l����Z,���q�m����Q���XW'��gzAAd�W�<����^}���'\<��^��[�3f���L���v��`7~�������7E��TW'��ezAAd�W�\3e��O/>b��/������}�n������|M}0n������5��Q|�}���p��Gaa��^��:�f��(���l)���L�� � �/�W��=k���f��/�d��8p`�������K���9c��3fdf�Mp���)SR����_������>K�_�2d����gz A�M������}j6l��a�:At�;RV,JK�p�h6+��6�|,By���/CVk���fz,A�M������$�C ��C	����K55Y��"���Y�X��/��XQA�� � 2EV,������S+*d�+��WW�����X[�U�J	<����gz,A�}��WD�B���H���v*�l���!��+��J0����i�����XQ���AD�����8�>�c�����.�<);�iO�Q&����L%���W�
�AAdR,D��m�i�?������(��j1�r�tHN�*�$���G�o�j3����l^�Z��R8B�������v�jm��� �H�.�cI�vO����mc,��c�_��cKK�@@
�,�z��p��&�c,RMM�F:R�<��"��6��z���[z��kW�s�M�����Iv:sN?��#"�������L�� �xta��dDUk���=�����*���iV,�"Z�����A�R8�p�����,x:1byj�bkj����wUA�N�LJ�(>_�{�i����I���Hp�����n�GAD<�|VX^I^��G�^�j�U����q3~���#G���#��?��_w���M�8q��U��[��/�y/��	������X����x��A(-M�O�T�%����� H�z�y���8n|�5U��[�$�1�
t�dd�'��B��t��sg�ED��Kj|��?��������~�����o�?����;w�+������'���-��`A�F�����G$�Min6n>�O�=��&��c7$�'��K6s��G:��>xh���UA����n�X]��A���u[��BY��K�2�]��u(� :����2�yvc���551�2y��9=z�7w���=V|J��v�bQ�A���z�����p�d�����0�O�%��M��bQ�����R�*i�b�~��1��M2��0J��� ��#X�=G�n�q�@ x�@��CD��Mc,��?���[n�w�}���^}�����n��g�������4y��b�6�����t��	����c*�,& ;��pb��r�

j(��c�"�%������F�*������e����
���[���������z
� �C�j�K'O�.�.�� �l�[+����+�����������Y�n��{.x��3f������Y������L��t&�9�&�2a����z{`/-�����M`�� C�f���X�Q�8�o�i�G���Q*+�������I=A�AHS��PUE9��Gqj���3�z��������.^������?�Iyy�b���s��9�������P������t������H,�"�U��L���v;��Xx�Y�1����-cQ�^��%����.5?x2Ya�,�������E��E�u}!"5��X������At4�W�|��7N�>f�y|K(�e�KAA�\�~�Z���c�6n����I6[����A���P(�1�����_����t"��@��A���R�����f��Y;�O����t6����}>��)��Huu� ��
�x�{O��N{R%AqHWVS,���� t[�r��qUU�\{�v�#?����~��g.�@[,/7� ��3G(VWwf�����Im�`�ls��%��Qx���FI��(N��.|�Z�c������$�+��|��(����X�3�l��|���=-�k	��8�i�������g����.+g|��)�>�p�����"4���?7'����w�����\�	�o�;��^�����5T����%}�@m���*�v�D:1Z���������� ���:����X$��BF�,����bD�� :
��I
���G�>�<�p�(�BYO7U,�������Y���5��]�d	�q��w���I#� ��t�(��f����B�H%�fK�]�S����X��BU)T7V�^����f��$Y�"���k+���� �m��p���
1��oc7H�D�!:L�}%}(��DW �K���6��iI6 VWwf����:��[�BQ�'LH����"f�ly�V,L����fhK
$EM���t��h��K����.
L��q7GU������$��S,���Kq�MF������H�����,lUHN����/���1�85�"���|
MI��i�������!E������W/��BG�����b	g��	��gmc�6Qe���g���>}�E����^^�]�����v��2��z����w�/��e��%�%���#�U���h���j�tF,�At"����.w����3]� NV,������d�����UU���u>,VU
x��T����4$V�XN�w{���v���0�l��pVX�$�����)�Xt�E��t�%�������G�~t���-@BY��OJ�)�
K���g��?�&������o��he����_}���%A� ��*I��,����p����E���Na_%A�dQ�����@Nu��j���-�v\s�C���m�Rk�.��y��	}'*+��^�`���h�j�~�PH���:�]�X�L;��:������m�����i�X��zK�Eii���n�ejqFx��ra�������3)"�P��OaoF�b,��E�G�} ���(>������iQ�MmY�	%����a�n�vt���47�&�2�PV��z5%�� 0O����o�p��nK��az_{t����r3�U�$��>+,��X^�����0U��|S���+ "�k\���T(�t���E�>�*�X�lB���Lg{vr���d���~��H�,R,��X"�@G�%�qs�.�"D��M��=�Vq��f�����&��Kv���X;h������5�����7�_K�c���=��<p �Q�Ut���K�JX�(~�.�'Z�Be�.M�����>����ee�_��b,DV!54d���q0������z3�z
[0o�xr��CXg?g3�'�������N�����Xc��F�;��U����c�0�Xd�7$��9�K�_HJ�F��L�*��C������t�]�p��S����n������B*����jx��D�EW�&�:]x%�D$���3�m�U'����W$�,�d��
_2�\T��*q|��3?
��X��Z������������4+�F6�����s�����>"c)�8Q:�m/fn�"���}m�?���G�FH'@���$csic�H��,�$���]c������X� �x�DfX�c���1�rm+�]/J�6a�������8���Q�#�	��*D��{��e���{����&��-��9��z>�fM���o�,mTa���e�n�|�wT�^�K'L��
����X���q�����>4F���c5�{�������?�e��}�J6l��$�z��h�6����1�I9c��[����@<���Nt�o������-�*��� e��`��O��DT��X���^�s��9����b��lE�V��I�TKu�B1S+�e�1��q�{]+��*1�D0��)��7�<����J���:�K^~�����Z'Pi>��D��_*�jX,��z�zEbR�:��? �G^�/��SY^�u�{�%}�[ {������Im[�c"�9
�yo��3ih�����9����k'��u�������~�SJ�)�
���?��>�g��{�s���QD��������8{P�FG�>������0�����1Cl���c��j����-�������+��2Z�_�c����W0]����Lf�������"{-�{�D�'�,�SNba.����<eNe����'�\X��0h?��l��XR�>�����WW��>Y�kt�/o� IDAT~dr�o�=��1�>4����	�|O99���H?�UW�������
c	b,�K`!,#������X�ES!
����J��RqP��G�0l:��!�I�Y�j
:���`,��@�.�&u���B-b���8�R�+�G��0��h���U_
'�� `V�;c�K�-Mh3��v�f������,iH�
�N�WXf,FUC�w�8S��[�KU��(1��N�#�g�N�SG�H���CK�u�-�5����p*An�X&k��h�B�;	l"{E��M&��@wr�+�M�&���0�>��8cI�J3��S������:���s�"J��n-"c)tw����������P�����X��&��](�+B�X6n��p��X������]T�P�����=aK�Boo�P`���@��j�m%���'|��h���/��?
(!����3���Z�1�E`,�mY.,n�Z��������R.L0^y�%`,|p�wc������ ��9�����w����^4��Ua,,���v�p�k�����M�Y�|��C�.K�0")�844�e��=Q���Qy�L6�d ����f�m�<C]c��������5
E��]�c�:�F�HOBo�A����Ex��5�`�����x�K$�T��C�jf�d������l�E�h �fHR����<x�>���X���Dj$��,:�������?4!B&�%�V��=s��mjs��3X��-�"���PQ�,���Z����_�?Vy�]J��c,m�*l��r��"N�]� ���Y�{�������*l�U�V,�ww�n�<��cG��s�������O�l��tMJ��bet�fv�Q<z���?t��b^� �o��g7�������@/a��$�o��D\�m-YJ##\�%��GG���3f�	��ud���C��P�]��p���b?k1���(�j�r�c���}���o���(����iTfi�z,'EJ�H��"�}qtgPB�H�j{�i<h�ES���������n��1G������`��Q�����������^(�������������an`C�y,��0��e���9��S��DB�%$��>c,��s��l�<�w��}mC���d�����L����F�I@t��XLV4q�c�%Eq��)E��|L$:6��7Tmo��+cZ��d���L����
5K�&�I�����&�Y�L���,��J@�$u���`�0?M]��3�&��
3#'��r�z�oG�B�$��O�	s����X�-��3��cX�+<qM�g+����4���p��SWW�@�Xq���v�O��w��/�����u����PTa:g�|
�!}�O����e��p�U��?����/�������On��\<�m0z.����66�o������_dxa��X������{��F�
�|Z>�����QF�ai�����9��gt��Bw�>A���t�uw��XU���Q��
\���c�Z�T�;0�c��
�E���o��s���W���}���`%�x�sX*K�c�n�
�j������=�?����,����UXM�X��j8�*?�E#�7o��%�E�����W${��~��24"�-8�f����W�1,���]�u
����/)����c,�P�z�{1�*�d�8���A��j}���xl2���7"��������M@
9n�ov�Rpd,j\����d��)D�BT�w�����a�����"c��i>C�?����E��Yr[$�������Q(e4�Ar�~�Y?!�"A
��|�d,
]�u�+�\U�0������Q��v����cQ��?�k�������;8cQTa��_���vaypd,>�M8�*�c�%DM�b��1S�m��p���\6��Sn&(
���F��q-Ug)%w������S���Y�4V�\�7F�o?th��X�
H����������a�G��jo����_�������b����M/�EnA��X�������vg,��z�o�=������1$
�E�wP���3�a,f�NWZ���00P���� ���3�FY$��
��>21}��e�3����W����<QXu���7t@�)�Fi����9&+�f��Q�W�a{�L�cyEwe�!k�jv
�������X����1�SQ	��bJ2����i�m;�Gq�/����_��1�i=j����*�c��Ta|���ra�aW�����5Ch/�wuE�Jt��i����
#aJ���5��������#�y��i|%SU\�g������~/�5�7S�c
��=��f8�e���%�c�C��R+���x������������E$v��~���� ���/d,T�����Q�?�O�V��Fj�)����]#&�����m?|X,���q���0v*]�#q]�����J.r�{���<����02�*L)U�0F�	��qxH~��Mo�����#���d�%����G��Mj�AIVv#�F�	X���8��������q1��M��F"�"J�����.���9|�3���A��j(?��A�D\��0��sF���1�c����_� hv+_I���T`I�i!����L�P�)�X�t��2 ���'��l��T��^0�R#�Op";G�ZC����y'L`���X������&U��� �e�>��T��a?� ��Q��W�Z�={���C\@2�����������"z�s,�_�g�vz����������A�;�k{�G�FWN6{��B��A4���'���c,GP/��pU�������}�1��sX������2W���X^��/�	n]�iB<�4�������P�������[��+�%�AK!P��1��h��@��A��Yf��5��^��l�h�!$�'b)�[<��t�����=`>�F��c��|��	�2���
[�U��b,��Q�s���EsUa}h���;��X��*�g�����Pb'��l�B�����c�+�D`,��P32���1�"}h�'��5�����K��_�)���z6���i[�3��423�>��@3��z+�Pq��Z������<(�jA��Zi��x%E8��G�����L�*������|Mz��%��V��}$��'�{ko�N����>&�q��|��7��L"s<|�%���V�l��H�����?�7~�X��y�DiDai�)��� `�y,��c�g
��~Y�������0���XX'�"<E6���q�f�0��/�=4��F��V���j�X)"���>����#���t?���C�c�k^�;8u�����j�~R���5J�� !
S����3{Rl����[�<�U��W	5X���6S
���2b����Z`>��?D}`�j��x�S�=Z��2�aJo����4��W����6��G��������B��@��~���Q���[B��o8�N>%s��������X��q1���",&������4b��a����������_c�:����N�s�LU�b����D0�j�6�PW��	puK���m����X���Y�*0��|����|Hr"��s���|s�Bw���������~>b��.^����F���d�����o�o���o#9�~�����{�h��X��P����iE$'=����
d�,�nUF��Ta�w$�*����d��
c,����W�
R�r��9X_�[��d�E�~P���x)����V*X	2��<�~Rb,���8&��,���cX-���I,&o.�*�t0?j�8w����8F�k�������
C�w�����c�)��"L�+|@�h�Cqg������dJ'��������g��}���{xg�/�)��}���+N|:��q��,p�"<��s�������$���	r����v��u��w��e,U��(y������	c,ln��\��"~����^�����p��X~��%��^��Xx���Ri�I�E���Y7��Y<��1N�I�=�����1���[�[L*��\cQ��K*��?"}B�Pg,�A_��;�7~x�������������=b��8����W�#���ra�I,=ca�DX�=O�x���z�_�US}�R*�Ua�m�,q���5����X���Pa�ccbicvKYm���[��X�a��=Z9���S�z���GX*���������l:3���'��P���U���*�Xt���Io��M�
�Y��o�mG�p9l��.tl
K ��=�s�k!)���3tS���j�����:�s���umB��6c��U�K�C�����w�]Wi�u�����Te,m�����Jj7���t*4Q�RXI�e]��e�������X�O���.\�^|v9�|]�9����c�c#Q��T�%G�v���Q%~z�=��_�����
;�~.r�0K�s����iV���]b,��223�ra��H����4Ry1�@��5)�^��T����>5P��������)����R�P��y�%����p��>��^�L"?~�`i|���7<�����������Tcy!P��L����%�@dI��$�t���H���X^�$��UGjtD��?�5����9@�c1}>7��Vb�11�f�V���'���&c�C�����wuR�iT���Bx��6�^�q,xT���J���&*���������Y�?-h�(��B�P���x�!t�XA�<���0c�1�Y����S]������+�-��@��e�n�Lp�E�g��\1�"�L�8������E�����}#��0�
����x�(x0�� ��'�I����/�i��'�P���
���Xf?��8���cE����EI�g�#�Jl�+�?Tb,��?MF�rn$cq�
��A�Or����#��`�7Ez��;v���5��"�g(��.g������Ba�������AWU����7��E��o\�0j���1�*4�<������fLo��"�}Zmf�O�FNa>���N~&bic�dT�ED�K�X�E��K����A��~[6�V�Fh	UXrT~U�=�e�����&���,�Ba ���d ,r,E��i�V�������j�����sD'G��:> ����f�Zq �3}e�=�m�+��sT�	�:�k61S�%e�r�;�@����?y��Y�*cA5�a�$�r��4 v�?��\{��;������<�A�Gk�E�,?��r:�x��:�Z��!F�&�����\��������.#�1���1��"0�ra-{|Zk�)Gi�Q��i?��QK��e9�������01��@h��O�wIY�=�Z�?e�\X?'����Gw��o#��'�=T*�c�h:<������Uz7����}���Xd�@�9��Y��h�</e�[�^�.�e�%�~�K�P;����������r����O�9��-GQ�*�X_Q�,��1�C8}�>�X�rav#���m%�aK%��[*#UaP������"d,i����������11���/������fd���G�{�Je�9^�U�����
�V��Px-	�9��:�Gi���*LN�Q��	�����_����Tb,�#��ra�Zy�:���� V�x��S����q�wc���7�<�@����E��I*mF��?������W�,��a�V2����Z��-���cq��.�sQ�(t��>����)���UWe��+������j���,�EQj�F"��,���+1��%8��?H�:�@^�O�����d������\*��O�2����`Ah^'x*p��?���!
H>�3�^.��P[�f8+���q�z���q�
4�O\�R���������	o�Tb}��j�0nn����1_��&����U����`,%x����WSt3�$	v�Rg
Tho�-����
���8&��P�X� _��[��w���$��j����;+ci�fB� n�3j��AH�7��<s{��D&�H���k���#�dp2z�+��F��~���qI���$�7�*��TS��(�������H���X���D�����e����W���|�r��q�"�I���r������L�L$F��5ggry1(q����6����OW!���]Zt=�%�c�f_�gJ��p	����X���
��2&4;&R��s��F��-cO�_���+����H�����sG��Z�X�������[��')�K[�����&1����������f����Rj�yF������X��)Z �t�����E����R6Kp���nba7��Pf��0��r���HW�X��&dE������U��E��
�e��.��0��Ps$H����6�|�������C�}��*�#�9f~��9*��A�����|�����Tr3yi5�m�V#bU[��h&o�;���\b,���g5�� mB�Y�xNZ��ra��X�2fZ�����L�IG:SE,�7b��*�?@�%*�M"�M���>��/��+Q���#wcd�����\�k$s��!�pN�e���FT�Y�8}W~�|���w�>���|1��S�~|����9��9��<c�=����|E�s��Oc�t�fHZL�rvD�}��L�������-���1�A�C��+~��Aj��v���->�+)��I�Y ���b,}h����,����P���	o
7@602�������8(����F,�2�o�s�@G5����e
��
����{\~�(I)Ex/�� �E,�M0����qV*�+�MOV.\�o�)���5��j��f�8m��������6QE��$c�3����!�_����.�4;c�qk�ao�iXB��\�eUDb����Y��bj_��z$�Z�������S��V'�X�;ca�#
7��\XmpV�Y��=�R.c����a*1}��cg�G�uK:Q2�����T��i?�����r�Y��x-1��e�K��T��d,^	gG=����Zj��0��\�Y�z���������V\�\����Z�y3�A����g���>rG����}6���!\�w��z�bQV+����q�-T�}���1�I�3���c\]O��#$��&�����?H�Y�����
#�7�Cf"N�_���*V�x+�����J^K0�jx��t���0ZQ��X��%)7�.GI/#0�������]�?}�J<��X�3\*�k7:(�n4v�c�2�B���1��+AJ2�x��*k������z��`&g_�l{%�b������i�����S�|��3��j�)��C]F2��c��2�7=(���o���&�S�K2(�%�1
�V*��-��IcEV
X@y������}�A\��X�?Mc��b,fk#��I��e�p�1��>�,�,m;����2hD�=i���6d�
,�����,��$�`q�F?�c���W�B^k~�)������&��9�"�������H���5N���x�}X�{�'}���1����8
�a��h<���?�h��,IJ'�>=I7Q��R+nq����Xx;q�Y�|n	��~���E������ 5��C1$ci�nKb��
#��0����b��!Y}h�.W����z��M���!��l��!`,�h!mq�1l���������A�({.�=�\ ��$����*c)����o)d�"3B�G/ ��j=9VU����l�g5������g,&���V���kl�r��3��r�E���}����
�������*G�h&{X�e�*s0�]{K����-����G��L��h�����:jG����/��!7]L��a�[(��@�"<���$�����ij��@}����g�X�[�����Tb,���������A$�l��-{h>��m.����}��9%@�2��=��1�C�5�1��!IJ�9nL��i�t>I���
���s�Vc���x��&>'[�C 5������k&[p�9�H�x��{��'"�"��������q��D��n���)5�Uw��|��A����������k�i���*��D�m����f�W�n�.Q���U�"���� �����m�6x'�=���V��o�0���d��K��a����;pP`,]�V����6�"�G�D�c�L1�����6c��Du���b��H�:(]0 IDAT��*����L���el�P�O���a�8��i��
wQ��X�2�`����XY�C8�*?�b�\������d�HS��~z)��:�]YT�j�c�m�sQ_��
\m��lUF�n�%���5:c!���{�Oe����[����/��B�y�W���&H;3�E���~�;��a*1�X��Q��T�zn=G��+��`k�U�k��O��?�H'H����=�@F3�rH�r��x��q�w���r������/������������
�kg�����S�q�p���ar�������~�6�W�kLq�
�k��f���L�BG�x�=�E�ovU����r�\���Ax����)w���gV��D@</�L�	�H'V����L��2�bR�1�0��Rg���or
�n�:c)��o�P�7�X��	q�f�/
u�=TJQU&�W~����&�c��P�bScF��q%���x��`6*����iy��e)��U�`��/�jx�3k~f�=�
�XKF�����'�~>8�i��.lC>����RD�����c��&X�v��Q��;��HDy�aW�7���Hp�����>�#��T@2�&�R%W�lp�l��P�(��+*��#�P�[V�fo&��1�����Z����LX��p���4��p
�"z-�J�J������(`�#l����Ua�Q[����#�'qb�-������d,g���\�����.���E��k<�E��=�Q��X���i��
��M���������7��g�Xu'��3��O_�1�e�+�xIee
cqo�"2q���e����(��B!�z���
�?8��8��������N�c���:}���[{����h��+�j�i��
�=k�������\����5�d1������]��#����gvv��#�e�sQ��Q>������=�Kj$�m���2����/XG��6�
��zO3XBNDq��h�F�I3VU�V�����wY]�?�tU�Do�j��
K$�3������
�,q��������O ����������@M���
K��Y,�}8=��Wd��2�ms������l�����q����J��!�lc��Za����'�����p�
�A�������)?
/��$�48�RT�ugD�*�������c�P�C��
~�"�j�I�gD��Ca0��;[j�����n�/x������g������d2J�ih&n����[��hf��5���z�Md�p����8�'jL@q?�mtCz/R#Xs��If�����$G	
�^�H�{�{����7�r�0�:g����C��
��Rh]{�?�?*��M}��A[���36Q]e�"�����t�.Q�)��J��Tl:A}���L��`���$�"���V!�>T��!5�w��3���5��Q�H�D�85��G����sr�|���~���/�s7f���,���2����	,\��P�f�z�5������B�	���TT�$�J�b,	��c���Xug�|���P
�1(9���IM�r���������������D�=����W����O�^3�8�]���M<ir"���W���b�[^E\�������X��U���4mt���r~���/A#T::��<;�QU�����
���I�Tb,pN����vs�T�*L��������"���D$D�b���?J�j����9�%*��_H�����/��?b���sp��D1��,|�Q��
��d&�0)���4���V��������{Uayyb�g2;XE+����BW0���Y(��1� 9�w|4�@��&n�Xn��1�%����;��Y~Y��[�}�����jO��|1 ~���Ih�2�sga`f���P������h�-QJ�6�K�2g;�L{��v�l�"�\�`��
�V]O�|��J����+w�C	�7p��3�"���di��b�LWx=S�����O��Z���.hu�:�\���p��@d�i��
�X�&���X��zf:��sA�����/t.����
��9c����K/�f�XQqHI��@�|����8�8�]^��gJ�W0=tF��m6V��TL?\�R��������m�F�|���;��A��<ci�CXT<���K�X�8X����J{#��o��Dg��n����	�"
��kT����`}f�����.����=
@e,�UZ�M���o#������W�Nq�l��o��y������oG�����W�H���bZ`"��N�1���@��������:o37��>R*��U0A��
[�qd�����.��d�oFm�/{O�Z(�E���*9x��Q?>������rga,$�K&��|�`=��&��1_`,|{�y\|S@.������qH[�a��g�������kN��.i��F��*�C�`������[g������6Jun���j��r����9�;�"�:�1&�����?����*3�p%D
��q��g�i�342���g��n��m6o93S���dL��U0��W��a�xQN�)����b,�	�x�'�\�r8%��1j�u>�D��E��g��s^
L1�B�O�C�gfs�$��n�+|��1{$��	q�>U��X,�������b�w=�DAM����V�V+�7/����>A]��+��m�U��yqbX�[���ofm�Z�-n���=�b|���L�E��}�������|+[��Pn���x@��pUd3d�0�����U�l<��F����eV���������
�����3T�|��q?'A���X��,$�yI��q��mg��x�c�k��`����:2�Z� $�_����d��8�XD}D���x�c�DWxA�h�������f"��x����e��X���c�f_j1��$�O����k�Un�J��j!2�D��S���M;S�&	,�T�8,���XT�_c����Q�� �IN#�I��m�V���|[1���b���5{���<;��X��y�'3����H�`�F@�����x'S���~��YJ����T���Xj����`�y%4���?���nk�T�0L�fa,���hG��D�H>O����_�����I�c���Ua�U#�2�e���|&d):���
3�+�Bm/��95������Zw�����J���N����E�X����0Kbs�K��Ti�'JE���6����u��O�c��+ �N�*}�.���<	�s����_>,6���_��������N<�Lq��w�x@��K
��l�xV�E
4G�ml�<*�����yXk�EK���T	�gW�'�86����r6H1�����e�/����o7�'[�s��x����!�s���������x9���x�r�_1�E����*7�m������9�������~�-K�������	�i?��d����Ze���5�PD�/��*�J}���a�������8�>�������FZ����>��u�i���o<[�J����K����X�'��������o��`�$p�g"<�F�O����8�<��I�Q���=<P$L�e�K��l�@��=��_{��K
���]OH���AI��`X�Z��Ta�6	fr���������9.�P��b^I�*�jPO9���LyGT�*�*37��D4�����q.-=�%����+]��R��*1��nu��~����LX������7�}��w�y��[n)���<�w��;V��}��{.�xl��s$
���`,~��A�JExFwa�PnK��+0���j�*�p2
��X�k���f���$��5)(
�!�IIHdIh����������2�r`�����d�N,���KU�P^@s�
�k�QB0��X(Olx?�:!��[��lT����)���/q�R�sb�����������bi�A �vd��(��'G(�D�Z���cSrvY��'����*z*c��!\�2
�W"�����7UT���
6�"�V�i*a�1�o�cCX�I)�M��;64�����������H��O-�r�0����1s��V�J/�K�����a�)n'�?���k���?@���kp09�'�0��n!�)V�%��>������m8���$���D�O{ ��Y�?�p�A�}s�#�4\E*������
cx���u�,��Vc�1�^Z������(�1�*������zG6m:��m[�h��%�������N>�����?��,��-��nY@�W��:���$m�^����+�Z3��dBI1S��*�z�
���m�L���^�s)t�Z�Yd	��e���8]��<�m���V���<���>�?�p�qw����N��y���y,��3'�~'�W�)��?X��xZve)G�@4:�X��8Sx��dDkQ}7��f]�q�9��XH���fH��
�,=��	�O��I�l6��HQ��O�����/~�V��]4-�'0#�U����	c����������t���X�'�O0�a��+����L�0��������JJ�T�Nb�!9�!�eh/�8S(���"�m��m��Z��)x�'b�����G�s�.���'Gctf����p���R����:�����g
�B�F\�r��]���7���2M�E4�����X,�bO|_�����C�l:��XfV��l>��u������$ca<�'YGd#���S*H����������k��8�������2��;��P.�8������D��CxVO9*��eW��=��p���,9vY�z�W��1cd�c,��I�L-mOER�����[�>��N�\��E��������;�Sn�6�=GP�N�U^+��c,JN��VC����K{�]T�]���4'�:���3���l��q��SY48���!���2�=b�E��L�L�7�]dp3��,��8�<5��<V�Lk��G�d��t2nE/���Bb�vXxEU�Y
yr1�K�|F���,�[5�Hrb����[0E����;��01��&S/<��H���<�4���4�K�@�����
�f_��a�������w�~o�c����tS�7�����qy���8�x�n�l���KZo*��O�m
N�9ru��~����]��	���������0���A��j�"���Oc���P1�46v�S�����-M`���]��f�>��-f�V���c��`	�J�m8E;�������� %@c5?����*`���;']��
+�r��X���Z3=��v��1�Nq1�9��?m>�Z����{x��Rj��$��^���(�EP��3��m\-�y�h����5�wd,	`p.p�X��=����������18F���i�aj�e,���8I��L�P�vB��� �r�|�Q+P;q,d�Iy��/
c ����Gm���I>���j&�cF�da�1Xm�����3������S�b,����Xt>���F�{
Oi]�Z��E�����f�E(���������-��m���:c9�$�&x��
k��@���3a7��w�9n�������
c���Mh��K�n��Z/�0#*z�Hq�������8�&x��0fL�*�!��*� �X� !�������+�*pg�c�c���N���l����fOC����*LW7��G�l�}�6#����4��9��1F9�a������A6;<`��O����i��	���U�8��8}�!�a�s���ZU���`
�qf�1�{J��~���:s
�%���1�}arH/Hx�eQ�����XY��Ib,�7�'��S�Y�)���`Ie�FT��6@���sOV<�d*��j�Xui�����DFujMr����t�cA�h~��>y����|�d����(����s��L93����~�=9��W$�TfX�*����q�0������.��ACuKm�o�2K�y�}f�~���0G�b���!:��3s~Xr��G��P��ke��"����(�����*p*��0E�Pg,�0�3rQhv��2�
�����OS�c	b,���Am/j{��/C�xp��&�S2�=-�����0*�%\�#3�\�G����	6������t;�N��Y�m�Y]� �,�4�us��@�t�<�R~�0."�a��1�������4D��>������f ��6��3�M�{�p��#?+�0�rw]2�B�:���4UX�~J�]jW�X���(�X`3��K�<[���+�m:1�b,�����n3f�(P
J��E�'3^r���`�1�2��J'�U���^���b�-&A
����:�������ra:ifoR����k$Kr����K�M������hcQ�Y�Ku-l��$	P����aS�bg��Yf	�0�b��������	&S!���Pk�[��h
�T���;����������6����8]�����7�x������c�����@~^	u=B�	����w[�'H��LZ$�K�*��O�,����J`Q�=�"��IT�(��6������U��kG'�8�Y+�������;�Y,D�%}"</x�V���C��F}$)�z^QU�V���A	Ke��~���0-H5y+A}pG��-k|\�H��[�cI�QV�'�������'�X��y,�}-�n���s8�'a%Ud�}W
���/"iP���7=���YG����X�x�(�3�:�)��%����z���{@�B�T������P�c,�E�T��+���a���<�����;�9s���Z�`�����o���X��0���
�;����c��Z�,�_F�Dc���V�'5i�
���S)>��Q�,|�SY��A�T*��y�Gy:>c��+p���+
�>�q%5?�ba`��DPq8gm7�����K|�)��f��GU�\�j)����oqk��U����5�t`.p�h�l|�|/�������w['J2��T�� `,�vp�5G��8�@%	�����^���r#0�}����>��a��1�+��+���9��������j�z���Xx�	aW���4,�����������S�DN�*o]#���`����3�1 #�
�j�.��������>��.���~�n;��*�L����8� L�T��'�X.��	C�V��,��4e05�Z���I�B�I,0�X�k�i?_�%����.�%=\��s�O|�����n��g�9�b�8�E�77c�v�����H��9?D�fI��t�d���IU�j�U%�Ub+��n�Hi��&����.%���t�����K������'���;�q�,�R*B[����%���M�bgdd*KT�
!��!���`��~qc������A��Z���f��$�K�	��6���:t����)�O�gXH���+�R1l�����`P�{�,=�T���8&��4�����A�X~�\5$a�D����d!���E?��d��E����1��l^Gv�^��[G5�\�K:3����1�-/������������,x���mN�e�F*�"3^��M�T���X����
Z����������4�#����L��4��@]O����Op�-� gQ�+��1���X�rQ�BVnI��:���{9��-1�E�O��C�2��������c0��};�GT��K�%�/�	����!&���SL���	e,�V ��'������J|S�e�`���q��;�_��xw�^����2[��}�9��l�b�5�z;.��uq�gR|x���>c�:1I���
T�"����sd&�|3�
��6�����Ee*����9�1�9�y?�1.��n��1;LQ�����bH���$d���-&���v�wU�$.�P�x<C�b����mvaX�5��s"K��f����k��xhqg��?������&�5����A�x)��K�(�0�1�����_�{����_t��3!�J����0��V���h���;B��D�E�?�6b'�E�sR�:�Z��f�����d���v�}YSY11�}����sO���j%d7�E���U��XDR�2s�����<RM'2�����o��9f�������Q�;s�V$eU�%�Q(%��E �T.���������c�����d�eO��[4���I�7L�6��� IDAT��C� n���X�S,��~��������q����WQ/������i���������,�z�?���8��;!�
A�����5�"MKZw!?P�0p�� �0���Cc,z��^��(c���3���2���j1�F5�O�1�@�����`���]F��qb����2������c��$�qW�%�u%%�aI]�'�:�O��0y��h�����_��x|+����N��6�c��7F��|�������*�EYUu
�x��8����	�����;���%�tpV��1�d�E���k++�!�U�
�#c�
�o�}�D����
���������Q/[�����Ib����wr�^h�q�B^�u���.!n�0=�p����H�Iib,L%��G���&U����6\���_�x6�t���`��X�R����7P+/g����4�be,��}�'����
fV���-e�+j(�uw}yq�}=5��o�{�@mS%�^�"����h$������v%��E��B,%?��yF�>Y�0p�����#fK���
Ki�k9-)HT
�
��T�<puqj�Y�1�X�"���d�o���O�~m��<-{��1<�Ob������0�����|�oJ%e��]����JE������e`ko�{>`mj4U0�K\�XR�x%��s� ��+L�gh�-���uX�������I��13�3q��-1m����@HF�X��Q��(��UwJ��+b��8�{R���TK�`���u��N����7�G��l�,3IM�W\���J������>����c,��Z���b�p�AH����y�2�)"��vcv	����4��
]P��AS^?�����:�l����le��]���|���#X�
;&_������B*����qK�)������3�6�����M�&>������y�9�J�N8/�raZw����fV&
��a��1k�p����N6%w���	��;���B�����1���U����A��e	0CZI�b
�#c�0,/}���K��K;6PE�p����$Z�M�$\�����*�'���>����ca:~V��<H\G��DL�a�K8�{x���������XP�0l�w�V]����^1�I��j�����L�h��4��M^8d�K��.����T GB�u�F����������x����|�B\����a�1��z�sd\��5C�cMU)��q���L�I#���)��gE�3IK������c�P#W���0O�n(���%&�g�mZ��p��9���=�r7.`��s�xQ��c���	E�H��Xx����d��w���-&'�RiGD��X�rV9�u�h40�2ah���E:+=1��C�`�Y�_r���Z4�%B�����S�j�������z�� �����K���.�X,�=��B\���b���hb��o�K�0,
�}���S�L�W��_�]�}�P��S���AV`�w���K:m����4f��$�|�~����5?(
Uc�].,7���������9p�N,�"���U�k��,k������q*�650%K9��s�u�#��-T���k+�;a�v�3F�%�rU��J*K]�-q�+���#*����BUG���,|��
���(��:�r�wM5����u�Z��r���nky�����z�KB�.�9U
k�����E����QM��}j���+��	a.(K���0x@N�U��l����'�,�i�{q��sX�=�c�p�u��%������p�}TE������i4�/B�_/8���� E,%wDuF��z����=��Yc�����i9{�W�2��*�#�����!8J�^����
z6�%����ce�XH�0��rw����>^�����%Gq�7#��<�T����{�����rN��i?^�u �������G0s#���
�M=��a>����d�����jT�p��T�R~qSYR#x�_������W^��1V�=$
a81s<B��=������ay�����&cb� �0l����T�\���WK��t1�:�D��A�R+J3%�a~nS�A�%{�c@|���"R�,. H���K��!c�7������[�lL�Ts����>�����6�T��e�h_��~�]*1,u���Hi�z������5BI�8I+�U�h[���yae@f,�`��1e��^����
��	��1�>"�4n���������Q������dq``f}�A����� �
�P���LZ���YA�S�Ja����M�0�e�U���������E���� M�C6���v���Xs���n�)�#]K�ra2���T���a��qc="e�X8,|���CX�8�b��]'��i�c�7����:c�?����(�s�����:�Eeu�Q-Uf/}�|JJP�K]�EO��������`�������:S��+������H�>��K��]����r������
��+J�Lc�������	�$c)C��A�y�tR�6�:'�#���KqD�f>h������c�95�y/�P��~#�"����1l�m���XLU.��\�}W�����x��!��	73 	tgk�����vHEx*�{�qN���V�+�*�����Ipv��9,7�����S6�$�oT=p��
��Z�4o�l��������r�B�Je}K���0���&��]�H�w�c���]��,��~!F=����qH��[�����%ec�]��%2a�c��~�����4���U�m0,�llUI�����T���>����j������lr%���t�V�����1�v����_"t�3}�����e��w`��1"Ng��A�/{<����"Sw�=N��b�����E
3c|�X���N�V�h�X��q��G���xCf?��-�grT��_�s��Y/J�Crn��b�0�s��&���)�w��94�"���O����.I�����J���:�0
M:<���}leo8Q���*eI�eO���"�mX���e;\r���/vb�X�e����G!d,r�v_�+��hDb��acJ^��n���������P>G����1���������y��b,|�����sXk
�4M��)�yD�s�0u,��r��!��v������k:�����e����,GP�V���Vt�\�Zg��O���3�@.�����I����a�I,��9���]_������������M-�7������\���Az�p7-_�,1�����:66��*�(����VzY[^�
+�M�0�����T��m��2�a����o����nN����sn��kn�mVKo~o�k|�2����',��ca��r��������B`�j���f(��)�R3��3���+~cl�z�g�!��G����+���	iR(��v����&?�<q^$G�3�hS���"������8SS��p�l�1�l�v�l�GJ�Z����5>Ey���3��%��n��p����
���w���RI��Ta0��Xb��xd��<���r-��R����3
cI��Y���c��;/�UX���Mg��5|���"�d,�'���:KT�C]N��6d�6��N$d�{��fJXB��p�?�d~�����E�����s�K��-���I�c����{��H/�t�'�,9J����Q2Gb!��|by���B*p������g��m�������A���b>,1s1����g�K�y1���X�i��'�5���.\�0sc���At���[�����c,(�����������~��w�Z){L��4cY�s\{��^	K��}�"�!nz{���t�KS�Pc�7����O�	���I���F0W;����{e�=~���Sb,v���a~�;ci������6�����a]6�9u�����S�X����y����Go��}�U�x�6PV�00��]W���4���p|��I2���v���;T�q��#i�X\��3��N))3��c!n��|&
,.������pM���\p�:�Cp�w�Hu�s�O�b?7.���4���Va,�{�E3�L�h�^^�o	�Y��]�����m�����0�U(�'�bsG��>��qq�y�0;6��>����&P�d��U���, 3]����k�Y��$wI�E��4����}id,�
��X����VD�v�Q�}����M|.���o�eg?v�x��2�5��Dg�=h<��|~���V��#I$�1��	�!3�)��K�6g#�D#9�xR�Y7��{�R�|&4�I�R�^��#�������U~�������p�;�"�D(w�����"c,37�1=f9��>��c����0���7���C|���l1-�l���KH����k��w@���O>���~�c������}����W����O~��_��������d��O
u��)��g,�w��k����d���uB�-�(V[v����(�����S����2���
����T����B~;�2�z��7):$��5�-T�PL�HJL�Fn���N,2�R��g,�E�Kd`M�9��{�nI��g�%������x2=����� ���_����`��0l���c��������+b��Q��Y�~�������Jq����z��F���\��j����@�N�nK�K+^'T#�����mv�T��r�E�����D����`�D�����1�/�g��O��U����;*���C@�xh�&M�3��b#H�.�D"�,|H�P�����[��g������	@]1g� �W���Yni���)�=\���c��|,z���$c1��:����K����
7��9�9c�c,p.p�#<6������}W"}���W
gG-��j���E���M��;�������K.������x�k^���~���|����M�vT%n%�L�r
=6��1��Y�P$����u�d���e�����8���~����:I/��Bf�7�b1Tf)O��<o�NI�I.�`��3�s�b�_��oY����
$U��tB���n��&��)K�\����3�d,���Q�i��s�VX�\�^�X��X�HZ8��g�����}i|�x�H���<m���Xq�^���Ff���bF�Tf���+�
��Z�K�W����s��g?��b��a���*�� �O:��e\�5�������z0�����ae,�Hi��7_�������������Hr��'�S���"d�������TfCi�hb,ko��M�'C%F�wG'�����������N�]�P�z��laX~�����w	MkY�0L�(b�8�������+Im/��g���O��?W#B��|;*9J�a�h��^��Gx,+Y�o��a������-�U�g;��c���k.���t:�h����>/���i��`���{���W�F��j�����%���H;cm�4�hf�������}a��c�����;��������!����U�^?��o`Q�)	���tQt?�W,���g��>��xZ��%��cb,-���r��d,������,#��<�:M}#���������\�8�p�r��������f��U�������X��6lvK�/��.��/
���]4�=s���!�K���p�
��lI�5CRz���q�mx���by�zq���vjD��uua9!���Xz/>��!?�+>a�����R��p��Q:/�s��8���25c�������(��u?r��5uD�y�h.�����|bz-�������N$9���~X3�k����v�~���8f?f7�>������e�zS�������X�h���C�@��,��#���>B����2&"��.���X��f��}���<�f�6SS#A{h������DA����T���$��y,�=���u�655���;z�-�,�d1�:u�b��	��H�-����V`5�Xt��Y/���k����*ZW{��=��5V�c�H�k
I�-{����%�^�G�9A5<����s�>'M��.���l��b��{�2�����dYu��j9��1�m����\�&�6��x�?%��F�%o�U3�\f]}5��NJ� ��,1�/cI�1=���=�ud���h��V�
��S�8�8��,�������K���6|�\�p�W�xm�m9,���8��)x%�w��m?��E���lq�q��.���|�3oF���X"�wLz0�,{D(���\������c�����K-Q0���Y�����.�Rd�?�I|�
���?@n� Zu'��Wo	����Q"��=VfQf���uM�_yq{;�*��G	���|
�~$��(=�3�	Rgc8$9'�W���o�����si���'R����Q������O[�������.Uf,����>��W\����t@2�,�����
�=���vRd�0�F��L*��>T\�����U�&7�'��������y:�&"����8-{�=sD�%���82�6�A���IkO�&��q�CR>yW[/M��1�w����8�^p��b,��/n�%�1%�8��1���^iG<b-0����e������v�B��3^KV������������
Hn�6��P��o��c����3f�2���*�5���@�8.�Lt21���}c(������N�7��x�tG���+�y'G�D���zx��^�|���b�+~��gv�h�;2��
��I�%t��Gr���
_��G$%��{����RQ�aR��TM��Y@�
c��E��I�i��VX���	\���;��D�f�6��P�E������;�����.-�\ZwJ��]�����a����Rb��������4�1����/�UBA��cy������2$���|Il0�����[��f.���Y�&R�E�{��v�mW]uUmm-�t:=::ZSS366�>q�
7D�N+�o�h�Q+�N�����P�c,�^����s���G�#�?Y(\L,U�X�~4��M�&�	��4o��#3�
������Y 2��r��O���.Wg�y=�s���o������:�0��*�\d,�+����_TJf��1jx�#�����e�N�E)�$Q�S����LD�
Z
e��`4��������=�V`_�[��W�;�X8�7�������Ar&#+���t-��n�X��s�
�f�pn���^h�u=�J(�<����X$
�!H�W�:�Hm����ch&.���S��97�A���O@�>�Bmn�^rQ��w����~�L��5��<q
�Uf�}�'B+��#5�$?��|2�X>c	��I-t@N.�`�,x�}�vV�����Xv������(���
�������s������K��x�����}G
����0[x����G�������P3�����>����WD��XL9�K����y7s#��
x����u(n��7��>�e�w.C��6��,�\�x�K)v��M�J���*KU�����.��5}��l&�Q��5����F�2Z�O:���+����}���D���N�:�*�
!��M�)v��9pO&��X��5k����;=���g����a">�����	L"x�3����� ��P[2`+��O�4�;6�P�	�W��h�,G�Sq���6��6�L��:����e�Z��������*���c�@}cI��	e,���hpx�i��X��If�C��
�j3�cQ��0��
�C��$�"V�a�9����=g!�@����b&�����z]��%���4��I�P/�"�)%
spzvH��u'����aj+������/~d������f�\;�ZI����� }��*�����T�K��e�E�n�8�������[��=�R��X1"�������2��q�s�XZw��
o��������c`}$9c�%r�f���_���pp����cNO��oFo���.����������3SHF4��zp���o�L37��e��i?�������^s�c�E������q��p4�E�$���o
���B�6�,��q�����u_������cH�������������G�U�������os���$�4$@q����%X���V����������2O���xr���{U��WA�X�HT���~� IDAT���bA��$4	��y����������;�'����>k�5�\s����_�dY�;��"��B���`�L������r{1L	R������C=���h�����k^[A|�H�[O��
��*��A*�94������eh�W����:�d�������M�l��a�b���@���:h���,��k/�,����:�M�Nrh:�z,�g�����E�y��9Z
���_��*x/0f{462Rz����Z������(�+K
yo�x\�T;*$��4j,v
�������`�*�9��v��� �=�lx�2�URGf�s�W�&�$�@��Y�`l}"��1�AIKp���w��5����K"�N>G���lm������������7���Xu��e��)t��/��r�wB����!��?�e����������C�^�M��'8���?	�l)�HJ�E<V�:Wc������}��8�*\�����t�Y����y��^�!���CY�#QUg}KU����3:�,����L����<J(K�=�^]K��O�`�U�����[����L���y�>Cz�Sn�R�����r9CV�.���H(��6��[���
h��,��^:�����X������J���)��y(�8��t�u8�Y1__���0���n���[.p��g�s�=��O������}��*����\i���4�1�"]��G>� m��G�hc�9lg{�C���������������dbcz�����>h:��7,Y\���T��9KXf����~�tV��i,���^�Y��0��fc�lOu�(ph,����6F��vQ������(I�����E)e�����>(d��6�����c�t����1{�����9mNL��mc���v�F�T?����i,�Mg�w����P���S�b2��[��c�5��)���E$4��y�v���s,uK�o�`_q]b0��2�����jSA\�R�$�������s���b��*vp��=Xpr�b�m��R�����^|���3�vv��SnR��3�
��V|�56f�*���4��c�	���Z�>L%�z�����<'���[���|��� {A�]�N�I5�LS��'Y�D�u��I�����H�����s���j��b��X�Np��A���3�D=�3!��N�m��D��Ti(
�
[�f���1��f��-[��^�Z�
�m��z�'my����9��yj,���	��@����X�a_�TP0U�=�{)A��N�**������A���W��zF�$S�����_���t>E�pK�H���w��^b��{�)`6���K�vh�L���2l
���Csf�y�!@����b���9��+�_:�2�Rg��������/����u1r��M��"�i`9����q?�$j����JX�^^��7g������������L�!���������&�\��@����x��I$�4������/wN���t7����0!�������un��7�o1�A��+��G���0{=^x��D:��u�t~�����������~y�.�� �<�����4@q���� ���nN�U�`�9xu	���ew`|�oR��,�)J�
)��~�x2�[����#�������+
V�����#�@e��<�I�_#�����C_��z�<n�?��eq'��5f�
����p�W��
���8��	?��P�%d-�C���.^�����c�W��.	MU_y�=x����6�AP2��?N2�u��,q�/��j�f�Uj�i�>o������"��
9�*I��O=R�j)���D4�}���������w��'�PW���0�
,�N f���o������f5*�,$������s�����Y|���2��u�iC�.���<�8
�@XJ�:+���F�V��.3������������7��I�vh,�B)�u$�Y}��+l�M��0��Ccf��~����'�UEf��?�%g/����|��L�_GhD�E����cM����'PW�o�������9{������#������P�OY��4%��B��E�h����E\��Z9���B��>V��`%�T�����g�O���v�}���jm�m,s��;��Pk.���,�M�����>K&J�:��c�"|�+��h����,��/���iY�����c���3�,��������Z�Yi��6>��1����xk,�������>�g��U#�raoC��,�9�������+��� Jc9�Q"��k����U�s�y���|/����}6gtO�K3mf�2R��h��X���;����|`XhI������G��,P:�^������|
��
c:�dnJ�����<Q��`A��������K`������{�����JJ���}��/���T>�HT����F��wEE�&�	�H�����~1���U�oKt�������zE��l,�&l���Y������f�I����M�S��@��4&��	�������k�#�8�s�����t�=��BS����!�]�Gc��iX�;��|�#x�JHj|;��T�%�������n,_����RG���mT��y����5�9���D����|�� ��L���v��
gj��j

}�6��"���<XF�G���I��y������������f�����������C!)m��e��ot��
�4���C�@����T��!�@���/�q���6���ar��YU
q��7EMD���qIK~7����(,��v���]��XGc���!�\�n��+U	�os��F�=3;����]}OZQ$D=%�%�5���b�T���5Y���?6l/|[���I��p��g���yo��_�,�T��8�6b9�N����o�Z��2�T��(���	A�i,���Q�tr���@�!�7����R�G�u����_N�}�{y����E
�z^rE�w��������m��R���c�8�������S����S2�&2S�'h|kD=�Y:�
�d���~��<�%h������,��q�k.
��K��"�k����1`���[v��/K�b�K�������\�?�yJ��"���>���@Q6#djc�Mc����ly�
Wv���o�����������vm��!��1>Jv]���������%�T>�~�nc�=��{��4�i��d��Q[#>6J�Z96�����L��T%o;��
`X�
l/�����Wo�����KhO!Vr����E"����\��uP0�^H��*��
@.��Bo4�����LDi,t1�4���E��L�M�KXn��Qc��?ty�WV2�P�0������yKr�c�����%���(��T�������Z`�����;�=c�������X����$zF���C��(�n<��*��o�Q,o��ug�80��qd��=��g�7(��>,}`*��Y�����]�3�(�L��_5���������'X�/�e����	\_R3A����#Ew���,��S�&�l0�xF��(x'�7��0�.��������d�C�V<�*4�%g=�
c*x�z�l�T8��)�2�?��,���
���T�3�d���	����<�x=���l���6���_���'��`�Lc�k���d�>�NK{�h=���ZT�,���a���O�:��l��2�B��H���u���\���?��F��������?�\�6�C�=C�k���2��*�k�uGQ�){��Q���>`r���a)`����#�������j|�V������)o��@f�A`��>�������t�]�Ba|A(������T���v��ut�y�Z�a�o��i��#�}��`G"R�6����A����%V����bI��|}E�mOs8�/8����
[�QD �$���O0��P6��~�&;����?���_P����E�{j,F���yB[C����i_f��ecq�	�]XzW����i���4iPg���1�Fba�]c����}!wS#�/�c��zZE�H������Tr:t�������o��m�d���T����\��J�-mTg�{����x*��.����Q�dY��%�������X��5��B���y[z�z�QO�E+=��$.�(N�
6�B�2�u���P������F0r�i,�WT�d���CJ����2����6�T������	��5= ed�t�rr��w�����Q�t��!c�`V0�zk}6m3��vl�I_��[c���O��l,�L-��a|�%���������Q;����G�7$�~�_x;�$����G�������Sc
��]��m�(�.�W���>F0�A|��������&A����8��S>�� ���"�����`�����(;����>B=�t�FA-1����p��H3u�"�+o�������f��Nh)1�0-�F�sIJ%�@�I<��G�JT�=\x��]H("�'�����:5kS�"��h7��g��y/���Lt��+LqW�g��R�DS��>}QhzQ����l��f�g'�fAo}�����X,��Y��|b��w�PNe���7�hc��I�
D���}���v_��������_"���n�icQ����t���Z�~��E�kR6������	����2$d0�l|%~%�N���X�L� ��O��h�������m�o4Li��tT�9E�K`c]K��\������]��*�By����X2������!#�2�xM�8���/}4� �$�#�zj�r��zY��e=��Zd�A��8��}���O�*� A��
B��6���#��f[��K�����}L7"�%�p:�
xC]G����#�`����;����)������
'�[�%x����a������f�Xbi,m�#O��h�r(��+C/`d:�Q��y{e��^fv���i����e��'�/��(�;D��A��N1��{��r@��fb���"'�k����3�5��S��ZV�-�������@�c-t��Q
�@z �7tJ&Q�6e��%��fc�^g��gt#8H�bZqd96��TxrQ�����0_�-<=�������Y���_?�|��v�n"����EXh���{��qznG����f
�zU����w��l�	����K�*g�dI��L���������{�p��o�2YkPz_����\aR��x���	We����jw���W%��H���T�o�7��1�������<K�=�}�A�]��~���G���P���S)�P�o��YBN��FM{XN��Fh,m��e(���m���Z[�<��9=% y��(C�z���E����'�?xW�G6�%���T���b�X�T�~uF�r���UK���2nlB��I��^V`�I����Ud�P]3L���E��*
��T���mRD��`,�� qJ�&3�5{K������1��7EWx"+��p���'r1����������D^��)��)��s�����k{-g���G�W7���K��?tk,r>��ha;8�8��
e����y!8-=��=o�18
�7y�<X����ot{ -94� ����g�H�a�����R��e�n�8��&3��;&j�i�|�*��j�(�X�&����FD���"Si����?�px��LCn�o�;L��t�1|�������8�F�"u�P�E|�6�D���%o\���Y�e�^a���6�[�H�-�u�&i�+�\��� T�1/��_Q�������qR��d-r���%��`~�y�2�1���B�V��E�R:��
�
r{ u��0�~|�sj.5��g���Ljm{�C�P��
���i���u8��~Z�M�B#y�^�X������T���P~�^+��S�)7�������>ZGMO|4�����{$�T:���4#�t=�K�t���3gk"f���,#.��Fq�t5�R��U�L�k��
�Y��HG���U�����L����Xl;��p�J�m^C��0�5U����*��r���Q������'��Kf�}��|����>�/|@��P3��o�be9�s�!�1��t��|Q��G�Xd�W��D��f��
���tr�Jg(�LK�b�|v����6����V��{]����a��Z6��X��q���S����_��D�^�A�	�����*�j��0��9r�T|�,���8��.����a��=#�"��h��H�V��@��%~O���]m�O�_>_<�q_=���x�v:�p1i��R2>��'C�d�&��9S�9��s:2z)h���I���y�1B�����Y1���1�l
��%�5�g�]��� 1��a6�U��5��b��� kq�If��?�����[��b���y����<������LlR��e���@���N���d���.���G��]�b���6�\R��v2t���KZ��OiHy@,�����A����}���G0��������A��{����6:QF�����y/77P�hB?y��u!���n,�#��2`z|��1Y�[#%�K��������M��N����|,����;��U����7\���}�Ouo\
Attachment: system-a-disk-usage.png (image/png)
IHDR�08Kn�bKGD������� IDATx���{x�����_$P����Zm+J�z(V�K�V*l-iw�����jwA@Y�^�+�k�v�j�,hEa�QT�C�'T@�$���d����0��a���=/��s]\c�����p�����7����������I����J�|�����q����r>�(Z�v�$�	��	/�$���]^�w�^IRKK���������c�vy^�	l������*33S���������	���^555
jmm����D�����p�B�����u�=�DxF'���l����Z�h�.\�>� ���^�w/pr;x��>���L���W��������$��~��7�Q��}����#F���/����;�/++K��-��,O|�_��������555Z�~����.
6,&s�J��Akk������o����]S�����F�W����{U[[{\�|���z����������Y�f��o��������v��V��]�V��oWCC����5{�l
2$��m���������4b���1C�F��������>~cc���]�;v(99Y]t�������{Z���O������\�sk��k��V'N��X�9�9x�����������=��}qs���UIII����Cz�������cu��=x����Y����+%%E������Z%$$D��%���Po���jjj4`��r�-!_�-���H�/�'==]����$
0����"�$�l�2555���>���+u�m�u{�(O|>����^J���?���L���!!!Ag�u��N��U�Vu�YCC�V�Z���g�����K/���\���y�c���{���{�C��O~�
0@�>���x�	��?�c������?��������
6�����==~O?���?������Z�p������#�h��!���%�����_4577k�����K�?���Pbb�������b>�Xm"�����y�\�R���!?~O����G�����@@����<y�~�����A+W��_��W]r�%�~|Iz�����;����oWff��~�o�;V���Z_x�g�}&��A����I�&����*,,��e�T[[����.wZ�������P�������.��R]z�������@eee���;������������ow�o��I������:���4k�,�1"����O�������V���6n����UPP�������Yg�}�n����vf������}���z���k���n�Ig�uVH����>}��x�
eff���nSzzz��o�y�����@G��\��3gJ�����n�:m��M


:��S���|G���=>~O���V�{���_���x.��]s�5!�Aot5��7��s�9'����|Z�f�JKK������>[3g�T�>}��I�&i��]�����={�h������k���/
�����m��������~z�����.�H���?�g>�/8f���l��1��Ww������z�����o[��/?���}������Kw�}�q?������=u�T���;~>x��������#��=��'��[[[������;�P��}��o_]r�%z��w������f~�?�{~����BG��W��U%'��G�� ������=�<�_x�����:���t�m�i��A!?~(��vG���~����!=v(����G��:{�VTT���B��OWbb�����+��B/��B�c���KRQQ�f��<�}q�=���/����+���E 4z���#�p���/���|>=��s����j��1�������;��s�=�����~������0`�~��)==]/����z�)�u�]���^�ZS�N���_�'�x������T__�{��G���Z�j���[|����O����E��~�z�[�.��IHII���K�z�j�[�N��rK����m����N���t���z�j�������RZZ�>��S�����_bb��-[�.W8�w�>�V��)���{��WMMM�����/�����>x�o|�:������O��;�����~��^�Eo����x|���{���:t��
����������~�s�=7"�}�g�yF�_~y�m5_����C���8������\^^�����������c��� �����}���*544(##C�����}O��
�K/����I(��P��]2d��?��O��������e
t�������������.]�?��Oz�����~7b�������a�4|����j�����������'�}���j=zT������W���k��������[��(Z����'6>a���_Xo{JLLTBB�>���F
0��ni�g�yF����]�H�i���;OJLL����;�}>����u��+111�{����V��m��3�����G�\rI�?8���/VBB��<��._@w���.Rbb��?�|������J���_���9rd��
7����4I���#��������������G��'�����R�>}���������m[�lfffp'x�������������Q��zz�����u���j��u����k���������n�����k���]�g���=�������+�h����,++K��w�.\�@ ��_=����{�Y���Q��o9k?����7���>~o����J�>}���H)))Z�z��,Y����w���k�8��{���?��$���n��p����?v�'���z���t����	�~z�����;t�P
:T6lPss�jkkUXX�����<~]]�$���C��O����gjnn���k;}��D��������� l


=����n�Mo���


4x�`]{��������~������}�"z>QO����[z���u��Q577������~���eh�����
~/D����q����:_�]������;I������}�=��5������������M�l}������������w�_bb�;���yE���c�==~O��[�������5�\�)S�h��uz��'4o���<~}}��{�9}������455��������=������������������:~�������_��y����ao��v~]	����SN9E�g��$UVV���@�V���w��cD��w�}���P����~kai�v�R}}�&O�r&��	�����7!!A������������A�4y�d���/�������o|�JNNVrr��M�|�h?����]|0��w���?�����Yg����V��'�������>��~��{�==��Z�`AHo�HJJ�����%&&���!��}>���]�;��Cg�q������f�/g�����4���tx���4%$$��?���������h��������B6��w�ih��#G���SN����]�Ob;���H;�c�Z�|��y6G�5;����G4����CO��S��VEE�>�s�=W���:��s����F���9���j���u��=���;��3��N ��O>��#Gj��i!������<~�������PJJ�8�1c�H�8V������>~D��9x�`]v�ez��G$���)ZB}~��~���
�C[��i�&M�2%� .���#��off�~���a�eggG���
����^�~�������9���8<NI���ax;�I���Y��oWQQ��O�r���Z�v�RSSS�`wl��k;Jvv��~���;##CIII�����m<������@ ���4�����7���MOOWff��|�M�����w������TM�4I����J���3��[y������0aB����jj��9�=����������y�v�;T+kiii3f���_���&���j��
�����������==~O�����`w���)�?)Trr�>����������������\�x��������e����K/���Y7�pC�?��w�o/���z��W���@o������|bb��L��W_}U�����|��_�r����Z�H�O������gCC�^}�UUWWK������o��C��M�>��?l������Pu��'}�.��?�8���J��Oo��<�O#����w���_?�����'%%��s���/����f566j�����K�;�~����8q���j����?=m�	<I-\�P���1b�����)x��w������k����w�����^{M�V�RKK����t���v9�M7��|Po������/w;���]s�5�����N;��s
���!�������8p`���|��Z�j�6m��.������=[�?��~��_���Q�����W�������~�PWW���?4R�o��[�N>��5t��,����>1~���j���JNN�W��]q�a=F��z��Z�f����>%&&�������^k2v(��H��tu|��{��P���8��x����_7�|���[���~Z��z�n�������UKKK���m��A���Z�hQ�{�v�>�������FIII:��3����<~O?�IO����Nk���������d]|����Ohw����I�?z��LJJ������C���V}�����c����/�By~���k������������������[oi��! ��z���>?z�{�n����jjj����u��7�B�7��
7�~���:��3t��7�����/�������%�N����OIx����B��%O>�����k�����JHb}�uxO}}�:�N������g~�X�@�t�����_�~<x�6o��q���zJ��~���|����@�������?�Auuu1bD����Buuu�C�:��A ����w�����@�'�khh��"���?�������s�@�#\'������#G��
2dH���E`��������Zc<]UU�F�%I
������vy^Jj
H�}$I�R���	@�������^J��'������������@h222TVV�o}����~�z=zT��/��~]]�V�\)�����t��3G�������jnI����e��'k`j���f/p5p��n���E'���>\?����O����W4q�D���j���z�������*������{��5����k��}��g����'�����O��YYY<x�q�/..����%IS�L��]�����2�4����J�����������(���kt���\�+�����u������X��q�Q����v�y9����m3��mF���3��m&�3f�egg�������P/���J7??_Pmmm��Uccc�s�r�5�0`��8^_���z��o�5n��Zo���3�9j�6C��f$j�:C��fB|���P>&.^JRZZ����w�_����k��TM~�Z++����4hP�6��:|�m�>}����8�9���:��e���^]�q��47����Zo���q�u��q<���8�����xd������|5�����B���v����n������$���O���V�,Y�%K�����SNN��������|�r���v���TT���Q�F'���j����]+)N:�&L���[%I[�l�����~�JJJ����3�9Q��8^_���z��o�5n��Zo���3�9j�6C��f$j�:C��f�������;z���u��Q���������^���Z�Z���KAt����	`)R����R%%%u��������	�	����p�B�������u��J�RSS5�|���j����z]}?T^�`�=c9�@��k�������q�Q��j�6#Q��j�6���T�=�����P��h��R�:���������~'d'���w�Av�X�E'�6����m3� �g\s��m���H��u����"55�f��6tX��Ac���Ieee������I���%%%��v��Ma��r�����u0���z�c������QP�:��j����������x\�z�8n;?������Xeee*//�5:�m��D'�����`�%q�X��9���5n��\��k���P�����P���h���N Kt=��;� �g,��h�q�Q��v��3�9j�6C��f$j�:C��fzRUU�@ V�N`:�,E����T]]���4
4H			]��N`��#��{�r,:���5n�a�=����m3��mF���3��m�'������VRR�>��SUVV�����N K�8'����������#����6\'��[�/e;?�h=!�/e;?�/�>�k������N�}=p�����������u��Xt��	`�O� /�7�sI�3�cqN�m�5G��f8��=����m3��mF���3��m&�����@���;������f\s��m�d��k���P�����P���h���N Kt=��;� �g,��h�q�Q��v��3�9j�6C��f$j�:C��f\����N`:�,E�X^^������O'�
�	����K�����_O��K����K�����8n;?�h_�m��q<2�D�:�{�������u{:�m��D'�����`�%q�X��9���5n��\��k���P�����P�����7�����R�:�~�_G�Qkk�5d�
8����	��w�Av�X�E'�6����m3� �g\s��m���H��u�������J�F���1c�������n�O'�
�@�"�	���O5b�%$$�9r�q��/�������N�m�5G��f�Av����q�5n���q�5n�	EFF�����g����O���N`:�,E��A���	L6�����T^^���m<oS���$0e��zl����O�N/����R%%%���%x;v��.�������������g:�l��N`jj����t��Aeff�xz�	������$���8'�6����m3�K��q�Q��j�6#Q��j�6�@ �@ �����w�s�pN K��>|X~�_�
Ruu����4t����W��w�^=��������~�;���H�����b�
���i������q��#��{�r,:���5n�a�=����m3��mF���3��m&��r������k��1��<V\t���h���3f�v���?�P7�p�


����i��i�����|�1cF��A'��Hu�@���}�.�:����=z�$i��q�W�����<y�$i��)��kWX���v��3�c�	�����q�;���5n���m35n���m3�8x�`�_Q\t���j���:����m�6�]�VK�,���K�`��Y�Fs����e��h��N�N K��+%%%�uCC�&L�p�������***��>�C�)1�����4��7��v�$�VV�&?_����JR��t��_�r[RR��-�q�~���p~.�m9?/��������Zo���q�u��q<���8�����xd�i��PM~��v�To���k����?�����?.:��:|���q�y�����SNN��������|�r���v�����B��~���k:���$I&L���[%I[�l����z\/�7�sI�3�cqN�m�5G��f8��=����m3��mF���3��m&_|���� �?��N�{���
6(h����9s�U[[�U�V���)==]s��Q���;}:�,E����N�:U�����O�o�1xN`jj�������\��?���]���;������f\s��m�d��k���P�����P���h��N`$�	`�N�yyG�d���Xtm3�9j�6��{�5G��f�q��D�[g�q�L4�	lC'��Hu�~��9���V%&&j��!8p�q������Ieee����$���oKJJ:|���M�����8�9Q�`8?����zu�5����u0���z{}8������������q�v~�#3Nqq����T^^�������Q�4f�egg�������	lC'��Hu?��S�1"xi�O?�T#G�<�~tC���s.�{�r,�	�����q����g\s��m���H��u����"##Ceee��g��������n�O'�
�@��tP���;��I^, IDAT�����f\s��m�d��k���P�����P���h���N K�����*))I---���c�w?:�!���;������f\s��m�d��k���P�����P���P���j���n�C'�
�@�"�	,//WVV�q�_D'�
�	����K�����_O��K����K�����8n;?�h_�m��q<2�D�:��$
<X�����������R�>���Z�
:��tC���s.�{�r,�	�����q����g\s��m���H��u����������t��	`)R���{�v����A&L8�~tC��v��3�c�	�����q�;���5n���m35n���m3�=zt�?������N`:�,E��@����A:�!���;������f\s��m�d��k���P�����P���P�_���!��R���������:�=�r})��q�@��	q})��q})�q\s�m��u�������8�Gf�H^'���i����?>�����	lC'�%:����s.�{�r,�	�����q����g\s��m���H��u����"33�����N`:�,E�:����JNN���e#��N`��#��{�r,:���5n�a�=����m3��mF���3��m&�G���Q��������N_�N`:�,E�������r
<X������_����/�������N�m�5G��f�Av����q�5n���q�5n�	Eyy�����4�z��������t��	`)R��@ �����G'�
�	����K�����_O��K����K�����8n;?�h_�m��q<2�D�:����p:����R�:����JJJRKKK�v������N`���`�%q�X��9���5n��\��k���P�����P���P���j���n�C'�
�@�"�	,//WVV�q�_W����7���<�w�}7����:�X�ByyyZ�b�����z\/�������N�m�5G��f�Av����q�5n���q�5n�	��$I���N/q���.[�Lw�u����j�������eddh��i��q�|>�f����c�	`)R���{�J��uN��A�������&edd�_\\���'K��L��]�v���^�`�=c9�@��k�������q�Q��j�6#Q��j�6���Gk���0`��sN���������;��C��
�$-]�T,��5k4w�\-[�L�-��1���N�$���k���:���u��
>����U'��_���s5w�\����~����y��uy����J������R��BI
���^���PnKJJ����8����<�:��e�-���q\s��"��p~V���u�8n�^>���:X��q��q��8����W����������������{��_�n����K�j��������?�����SNN��������|�r���v�tX�T'���N�����~q�	8p�<�C�ux���	�u�VI��-[4q������
�\���X�h�q�Q���%q����q�5n���q�5n�	E(/���;v����WBB����o_����j��U��|JOO��9s�\ :�,E���P�U'�����=�������C�/55U���Wnn�����+d/�������N�m�5G��f�Av����q�5n���q�5n�����F�@��z��w�Av�X�E'�6����m3� �g\s��m���H��u���D��6tX�#MMM*++SEE�$���4x[RR���Pn7m���-�q����X����w<���8������������q�~�|��u�������8�q���UVV���rY���N Kt=���
�\���X�h�q�Q���%q����q�5n���q�5n��:�m��D'����#��{�r,:���5n�a�=����m3��mF���3��m&�����@���;������f\s��m�d��k���P�����P���h���N Ktc���|���l��u��'���l������q�q���	�������8�q�N��	`�N�y����K����sm3�9j�6��$��5n���m35n���m3�@'�
�@��z��w�Av�X�E'�6����m3� �g\s��m���H��u���D��6tX��A^�`�=c9�@��k�������q�Q��j�6#Q��j�6
t��	`�N`�p���o������N�������������8�9�����:����q�v~�#3�	�:�,�	� /�7�sI�3�cqN�m�5G��f8��=����m3��mF���3��m&�����@���;������f\s��m�d��k���P�����P���h���N Kt=��;� �g,��h�q�Q��v��3�9j�6C��f$j�:C��f��N`:�,�	����-�����	���������r�5�q�v~\'��8�����xd��:�@'�%:����s.�{�r,�	�����q����g\s��m���H��u���D��6tX��h�������_�����������Y]]�V�X���<�X�Buuua=��w�Av�X�E'�6����m3� �g\s��m���H��u���DC\u�o��?�P7�|�$���@�6m�6n�(���3ft�������u��=��������<y�$i��)��kWX���v��3�c�	�����q�;���5n���m35n���m3�7����F=�����~���dI���K�`��Y�Fs����e��h��N�tX��K;v��g�|�.--M���SBBB����J������R��BI
���^���PnKJJ����8����<�:��e�-���q\s��"��p~V���u�8n�^>���:X��q��q��8����W����7����|]t�E:��3�����SNN��������|�r���v�����^�����}�t�gt���	�u�VI��-[4q������
�\���X�h�q�Q���%q����q�5n���q�5n�����n��Y%%%�OmW[[�U�V���)==]s��Q���;}:�,�	��.������������+77W�����`W��#��{�r,:���5n�a�=����m3��mF���3��m&��	tX��A^�`�=c9�@��k�������q�Q��j�6#Q��j�6
t��	`�N`�455���L������mIII��C���iSX���5�/*b�����X�.���v����Zo���q�u��q<���8�����xd�)..VYY����e�N`:�,�	� /�7�sI�3�cqN�m�5G��f8��=����m3��mF���3��m&�����@���;������f\s��m�d��k���P�����P���h���N Kt=��;� �g,��h�q�Q��v��3�9j�6C��f$j�:C��f��N`:�,�	����-�����	���������r�5�q�v~\'��8�����xd��:�@'�%:����s.�{�r,�	�����q����g\s��m���H��u���D��6tX��A^�`�=c9�@��k�������q�Q��j�6#Q��j�6
t��	`�N�yyG�d���Xtm3�9j�6��{�5G��f�q��D�[g�q�L4�	lC'�%:�1�u{���R���:�����R����R����8��������q��q��8\'���D'�����`�%q�X��9���5n��\��k���P�����P���h���N ��F_������~�����}R@t�	� /�������N�m�5G��f,w��~�m��a�r9!��{E_�J}Q�5�����������D�[g�q�L4�	lC'�����5v�X��u|���}Qg��l����(���\���t{��JNM��� �	� /�������N�m�5G��f\����U�&<������Vx/�Z�k��c:�.]�+v_��K�t���!�C���������q�:C��f��N`:��mUU��o��[Jn��8��}��wW�k}-��HRmy�V[vn��9J����,�.�$m^�Y�t��DcZW���	����K�����_O��K���u�?���M���
.�o��d���j�����@��,�2�������ma�W��Y=<�a=��h��7}�Q��:���pT��5���x<����u�����:����z�@�6��k�q�~=����/���k���
�i���a���\����}�9>���(V�@���FII���7����H/�4��x}�2.��:V�e\s��m�u��wWk��Aa�����������h��!gZ&�(i{RX/�\�m�PN '���k����C8�+))�;W�#_�O_�0�_�3V��\�e$���j�6
t��	o�S;���NV�����z��g�o��o����{��'I�����+����+V���.�����y�S��q���[�c�[�5�5����I����j�k�>?���.�V����s.cy5���z���O��'�zy�]s����_�.�<?j�6
q�	|���U^^���g+!!A���JMMUAA�2224m�4m��Q>�O3f���1��'�W�V�S�����B�����$�����5� ���8o,>Y~"+x	�����[u�-�(!!A���������b��|~^��)S�|��._v���
�|)���X�h�q��c�o��U��:/�L�w�����mn9�r.���H����s���x|�S��������������C��/aeZ&����n�r�����x�H���x����m3����K���K.���[���x��������K�`��Y�Fs����e��h��N�N �r���y���7�of����b"~�|"�K���V%�OWpr���^hmm�)���h���Z�fM�giii�7o^�Kx\��R5��j�����P�����Ww�:�������o9�k��#����sYo��yy�������p����x�_9�}w���n�|2C}����o���!���&���O�w7LPjV�'��������:$}��|�>d/��	��t����5�R_X������G��0AW��B���T=���C�_�yw���Gut���������y���x��x}~V���u��8����W��������<��'?Qbb������>-^�Xyyy���QZZ��~��/_����N�N ���h�$���/��L���S������~��
�����~����{5x�`I��	�u�VI��-[4q����������g,���Am3�9������6}��X>U�=���z������EEJ���#�Q�5o���p��DC\t�~��x�	�|>%&&���nRvv�jkk�j�*�|>���k��9�����A'��s�n��]��q�Y^�I���>��~u�������0:��0p�@��?_�����O~���lI�Jh���������xyG�d���Xtm3�9�o�m�:|��y]��:�rB�s��+��	������Av���8��f�EE�Z��h���q��q�L4�E'0���^���]����x}���c^��ue~-S��T':���v��3�c�	����\k�@����7�2�/�;����q�q������24E

Q�5o���p��D��6t���i���%�k�#b=�����;T���u�|��HSS����TQQ!I*--
����t�:��M�6�u�q\s��"��p~.�}���c�<���q�����s�~��v��3��m��;�����)�+��[o�������q��q�����T=�Z

'�:X���q����z{}z;Nqq����T^^.kt��	����oJ�.�}��%�������g���Tx�@��{�9��=c9��f$)�;������>����/����������q.�{�5�q�6�/*R�S������������D�[g8��f��N`:� R�>�������?�9�S�at=��;� �g,��h�q�Q��v��3�9j�6�/*R�!��p�O��H��u���m&���"%����a�r9Q�
���A^�`�=c9�@��k�������q�Q��Q���	����P�A}�X����[F���3�m3�@'�
�@K��~\�o�^G
��T�#\'��[��K�/��u��������:X�;��u�=�k��K����K�����:���s]����������q�v~�#3�	�:�]�_�_��y��u������:V����9;���)�M{N_�?_V�%Y��
#��&��z�)))��q�<�Y}�jU��
9�2�EI���+k�~]�!'�s�����:)�e��U�_T���^V�u�x���\�<���5��@��g,��\o�s����@��f8��f��N`��:��0:S \����F������0�9��h���<��gu�����t�p_z�����-����q�Q��>U�=����m3�������������\�e��N`�	�����V�A}5�g�c=F8'�����`�%q�X��9���5n��\��k���pN�mF���3���}j�����s�<��I-�����<��e��:�}n�|"�z�@K;��#�%��$�S�b���*}�TW=sUT�9�>��sc���|���l��u��'���Km��]	�&h��I!���������n�f����q��^�W�������x�u'��u��������s]���m��Yc�>�=}�('��+K�����W���+5|����{���a�s�H�:��@'�W�:�JOU>�g�=�qo��sJ58ipT���}/����w������Ta��LY��%����U�S���a���z@��	��T1I��n�1&ULR��f�<{g��������f8'�=���z�?��4k����5��TU5W��"������r�5����[��0h���c�������q=?-���$��{��w�~1�5r��?nr.��k���#�N`:�,4����>a��'W+=)=
3��);�����)�S�:�����^��5����XO���f������.��Lp2��A^�`�=c9�@��k������4��(�=go��q�M<<Q�#��e���v�y9���8n�q]��A�&��w�}G��|?�Lo:`.�p�fe����?<�Xs�<7�5�9����h���N �G�x�b����c=���,��yC�if��XO8)5V7����k^���3��e��f%�N@W�z��w�Av�X�E'�6�����={4&eLT�q���l��2��>;����q�Lo����Q�&<r�eR�n����������zO��@NX�����'j<.��N`������%IK�/��L�wW�]�2Fw�vW���I����	����K��������������=yh�'�!kG�����f��^��������x\�z�:����8�q�N��	`���.���K5=mz��r��|��i���d�&�S��@'��JJJ�*�������u������X��q�Y������	t�����Z�?'0\V����r�5�q�6C��f����g�?��w��s��s�2�G����z-������it�����3?<S��w�z15��D�<{g���((<Z��=������MJN���r=S���+����gb=���m�����z@�h���>	�_5��Z|�eP��
6D��#��k����c������������y��z�����?�Y_��Wu���G}��Z�h��"I�����/������:�\�R>�O����3g������z�S��3�����'@��[���O
g�bR�$m�=�_��{.��U���A_���.x�6~icT���A�3�cyy�]s�����RiC����KQG�~��9�M��qCX�l%%%���^�F IDAT�e������2^�q�����K�R'p����5k���?�C�g����+�����l�=Y�d��,Yr��

����i��i�����|�1cF��q�v R��}_���]���E������$I��?�3N|�~|�~|����A���Tbjf�L�<K��5�S��x����{�*//Oyyy3f����4a�=���=z��D{���X�'O�$M�2E�v���I/�7�sI�3�cqN�m�5G��n_���9'�����V%���R�����ef|��������#��xS�)�����zV��-��Yd�q�L4t�	|���t�W(!!A����N;�4]w�uJHHPj��:f��{������t�M:��S%IK�.���f���;W��-��E�:}:�Nv�-���XUN���T"�)�����
;W3�F���Nz��?�w�n������T"��������T�$o��<Nl��655i���6l����5}�tz��$��?�����.}�+_��O?��giii�7o�:��VV�&?_����JR��t��_�r[RR��-�q�~���p~.�m9?/���������S���T��	���C���C_Q��	���j��X��$m}�?56e��������nwg�Vk��Q��q�u��q<��t�j��r����4��`y?��K��2!���f~�X�������B5��j�i��]v�{�����n�IS�N�������?�|��
�����x�bIR^^�rrr���&������+77��,�@p3n�8��.�~���z���!�x����
pR���Y<v7;��N�:U�/����5u�TI��@�����VI����u�)�6a�m��U��e�M�81����{�9��=c9��f\s�x�2R&��!����$�%��o|�����-5n��|��g���UiCi����������w7������+��oK��f����N��-[�~�z%%%)%%E�f�Rff�$���V�V�
�t����~��)���N�_���q�<7���'i����$�uO�'O�=�����\��?�S��$���j������������F���v��3�c�	�������e�������!g,w�}{|:#������[j�6C��w�Y�g������Y�xqC����~��x����m3�p�w#�N ���yw^>_@�|->
�2(���V��F�I��i �	� /�������N�m�5G��.3.e���C����Tx�0��l�s��?��S56el��py������%�]&=)]��a�9Z�z�����3����[j�6
t��	7����0aC�e ��	����&������B�TZZ�-))��u(��6m
�������EE����\�;��e�������+��L��*\�����S���FW�`1�k����������`Y�m�g��^_���S\\���2������6tX��A^~o0���g,���@��k���p.�{�5G��f�q��D�[g�q�L4�	lC'�%:���v��3�c�	�����q�;���5n���m35n���m3�@'�
�@��z��w�Av�X�E'�6����m3� �g\s��m���H��u���D��6tX�#\'��[�/e;?�h=!�/e;?�/�>�k������N�}=p�������u=�N Kt=���
�\���X�h�q�Q���%q����q�5n���q�5n��:�m��D'����#��{�r,:���5n�a�=����m3��mF���3��m&�����@���;������f\s��m�d��k���P�����P���h���N Ktc���|���l��u��'���l������q�q���	�������8�q�N��	`�N�y����K����sm3�9j�6��$��5n���m35n���m3�@'�
�@��z��w�Av�X�E'�6����m3� �g\s��m���H��u���D��6tX��A^�`�=c9�@��k�������q�Q��j�6#Q��j�6
t��	`�N`�p���o������N�������������8�9�����:����q�v~�#3�	�:�,�	� /�7�sI�3�cqN�m�5G��f8��=����m3��mF���3��m&������^z�����_��������b�
���i������1��#��{�r,:���5n�a�=����m3��mF���3��m&��X__�5k����?��%K��/((PFF��M���7���i���>�@����+���K/����k�����)S�h��]a=��w�Av�X�E'�6����m3� �g\s��m���H��u���D�	�	<t��^}�U�r�-Z�dI�N���K�`��Y�Fs����e��h��N�N Kt�������+��yZZ��������N�ZY���|�VV�_X(I�����;|�mIIIX���5w��GX��������<�k�_T�:��j����������x\�z�8n;?������B5��j��S�N�N�����*))I�TQQ���l��?��$)//O999JKK��������������	`�N�����g����u��w�_�~���4a�m��U��e�M�81����{�9��=c9��f\s�5���	���g�9���GU�k4��E�sI�3�9���j�6#Q��j�6
'|'�X_<'���V�V����Szz��������w�������u�Z�|`N�����u�������Q��F'0�}(I����?�rss5��._v��;� �g,��h�q�Y��g��i#B�p*))�������J�+\^��x|��k�[d,���z���-#Q��j�6
q�	�
:�,��3+v����9S�k�e�����s44��^E'����#��{�r,:����e�W��ihV��J())������	������~�-���������r�5�q�6C��f$j�:C��f��N`:�,<��]IR����:NEy���2���@Nf:C'0F���TVV���
IRiii���������n��)��[�������sY�x�W�q\s;

�����
��G}�jhu����	���o6��z{�^9�����������`y�����8�q���UVV���rY���N }7��Z{vU��y�e��4#{�Y3�O��������G�vK��g��4:����s.�{�r,�	��yz����i{r�J����
Q��e��_�S�����.\r�.L0�!���g\s�m3��mF���3��m&���x������}��u��XO%b��R���1��V�g�������A}c=N
t=��;� �g,����L��~��n'��%�
���sVEy�������p�e}��dv��3�9���j�6#Q��j�6
t��	������w�:G_�&;�S����u�������{��JD�	��@�j�>^�E��n������k�cy=�qJ?U���X�5~��VC3�������g6��%�q�Av���8��f�q��D�[g�q�L4�	l�Y'���oJ�~�wq,���~u�&e�O�-wN��T:���?�c���������JD\7�q�n���50�S��@'0F��N������_~������C��c���Mg���uI\s\_�v~\'������w�!��s����}{44�����57���_�0�����������x<��e=p�@��q��8\'�8'��?���*���������J�V�m���������t�E'�����`�%q�X��9����N�����	<P�k�Y������$V�q8��=���8n���m35n���m3�@'����	�������*�������������2=����K�<����6��o��7�wF��7������oj���XO���@���@{�����9���3�%��gua����m�N��3��S��l��X�5^Q^��Y�������Q�d��k���m���H��u���D��6'j'0\��`�~��4�u����
��}W�}�E�����z*���yOk�c_�����z*q��Ok���5��!��
':����]�Q�������}���m�N��3�����Hx���v�����9'p����%�:;����q�5n���q�5n��:�mN�N�����N��9g�z*@�Z[�r���so6~_�}B��jnj��}�8R|�cK'[tc�������uI$)c�!�_V��zG5�����8s���y���9�a����^�w�:�g��O�9���%W�|{�5���_>k��W>����k�'w_��%W*1)�s���}�OM��Q��K���N����N�}=p�@��q��8\'�N�N�����|�_<zi��zq�n��T����*�L����r�X]{��(����g��+n:]W�<.���'K������O]��y���=��?_��K�b=N
t=���
v}?qRZ�*�m>���Ju���ae��5V�<UV�����8'�=#I��k�';*���m���ge����:���������$���K�f�q��D�[g�q�L4�	ls�twl>��������C�T����a+��{�z���
��5�F]6(?�1�L���;g�93�+���K�0�	��@�����.B���������-��n]p��g������}��Ay�#U0*J���k^L0x����������j�����67��Mo*Sj�M5��G��V�C����QP`P������ct��bf����y�<����f�[{��7{/E��*��\�rE�:�^>�#r���+��1�u�e��(�������+�Q�8�(���8VI��0����q�1�q�1��J�M��h�0�W�w[L��C�
��^D��������?�+��j�3��]!""�X	T!5�=�p��LQ�-���_�)>T�8���������r\����A$�/�	�7:l�� �����J"7�9.7`���a����Vo�/�@""KV��!,zu<�?��]!""�X	����N���;R�8���	�����2=���8�x�������qP�~��Af>p?.���[����Q�J �\������kI�cd��k����1�M|�����n�vx-�x�hs\ns\n����c���J �wo�t�����8��]!""�X	TUU���D$%%!11g��1����[�n��
�u�V477+������x���X	�#�7����H��-���8�������9.;�9.7�����
�V�V���z$''��7�8p~~~���@vv60��.��J �w���J���B���z=<<<�?+..��I��'O�T%�
~N�C:q��������u��a����>}:e���y������F�T�����B��g�q��9�}���xHDDDDDD�s� ��/!"""""�GxHDDDDD��H�1�����b��M�����=33���6��H�D���������h���+.\���`TUU���?�F�1�p��;TY��-999HMM����e�-"�hnn���~�����B��� G���}������<��#�������������f���)Sl���X�Wk����\9r���KS�N��]����������}�v������-[oooTUUa������C@@V�\	�V���z$''��7�P�hiiAii) �m��������dgg���#�.�n����(���k�Z����y;����q������h���d��C�W���Z����CX�j\\\���h����~��|��k�������K���Y���@�������F���O?
HHH��I�PXX���DDD������hiiARR��^{���omg��q������>k�	��s��k�.C���7����>[j���w�.��6l���q��u����^������v��3g�DII���Md���X�|9`�����eK���^�9A���}���SPP������86��Y#_�����hkk���K��V�~M`yy9�������T6]�t)��=�W_}��������c��)������G���GLL���s���k;3g��K/����t�����x��y���?d�����U�D�(//��{���x��������_�NMM
�z}���9*�^��};���z�Xt�z�	rT?��0��q#6o���:������W����"11�7oFuu�U�MdM���Z�a��EHNN�|��e��;v3f�0�[��������pssCGG����� 88>>>hoo����a�����5?��w�!88#F��V�z?=���v�Muu5���;U5���� --Mh;iii�3g�M�Ld���X�d��,���sN�#��3�W��V�Z��~{�����`@@@bcc��{�Z��D�b)_�5�������^��e������FTWW#$$���u����������okJ�I�H��g���^��k
���(//�NUU�m����{---HNN�v���4h��|=}
���sN�����������w�	o���'N���)���f)_�5.^����`����O}�zx���n����V���+���o0PQQ���*������PQQ����N<��Zb�]"G����;v`��y4?�����t:��K���������#>>���pww����D�$44����Nw���`���sN�#��o���0����;���g���@����?������YEW�j��0h� TWW�������J�;��`@~~>�������z�6m���/�-[���]]]���_���?��O��?���HII��Q����&�nw,�K�(�?�K�.����8x� `������@FF4
�Z--Z�)��gj-m�w�%����g#%%999�%n��z������}{ii)222��j����g�}�S����9s�`���HMM�����7Dj�U�Zs><�����O��h���O�����'O�������(**��E�"�OHH@BB��������W�\��;�j�*���R�DDDDDD������J����1~�xUm�����_������gDDDDDD�!��(����x���QQQv�
�uH�:���{w�������f�r����M�6!66V�����������y�Q�V�^
�#�<���999HMM������Fyy9�����WWW,\���vw�'R��r���Jw�?�<�����___�������MW�!77���L7����[o���v����������N�)���8r�����N�j�wB�w����(--�F���Y���?��F�����t���ng��)V�|��A`^^����tg����}"2�j�]V�[ZZPZZ���
,[����������;���������.��������?c���5jJKKq��a<��S6}OD�����M��i��N�����{������b�
������?t�V�Z$&&� �T-;;7n�@\\4
��������+WB��������x��7o�����A�.�]]>����z�������Kk��A`` ���Q__�Q�F����`��M�4	������@DD
�{�u�p�y?�X���6D
Y�����������FfNWlf������Zy��?Z�i��T�d:k`!���������7��}�`|�w���x=6'���y}^����|��u�&EEEtuua0���s�Gip|������ijjR�k���JIDAT��� ''��Djj�2/���+�����t:���HNN��	�w��y�,Y�������|=}�t:::�25!�������MMM<��@��S����q�B����J��7t?�!!!�����j����aSYY��M��h4N{����8�8��|�����jBCC	��;�n�Jff&���>R�6����D�-[FVVIII�*����.{��(((`������b�X8y�$����l6����j��s�N.\����{���WZ�BLV}}}9rRRR�6mf����v�������3�|���x�
��}��3�~xx8�n�"..������������5w����n�����/�A�&%%E�L{��m��W������������$00������d"77����;F����>^���~����S���t0e��������rZ-3g�$88��6���o��h4b0���/�Z�J���("##6j�5k�N�����^������{y���X�x1�N��h4�b�
�c544p��e��Y�6�3w���g�W�7l���K�8|�0f�Y�6��������7�;w���/e4IKK#--
��8�t�S}}}������o�t�R���k������`��u=�jq|��0�-���6���qzm��s0�V��|$������]|���4���������L�!b2�p������g������c@�����,�a�C�����������G��/��L��U{����W>w,�G����(�~�WTT�����*��{��Q'555��%�X&>>���X���^OHH���3#�����q��bYY��V���ttww{��9���`���"}}}�L&jkk�/477���+�X,>�W����v����s�aaa������222pj������'X�z�����B<���Y�g|���	@�b��)))!!!a,.E�Qs�3���������s�:7������Sihh�l6{��!&RLLUUU@�J��>�(������PUU�4�����U���������QQQAzz�O�-Z���G�������_�n������b�����'22�^��K�0�Ll��Q����Ggg�������;w(..F�����Ojj�G�
^
������������Bv��5��BL�gV����{B��r��-~���v;�g�&))i�P�����o����O�
:��p��U����Z�:t���d.\��U�8~�8���k����1Z+V� 77�s��)�Q�L&����h4�t:RRR�q�~p�������kv@Ys���y�[�n�a��e|!�B!��7;}�40��@!�B!�g�(K�	!�B!�$"�@!�B!��Dd<!�B����dff�$���Y�t�Ob�s��A���?a����;����,_����JJJ8t��O���~BCC���l���2�P��;���S\\L[[����R��/�PRR����y��g���Z�|��W�����-[T-�6�Z�����r��9��j�CUWWs��i�v;z�����+��x�gmm-'O�D��(�W���zt
�HO�B!�x�\�xq�Sp0V���ze�������}~�N�����?��#��;�������x��7HOO�h�������={���U�j���Q�SMWWw��q��V�J`` ��og��=���:��M�aaa������w�m�6en�HI#P!�B��_�����`���?����\�r�?���@QQ�����`����������{�<�P����|����?�����';;���6Ow�tttp��1>��#G0�L�k���Hcc�S�b���'�(�mllT����3���{\�|����.������f����^v�������COO���x����,X����~����m9���j���T�s��y�,Y�q�+���)����Q�~~~�t:��Nw=�����B!�bL��;���D4
---|�������.�w�^������ ..���8233��{���k��X�d	Z����v��	@QQ���
����py��SPP��������b�p��Ie�jW����7o�~�-,����_x�����q�g��e����l6Y�lYYY�������` ((��_�m�6����$�������q�m��a������_����<j���>���fh��L{{�2ts�rO���3���i2�����j��c���?�4�B!��jmm%//���V�Z-�����5�S�N1�|�z�)��W���F��b�(�f����(�w�]�jn��M]]�2��f�)�y3�.&&���lRSS�M&iii@���o�Q�k���h4�s��i����7����F����h���^���� ^}�U�����Z}5C��FV�^�TO�|8


\�|����M�QQQdddPQQ��h�����F�B!�Syyy�_����hz{{9p�iiiTWWSQQAYY��{�4��uoo���t'==��9fj������   ���Z����Gc��Gs�{��)=f555���:u*mmm�����)S��{����;v�������o��Z�N[[999l���!��������3g����dN�B!�SV���{�_1������f���Y�r�CO��^uww{������N,�C�X__&����Z�����i��a2����qh4������9s(--�n�;4t�������������@�����H�bG-���m4��L�:����f�G
�y��QYY	@EE����Z9������g��}ddd(�����Z����nN�8�����>}���lll������*��X�#=�B!��'l6���/��,`��5|���2o�<e���<:;;�%�[�hG�%$$�����=���?�g�}�������S��z=?��#555l��Q)ONN&''�����jy���n�:���)--�n��4�F3�r���k�����������M�6�:�`j���z.]���dr�oj�^�JQQV��C�������Y�j��G���v��a�,_����lJKK�-������V_-O�8�PVVF]]����k�.�Z�Wy���S\\�F�A�����2��4��]�DGG���B!�xx�r�B!�����@!�������������c���1c�������������)ST�
������	���P�!M6�
����je���^+�B�'�(�bB��z<x�������������o����b�P__��Y,�V�CLW��z��G��� 22����z���'22����9KC�c��+
Attachment: system-b-cpu.png (image/png)
Attachment: system-b-disk-usage.png (image/png)
T���������e����*@V�O���)o����
�|���W���6���ZR�bH�!��U�W�����K�I�� �S�10d�4
B!���J`���~�����1cL�To����k�O�HEUE~�c�����P����[|������������%2v������5�@wKj����i�Pt��"z�S}:��YL^:��	���7�8�c�kXC�+31Vh��VD��q��/U�����N��1�r:*[*
��h���+����rnPQC$��27���lX	l��*��6|���o�#����?���=����l8&�&W�]m�����8:����V����0c��v���M�&d�f��13+?+o��w�7�C���p���9�\��4��Ggbm�Z���)M�B�G�������G��<|���X�f
>��S��3��+���������W_}�}���6l@FF6l����zS�y�����!%,��F}Y="#L�����g���������UOV6wN��2S��w1��Q�����I7� ���-(|��m�
��4�%l�t����l/���Db����g1#r�T�K�/AU��;D5�	u"1�W-
Us��"1�W�s�������kX�`���?`���x������OZ�I_�Z�
�V���~vv6���0{�l���N�����r}U	��3����|�������Qw�W�.������$�����v�f���;�b����o�$$1zky;���.�7���X��H� �B��*�EEE���@FFF����\������^Brr�e�
yyy�:u*`��i8v�X��������Gks�a���bl������Su�j�T�+�	W&���2�U���;1�����$���.WKEbt������������7u(���3#Q�R)U#P�F���_�4t�����W�s���t[	|��p����f����?��!Cp�-��f�!2R�5��z�)8aaa����p�EV�^��K�b��Mx���f��X���mtW	|=�u���V�G�
����7�gC~��c�����xF�h��?�p!�tEqS1f;E����+^����#>8�L�ew�BQ
�*�n���
���C�����s�"''G�@���w<��c���+���ow�?���%K��f������J�ff�������l��(;�>����=���}���6���p`����H�0���_F�����"��+7W�q��?����b���*ks�#68V��}��A��P�����r\Y��@���q�����8T=����q�{Y���\Q���L��Z���n+������P�u�]�>}:����3�?@�j<�~�i�\�������t��v�\.�[���-�2��J�����R^��[w�G��&��B
�{������1h� �1����>��'���Kv�wW��/_�=�{�>y}w�BQ
�*���O���+�r�JL�~~
)O].Z[�������������������������-�^[��I['�����x�������n7u�p8P����+u����������	���m�(S�,�'�9'P�����.�W��]U�9�M���=�L�:�T��������
959R�.��H{������	��?�n����`���a��HHH���!++N�111X�h""�^���J`^fJ��b�_����:vi�i��{�!�|������?@��Y��/;?���~��S>���XN]k������-e���J����i	1��;�t �\��|��yG����[�9�
D��j�����X�*��S�z��}"##���&�m����Q��)�1���������c�_�cT�p��"z���xe6���"�|�[�ou��J�l�ZN���fn�@]rC��B���e��\�~�{�]S_DR+Rqh�!S��m��i��s�SnP�����GAc.	�D����n;�
.5�l��F��$xfx�j��{���;f�te�������J`Ma
�����E����#��'�.�5�[��������xq4:p����HU�V+XQz����{��{B���<U]E�=��g��Y=�1�1�zD�����g�Q#��*v��:v"1�jp^��U�M�_s�U��*r��90�������I��]Uu��%��j�
������X��)�)nj�V3'���C��12a%���*���O�����1	�\�Hw���5�Q�yBn��
0W;��A��BHW����F~c>^Hz����+v����%����_�wW���J`?�v�QRR���
@AA����pt������lS�w��i��h���X�/]��e�rs���������j���K�����s<3<p�;�~��9F����~!�_�q|��F��65��
�������
U��Z1��T���zf(�����|G>F�����~/�]����PRR����'����6X	$���-��������y���^��<�-�[��������+�h��
������~�����G����x���B�VD�{�EbT�P�_]4t��"���*�	�0;O��������p����xg���K��6���gs�8
kX�o��I�IR5������a%�
V	!:0��dd]��I���+�B�yB?+������
����
!��;� ~�x_w�bA�,�7��!+�J���U5T�W
]�F�����.��qa����o8����,tb\�8������bt��V���n�[�ono���M�1`�T
�07���lX	l��@B���Sn����g[E��8M*2u"�"f�	��?��>sh���bxQkBH��lq"v���f6��88���B�����_%0�RU?��p`��1�b\����3G��1��4��c
��O�H��A�Li�vh�K=a��������T4o���G
�X�?���=vE4�<�E�E3��=���������2��z��J�j���p`����'d�����������@��S�0�"�'�6�^�����6]4t��V�g��3�c����f�{jE*N:���C
���/s�Z�a%�
V	!�B!V�9���	��u\T�W���u��?��@]���������_u���2���u��@B!�B����3m�;��1�%���7�
��$����������Db���kjn���ji07����r5�������6��^�2rkD��/B!�B�V��p ,"����cT�""����������R������Db��Z�
rc��\
��12a%���*����q�����~�!�B!DWX	T���H{(�j��cT�""����������R������Db��Z�
rc��\
��12a%���*��$���w�������+B!�B����O�Z'pDjj�M~�^���������.�C�u�T��.���?u;����;��������e����*@w��{&��5Y�`�����+B!�B���� ���!h���AU�P�_]4t�7!C�_#Ps�H�UK��An����� 7F&���]%�����_�L����"�B!��
+�
�p8: M�-�cT�""����������R������Db��Z�
rc��\
��12a%���*�����zdf���"�B!��
+�
�p8lCk��p��WDDbT�P�_]4t�Z*C�_#Ps�H�UK��An����� 7F&���]%p�}�b���������+B!�B����O�Z'p@����=���^���������.�C�u�T����3m�qG�K�^��G���G��|i�Pu�:v�W��V�����_U���2���u��J����&�]�L��^BWu��|�����o�f���Y���!�:F!��>��@q8�����xo�H��*����.�&DbT���c�j������g����������;�o SQB��M�M���V7������i�&��])\^�V^�f]���uMm���t�5i5]3���� dE�Na`��������#�3�9�z>>�����s�s�����3'fL/���P�������
jbX_�r�e��ug}���� 6F$v������K����w�}�FED���w�x��������X���X���##""���N�M������?��1����7"))	7n��lV�N��O/�4[������Ys�X_�����R51�����S�;�C��B~�����8���=8����i%�����1��\9�2������bspn#�.:�


��s'N�<���D�����		������������9��u��nH<�K+d�DD�|���$�7?��pLcC&��MQ����cc�:����<1�#E9�[�-ODD��	�	��)������F��=�����k4�����+-��xFDM��9d��^r��l��Y�[Rv���^���u����m��*zh416�����C��P�]�51��\9�2������bspn#��w�����'����Bbbb�N��+��;wb���X�r%�/_��z�u�����*��3��%j��:���||��E���o�a;�DDD`'P�}��a��iv��`�k>-�����	��j�������������>�_�{�����t�����h4*Z����,�&��R�2�W/�G�[o�b;d���V��G��@��P�|o@���B�Ci}��_i5.w���������!�������.��q5WV�n�&4��Ak.�	|��W���	���Ddd$�z�)@RR���������#!!������X���_W�����DD��5��!����G����I��@��}�Y,[���-�����
 DGG#''������E�6��F]M��1Z}6�DZ��o����fM�%:?{..F/�M�������2�����
��Ixw�����r����+��������Al�H.�	��_X__������ ((������_���:�o)��������6�V�N�
���~��;�3t;��/)����z��t���|\�	,��HDD�1v������?.\���,\���@{�F#������Yl5gDj/]A`���o�;n�S��~J���8�v��-��#��������OL�H���,��cRS�f�
���;��bd����r����+��������Al�H���{����:��r������.9]�E?�{N?,,u���|�,�
�{�
����{���;0qz��Z����|�"�="""��(!����[W���������~�qr�z9[�&FM�Ac��\u������3��.�	�3�
��7M97�a}�����W������s����	lc��}���Q��1KX�i%X�x��C+j��:������HV�'���5�0|Ba9~}�?���it{a9���[����Y��N ���$MMM(..Fee%���S��F���hn��{������-w���#GP{�}��3����yD.u\������5���n�s[�����'.U4�����������M������u�m�8[\$���#��=��T�U�|DM2)Z^�v(�������r��j�J���������������WAA���QZZ
����^'���*���C���9�r�n*���$n��CV�����"zdOg�H���a��H���a9�	���u�o�d'����c�J�h4�`��o����*�QJ��:�F��������~�\\�^��P�&Gp�5���	li�(��g��fEoe�U��f��:�
��.El�+W��������������N`{����&<>y>:����O��
�!��}�S�����"��%E���'U�|"������)?����
�T�	��VgD���AC�����@)r��l��59�{�(�v�E��; P��d�U����js����&���+�^�^Y�w�Wl�
bcDb'���N`M�<�U��;qA����������I_��B=��~���b�P������(f@Lv��Z��HF{�~_�/�K������M8�i	V�+n`b'����c�J�h4�����5���{M������Yuw8�[�����^�oe|>�r�TM��V�*\�hpxySz:JN�����E���(W51��\9�2������bspn#;�m�u�-��������OL�q/������r�*����������g�f����'_��������������|�����a�T�������$�.�11��C!""�
;�Nr���+BS�_�q�
7�����QW���>#�������C����On�|�G%�����M��'1p�U�v��(��L�e�tYQ}M
��; @��Ps�*}>�GT��b�|��}�����W���}�X_Y��f;x����({�@�<�is����p���[���_��@��0B/���xh�(��C��H+�S?<*�p����Kxr]�+��Eo��J�����[���}9�<>y��x��;��C!""���:�|��	�����(E1�'\���r�M��oU:.�1Z�8`���g��P��%X�i���U�J����0��#`��9����U^}5UW�����",���q�Z��[���	P���Z+Y������+W51�/�W�!;�m��XW��{�7)�;f���p������N Q�xM��T}c`�����o�<nU�P/�z�o��+�^�AMM�+6G��
0�5���s���W��<7�����s����	l�U�@""jo�/>��'������="""��(!��Q#k���9��f IDATz9[�&�����w���:7��a}����s���987�����6���N Q��	t��'P�KQ�G����X_�<�O �+���*��z;������b;x;�W���u�����J��@""1���AL�;��r�P�����N�����h51�����z����&����bs���Es�Ehw�����r�p����u������N`v���x���p��������
�T�	��^�����5����K��-U���������1��\9�yn`}]?��1"����@""1�	$""�;���51�����z�����jbX_�9��1���Yhw�����r�p����u������N`v���x�?c��P�yr���BDD$v���	t����X_�<z������K}�<^=��w���V�r��j����+K}��L���}%�N ��u��@	����jbd�!c}��C/�M��a}����~Ixw�����r�p����u������N`v���X��#����
�T�	T����V�Brr2V�Z���|���f36n����$l��f�Y���rFDM��9d��^r��l���Wl���h��"4���
jbX_�r������~�
bcDr�N`cc#<==������/c���x������		������������9����HD$�_�x��X��(g���H*�����
OOO��d�����w5��E�����u	��3"jbd�!c}��C/gK����bsX|���p��AM�+Ww�X_����Al�H.�	�s��a���0��x��'X�b����s�N��?+W�����;\;�DDb�HDD�1voBDD�-[�Y�fa��}�~��`0tk��F��M�TW�������oo����4��+Z�h4*Z^�����b\z��^�������v�:.��W���p	=K��J����J�q�k}����d�Y�w�W�v�u������u�6�)/Z�E'�*���+V 11������8�d2a���HHH�0��@""16��
��F,����
�T�	T���
MMMZ?kb�]tt4rrr������Q�n�|6ZM��9d��^r���	51���M�s����&���+�;�
������ 6F$��~���8p�<==1g�DDD���������a��y��8���	$"��@""�����w������FBB�,Yb{���X�p!�p�B�o���51�����z�����jbX_�9�	�+���+�;�
������ 6F$��vv������
J�M�_5��C!""�
;���51�����z�����jbX_�9Z�+���"4���
jbX_�r������~�
bcDb'�
;�DDb�X���o���_�v�P�����N��455�������S�N�~��v������TE�9rD��j��4���K}��|���u���K/�U�|\n>)|;��W/��V�r��j�J���������������WAA���QZZ
�����@""1�	$""�;���g�����C���%�^��P������Q��F���:7��a}����s���987�����6�����B��Y�L����BDD$v%��3"jbd�!c}��C/gK����bs�^Qv���1��\9�yn`}]?��1"����@""1�	$""�;���51�����z�����jbX_�9�	�+���+�;�
������ 6F$v��HD$����������9{(DDDRa'�Ix�@��������������:.��W��QY�=���n����N�q�k}���������|]��v�>�`'��H���:��r�����="""��(!�|6ZM��9d��^r���	51����L%�s����&���+�;�
������ 6F$v��HD$;�DDDc'PBz9#�&F�2�W/9�r�TM�+6G��-��9�unP����������spn#;�m�	$"#�p)����x;c���BDD$v%��3"jbd�!c}��C/gK����bsT\:'<���
jbX_�r������~�
bcDb'�
;�DDb�HDD2�,���[�W���<�����:	��Z�q���zy>�r;Y�����y>N��Z�v��}�����W���}�X_Y�����6o�������p�w��k>������������J��@""1������;G~�����YZ���QQ��O���}^�DZ	�����J�zj��bB�#���sV'�K��.�h4"**JQ�)=S���E��9d��^r(���������jr��KQoj��]�������� W51��|sC��-�����9s��c���Eo��o=�q4���^��qk��Y����@""1
����=���������/��-�����v�P��k%��oIR#k���z�551�������s����&���+�;�
����P37T\�G����(����9Dc'�
;�DDb�HDDW�sc>r�]���}LW��	��^�����5����K��-U����q���Uhw�����r�p����u�j����u�J���6��Qz���'��Ggq�P����^�K��q�1;n���"v���	t����X_�<z������K}�<_w\�v��>v�'�g����s���g������:�3/����>8|�N��W����������������|ei�T���	�(v;��.�,5�[�e���Y��(v���`'P���Z����p���;>����0?�#s}G?9�
�'�v�,a9~;f�{�n�>���1���WFJ��oKLN��4�.a9�����	��V��>s��w�v��oM��q��=)�>�2�q/%51�����z���{)�:.��WM�s%g��	�fn(((�#�rx�������qxus��7F�����w@��y�	T��;�&P��j�/;���N��1�/���p�M�)=y��B��^��e4���;0������k�b"��`���Tt��K�~���v��{x�z:�G���}"}�u��s�uxL{�Z���}�����s~������N��e���O?e���o����	l�N �e�&��O��_g]�x��K���C�C����n�������n*��OK���X�����xt�����sh��b�x����(yQ���=�K����6��r�w���v���K��c��^���0a�R�DZ��7tW)y>u�S������~�M_>�pLei=�����<Jk��9��ln��.o�]�����i0"��N�JEEE��k�V+���0{�lDFF�f3��}��� ((�����������R51�����z���N����K}����tu5�Bsh17|���[���t{(��U;�&�h4��t
�p8O�m��,5+�����a~8P�;!�R�y�c�E9������o�pxySz:��*0;.F���lGH���qBs�F�����s���[x������W���^���cc��H~a��������?>���!�G�������Bv�XYY������l��			���T���`������@MM
f����z�	$"�t���u�.-p�Pn��7�Eq�e���?q8f���_����q��8?
�����Z~��^��c^���BAN%^|�Sl�q��C66�`����)�H�������%����j���rS�t'�p�����>��mF|��4���i��P+~;�Jaaal�v"<<uuu��`��Q���G#??_����_�Rz���^���z�����Wl�3gO����p�B"+���r/)��PT������PJM}�/����U������p������bp���
�^je41tl/�g9�9�b�-�(Fx��N�RZ��R�E����{1��p|�E��9�G20hX������.�������q��i��;�b�
���c����?>V�\����w�N �z������7O������p���*<���T
~q�l����f��O�����6Ou8�d�EM�4��qxz��c<����6w��$,���-o����/9���?�=�{��}���c������TVV�����>���`���SKu5�6m�����4��<�}{��w��)=]��F�Q��j��4���K}��|T���.�C�q���j��3gOcb�b�����j�����.�h;z���t�%�G��q�
7�����(=kB����y��+�Z��V]4�?��q���+���k�n������i�F�vE�y�=^(����y��������]�U����[p�H��qA�Y��'�v(Y���Gse%�6mBS^k
���N`mm-�~�m<����=������8�d2a�������@""q�6(�B-�����=hE�k�d�V�\���v#>y"��&,������/��#�z�P������(F�y����Rcc#6o��3f�{������dgg#&��o���.E/�u�K}��C/����a}]?�����B����\/����V����"*��};�Rj�[u���>���[i�d���A�[GYp�\��lcJOGs�������nw=���r��{)��U�yQ/9Ds�N��������g�����y�xxx���)))�"��@""q�6(�9����#��^`2R{?>��U�z�Brm���@xdY:��5��i��@����n�\�K�.����h�,,\�			X�p��{��T���5����K���V�������Bz��r��s��.��S|/0�9�Rs�������@��n�F�Ue���N�
����T�H3J�M/oJO���j���{u��^���er���;�]��@""y��m��L���9���J��^`j����'V���X�oV��}��9s��,Qg����^���}�����H+�JH/gD����C���%�^�F��a}�����*�Q37����L�����()r��r��4E�T=8b�����+t;��1�e��QzVY'�T�%�C���R_�.�+�h��a'��H	����a9��^`J;#���WFJ�l��"���b��!�����1s�m����!j��	%�
;�N������bTVVN�:e�i4���#?sSS-��E����<Z�K/����aJO��v�:.��W��#8���
���hD}�E��������q�������}�	���M��]_-�C��5Y�C���dQ.�������v�o����/�t�%X|�Y_���e77������Zc'�
;�DD�xc��������Ha9��^`'�J������-����:��<9L�����;���g�����C���%�^�KQ�����Go?T�;~����~���������L������Z�c��?��c{	������ 6����������N`v���������y���{Z��S����	��Vs��{x�z
�{b'PBz9#�&F�2�W/9�r�TM�+_�����\��}/��D�}������{zy(��`�g1���������&����������N`v�����y)�X��w�����y0""rE�JH/gD����C���%�^����a}�����*�O���G��LM���w���b87��a}���� 6F$v��HD$������uhy����\;�N����}\d��^����N�q���Z=�=}V��}���js��L��C�q�����!����������_���
�0�����y�����?��p_�So��./�(v�����$S��7U|���b���v%���F���5����K�\7�&��u��:7��a}����Al�[������_���&��R����4e�Jy��
��V���:�l�oyM���HDDDD����u ���}I�-�v;��u���:�^�fu1F�QQQ�bL���2Eh-bd�!c}��Cim��p��]/��5���
Z��]���A�jb4��g��7CF���R�FHu�(��I���[�/V9c��@�
�C�}Q��]$v��HDDDD�n��xM���r���Ys�X_����ujbX_����s���W�n?7(�>��uk�3-�T��M]�&��2Y�C�
bcDb'�
;�DDDDm��f
`��;�N����}\d��^����N�q����>�z;����w��c��������k��Oq�wF�E�
���-����y���? ������;_����x�@	�HDDD.����;���g�����C���%�^�KQ���~w������s(�7�Y��Wl�
bcDb'�
;�DDD:'�����m�(!��Q#k���z9[�&��u��:7���U}�~������y���rn����!���������@"""'�unD���	��^�����5����K��-U���~w���hV_��t.~���1����:���#��K��~nJ�KGD�)v���	t����X_�<z������K}e}>tu;	��f���t�����������������'P�Q��unDD�	v%���F���5����K�\7�&��u��\������7U��q���Al�+6��1"����@""��� voBVV^}�U<����7����q#����q�F��fE���51�����z�����jbX_��k�AS�<V`Z�;]t�d��9������ 8���Wl�
bcD�E'���>>>x��7���h{<55!!!�<y2222PSS��3gv�v��-��FDD�4�����p������

0j�(��������h�z9#�&F�2�W/9�r�TM�����C�4����]�51��\9�2������bspn#�.:�W%&&���X������s'�����+Wb�����H�3k��N�H�7"""�b'P���@,X�����[��Q�i,��0�����������#?M����7���W;.�y��^��������t��<��^��t��7�i�W��r?�yj�6E�W�9>�v�*�T��uW�������]���v(��&�v�����b����nn\������	Myy���;�III���C`` L&��_����c�	$"""""-�(@tt4rrr������Q���F���5����K�\7�&��u��:7��a}����Al�+6��1"��x��1����������q��w���)))���APP���??���N i����p��w��^����g������;���X�p!�p�B�o���51�����j��q�P����R�1�����q��������� W51���������N`W`'�E�����R�FF""""��@	������9�zO3���������y�B���&�gK������C��WO�����s�\9����bspn#;�m�	$"""""-��$MMM(..Fee%���S��F����;�375U��G���p����
�C?�/�P�<Vpd�����`y5����n��qi���t]l����K}e}>��W���u\�Z_-�C��5Y�C�����|]vs�*((@qq1JKK�5v�tY'P�5k���������(g^��������r���ujbX_����s���W������bspn#;�mxM i��@	�����Ys�X_�����R51����p��AM�+W�
bcX_�987�����6����	��^�����5����K��-U���~w�����r��� 6����s����	l�N i��@'��>�
�w����~4��|��>v��K/����p���������>v��,���2����	�;�DDDDD�%v%���F���5����K�\7�&��u��:7��a}����Al�+6��1"����@"""""�;���51�����z�����jbX_����s���W������bspn#;�m�	$"""""-�(!��Q#k���z9[�&��u��:7��a}����Al�+6��1"����@"""""�;�N����}\d��^����N�q����>�z;����w�������|]��v�>�`'��������N�����h51�����z����&�������]�51��\987��a}���� 6F$v��HDDDD���� IDATDZb'PBz9#�&F�2�W/9�r�TM���9�unP�����s���Wl�
bcDb'�
;�DDDDD�%v%��3"jbd�!c}��C/gK�������]�51��\987��a}���� 6F$v��HDDDDDZb'�Ix�@��������������:.��W���]�c������������������'P����	��^>�&F�2�W/9�r������s����&���+��1��������N`v�����HK�
`6��q�F$%%a���0�����rFDM��9d��^r��l�����s����&���+��1���������N`jj*BBB0y�dddd���3g��pYv�����HKW;�eU�
jB7/��|n�	,((��Q��G�F~~��x��Q#k���z9[�&��u��:7��a}����Al�)=�MN�g<0�5F�������sC���7x��Eo\�����WJ���+V >>;w������r�J,_���e+/w�����(�S 0����i@�TU��7�=O|���
��lKI�sNrR�/��X'"��q~'�����N��_�c^�V�
i��O�M��#W;�^]�f�b���=�����&OX�\A��3h�;�>E^��|E^�m�@����~�n=������������+��c�@vK_�z:��������p���+k/���|�!~�d:��p�������n����������\X�C�����R�vZQk�vxy5�������FS��>��z���$�z]B�����W��<x���7}�-�
����n��<�S��k	px?��\��
��/��������j��s���P��+���O��Zl��������]���v(��&�v�����k�����?����Y�w-^�����y�����=z���x��x�^�FF ]w������@�L&�_�			.[y�z7�{����������W��>':���Z���C���%���������^�+kw������s�\9�����\%��_`EH`�[`��k������h�������(><<\q���'���"F�2�W/9��VMw���R_Ys����&���+��1�����.����~}�,�
���X__������ ((������_��v�	T�RY	��=oz=�1�W�V,�W,�W,�W�V,�W,���\���M��M��fq�0�������p���*�*o���?�j��"������kU�+11Q����/Grr2���q��Q��YYYx��W������Ld�oQQV�Z���$���(..V58��f�7nDRR6n������=]Q_=���|�-))��U�����U�V)��Ld��U����~���k���+�����w��������$dee��d�����m�nRR���W�Xd��o����`�L&h�Z���PMr{zzb����=��o_<��x��74�H�����.D`` JJJ�u�V�_n$;g����������������C�a����w�HG�7n�n�]@����g?�3�<OOO\�|k���s�=��x��l��9s&PXX�n��i2�d�����H��fdd���+HHH��`@}}�&�A���7���������d,"�M��	

EMM
v���>}� ,,�v0���a�����������������_UTT��{�����=|||�I�������aaa��G]]]�l�8������=������3��
��U���r��d2�����X��b��)8��KEd#c}�D�����������z�q�K��^����1c��r��&�����555hjjByy9|}}mCjj*&N����TTT��>�����>.\@jj*{���A�X,X�z5|||0g����[��jM��feea���b6\����dl��	���w�����\_W�w9�{��9l��f�O<����K������d2!22R��kD���z=!c}�����7� ''~~~�={v����D��@cc#N�>����7��\<�	t1���(//G���q��\�|�vv�����p����hnn��q������o���E@@�
s��@�������;v��'���N%k}��������/��Y[���x��!k}���r�7""��-Cvv6�������l���of���U��t�����	�k�X��W/���#//;w����v��jN��@nn.�//�}+��#wS={�DZZn��6�������'O��~���^�`��y��a���9r�����t��0r�H|���7�%r��������e~�a��H�3k���Z�d2�t����z�w9�{���#�{�n�[�|��������.���k���SO=u�[�<�����	��#Fbbb�}����D����@���'L�p[�|�vP���O�o�������=���`�������	�Z�8�|��{xx`���6m���v��d2�bi����B���K��9�l�mll����1c���~�Y����h���h��cbb�|��M���i���oUU��Z�mk4���!�j����b��eX�l|}}]�
  _}��zB��@TT�F#���3���Gn��d�oCC��=�!C�t��j��@��GX,���>>>��G�{�Y�fa��������j��#��_?��_5f�dee��o���9����B8p���������sm�;v��������+Wb������;�B��{��Q\�p{�����{�<��mL��Y��{��)))���DPP���������i������'q��xzzb��9�k �l���������}=��d�/L�6
[�n��={l_���d����~��n���?

�f�DDDDDDn�7�'"""""rC|HDDDDD�F�&���������M ���@""""""7�7�DDDDDDn�o���������j�"55IIIx���p��Qg��i^x�|��'�C���^�n���zK�a9��o��m��9{DN���W����'�.s����� !!�������xyy���PVV���Y�h���"r���z@qq��GB�\��}p� ++��s����{�$&&��y�����$&&����7V�^��7�_�v��	�����������F$''����o�xbb"v���W^y�~�c�\���'1h� ����������c��}�e�{�=�={p��i����HII�����v���@�����'N 99���x��7q��i����/�ZZZPWW�U�V�N��;�&����������������b��%x��Gm�_=CF��������C�i�xTT�.]�n���{���c������q���v��1A����������A�PPP6l������eee�������<�bcc����n]<�����0d�,Y�K�.���>�]�v�w��>|8��������������g|HD7e��Q��l���D�***
���u�����@dd$��������w<&��b���4hN�<	������?***p��I2�p��EDFF�_�~���l�.����}�����o����$���;���L�2���hii������8�x�5�D		��x�:����!OO��������7�-[___�����hw��Eyy9jkk�f��/t��������o����(++��$�;{�m����gc���hii��+W�\Acc#.]�������?"�����������B�~<������������a�� ������ WVPP�I�&a��eX�l
���"������w����34h�-�O�>8w���?�	D��}0�����'��kf-��?�O�SDGG��O?m�����@"��'?��"���������7���nl���
����C����g{����hDCC���1h� �����4�����'Ob��	��<x0


0x�`������	�>r�}�!%%���3x<�^��9��_������h��� 77f�#G�DKK��]����#,,����W_}e��:eDDDD�/W?.ZUU���7c��%���kv�������:u
}����1k�,g��~�o�����KEGG#>>��� ";��0DDDDDDn�)����F�Y����R|E*u.33{����/�l{���3��s'�V+�������{��������>��`��j��3c�q"�utLt����(9��d��~|��q:tp���b������b�`��=(,,��`���S1f�,_����Z��l����6��&u��a��]�Z��������iw�-o�8Q�)o���0r�H�$r


(,,D�n��=����g���PXX������^�z��g����'._���k�������8����W�%������?��c,Y�X�j�
�fdd���+HHH��`@}}=����.]�T����RG����.\���@���`���HHH������w����]XHaH����������H^��)S�{�����z+ **
F���:���m_n2�������D2�wL�{\�z�[D�fo?FSS���r���999�:u*������J�aaa������N��]y�h�	,**Bhh(����NMD
����d2u�����0���a��a���oPSS�����;��[��l6��'����D2�wLtv�(Y���"r����9s��n������������7�|������a�����b��������9s��w��]�D]�F�kVV����~����D
�?�y��L�4I��D���}�0m����_�
���X�j���o�����,[��f���}�n�8����+J�(?������x��}�?>������n�X��W/���#66;w�����%K0q�D����K�O��:�_��������3g:�����w����_���Z���!**J��D�RII	�}�]���khhh���Z��-B||<F����}`���(**r�q"��;&:;V��Pli����������������o���� �1���r�})���#q��E[A�u��������e~�a�>�ioy{���~�����V'"������;11O=�����f3���`�Zq��!�9����k��[���
�������F��E����de����X�?&�[D2�l?@YY<<<�{�����v���!Cp�����&�	������@aa!z��%v��n�������7o��3~]��������8QK�7�����X�x�V)�H���<|����Z�4h&O�l����L�;w��`���'������Dz��lmg���������{�<�@��ut<L�6
[�n��={����9s�������'|||0w�\��O���������p������{��y��G�|<t�|G��Z�����
T�G���"//������!"""""���������a��a��aZ�#"""""��+�����������o�~���5k���^z�%,_�\��(11���7���]��f�������APP���???��+]��xL]������E�%���U���c���|�����q�=�[����!66��r���?�x���P__���|DGG+^^�z�d�c��D��t<���������OF������	&�����S�������X�V��uuux��w�j�*�^���������o�>�2�����=���?��s����HNN�����-�{�n���+��h��/��u����w������{�n��B�F


0j�(��������jy��!��	�kx<]#����5�EEE

EPP��������G��)S���h{<55'ND||<������wc��a�
nnnFYY"##;]�'�|���~K�.mw����f�3�/��������������'�|����jA��L&`��M���;���U-�t=D��1At
��kd<4�&���#�4iR���������=~��I\�p���knn���?���QQQ���r2�������;v����C1|�p�������`@ss�-���/�����������-��b��7��������5<����x��M`mm-����p������[�v��~�����EYY��%���������"dgg����X�h��7�?|�C�'�V���E`` L&���U-�t=D��1At
��kd<4�8���G1~������7��;�������
<��������������g����A�n��K�.a�������QQQ���E�����������FLL��wk��ux���C�JxL]������:�����X�x�u��>}:�l���}��������Y��}�vdff�j�b�����BBB��������G;[��m�P__���3f��H"7r���"%%�����)�������;[�+�1At
��kd<_}���z%����E^^~��_���������\��]�h�&�����������@�oADDDDDD��7�DDDDDDnD��@cc#��Y���xxx�{��������]'++@mm-^~�e���{��Aaa!�N��1c�(^����\�����������|�2���?u�|�r���~���`���(**��]�`�Z������g#22R�����hW�w�<�F�+���L����C��~���cf����>jjjl_����j���-J�=z�Z���C�=0x�`����i�i�`0`��	8v�X�c.==�/_���1i�$_wOOG�c�q"W��o_L�8'NDPP���u&==��1q�D���p���u�]���E����m�6L�4I�����hW�w�<�F�+���L����~�iL�0���1e������<��C���G~~>����1//�Mt�W�t���%�n�����{���/"<<����|�2�|����Q�F!77�'O���������������������SO=�v3�����;|<''=�������c�q"W������������;<<uuu]5,"!:������-���J��5Spp0���������k��


=z4��_��3g����!44�v��b��������'�|����e���1f�,^��6K�.���/�.]�7�DNR]]�o���V���u�PVV��!9]cc#N�>�!C��pY�����Wc��u�'*++C�1L"�(�����)}�4g��]�o��f����L&������;���U�I�7�G�i��OOO������>>>hii�68DFF"((���Z��:a�X��W/���#66;w�t����.77C�����?h��?�K�,����c��v�+++CFF��3�D2P�s�'�R��i��}�?>����}����]`` ,X`�*�������Z���!**���u�!7��D$FPPF�������A�n���q�w8�l���#G����m����b��-x���o��!"Y)������)}�T\\���HDFF������P[[��+x3���o�=����w��<==����%�""����`4g��i��Rk��u�������g������?>&L&,����z���q���7c����\������I���f
@YY���������FNN����111����-",����x��.Y�]w��5k� 88.��u���;�����l6c����>}:���NL�6
[�n��={����9s��b::�eo=�'r5�~�-n���?
��c������'|||0w�\�'K/\���{�b����g�yF����D�h�r�J��������{�� W�U��~���������`�<`{��{�EJJ
233m��P���W_Y`����W����\���������������k�.v�
�a��i��������:�����	$"""""r#�}���f�������=--
���]���^z	��/Wu_""$IDAT5"wd6���������vq���������p��������_V�"Y)��;Z����v���j���f�����H
���k(9�������Az`�5PGJJJ���`0�j�b���o����s��E��}U�h��C�=0x�`����i��7�j�����{��G�x�������<��C���G~~>����.o00a�;v����t=D�R�/w���������������?�m��I�&i�D]C��p����=�F��{
�L�8�&M��a��y�fL�<@�yyyn�����!�sDff&&L�x���n�:����X�z5v��m[.11�w��+�������2������$''wxo��>��s����HNN����m��:u
��������j{��������U����������C�


0j�(��������������lR�"Y)��;Z>,,���Z����:��&D����~����^u������Z�)��N_W�_XTT���Ph��c�=����O>�$rssm�677c��1X�x1>��[F�.]
___,]�O=�T���O>�O?�4�.]j{'
����#�<�)S������~��c�">>�����`R"Wf2��v��w����z��������7Z>++C�3X"����?���7����s���k��o�y���]y<h�&���#�>����	___������---������H9����������c������������������O��e��]������Z��)���-_VV�����9S�0�4��x����o����,[��f���}������M�������QQQ����
���P����?�|L�4	g���;���|�Dt������h=3������9��}��������e~�a�����{�� w6r�H���+�M�=z�����uyzz��gg��t�������{���s�������:�DNN��[,�.3��DGG�����l�7W��Z]5�!r%J����oll����1c����k3p"�����Az������
MMM�����k������f���X,��������d}w�u��Y���`,\����o��
�����Y�j�����e���ooo�����/�u�V|��g���E\\��[Y��������dff����������?v��������+Wb������;;]�+QzLt����Gq�����{����k�n����������%�$a3�A�FE�Q[,F�F�N��Q����6� 2��NmH��i�JJj;���ZqM���1�Ib&��*��"�a���[��r��->�/����=w=�s��{�HKK�>HL8z���z/}��2������j�RZZ��(�F�����|��3g�8��k$Y,��?���??!�!�B!�DVXX�� P!�B!����%B!�Bq�A�B!�B�GFma����������;�~����S$&&�X|o�+��M]]���8L&k���l6�������QE]p��
U�O������y��y@��Q�����K(���%K����s�����9t�mmm�������K��V����>}���r�}������k��B�G��kUU�������g��a�|�����;c2���b����4@s7�E|o�+��M`` )))���@~~>��������h����}������m�'N�}�v{��U������}���E���k4NM�;y�$���$$$P^^NYY������K��V�����?����p��Y��;�6]���]}�:u*��m������b��f������K,p����<�����-[���+�A@@���L�6���{���,�����b!!!���,%%%���������/���f��8111����n�:u���/SPP��l���������e�:����:���)S��#""���p�R�����l����P���0N{�����a�E�����u��^|�E�����o��v��_���(����o{MM��C�t!����U���������rg��	���#,,�����t��-�����?��E�k�����'55����������N@@���C����Y�x1���w())Q�KJJ��i���C6�L�������BLDUUU��9S��j���o��?��M�6y|RR�������r������|����������w466�H��i������'{����/mBLD��o�qoo/�/_&&&fX�B�G���Z�Lz����A����E����F#L�4	��������fBBB��l^��Edd$���j��70��DFFb4�:O�wu\!&���F����f5������d��5{�Q\\Lrr2���N��v;���������D�}��9!FCpp0�7oVg?��/mBL4��o���b!&&��4�t!�#_�W�k&��^��� ������F��������4��H=�iz�)�D���N^^7nty;��������������f�f3���jzHH�f� 66���&��]��D{{;�?����7���&�D��GxS����y�����.�x����u�4`��^��� �����}�k>�e4�����n�c�Zihh <<\M��G�Z����;���=���
1Q���r��AV�\�t?{KK}}}���:=��o��!����hll�����B!::���Z�\��C=4�"���1c555@�E�����jZ��M��B�����U{������?r�V���7�u��L�����yu��Nuu5���>��`�rrr

%%%�m|���|@}}=���W����o�����i��������hW������k��QTTDQQiiiX�VJKKQ��HRR���\}��b�
8��(�^�ZM_�l���;v��0$����K������B��a��6��_���(�������jg����'�r�V���V}���O())������w�|�r�~�i`��L����R��9�F|�J�������n��YYYdee
I��l�L&ZZZ8x� ��o�i���+�B!����������8����U��>�����o�l6��Y3B%B!�B��c�f'����!iaaa$''�Ai�B!��7F}&p�HOO�"!�B!���A`oo/999������JO�:Ebb�=��g���l��]������]]��n�s��1.]���(,Y����x�8Z���b<������c���GM;}�4eee@��/����|]]���8L&k���l6s��i�����6����k��6�'#�=�U�����C�hkkSJ
����kZ�^o=�j����������={��@p?�E|!D?���r�������o�����(tuu����_o!�ZOO�.]��pJ?q���o�`0�w�^��|`` )))���@~~>��?��������;wnD�E�{��>����������@yy9eee�Z�JW�b�i�{��X�=h�?��zX�`���n����l���W^y���hmme��i<��s@�L��9s�X,$$$�����b���������������_�������#G�y���LqTTyyyX�V��[�>��G�����H\\��/��	�%WSS��
P����Z���b��<y���s��E����P���0�o�2e��sDD���C�����������m�p�/��"s������f��U���cM�����Z�A���������FHH�?m�e����x��W���~�m6���<��3������u���,��<
�p��q.\Hll,���>|���Tl6			�?�K/��������;����Y	q����������������?���7���O���!00��k�:]���_o!�RSS���!�%%%���n��u�1����9s�SZoo/�/_��<�*�X�U�������INNV���mWB��z���j��o�����1�-R7�0i�$����s���g0`6�			�f�y�����������g�����QQQDFF:m���c�q��Q���dsRq���s'��og���=zTM�������c�y��w�����7�c����e��i����Lrr2��������Hyy9�V�rJ�X,���H$�=_����l��Y�E���cE��{S��w��7Fu���Ncc#���C^���0.������Jzz:���dff�=~rr2�-���+��O����2�<y2�g����jzHH�f� 66���W����#�Xjhh��w���7����G�f������f�f3���c�����������*W]]�SO=����k��#���hoo�gA���v%�X���z����������+++�U����h���wX��x�	***p8��o���Z�444��/��u��������������
1utt`���t�����k������p��z�!�5W�����#�x������L233	pz&=((���F���������^<���+���pz�����?������;!|��}��3����������+!��z�����]���Q���n�S]]Mjj�O�-X����BCCIIIq��59r���
�f�"22����~��je���j������p8�\��'�b"�t�����F���Y�n����e�������c��K\}����]!&�+Vp��Ea���N�
n���\�v���"���HKK�`0p��Y�|�I�T�{��#�.]Jnn.�����]	1�h�{=�h�_R��9���1G��b����<���2�B!�BLd����(�B!�B���A���.�B!��
!�B!�}D�:B!�>���EVV�Ob�:u���D��r���_g��]cV����',Y���K�RVVFYY�w���1v��EXX���w��]������XUUEii)������GM?}�4eee@�b'���s����C�����.����7�V9���s^uuu�p80�L�]�V��]O98|�0����W���
�	B!����S���NF�<&�I]!�����h4��e����zc���l���x�)������?$55uX���<y���X222���UZZ�z�h�So=�HJJ
�[��#G�xU���p���HOOg�����}��A�B!�Q�������&;;���z���/��G��_����lJJJ���������!;;{X�b�=k����;������_�B=&�g�}���srssq8n���<������;����7�|�����g3���@ZZZ������o~����E=���{����g����8��+^^���MKDD��}~CCC�������|�c�.0g�����������s���UN�q����)SV�����U9���0�@���fB�CnB!�#*&&���xE����?��ddd�����s�N������ ..���8���HOO����l6/^��`�������K����i�&��?���Z�9~�8.$66���f>��Y�j�7-3f����;s�����sN����o2k�,��;�_��W�l���f#>>�g�y�}������6��;w��� 88����n�o��HHH���������n��1GG���'99Y�Z���sv�_��8z�k@UU3g����V����|�����m������B!��jkk������6mmm<��c=z��3g���~���5DEE�(
���jzSSQQQ����rj�x�"��]So	��l�kz�����&77�u��9�[�V�������_�����f3��8S�����-�������`���j����������=���l��y��Z��V~-z�xs^��������)gTT���TWWS\\����A�B!�Q�]�����s��^}�U������������J�g�EQ�s�����Njj���1����Gff&��o0���^x*�������"(�������Npp0L�4�m��8zi��{^�������q�F��x[���g��{�yqF�'�
!�B�����W����H����[��>}:�>���L��^���+���]]]477;����v�V+


�����?�0V����z�A�V9����OPQQ���p�y&��3,**
������+�'Z���s��Ed���hll���iX�3fPSS@uu����V:�������'�����������\���������B__����z���	B!�>a����7������9sX�j���o	d���(tuu�K��m�����JJJ���~������=�?�8~~~j��d��>�����������/'//��S�:��*�Vy��Y��#G�����p0k�,u�v/�QX�z5����������
6�s��i��d2���b�Z�>7-�|�	%%%tww�{�n�/_��O?��+8p����z�j�q�.]Jnn.��	�������_��z��9���J�]�FQQEEE���a0t�������RE�h4������tG9s��`����H!�B�����
�e0����
!�S���<��#��q���h�1�]����>��=�������p8�4i���������y���0�[l6MMMtww�����z�B1���@!�c�d2q��mn�������&�\���/��������������f����b��7�����:u*������j����/���d���C�Y|�>���J��}�bd�,��� P!��3��$EQ���GMwxi�����BXX�������������{y�B1Rd(�b����Y���p8
����,n�/����DHH�S���*����
�zZ���{�B��"=�B�1g4��hnnV��<����KSS��6M���Jkk+]]]Cn�Z���aaa���S__�.�*���������}�����p8��W!�+�:�B!�B�V��@!�B!���� P!�B!�#2B!�B����B!��>"�@!�B!����n����#�B!����p��-._�<��B!�B1
��X�����0IEND�B`�
#62Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#60)
1 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra wrote:

So, here's v10 of the patch (based on the v9+v9a), that implements the
approach described above.

It turned out to be much easier than I expected (basically just a
rewrite of the pgstat_read_db_statsfile_timestamp() function).

Thanks. I'm giving this another look now. I think the new code means
we no longer need the first_write logic; just let the collector idle
until we get the first request. (If for some reason we considered that
we should really be publishing initial stats as early as possible, we
could just do a write_statsfiles(allDbs) call before entering the main
loop. But I don't see any reason to do this. If you do, please speak
up.)

Also, it seems to me that the new pgstat_db_requested() logic is
slightly bogus (in the "inefficient" sense, not the "incorrect" sense):
we should be comparing the timestamp of the request vs. what's already
on disk instead of blindly returning true if the list is nonempty. If
the request is older than the file, we don't need to write anything and
can discard the request. For example, suppose that backend A sends a
request for a DB; we write the file. If then quickly backend B also
requests stats for the same DB, with the current logic we'd go write the
file, but perhaps backend B would be fine with the file we already
wrote.

Another point is that I think there's a better way to handle nonexistent
files, instead of having to read the global file and all the DB records
to find the one we want. Just try to read the database file, and only
if that fails try to read the global file and compare the timestamp (so
there might be two timestamps for each DB, one in the global file and
one in the DB-specific file. I don't think this is a problem). The
point is to avoid having to read the global file if possible.
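
Roughly, the lookup order would be as in the sketch below. This only
illustrates the ordering and is not the code in the patch: the on-disk
layouts (a timestamp at the start of the per-database file, fixed-size
records in the global file) are assumptions made up for the example, and all
three function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t TimestampTz;	/* stand-in for PostgreSQL's TimestampTz */
typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Assumed record layout of the global file: one entry per database. */
typedef struct GlobalDbRecord
{
	Oid			databaseid;
	TimestampTz stats_timestamp;
} GlobalDbRecord;

/* Read the timestamp assumed to sit at the start of a per-database file. */
bool
read_db_file_timestamp(const char *path, TimestampTz *ts)
{
	FILE	   *fp = fopen(path, "rb");
	bool		ok;

	if (fp == NULL)
		return false;			/* missing file: caller falls back */
	ok = (fread(ts, sizeof(*ts), 1, fp) == 1);
	fclose(fp);
	return ok;
}

/* Scan the global file for the record of one database. */
bool
read_global_file_timestamp(const char *path, Oid dbid, TimestampTz *ts)
{
	FILE	   *fp = fopen(path, "rb");
	GlobalDbRecord rec;
	bool		found = false;

	if (fp == NULL)
		return false;
	while (fread(&rec, sizeof(rec), 1, fp) == 1)
	{
		if (rec.databaseid == dbid)
		{
			*ts = rec.stats_timestamp;
			found = true;
			break;
		}
	}
	fclose(fp);
	return found;
}

/* Cheap path first; touch the global file only if the DB file is unreadable. */
bool
lookup_stats_timestamp(const char *dbpath, const char *globalpath,
					   Oid dbid, TimestampTz *ts)
{
	assert(ts != NULL);
	if (read_db_file_timestamp(dbpath, ts))
		return true;
	return read_global_file_timestamp(globalpath, dbid, ts);
}
```

In the common case where the per-database file exists, the global file is
never opened.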

So here's v11. I intend to commit this shortly. (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

stats-split-v11.patch (text/x-diff; charset=us-ascii)
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 38,43 ****
--- 38,44 ----
  #include "access/xact.h"
  #include "catalog/pg_database.h"
  #include "catalog/pg_proc.h"
+ #include "lib/ilist.h"
  #include "libpq/ip.h"
  #include "libpq/libpq.h"
  #include "libpq/pqsignal.h"
***************
*** 66,73 ****
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* ----------
   * Timer definitions.
--- 67,75 ----
   * Paths for the statistics files (relative to installation's $PGDATA).
   * ----------
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* ----------
   * Timer definitions.
***************
*** 115,120 **** int			pgstat_track_activity_query_size = 1024;
--- 117,123 ----
   * Built from GUC parameter
   * ----------
   */
+ char	   *pgstat_stat_directory = NULL;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
  
***************
*** 219,229 **** static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
  
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 222,237 ----
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request times from backends */
! static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***************
*** 252,262 **** static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 260,275 ----
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  					 Oid tableoid, bool create);
! static void pgstat_write_statsfiles(bool permanent, bool allDbs);
! static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
! static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
! static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
+ static bool pgstat_write_statsfile_needed(void);
+ static bool pgstat_db_requested(Oid databaseid);
+ 
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
***************
*** 285,291 **** static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
  static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
  static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
  
- 
  /* ------------------------------------------------------------
   * Public functions called from postmaster follow
   * ------------------------------------------------------------
--- 298,303 ----
***************
*** 541,556 **** startup_failed:
  }
  
  /*
   * pgstat_reset_all() -
   *
!  * Remove the stats file.  This is currently used only if WAL
   * recovery is needed after a crash.
   */
  void
  pgstat_reset_all(void)
  {
! 	unlink(pgstat_stat_filename);
! 	unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  }
  
  #ifdef EXEC_BACKEND
--- 553,592 ----
  }
  
  /*
+  * subroutine for pgstat_reset_all
+  */
+ static void
+ pgstat_reset_remove_files(const char *directory)
+ {
+ 	DIR * dir;
+ 	struct dirent * entry;
+ 	char	fname[MAXPGPATH];
+ 
+ 	dir = AllocateDir(directory);
+ 	while ((entry = ReadDir(dir, directory)) != NULL)
+ 	{
+ 		if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
+ 			continue;
+ 
+ 		snprintf(fname, MAXPGPATH, "%s/%s", directory,
+ 				 entry->d_name);
+ 		unlink(fname);
+ 	}
+ 	FreeDir(dir);
+ }
+ 
+ /*
   * pgstat_reset_all() -
   *
!  * Remove the stats files.  This is currently used only if WAL
   * recovery is needed after a crash.
   */
  void
  pgstat_reset_all(void)
  {
! 
! 	pgstat_reset_remove_files(pgstat_stat_directory);
! 	pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
  }
  
  #ifdef EXEC_BACKEND
***************
*** 1408,1420 **** pgstat_ping(void)
   * ----------
   */
  static void
! pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time)
  {
  	PgStat_MsgInquiry msg;
  
  	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
  	msg.clock_time = clock_time;
  	msg.cutoff_time = cutoff_time;
  	pgstat_send(&msg, sizeof(msg));
  }
  
--- 1444,1457 ----
   * ----------
   */
  static void
! pgstat_send_inquiry(TimestampTz clock_time, TimestampTz cutoff_time, Oid databaseid)
  {
  	PgStat_MsgInquiry msg;
  
  	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
  	msg.clock_time = clock_time;
  	msg.cutoff_time = cutoff_time;
+ 	msg.databaseid = databaseid;
  	pgstat_send(&msg, sizeof(msg));
  }
  
***************
*** 3053,3069 **** PgstatCollectorMain(int argc, char *argv[])
  	init_ps_display("stats collector process", "", "", "");
  
  	/*
- 	 * Arrange to write the initial status file right away
- 	 */
- 	last_statrequest = GetCurrentTimestamp();
- 	last_statwrite = last_statrequest - 1;
- 
- 	/*
  	 * Read in an existing statistics stats file or initialize the stats to
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfile(InvalidOid, true);
  
  	/*
  	 * Loop to process messages until we get SIGQUIT or detect ungraceful
--- 3090,3100 ----
  	init_ps_display("stats collector process", "", "", "");
  
  	/*
  	 * Read in an existing statistics stats file or initialize the stats to
  	 * zero.
  	 */
  	pgStatRunningInCollector = true;
! 	pgStatDBHash = pgstat_read_statsfiles(InvalidOid, true, true);
  
  	/*
  	 * Loop to process messages until we get SIGQUIT or detect ungraceful
***************
*** 3109,3116 **** PgstatCollectorMain(int argc, char *argv[])
  			 * Write the stats file if a new request has arrived that is not
  			 * satisfied by existing file.
  			 */
! 			if (last_statwrite < last_statrequest)
! 				pgstat_write_statsfile(false);
  
  			/*
  			 * Try to receive and process a message.  This will not block,
--- 3140,3147 ----
  			 * Write the stats file if a new request has arrived that is not
  			 * satisfied by existing file.
  			 */
! 			if (pgstat_write_statsfile_needed())
! 				pgstat_write_statsfiles(false, false);
  
  			/*
  			 * Try to receive and process a message.  This will not block,
***************
*** 3269,3275 **** PgstatCollectorMain(int argc, char *argv[])
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfile(true);
  
  	exit(0);
  }
--- 3300,3306 ----
  	/*
  	 * Save the final stats to reuse at next startup.
  	 */
! 	pgstat_write_statsfiles(true, true);
  
  	exit(0);
  }
***************
*** 3349,3354 **** pgstat_get_db_entry(Oid databaseid, bool create)
--- 3380,3386 ----
  		result->n_block_write_time = 0;
  
  		result->stat_reset_timestamp = GetCurrentTimestamp();
+ 		result->stats_timestamp = 0;
  
  		memset(&hash_ctl, 0, sizeof(hash_ctl));
  		hash_ctl.keysize = sizeof(Oid);
***************
*** 3422,3451 **** pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
  
  
  /* ----------
!  * pgstat_write_statsfile() -
   *
   *	Tell the news.
!  *	If writing to the permanent file (happens when the collector is
!  *	shutting down only), remove the temporary file so that backends
   *	starting up under a new postmaster can't read the old data before
   *	the new collector is ready.
   * ----------
   */
  static void
! pgstat_write_statsfile(bool permanent)
  {
  	HASH_SEQ_STATUS hstat;
- 	HASH_SEQ_STATUS tstat;
- 	HASH_SEQ_STATUS fstat;
  	PgStat_StatDBEntry *dbentry;
- 	PgStat_StatTabEntry *tabentry;
- 	PgStat_StatFuncEntry *funcentry;
  	FILE	   *fpout;
  	int32		format_id;
  	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  	int			rc;
  
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
--- 3454,3485 ----
  
  
  /* ----------
!  * pgstat_write_statsfiles() -
   *
   *	Tell the news.
!  *	If writing to the permanent files (happens when the collector is
!  *	shutting down only), remove the temporary files so that backends
   *	starting up under a new postmaster can't read the old data before
   *	the new collector is ready.
+  *
+  *	When 'allDbs' is false, only the requested databases (listed in
+  *	last_statrequests) will be written; otherwise, all databases will be
+  *	written.
   * ----------
   */
  static void
! pgstat_write_statsfiles(bool permanent, bool allDbs)
  {
  	HASH_SEQ_STATUS hstat;
  	PgStat_StatDBEntry *dbentry;
  	FILE	   *fpout;
  	int32		format_id;
  	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  	int			rc;
  
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
  	/*
  	 * Open the statistics temp file to write out the current values.
  	 */
***************
*** 3484,3523 **** pgstat_write_statsfile(bool permanent)
  	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
  	{
  		/*
! 		 * Write out the DB entry including the number of live backends. We
! 		 * don't write the tables or functions pointers, since they're of no
! 		 * use to any other process.
! 		 */
! 		fputc('D', fpout);
! 		rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
! 		(void) rc;				/* we'll check for error with ferror */
! 
! 		/*
! 		 * Walk through the database's access stats per table.
! 		 */
! 		hash_seq_init(&tstat, dbentry->tables);
! 		while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
! 		{
! 			fputc('T', fpout);
! 			rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
! 			(void) rc;			/* we'll check for error with ferror */
! 		}
! 
! 		/*
! 		 * Walk through the database's function stats table.
  		 */
! 		hash_seq_init(&fstat, dbentry->functions);
! 		while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
  		{
! 			fputc('F', fpout);
! 			rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
! 			(void) rc;			/* we'll check for error with ferror */
  		}
  
  		/*
! 		 * Mark the end of this DB
  		 */
! 		fputc('d', fpout);
  	}
  
  	/*
--- 3518,3542 ----
  	while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL)
  	{
  		/*
! 		 * Write out the tables and functions into the DB stat file, if
! 		 * required.
! 		 *
! 		 * We need to do this before the dbentry write, to ensure the
! 		 * timestamps written to both are consistent.
  		 */
! 		if (allDbs || pgstat_db_requested(dbentry->databaseid))
  		{
! 			dbentry->stats_timestamp = globalStats.stats_timestamp;
! 			pgstat_write_db_statsfile(dbentry, permanent);
  		}
  
  		/*
! 		 * Write out the DB entry. We don't write the tables or functions
! 		 * pointers, since they're of no use to any other process.
  		 */
! 		fputc('D', fpout);
! 		rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
! 		(void) rc;				/* we'll check for error with ferror */
  	}
  
  	/*
***************
*** 3552,3612 **** pgstat_write_statsfile(bool permanent)
  						tmpfile, statfile)));
  		unlink(tmpfile);
  	}
! 	else
  	{
! 		/*
! 		 * Successful write, so update last_statwrite.
! 		 */
! 		last_statwrite = globalStats.stats_timestamp;
  
! 		/*
! 		 * If there is clock skew between backends and the collector, we could
! 		 * receive a stats request time that's in the future.  If so, complain
! 		 * and reset last_statrequest.	Resetting ensures that no inquiry
! 		 * message can cause more than one stats file write to occur.
! 		 */
! 		if (last_statrequest > last_statwrite)
  		{
! 			char	   *reqtime;
! 			char	   *mytime;
  
! 			/* Copy because timestamptz_to_str returns a static buffer */
! 			reqtime = pstrdup(timestamptz_to_str(last_statrequest));
! 			mytime = pstrdup(timestamptz_to_str(last_statwrite));
! 			elog(LOG, "last_statrequest %s is later than collector's time %s",
! 				 reqtime, mytime);
! 			pfree(reqtime);
! 			pfree(mytime);
! 
! 			last_statrequest = last_statwrite;
  		}
  	}
  
! 	if (permanent)
! 		unlink(pgstat_stat_filename);
  }
  
  
  /* ----------
!  * pgstat_read_statsfile() -
   *
!  *	Reads in an existing statistics collector file and initializes the
!  *	databases' hash table (whose entries point to the tables' hash tables).
   * ----------
   */
  static HTAB *
! pgstat_read_statsfile(Oid onlydb, bool permanent)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
- 	PgStat_StatTabEntry *tabentry;
- 	PgStat_StatTabEntry tabbuf;
- 	PgStat_StatFuncEntry funcbuf;
- 	PgStat_StatFuncEntry *funcentry;
  	HASHCTL		hash_ctl;
  	HTAB	   *dbhash;
- 	HTAB	   *tabhash = NULL;
- 	HTAB	   *funchash = NULL;
  	FILE	   *fpin;
  	int32		format_id;
  	bool		found;
--- 3571,3752 ----
  						tmpfile, statfile)));
  		unlink(tmpfile);
  	}
! 
! 	if (permanent)
! 		unlink(pgstat_stat_filename);
! 
! 	/*
! 	 * Now throw away the list of requests.  Note that requests sent after we
! 	 * started the write are still waiting on the network socket.
! 	 */
! 	if (!slist_is_empty(&last_statrequests))
  	{
! 		slist_mutable_iter	iter;
  
! 		slist_foreach_modify(iter, &last_statrequests)
  		{
! 			DBWriteRequest *req;
  
! 			req = slist_container(DBWriteRequest, next, iter.cur);
! 			pfree(req);
  		}
+ 
+ 		slist_init(&last_statrequests);
  	}
+ }
  
! /*
!  * return the filename for a DB stat file; filename is the output buffer,
!  * of length len.
!  */
! static void
! get_dbstat_filename(bool permanent, bool tempname, Oid databaseid,
! 					char *filename, int len)
! {
! 	int		printed;
! 
! 	printed = snprintf(filename, len, "%s/db_%u.%s",
! 					   permanent ? PGSTAT_STAT_PERMANENT_DIRECTORY : pgstat_stat_directory,
! 					   databaseid,
! 					   tempname ? "tmp" : "stat");
! 	if (printed >= len)
! 		elog(ERROR, "overlength pgstat path");
  }
  
+ /* ----------
+  * pgstat_write_db_statsfile() -
+  *
+  *	Tell the news. This writes stats file for a single database.
+  *
+  *	If writing to the permanent file (happens when the collector is
+  *	shutting down only), remove the temporary file so that backends
+  *	starting up under a new postmaster can't read the old data before
+  *	the new collector is ready.
+  * ----------
+  */
+ static void
+ pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent)
+ {
+ 	HASH_SEQ_STATUS tstat;
+ 	HASH_SEQ_STATUS fstat;
+ 	PgStat_StatTabEntry *tabentry;
+ 	PgStat_StatFuncEntry *funcentry;
+ 	FILE	   *fpout;
+ 	int32		format_id;
+ 	Oid			dbid = dbentry->databaseid;
+ 	int			rc;
+ 	char		tmpfile[MAXPGPATH];
+ 	char		statfile[MAXPGPATH];
+ 
+ 	get_dbstat_filename(permanent, true, dbid, tmpfile, MAXPGPATH);
+ 	get_dbstat_filename(permanent, false, dbid, statfile, MAXPGPATH);
+ 
+ 	elog(DEBUG1, "writing statsfile '%s'", statfile);
+ 
+ 	/*
+ 	 * Open the statistics temp file to write out the current values.
+ 	 */
+ 	fpout = AllocateFile(tmpfile, PG_BINARY_W);
+ 	if (fpout == NULL)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not open temporary statistics file \"%s\": %m",
+ 						tmpfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Write the file header --- currently just a format ID.
+ 	 */
+ 	format_id = PGSTAT_FILE_FORMAT_ID;
+ 	rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
+ 	(void) rc;					/* we'll check for error with ferror */
+ 
+ 	/*
+ 	 * Walk through the database's access stats per table.
+ 	 */
+ 	hash_seq_init(&tstat, dbentry->tables);
+ 	while ((tabentry = (PgStat_StatTabEntry *) hash_seq_search(&tstat)) != NULL)
+ 	{
+ 		fputc('T', fpout);
+ 		rc = fwrite(tabentry, sizeof(PgStat_StatTabEntry), 1, fpout);
+ 		(void) rc;			/* we'll check for error with ferror */
+ 	}
+ 
+ 	/*
+ 	 * Walk through the database's function stats table.
+ 	 */
+ 	hash_seq_init(&fstat, dbentry->functions);
+ 	while ((funcentry = (PgStat_StatFuncEntry *) hash_seq_search(&fstat)) != NULL)
+ 	{
+ 		fputc('F', fpout);
+ 		rc = fwrite(funcentry, sizeof(PgStat_StatFuncEntry), 1, fpout);
+ 		(void) rc;			/* we'll check for error with ferror */
+ 	}
+ 
+ 	/*
+ 	 * No more output to be done. Close the temp file and replace the old
+ 	 * per-database stats file with it.  The ferror() check replaces testing
+ 	 * for error after each individual fputc or fwrite above.
+ 	 */
+ 	fputc('E', fpout);
+ 
+ 	if (ferror(fpout))
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not write temporary statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		FreeFile(fpout);
+ 		unlink(tmpfile);
+ 	}
+ 	else if (FreeFile(fpout) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 			   errmsg("could not close temporary statistics file \"%s\": %m",
+ 					  tmpfile)));
+ 		unlink(tmpfile);
+ 	}
+ 	else if (rename(tmpfile, statfile) < 0)
+ 	{
+ 		ereport(LOG,
+ 				(errcode_for_file_access(),
+ 				 errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",
+ 						tmpfile, statfile)));
+ 		unlink(tmpfile);
+ 	}
+ 
+ 	if (permanent)
+ 	{
+ 		get_dbstat_filename(false, false, dbid, statfile, MAXPGPATH);
+ 
+ 		elog(DEBUG1, "removing temporary stat file '%s'", statfile);
+ 		unlink(statfile);
+ 	}
+ }
  
  /* ----------
!  * pgstat_read_statsfiles() -
   *
!  *	Reads in the existing statistics collector files and initializes the
!  *	databases' hash table.  If the permanent file name is requested (which
!  *	only happens in the stats collector itself), also remove the file after
!  *	reading; the in-memory status is now authoritative, and the permanent file
!  *	would be out of date in case somebody else reads it.
!  *
!  *  If a deep read is requested, table/function stats are read also, otherwise
!  *  the table/function hash tables remain empty.
   * ----------
   */
  static HTAB *
! pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
  {
  	PgStat_StatDBEntry *dbentry;
  	PgStat_StatDBEntry dbbuf;
  	HASHCTL		hash_ctl;
  	HTAB	   *dbhash;
  	FILE	   *fpin;
  	int32		format_id;
  	bool		found;
***************
*** 3641,3647 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
  	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
  
  	/*
! 	 * Try to open the status file. If it doesn't exist, the backends simply
  	 * return zero for anything and the collector simply starts from scratch
  	 * with empty counters.
  	 *
--- 3781,3787 ----
  	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
  
  	/*
! 	 * Try to open the stats file. If it doesn't exist, the backends simply
  	 * return zero for anything and the collector simply starts from scratch
  	 * with empty counters.
  	 *
***************
*** 3662,3669 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
  	/*
  	 * Verify it's of the expected format.
  	 */
! 	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
! 		|| format_id != PGSTAT_FILE_FORMAT_ID)
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
--- 3802,3809 ----
  	/*
  	 * Verify it's of the expected format.
  	 */
! 	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
! 		format_id != PGSTAT_FILE_FORMAT_ID)
  	{
  		ereport(pgStatRunningInCollector ? LOG : WARNING,
  				(errmsg("corrupted statistics file \"%s\"", statfile)));
***************
*** 3690,3697 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
  		{
  				/*
  				 * 'D'	A PgStat_StatDBEntry struct describing a database
! 				 * follows. Subsequently, zero to many 'T' and 'F' entries
! 				 * will follow until a 'd' is encountered.
  				 */
  			case 'D':
  				if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),
--- 3830,3836 ----
  		{
  				/*
  				 * 'D'	A PgStat_StatDBEntry struct describing a database
! 				 * follows.
  				 */
  			case 'D':
  				if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),
***************
*** 3753,3773 **** pgstat_read_statsfile(Oid onlydb, bool permanent)
  								   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  				/*
! 				 * Arrange that following records add entries to this
! 				 * database's hash tables.
  				 */
! 				tabhash = dbentry->tables;
! 				funchash = dbentry->functions;
! 				break;
  
- 				/*
- 				 * 'd'	End of this database.
- 				 */
- 			case 'd':
- 				tabhash = NULL;
- 				funchash = NULL;
  				break;
  
  				/*
  				 * 'T'	A PgStat_StatTabEntry follows.
  				 */
--- 3892,3998 ----
  								   HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
  
  				/*
! 				 * If requested, read the data from the database-specific file.
! 				 * If there was onlydb specified (!= InvalidOid), we would not
! 				 * get here because of a break above. So we don't need to
! 				 * recheck.
  				 */
! 				if (deep)
! 					pgstat_read_db_statsfile(dbentry->databaseid,
! 											 dbentry->tables,
! 											 dbentry->functions,
! 											 permanent);
  
  				break;
  
+ 			case 'E':
+ 				goto done;
+ 
+ 			default:
+ 				ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 						(errmsg("corrupted statistics file \"%s\"",
+ 								statfile)));
+ 				goto done;
+ 		}
+ 	}
+ 
+ done:
+ 	FreeFile(fpin);
+ 
+ 	/* If requested to read the permanent file, also get rid of it. */
+ 	if (permanent)
+ 	{
+ 		elog(DEBUG1, "removing permanent stats file '%s'", statfile);
+ 		unlink(statfile);
+ 	}
+ 
+ 	return dbhash;
+ }
+ 
+ 
+ /* ----------
+  * pgstat_read_db_statsfile() -
+  *
+  *	Reads in the existing statistics collector file for the given database,
+  *	and initializes the tables and functions hash tables.
+  *
+  *	As pgstat_read_statsfiles, if the permanent file is requested, it is
+  *	removed after reading.
+  * ----------
+  */
+ static void
+ pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
+ 						 bool permanent)
+ {
+ 	PgStat_StatTabEntry *tabentry;
+ 	PgStat_StatTabEntry tabbuf;
+ 	PgStat_StatFuncEntry funcbuf;
+ 	PgStat_StatFuncEntry *funcentry;
+ 	FILE	   *fpin;
+ 	int32		format_id;
+ 	bool		found;
+ 	char		statfile[MAXPGPATH];
+ 
+ 	get_dbstat_filename(permanent, false, databaseid, statfile, MAXPGPATH);
+ 
+ 	/*
+ 	 * Try to open the stats file. If it doesn't exist, the backends simply
+ 	 * return zero for anything and the collector simply starts from scratch
+ 	 * with empty counters.
+ 	 *
+ 	 * ENOENT is a possibility if the stats collector is not running or has
+ 	 * not yet written the stats file the first time.  Any other failure
+ 	 * condition is suspicious.
+ 	 */
+ 	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
+ 	{
+ 		if (errno != ENOENT)
+ 			ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 					(errcode_for_file_access(),
+ 					 errmsg("could not open statistics file \"%s\": %m",
+ 							statfile)));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Verify it's of the expected format.
+ 	 */
+ 	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
+ 		|| format_id != PGSTAT_FILE_FORMAT_ID)
+ 	{
+ 		ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 				(errmsg("corrupted statistics file \"%s\"", statfile)));
+ 		goto done;
+ 	}
+ 
+ 	/*
+ 	 * We found an existing collector stats file. Read it and put all the
+ 	 * hashtable entries into place.
+ 	 */
+ 	for (;;)
+ 	{
+ 		switch (fgetc(fpin))
+ 		{
  				/*
  				 * 'T'	A PgStat_StatTabEntry follows.
  				 */
***************
*** 3854,3881 **** done:
  	FreeFile(fpin);
  
  	if (permanent)
! 		unlink(PGSTAT_STAT_PERMANENT_FILENAME);
  
! 	return dbhash;
  }
  
  /* ----------
!  * pgstat_read_statsfile_timestamp() -
   *
!  *	Attempt to fetch the timestamp of an existing stats file.
!  *	Returns TRUE if successful (timestamp is stored at *ts).
   * ----------
   */
  static bool
! pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
  {
  	PgStat_GlobalStats myGlobalStats;
  	FILE	   *fpin;
  	int32		format_id;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  
  	/*
! 	 * Try to open the status file.  As above, anything but ENOENT is worthy
  	 * of complaining about.
  	 */
  	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
--- 4079,4121 ----
  	FreeFile(fpin);
  
  	if (permanent)
! 	{
! 		elog(DEBUG1, "removing permanent stats file '%s'", statfile);
! 		unlink(statfile);
! 	}
  
! 	return;
  }
  
  /* ----------
!  * pgstat_read_db_statsfile_timestamp() -
!  *
!  *	Attempt to determine the timestamp of the last db statfile write.
!  *	Returns TRUE if successful; the timestamp is stored in *ts.
!  *
!  *	This needs to be careful about handling databases without stats,
!  *	such as databases without stat entry or those not yet written:
   *
!  *	- if there's a database entry in the global file, return the corresponding
!  *	stats_timestamp value.
!  *
!  *	- if there's no db stat entry (e.g. for a new or inactive database),
!  *	there's no stats_timestamp value, but also nothing to write so we return
!  *	the timestamp of the global statfile.
   * ----------
   */
  static bool
! pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
! 								   TimestampTz *ts)
  {
+ 	PgStat_StatDBEntry dbentry;
  	PgStat_GlobalStats myGlobalStats;
  	FILE	   *fpin;
  	int32		format_id;
  	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
  
  	/*
! 	 * Try to open the stats file.  As above, anything but ENOENT is worthy
  	 * of complaining about.
  	 */
  	if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL)
***************
*** 3911,3918 **** pgstat_read_statsfile_timestamp(bool permanent, TimestampTz *ts)
--- 4151,4205 ----
  		return false;
  	}
  
+ 	/* By default, we're going to return the timestamp of the global file. */
  	*ts = myGlobalStats.stats_timestamp;
  
+ 	/*
+ 	 * We found an existing collector stats file.  Read it and look for a
+ 	 * record for the requested database.  If found, use its timestamp.
+ 	 */
+ 	for (;;)
+ 	{
+ 		switch (fgetc(fpin))
+ 		{
+ 				/*
+ 				 * 'D'	A PgStat_StatDBEntry struct describing a database
+ 				 * follows.
+ 				 */
+ 			case 'D':
+ 				if (fread(&dbentry, 1, offsetof(PgStat_StatDBEntry, tables),
+ 						  fpin) != offsetof(PgStat_StatDBEntry, tables))
+ 				{
+ 					ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 							(errmsg("corrupted statistics file \"%s\"",
+ 									statfile)));
+ 					goto done;
+ 				}
+ 
+ 				/*
+ 				 * If this is the DB we're looking for, save its timestamp
+ 				 * and we're done.
+ 				 */
+ 				if (dbentry.databaseid == databaseid)
+ 				{
+ 					*ts = dbentry.stats_timestamp;
+ 					goto done;
+ 				}
+ 
+ 				break;
+ 
+ 			case 'E':
+ 				goto done;
+ 
+ 			default:
+ 				ereport(pgStatRunningInCollector ? LOG : WARNING,
+ 						(errmsg("corrupted statistics file \"%s\"",
+ 								statfile)));
+ 				goto done;
+ 		}
+ 	}
+ 
+ done:
  	FreeFile(fpin);
  	return true;
  }
***************
*** 3947,3953 **** backend_read_statsfile(void)
  
  		CHECK_FOR_INTERRUPTS();
  
! 		ok = pgstat_read_statsfile_timestamp(false, &file_ts);
  
  		cur_ts = GetCurrentTimestamp();
  		/* Calculate min acceptable timestamp, if we didn't already */
--- 4234,4240 ----
  
  		CHECK_FOR_INTERRUPTS();
  
! 		ok = pgstat_read_db_statsfile_timestamp(MyDatabaseId, false, &file_ts);
  
  		cur_ts = GetCurrentTimestamp();
  		/* Calculate min acceptable timestamp, if we didn't already */
***************
*** 4006,4012 **** backend_read_statsfile(void)
  				pfree(mytime);
  			}
  
! 			pgstat_send_inquiry(cur_ts, min_ts);
  			break;
  		}
  
--- 4293,4299 ----
  				pfree(mytime);
  			}
  
! 			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
  			break;
  		}
  
***************
*** 4016,4022 **** backend_read_statsfile(void)
  
  		/* Not there or too old, so kick the collector and wait a bit */
  		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
! 			pgstat_send_inquiry(cur_ts, min_ts);
  
  		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
--- 4303,4309 ----
  
  		/* Not there or too old, so kick the collector and wait a bit */
  		if ((count % PGSTAT_INQ_LOOP_COUNT) == 0)
! 			pgstat_send_inquiry(cur_ts, min_ts, MyDatabaseId);
  
  		pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
  	}
***************
*** 4024,4034 **** backend_read_statsfile(void)
  	if (count >= PGSTAT_POLL_LOOP_COUNT)
  		elog(WARNING, "pgstat wait timeout");
  
! 	/* Autovacuum launcher wants stats about all databases */
  	if (IsAutoVacuumLauncherProcess())
! 		pgStatDBHash = pgstat_read_statsfile(InvalidOid, false);
  	else
! 		pgStatDBHash = pgstat_read_statsfile(MyDatabaseId, false);
  }
  
  
--- 4311,4324 ----
  	if (count >= PGSTAT_POLL_LOOP_COUNT)
  		elog(WARNING, "pgstat wait timeout");
  
! 	/*
! 	 * Autovacuum launcher wants stats about all databases, but a shallow
! 	 * read is sufficient.
! 	 */
  	if (IsAutoVacuumLauncherProcess())
! 		pgStatDBHash = pgstat_read_statsfiles(InvalidOid, false, false);
  	else
! 		pgStatDBHash = pgstat_read_statsfiles(MyDatabaseId, false, true);
  }
  
  
***************
*** 4084,4109 **** pgstat_clear_snapshot(void)
  static void
  pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  {
  	/*
! 	 * Advance last_statrequest if this requestor has a newer cutoff time
! 	 * than any previous request.
  	 */
! 	if (msg->cutoff_time > last_statrequest)
! 		last_statrequest = msg->cutoff_time;
  
  	/*
! 	 * If the requestor's local clock time is older than last_statwrite, we
  	 * should suspect a clock glitch, ie system time going backwards; though
  	 * the more likely explanation is just delayed message receipt.  It is
  	 * worth expending a GetCurrentTimestamp call to be sure, since a large
  	 * retreat in the system clock reading could otherwise cause us to neglect
  	 * to update the stats file for a long time.
  	 */
! 	if (msg->clock_time < last_statwrite)
  	{
  		TimestampTz cur_ts = GetCurrentTimestamp();
  
! 		if (cur_ts < last_statwrite)
  		{
  			/*
  			 * Sure enough, time went backwards.  Force a new stats file write
--- 4374,4426 ----
  static void
  pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  {
+ 	slist_iter	iter;
+ 	bool		found = false;
+ 	DBWriteRequest *newreq;
+ 	PgStat_StatDBEntry *dbentry;
+ 
+ 	elog(DEBUG1, "received inquiry for %u", msg->databaseid);
+ 
+ 	/*
+ 	 * Find the last write request for this DB (found=true in that case). Plain
+ 	 * linear search, not really worth doing any magic here (probably).
+ 	 */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest *req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		if (req->databaseid != msg->databaseid)
+ 			continue;
+ 
+ 		if (msg->cutoff_time > req->request_time)
+ 			req->request_time = msg->cutoff_time;
+ 		found = true;
+ 		return;
+ 	}
+ 
  	/*
! 	 * There's no request for this DB yet, so create one.
  	 */
! 	newreq = palloc(sizeof(DBWriteRequest));
! 
! 	newreq->databaseid = msg->databaseid;
! 	newreq->request_time = msg->clock_time;
! 	slist_push_head(&last_statrequests, &newreq->next);
  
  	/*
! 	 * If the requestor's local clock time is older than stats_timestamp, we
  	 * should suspect a clock glitch, ie system time going backwards; though
  	 * the more likely explanation is just delayed message receipt.  It is
  	 * worth expending a GetCurrentTimestamp call to be sure, since a large
  	 * retreat in the system clock reading could otherwise cause us to neglect
  	 * to update the stats file for a long time.
  	 */
! 	dbentry = pgstat_get_db_entry(msg->databaseid, false);
! 	if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
  	{
  		TimestampTz cur_ts = GetCurrentTimestamp();
  
! 		if (cur_ts < dbentry->stats_timestamp)
  		{
  			/*
  			 * Sure enough, time went backwards.  Force a new stats file write
***************
*** 4113,4127 **** pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
  			char	   *mytime;
  
  			/* Copy because timestamptz_to_str returns a static buffer */
! 			writetime = pstrdup(timestamptz_to_str(last_statwrite));
  			mytime = pstrdup(timestamptz_to_str(cur_ts));
! 			elog(LOG, "last_statwrite %s is later than collector's time %s",
! 				 writetime, mytime);
  			pfree(writetime);
  			pfree(mytime);
  
! 			last_statrequest = cur_ts;
! 			last_statwrite = last_statrequest - 1;
  		}
  	}
  }
--- 4430,4445 ----
  			char	   *mytime;
  
  			/* Copy because timestamptz_to_str returns a static buffer */
! 			writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
  			mytime = pstrdup(timestamptz_to_str(cur_ts));
! 			elog(LOG,
! 				 "stats_timestamp %s is later than collector's time %s for db %d",
! 				 writetime, mytime, dbentry->databaseid);
  			pfree(writetime);
  			pfree(mytime);
  
! 			newreq->request_time = cur_ts;
! 			dbentry->stats_timestamp = cur_ts - 1;
  		}
  	}
  }
***************
*** 4270,4298 **** pgstat_recv_tabpurge(PgStat_MsgTabpurge *msg, int len)
  static void
  pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
  {
  	PgStat_StatDBEntry *dbentry;
  
  	/*
  	 * Lookup the database in the hashtable.
  	 */
! 	dbentry = pgstat_get_db_entry(msg->m_databaseid, false);
  
  	/*
! 	 * If found, remove it.
  	 */
  	if (dbentry)
  	{
  		if (dbentry->tables != NULL)
  			hash_destroy(dbentry->tables);
  		if (dbentry->functions != NULL)
  			hash_destroy(dbentry->functions);
  
  		if (hash_search(pgStatDBHash,
! 						(void *) &(dbentry->databaseid),
  						HASH_REMOVE, NULL) == NULL)
  			ereport(ERROR,
! 					(errmsg("database hash table corrupted "
! 							"during cleanup --- abort")));
  	}
  }
  
--- 4588,4623 ----
  static void
  pgstat_recv_dropdb(PgStat_MsgDropdb *msg, int len)
  {
+ 	Oid			dbid = msg->m_databaseid;
  	PgStat_StatDBEntry *dbentry;
  
  	/*
  	 * Lookup the database in the hashtable.
  	 */
! 	dbentry = pgstat_get_db_entry(dbid, false);
  
  	/*
! 	 * If found, remove it (along with the db statfile).
  	 */
  	if (dbentry)
  	{
+ 		char		statfile[MAXPGPATH];
+ 
+ 		get_dbstat_filename(true, false, dbid, statfile, MAXPGPATH);
+ 
+ 		elog(DEBUG1, "removing %s", statfile);
+ 		unlink(statfile);
+ 
  		if (dbentry->tables != NULL)
  			hash_destroy(dbentry->tables);
  		if (dbentry->functions != NULL)
  			hash_destroy(dbentry->functions);
  
  		if (hash_search(pgStatDBHash,
! 						(void *) &dbid,
  						HASH_REMOVE, NULL) == NULL)
  			ereport(ERROR,
! 					(errmsg("database hash table corrupted during cleanup --- abort")));
  	}
  }
  
***************
*** 4687,4689 **** pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len)
--- 5012,5053 ----
  						   HASH_REMOVE, NULL);
  	}
  }
+ 
+ /* ----------
+  * pgstat_write_statsfile_needed() -
+  *
+  *	Do we need to write out the files?
+  * ----------
+  */
+ static bool
+ pgstat_write_statsfile_needed(void)
+ {
+ 	if (!slist_is_empty(&last_statrequests))
+ 		return true;
+ 
+ 	/* Everything was written recently */
+ 	return false;
+ }
+ 
+ /* ----------
+  * pgstat_db_requested() -
+  *
+  *	Checks whether stats for a particular DB need to be written to a file.
+  * ----------
+  */
+ static bool
+ pgstat_db_requested(Oid databaseid)
+ {
+ 	slist_iter	iter;
+ 
+ 	/* Check the databases if they need to refresh the stats. */
+ 	slist_foreach(iter, &last_statrequests)
+ 	{
+ 		DBWriteRequest	*req = slist_container(DBWriteRequest, next, iter.cur);
+ 
+ 		if (req->databaseid == databaseid)
+ 			return true;
+ 	}
+ 
+ 	return false;
+ }
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 8705,8718 **** static void
  assign_pgstat_temp_directory(const char *newval, void *extra)
  {
  	/* check_canonical_path already canonicalized newval for us */
  	char	   *tname;
  	char	   *fname;
  
! 	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /pgstat.tmp */
! 	sprintf(tname, "%s/pgstat.tmp", newval);
! 	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /pgstat.stat */
! 	sprintf(fname, "%s/pgstat.stat", newval);
  
  	if (pgstat_stat_tmpname)
  		free(pgstat_stat_tmpname);
  	pgstat_stat_tmpname = tname;
--- 8705,8727 ----
  assign_pgstat_temp_directory(const char *newval, void *extra)
  {
  	/* check_canonical_path already canonicalized newval for us */
+ 	char	   *dname;
  	char	   *tname;
  	char	   *fname;
  
! 	/* directory */
! 	dname = guc_malloc(ERROR, strlen(newval) + 1);		/* runtime dir */
! 	sprintf(dname, "%s", newval);
  
+ 	/* global stats */
+ 	tname = guc_malloc(ERROR, strlen(newval) + 12);		/* /global.tmp */
+ 	sprintf(tname, "%s/global.tmp", newval);
+ 	fname = guc_malloc(ERROR, strlen(newval) + 13);		/* /global.stat */
+ 	sprintf(fname, "%s/global.stat", newval);
+ 
+ 	if (pgstat_stat_directory)
+ 		free(pgstat_stat_directory);
+ 	pgstat_stat_directory = dname;
  	if (pgstat_stat_tmpname)
  		free(pgstat_stat_tmpname);
  	pgstat_stat_tmpname = tname;
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
***************
*** 192,197 **** const char *subdirs[] = {
--- 192,198 ----
  	"base",
  	"base/1",
  	"pg_tblspc",
+ 	"pg_stat",
  	"pg_stat_tmp"
  };
  
*** a/src/include/pgstat.h
--- b/src/include/pgstat.h
***************
*** 205,210 **** typedef struct PgStat_MsgInquiry
--- 205,211 ----
  	PgStat_MsgHdr m_hdr;
  	TimestampTz clock_time;		/* observed local clock time */
  	TimestampTz cutoff_time;	/* minimum acceptable file timestamp */
+ 	Oid			databaseid;		/* requested DB (InvalidOid => all DBs) */
  } PgStat_MsgInquiry;
  
  
***************
*** 514,520 **** typedef union PgStat_Msg
   * ------------------------------------------------------------
   */
  
! #define PGSTAT_FILE_FORMAT_ID	0x01A5BC9A
  
  /* ----------
   * PgStat_StatDBEntry			The collector's data per database
--- 515,521 ----
   * ------------------------------------------------------------
   */
  
! #define PGSTAT_FILE_FORMAT_ID	0x01A5BC9B
  
  /* ----------
   * PgStat_StatDBEntry			The collector's data per database
***************
*** 545,550 **** typedef struct PgStat_StatDBEntry
--- 546,552 ----
  	PgStat_Counter n_block_write_time;
  
  	TimestampTz stat_reset_timestamp;
+ 	TimestampTz stats_timestamp;		/* time of db stats file update */
  
  	/*
  	 * tables and functions must be last in the struct, because we don't write
***************
*** 722,727 **** extern bool pgstat_track_activities;
--- 724,730 ----
  extern bool pgstat_track_counts;
  extern int	pgstat_track_functions;
  extern PGDLLIMPORT int pgstat_track_activity_query_size;
+ extern char *pgstat_stat_directory;
  extern char *pgstat_stat_tmpname;
  extern char *pgstat_stat_filename;
  
#63Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#62)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 18.2.2013 16:50, Alvaro Herrera wrote:

Tomas Vondra wrote:

So, here's v10 of the patch (based on the v9+v9a), that implements the
approach described above.

It turned out to be much easier than I expected (basically just a
rewrite of the pgstat_read_db_statsfile_timestamp() function).

Thanks. I'm giving this another look now. I think the new code means
we no longer need the first_write logic; just let the collector idle
until we get the first request. (If for some reason we considered that
we should really be publishing initial stats as early as possible, we
could just do a write_statsfiles(allDbs) call before entering the main
loop. But I don't see any reason to do this. If you do, please speak
up.)

Also, it seems to me that the new pgstat_db_requested() logic is
slightly bogus (in the "inefficient" sense, not the "incorrect" sense):
we should be comparing the timestamp of the request vs. what's already
on disk instead of blindly returning true if the list is nonempty. If
the request is older than the file, we don't need to write anything and
can discard the request. For example, suppose that backend A sends a
request for a DB; we write the file. If then quickly backend B also
requests stats for the same DB, with the current logic we'd go write the
file, but perhaps backend B would be fine with the file we already
wrote.

Hmmm, you're probably right.

Another point is that I think there's a better way to handle nonexistent
files, instead of having to read the global file and all the DB records
to find the one we want. Just try to read the database file, and only
if that fails try to read the global file and compare the timestamp (so
there might be two timestamps for each DB, one in the global file and
one in the DB-specific file. I don't think this is a problem). The
point is to avoid having to read the global file if possible.

I don't think that's a good idea. Keeping the timestamps at one place is
a significant simplification, and I don't think it's worth the
additional complexity. And the overhead is minimal.

So my vote on this change is -1.

So here's v11. I intend to commit this shortly. (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)

;-)

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#64Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#63)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra wrote:

On 18.2.2013 16:50, Alvaro Herrera wrote:

Also, it seems to me that the new pgstat_db_requested() logic is
slightly bogus (in the "inefficient" sense, not the "incorrect" sense):
we should be comparing the timestamp of the request vs. what's already
on disk instead of blindly returning true if the list is nonempty. If
the request is older than the file, we don't need to write anything and
can discard the request. For example, suppose that backend A sends a
request for a DB; we write the file. If then quickly backend B also
requests stats for the same DB, with the current logic we'd go write the
file, but perhaps backend B would be fine with the file we already
wrote.

Hmmm, you're probably right.

I left it as is for now; I think it warrants revisiting.
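
For readers following along, the check Alvaro describes could look roughly like the sketch below. It is not code from the patch: `WriteRequest`, `request_needs_write()` and the plain integer `Timestamp` are hypothetical stand-ins for `DBWriteRequest`, the real request handling and `TimestampTz`. The point is only that a request whose cutoff is already satisfied by the file on disk can be discarded without a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TimestampTz: microseconds since some epoch. */
typedef int64_t Timestamp;

/* Hypothetical, simplified per-database write request. */
typedef struct WriteRequest
{
    unsigned int databaseid;    /* database the backend asked about */
    Timestamp    cutoff_time;   /* oldest file timestamp the backend accepts */
} WriteRequest;

/*
 * Return true only when the file already on disk is older than the
 * requester's cutoff, i.e. the request cannot be served without a write.
 * A file stamped exactly at the cutoff is treated as fresh enough.
 */
bool
request_needs_write(const WriteRequest *req, Timestamp file_timestamp)
{
    assert(req != NULL);
    return file_timestamp < req->cutoff_time;
}
```

In the backend A / backend B example above, B's request would return false here as long as the file written for A is at least as new as B's cutoff.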

Another point is that I think there's a better way to handle nonexistent
files, instead of having to read the global file and all the DB records
to find the one we want. Just try to read the database file, and only
if that fails try to read the global file and compare the timestamp (so
there might be two timestamps for each DB, one in the global file and
one in the DB-specific file. I don't think this is a problem). The
point is to avoid having to read the global file if possible.

I don't think that's a good idea. Keeping the timestamps at one place is
a significant simplification, and I don't think it's worth the
additional complexity. And the overhead is minimal.

So my vote on this change is -1.

Fair enough.

I have pushed it now. Further testing, of course, is always welcome.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#65Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#64)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Alvaro Herrera wrote:

I have pushed it now. Further testing, of course, is always welcome.

Mastodon failed:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#66Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#65)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 19.02.2013 05:46, Alvaro Herrera wrote:

Alvaro Herrera wrote:

I have pushed it now. Further testing, of course, is always
welcome.

Mastodon failed:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.

Hmmm, interesting. A single Windows machine, while the other Windows
machines seem to work fine (although some of them were not built for a
few weeks).

I'll look into that, but I have no clue why this might happen. Except
maybe for some unexpected timing issue or something ...

Tomas

#67Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#66)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra <tv@fuzzy.cz> writes:

On 19.02.2013 05:46, Alvaro Herrera wrote:

Mastodon failed:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.

Hmmm, interesting. A single Windows machine, while the other Windows
machines seem to work fine (although some of them were not built for a
few weeks).

Could be random chance --- we've seen the same failure before, eg

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00

regards, tom lane

#68Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#67)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 19.02.2013 11:27, Tom Lane wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

On 19.02.2013 05:46, Alvaro Herrera wrote:

Mastodon failed:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.

Hmmm, interesting. A single Windows machine, while the other Windows
machines seem to work fine (although some of them were not built for a
few weeks).

Could be random chance --- we've seen the same failure before, eg

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00

Maybe. But why does random chance happen to me only with regression
tests and not with the lottery, as it does for normal people?

Tomas

#69Tomas Vondra
tv@fuzzy.cz
In reply to: Tom Lane (#67)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 19.2.2013 11:27, Tom Lane wrote:

Tomas Vondra <tv@fuzzy.cz> writes:

On 19.02.2013 05:46, Alvaro Herrera wrote:

Mastodon failed:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2013-02-19%2000%3A00%3A01

probably worth investigating a bit; we might have broken something.

Hmmm, interesting. A single Windows machine, while the other Windows
machines seem to work fine (although some of them were not built for a
few weeks).

Could be random chance --- we've seen the same failure before, eg

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mastodon&dt=2012-11-25%2006%3A00%3A00

regards, tom lane

I'm looking at that test, and I'm not really sure about a few details.

First, this function seems pretty useless to me:

=======================================================================
create function wait_for_stats() returns void as $$
declare
  start_time timestamptz := clock_timestamp();
  updated bool;
begin
  -- we don't want to wait forever; loop will exit after 30 seconds
  for i in 1 .. 300 loop

    -- check to see if indexscan has been sensed
    SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated
      FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
     WHERE st.relname='tenk2' AND cl.relname='tenk2';

    exit when updated;

    -- wait a little
    perform pg_sleep(0.1);

    -- reset stats snapshot so we can test again
    perform pg_stat_clear_snapshot();

  end loop;

  -- report time waited in postmaster log (where it won't change test output)
  raise log 'wait_for_stats delayed % seconds',
    extract(epoch from clock_timestamp() - start_time);
end
$$ language plpgsql;
=======================================================================

AFAIK the stats remain the same within a transaction, and as a function
runs within a transaction, it will either get new data on the first
iteration, or it will run all 300 of them. I've checked several
buildfarm members and I'm yet to see a single duration between 12ms and
30sec.

So IMHO we can replace the function call with pg_sleep(30) and we'll get
about the same effect.

But this obviously does not answer the question why it failed, although
on both occasions there's this log message:

[50b1b7fa.0568:14] LOG: wait_for_stats delayed 34.75 seconds

which essentially means the stats were not updated before the call to
wait_for_stats().

Anyway, there are these two failing queries:

=======================================================================
-- check effects
SELECT st.seq_scan >= pr.seq_scan + 1,
st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
st.idx_scan >= pr.idx_scan + 1,
st.idx_tup_fetch >= pr.idx_tup_fetch + 1
FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
WHERE st.relname='tenk2' AND cl.relname='tenk2';
?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
t | t | t | t
(1 row)

SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
WHERE st.relname='tenk2' AND cl.relname='tenk2';
?column? | ?column?
----------+----------
t | t
(1 row)
=======================================================================

The first one returns all false values; the second one returns either (t,f)
or (f,f) in the two failures posted by Alvaro and Tom Lane earlier today.

I'm really wondering how that could happen. The only thing that I can
think of is some strange timing issue, causing lost requests to write
the stats or maybe some of the stats updates. Hmmm, IIRC the stats are
sent over UDP - couldn't that be related?

Tomas

#70Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#69)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra wrote:

AFAIK the stats remain the same within a transaction, and as a function
runs within a transaction, it will either get new data on the first
iteration, or it will run all 300 of them. I've checked several
buildfarm members and I'm yet to see a single duration between 12ms and
30sec.

No, there's a call to pg_stat_clear_snapshot() that takes care of that.

I'm really wondering how that could happen. The only thing that I can
think of is some strange timing issue, causing lost requests to write
the stats or maybe some of the stats updates. Hmmm, IIRC the stats are
sent over UDP - couldn't that be related?

yes, UDP packet drops can certainly happen. This is considered a
feature (do not cause backends to block when the network socket to stat
collector is swamped; it's better to lose some stat messages instead).
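
The "drop rather than block" behaviour can be illustrated with a small stand-alone sketch. It is an illustration only, not pgstat.c code: it uses a non-blocking AF_UNIX datagram socket as a stand-in for the collector's UDP socket, and `set_nonblocking()`, `send_or_drop()` and `SendTally` are hypothetical names. The sender never waits; any message the kernel refuses to queue is simply counted as lost.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tally of what happened to the messages we tried to send. */
typedef struct SendTally
{
    int sent;       /* messages the kernel accepted */
    int dropped;    /* messages given up on instead of waiting */
} SendTally;

/* Put a descriptor into non-blocking mode; returns 0 on success. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Try to send 'attempts' fixed-size datagrams.  Any send() that does not
 * complete immediately (for example because the receive queue is full)
 * is counted as dropped; we never retry and never sleep.
 */
SendTally
send_or_drop(int fd, int attempts)
{
    SendTally tally = {0, 0};
    char      payload[512];

    assert(attempts >= 0);
    memset(payload, 'x', sizeof(payload));

    for (int i = 0; i < attempts; i++)
    {
        ssize_t rc = send(fd, payload, sizeof(payload), 0);

        if (rc == (ssize_t) sizeof(payload))
            tally.sent++;
        else
            tally.dropped++;
    }
    return tally;
}
```

If the receiving end is not reading, the dropped counter grows once the queue fills, while the caller continues at full speed. With real UDP the loss may instead happen silently on the receiving side, which is harder to observe from the sender.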

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#71Tomas Vondra
tv@fuzzy.cz
In reply to: Alvaro Herrera (#70)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 19.2.2013 23:31, Alvaro Herrera wrote:

Tomas Vondra wrote:

AFAIK the stats remain the same within a transaction, and as a
function runs within a transaction, it will either get new data on
the first iteration, or it will run all 300 of them. I've checked
several buildfarm members and I'm yet to see a single duration
between 12ms and 30sec.

No, there's a call to pg_stat_clear_snapshot() that takes care of
that.

Aha! Missed that for some reason. Thanks.

I'm really wondering how that could happen. The only thing that I
can think of is some strange timing issue, causing lost requests to
write the stats or maybe some of the stats updates. Hmmm, IIRC the
stats are sent over UDP - couldn't that be related?

yes, UDP packet drops can certainly happen. This is considered a
feature (do not cause backends to block when the network socket to
stat collector is swamped; it's better to lose some stat messages
instead).

Is there anything we could add to the test to identify this? Something
that either shows "stats were sent" and "stats arrived" (maybe in the
log only), or that some UDP packets were dropped?

Tomas

#72Jeff Janes
jeff.janes@gmail.com
In reply to: Alvaro Herrera (#62)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Mon, Feb 18, 2013 at 7:50 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

So here's v11. I intend to commit this shortly. (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)

On Windows with Mingw I get this:

pgstat.c:4389:8: warning: variable 'found' set but not used
[-Wunused-but-set-variable]

I don't get that on Linux, but I bet that is just the gcc version
(4.6.2 vs 4.4.6) rather than the OS. It looks like it is just a
useless variable, rather than any possible cause of the Windows "make
check" failure (which I can't reproduce).

Cheers,

Jeff

#73Kevin Grittner
kgrittn@ymail.com
In reply to: Jeff Janes (#72)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Jeff Janes <jeff.janes@gmail.com> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

So here's v11.  I intend to commit this shortly.  (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)

On Windows with Mingw I get this:

pgstat.c:4389:8: warning: variable 'found' set but not used
[-Wunused-but-set-variable]

I don't get that on Linux, but I bet that is just the gcc version
(4.6.2 vs 4.4.6) rather than the OS.

I get it on Linux with gcc version 4.7.2.

It looks like it is just a useless variable

Agreed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#74Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Janes (#72)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Jeff Janes wrote:

On Mon, Feb 18, 2013 at 7:50 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

So here's v11. I intend to commit this shortly. (I wanted to get it
out before lunch, but I introduced a silly bug that took me a bit to
fix.)

On Windows with Mingw I get this:

pgstat.c:4389:8: warning: variable 'found' set but not used
[-Wunused-but-set-variable]

I don't get that on Linux, but I bet that is just the gcc version
(4.6.2 vs 4.4.6) rather than the OS. It looks like it is just a
useless variable, rather than any possible cause of the Windows "make
check" failure (which I can't reproduce).

Hm, I remember looking at that code and thinking that the "return" there
might not be the best idea because it'd miss running the code that
checks for clock skew; and so the "found" was necessary because the
return was to be taken out. But on second thought, a database for which the
loop terminates early has already run the clock-skew detection code
recently, so that's probably not worth worrying about.

IOW I will just remove that variable. Thanks for the notice.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#75Noah Misch
noah@leadboat.com
In reply to: Alvaro Herrera (#64)
1 attachment(s)
Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:

I have pushed it now. Further testing, of course, is always welcome.

While investigating stats.sql buildfarm failures, mostly on animals axolotl
and shearwater, I found that this patch (commit 187492b) inadvertently removed
the collector's ability to coalesce inquiries. Every PGSTAT_MTYPE_INQUIRY
received now causes one stats file write. Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
	last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries waiting
in its socket receive buffer, it would ignore them as being too old once it
finished the write and resumed message processing. Commit 187492b converted
last_statrequest to a "last_statrequests" list that we wipe after each write.
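
The pre-commit coalescing described above can be modelled in a few lines. This is a simplified sketch, not the original source: `Collector`, `recv_inquiry()` and `write_if_requested()` are hypothetical names, timestamps are plain integers, and the rule that a write happens only when `last_statrequest` is newer than `last_statwrite` is an assumption about the old main loop rather than something quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t Timestamp;      /* hypothetical: microseconds */

/* Hypothetical model of the collector's two global timestamps. */
typedef struct Collector
{
    Timestamp last_statrequest; /* newest inquiry time seen so far */
    Timestamp last_statwrite;   /* timestamp of the last file written */
    int       writes;           /* number of stats file writes performed */
} Collector;

/* Mirrors the quoted pgstat_recv_inquiry() logic: keep the newest request. */
void
recv_inquiry(Collector *c, Timestamp inquiry_time)
{
    assert(c != NULL);
    if (inquiry_time > c->last_statrequest)
        c->last_statrequest = inquiry_time;
}

/*
 * Write only if some request is newer than the last write (assumed rule).
 * 'now' plays the role of GetCurrentTimestamp() taken at the start of the
 * write; afterwards both timestamps are set to it, as in the quoted code.
 */
bool
write_if_requested(Collector *c, Timestamp now)
{
    assert(c != NULL);
    if (c->last_statrequest <= c->last_statwrite)
        return false;           /* every known request is already satisfied */

    c->last_statwrite = now;
    c->last_statrequest = c->last_statwrite;
    c->writes++;
    return true;
}
```

Inquiries that were sent before the write started carry a time earlier than `last_statwrite`, so `recv_inquiry()` leaves the state unchanged and no second write follows. Wiping the request list after each write loses exactly this comparison.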

I modeled a machine with slow stats writes using the attached diagnostic patch
(not for commit). It has pgstat_write_statsfiles() sleep just before renaming
the temporary file, and it logs each stats message received. A three second
delay causes stats.sql to fail the way it did on shearwater[1] and on
axolotl[2]. Inquiries accumulate until the socket receive buffer overflows,
at which point the socket drops stats messages whose effects we later test
for. The 3s delay makes us drop some 85% of installcheck stats messages.
Logs from a single VACUUM show receipt of five inquiries ("stats 1:") with 3s
between them:

24239 2015-12-10 04:21:03.439 GMT LOG: connection authorized: user=nmisch database=test
24236 2015-12-10 04:21:03.454 GMT LOG: stats 2: 1888 + 936 = 2824
24236 2015-12-10 04:21:03.454 GMT LOG: stats 2: 2824 + 376 = 3200
24236 2015-12-10 04:21:03.454 GMT LOG: stats 2: 3200 + 824 = 4024
24239 2015-12-10 04:21:03.455 GMT LOG: statement: vacuum pg_class
24236 2015-12-10 04:21:03.455 GMT LOG: stats 1: 4024 + 32 = 4056
24236 2015-12-10 04:21:06.458 GMT LOG: stats 12: 4056 + 88 = 4144
24236 2015-12-10 04:21:06.458 GMT LOG: stats 1: 4144 + 32 = 4176
24239 2015-12-10 04:21:06.463 GMT LOG: disconnection: session time: 0:00:03.025 user=nmisch database=test host=[local]
24236 2015-12-10 04:21:09.486 GMT LOG: stats 1: 4176 + 32 = 4208
24236 2015-12-10 04:21:12.503 GMT LOG: stats 1: 4208 + 32 = 4240
24236 2015-12-10 04:21:15.519 GMT LOG: stats 1: 4240 + 32 = 4272
24236 2015-12-10 04:21:18.535 GMT LOG: stats 9: 4272 + 48 = 4320
24236 2015-12-10 04:21:18.535 GMT LOG: stats 2: 4320 + 936 = 5256
24236 2015-12-10 04:21:18.535 GMT LOG: stats 2: 5256 + 376 = 5632
24236 2015-12-10 04:21:18.535 GMT LOG: stats 2: 5632 + 264 = 5896
24236 2015-12-10 04:21:18.535 GMT LOG: stats 12: 5896 + 88 = 5984

As for how to fix this, the most direct translation of the old logic is to
retain last_statrequests entries that could help coalesce inquiries. I lean
toward that for an initial, back-patched fix. It would be good, though, to
process two closely-spaced, different-database inquiries in one
pgstat_write_statsfiles() call. We do one-database writes and all-databases
writes, but we never write "1 < N < all" databases despite the code prepared
to do so. I tried calling pgstat_write_statsfiles() only when the socket
receive buffer empties. That's dead simple to implement and aggressively
coalesces inquiries (even a 45s sleep did not break stats.sql), but it starves
inquirers if the socket receive buffer stays full persistently. Ideally, I'd
want to process inquiries when the buffer empties _or_ when the oldest inquiry
is X seconds old. I don't have a more-specific design in mind, though.
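
One possible reading of the "buffer empties or oldest inquiry is X seconds old" rule is sketched below. No specific design is given above, so everything here is an assumption: `should_write_now()` and its parameters are hypothetical, and how the collector would learn that its receive buffer is empty is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t Timestamp;      /* hypothetical: microseconds */

/*
 * Decide whether the collector should write stats files now.
 *
 * have_pending       at least one inquiry is waiting to be served
 * recv_buffer_empty  no more messages are queued on the socket
 * oldest_inquiry     arrival time of the oldest unserved inquiry
 * now                current time
 * max_wait           the "X seconds" bound, in microseconds
 */
bool
should_write_now(bool have_pending, bool recv_buffer_empty,
                 Timestamp oldest_inquiry, Timestamp now, Timestamp max_wait)
{
    assert(max_wait >= 0);

    if (!have_pending)
        return false;           /* nobody asked, nothing to do */
    if (recv_buffer_empty)
        return true;            /* socket drained: coalesce and write */

    /* Buffer still busy: write anyway once the oldest inquirer waited long. */
    return (now - oldest_inquiry) >= max_wait;
}
```

The first branch gives the aggressive coalescing of the "write only when the buffer empties" experiment; the age check is what would keep inquirers from starving when the buffer never drains.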

Thanks,
nm

[1]: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=shearwater&dt=2015-09-23%2002%3A08%3A31
[2]: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-08-04%2019%3A31%3A22

Attachments:

stat-coalesce-v1.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ab018c4..e6f04b0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3314,6 +3314,7 @@ pgstat_send_bgwriter(void)
 NON_EXEC_STATIC void
 PgstatCollectorMain(int argc, char *argv[])
 {
+	unsigned total = 0;
 	int			len;
 	PgStat_Msg	msg;
 	int			wr;
@@ -3425,6 +3426,10 @@ PgstatCollectorMain(int argc, char *argv[])
 						 errmsg("could not read statistics message: %m")));
 			}
 
+			elog(LOG, "stats %d: %u + %u = %u",
+				 msg.msg_hdr.m_type, total, len, total + len);
+			total += len;
+
 			/*
 			 * We ignore messages that are smaller than our common header
 			 */
@@ -3817,6 +3822,13 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	 */
 	fputc('E', fpout);
 
+	if (1)
+	{
+		PG_SETMASK(&BlockSig);
+		pg_usleep(3 * 1000000L);
+		PG_SETMASK(&UnBlockSig);
+	}
+
 	if (ferror(fpout))
 	{
 		ereport(LOG,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index f5be70f..b042062 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -47,6 +47,9 @@ begin
       FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
      WHERE st.relname='tenk2' AND cl.relname='tenk2';
 
+    raise log 'stats updated as of % snapshot: %',
+      pg_stat_get_snapshot_timestamp(), updated;
+
     exit when updated;
 
     -- wait a little
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index cd2d592..e87454d 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -42,6 +42,9 @@ begin
       FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
      WHERE st.relname='tenk2' AND cl.relname='tenk2';
 
+    raise log 'stats updated as of % snapshot: %',
+      pg_stat_get_snapshot_timestamp(), updated;
+
     exit when updated;
 
     -- wait a little
#76Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Noah Misch (#75)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

On 12/22/2015 03:49 PM, Noah Misch wrote:

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:

I have pushed it now. Further testing, of course, is always welcome.

While investigating stats.sql buildfarm failures, mostly on animals axolotl
and shearwater, I found that this patch (commit 187492b) inadvertently removed
the collector's ability to coalesce inquiries. Every PGSTAT_MTYPE_INQUIRY
received now causes one stats file write. Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries waiting
in its socket receive buffer, it would ignore them as being too old once it
finished the write and resumed message processing. Commit 187492b converted
last_statrequest to a "last_statrequests" list that we wipe after each write.

So essentially we remove the list of requests, and thus on the next
round we don't know the timestamp of the last request and write the file
again unnecessarily. Do I get that right?

What if we instead kept the list but marked the requests as 'invalid' so
that we know the timestamp? In that case we'd be able to do pretty much
exactly what the original code did (but at per-db granularity).

We'd have to clean up the list once in a while so it does not grow excessively
large, but something like removing entries older than
PGSTAT_STAT_INTERVAL should be enough.

Actually, I think that was the idea when I wrote the patch, but
apparently I got distracted and it did not make it into the code.

I modeled a machine with slow stats writes using the attached diagnostic patch
(not for commit). It has pgstat_write_statsfiles() sleep just before renaming
the temporary file, and it logs each stats message received. A three second
delay causes stats.sql to fail the way it did on shearwater[1] and on
axolotl[2]. Inquiries accumulate until the socket receive buffer overflows,
at which point the socket drops stats messages whose effects we later test
for. The 3s delay makes us drop some 85% of installcheck stats messages.
Logs from a single VACUUM show receipt of five inquiries ("stats 1:") with 3s
between them:

24239 2015-12-10 04:21:03.439 GMT LOG: connection authorized: user=nmisch database=test
24236 2015-12-10 04:21:03.454 GMT LOG: stats 2: 1888 + 936 = 2824
24236 2015-12-10 04:21:03.454 GMT LOG: stats 2: 2824 + 376 = 3200
24236 2015-12-10 04:21:03.454 GMT LOG: stats 2: 3200 + 824 = 4024
24239 2015-12-10 04:21:03.455 GMT LOG: statement: vacuum pg_class
24236 2015-12-10 04:21:03.455 GMT LOG: stats 1: 4024 + 32 = 4056
24236 2015-12-10 04:21:06.458 GMT LOG: stats 12: 4056 + 88 = 4144
24236 2015-12-10 04:21:06.458 GMT LOG: stats 1: 4144 + 32 = 4176
24239 2015-12-10 04:21:06.463 GMT LOG: disconnection: session time: 0:00:03.025 user=nmisch database=test host=[local]
24236 2015-12-10 04:21:09.486 GMT LOG: stats 1: 4176 + 32 = 4208
24236 2015-12-10 04:21:12.503 GMT LOG: stats 1: 4208 + 32 = 4240
24236 2015-12-10 04:21:15.519 GMT LOG: stats 1: 4240 + 32 = 4272
24236 2015-12-10 04:21:18.535 GMT LOG: stats 9: 4272 + 48 = 4320
24236 2015-12-10 04:21:18.535 GMT LOG: stats 2: 4320 + 936 = 5256
24236 2015-12-10 04:21:18.535 GMT LOG: stats 2: 5256 + 376 = 5632
24236 2015-12-10 04:21:18.535 GMT LOG: stats 2: 5632 + 264 = 5896
24236 2015-12-10 04:21:18.535 GMT LOG: stats 12: 5896 + 88 = 5984

As for how to fix this, the most direct translation of the old logic is to
retain last_statrequests entries that could help coalesce inquiries. I lean
toward that for an initial, back-patched fix.

That seems reasonable and I believe it's pretty much the idea I came up
with above, right? Depending on how you define "entries that could help
coalesce inquiries".

It would be good, though, to
process two closely-spaced, different-database inquiries in one
pgstat_write_statsfiles() call. We do one-database writes and all-databases
writes, but we never write "1 < N < all" databases despite the code prepared
to do so. I tried calling pgstat_write_statsfiles() only when the socket
receive buffer empties. That's dead simple to implement and aggressively
coalesces inquiries (even a 45s sleep did not break stats.sql), but it starves
inquirers if the socket receive buffer stays full persistently. Ideally, I'd
want to process inquiries when the buffer empties _or_ when the oldest inquiry
is X seconds old. I don't have a more-specific design in mind, though.

That's a nice idea, but I agree that binding the coalescing to the buffer
like this seems like a pretty bad idea, precisely because of the starvation.
What might work, though, is to look at how much data is in the buffer,
process only those requests, and then write the stats files.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#77Noah Misch
noah@leadboat.com
In reply to: Tomas Vondra (#76)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:

I have pushed it now. Further testing, of course, is always welcome.

While investigating stats.sql buildfarm failures, mostly on animals axolotl
and shearwater, I found that this patch (commit 187492b) inadvertently removed
the collector's ability to coalesce inquiries. Every PGSTAT_MTYPE_INQUIRY
received now causes one stats file write. Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries waiting
in its socket receive buffer, it would ignore them as being too old once it
finished the write and resumed message processing. Commit 187492b converted
last_statrequest to a "last_statrequests" list that we wipe after each write.

So essentially we remove the list of requests, and thus on the next round we
don't know the timestamp of the last request and write the file again
unnecessarily. Do I get that right?

Essentially right. Specifically, for each database, we must remember the
globalStats.stats_timestamp of the most recent write. It could be okay to
forget the last request timestamp. (I now doubt I picked the best lines to
quote, above.)

What if we instead kept the list but marked the requests as 'invalid' so
that we know the timestamp? In that case we'd be able to do pretty much
exactly what the original code did (but at per-db granularity).

The most natural translation of the old code would be to add a write_time
field to struct DBWriteRequest. One can infer "invalid" from write_time and
request_time. There are other reasonable designs, though.

We'd have to clean up the list once in a while so it does not grow excessively
large, but something like removing entries older than PGSTAT_STAT_INTERVAL
should be enough.

Specifically, if you assume the socket delivers messages in the order sent,
you may as well discard entries having write_time at least
PGSTAT_STAT_INTERVAL older than the most recent cutoff_time seen in a
PgStat_MsgInquiry. That delivery order assumption does not hold in general,
but I expect it's close enough for this purpose.
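
The discard rule can be sketched as an in-place compaction. This is illustrative only: `WriteRecord` and `prune_write_records` are hypothetical names, an array stands in for the slist, and `STAT_INTERVAL_MS` mirrors the 500 ms PGSTAT_STAT_INTERVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STAT_INTERVAL_MS 500    /* mirrors PGSTAT_STAT_INTERVAL */

typedef int64_t TimestampMs;    /* milliseconds; stand-in for TimestampTz */

typedef struct WriteRecord
{
    uint32_t    databaseid;     /* database the record belongs to */
    TimestampMs write_time;     /* timestamp of the last file written */
} WriteRecord;

/*
 * Compact recs[0..n) in place, dropping every record whose write_time is at
 * least STAT_INTERVAL_MS older than newest_cutoff (the most recent
 * cutoff_time seen in an inquiry).  Returns the number of records kept.
 */
static size_t
prune_write_records(WriteRecord *recs, size_t n, TimestampMs newest_cutoff)
{
    size_t kept = 0;

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].write_time + STAT_INTERVAL_MS > newest_cutoff)
            recs[kept++] = recs[i];
    }
    return kept;
}
```

If messages arrive out of order, a record may be dropped slightly too early; the cost is one redundant write for that database, not incorrect results.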

As for how to fix this, the most direct translation of the old logic is to
retain last_statrequests entries that could help coalesce inquiries. I lean
toward that for an initial, back-patched fix.

That seems reasonable and I believe it's pretty much the idea I came up with
above, right? Depending on how you define "entries that could help coalesce
inquiries".

Yes.


#78Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Noah Misch (#77)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 02/03/2016 06:46 AM, Noah Misch wrote:

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

...

If the collector entered pgstat_write_statsfile() with more
inquiries waiting in its socket receive buffer, it would ignore
them as being too old once it finished the write and resumed
message processing. Commit 187492b converted last_statrequest to
a "last_statrequests" list that we wipe after each write.

So essentially we remove the list of requests, and thus on the next
round we don't know the timestamp of the last request and write the
file again unnecessarily. Do I get that right?

Essentially right. Specifically, for each database, we must remember
the globalStats.stats_timestamp of the most recent write. It could be
okay to forget the last request timestamp. (I now doubt I picked the
best lines to quote, above.)

What if we instead kept the list but marked the requests as
'invalid' so that we know the timestamp? In that case we'd be able
to do pretty much exactly what the original code did (but at per-db
granularity).

The most natural translation of the old code would be to add a
write_time field to struct DBWriteRequest. One can infer "invalid"
from write_time and request_time. There are other reasonable designs,
though.

OK, makes sense. I'll look into that.

We'd have to clean up the list once in a while so it does not grow
excessively large, but something like removing entries older than
PGSTAT_STAT_INTERVAL should be enough.

Specifically, if you assume the socket delivers messages in the order
sent, you may as well discard entries having write_time at least
PGSTAT_STAT_INTERVAL older than the most recent cutoff_time seen in a
PgStat_MsgInquiry. That delivery order assumption does not hold in
general, but I expect it's close enough for this purpose.

Agreed. If I get that right, it might result in some false negatives (in
the sense that we'll remove a record too early, forcing us to write the
database file again). But I expect that to be a rare case.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#79Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Noah Misch (#77)
1 attachment(s)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

On 02/03/2016 06:46 AM, Noah Misch wrote:

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:

I have pushed it now. Further testing, of course, is always welcome.

While investigating stats.sql buildfarm failures, mostly on animals
axolotl and shearwater, I found that this patch (commit 187492b)
inadvertently removed the collector's ability to coalesce inquiries.
Every PGSTAT_MTYPE_INQUIRY received now causes one stats file write.
Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries
waiting in its socket receive buffer, it would ignore them as being too
old once it finished the write and resumed message processing. Commit
187492b converted last_statrequest to a "last_statrequests" list that we
wipe after each write.

So I've been looking at this today, and I think the attached patch should do
the trick. I can't really verify it, because I've been unable to reproduce the
non-coalescing - I presume it requires a much slower system (axolotl is an RPi,
not sure about shearwater).

The patch simply checks DBEntry.stats_timestamp in pgstat_recv_inquiry() and
ignores requests that are already resolved by the last write (maybe this should
accept a file written up to PGSTAT_STAT_INTERVAL in the past).

The required field is already in DBEntry (otherwise it'd be impossible to
determine if backends need to send inquiries or not), so this is a pretty
trivial change. I can't think of a simpler patch.

Can you try applying the patch on a machine where the problem is reproducible?
I might have some RPi machines lying around (for other purposes).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

pgstat-coalesce-v1.patch (binary/octet-stream)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index da768c6..e3d4e17 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4706,6 +4706,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	}
 
 	/*
+	 * Ignore requests that are already resolved by the last write.
+	 *
+	 * We discard the queue of requests at the end of pgstat_write_statsfiles(),
+	 * so the requests already waiting on the UDP socket at that moment can't
+	 * be discarded in the previous loop.
+	 *
+	 * XXX Maybe this should also care about the clock skew, just like the
+	 *     block a few lines down.
+	 */
+	dbentry = pgstat_get_db_entry(msg->databaseid, false);
+	if ((dbentry != NULL) && (msg->cutoff_time > dbentry->stats_timestamp))
+		return;
+
+	/*
 	 * There's no request for this DB yet, so create one.
 	 */
 	newreq = palloc(sizeof(DBWriteRequest));
@@ -4722,7 +4736,6 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	 * retreat in the system clock reading could otherwise cause us to neglect
 	 * to update the stats file for a long time.
 	 */
-	dbentry = pgstat_get_db_entry(msg->databaseid, false);
 	if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
 	{
 		TimestampTz cur_ts = GetCurrentTimestamp();
#80Noah Misch
noah@leadboat.com
In reply to: Tomas Vondra (#79)
1 attachment(s)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Thu, Mar 03, 2016 at 06:08:07AM +0100, Tomas Vondra wrote:

On 02/03/2016 06:46 AM, Noah Misch wrote:

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:

I have pushed it now. Further testing, of course, is always welcome.

While investigating stats.sql buildfarm failures, mostly on animals
axolotl and shearwater, I found that this patch (commit 187492b)
inadvertently removed the collector's ability to coalesce inquiries.
Every PGSTAT_MTYPE_INQUIRY received now causes one stats file write.
Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries
waiting in its socket receive buffer, it would ignore them as being too
old once it finished the write and resumed message processing. Commit
187492b converted last_statrequest to a "last_statrequests" list that we
wipe after each write.

So I've been looking at this today, and I think the attached patch should do
the trick. I can't really verify it, because I've been unable to reproduce the
non-coalescing - I presume it requires a much slower system (axolotl is an RPi,
not sure about shearwater).

The patch simply checks DBEntry.stats_timestamp in pgstat_recv_inquiry() and
ignores requests that are already resolved by the last write (maybe this
should accept a file written up to PGSTAT_STAT_INTERVAL in the past).

The required field is already in DBEntry (otherwise it'd be impossible to
determine if backends need to send inquiries or not), so this is a pretty
trivial change. I can't think of a simpler patch.

Can you try applying the patch on a machine where the problem is
reproducible? I might have some RPi machines lying around (for other
purposes).

I've not attempted to study the behavior on slow hardware. Instead, my report
used stat-coalesce-v1.patch[1] to simulate slow writes. (That diagnostic
patch no longer applies cleanly, so I'm attaching a rebased version. I've
changed the patch name from "stat-coalesce" to "slow-stat-simulate" to
more-clearly distinguish it from the "pgstat-coalesce" patch.) Problems
remain after applying your patch; consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
VACUUM returns in 10s, stats collector writes no files over 10s

[1]: /messages/by-id/20151222144950.GA2553834@tornado.leadboat.com

Attachments:

slow-stat-simulate-v2.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 14afef6..4308df2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3444,6 +3444,7 @@ pgstat_send_bgwriter(void)
 NON_EXEC_STATIC void
 PgstatCollectorMain(int argc, char *argv[])
 {
+	unsigned total = 0;
 	int			len;
 	PgStat_Msg	msg;
 	int			wr;
@@ -3555,6 +3556,10 @@ PgstatCollectorMain(int argc, char *argv[])
 						 errmsg("could not read statistics message: %m")));
 			}
 
+			elog(LOG, "stats %d: %u + %u = %u",
+				 msg.msg_hdr.m_type, total, len, total + len);
+			total += len;
+
 			/*
 			 * We ignore messages that are smaller than our common header
 			 */
@@ -3947,6 +3952,13 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	 */
 	fputc('E', fpout);
 
+	if (1)
+	{
+		PG_SETMASK(&BlockSig);
+		pg_usleep(3 * 1000000L);
+		PG_SETMASK(&UnBlockSig);
+	}
+
 	if (ferror(fpout))
 	{
 		ereport(LOG,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a811265..064cf9f 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -63,6 +63,9 @@ begin
     SELECT (n_tup_ins > 0) INTO updated3
       FROM pg_stat_user_tables WHERE relname='trunc_stats_test';
 
+    raise log 'stats updated as of % snapshot: 1:% 2:% 3:%',
+      pg_stat_get_snapshot_timestamp(), updated1, updated2, updated3;
+
     exit when updated1 and updated2 and updated3;
 
     -- wait a little
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index b3e2efa..d252124 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -58,6 +58,9 @@ begin
     SELECT (n_tup_ins > 0) INTO updated3
       FROM pg_stat_user_tables WHERE relname='trunc_stats_test';
 
+    raise log 'stats updated as of % snapshot: 1:% 2:% 3:%',
+      pg_stat_get_snapshot_timestamp(), updated1, updated2, updated3;
+
     exit when updated1 and updated2 and updated3;
 
     -- wait a little
#81Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Noah Misch (#80)
1 attachment(s)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:

On Thu, Mar 03, 2016 at 06:08:07AM +0100, Tomas Vondra wrote:

On 02/03/2016 06:46 AM, Noah Misch wrote:

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera
wrote:

I have pushed it now.  Further testing, of course, is
always welcome.

While investigating stats.sql buildfarm failures, mostly on
animals
axolotl and shearwater, I found that this patch (commit
187492b)
inadvertently removed the collector's ability to coalesce
inquiries.
Every PGSTAT_MTYPE_INQUIRY received now causes one stats file
write.
Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more
inquiries
waiting in its socket receive buffer, it would ignore them as
being too
old once it finished the write and resumed message
processing. Commit
187492b converted last_statrequest to a "last_statrequests"
list that we
wipe after each write.

So I've been looking at this today, and I think the attached patch
should do
the trick. I can't really verify it, because I've been unable to
reproduce the
non-coalescing - I presume it requires much slower system (axolotl
is RPi, not
sure about shearwater).

The patch simply checks DBEntry.stats_timestamp in
pgstat_recv_inquiry() and
ignores requests that are already resolved by the last write (maybe
this
should accept a file written up to PGSTAT_STAT_INTERVAL in the
past).

The required field is already in DBEntry (otherwise it'd be
impossible to
determine if backends need to send inquiries or not), so this is
pretty
trivial change. I can't think of a simpler patch.

Can you try applying the patch on a machine where the problem is
reproducible? I might have some RPi machines laying around (for
other
purposes).

I've not attempted to study the behavior on slow hardware.  Instead,
my report
used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
diagnostic
patch no longer applies cleanly, so I'm attaching a rebased
version.  I've
changed the patch name from "stat-coalesce" to "slow-stat-simulate"
to
more-clearly distinguish it from the "pgstat-coalesce"
patch.)  Problems
remain after applying your patch; consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
  VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
  VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
  VACUUM returns in 10s, stats collector writes no files over 10s

Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
the opposite direction. After fixing that "VACUUM pg_am" completes in 3
seconds and writes each file just once.
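
To see why the direction matters, the two versions of the test can be put side by side. This is a reduced sketch with integer timestamps; the function names are hypothetical and only the comparisons are taken from the v1 and v2 patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t Timestamp;      /* stand-in for TimestampTz */

/* v1 comparison: ignores the inquiry when the file is OLDER than requested. */
static bool
ignore_inquiry_v1(Timestamp cutoff_time, Timestamp stats_timestamp)
{
    return cutoff_time > stats_timestamp;
}

/* v2 comparison: ignores the inquiry when the file is already new enough. */
static bool
ignore_inquiry_v2(Timestamp cutoff_time, Timestamp stats_timestamp)
{
    return cutoff_time <= stats_timestamp;
}

/* True when the existing file does not satisfy the inquiry. */
static bool
must_write(Timestamp cutoff_time, Timestamp stats_timestamp)
{
    /*
     * The two predicates are exact complements, so v1 discarded precisely
     * the inquiries that needed a write and kept the ones that did not.
     */
    assert(ignore_inquiry_v1(cutoff_time, stats_timestamp) !=
           ignore_inquiry_v2(cutoff_time, stats_timestamp));
    return !ignore_inquiry_v2(cutoff_time, stats_timestamp);
}
```

With the v1 comparison every inquiry that actually needed fresh data was dropped, which matches the "writes no files over 10s" result reported above; the 10s is presumably the backend giving up after PGSTAT_MAX_WAIT_TIME.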

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

pgstat-coalesce-v2.patch (text/x-patch; charset=UTF-8)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 14afef6..e750d46 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	}
 
 	/*
+	 * Ignore requests that are already resolved by the last write.
+	 *
+	 * We discard the queue of requests at the end of pgstat_write_statsfiles(),
+	 * so the requests already waiting on the UDP socket at that moment can't
+	 * be discarded in the previous loop.
+	 *
+	 * XXX Maybe this should also care about the clock skew, just like the
+	 *     block a few lines down.
+	 */
+	dbentry = pgstat_get_db_entry(msg->databaseid, false);
+	if ((dbentry != NULL) && (msg->cutoff_time <= dbentry->stats_timestamp))
+		return;
+
+	/*
 	 * There's no request for this DB yet, so create one.
 	 */
 	newreq = palloc(sizeof(DBWriteRequest));
@@ -4852,7 +4866,6 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	 * retreat in the system clock reading could otherwise cause us to neglect
 	 * to update the stats file for a long time.
 	 */
-	dbentry = pgstat_get_db_entry(msg->databaseid, false);
 	if ((dbentry != NULL) && (msg->clock_time < dbentry->stats_timestamp))
 	{
 		TimestampTz cur_ts = GetCurrentTimestamp();
#82Noah Misch
noah@leadboat.com
In reply to: Tomas Vondra (#81)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

[Aside: your new mail editor is rewrapping lines in quoted material, and the
result is messy. I have rerewrapped one paragraph below.]

On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:

On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:

I've not attempted to study the behavior on slow hardware. Instead, my
report used stat-coalesce-v1.patch[1] to simulate slow writes. (That
diagnostic patch no longer applies cleanly, so I'm attaching a rebased
version. I've changed the patch name from "stat-coalesce" to
"slow-stat-simulate" to more-clearly distinguish it from the
"pgstat-coalesce" patch.) Problems remain after applying your patch;
consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
  VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
  VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
  VACUUM returns in 10s, stats collector writes no files over 10s

Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
the opposite direction. After fixing that "VACUUM pg_am" completes in 3
seconds and writes each file just once.

That fixes things. "make check" passes under an 8s stats write time.

--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
}
/*
+	 * Ignore requests that are already resolved by the last write.
+	 *
+	 * We discard the queue of requests at the end of pgstat_write_statsfiles(),
+	 * so the requests already waiting on the UDP socket at that moment can't
+	 * be discarded in the previous loop.
+	 *
+	 * XXX Maybe this should also care about the clock skew, just like the
+	 *     block a few lines down.

Yes, it should. (The problem is large (>~100s), backward clock resets, not
skew.) A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp". Such
cases need the correction a few lines down.
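
One way to order the two checks so that a backward clock reset is still caught is sketched below. It is an illustration under assumptions, not the code in pgstat.c: `classify_inquiry` and the `InquiryAction` values are hypothetical, timestamps are integers, and the sketch assumes a backend never asks for a cutoff later than its own clock reading.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t Timestamp;      /* stand-in for TimestampTz */

typedef enum InquiryAction
{
    INQUIRY_IGNORE,             /* existing file already satisfies the request */
    INQUIRY_WRITE,              /* file is too old, schedule a write */
    INQUIRY_WRITE_CLOCK_RESET   /* clock went backwards, force a write */
} InquiryAction;

/*
 * clock_time and cutoff_time come from the inquiry message, stats_timestamp
 * from the last file written, and now is the collector's current clock.
 */
static InquiryAction
classify_inquiry(Timestamp clock_time, Timestamp cutoff_time,
                 Timestamp stats_timestamp, Timestamp now)
{
    /* Assumption: the cutoff is never later than the sender's clock. */
    assert(cutoff_time <= clock_time);

    if (clock_time < stats_timestamp)
    {
        /* Either the message is old, or the clock was set back. */
        if (now < stats_timestamp)
            return INQUIRY_WRITE_CLOCK_RESET;
        return INQUIRY_IGNORE;  /* merely an old message */
    }

    /* Only now is it safe to apply the staleness test on cutoff_time. */
    if (cutoff_time <= stats_timestamp)
        return INQUIRY_IGNORE;
    return INQUIRY_WRITE;
}
```

Testing `clock_time` first matters because, after a large backward reset, `cutoff_time <= stats_timestamp` would hold for every new inquiry, and an early return on that test alone would suppress writes until the clock caught up.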

The other thing needed here is to look through and update comments about
last_statrequests. For example, this loop is dead code due to us writing
files as soon as we receive one inquiry:

/*
* Find the last write request for this DB. If it's older than the
* request's cutoff time, update it; otherwise there's nothing to do.
*
* Note that if a request is found, we return early and skip the below
* check for clock skew. This is okay, since the only way for a DB
* request to be present in the list is that we have been here since the
* last write round.
*/
slist_foreach(iter, &last_statrequests) ...

I'm okay keeping the dead code for future flexibility, but the comment should
reflect that.

Thanks,
nm


#83Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Noah Misch (#82)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

On 03/14/2016 07:14 AM, Noah Misch wrote:

[Aside: your new mail editor is rewrapping lines in quoted material, and the
result is messy. I have rerewrapped one paragraph below.]

Thanks, I've noticed that too. I've been testing Evolution in the past
few days, and apparently the line wrapping algorithm is broken. I've
switched back to Thunderbird, so hopefully that'll fix it.

On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:

On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:

I've not attempted to study the behavior on slow hardware. Instead, my
report used stat-coalesce-v1.patch[1] to simulate slow writes. (That
diagnostic patch no longer applies cleanly, so I'm attaching a rebased
version. I've changed the patch name from "stat-coalesce" to
"slow-stat-simulate" to more-clearly distinguish it from the
"pgstat-coalesce" patch.) Problems remain after applying your patch;
consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
VACUUM returns in 10s, stats collector writes no files over 10s

Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
the opposite direction. After fixing that "VACUUM pg_am" completes in 3
seconds and writes each file just once.

That fixes things. "make check" passes under an 8s stats write time.

OK, good.

--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
}
/*
+	 * Ignore requests that are already resolved by the last write.
+	 *
+	 * We discard the queue of requests at the end of pgstat_write_statsfiles(),
+	 * so the requests already waiting on the UDP socket at that moment can't
+	 * be discarded in the previous loop.
+	 *
+	 * XXX Maybe this should also care about the clock skew, just like the
+	 *     block a few lines down.

Yes, it should. (The problem is large (>~100s), backward clock resets, not
skew.) A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp". Such
cases need the correction a few lines down.

I'll look into that. I have to admit I have a hard time reasoning about
the code handling clock skew, so it might take some time, though.

The other thing needed here is to look through and update comments about
last_statrequests. For example, this loop is dead code due to us writing
files as soon as we receive one inquiry:

/*
* Find the last write request for this DB. If it's older than the
* request's cutoff time, update it; otherwise there's nothing to do.
*
* Note that if a request is found, we return early and skip the below
* check for clock skew. This is okay, since the only way for a DB
* request to be present in the list is that we have been here since the
* last write round.
*/
slist_foreach(iter, &last_statrequests) ...

I'm okay keeping the dead code for future flexibility, but the comment should
reflect that.

Yes, that's another thing that I'd like to look into. Essentially the
problem is that we always process the inquiries one by one, so we never
actually see a list with more than a single element. Correct?

I think the best way to address that is to first check how much data is
in the UDP queue, and then fetch all of it before actually doing the
writes. Peeking at the number of requests first (or even using some
reasonable hard-coded limit) should prevent starving the inquirers in
case of a steady stream of inquiries.
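
A bounded drain of that kind might look like the following. This is a sketch only: `drain_inquiries`, `TryRecvFn`, `FakeSocket` and `fake_try_recv` are hypothetical, and the fake socket merely counts queued messages so that the loop can be exercised without a real UDP socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Receives and processes one message; returns false if none is waiting. */
typedef bool (*TryRecvFn) (void *state);

/*
 * Process waiting messages until the source is empty or max_batch messages
 * have been handled, whichever comes first.  The caller writes the stats
 * files afterwards.  Returns the number of messages processed.
 */
static int
drain_inquiries(TryRecvFn try_recv, void *state, int max_batch)
{
    int processed = 0;

    assert(try_recv != NULL && max_batch > 0);
    while (processed < max_batch && try_recv(state))
        processed++;
    return processed;
}

/* Test double: a "socket" that just holds a count of queued messages. */
typedef struct FakeSocket
{
    int queued;
} FakeSocket;

static bool
fake_try_recv(void *state)
{
    FakeSocket *s = state;

    if (s->queued == 0)
        return false;
    s->queued--;
    return true;
}
```

The hard limit bounds how long a write can be postponed under a steady stream of messages, while a quiet socket still lets all closely spaced inquiries be served by a single write.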

regards
Tomas

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#84Noah Misch
noah@leadboat.com
In reply to: Tomas Vondra (#83)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Mon, Mar 14, 2016 at 01:33:08PM +0100, Tomas Vondra wrote:

On 03/14/2016 07:14 AM, Noah Misch wrote:

On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:

+	 * XXX Maybe this should also care about the clock skew, just like the
+	 *     block a few lines down.

Yes, it should. (The problem is large (>~100s), backward clock resets, not
skew.) A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp". Such
cases need the correction a few lines down.

I'll look into that. I have to admit I have a hard time reasoning about the
code handling clock skew, so it might take some time, though.

No hurry; it would be no problem to delay this several months.

The other thing needed here is to look through and update comments about
last_statrequests. For example, this loop is dead code due to us writing
files as soon as we receive one inquiry:

/*
* Find the last write request for this DB. If it's older than the
* request's cutoff time, update it; otherwise there's nothing to do.
*
* Note that if a request is found, we return early and skip the below
* check for clock skew. This is okay, since the only way for a DB
* request to be present in the list is that we have been here since the
* last write round.
*/
slist_foreach(iter, &last_statrequests) ...

I'm okay keeping the dead code for future flexibility, but the comment should
reflect that.

Yes, that's another thing that I'd like to look into. Essentially the
problem is that we always process the inquiries one by one, so we never
actually see a list with more than a single element. Correct?

Correct.

I think the best way to address that is to first check how much data is
in the UDP queue, and then fetch all of that before actually doing the
writes. Peeking at the number of requests first (or even using some
reasonable hard-coded limit) should prevent starving the inquirers in
case of a steady stream of inquiries.

Now that you mention it, a hard-coded limit sounds good: write the files for
pending inquiries whenever the socket empties or every N messages processed,
whichever comes first. I don't think the amount of pending UDP data is
portably available, and I doubt it's important. Breaking every, say, one
thousand messages will make the collector predictably responsive to inquiries,
and that is the important property.
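
A minimal sketch of the batching rule described above, written as a standalone simulation rather than as collector code. MsgKind, process_messages() and FLUSH_EVERY_N_MESSAGES are hypothetical names, and "the socket empties" is modeled as reaching the end of an in-memory array instead of polling a real UDP socket:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit, following the "say, one thousand messages" suggestion. */
#define FLUSH_EVERY_N_MESSAGES 1000

/* Hypothetical message kinds; the real collector handles many more. */
typedef enum { MSG_STATS_UPDATE, MSG_INQUIRY } MsgKind;

/*
 * Process nmsgs messages and return the number of write rounds performed.
 * A write round happens only when at least one inquiry is pending, and only
 * at a flush point: every FLUSH_EVERY_N_MESSAGES messages, or when the
 * simulated socket runs empty (end of the array).
 */
static int
process_messages(const MsgKind *msgs, size_t nmsgs)
{
    bool   inquiry_pending = false;
    size_t since_flush = 0;
    int    write_rounds = 0;

    for (size_t i = 0; i < nmsgs; i++)
    {
        bool socket_empty = (i + 1 == nmsgs);

        if (msgs[i] == MSG_INQUIRY)
            inquiry_pending = true;     /* remember it, do not write yet */
        since_flush++;

        if (socket_empty || since_flush >= FLUSH_EVERY_N_MESSAGES)
        {
            if (inquiry_pending)
            {
                write_rounds++;         /* stand-in for writing stats files */
                inquiry_pending = false;
            }
            since_flush = 0;
        }
    }
    return write_rounds;
}
```

In this model, 2500 back-to-back inquiries cause three write rounds (after messages 1000 and 2000, and once more when the input runs out), so no inquirer waits behind more than FLUSH_EVERY_N_MESSAGES messages.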

I would lean toward making this part 9.7-only; it would be a distinct patch
from the one previously under discussion.

Thanks,
nm


#85Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Noah Misch (#84)
1 attachment(s)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

On 03/15/2016 03:04 AM, Noah Misch wrote:

On Mon, Mar 14, 2016 at 01:33:08PM +0100, Tomas Vondra wrote:

On 03/14/2016 07:14 AM, Noah Misch wrote:

On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:

+	 * XXX Maybe this should also care about the clock skew, just like the
+	 *     block a few lines down.

Yes, it should. (The problem is large (>~100s), backward clock resets, not
skew.) A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp". Such
cases need the correction a few lines down.

I'll look into that. I have to admit I have a hard time reasoning about the
code handling clock skew, so it might take some time, though.

No hurry; it would be no problem to delay this several months.

Attached is a patch that should fix the coalescing, including the clock
skew detection. In the end I reorganized the code a bit, moving the
check to the end, after the clock skew detection. Otherwise I'd have to
do the clock skew detection in multiple places, and that seemed ugly.
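
The resulting order of checks (clock reset first, then "is this request already satisfied?") can be modeled in isolation. This is only a sketch under assumptions: inquiry_needs_write() and its parameters are hypothetical, the initial scan of the existing request list is left out, and a boolean flag replaces the patch's comparison of request_time against clock_time:

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* stand-in for PostgreSQL's timestamp type */

/*
 * Decide whether an inquiry should produce a new write request.
 *
 *   have_dbentry    - whether a stats entry exists for the database
 *   stats_timestamp - time of the last stats file write for that database;
 *                     pulled back if a backward clock reset is detected
 *   clock_time      - requestor's clock when it sent the inquiry
 *   cutoff_time     - oldest file timestamp the requestor will accept
 *   now             - the collector's current time
 */
static bool
inquiry_needs_write(bool have_dbentry, TimestampTz *stats_timestamp,
                    TimestampTz clock_time, TimestampTz cutoff_time,
                    TimestampTz now)
{
    bool clock_went_backwards = false;

    /* Unknown database: nothing to compare against, so request a write. */
    if (!have_dbentry)
        return true;

    /*
     * Step 1: a requestor clock older than the last write is usually just a
     * delayed message, but if the collector's own clock is also behind the
     * last write, the system time must have been set backwards.
     */
    if (clock_time < *stats_timestamp && now < *stats_timestamp)
    {
        *stats_timestamp = now - 1;
        clock_went_backwards = true;
    }

    /*
     * Step 2: skip requests the last write already satisfies, but never
     * skip after a clock reset, so that the files get back in sync.
     */
    if (!clock_went_backwards && cutoff_time <= *stats_timestamp)
        return false;

    return true;
}
```

Doing step 2 last means the clock check exists in exactly one place, which is the point of the reorganization.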

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

pgstat-coalesce-v3.patch (binary/octet-stream)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 1467355..1345e8d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4865,13 +4865,13 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	}
 
 	/*
-	 * There's no request for this DB yet, so create one.
+	 * There's no request for this DB yet, so create one. Don't add it to the
+	 * list yet - we will check clock skew and the existing db entry first.
 	 */
 	newreq = palloc(sizeof(DBWriteRequest));
 
 	newreq->databaseid = msg->databaseid;
 	newreq->request_time = msg->clock_time;
-	slist_push_head(&last_statrequests, &newreq->next);
 
 	/*
 	 * If the requestor's local clock time is older than stats_timestamp, we
@@ -4908,6 +4908,28 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 			dbentry->stats_timestamp = cur_ts - 1;
 		}
 	}
+
+	/*
+	 * Ignore requests that are already resolved by the last write.
+	 *
+	 * We discard the list of requests after writing the stats files, so the
+	 * requests that are already waiting on the UDP socket at that moment
+	 * won't be discarded in the loop at the beginning of the method. But we
+	 * can skip them here, if we found the database entry.
+	 *
+	 * We never skip the requests if we detected clock skew, though. In that
+	 * case we want to write the files anyway, to get in sync. Simply check
+	 * whether we tweaked the request time in the previous block.
+	 */
+	if ((dbentry != NULL) && (msg->cutoff_time <= dbentry->stats_timestamp)
+						  && (newreq->request_time == msg->clock_time))
+	{
+		pfree(newreq);
+		return;
+	}
+
+	/* The file is stale or there was a clock skew, so request a write. */
+	slist_push_head(&last_statrequests, &newreq->next);
 }
 
 
#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#85)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Attached is a patch that should fix the coalescing, including the clock
skew detection. In the end I reorganized the code a bit, moving the
check to the end, after the clock skew detection. Otherwise I'd have to
do the clock skew detection in multiple places, and that seemed ugly.

I hadn't been paying any attention to this thread, I must confess.
But I rediscovered this no-coalescing problem while pursuing the poor
behavior for shared catalogs that Peter complained of in
/messages/by-id/56AD41AC.1030509@gmx.net

I posted a patch at
/messages/by-id/13023.1464213041@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do you
care to review it first?

Reacting to the thread overall:

I see Noah's concern about wanting to merge the write work for requests
about different databases. I've got mixed feelings about that: it's
arguable that any such change would make things worse not better.
In particular, it's inevitable that trying to merge writes will result
in delaying the response to the first request, whether or not we are
able to merge anything. That's not good in itself, and it means that
we can't hope to merge requests over any very long interval, which very
possibly will prevent any merging from happening in real situations.
Also, considering that we know the stats collector can be pretty slow
to respond at all under load, I'm worried that this would result in
more outright failures.

Moreover, what we'd hope to gain from merging is fewer writes of the
global stats file and the shared-catalog stats file; but neither of
those are very big, so I'm skeptical of what we'd win.

In view of 52e8fc3e2, there's more or less no case in which we'd be
writing stats without writing stats for the shared catalogs. So I'm
tempted to propose that we try to reduce the overhead by merging the
shared-catalog stats back into the global-stats file, thereby halving
the filesystem metadata traffic for updating those.

regards, tom lane


#87Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#86)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

On 05/26/2016 10:10 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Attached is a patch that should fix the coalescing, including the clock
skew detection. In the end I reorganized the code a bit, moving the
check to the end, after the clock skew detection. Otherwise I'd have to
do the clock skew detection in multiple places, and that seemed ugly.

I hadn't been paying any attention to this thread, I must confess.
But I rediscovered this no-coalescing problem while pursuing the poor
behavior for shared catalogs that Peter complained of in
/messages/by-id/56AD41AC.1030509@gmx.net

I posted a patch at
/messages/by-id/13023.1464213041@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do
you care to review it first?

I do care and I'll look at it over the next few days. FWIW when writing
that patch I intentionally refrained from major changes, as I think the
plan was to backpatch it. But +1 for more readable code from me.

Reacting to the thread overall:

I see Noah's concern about wanting to merge the write work for
requests about different databases. I've got mixed feelings about
that: it's arguable that any such change would make things worse not
better. In particular, it's inevitable that trying to merge writes
will result in delaying the response to the first request, whether
or not we are able to merge anything. That's not good in itself, and
it means that we can't hope to merge requests over any very long
interval, which very possibly will prevent any merging from
happening in real situations. Also, considering that we know the
stats collector can be pretty slow to respond at all under load, I'm
worried that this would result in more outright failures.

Moreover, what we'd hope to gain from merging is fewer writes of the
global stats file and the shared-catalog stats file; but neither of
those are very big, so I'm skeptical of what we'd win.

Yep. Clearly there's a trade-off between slowing down response to the
first request vs. speeding-up the whole process, but as you point out we
probably can't gain enough to justify that.

I wonder if that's still true on clusters with many databases (say,
shared systems with thousands of dbs). Perhaps walking the list just
once would save enough CPU to make this a win.

In any case, if we decide to abandon the idea of merging requests for
multiple databases, that probably means we can further simplify the
code. last_statrequests is a list but it actually never contains more
than just a single request. We kept it that way because of the plan to
add the merging. But if that's not worth it ...

In view of 52e8fc3e2, there's more or less no case in which we'd be
writing stats without writing stats for the shared catalogs. So I'm
tempted to propose that we try to reduce the overhead by merging the
shared-catalog stats back into the global-stats file, thereby
halving the filesystem metadata traffic for updating those.

I find this a bit contradictory with the previous paragraph. If you
believe that reducing the filesystem metadata traffic will have a
measurable benefit, then surely merging writes for multiple dbs (thus
not writing the global/shared files multiple times) will have even
higher impact, no?

E.g. let's assume we're still writing the global+shared+db files for
each database. If there are requests for 10 databases, we'll write 30
files. If we merge those requests first, we're writing only 12 files.
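
The arithmetic can be written down as a small sketch. The function names and the common_files parameter are hypothetical; common_files is 2 today (global plus shared-catalog file) and would be 1 if the shared-catalog stats were folded into the global file:

```c
/*
 * Files written when each of ndatabases requests triggers its own write
 * round: every round rewrites the common files plus one database file.
 */
static int
files_written_unmerged(int ndatabases, int common_files)
{
    return ndatabases * (common_files + 1);
}

/*
 * Files written when all ndatabases requests are served by a single write
 * round: the common files are written once, plus one file per database.
 */
static int
files_written_merged(int ndatabases, int common_files)
{
    return (ndatabases > 0) ? common_files + ndatabases : 0;
}
```

For 10 databases and 2 common files this gives 30 versus 12. With common_files set to 1 the counts are 20 versus 11, so in this simple model merging requests saves more files than merging the two common files does.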

So I'm not sure about the metadata traffic argument, we'd need to see
some numbers showing it really makes a difference.

That being said, I'm not opposed to merging the shared catalog into the
global-stats file - it's not really a separate database so having it in
a separate file is a bit weird.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#88Michael Paquier
michael.paquier@gmail.com
In reply to: Tomas Vondra (#87)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On Thu, May 26, 2016 at 6:43 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 05/26/2016 10:10 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
In view of 52e8fc3e2, there's more or less no case in which we'd be
writing stats without writing stats for the shared catalogs. So I'm
tempted to propose that we try to reduce the overhead by merging the
shared-catalog stats back into the global-stats file, thereby
halving the filesystem metadata traffic for updating those.

[...]

That being said, I'm not opposed to merging the shared catalog into the
global-stats file - it's not really a separate database so having it in a
separate file is a bit weird.

While looking at this stuff, to be honest I was surprised that shared
relation stats are located in a file whose name depends on
InvalidOid, so +1 from here as well to merge that into the global
stats file.
--
Michael


#89Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#87)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 05/26/2016 10:10 PM, Tom Lane wrote:

In view of 52e8fc3e2, there's more or less no case in which we'd be
writing stats without writing stats for the shared catalogs. So I'm
tempted to propose that we try to reduce the overhead by merging the
shared-catalog stats back into the global-stats file, thereby
halving the filesystem metadata traffic for updating those.

I find this a bit contradictory with the previous paragraph. If you
believe that reducing the filesystem metadata traffic will have a
measurable benefit, then surely merging writes for multiple dbs (thus
not writing the global/shared files multiple times) will have even
higher impact, no?

Well, my thinking is that this is something we could get "for free"
without any penalty in response time. Going further will require
some sort of tradeoff.

regards, tom lane


#90Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#86)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Hi,

On 05/26/2016 10:10 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

Attached is a patch that should fix the coalescing, including the clock
skew detection. In the end I reorganized the code a bit, moving the
check to the end, after the clock skew detection. Otherwise I'd have to
do the clock skew detection in multiple places, and that seemed ugly.

I hadn't been paying any attention to this thread, I must confess.
But I rediscovered this no-coalescing problem while pursuing the poor
behavior for shared catalogs that Peter complained of in
/messages/by-id/56AD41AC.1030509@gmx.net

I posted a patch at
/messages/by-id/13023.1464213041@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do you
care to review it first?

I've reviewed the patch today, and it seems fine to me - correct and
achieving the same goal as the patch posted to this thread (plus fixing
the issue with shared catalogs and fixing many comments).

FWIW do you still plan to back-patch this? Minimizing the amount of
changes was one of the things I had in mind when writing "my" patch,
which is why I ended up with parts that are less readable.

The one change I'm not quite sure about is the removal of clock skew
detection in pgstat_recv_inquiry(). You've removed the first check on
the inquiry, replacing it with this comment:

It seems sufficient to check for clock skew once per write round.

But the first check was comparing msg/req, while the second check looks
at dbentry/cur_ts. I don't see how those two clock skew checks are
redundant - if they are, the comment should explain that I guess.

Another thing is that if you believe merging requests across databases
is a silly idea, maybe we should bite the bullet and replace the list of
requests with a single item. I'm not convinced about this, though.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#91Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#90)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 05/26/2016 10:10 PM, Tom Lane wrote:

I posted a patch at
/messages/by-id/13023.1464213041@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do you
care to review it first?

I've reviewed the patch today, and it seems fine to me - correct and
achieving the same goal as the patch posted to this thread (plus fixing
the issue with shared catalogs and fixing many comments).

Thanks for reviewing!

FWIW do you still plan to back-patch this? Minimizing the amount of
changes was one of the things I had in mind when writing "my" patch,
which is why I ended up with parts that are less readable.

Yeah, I think it's a bug fix and should be back-patched. I'm not in
favor of making things more complicated just to reduce the number of
lines a patch touches.

The one change I'm not quite sure about is the removal of clock skew
detection in pgstat_recv_inquiry(). You've removed the first check on
the inquiry, replacing it with this comment:

It seems sufficient to check for clock skew once per write round.

But the first check was comparing msg/req, while the second check looks
at dbentry/cur_ts. I don't see how those two clock skew checks are
redundant - if they are, the comment should explain that I guess.

I'm confused here --- are you speaking of having removed

    if (msg->cutoff_time > req->request_time)
        req->request_time = msg->cutoff_time;

? That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've seen
more than one request. But since req->request_time isn't actually being
used anywhere, it's useless code.

I reformatted the actual check for clock skew, but I do not think I
changed its behavior.

Another thing is that if you believe merging requests across databases
is a silly idea, maybe we should bite the bullet and replace the list of
requests with a single item. I'm not convinced about this, though.

No, I don't want to do that either. We're not spending much code by
having pending_write_requests be a list rather than a single entry,
and we might eventually figure out a reasonable way to time the flushes
so that we can merge requests.

regards, tom lane


#92Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#91)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 05/31/2016 06:59 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 05/26/2016 10:10 PM, Tom Lane wrote:

I posted a patch at
/messages/by-id/13023.1464213041@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do you
care to review it first?

I've reviewed the patch today, and it seems fine to me - correct and
achieving the same goal as the patch posted to this thread (plus fixing
the issue with shared catalogs and fixing many comments).

Thanks for reviewing!

FWIW do you still plan to back-patch this? Minimizing the amount of
changes was one of the things I had in mind when writing "my" patch,
which is why I ended up with parts that are less readable.

Yeah, I think it's a bug fix and should be back-patched. I'm not in
favor of making things more complicated just to reduce the number of
lines a patch touches.

The one change I'm not quite sure about is the removal of clock skew
detection in pgstat_recv_inquiry(). You've removed the first check on
the inquiry, replacing it with this comment:

It seems sufficient to check for clock skew once per write round.

But the first check was comparing msg/req, while the second check looks
at dbentry/cur_ts. I don't see how those two clock skew checks are
redundant - if they are, the comment should explain that I guess.

I'm confused here --- are you speaking of having removed

    if (msg->cutoff_time > req->request_time)
        req->request_time = msg->cutoff_time;

? That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've
seen more than one request. But since req->request_time isn't
actually being used anywhere, it's useless code.

Ah, you're right. I've made the mistake of writing the e-mail before
drinking any coffee today, and I got distracted by the comment change.

I reformatted the actual check for clock skew, but I do not think I
changed its behavior.

I'm not sure it does not change the behavior, though. request_time only
became unused after you removed the two places that set the value (one
of them in the clock skew check).

I'm not sure this is a bad change, though. But there was a dependency
between the new request and the preceding one.

Another thing is that if you believe merging requests across databases
is a silly idea, maybe we should bite the bullet and replace the list of
requests with a single item. I'm not convinced about this, though.

No, I don't want to do that either. We're not spending much code by
having pending_write_requests be a list rather than a single entry,
and we might eventually figure out a reasonable way to time the flushes
so that we can merge requests.

+1

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#93Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#92)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 05/31/2016 06:59 PM, Tom Lane wrote:

I'm confused here --- are you speaking of having removed

    if (msg->cutoff_time > req->request_time)
        req->request_time = msg->cutoff_time;

? That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've
seen more than one request. But since req->request_time isn't
actually being used anywhere, it's useless code.

Ah, you're right. I've made the mistake of writing the e-mail before
drinking any coffee today, and I got distracted by the comment change.

I reformatted the actual check for clock skew, but I do not think I
changed its behavior.

I'm not sure it does not change the behavior, though. request_time only
became unused after you removed the two places that set the value (one
of them in the clock skew check).

Well, it's unused in the sense that the if-test quoted above is the only
place in HEAD that examines the value of request_time. And since that
if-test only controls whether we change the value, and not whether we
proceed to make the clock skew check, I don't see how it's related
to clock skew or indeed anything else at all.

regards, tom lane


#94Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#93)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

On 05/31/2016 07:24 PM, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 05/31/2016 06:59 PM, Tom Lane wrote:

I'm confused here --- are you speaking of having removed

    if (msg->cutoff_time > req->request_time)
        req->request_time = msg->cutoff_time;

? That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've
seen more than one request. But since req->request_time isn't
actually being used anywhere, it's useless code.

Ah, you're right. I've made the mistake of writing the e-mail before
drinking any coffee today, and I got distracted by the comment change.

I reformatted the actual check for clock skew, but I do not think I
changed its behavior.

I'm not sure it does not change the behavior, though. request_time only
became unused after you removed the two places that set the value (one
of them in the clock skew check).

Well, it's unused in the sense that the if-test quoted above is the only
place in HEAD that examines the value of request_time. And since that
if-test only controls whether we change the value, and not whether we
proceed to make the clock skew check, I don't see how it's related
to clock skew or indeed anything else at all.

I see, in that case it indeed is useless.

I've checked how this worked in 9.2 (before the 9.3 patch that split the
file per db), and back then last_statrequest (transformed to
request_time) was used to decide whether we need to write something. But
now we do that by simply checking whether the list is empty.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#95Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#94)
Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

I've checked how this worked in 9.2 (before the 9.3 patch that split the
file per db), and back then last_statrequest (transformed to
request_time) was used to decide whether we need to write something. But
now we do that by simply checking whether the list is empty.

Right. In effect, 9.3 moved the decision about "do we need to write stats
because of this request" from the main loop to pgstat_recv_inquiry()
... but we forgot to incorporate any check for whether the request was
already satisfied into pgstat_recv_inquiry(). We can do that, though,
as per either of the patches under discussion --- and once we do so,
maintaining DBWriteRequest.request_time seems a bit pointless.
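
A simplified model of that shift, as a sketch only. The struct and function names below are hypothetical stand-ins, not the actual pgstat.c definitions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* stand-in for PostgreSQL's timestamp type */

/* 9.2-style decision in the main loop: write only if some request is
 * newer than the last write. */
static bool
need_write_by_timestamp(TimestampTz last_request, TimestampTz last_write)
{
    return last_write < last_request;
}

/* Simplified per-database write request, as a singly linked list node. */
typedef struct WriteRequest
{
    uint32_t             databaseid;
    struct WriteRequest *next;
} WriteRequest;

/* 9.3-style decision in the main loop: write whenever anything is queued. */
static bool
need_write_by_list(const WriteRequest *pending)
{
    return pending != NULL;
}

/*
 * The check that has to move into the inquiry handler: only queue a
 * request if the last write does not already satisfy its cutoff time.
 */
static bool
should_enqueue(TimestampTz cutoff_time, TimestampTz stats_timestamp)
{
    return cutoff_time > stats_timestamp;
}
```

With the list-based test, a stale inquiry that is queued unconditionally forces a write that the timestamp-based test would have skipped; applying should_enqueue() before queueing restores the old behavior without tracking a per-request time.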

It's conceivable that we'd try to implement merging of close-together
requests in a way that would take account of how far back the oldest
unsatisfied request is. But I think that a global oldest-request time
would be sufficient for that; we don't need to track it per-database.
In any case, it's hard to see exactly how to make that work without
putting a gettimeofday() call into the inner message handling loop,
which I believe we won't want to do on performance grounds. The previous
speculation about doing writes every N messages or when we have no input
to process seems more likely to lead to a useful answer.

regards, tom lane
