reducing statistics write overhead
Howdy,
The statistics collector currently dumps the stats file every 500 ms. This
is a major problem if the file becomes large -- occasionally we've been forced
to disable stats collection to cope with it. Another issue is that while the
file is frequently written, it is seldom read: typically once a minute, by
autovacuum plus possible user-initiated stats queries.
So, as a simple optimization I am proposing that the file should be only
written when some backend requests statistics. This would significantly reduce
the undesired write traffic at the cost of slightly slower stats access.
Attached is a WIP patch, which basically implements this:
Disable periodic writing of the stats file. Introduce a new stats message type --
PGSTAT_MTYPE_INQUIRY. Backends send this to notify the collector that stats are
needed; the pid of the requestor is provided in the message. The backend then
installs an alarm handler and starts a timer. The collector processes the messages
and compiles a list of pids to be notified. If there are any, the stats file is
written and SIGALRM is sent to the requestors. The backend then proceeds to read
the stats file as usual.
Thoughts, comments?
regards,
Martin
Attachment: stat-write.patch (text/x-diff, +162 -58)
Martin Pihlak <martin.pihlak@gmail.com> writes:
So, as a simple optimization I am proposing that the file should be
only written when some backend requests statistics. This would
significantly reduce the undesired write traffic at the cost of
slightly slower stats access.
How necessary is this given the recent fixes to allow the stats file to
be kept on a ramdisk?
Attached is a WIP patch, which basically implements this:
This patch breaks deadlock checking and statement_timeout, because
backends already use SIGALRM. You can't just take over that signal.
It's possible that you could get things to work by treating this as an
additional reason for SIGALRM, but that code is unreasonably complex
already. I'd suggest finding some other way.
regards, tom lane
On Fri, 05 Sep 2008 15:23:18 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Martin Pihlak <martin.pihlak@gmail.com> writes:
So, as a simple optimization I am proposing that the file should be
only written when some backend requests statistics. This would
significantly reduce the undesired write traffic at the cost of
slightly slower stats access.
How necessary is this given the recent fixes to allow the stats file
to be kept on a ramdisk?
From a usability and integration perspective this patch is a nice
touch. On-demand is a nice feature when used correctly.
Sincerely,
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Tom Lane wrote:
Martin Pihlak <martin.pihlak@gmail.com> writes:
So, as a simple optimization I am proposing that the file should be
only written when some backend requests statistics. This would
significantly reduce the undesired write traffic at the cost of
slightly slower stats access.
How necessary is this given the recent fixes to allow the stats file to
be kept on a ramdisk?
Ramdisk helps, but requires additional effort to set up. Also the stats
file has a tendency to creep up on you -- as the database evolves the file
size gradually increases and suddenly the DBA is left wondering what
happened to performance.
Attached is a WIP patch, which basically implements this:
This patch breaks deadlock checking and statement_timeout, because
backends already use SIGALRM. You can't just take over that signal.
It's possible that you could get things to work by treating this as an
additional reason for SIGALRM, but that code is unreasonably complex
already. I'd suggest finding some other way.
I suspected that, but somehow managed to overlook it :( I guess it was
too tempting to use it. I'll start looking for alternatives.
regards,
Martin
Martin Pihlak wrote:
I suspected that, but somehow managed to overlook it :( I guess it was
too tempting to use it. I'll start looking for alternatives.
If you can't afford a 500 msec pgstat interval, then you need to make it
tunable. Other ideas are (i) turning pgstat on/off per table or database,
and (ii) making the pgstat interval tunable per table or database. You can
use the reloptions column to store this info. These workarounds are much
simpler than what you proposed, and they're almost free.
--
Euler Taveira de Oliveira
http://www.timbira.com/
Euler Taveira de Oliveira <euler@timbira.com> writes:
If you can't afford a 500 msec pgstat interval, then you need to make it
tunable. Other ideas are (i) turning pgstat on/off per table or database,
and (ii) making the pgstat interval tunable per table or database. You can
use the reloptions column to store this info. These workarounds are much
simpler than what you proposed, and they're almost free.
For normal usage on-demand dumping would be a really good thing; it'd
cut the overhead of having stats on tremendously, especially for people
who don't really use 'em. The particular signaling proposed here is
bogus, but if Martin can make it work in a cleaner fashion I think it's
likely a good idea.
regards, tom lane
On Sat, Sep 6, 2008 at 2:29 AM, Euler Taveira de Oliveira <euler@timbira.com> wrote:
Martin Pihlak wrote:
I suspected that, but somehow managed to overlook it :( I guess it was
too tempting to use it. I'll start looking for alternatives.
If you can't afford a 500 msec pgstat interval, then you need to make it
tunable.
Additional parameter in config file. Not good.
Other ideas are (i) turning pgstat on/off per table or database,
and (ii) making the pgstat interval tunable per table or database. You can
use the reloptions column to store this info. These workarounds are much
simpler than what you proposed, and they're almost free.
That does not seem simple to me. Why would DBAs want extra management work? We
want all the stats to be there, as we don't know when we'll need to look at them.
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, 2008-09-05 at 15:23 -0400, Tom Lane wrote:
How necessary is this given the recent fixes to allow the stats file to
be kept on a ramdisk?
I would prefer this approach and back out the other change.
On-demand is cheaper and easier to use.
Attached is a WIP patch, which basically implements this:
This patch breaks deadlock checking and statement_timeout, because
backends already use SIGALRM. You can't just take over that signal.
It's possible that you could get things to work by treating this as an
additional reason for SIGALRM, but that code is unreasonably complex
already. I'd suggest finding some other way.
There are other ways already in use in the backend, so just use those.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
On Fri, 2008-09-05 at 15:23 -0400, Tom Lane wrote:
How necessary is this given the recent fixes to allow the stats file to
be kept on a ramdisk?
I would prefer this approach and back out the other change.
Even if we get on-demand done, I wouldn't see it as a reason to back out
the statfile relocation work. In an environment where the stats are
demanded frequently, you could still need that for performance.
(In fact, maybe this patch ought to include some sort of maximum update
rate tunable? The worst case behavior could actually be WORSE than now.)
regards, tom lane
Tom Lane wrote:
(In fact, maybe this patch ought to include some sort of maximum update
rate tunable? The worst case behavior could actually be WORSE than now.)
Some sort of "if stats were requested in the last 500 ms, just tell the
requester to read the existing file".
Things that come to mind:
- autovacuum could use a more frequent stats update in certain cases
- Maybe we oughta have separate files, one for each database? That way
we'd reduce unnecessary I/O traffic for both the reader and the writer.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
Some sort of "if stats were requested in the last 500 ms, just tell the
requester to read the existing file".
Hmm, I was thinking of delaying both the write and the reply signal
until 500ms had elapsed. But the above behavior would certainly be
easier to implement, and would probably be good enough (TM).
- Maybe we oughta have separate files, one for each database? That way
we'd reduce unnecessary I/O traffic for both the reader and the writer.
The signaling would become way too complex, I think. Also what do you
do about shared tables?
regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
Some sort of "if stats were requested in the last 500 ms, just tell the
requester to read the existing file".
Things that come to mind:
- autovacuum could use a more frequent stats update in certain cases
BTW, we could implement that by, instead of having a global tunable,
including a field in the request message saying how stale an existing
file is acceptable for this requestor. 500ms might be the standard
value but autovac could use a smaller number.
regards, tom lane
Protection against too-frequent reads is already handled in the patch, but these
comments might lead it in new directions. The current implementation has this
same limit: the file is written no more than once per 500 ms.
On Sat, Sep 6, 2008 at 9:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Some sort of "if stats were requested in the last 500 ms, just tell the
requester to read the existing file".
Things that come to mind:
- autovacuum could use a more frequent stats update in certain cases
BTW, we could implement that by, instead of having a global tunable,
including a field in the request message saying how stale an existing
file is acceptable for this requestor. 500ms might be the standard
value but autovac could use a smaller number.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
- Maybe we oughta have separate files, one for each database? That way
we'd reduce unnecessary I/O traffic for both the reader and the writer.
The signaling would become way too complex, I think. Also what do you
do about shared tables?
They are already stored in a "separate database" (denoted with an
InvalidOid dbid), and autovacuum grabs it separately. I admit I don't
know what regular backends do about it.
As for signalling, maybe we could implement something like we do for the
postmaster signal stuff: the requestor stores a dbid in shared memory
and sends a SIGUSR2 to pgstat or some such. We'd have enough shmem
space for a reasonable number of requests, and pgstat consumes them from
there into local memory (similar to what Andrew proposes for
LISTEN/NOTIFY); it stores the dbid and PID of the requestor. As soon as
the request has been fulfilled, pgstat responds by <fill in magical
mechanism that Martin is about to propose> to that particular backend.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
As for signalling, maybe we could implement something like we do for the
postmaster signal stuff: the requestor stores a dbid in shared memory
and sends a SIGUSR2 to pgstat or some such.
No, no, no. Martin already had a perfectly sane design for that
direction of signalling: send a special stats message to the collector.
That can carry whatever baggage it needs to. It's the reverse direction
of "the data you requested is available now, sir" that is tricky.
And I fear that having to keep track of multiple stats-collector output
files would make it very significantly trickier --- both for the stats
collector's own bookkeeping and for the signaling mechanism itself.
I don't believe it's gonna be worth that.
regards, tom lane
I wrote:
No, no, no. Martin already had a perfectly sane design for that
direction of signalling: send a special stats message to the collector.
Actually ... given that the stats message mechanism is designed to be
lossy under high load, maybe that isn't so sane. At the very least
there would have to be timeout-and-resend logic on the backend side.
I dislike the alternative of communicating through shared memory,
though. Right now the stats collector isn't even connected to shared
memory.
regards, tom lane
Martin Pihlak wrote:
Attached is a WIP patch, which basically implements this:
This patch breaks deadlock checking and statement_timeout, because
backends already use SIGALRM. You can't just take over that signal.
It's possible that you could get things to work by treating this as an
additional reason for SIGALRM, but that code is unreasonably complex
already. I'd suggest finding some other way.
I suspected that, but somehow managed to overlook it :( I guess it was
too tempting to use it. I'll start looking for alternatives.
I wrote a patch for this some time back, that was actually applied.
Turns out it didn't work, and I ran out of time to fix it, so it was
backed out again. And then I forgot about it :-) If you look through the
cvs history of pgstat you should be able to find it - maybe it can give
you some further ideas.
//Magnus
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
As for signalling, maybe we could implement something like we do for the
postmaster signal stuff: the requestor stores a dbid in shared memory
and sends a SIGUSR2 to pgstat or some such.
No, no, no. Martin already had a perfectly sane design for that
direction of signalling: send a special stats message to the collector.
That can carry whatever baggage it needs to. It's the reverse direction
of "the data you requested is available now, sir" that is tricky.
IIRC, my previous patch looked at the inode of the stats file, then sent
off the "gimme a new file" signal, and then read the file once the inode
changed.
But also IIRC, that's the area where there was a problem - sometimes it
didn't properly pick up changes...
//Magnus
Hi,
On 7 Sep 2008, at 00:45, Tom Lane wrote:
I dislike the alternative of communicating through shared memory,
though. Right now the stats collector isn't even connected to shared
memory.
Maybe Markus Wanner's work on Postgres-R internal messaging, now that it
has been reworked to follow your advice, could be of some use here?
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01114.php
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01420.php
Regards,
--
dim
--
Dimitri Fontaine
PostgreSQL DBA, Architect
Magnus Hagander wrote:
I wrote a patch for this some time back, that was actually applied.
Turns out it didn't work, and I ran out of time to fix it, so it was
backed out again. And then I forgot about it :-) If you look through the
cvs history of pgstat you should be able to find it - maybe it can give
you some further ideas.
Got it - this was 1.126. Looks very familiar indeed :)
I had also previously experimented with stat()-based polling but ran into
the same issues -- no portable high-resolution timestamp on files. I guess
stat() is unusable unless we can live with a 1-second update interval for the
stats (e.g. the backend reads the file if it is within 1 second of the request).
One alternative is to include a timestamp in the stats file header -- the
backend can then wait on that: check the timestamp, sleep, resend the
request, loop. Not particularly elegant, but easy to implement. Would this
be acceptable?
regards,
Martin