Moving pgstat.stat and pgstat.tmp

Started by Erik Jones over 18 years ago · 13 messages · general
#1 Erik Jones
erik@myemma.com

Hi, I'm currently doctoring a situation wherein we've got a table
inheritance scheme that over the years has ballooned like only
in your nightmares (think well over 100K tables + indexes on those).
The obvious solution is to re-design the schema with a better
partitioning scheme in mind (see another msg from me later today on
that) but that's a big project that's just getting underway and an
immediate concern is the I/O on our data partition due in large part
to the stats file(s) getting hammered. We can verify this by looking
at our write volume (45+ Mbits/s) and watching it drop to well below 10
on average when we disable stat_row_level, as well as by watching the
insane amounts of writes to pgstat.tmp when running the rwsnoop
dtrace script.

So, for the interim we're looking to move where the stats files are
written to. I've made the changes to the file paths for pgstat.stat
and pgstat.tmp in src/backend/postmaster/pgstat.c, recompiled and
verified that everything seems to be working ok on our test machine.
However, seeing as how I'm not all that familiar with the code base,
I'm asking here: is that all I need to do? Is there anything I've
missed?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Jones (#1)
Re: Moving pgstat.stat and pgstat.tmp

Erik Jones <erik@myemma.com> writes:

Hi, I'm currently doctoring a situation wherein we've got a table
inheritance scheme that over the years has ballooned like only
in your nightmares (think well over 100K tables + indexes on those).
The obvious solution is to re-design the schema with a better
partitioning scheme in mind (see another msg from me later today on
that) but that's a big project that's just getting underway and an
immediate concern is the I/O on our data partition due in large part
to the stats file(s) getting hammered.

Which PG version? Early 8.2.x releases had a nasty bug that caused
excessive stats file writes.

regards, tom lane

#3 Erik Jones
erik@myemma.com
In reply to: Tom Lane (#2)
Re: Moving pgstat.stat and pgstat.tmp

On Dec 3, 2007, at 4:16 PM, Tom Lane wrote:

Erik Jones <erik@myemma.com> writes:

Hi, I'm currently doctoring a situation wherein we've got a table
inheritance scheme that over the years has ballooned like only
in your nightmares (think well over 100K tables + indexes on those).
The obvious solution is to re-design the schema with a better
partitioning scheme in mind (see another msg from me later today on
that) but that's a big project that's just getting underway and an
immediate concern is the I/O on our data partition due in large part
to the stats file(s) getting hammered.

Which PG version? Early 8.2.x releases had a nasty bug that caused
excessive stats file writes.

8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about
65 Mbs/sec. Interestingly, a while back we were running with the
data directory mounted with forcedirectio and saw none of this, I'm
guessing that fsync calls would have something to do with that?

Erik Jones


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Jones (#3)
Re: Moving pgstat.stat and pgstat.tmp

Erik Jones <erik@myemma.com> writes:

8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about
65 Mbs/sec. Interestingly, a while back we were running with the
data directory mounted with forcedirectio and saw none of this, I'm
guessing that fsync calls would have something to do with that?

Hmm ... no, because the stats file never gets fsync'd. I should think
that forcedirectio would have made things worse.

regards, tom lane

#5 Erik Jones
erik@myemma.com
In reply to: Tom Lane (#4)
Re: Moving pgstat.stat and pgstat.tmp

On Dec 3, 2007, at 6:10 PM, Tom Lane wrote:

Erik Jones <erik@myemma.com> writes:

8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about
65 Mbs/sec. Interestingly, a while back we were running with the
data directory mounted with forcedirectio and saw none of this, I'm
guessing that fsync calls would have something to do with that?

Hmm ... no, because the stats file never gets fsync'd. I should think
that forcedirectio would have made things worse.

Interesting. If this is anything you'd like to look into I can
provide whatever diagnostic output you need (iostat, vmstat, dtrace
script outputs, etc...) but I do have to reiterate that we are an
extreme corner case due to our schema size. For now, is renaming the
#define'd paths for the stats file and temp file sufficient for
moving them? Basically, we'd like to move them onto a RAM disk to
give our disks a break.

Erik Jones


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Jones (#5)
Re: Moving pgstat.stat and pgstat.tmp

Erik Jones <erik@myemma.com> writes:

For now, is renaming the
#define'd paths for the stats file and temp file sufficient for
moving them?

I would think so, but haven't tried it. There definitely shouldn't be
anything outside pgstat.c that's touching them.

regards, tom lane

#7 Robert Treat
xzilla@users.sourceforge.net
In reply to: Erik Jones (#5)
Re: Moving pgstat.stat and pgstat.tmp

On Monday 03 December 2007 20:22, Erik Jones wrote:

On Dec 3, 2007, at 6:10 PM, Tom Lane wrote:

Erik Jones <erik@myemma.com> writes:

8.2.5 on Solaris 10. Before we upgraded to 8.2.4 it was doing about
65 Mbs/sec. Interestingly, a while back we were running with the
data directory mounted with forcedirectio and saw none of this, I'm
guessing that fsync calls would have something to do with that?

Hmm ... no, because the stats file never gets fsync'd. I should think
that forcedirectio would have made things worse.

Interesting. If this is anything you'd like to look into I can
provide whatever diagnostic output you need (iostat, vmstat, dtrace
script outputs, etc...) but I do have to reiterate that we are an
extreme corner case due to our schema size. For now, is renaming the
#define'd paths for the stats file and temp file sufficient for
moving them? Basically, we'd like to move them onto a RAM disk to
give our disks a break.

Yeah, we've noticed the same problem (pgstat is the most active file on the
system... uncovered in much the same way... go solaris). Actually I was
wondering if it could be done with symlinks, a la moving xlogs. Since we do
custom builds, that's not a real issue, but I was curious.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#8 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Treat (#7)
Re: Moving pgstat.stat and pgstat.tmp

Robert Treat wrote:

On Monday 03 December 2007 20:22, Erik Jones wrote:

Interesting. If this is anything you'd like to look into I can
provide whatever diagnostic output you need (iostat, vmstat, dtrace
script outputs, etc...) but I do have to reiterate that we are an
extreme corner case due to our schema size. For now, is renaming the
#define'd paths for the stats file and temp file sufficient for
moving them? Basically, we'd like to move them onto a RAM disk to
give our disks a break.

Yeah, we've noticed the same problem (pgstat is the most active file on the
system... uncovered in much the same way... go solaris). Actually I was
wondering if it could be done with symlinks, a la moving xlogs.

Not really, because a new file is created and renamed in place each time
it's going to be rewritten. So the symlink would be lost in the first
file rewrite.

The first idea that comes to mind is to make the path configurable via
GUC, so the user could set it to be written to an in-memory filesystem
(/tmp in Solaris?). But then I thought, why do we need it to be a file
at all? Why not use a mmap'ed memory area or something like that, and
only write it to a file on postmaster shutdown? (Losing the file on
unclean shutdown is not a problem, because the file is removed anyway.)
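
A rough sketch of what the configurable-path idea could look like, assuming a
single directory setting from which both file names are derived. Everything
here is hypothetical: build_stats_paths and its stats_dir argument are
illustrative names, not an existing GUC or PostgreSQL function.

```c
#include <stdio.h>

/*
 * Build the permanent and temporary stats file paths under `stats_dir`.
 * `stats_dir` stands in for a hypothetical configuration setting.
 * Returns 0 on success, -1 if either path would not fit in its buffer.
 */
int
build_stats_paths(const char *stats_dir,
                  char *stat_path, size_t stat_len,
                  char *tmp_path, size_t tmp_len)
{
    int n;

    n = snprintf(stat_path, stat_len, "%s/pgstat.stat", stats_dir);
    if (n < 0 || (size_t) n >= stat_len)
        return -1;

    n = snprintf(tmp_path, tmp_len, "%s/pgstat.tmp", stats_dir);
    if (n < 0 || (size_t) n >= tmp_len)
        return -1;

    return 0;
}
```

Deriving both names from one directory keeps the temp file and the final file
on the same filesystem, which matters because rename() cannot move a file
across filesystems.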

--
Alvaro Herrera Valdivia, Chile ICBM: S 39° 49' 18.1", W 73° 13' 56.4"
"Prefiero omelette con amigos que caviar con tontos"
(Alain Nonnet)

#9 Robert Treat
xzilla@users.sourceforge.net
In reply to: Alvaro Herrera (#8)
Re: Moving pgstat.stat and pgstat.tmp

On Wednesday 05 December 2007 07:22, Alvaro Herrera wrote:

Robert Treat wrote:

On Monday 03 December 2007 20:22, Erik Jones wrote:

Interesting. If this is anything you'd like to look into I can
provide whatever diagnostic output you need (iostat, vmstat, dtrace
script outputs, etc...) but I do have to reiterate that we are an
extreme corner case due to our schema size. For now, is renaming the
#define'd paths for the stats file and temp file sufficient for
moving them? Basically, we'd like to move them onto a RAM disk to
give our disks a break.

Yeah, we've noticed the same problem (pgstat is the most active file on
the system... uncovered in much the same way... go solaris). Actually I
was wondering if it could be done with symlinks, a la moving xlogs.

Not really, because a new file is created and renamed in place each time
it's going to be rewritten. So the symlink would be lost in the first
file rewrite.

Ah yeah, that's what I concluded back then.

The first idea that comes to mind is to make the path configurable via
GUC, so the user could set it to be written to an in-memory filesystem
(/tmp in Solaris?).

Yep, thought of that too, though it was after feature freeze so I didn't
propose it. Course if someone wants to sneak that in it would be cool :-)

But then I thought, why do we need it to be a file
at all? Why not use a mmap'ed memory area or something like that, and
only write it to a file on postmaster shutdown? (Losing the file on
unclean shutdown is not a problem, because the file is removed anyway.)

I suppose you need some facility to spill to disk, so maybe being in a file is
better? Seems it might not be in most cases... I wonder how big a memory
space we (or Erik) need.

--
Robert Treat

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#8)
Re: Moving pgstat.stat and pgstat.tmp

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

But then I thought, why do we need it to be a file
at all? Why not use a mmap'ed memory area or something like that, and
only write it to a file on postmaster shutdown?

Yeah, we definitely need some other technology for this. The difficulty
is in dealing with a highly variably sized chunk of data --- our
existing shmem approach won't work well, and once you get away from that
the old portability question raises its head.

There's also a synchronization issue: how can the stats collector make
updates appear atomic? mmap by itself doesn't solve that AFAIK.

regards, tom lane

#11 Erik Jones
erik@myemma.com
In reply to: Robert Treat (#9)
Re: Moving pgstat.stat and pgstat.tmp

On Dec 5, 2007, at 7:50 AM, Robert Treat wrote:

On Wednesday 05 December 2007 07:22, Alvaro Herrera wrote:

Robert Treat wrote:

On Monday 03 December 2007 20:22, Erik Jones wrote:

Interesting. If this is anything you'd like to look into I can
provide whatever diagnostic output you need (iostat, vmstat, dtrace
script outputs, etc...) but I do have to reiterate that we are an
extreme corner case due to our schema size. For now, is renaming the
#define'd paths for the stats file and temp file sufficient for
moving them? Basically, we'd like to move them onto a RAM disk to
give our disks a break.

Yeah, we've noticed the same problem (pgstat is the most active file on
the system... uncovered in much the same way... go solaris). Actually I
was wondering if it could be done with symlinks, a la moving xlogs.

Not really, because a new file is created and renamed in place each time
it's going to be rewritten. So the symlink would be lost in the first
file rewrite.

Ah yeah, that's what I concluded back then.

The first idea that comes to mind is to make the path configurable via
GUC, so the user could set it to be written to an in-memory filesystem
(/tmp in Solaris?).

Yep, thought of that too, though it was after feature freeze so I didn't
propose it. Course if someone wants to sneak that in it would be cool :-)

But then I thought, why do we need it to be a file
at all? Why not use a mmap'ed memory area or something like that, and
only write it to a file on postmaster shutdown? (Losing the file on
unclean shutdown is not a problem, because the file is removed anyway.)

I suppose you need some facility to spill to disk, so maybe being in a file is
better? Seems it might not be in most cases... I wonder how big a memory
space we (or Erik) need.

What I've done and tested on our test db server is to change lines 65
& 66 in pgstat.c from

#define PGSTAT_STAT_FILENAME "global/pgstat.stat"
#define PGSTAT_STAT_TMPFILE "global/pgstat.tmp"

to

#define PGSTAT_STAT_FILENAME "global/pg_stats/pgstat.stat"
#define PGSTAT_STAT_TMPFILE "global/pg_stats/pgstat.tmp"

recompile and then create that pg_stats directory as a symlink to a
directory with a swapfs mounted in it. Everything seems to be
kosher. Of course, this adds a bit to our shutdown procedure in the
case where we're going to bounce the actual server in that we need to
make sure to copy the stats file(s) out of the swapfs directory in
order to preserve stats in that case (and back in afterwards, of
course).
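
The copy-out and copy-back step could be done with any file copy tool; purely
as an illustration, here is a small sketch of such a copy in C. The helper name
copy_file is made up, nothing in it is PostgreSQL-specific, and it assumes the
server is already stopped so the stats file is not being rewritten while it is
copied.

```c
#include <stdio.h>

/*
 * Copy src to dst in fixed-size chunks. Returns 0 on success, -1 on error.
 * Hypothetical helper; the caller supplies both paths.
 */
int
copy_file(const char *src, const char *dst)
{
    char   buf[8192];
    size_t n;
    int    ok = 1;
    FILE  *in = fopen(src, "rb");
    FILE  *out;

    if (in == NULL)
        return -1;

    out = fopen(dst, "wb");
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }

    /* Read until EOF or error, writing out exactly what was read. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;

    fclose(in);
    /* fclose() flushes buffered data, so its result must be checked too. */
    if (fclose(out) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Before a reboot this would be called with the swapfs copy of pgstat.stat as src
and a path on persistent disk as dst, and with the arguments reversed once the
swapfs is mounted again.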

Erik Jones


#12 Erik Jones
erik@myemma.com
In reply to: Robert Treat (#9)
Re: Moving pgstat.stat and pgstat.tmp

On Dec 5, 2007, at 7:50 AM, Robert Treat wrote:


I suppose you need some facility to spill to disk, so maybe being in a file is
better? Seems it might not be in most cases... I wonder how big a memory
space we (or Erik) need.

We made the swapfs 300MB which is actually way more than we need as I
don't think I've seen our pgstat.stat file crack 10MB using the
entirely scientific method of spot-checking :)

Erik Jones


#13 Bruce Momjian
bruce@momjian.us
In reply to: Erik Jones (#1)
Re: Moving pgstat.stat and pgstat.tmp

Added to TODO:

* Reduce file system activity overhead of statistics file pgstat.stat

http://archives.postgresql.org/pgsql-general/2007-12/msg00106.php


--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +