Megabytes of stats saved after every connection

Started by Phil Endecott · over 20 years ago · 24 messages · general
#1Phil Endecott
spam_from_postgresql_general@chezphil.org

Dear Postgresql experts,

For some time I had been trying to work out why every connection to my
database resulted in several megabytes of data being written to the
disk, however trivial the query. I think I've found the culprit:
global/pgstat.stat. This is with 7.4.7.

This is for a web application which uses a new connection for each CGI
request. The server doesn't have a particularly high disk bandwidth and
this mysterious activity had been the bottleneck for some time. The
system is a little unusual as one of the databases has tens of thousands
of tables (though I saw these writes whichever database I connected to).

Looking at the output of vmstat I could see about 2.7Mbytes being
written up to about 5 seconds after the query was processed. I was
scratching my head about this for a long time, but today I noticed that
this size was just a little larger than my global/pgstat.stat file. So
I turned off stats_start_collector and stats_row_level and the writes
vanished. Turning them back on, the pgstat.stat file is much smaller
(10k) and the writes are invisible against the background noise.

So can I expect this file to grow again? I think I need the stats,
though I'm not entirely sure about that.

Was the entire file re-written, even when the only query I've run is
"select 1"? Is this necessary?

Any comments or suggestions gratefully received.

--Phil.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Phil Endecott (#1)
Re: Megabytes of stats saved after every connection

Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:

For some time I had been trying to work out why every connection to my
database resulted in several megabytes of data being written to the
disk, however trivial the query. I think I've found the culprit:
global/pgstat.stat. This is with 7.4.7.

This is for a web application which uses a new connection for each CGI
request. The server doesn't have a particularly high disk bandwidth and
this mysterious activity had been the bottleneck for some time. The
system is a little unusual as one of the databases has tens of thousands
of tables (though I saw these writes whichever database I connected to).

Well, there's the problem --- the stats subsystem is designed in a way
that makes it rewrite its entire stats collection on every update.
That's clearly not going to scale well to a large number of tables.
Offhand I don't see an easy solution ... Jan, any ideas?

So can I expect this file to grow again? I think I need the stats,
though I'm not entirely sure about that.

If you're not using autovacuum then you don't need stats_row_level.

regards, tom lane

#3Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#2)
Re: Megabytes of stats saved after every connection

On 7/28/2005 2:03 PM, Tom Lane wrote:

Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:

For some time I had been trying to work out why every connection to my
database resulted in several megabytes of data being written to the
disk, however trivial the query. I think I've found the culprit:
global/pgstat.stat. This is with 7.4.7.

This is for a web application which uses a new connection for each CGI
request. The server doesn't have a particularly high disk bandwidth and
this mysterious activity had been the bottleneck for some time. The
system is a little unusual as one of the databases has tens of thousands
of tables (though I saw these writes whichever database I connected to).

Well, there's the problem --- the stats subsystem is designed in a way
that makes it rewrite its entire stats collection on every update.
That's clearly not going to scale well to a large number of tables.
Offhand I don't see an easy solution ... Jan, any ideas?

PostgreSQL itself doesn't work too well with tens of thousands of
tables. I don't see much of an easy solution either. The best workaround
I can offer is to move that horror-DB to a separate postmaster with
stats disabled altogether.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#3)
Re: Megabytes of stats saved after every connection

Jan Wieck <JanWieck@Yahoo.com> writes:

On 7/28/2005 2:03 PM, Tom Lane wrote:

Well, there's the problem --- the stats subsystem is designed in a way
that makes it rewrite its entire stats collection on every update.
That's clearly not going to scale well to a large number of tables.
Offhand I don't see an easy solution ... Jan, any ideas?

PostgreSQL itself doesn't work too well with tens of thousands of
tables.

Really? AFAIK it should be pretty OK, assuming you are on a filesystem
that doesn't choke with tens of thousands of entries in a directory.
I think we should put down a TODO item to see if we can improve the
stats subsystem's performance in such cases.

regards, tom lane

#5Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#4)
Re: Megabytes of stats saved after every connection

On 7/28/2005 2:28 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

On 7/28/2005 2:03 PM, Tom Lane wrote:

Well, there's the problem --- the stats subsystem is designed in a way
that makes it rewrite its entire stats collection on every update.
That's clearly not going to scale well to a large number of tables.
Offhand I don't see an easy solution ... Jan, any ideas?

PostgreSQL itself doesn't work too well with tens of thousands of
tables.

Really? AFAIK it should be pretty OK, assuming you are on a filesystem
that doesn't choke with tens of thousands of entries in a directory.
I think we should put down a TODO item to see if we can improve the
stats subsystem's performance in such cases.

Okay, I should be more specific. The problem with tens of thousands of
tables does not exist just because of them being there. It will emerge
if all those tables are actually used because it will mean that you'd
need all the pg_class and pg_attribute rows cached and also your vfd
cache will constantly rotate.

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual stats
file will be created, written to, renamed, live for 500ms and be thrown
away by the next stats file's rename operation. I would assume that with a
decent filesystem and appropriate OS buffers, none of the data blocks of
most stat files even hit the disk. I must be missing something.
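The lifecycle Jan describes is the classic write-then-rename pattern. A
minimal shell sketch of it (file names are placeholders; the real
collector serializes its internal hash tables, not text):

```shell
# Snapshot-and-rename: readers of the stats file always see a
# complete old or new snapshot, never a torn one.
STATS=pgstat.stat
TMP=pgstat.stat.tmp

# 1. Write the entire snapshot to a temporary file on the same
#    filesystem as the target.
printf 'table_a 42\ntable_b 7\n' > "$TMP"

# 2. rename(2) atomically replaces the live file in one step.
mv "$TMP" "$STATS"

cat "$STATS"
```

Note that the whole snapshot is rewritten every time, which is exactly
why the cost grows with the number of tables being tracked.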

Jan


#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#5)
Re: Megabytes of stats saved after every connection

Jan Wieck <JanWieck@Yahoo.com> writes:

On 7/28/2005 2:28 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

PostgreSQL itself doesn't work too well with tens of thousands of
tables.

Really? AFAIK it should be pretty OK, assuming you are on a filesystem
that doesn't choke with tens of thousands of entries in a directory.
I think we should put down a TODO item to see if we can improve the
stats subsystem's performance in such cases.

Okay, I should be more specific. The problem with tens of thousands of
tables does not exist just because of them being there. It will emerge
if all those tables are actually used because it will mean that you'd
need all the pg_class and pg_attribute rows cached and also your vfd
cache will constantly rotate.

Sure, if you have a single backend touching all tables you'll have some
issues in that backend. But the stats problem is that it tracks every
table anyone has ever touched, which makes the issue much more pressing.

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual stats
file will be created, written to, renamed, live for 500ms and be thrown
away by the next stat files rename operation. I would assume that with a
decent filesystem and appropriate OS buffers, none of the data blocks of
most stat files even hit the disk. I must be missing something.

This is possibly true --- Phil, do you see actual disk I/O happening
from the stats writes, or is it just kernel calls?
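One rough way to tell the two apart (a diagnostic sketch; the database
name and PID are placeholders):

```shell
# Watch block-device output while issuing a trivial query: a burst in
# vmstat's "bo" column a few seconds later means the kernel actually
# flushed the dirty pages to disk.
vmstat 1 &
psql -c 'select 1' mydb

# Independently, strace shows the write()/rename() calls themselves,
# whether or not the data ever reaches the platter.
strace -e trace=write,rename -p <collector_pid>
```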

regards, tom lane

#7Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Jan Wieck (#5)
Re: Megabytes of stats saved after every connection

On Thu, 2005-07-28 at 13:40, Jan Wieck wrote:

On 7/28/2005 2:28 PM, Tom Lane wrote:

Jan Wieck <JanWieck@Yahoo.com> writes:

On 7/28/2005 2:03 PM, Tom Lane wrote:

Well, there's the problem --- the stats subsystem is designed in a way
that makes it rewrite its entire stats collection on every update.
That's clearly not going to scale well to a large number of tables.
Offhand I don't see an easy solution ... Jan, any ideas?

PostgreSQL itself doesn't work too well with tens of thousands of
tables.

Really? AFAIK it should be pretty OK, assuming you are on a filesystem
that doesn't choke with tens of thousands of entries in a directory.
I think we should put down a TODO item to see if we can improve the
stats subsystem's performance in such cases.

Okay, I should be more specific. The problem with tens of thousands of
tables does not exist just because of them being there. It will emerge
if all those tables are actually used because it will mean that you'd
need all the pg_class and pg_attribute rows cached and also your vfd
cache will constantly rotate.

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual stats
file will be created, written to, renamed, live for 500ms and be thrown
away by the next stat files rename operation. I would assume that with a
decent filesystem and appropriate OS buffers, none of the data blocks of
most stat files even hit the disk. I must be missing something.

Yeah, I found these three facets of the OP's system a bit disconcerting:

QUOTE ---
This is for a web application which uses a new connection for each CGI
request.
The server doesn't have a particularly high disk bandwidth and this
mysterious activity had been the bottleneck for some time.
The system is a little unusual as one of the databases has tens of
thousands of tables.
ENDQUOTE ---

Any two of those choices could cause some issues, but all three together
are pretty much a death knell for performance, whether or not the
global/pgstat file is being written.

Just an observation.

#8Greg Stark
gsstark@mit.edu
In reply to: Jan Wieck (#5)
Re: Megabytes of stats saved after every connection

Jan Wieck <JanWieck@Yahoo.com> writes:

Then again, the stats file is only written. There is nothing that actually
forces the blocks out. On a busy system, one individual stats file will be
created, written to, renamed, live for 500ms and be thrown away by the next
stat files rename operation. I would assume that with a decent filesystem and
appropriate OS buffers, none of the data blocks of most stat files even hit the
disk. I must be missing something.

Renaming is a metadata operation. Depending on the filesystem it has to be
done either synchronously or force a log write barrier. I'm not sure how those
things are implemented in various filesystems but I could easily imagine some
implementations treating them as implicit fsyncs for that file.

Perhaps this user could put the stats file in a ramdisk. It doesn't sound like
losing it in a crash would be anything to worry about.
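A sketch of what that would look like, and why it is less
straightforward than it sounds (paths are placeholders):

```shell
# Mount a small tmpfs and point the stats file at it.
mount -t tmpfs -o size=16m tmpfs /mnt/pgstat-ram
mv $PGDATA/global/pgstat.stat /mnt/pgstat-ram/
ln -s /mnt/pgstat-ram/pgstat.stat $PGDATA/global/pgstat.stat
```

The catch: the collector writes a temporary file next to the target and
then rename()s it into place. That rename would replace the symlink
with an ordinary file back on the real disk, so the trick likely does
not survive the first stats write; the sketch mainly illustrates why
the idea needs support inside the server rather than filesystem games.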

--
greg

#9Greg Stark
gsstark@mit.edu
In reply to: Jan Wieck (#5)
Re: Megabytes of stats saved after every connection

Jan Wieck <JanWieck@Yahoo.com> writes:

PostgreSQL itself doesn't work too well with tens of thousands of tables.

Really? AFAIK it should be pretty OK, assuming you are on a filesystem
that doesn't choke with tens of thousands of entries in a directory.
I think we should put down a TODO item to see if we can improve the
stats subsystem's performance in such cases.

Okay, I should be more specific. The problem with tens of thousands of tables
does not exist just because of them being there. It will emerge if all those
tables are actually used because it will mean that you'd need all the pg_class
and pg_attribute rows cached and also your vfd cache will constantly rotate.

I think occasionally people get bitten by not having their pg_* tables being
vacuumed or analyzed regularly. If you have lots of tables and the stats are
never updated for pg_class or related tables you can find the planner taking a
long time to plan queries.

This happens if you schedule a cron job to do your vacuuming and analyzing but
connect as a user other than the database owner. For example, you leave the
database owned by "postgres" but create a user to own all the tables and use
that to run regularly scheduled "vacuum analyze"s.

I'm not sure how often these types of problems get properly diagnosed. The
symptoms are quite mysterious. In retrospect I think I observed something like
it and never figured out what was going on. The problem only went away when I
upgraded the database and went through an initdb cycle.

--
greg

#10Phil Endecott
spam_from_postgresql_general@chezphil.org
In reply to: Tom Lane (#6)
Re: Megabytes of stats saved after every connection

Hello again,

Just to give a bit of background, in case it is useful: this is my
family tree website, treefic.com. I have a schema for each user, each
with about a dozen tables. In most cases the tables are small, i.e.
tens of entries, but the users I care about are the ones with tens of
thousands of people in their trees. The schemas are independent of each
other. Example web page: http://treefic.com/treefic/royal92

Jan Wieck <JanWieck@Yahoo.com> writes:

PostgreSQL itself doesn't work too well with tens of thousands of
tables.

I've specifically asked about this here before. This is obviously
important for my application so I invite all readers to share any
thoughts they might have about possible problems with large numbers of
tables. I also create and drop large numbers of temporary tables - can
anyone think of any additional problems with that?

Issues I have discussed here before include tab-completion in psql
(unimportant) and autovacuum's O(n^2) performance (important).

Okay, I should be more specific. The problem with tens of thousands of
tables does not exist just because of them being there. It will emerge
if all those tables are actually used because it will mean that you'd
need all the pg_class and pg_attribute rows cached and also your vfd
cache will constantly rotate.

If many trees are being viewed simultaneously, another part of the
system will be the bottleneck. Within any, say, 5 minute period, only
hundreds of tables will be in use.

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual stats
file will be created, written to, renamed, live for 500ms and be thrown
away by the next stat files rename operation. I would assume that with a
decent filesystem and appropriate OS buffers, none of the data blocks of
most stat files even hit the disk. I must be missing something.

This is possibly true --- Phil, do you see actual disk I/O happening
from the stats writes, or is it just kernel calls?

During my tests the system was idle; I would run "psql -c 'select 1;'"
and see the blocks in vmstat's "bo" column a couple of seconds later.
As I understand it that indicates actual I/O, and the delay suggests
that it is being flushed by the kernel. When the system is busy it is
harder to see what is going on and it is possible that at least some of
this activity was not being written to the disk. Typically I would see
a lot more write bandwidth than read bandwidth (by a factor of 5 or so)
according to vmstat; any advice about how to identify what files or
processes are involved would be appreciated. I had previously imagined
that it could be temporary tables. This is Linux 2.4.26 and an ext3
filesystem.
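For attributing writes to processes on a 2.4 kernel, one option, if the
kernel includes block_dump support (it is verbose, so turn it off again
afterwards):

```shell
echo 1 > /proc/sys/vm/block_dump   # log block writes to the kernel log
# ... reproduce the activity, then:
dmesg | grep 'WRITE block'         # lines name the writing process
echo 0 > /proc/sys/vm/block_dump
```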

Having disabled stats earlier my stats file is still quite small.
Presumably it will gradually grow back. In the meantime I cannot do any
experiments.

Thanks as ever for your prompt responses.

Regards,

--Phil.

#11Phil Endecott
spam_from_postgresql_general@chezphil.org
In reply to: Scott Marlowe (#7)
Re: Megabytes of stats saved after every connection

Scott Marlowe wrote:

Yeah, I found these three facets of the OP's system a bit disconcerting:

QUOTE ---
This is for a web application which uses a new connection for each CGI
request.
The server doesn't have a particularly high disk bandwidth and this
mysterious activity had been the bottleneck for some time.
The system is a little unusual as one of the databases has tens of
thousands of tables.
ENDQUOTE ---

Any two of those choices could cause some issues, but all three together
are pretty much a death knell for performance, whether or not the
global/pgstat file is being written.

See my previous message for some background about the application and an
example URL. When PostgreSQL is running smoothly, it is not the
bottleneck in the system: all it has to do is read maybe 100k from the
disk (or more likely the cache), do some in-memory sorts and joins, and
pass it to the rest of the application.

As far as I can see it is only because some parts of PostgreSQL have
poor O(num tables) performance that things are slowing down.

--Phil.

#12Peter Wiersig
peter@friesenpeter.de
In reply to: Phil Endecott (#10)
Re: Megabytes of stats saved after every connection

On Thu, Jul 28, 2005 at 08:31:21PM +0100, Phil Endecott wrote:

This is Linux 2.4.26 and an ext3 filesystem.

With the dir_index feature or without?

Peter

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Wiersig (#12)
Re: Megabytes of stats saved after every connection

On Thu, Jul 28, 2005 at 09:43:44PM +0200, Peter Wiersig wrote:

On Thu, Jul 28, 2005 at 08:31:21PM +0100, Phil Endecott wrote:

This is Linux 2.4.26 and an ext3 filesystem.

With the dir_index feature or without?

Also, with data=ordered, data=writeback or data=journal?
(data=ordered is the default.)

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"No renuncies a nada. No te aferres a nada."

#14Guy Rouillier
guyr@masergy.com
In reply to: Alvaro Herrera (#13)
Re: Megabytes of stats saved after every connection

Jan Wieck wrote:

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual stats
file will be created, written to, renamed, live for 500ms and be
thrown away by the next stat files rename operation. I would assume
that with a decent filesystem and appropriate OS buffers, none of the
data blocks of most stat files even hit the disk. I must be missing
something.

(From someone who is at best semi-informed.) Unless battery-backed
cache is available, we are advised to run with fsync enabled. Wouldn't
that affect the stats files as well?

--
Guy Rouillier

#15Phil Endecott
spam_from_postgresql_general@chezphil.org
In reply to: Guy Rouillier (#14)
Re: Megabytes of stats saved after every connection

This is Linux 2.4.26 and an ext3 filesystem.

With the dir_index feature or without?

With, I believe. It is enabled in the superblock (tune2fs -O dir_index)
but this was not done when the filesystem was created so only new
directories are indexed I think. I don't think there's a way to index
an existing directory on a mounted filesystem, or to tell if a
particular directory is indexed. I created new directories for my
postgres data and moved the files into them in the hope that they would
then have indexes, but am not sure how to check.
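Two commands that may help here (the device name is a placeholder, and
e2fsck must be run on an unmounted filesystem):

```shell
# Is the feature flag set at all?
tune2fs -l /dev/hda1 | grep -i dir_index

# Rebuild (and, where supported, index) existing directories offline.
e2fsck -fD /dev/hda1
```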

In any case, this does not seem to be a bottleneck.

--Phil.

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Guy Rouillier (#14)
Re: Megabytes of stats saved after every connection

On Thu, Jul 28, 2005 at 05:48:21PM -0500, Guy Rouillier wrote:

Jan Wieck wrote:

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual stats
file will be created, written to, renamed, live for 500ms and be
thrown away by the next stat files rename operation. I would assume
that with a decent filesystem and appropriate OS buffers, none of the
data blocks of most stat files even hit the disk. I must be missing
something.

(From someone who is at best semi-informed).. Unless battery-backed
cache is available, we are advised to run with fsync enabled. Wouldn't
that affect the stats files as well?

The stats file is dispensable. In fact, it has been proposed that on
crash recovery the stat file should be deleted.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Y eso te lo doy firmado con mis l�grimas" (Fiebre del Loco)

#17Steve Atkins
steve@blighty.com
In reply to: Greg Stark (#9)
Re: Megabytes of stats saved after every connection

On Thu, Jul 28, 2005 at 03:12:33PM -0400, Greg Stark wrote:

I think occasionally people get bitten by not having their pg_* tables being
vacuumed or analyzed regularly. If you have lots of tables and the stats are
never updated for pg_class or related tables you can find the planner taking a
long time to plan queries.

This happens if you schedule a cron job to do your vacuuming and analyzing but
connect as a user other than the database owner. For example, you leave the
database owned by "postgres" but create a user to own all the tables and use
that to run regularly scheduled "vacuum analyze"s.

I'm not sure how often these types of problems get properly diagnosed. The
symptoms are quite mysterious. In retrospect I think I observed something like
it and never figured out what was going on. The problem only went away when I
upgraded the database and went through an initdb cycle.

I've had exactly this problem at least five times, twice on my own
systems and three times that I noticed on customer machines. It's an
easy mistake to make on a system that doesn't have much interactive
use, and if you're creating and dropping a lot of tables it can
devastate your performance after a while.

Cheers,
Steve

#18Jeff
threshar@torgo.978.org
In reply to: Jan Wieck (#5)
Re: Megabytes of stats saved after every connection

On Jul 28, 2005, at 2:40 PM, Jan Wieck wrote:

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual
stats file will be created, written to,

If one is running with stats_reset_on_server_start true (the default)
do we even need this file if it is never read?

To help alleviate his problem could he symlink it to /dev/null?

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff (#18)
Re: Megabytes of stats saved after every connection

On Fri, Jul 29, 2005 at 09:08:28AM -0400, Jeff Trout wrote:

On Jul 28, 2005, at 2:40 PM, Jan Wieck wrote:

Then again, the stats file is only written. There is nothing that
actually forces the blocks out. On a busy system, one individual
stats file will be created, written to,

If one is running with stats_reset_on_server_start true (the default)
do we even need this file if it is never read?

To help alleviate his problem could he symlink it to /dev/null?

If you don't want the stat collector to run, you can disable it. No
need to play games with a useless pgstat file.

Anyway -- I see this as an argument in favor of the ability to
deactivate stats on a per-database basis.
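For reference, the relevant 7.4-era postgresql.conf knobs (shown with
illustrative values):

```
# Turn the collector off entirely: no collector process, no pgstat.stat.
stats_start_collector = false

# Or keep the collector but drop the expensive per-table counters:
stats_command_string = false
stats_block_level = false
stats_row_level = false        # this one feeds autovacuum-style tools
```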

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them." (Freeman Dyson)

#20Greg Stark
gsstark@mit.edu
In reply to: Phil Endecott (#10)
Re: Megabytes of stats saved after every connection

Phil Endecott <spam_from_postgresql_general@chezphil.org> writes:

Hello again,

Just to give a bit of background, in case it is useful: this is my family tree
website, treefic.com. I have a schema for each user, each with about a dozen
tables. In most cases the tables are small, i.e. tens of entries, but the
users I care about are the ones with tens of thousands of people in their
trees. The schemas are independent of each other. Example web page:

I would strongly suggest you reconsider this design altogether. A normal (and
normalized) design would have a users table that assigns a sequential id to
each user. Then every other table would combine everybody's data but have a
user id column to indicate which user that row belonged to.

If you don't believe there's anything wrong with your current system, consider
what it would look like to query your existing schema to find out the answer
to the question "how many users have > 1000 people in their tree". Or "how
many users have updated their tree in the last 7 days".

In a normalized database you really want one table for any given type of data.
Not hundreds of tables that contain the same type of data but for different
people.
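A hypothetical sketch of the normalized layout being described (table
and column names invented for illustration):

```shell
psql treefic <<'SQL'
CREATE TABLE users  (user_id serial PRIMARY KEY, name text NOT NULL);
CREATE TABLE people (person_id serial PRIMARY KEY,
                     user_id   int  NOT NULL REFERENCES users,
                     name      text);

-- "How many users have > 1000 people in their tree?" is one query:
SELECT count(*)
  FROM (SELECT user_id FROM people
         GROUP BY user_id
        HAVING count(*) > 1000) AS big_trees;
SQL
```

With one schema per user, answering the same question means iterating
over tens of thousands of per-user tables.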

--
greg

#21Phil Endecott
spam_from_postgresql_general@chezphil.org
In reply to: Greg Stark (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Phil Endecott (#21)
#23Phil Endecott
spam_from_postgresql_general@chezphil.org
In reply to: Bruce Momjian (#22)
#24Bruce Momjian
bruce@momjian.us
In reply to: Phil Endecott (#23)