[rfc] overhauling pgstat.stat

Started by Satoshi Nagayasu · over 12 years ago · 31 messages · pgsql-hackers
#1 Satoshi Nagayasu
snaga@uptime.jp

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Any comments or suggestions?

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Atri Sharma
atri.jiit@gmail.com
In reply to: Satoshi Nagayasu (#1)
Re: [rfc] overhauling pgstat.stat

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or to reduce the I/O in general?

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

--
Regards,

Atri
l'apprenant


#3 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Satoshi Nagayasu (#1)
Re: [rfc] overhauling pgstat.stat

Satoshi Nagayasu wrote:

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

We already changed it:

commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon Feb 18 17:56:08 2013 -0300

Split pgstat file in smaller pieces

We now write one file per database and one global file, instead of
having the whole thing in a single huge file. This reduces the I/O that
must be done when partial data is required -- which is all the time,
because each process only needs information on its own database anyway.
Also, the autovacuum launcher does not need data about tables and
functions in each database; having the global stats for all DBs is
enough.

Catalog version bumped because we have a new subdir under PGDATA.

Author: Tomas Vondra. Some rework by Álvaro
Testing by Jeff Janes
Other discussion by Heikki Linnakangas, Tom Lane.

I don't oppose further tweaking, of course, but I wonder if you are
considering these changes.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#4 Satoshi Nagayasu
snaga@uptime.jp
In reply to: Alvaro Herrera (#3)
Re: [rfc] overhauling pgstat.stat

Hi,

(2013/09/04 13:07), Alvaro Herrera wrote:

Satoshi Nagayasu wrote:

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

We already changed it:

commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon Feb 18 17:56:08 2013 -0300

Split pgstat file in smaller pieces

Thanks for the comments. I forgot to mention that.

Yes, we have already split the single pgstat.stat file into
several pieces.

However, we still need to read/write a large amount of statistics
data when we have a large number of tables in a single database,
or multiple databases being accessed. Right?

I think the issue here is that it is necessary to write/read
statistics data even when it hasn't actually changed.

So, I'm wondering how we can minimize read/write operations
on these statistics data files by using a heap and a btree.

Regards,

commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon Feb 18 17:56:08 2013 -0300

Split pgstat file in smaller pieces

We now write one file per database and one global file, instead of
having the whole thing in a single huge file. This reduces the I/O that
must be done when partial data is required -- which is all the time,
because each process only needs information on its own database anyway.
Also, the autovacuum launcher does not need data about tables and
functions in each database; having the global stats for all DBs is
enough.

Catalog version bumped because we have a new subdir under PGDATA.

Author: Tomas Vondra. Some rework by Álvaro
Testing by Jeff Janes
Other discussion by Heikki Linnakangas, Tom Lane.

I don't oppose further tweaking, of course, but I wonder if you are
considering these changes.

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


#5 Satoshi Nagayasu
snaga@uptime.jp
In reply to: Atri Sharma (#2)
Re: [rfc] overhauling pgstat.stat

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


#6 Atri Sharma
atri.jiit@gmail.com
In reply to: Satoshi Nagayasu (#5)
Re: [rfc] overhauling pgstat.stat

Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

Right, thanks.

How would using a heap and a BTree help here? Are we looking at a priority queue supporting the main storage system of the stats?

Regards,

Atri


#7 Satoshi Nagayasu
snaga@uptime.jp
In reply to: Atri Sharma (#6)
Re: [rfc] overhauling pgstat.stat

(2013/09/04 15:23), Atri Sharma wrote:

Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

Right, thanks.

How would using a heap and a BTree help here? Are we looking at a priority queue supporting the main storage system of the stats?

For example, when you read only a single block from your table,
you then need to write all the values in your database statistics.
This often generates a large amount of I/O.

However, if random access is allowed in the statistics file, you can
update only a single record for the specific table which you read.
That would be less than 100 bytes for each table.

I have no idea how a priority queue could work here so far.
However, if the statistics are overhauled, PostgreSQL would be able
to host a much larger number of customers more easily.
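To make the idea concrete, here is a minimal sketch of such an in-place, fixed-size record update. This is purely illustrative C; the struct layout, field names, and `update_entry` function are made up for this example, not actual PostgreSQL code:

```c
/* Hypothetical sketch: update a single fixed-size stats entry in place,
 * instead of rewriting the whole stats file. Names and layout are
 * illustrative only, not actual PostgreSQL code. */
#include <stdio.h>
#include <stdint.h>

typedef struct StatEntry {
    uint32_t table_oid;     /* which table this entry describes */
    uint64_t blocks_read;   /* counters to be bumped in place   */
    uint64_t tuples_read;
} StatEntry;

/* Overwrite one entry at a known file offset (a few dozen bytes),
 * rather than rewriting hundreds of MB of statistics. */
int update_entry(FILE *f, long offset, const StatEntry *e)
{
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    return fwrite(e, sizeof(StatEntry), 1, f) == 1 ? 0 : -1;
}
```

With a fixed-size entry and a known offset, bumping one table's counters touches well under 100 bytes instead of the entire file.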

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


#8 Stephen Frost
sfrost@snowman.net
In reply to: Satoshi Nagayasu (#4)
Re: [rfc] overhauling pgstat.stat

Satoshi,

* Satoshi Nagayasu (snaga@uptime.jp) wrote:

(2013/09/04 13:07), Alvaro Herrera wrote:

We already changed it:

commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon Feb 18 17:56:08 2013 -0300

Split pgstat file in smaller pieces

Thanks for the comments. I forgot to mention that.

Yes, we have already split single pgstat.stat file into
several pieces.

However, we still need to read/write a large amount of statistics
data when we have a large number of tables in a single database,
or multiple databases being accessed. Right?

Would simply also splitting per tablespace help?

I think the issue here is that it is necessary to write/read
statistics data even when it hasn't actually changed.

So, I'm wondering how we can minimize read/write operations
on these statistics data files by using a heap and a btree.

It does sound like an interesting idea to use a heap/btree instead, but I
wonder about the effort involved, particularly around coordinating
access. We wouldn't want to end up introducing additional contention
points by doing this...

Thanks,

Stephen

#9 Atri Sharma
atri.jiit@gmail.com
In reply to: Satoshi Nagayasu (#7)
Re: [rfc] overhauling pgstat.stat

Sent from my iPad

On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga@uptime.jp> wrote:

(2013/09/04 15:23), Atri Sharma wrote:

Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

Right, thanks.

How would using a heap and a BTree help here? Are we looking at a priority queue supporting the main storage system of the stats?

For example, when you read only a single block from your table,
you then need to write all the values in your database statistics.
This often generates a large amount of I/O.

However, if random access is allowed in the statistics file, you can
update only a single record for the specific table which you read.
That would be less than 100 bytes for each table.

I have no idea how a priority queue could work here so far.
However, if the statistics are overhauled, PostgreSQL would be able
to host a much larger number of customers

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree would sufficiently isolate per-table stats here and allow for random access.

Another thing I can think of is having a write-back cache which could probably be used as a buffer before the actual stats write. I am just musing here though.
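For what it's worth, the write-back cache idea could be sketched roughly like this. Hypothetical C with made-up names; the "flush" is reduced to a counter for illustration, where a real implementation would write the dirty entries out in one pass:

```c
/* Hypothetical sketch of a write-back cache for stats updates: batch
 * counter bumps in memory and flush them together once the cache fills,
 * so each individual bump does not hit disk. Illustrative only. */
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8

typedef struct {
    uint32_t table_oid;
    uint64_t pending_reads;   /* reads accumulated since the last flush */
    int      dirty;
} CacheSlot;

typedef struct {
    CacheSlot slots[CACHE_SLOTS];
    int       flushes;        /* how many times we wrote to "disk" */
} StatsCache;

/* Record one read for a table; flush (here: just count and clear)
 * when the cache is full and no slot is free. */
void record_read(StatsCache *c, uint32_t oid)
{
    int free_slot = -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].dirty && c->slots[i].table_oid == oid) {
            c->slots[i].pending_reads++;   /* coalesce with pending entry */
            return;
        }
        if (!c->slots[i].dirty && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0) {                   /* cache full: flush everything */
        c->flushes++;
        memset(c->slots, 0, sizeof c->slots);
        free_slot = 0;
    }
    c->slots[free_slot].table_oid = oid;
    c->slots[free_slot].pending_reads = 1;
    c->slots[free_slot].dirty = 1;
}
```

The point of the design is coalescing: many bumps to the same table collapse into one pending entry, so the eventual write is one batch instead of one write per access.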

Regards,

Atri


#10 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Atri Sharma (#9)
Re: [rfc] overhauling pgstat.stat

2013/9/4 Atri Sharma <atri.jiit@gmail.com>

Sent from my iPad

On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga@uptime.jp> wrote:

(2013/09/04 15:23), Atri Sharma wrote:

Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

Right, thanks.

How would using a heap and a BTree help here? Are we looking at a priority queue supporting the main storage system of the stats?

For example, when you read only a single block from your table,
you then need to write all the values in your database statistics.
This often generates a large amount of I/O.

However, if random access is allowed in the statistics file, you can
update only a single record for the specific table which you read.
That would be less than 100 bytes for each table.

I have no idea how a priority queue could work here so far.
However, if the statistics are overhauled, PostgreSQL would be able
to host a much larger number of customers

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree would sufficiently isolate per-table stats here and allow for
random access.

Another thing I can think of is having a write-back cache which could
probably be used as a buffer before the actual stats write. I am just
musing here though.

we very successfully use a tmpfs volume for pgstat files (we use a backport
of the multiple-statfile patches from 9.3 to 9.1)

Regards

Pavel


Regards,

Atri


#11 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Atri Sharma (#9)
Re: [rfc] overhauling pgstat.stat

2013/9/4 Atri Sharma <atri.jiit@gmail.com>

Sent from my iPad

On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga@uptime.jp> wrote:

(2013/09/04 15:23), Atri Sharma wrote:

Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

Right, thanks.

How would using a heap and a BTree help here? Are we looking at a priority queue supporting the main storage system of the stats?

For example, when you read only a single block from your table,
you then need to write all the values in your database statistics.
This often generates a large amount of I/O.

However, if random access is allowed in the statistics file, you can
update only a single record for the specific table which you read.
That would be less than 100 bytes for each table.

I have no idea how a priority queue could work here so far.
However, if the statistics are overhauled, PostgreSQL would be able
to host a much larger number of customers

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree would sufficiently isolate per-table stats here and allow for
random access.

Another thing I can think of is having a write-back cache which could
probably be used as a buffer before the actual stats write. I am just
musing here though.

we very successfully use a tmpfs volume for pgstat files (we use a backport
of the multiple-statfile patches from 9.3 to 9.1)


Regards,

Atri


#12 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Satoshi Nagayasu (#5)
Re: [rfc] overhauling pgstat.stat

On 4.9.2013 07:24, Satoshi Nagayasu wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion
of pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table, for
example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Hi,

as one of the authors of the 9.3 patch (per-database stats), I planned
to work on this a bit more in the 9.4 cycle. So, a few comments / ideas.

I'm not entirely sure splitting the stats per table would be that bad.
After all, we already have per-relation data files, so either the users
already have serious problems (in which case this won't make it noticeably
worse) or it will work fine. But I'm not saying it's the right choice
either.

My idea was to keep the per-database stats, but allow some sort of
"random" access - updating / deleting the records in place, adding
records etc. The simplest way I could think of was adding a simple
"index" - a mapping of OID to position in the stat file.

I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or
something like that. This would make it quite simple to access an
existing record:

1: get the position from the index
2: read sizeof(Entry) bytes from the file
3: for an update, just overwrite the bytes; for a delete, set an
isdeleted flag (which needs to be added to all entries)

or to read all the records (just read the whole file, as today).
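Step 1 of the scheme above could look roughly like this. Illustrative C only; the `IndexEntry` struct, `oid.stat.index` layout, and function names are made up for the sketch:

```c
/* Sketch of a sidecar index of (oid, offset) pairs mapping a table's
 * OID to its entry's byte position in the per-database stat file,
 * enabling in-place reads and updates. Illustrative names only. */
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t oid;     /* table OID                                   */
    long     offset;  /* byte position of its entry in the stat file */
} IndexEntry;

static int cmp_oid(const void *a, const void *b)
{
    uint32_t x = ((const IndexEntry *) a)->oid;
    uint32_t y = ((const IndexEntry *) b)->oid;
    return (x > y) - (x < y);
}

/* Look up an OID's file offset in an index sorted by OID.
 * Returns -1 when the OID has no entry yet. */
long lookup_offset(const IndexEntry *idx, size_t n, uint32_t oid)
{
    IndexEntry key = { oid, 0 };
    IndexEntry *hit = bsearch(&key, idx, n, sizeof(IndexEntry), cmp_oid);
    return hit ? hit->offset : -1;
}
```

Once the offset is known, the update or delete is a single small seek-and-write instead of a rewrite of the whole file.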

regards
Tomas


#13 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Pavel Stehule (#11)
Re: [rfc] overhauling pgstat.stat

On 4.9.2013 14:43, Pavel Stehule wrote:

2013/9/4 Atri Sharma <atri.jiit@gmail.com>

Sent from my iPad

On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga@uptime.jp> wrote:

(2013/09/04 15:23), Atri Sharma wrote:

Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and an entire
pgstat.stat needs to be read/written frequently.

As a result, pgstat.stat often generates massive I/O,
particularly when a database has a large number of tables.

To support multi-tenancy, or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and a btree in pgstat.stat would be preferable, to
reduce reads/writes and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the tables
into account. That's difficult, and it could change in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri

Right, thanks.

How would using a heap and a BTree help here? Are we looking at a priority queue supporting the main storage system of the stats?

For example, when you read only a single block from your table,
then you need to write all values in your database statistics next.
It often generates large amount of i/o operation.

However, if random access is allowed in the statistics, you can
update only as single record for the specific table which you read.
It would be less than 100 bytes for each table.

I have no idea about how a priority queue can work here so far.
However, if the statistics are overhauled, PostgreSQL would be able
to host a much larger number of customers.

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree will sufficiently isolate per-table stats here and
allow for random access.

Another thing I can think of is having a write-back cache which
could probably be used as a buffer before the actual stats write. I
am just musing here though.

we very successfully use a tmpfs volume for pgstat files (we use a
backport of the multiple-statfile feature from 9.3 to 9.1).

It works quite well as long as you have the objects (tables, indexes,
functions) spread across multiple databases. Once you have one database
with a very large number of objects, tmpfs is not as effective.

It's going to help with stats I/O, but it's not going to help with high
CPU usage (you're reading and parsing the stat files over and over) and
every rewrite creates a copy of the file. So if you have 400MB stats,
you will need 800MB tmpfs + some slack (say, 200MB). That means you'll
use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't
be used for page cache etc.

OTOH, it's true that if you have that many objects, 600MB of RAM is not
going to help you anyway.
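The sizing argument above, in numbers (megabytes, using the message's own figures):

```python
# tmpfs sizing for the stats files: a rewrite produces a complete
# second copy of the file before the old one is replaced, so the
# volume must hold both copies plus some slack. Figures in MB are
# the message's examples, not measurements.
stats_file = 400
rewrite_copy = stats_file      # full new copy written during a rewrite
slack = 200
tmpfs_needed = stats_file + rewrite_copy + slack

print(tmpfs_needed)              # 1000 -> ~1 GB tmpfs
print(tmpfs_needed - stats_file) # 600 MB not usable as page cache
```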

Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tomas Vondra (#13)
Re: [rfc] overhauling pgstat.stat

we very successfully use a tmpfs volume for pgstat files (we use a
backport of the multiple-statfile feature from 9.3 to 9.1).

It works quite well as long as you have the objects (tables, indexes,
functions) spread across multiple databases. Once you have one database
with a very large number of objects, tmpfs is not as effective.

It's going to help with stats I/O, but it's not going to help with high
CPU usage (you're reading and parsing the stat files over and over) and
every rewrite creates a copy of the file. So if you have 400MB stats,
you will need 800MB tmpfs + some slack (say, 200MB). That means you'll
use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't
be used for page cache etc.

OTOH, it's true that if you have that many objects, 600MB of RAM is not
going to help you anyway.

And just an idea - could we use a database for storing these files? The
data could be kept in unlogged tables. Second idea - run one bgworker as
a persistent in-memory key-value store, holding the data in memory with
some optimizations - using anti-caching and similar in-memory database
features.

Pavel


Tomas


#15Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#8)
Re: [rfc] overhauling pgstat.stat

On 4.9.2013 14:13, Stephen Frost wrote:

* Satoshi Nagayasu (snaga@uptime.jp) wrote:

Yes, we have already split single pgstat.stat file into several
pieces.

However, we still need to read/write large amount of statistics
data when we have a large number of tables in single database or
multiple databases being accessed. Right?

Would simply also splitting per tablespace help?

I don't think that's a good solution. Forcing the users to use
tablespaces just to minimize the stats overhead is not that far from
forcing them to use multiple databases.

Tablespaces have disadvantages too, so I'd suggest looking for a
solution meeting the "just works" criterion.

I think the issue here is that it is necessary to write/read
statistics data even if it's not actually changed.

So, I'm wondering how we can minimize read/write operations on
these statistics data files with using heap and btree.

It does sound like an interesting idea to use heap/btree instead but
I wonder about the effort involved, particularly around coordinating
access. We wouldn't want to end up introducing additional contention
points by doing this..

Yes, this is definitely an important thing to keep in mind. Maintaining
the "index" (no matter what kind of index will be used) will cause some
contention, but I believe it will be manageable with a bit of effort.

Tomas


#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#12)
Re: [rfc] overhauling pgstat.stat

Tomas Vondra wrote:

My idea was to keep the per-database stats, but allow some sort of
"random" access - updating / deleting the records in place, adding
records etc. The simplest way I could think of was adding a simple
"index" - a mapping of OID to position in the stat file.

I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or
something like that. This would make it quite simple to access an
existing record:

1: get position from the index
2: read sizeof(Entry) from the file
3: if it's update, just overwrite the bytes, for delete set isdeleted
flag (needs to be added to all entries)

or reading all the records (just read the whole file as today).

Sounds reasonable. However, I think the index should be a real index,
i.e. have a tree structure that can be walked down, not just a plain
array. If you have a 400 MB stat file, then you must have about 4
million tables, and you will not want to scan such a large array every
time you want to find an entry.
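A toy illustration of the quoted three steps (the record layout, field names, and file name here are invented for the example; they are not PostgreSQL's actual pgstat.stat format):

```python
# In-place update/delete of fixed-size stats records via an
# oid -> offset "index", per the quoted steps. All names and the
# on-disk layout are hypothetical.
import struct

ENTRY = struct.Struct("<IBQQ")  # (oid, isdeleted flag, blks_read, tuples)

def write_stats(path, rows):
    """Write all records and build the oid -> file-offset index."""
    index = {}
    with open(path, "wb") as f:
        for oid, blks, tups in rows:
            index[oid] = f.tell()
            f.write(ENTRY.pack(oid, 0, blks, tups))
    return index

def update(path, index, oid, blks, tups):
    with open(path, "r+b") as f:
        f.seek(index[oid])                       # 1: position from the index
        f.write(ENTRY.pack(oid, 0, blks, tups))  # 3: overwrite the bytes

def delete(path, index, oid):
    with open(path, "r+b") as f:
        f.seek(index[oid])
        oid_, _, blks, tups = ENTRY.unpack(f.read(ENTRY.size))  # 2: read entry
        f.seek(index[oid])
        f.write(ENTRY.pack(oid_, 1, blks, tups))  # 3: set isdeleted flag

idx = write_stats("db_stats_demo.stat", [(1000, 5, 9), (1001, 2, 3)])
update("db_stats_demo.stat", idx, 1001, 20, 30)  # touches ~21 bytes, not the file
delete("db_stats_demo.stat", idx, 1000)
with open("db_stats_demo.stat", "rb") as f:
    rows = [ENTRY.unpack(f.read(ENTRY.size)) for _ in range(2)]
print(rows)  # [(1000, 1, 5, 9), (1001, 0, 20, 30)]
```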

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#16)
Re: [rfc] overhauling pgstat.stat

On 4.9.2013 20:59, Alvaro Herrera wrote:

Tomas Vondra wrote:

My idea was to keep the per-database stats, but allow some sort of
"random" access - updating / deleting the records in place, adding
records etc. The simplest way I could think of was adding a simple
"index" - a mapping of OID to position in the stat file.

I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or
something like that. This would make it quite simple to access an
existing record:

1: get position from the index
2: read sizeof(Entry) from the file
3: if it's update, just overwrite the bytes, for delete set isdeleted
flag (needs to be added to all entries)

or reading all the records (just read the whole file as today).

Sounds reasonable. However, I think the index should be a real index,
i.e. have a tree structure that can be walked down, not just a plain
array. If you have a 400 MB stat file, then you must have about 4
million tables, and you will not want to scan such a large array every
time you want to find an entry.

I was thinking about a sorted array, so a bisection would be a simple
and fast way to search. New items could be added to another small
unsorted array (say, 1000 elements) and this would be extended and
resorted only when this small one gets full.
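That scheme could look roughly like this (a sketch only; the 1000-element threshold follows the message above, everything else is invented):

```python
# Sorted-main-array plus small-unsorted-overflow index: lookups bisect
# the large sorted array, then linearly scan the small one; the two
# are merged and re-sorted only when the overflow fills up.
import bisect

class StatIndex:
    OVERFLOW_LIMIT = 1000

    def __init__(self):
        self.main = []      # large sorted list of (oid, offset)
        self.overflow = []  # small unsorted list of recent additions

    def add(self, oid, offset):
        self.overflow.append((oid, offset))
        if len(self.overflow) >= self.OVERFLOW_LIMIT:
            # infrequent: extend and re-sort the main array
            self.main = sorted(self.main + self.overflow)
            self.overflow = []

    def find(self, oid):
        i = bisect.bisect_left(self.main, (oid,))  # O(log n) bisection
        if i < len(self.main) and self.main[i][0] == oid:
            return self.main[i][1]
        for o, off in self.overflow:               # short, bounded scan
            if o == oid:
                return off
        return None

idx = StatIndex()
idx.add(16384, 0)
idx.add(16390, 200)
print(idx.find(16390))  # 200 (still in the overflow array)
print(idx.find(99999))  # None
```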

Tomas


#18Satoshi Nagayasu
snaga@uptime.jp
In reply to: Pavel Stehule (#14)
Re: [rfc] overhauling pgstat.stat

(2013/09/05 3:50), Pavel Stehule wrote:

we very successfully use a tmpfs volume for pgstat files (we use a
backport of the multiple-statfile feature from 9.3 to 9.1).

It works quite well as long as you have the objects (tables, indexes,
functions) spread across multiple databases. Once you have one database
with a very large number of objects, tmpfs is not as effective.

It's going to help with stats I/O, but it's not going to help with high
CPU usage (you're reading and parsing the stat files over and over) and
every rewrite creates a copy of the file. So if you have 400MB stats,
you will need 800MB tmpfs + some slack (say, 200MB). That means you'll
use ~1GB tmpfs although 400MB would be just fine. And this 600MB won't
be used for page cache etc.

OTOH, it's true that if you have that many objects, 600MB of RAM is not
going to help you anyway.

And just an idea - could we use a database for storing these files? The
data could be kept in unlogged tables. Second idea - run one bgworker as
a persistent in-memory key-value store, holding the data in memory with
some optimizations - using anti-caching and similar in-memory database
features.

Yeah, I'm interested in this idea too.

If the stat collector has a dedicated connection to the backend in
order to store statistics into dedicated tables, we can easily take
advantage of index (btree or hash?) and heap storage.

Is this worth trying?

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


#19Satoshi Nagayasu
snaga@uptime.jp
In reply to: Alvaro Herrera (#16)
Re: [rfc] overhauling pgstat.stat

(2013/09/05 3:59), Alvaro Herrera wrote:

Tomas Vondra wrote:

My idea was to keep the per-database stats, but allow some sort of
"random" access - updating / deleting the records in place, adding
records etc. The simplest way I could think of was adding a simple
"index" - a mapping of OID to position in the stat file.

I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or
something like that. This would make it quite simple to access an
existing record:

1: get position from the index
2: read sizeof(Entry) from the file
3: if it's update, just overwrite the bytes, for delete set isdeleted
flag (needs to be added to all entries)

or reading all the records (just read the whole file as today).

Sounds reasonable. However, I think the index should be a real index,
i.e. have a tree structure that can be walked down, not just a plain
array. If you have a 400 MB stat file, then you must have about 4
million tables, and you will not want to scan such a large array every
time you want to find an entry.

I thought of an array structure at first.

But, for now, I think we should have a real index for the
statistics data, because we already have several index
implementations, and it will allow us to minimize read/write
operations.

BTW, what kind of index would be preferred for this purpose?
btree or hash?

If we use btree, do we need the "range scan" capability on the
statistics tables? I have no idea so far.

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


#20Atri Sharma
atri.jiit@gmail.com
In reply to: Satoshi Nagayasu (#19)
Re: [rfc] overhauling pgstat.stat


On 05-Sep-2013, at 8:58, Satoshi Nagayasu <snaga@uptime.jp> wrote:

(2013/09/05 3:59), Alvaro Herrera wrote:

Tomas Vondra wrote:

My idea was to keep the per-database stats, but allow some sort of
"random" access - updating / deleting the records in place, adding
records etc. The simplest way I could think of was adding a simple
"index" - a mapping of OID to position in the stat file.

I.e. a simple array of (oid, offset) pairs, stored in oid.stat.index or
something like that. This would make it quite simple to access an
existing record:

1: get position from the index
2: read sizeof(Entry) from the file
3: if it's update, just overwrite the bytes, for delete set isdeleted
flag (needs to be added to all entries)

or reading all the records (just read the whole file as today).

Sounds reasonable. However, I think the index should be a real index,
i.e. have a tree structure that can be walked down, not just a plain
array. If you have a 400 MB stat file, then you must have about 4
million tables, and you will not want to scan such a large array every
time you want to find an entry.

I thought of an array structure at first.

But, for now, I think we should have a real index for the
statistics data, because we already have several index
implementations, and it will allow us to minimize read/write
operations.

BTW, what kind of index would be preferred for this purpose?
btree or hash?

If we use btree, do we need the "range scan" capability on the
statistics tables? I have no idea so far.

The thing I am interested in is range scan. That is the reason I wish to explore range tree usage here, maybe as a secondary index.

Regards,

Atri


#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Satoshi Nagayasu (#19)
#22Atri Sharma
atri.jiit@gmail.com
In reply to: Alvaro Herrera (#21)
#23Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Atri Sharma (#22)
#24Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#21)
#25Atri Sharma
atri.jiit@gmail.com
In reply to: Tomas Vondra (#23)
#26Atri Sharma
atri.jiit@gmail.com
In reply to: Tomas Vondra (#23)
#27Jeff Janes
jeff.janes@gmail.com
In reply to: Satoshi Nagayasu (#4)
#28Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Jeff Janes (#27)
#29Satoshi Nagayasu
snaga@uptime.jp
In reply to: Tomas Vondra (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#23)
#31Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#30)