pg_stat_*_columns?

Started by Joel Jacobson · almost 11 years ago · 39 messages · hackers
#1Joel Jacobson
joel@trustly.com

Would others find it useful to see per column statistics about accesses to
specific columns?

Two possible use-cases: (maybe there are more?)

1. I think it would be helpful for DBAs to better understand their own
system.
Finding unused *tables* is easy today thanks to pg_stat_*_tables, but
knowing whether something is accessing a *column* or not is not easy.
In my case all our database access is via sprocs, so I can just grep the
source code for the column name to see if something is using it, but most
DBAs probably don't have that luxury.

2. It could also be useful for audit trailing, if you want to know what a
query did, i.e. what tables/columns were accessed in the txn.

Here is an idea of a very simple audit trailing system that would probably
fulfill my own needs:

Imagine if we had pg_stat_xact_user_columns and, for each committed txn,
did an insert into an unlogged table with the same structure as
pg_stat_xact_user_columns, with the addition of the session_user and a
timestamp for the txn.

I would much rather have audit trailing in a nice table than in a text
file. Maybe a foreign data wrapper could be used to ship the audit trail
data to some other external append-only pg-database, if the purpose of the
audit trailing is to prevent an evil DBA from doing evil things. But for
others it might be sufficient to do audit trailing to the same database,
for convenience purposes.

In summary:

Some users might only be interested in the statistics and mostly use
pg_stat_user_columns.
Other users might also be interested in what happened in a specific txn
and use pg_stat_xact_user_columns.
Yet other users might be interested in audit trailing and want to log
pg_stat_xact_user_columns for each txn. That would probably be very
expensive performance-wise, but might make sense if you have extremely
sensitive data and audit trailing is more important than performance.

Thoughts?

#2Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Joel Jacobson (#1)
Re: pg_stat_*_columns?

On 6/5/15 6:51 AM, Joel Jacobson wrote:

1. I think it would be helpful for DBAs to better understand their own
system.
Finding unused *tables* is easy today thanks to pg_stat_*_tables, but
knowing whether something is accessing a *column* or not is not easy.
In my case all our database access is via sprocs, so I can just grep the
source code for the column name to see if something is using it, but
most DBAs probably don't have that luxury.

I have wanted this exact thing when considering vertical partitioning.
It's easy to tell from a size standpoint what columns are good
candidates for putting in a 'side table', but it's very hard to know how
often columns are actually used.

BTW, I think the right way to measure this would be how many rows were
returned for queries referencing a column. Simply knowing how many
queries reference a column doesn't tell you much; you want to know how
much column data was actually pulled out.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Joel Jacobson (#1)
Re: pg_stat_*_columns?

On Fri, Jun 5, 2015 at 7:51 AM, Joel Jacobson <joel@trustly.com> wrote:

Would others find it useful to see per column statistics about accesses to
specific columns?

A probably serious and maybe not entirely obvious problem with this is
that it would increase the amount of statistical information we keep
by a pretty large multiple. How many columns do you have in a typical
table? 20-30? That's a lot of data for a statistics collector that,
regrettably, is already overloaded.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#4Joel Jacobson
joel@trustly.com
In reply to: Robert Haas (#3)
Re: pg_stat_*_columns?

So I've heard from Magnus Hagander today IRL at our Stockholm PostgreSQL
User Group meeting where we discussed this idea. He told me the overhead in
the statistics collector is mainly when reading from it, not that much when
writing to it.
Magnus's idea was to first optimize the collector to make it less of a
problem to collect more data. Sounds like a good thing to do, but maybe
more data in it wouldn't be a problem as long as you don't read too often
from it?

On Mon 8 Jun 2015 at 18:48 Robert Haas <robertmhaas@gmail.com> wrote:


On Fri, Jun 5, 2015 at 7:51 AM, Joel Jacobson <joel@trustly.com> wrote:

Would others find it useful to see per column statistics about accesses
to specific columns?

A probably serious and maybe not entirely obvious problem with this is
that it would increase the amount of statistical information we keep
by a pretty large multiple. How many columns do you have in a typical
table? 20-30? That's a lot of data for a statistics collector that,
regrettably, is already overloaded.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Joel Jacobson (#4)
Re: pg_stat_*_columns?

On 6/8/15 3:26 PM, Joel Jacobson wrote:

So I've heard from Magnus Hagander today IRL at our Stockholm PostgreSQL
User Group meeting where we discussed this idea. He told me the overhead
in the statistics collector is mainly when reading from it, not that
much when writing to it.

I've heard enough stories of people moving the stats files to faster
storage that I'm not sure how true that really is...

Magnus's idea was to first optimize the collector to make it less of a
problem to collect more data. Sounds like a good thing to do, but maybe
more data in it wouldn't be a problem as long as you don't read too
often from it?

The stats collector is a known problem under certain circumstances, so
improving it would probably be a good thing. The first thing that comes
to mind is splitting it into more files.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


#6Joel Jacobson
joel@trustly.com
In reply to: Jim Nasby (#5)
Re: pg_stat_*_columns?

On Tue, Jun 16, 2015 at 4:47 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Magnus's idea was to first optimize the collector to make it less of a
problem to collect more data. Sounds like a good thing to do, but maybe
more data in it wouldn't be a problem as long as you don't read too
often from it?

The stats collector is a known problem under certain circumstances, so
improving it would probably be a good thing. The first thing that comes to
mind is splitting it into more files.

Is there any chance the project would accept a patch which adds the
pg_stat_*_columns-feature without first optimizing the collector? I guess
it primarily depends on how much of the new code would need to be
rewritten, if the collector is optimized/rewritten in the future?

(I would be interested in sponsoring the work, if anyone is interested.)

#7Robert Haas
robertmhaas@gmail.com
In reply to: Joel Jacobson (#6)
Re: pg_stat_*_columns?

On Sat, Jun 20, 2015 at 10:12 AM, Joel Jacobson <joel@trustly.com> wrote:

Is there any chance the project would accept a patch which adds the
pg_stat_*_columns-feature without first optimizing the collector?

I doubt it. It's such a pain point already that massively increasing
the amount of data we need to store does not seem like a good plan.

I guess it primarily depends on how much of the new code would need to
be rewritten, if the collector is optimized/rewritten in the future?

I don't think that's really the issue. It's more that I think the extra
data would be likely to cause real pain for users.

FWIW, I tend to think that the solution here has less to do with
splitting the data up into more files and more to do with rethinking
the format.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: pg_stat_*_columns?

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jun 20, 2015 at 10:12 AM, Joel Jacobson <joel@trustly.com> wrote:

I guess it primarily depends on how much of the new code would need to
be rewritten, if the collector is optimized/rewritten in the future?

I don't think that's really the issue. It's more that I think the extra
data would be likely to cause real pain for users.

Yes. The stats data already causes real pain.

FWIW, I tend to think that the solution here has less to do with
splitting the data up into more files and more to do with rethinking
the format.

I dunno that tweaking the format would accomplish much. Where I'd love
to get to is to not have to write the data to disk at all (except at
shutdown). But that seems to require an adjustable-size shared memory
block, and I'm not sure how to do that. One idea, if the DSM stuff
could be used, is to allow the stats collector to allocate multiple
DSM blocks as needed --- but how well would that work on 32-bit
machines? I'd be worried about running out of address space.

regards, tom lane


#9Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#8)
Re: pg_stat_*_columns?

On Sat, Jun 20, 2015 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jun 20, 2015 at 10:12 AM, Joel Jacobson <joel@trustly.com> wrote:

I guess it primarily depends on how much of the new code would need to
be rewritten, if the collector is optimized/rewritten in the future?

I don't think that's really the issue. It's more that I think the extra
data would be likely to cause real pain for users.

Yes. The stats data already causes real pain.

FWIW, I tend to think that the solution here has less to do with
splitting the data up into more files and more to do with rethinking
the format.

I dunno that tweaking the format would accomplish much. Where I'd love
to get to is to not have to write the data to disk at all (except at
shutdown). But that seems to require an adjustable-size shared memory
block, and I'm not sure how to do that. One idea, if the DSM stuff
could be used, is to allow the stats collector to allocate multiple
DSM blocks as needed --- but how well would that work on 32-bit
machines? I'd be worried about running out of address space.

I've considered both that and perhaps using a shared memory message queue
to communicate. Basically, have a backend send a request when it needs a
snapshot of the stats data and get a copy back through that method instead
of disk. It would be much easier if we didn't actually take a snapshot of
the data per transaction, but we really don't want to give that up (if we
didn't care about that, we could just have a protocol asking for individual
values).

We'd need a way to actually transfer the whole hashtables over, without
rebuilding them on the other end I think. Just the cost of looping over it
to dump and then rehashing everything on the other end seems quite wasteful
and unnecessary.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#9)
Re: pg_stat_*_columns?

Magnus Hagander <magnus@hagander.net> writes:

On Sat, Jun 20, 2015 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I dunno that tweaking the format would accomplish much. Where I'd love
to get to is to not have to write the data to disk at all (except at
shutdown). But that seems to require an adjustable-size shared memory
block, and I'm not sure how to do that. One idea, if the DSM stuff
could be used, is to allow the stats collector to allocate multiple
DSM blocks as needed --- but how well would that work on 32-bit
machines? I'd be worried about running out of address space.

I've considered both that and to perhaps use a shared memory message queue
to communicate. Basically, have a backend send a request when it needs a
snapshot of the stats data and get a copy back through that method instead
of disk. It would be much easier if we didn't actually take a snapshot of
the data per transaction, but we really don't want to give that up (if we
didn't care about that, we could just have a protocol asking for individual
values).

Yeah, that might work quite nicely, and it would not require nearly as
much surgery on the existing code as mapping the stuff into
constrained-size shmem blocks would do. The point about needing a data
snapshot is a good one as well; I'm not sure how we'd preserve that
behavior if backends are accessing the collector's data structures
directly through shmem.

I wonder if we should think about replacing the IP-socket-based data
transmission protocol with a shared memory queue, as well.

We'd need a way to actually transfer the whole hashtables over, without
rebuilding them on the other end I think. Just the cost of looping over it
to dump and then rehashing everything on the other end seems quite wasteful
and unnecessary.

Meh. All of a sudden you've made it complicated and invasive again,
to get rid of a bottleneck that's not been shown to be a problem.
Let's do the simple thing first, else maybe nothing will happen at all.

regards, tom lane


#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#10)
Re: pg_stat_*_columns?

On 06/20/2015 11:32 AM, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Sat, Jun 20, 2015 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I dunno that tweaking the format would accomplish much. Where I'd love
to get to is to not have to write the data to disk at all (except at
shutdown). But that seems to require an adjustable-size shared memory
block, and I'm not sure how to do that. One idea, if the DSM stuff
could be used, is to allow the stats collector to allocate multiple
DSM blocks as needed --- but how well would that work on 32-bit
machines? I'd be worried about running out of address space.

Hmm. A backend already reads all the stats it needs to backend-private
memory in one go. That consumes about as much address space as the DSM
would, no? I guess it'd double the need, because you'd have mapped both
the DSM and the backend-private memory at the same time.

I've considered both that and to perhaps use a shared memory message queue
to communicate. Basically, have a backend send a request when it needs a
snapshot of the stats data and get a copy back through that method instead
of disk. It would be much easier if we didn't actually take a snapshot of
the data per transaction, but we really don't want to give that up (if we
didn't care about that, we could just have a protocol asking for individual
values).

Yeah, that might work quite nicely, and it would not require nearly as
much surgery on the existing code as mapping the stuff into
constrained-size shmem blocks would do. The point about needing a data
snapshot is a good one as well; I'm not sure how we'd preserve that
behavior if backends are accessing the collector's data structures
directly through shmem.

Usually you use a lock for such things ;-). Acquire lock, copy to
private memory, unlock. If that's too slow, have two copies of the
structure in shared memory, one that's being updated, and one that's
being read, and swap them periodically. Or something like that - this
doesn't seem particularly hard.

I wonder if we should think about replacing the IP-socket-based data
transmission protocol with a shared memory queue, as well.

One problem is that the archiver process is not connected to shared
memory, but calls pgstat_send_archiver() to update the stats whenever it
has archived a file.

If we nevertheless replace the files with dynamic shared memory, and
switch to using shared memory queues for communication, ISTM we might as
well have all the backends update the shared memory directly, and get
rid of the stats collector process altogether. If we didn't already have
a stats collector, that certainly seems like a more straightforward design.

Let's do the simple thing first, else maybe nothing will happen at all.

Yeah, there's that..

- Heikki


#12Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#9)
Re: pg_stat_*_columns?

On Sat, Jun 20, 2015 at 11:15 AM, Magnus Hagander <magnus@hagander.net> wrote:

I've considered both that and to perhaps use a shared memory message queue
to communicate. Basically, have a backend send a request when it needs a
snapshot of the stats data and get a copy back through that method instead
of disk. It would be much easier if we didn't actually take a snapshot of
the data per transaction, but we really don't want to give that up (if we
didn't care about that, we could just have a protocol asking for individual
values).

We'd need a way to actually transfer the whole hashtables over, without
rebuilding them on the other end I think. Just the cost of looping over it
to dump and then rehashing everything on the other end seems quite wasteful
and unnecessary.

One idea would be to advertise a DSM ID in the main shared memory
segment, and have the individual backends read that value and attach
to it. When new stats are generated, the stats collector creates a
new DSM (which might be bigger or smaller than the old one), writes
the new stats in there, and then advertises the new DSM ID in the main
shared memory segment. Backends that still have the old segment
attached can still use it, and it will go away automatically once they
all drop off.

But I'm not sure how this would work with the new per-database split
of the stats file. I don't think it'll work to have one DSM per
database; we don't support enough DSMs for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: pg_stat_*_columns?

Robert Haas <robertmhaas@gmail.com> writes:

One idea would be to advertise a DSM ID in the main shared memory
segment, and have the individual backends read that value and attach
to it. When new stats are generated, the stats collector creates a
new DSM (which might be bigger or smaller than the old one), writes
the new stats in there, and then advertises the new DSM ID in the main
shared memory segment. Backends that still have the old segment
attached can still use it, and it will go away automatically once they
all drop off.

Hmmm. This sounds attractive, but what happens if we fail to create
a new DSM when needed?

But I'm not sure how this would work with the new per-database split
of the stats file. I don't think it'll work to have one DSM per
database; we don't support enough DSMs for that.

AFAIR, that per-database split exists only to try to reduce the amount of
traffic written to disk. We could lose it cheerfully if the communication
all happens in shared memory.

regards, tom lane


#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: pg_stat_*_columns?

On Sat, Jun 20, 2015 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

One idea would be to advertise a DSM ID in the main shared memory
segment, and have the individual backends read that value and attach
to it. When new stats are generated, the stats collector creates a
new DSM (which might be bigger or smaller than the old one), writes
the new stats in there, and then advertises the new DSM ID in the main
shared memory segment. Backends that still have the old segment
attached can still use it, and it will go away automatically once they
all drop off.

Hmmm. This sounds attractive, but what happens if we fail to create
a new DSM when needed?

Presumably you keep using the old one and retry later. I mean, out of
memory is out of memory; you can't move a variable-size data structure
into shared memory without the possibility of running into OOM
failures at some point.

But I'm not sure how this would work with the new per-database split
of the stats file. I don't think it'll work to have one DSM per
database; we don't support enough DSMs for that.

AFAIR, that per-database split exists only to try to reduce the amount of
traffic written to disk. We could lose it cheerfully if the communication
all happens in shared memory.

If we arranged things so that the processes could use the data in the
DSM directly rather than having to copy it out, we'd presumably save
quite a bit of memory, since the whole structure would be shared
rather than each backend having its own copy. But if the structure
got too big to map (on a 32-bit system), then you'd be sort of hosed,
because there's no way to attach just part of it. That might not be
worth worrying about, but it depends on how big it's likely to get - a
32-bit system is very likely to choke on a 1GB mapping, and maybe even
on a much smaller one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#15Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#13)
Re: pg_stat_*_columns?

Hi,

On 06/21/2015 12:15 AM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

One idea would be to advertise a DSM ID in the main shared memory
segment, and have the individual backends read that value and attach
to it. When new stats are generated, the stats collector creates a
new DSM (which might be bigger or smaller than the old one), writes
the new stats in there, and then advertises the new DSM ID in the main
shared memory segment. Backends that still have the old segment
attached can still use it, and it will go away automatically once they
all drop off.

Hmmm. This sounds attractive, but what happens if we fail to create
a new DSM when needed?

Also, isn't this a potential problem with long-running backends? We
might ultimately end up with each backend using a different DSM segment.

But I'm not sure how this would work with the new per-database
split of the stats file. I don't think it'll work to have one DSM
per database; we don't support enough DSMs for that.

AFAIR, that per-database split exists only to try to reduce the
amount of traffic written to disk. We could lose it cheerfully if the
communication all happens in shared memory.

Yes, reducing the amount of writes is one of the benefits of the
per-database split, but there are other benefits too. Obviously, it also
reduces the amount of reads (because each backend can read just the
right portion of stats) and the CPU time spent parsing (because you're
parsing much less data). But I don't see why we couldn't partition the
stats in DSM in a similar manner.

kind regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: pg_stat_*_columns?

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jun 20, 2015 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmmm. This sounds attractive, but what happens if we fail to create
a new DSM when needed?

Presumably you keep using the old one and retry later. I mean, out of
memory is out of memory; you can't move a variable-size data structure
into shared memory without the possibility of running into OOM
failures at some point.

Right, which is one reason that Magnus' thought of replacing the file
transmission mechanism with a message queue might be a good alternative.

If we arranged things so that the processes could use the data in the
DSM directly rather than having to copy it out, we'd presumably save
quite a bit of memory, since the whole structure would be shared
rather than each backend having its own copy.

That is not going to happen, because it would imply locking out the stats
collector from doing any more updates for the entire time that any backend
is looking at the results. We *do* need to copy.

But if the structure
got too big to map (on a 32-bit system), then you'd be sort of hosed,
because there's no way to attach just part of it. That might not be
worth worrying about, but it depends on how big it's likely to get - a
32-bit system is very likely to choke on a 1GB mapping, and maybe even
on a much smaller one.

Yeah, I'm quite worried about assuming that we can map a data structure
that might be of very significant size into shared memory on 32-bit
machines. The address space just isn't there.

regards, tom lane


#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#14)
Re: pg_stat_*_columns?

Robert Haas wrote:

If we arranged things so that the processes could use the data in the
DSM directly rather than having to copy it out, we'd presumably save
quite a bit of memory, since the whole structure would be shared
rather than each backend having its own copy. But if the structure
got too big to map (on a 32-bit system), then you'd be sort of hosed,
because there's no way to attach just part of it. That might not be
worth worrying about, but it depends on how big it's likely to get - a
32-bit system is very likely to choke on a 1GB mapping, and maybe even
on a much smaller one.

How realistic is it that you would get a 1 GB mapping on a 32-bit
system? Each table entry is 106 bytes at the moment if my count is
right, so you need about ten million tables to get that large a table.
It doesn't sound really realistic to have such a database on a smallish
machine.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#18Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
Re: pg_stat_*_columns?

On Sat, Jun 20, 2015 at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we arranged things so that the processes could use the data in the
DSM directly rather than having to copy it out, we'd presumably save
quite a bit of memory, since the whole structure would be shared
rather than each backend having its own copy.

That is not going to happen, because it would imply locking out the stats
collector from doing any more updates for the entire time that any backend
is looking at the results. We *do* need to copy.

No, it doesn't mean that at all. As often as needed, the stats
collector would build and publish a new copy of the data in a new DSM
segment. The old one would stick around until the last reader drops
off. So it would be almost totally lock-free. That's a very appealing
design in my book.

Stuffing all of the data through a message queue will be very
inefficient by comparison. If the queue is big enough to fit all the
info in a single chunk, it will be nice and fast. If not, the process
receiving the data will have to read a chunk and then go to sleep
while it waits for the next chunk to be sent. Even if you make the
message queue pretty big, like 1MB, a large stats file, say 80MB, will
require ~160 context switches to transfer. That's probably going to
suck, especially if the stats collector does anything other than
trying to service writes to the queues.

But if the structure
got too big to map (on a 32-bit system), then you'd be sort of hosed,
because there's no way to attach just part of it. That might not be
worth worrying about, but it depends on how big it's likely to get - a
32-bit system is very likely to choke on a 1GB mapping, and maybe even
on a much smaller one.

Yeah, I'm quite worried about assuming that we can map a data structure
that might be of very significant size into shared memory on 32-bit
machines. The address space just isn't there.

Considering the advantages of avoiding message queues, I think we
should think a little bit harder about whether we can't find some way
to skin this cat. As I think about this a little more, I'm not sure
there's really a problem with one stats DSM per database. Sure, the
system might have 100,000 databases in some crazy pathological case,
but the maximum number of those that can be in use is bounded by
max_connections, which means the maximum number of stats file DSMs we
could ever need at one time is also bounded by max_connections. There
are a few corner cases to think about, like if the user writes a
client that connects to all 100,000 databases in very quick
succession, we've got to jettison the old DSMs fast enough to make
room for the new DSMs before we run out of slots, but that doesn't
seem like a particularly tough nut to crack. If the stats collector
ensures that it never attaches to more than MaxBackends stats DSMs at
a time, and each backend ensures that it never attaches to more than
one stats DSM at a time, then 2 * MaxBackends stats DSMs is always
enough. And that's just a matter of bumping
PG_DYNSHMEM_SLOTS_PER_BACKEND from 2 to 4.

In more realistic cases, it will probably be normal for many or all
backends to be connected to the same database, and the number of stats
DSMs required will be far smaller.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#19Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#18)
Re: pg_stat_*_columns?

On Sat, Jun 20, 2015 at 11:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Sat, Jun 20, 2015 at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> But if the structure
>>> got too big to map (on a 32-bit system), then you'd be sort of hosed,
>>> because there's no way to attach just part of it. That might not be
>>> worth worrying about, but it depends on how big it's likely to get - a
>>> 32-bit system is very likely to choke on a 1GB mapping, and maybe even
>>> on a much smaller one.
>>
>> Yeah, I'm quite worried about assuming that we can map a data structure
>> that might be of very significant size into shared memory on 32-bit
>> machines. The address space just isn't there.
>
> Considering the advantages of avoiding message queues, I think we
> should think a little bit harder about whether we can't find some way
> to skin this cat. As I think about this a little more, I'm not sure
> there's really a problem with one stats DSM per database. Sure, the
> system might have 100,000 databases in some crazy pathological case,
> but the maximum number of those that can be in use is bounded by
> max_connections, which means the maximum number of stats file DSMs we
> could ever need at one time is also bounded by max_connections. There
> are a few corner cases to think about, like if the user writes a
> client that connects to all 100,000 databases in very quick
> succession, we've got to jettison the old DSMs fast enough to make
> room for the new DSMs before we run out of slots, but that doesn't
> seem like a particularly tough nut to crack. If the stats collector
> ensures that it never attaches to more than MaxBackends stats DSMs at
> a time, and each backend ensures that it never attaches to more than
> one stats DSM at a time, then 2 * MaxBackends stats DSMs is always
> enough. And that's just a matter of bumping
> PG_DYNSHMEM_SLOTS_PER_BACKEND from 2 to 4.
>
> In more realistic cases, it will probably be normal for many or all
> backends to be connected to the same database, and the number of stats
> DSMs required will be far smaller.

What about a combination along the lines of the following: the stats
collector keeps the statistics in local memory as before. But when a
backend needs to get a snapshot of its data, it uses a shared memory queue
to request it. What the stats collector does in this case is allocate a new
DSM, copy the data into that DSM, and hand the DSM over to the backend. At
that point the stats collector can forget about it, and it's up to the
backend to get rid of it when it's done with it.

That means the address space situation should not be any worse than today,
because each backend will still only see "its own data". And we only need
to copy the data for databases that are actually used.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#20Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#8)
Re: pg_stat_*_columns?

On 2015-06-20 10:55:03 -0400, Tom Lane wrote:

> I dunno that tweaking the format would accomplish much. Where I'd love
> to get to is to not have to write the data to disk at all (except at
> shutdown). But that seems to require an adjustable-size shared memory
> block, and I'm not sure how to do that. One idea, if the DSM stuff
> could be used, is to allow the stats collector to allocate multiple
> DSM blocks as needed --- but how well would that work on 32-bit
> machines? I'd be worried about running out of address space.

We could also just mmap() the stats file into memory in various
processes. With a bit of care it should be quite possible to mmap only a
subset of the file at once, taking care of the address space issues.

There'll be some interesting problems to solve for both DSM- and
mmap-based solutions to make the locking work nicely. I guess most of it
should be doable quite sensibly using atomics; but the emulation on
older platforms might mean that we'd need to serialize at restarts.


#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#20)
#22Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#19)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#22)
#25Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#23)
#26Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#24)
#27Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#24)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#27)
#29Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#28)
#30Jeff Janes
jeff.janes@gmail.com
In reply to: Jim Nasby (#5)
#31Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#7)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#31)
#33Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Jeff Janes (#30)
#34Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tomas Vondra (#33)
#35Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Jim Nasby (#34)
#36Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Joel Jacobson (#1)
#37Joel Jacobson
joel@trustly.com
In reply to: Jim Nasby (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Jim Nasby (#36)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#37)