Enhancement Request

Started by Rob Brucksalmost 10 years ago7 messagesgeneral
Jump to latest
#1Rob Brucks
rob.brucks@rackspace.com

I'd like to propose two enhancements to the PostgreSQL code, but I'm not sure if this is the correct mailing list. So if it's not then please let me know where I need to post this.

These are monitoring-centric enhancement requests since I'm trying to implement accurate monitoring in a secure fashion.

* General monitoring:
We have a need for a "monitoring" role in PostgreSQL that has read-only access to any "pg_stat" view. As of 9.4, only a super-user can read all columns of "pg_stat_activity", "pg_stat_replication", and "pg_stat_archiver" (there may be other restricted views as well). These views provide critical insight on how well the cluster is operating and what is going on.

There appears to be only two ways to gain access to these views:

1. grant super-user to my monitoring user
2. write custom functions owned by a super-user with "SECURITY DEFINER" and grant access to my monitoring user

Option 1 is too big of a security risk to grant to a monitoring user. If the monitoring system gets compromised then the DB will be at risk too.

Option 2 requires creating, deploying, and maintaining sets of monitoring functions, which is a decent chunk of work in a large environment with dozens or hundreds of deployments, many running different versions of postgres possibly needing custom versions of the functions. When you add the bureaucracy of a large IT organization and SOX and PCI compliance requirements it ends up being a PITA implementing or changing these functions when you only have a small DBA team.

* Streaming Replication Monitoring:
Make the "pg_stat_replication" view more persistent (maybe keep the rows for 24 hours or have a registration process?).

There appears to be no way to *accurately* monitor streaming replication via SQL alone currently. This is due to three different problems:

1. When a standby gets disconnected from the master then the corresponding row is immediately removed from pg_stat_replication on the master. Once the row is gone you cannot tell via simple SQL whether a standby is behind or not (or that it even existed at all) without storing prior values and extrapolating from them.
2. On the standby, if the WAL streaming process gets disconnected from the master then it does not report that it is behind because pg_last_xlog_receive_location() has not been updated from the master. The standby has no idea how far ahead the master has gotten and just blindly reports the last value.
3. On a "quiet" system there may not be any update activity on the master for a long time, which makes the pg_last_xact_replay_timestamp() function report an ever-increasing interval. So it is not useable for accurately measuring lag on quiet systems.

Ideally the master should be able to report standby lag time via SQL, even when there has been a disruption in connectivity with a standby.

The only accurate method I have found to measure standby lag is to create a synthetic update that runs periodically. This works, but is less than ideal and requires adding a table to every cluster (which then has to be vacuumed frequently too) and writing and maintaining a process to update it and purge it.

These two enhancements would go a long way in making it easier to monitor PostgreSQL clusters and replication because it would eliminate a lot of custom coding requirements and enable us to pull metrics directly via simple SQL.

If anyone can provide insight on how I could accomplish these in a simple manner by other means then I'm all ears!

Thanks,
Rob

#2Melvin Davidson
melvin6925@gmail.com
In reply to: Rob Brucks (#1)
Re: Enhancement Request

On Tue, Apr 19, 2016 at 4:37 PM, Rob Brucks <rob.brucks@rackspace.com>
wrote:

I'd like to propose two enhancements to the PostgreSQL code, but I'm not
sure if this is the correct mailing list. So if it's not then please let
me know where I need to post this.

These are monitoring-centric enhancement requests since I'm trying to
implement accurate monitoring in a secure fashion.

* General monitoring:
We have a need for a "monitoring" role in PostgreSQL that has read-only
access to any "pg_stat" view. As of 9.4, only a super-user can read all
columns of "pg_stat_activity", "pg_stat_replication", and
"pg_stat_archiver" (there may be other restricted views as well). These
views provide critical insight on how well the cluster is operating and
what is going on.

There appears to be only two ways to gain access to these views:

1. grant super-user to my monitoring user
2. write custom functions owned by a super-user with "SECURITY
DEFINER" and grant access to my monitoring user

Option 1 is too big of a security risk to grant to a monitoring user. If
the monitoring system gets compromised then the DB will be at risk too.

Option 2 requires creating, deploying, and maintaining sets of monitoring
functions, which is a decent chunk of work in a large environment with
dozens or hundreds of deployments, many running different versions of
postgres possibly needing custom versions of the functions. When you add
the bureaucracy of a large IT organization and SOX and PCI compliance
requirements it ends up being a PITA implementing or changing these
functions when you only have a small DBA team.

* Streaming Replication Monitoring:
Make the "pg_stat_replication" view more persistent (maybe keep the rows
for 24 hours or have a registration process?).

There appears to be no way to *accurately* monitor streaming replication
via SQL alone currently. This is due to three different problems:

1. When a standby gets disconnected from the master then the
corresponding row is immediately removed from pg_stat_replication on the
master. Once the row is gone you cannot tell via simple SQL whether a
standby is behind or not (or that it even existed at all) without storing
prior values and extrapolating from them.
2. On the standby, if the WAL streaming process gets disconnected from
the master then it does not report that it is behind
because pg_last_xlog_receive_location() has not been updated from the
master. The standby has no idea how far ahead the master has gotten and
just blindly reports the last value.
3. On a "quiet" system there may not be any update activity on the
master for a long time, which makes the pg_last_xact_replay_timestamp()
function report an ever-increasing interval. So it is not useable for
accurately measuring lag on quiet systems.

Ideally the master should be able to report standby lag time via SQL, even
when there has been a disruption in connectivity with a standby.

The only accurate method I have found to measure standby lag is to create
a synthetic update that runs periodically. This works, but is less than
ideal and requires adding a table to every cluster (which then has to be
vacuumed frequently too) and writing and maintaining a process to update it
and purge it.

These two enhancements would go a long way in making it easier to monitor
PostgreSQL clusters and replication because it would eliminate a lot of
custom coding requirements and enable us to pull metrics directly via
simple SQL.

If anyone can provide insight on how I could accomplish these in a simple
manner by other means then I'm all ears!

Thanks,
Rob

Rob,
Enhancement or feature requests should probably go to *Custormer Feedback
<https://postgresql.uservoice.com/forums/21853-general&gt;*
https://postgresql.uservoice.com/forums/21853-general
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Melvin Davidson (#2)
Re: Enhancement Request

On Tue, Apr 19, 2016 at 5:59 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

Enhancement or feature requests should probably go to *Custormer
Feedback <https://postgresql.uservoice.com/forums/21853-general&gt;*
https://postgresql.uservoice.com/forums/21853-general

​I'm on the PostgreSQL home page: what series of links would lead to "
uservoice.com" so that I can submit my feature request?

If that site is going to be recommended over posting to pgsql-general it
should feature prominently on the main website.

​I've seen or heard nothing that suggests anywhere except pgsql-general,
this list, is the correct place to submit such requests. I do think we are
lacking sufficient communication in this area but would rather see that
corrected and formalized before we go pointing people to third-party sites.

David J.

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Melvin Davidson (#2)
Re: Enhancement Request

Melvin Davidson wrote:

On Tue, Apr 19, 2016 at 4:37 PM, Rob Brucks <rob.brucks@rackspace.com> wrote:

I'd like to propose two enhancements to the PostgreSQL code, but I'm not sure if this is the
correct mailing list. So if it's not then please let me know where I need to post this.

I'd post feature requests to the -hackers list, but -general is usually fine too.

* General monitoring:
We have a need for a "monitoring" role in PostgreSQL that has read-only access to any "pg_stat"
view. As of 9.4, only a super-user can read all columns of "pg_stat_activity", "pg_stat_replication",
and "pg_stat_archiver" (there may be other restricted views as well). These views provide critical
insight on how well the cluster is operating and what is going on.

There appears to be only two ways to gain access to these views:

1. grant super-user to my monitoring user
2. write custom functions owned by a super-user with "SECURITY DEFINER" and grant access to
my monitoring user

I think #2 is good enough.
To reduce the pain of deploying such a function, you can create it in
"template1" and CREATE DATABASE will automatically copy it.

* Streaming Replication Monitoring:

The only accurate method I have found to measure standby lag is to create a synthetic update
that runs periodically. This works, but is less than ideal and requires adding a table to every
cluster (which then has to be vacuumed frequently too) and writing and maintaining a process to update
it and purge it.

I second that, I have wished for something like that myself.

You don't actually have to create a table for this, a simple "SELECT txid_current()"
on the primary will do the trick. Still it is annoying.

Enhancement or feature requests should probably go to Custormer Feedback
<https://postgresql.uservoice.com/forums/21853-general&gt;

I didn't know that site.
Was it ever announced?

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5John R Pierce
pierce@hogranch.com
In reply to: Laurenz Albe (#4)
Re: Enhancement Request

On 4/20/2016 12:28 AM, Albe Laurenz wrote:

Enhancement or feature requests should probably go to Custormer Feedback
<https://postgresql.uservoice.com/forums/21853-general&gt;

I didn't know that site.
Was it ever announced?

first I've heard of it, was Melvin suggesting it in another thread here
the other day.

who runs/owns uservoice, what are they in this for? Who setup this
postgresql uservoice site?

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Shulgin, Oleksandr
oleksandr.shulgin@zalando.de
In reply to: David G. Johnston (#3)
Re: Enhancement Request

On Wed, Apr 20, 2016 at 3:47 AM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Tue, Apr 19, 2016 at 5:59 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

Enhancement or feature requests should probably go to *Custormer
Feedback <https://postgresql.uservoice.com/forums/21853-general&gt;*
https://postgresql.uservoice.com/forums/21853-general

​I'm on the PostgreSQL home page: what series of links would lead to "
uservoice.com" so that I can submit my feature request?

If that site is going to be recommended over posting to pgsql-general it
should feature prominently on the main website.

​I've seen or heard nothing that suggests anywhere except pgsql-general,
this list, is the correct place to submit such requests. I do think we are
lacking sufficient communication in this area but would rather see that
corrected and formalized before we go pointing people to third-party sites.

It is interesting that this channel was opened at least in the middle 2009
or before (judging by the feature/comments timestamps). It would be nice
if uservoice provided any hint or contact for the owner of the channel,
because right now it appears to be a total mystery.

I'm not sure if the owner is subscribed to -general, I wouldn't be
surprised if not. Otherwise, please speak up. :-)

I think that such user communication channel can be useful if maintained
properly, and if the rest of the community would be aware of its
existence. What I dislike right now, is that it pretends to be official
channel without any disclaimer or references to the actual official
communication channels (which are the mailing lists and IRC, AFAIK).

Regards,
--
Alex

#7Stephen Frost
sfrost@snowman.net
In reply to: Rob Brucks (#1)
Re: Enhancement Request

Rob,

* Rob Brucks (rob.brucks@rackspace.com) wrote:

I'd like to propose two enhancements to the PostgreSQL code, but I'm not sure if this is the correct mailing list. So if it's not then please let me know where I need to post this.

This is the correct place. I don't know why people are suggesting third
party sites, but the correct place is -general, as you've done, which is
fantastic.

These are monitoring-centric enhancement requests since I'm trying to implement accurate monitoring in a secure fashion.

I've been working on exactly this problem and 9.6 will (finally) have
the start of work to improve PostgreSQL in this area. Your thoughts and
use-cases are exactly what we need to continue that effort and get to a
point where monitoring solutions can depend on PostgreSQL to provide the
features, capabilities, and information which they need, without
requiring the monitoring user to be a superuser.

* General monitoring:
We have a need for a "monitoring" role in PostgreSQL that has read-only access to any "pg_stat" view. As of 9.4, only a super-user can read all columns of "pg_stat_activity", "pg_stat_replication", and "pg_stat_archiver" (there may be other restricted views as well). These views provide critical insight on how well the cluster is operating and what is going on.

That was proposed and was called 'pg_monitor'. Unfortunately, through a
lack of support and questions about such a role possibly being "too
broad", it ended up not being included for 9.6. I'd very much like to
work through those issues and find a solution for post-9.6 (note that we
are past the feature freeze point for 9.6, so any further changes will
be for later versions).

* Streaming Replication Monitoring:
Make the "pg_stat_replication" view more persistent (maybe keep the rows for 24 hours or have a registration process?).

I believe this is improved when working with replication slots in recent
versions.

If anyone can provide insight on how I could accomplish these in a simple manner by other means then I'm all ears!

Please continue to engage with the PostgreSQL community on these issues.
I agree that these are critical features which we really need to have
and will continue to work on them, but support from users, particularly
with detailed real-world use-caes, goes a very long way.

Thanks!

Stephen