Database and OS monitoring

Started by Edson Richter over 11 years ago | 6 messages | general

edsonrichter@hotmail.com

over 11 years ago

Dear list,

I've been searching the web for guidelines on OS (Linux) and PostgreSQL
(9.3.5) active monitoring best practices.
Can someone share experiences?
I'm inclined to look at Cacti and Nagios. Any other experiences?
Recommended books?
I don't want to use SaaS for monitoring - I'll have a cloud server hired
specifically for this purpose, outside my main data center infrastructure.

Thanks in advance,

Edson

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

John R Pierce

pierce@hogranch.com

over 11 years ago

In reply to: Edson Richter (#1)

Re: Database and OS monitoring

On 12/13/2014 10:55 AM, Edson Carlos Ericksson Richter wrote:

> I've been searching the web for guidelines on OS (Linux) and PostgreSQL
> (9.3.5) active monitoring best practices.
> Can someone share experiences?
> I'm inclined to look at Cacti and Nagios. Any other experiences?
> Recommended books?
> I don't want to use SaaS for monitoring - I'll have a cloud server
> hired specifically for this purpose, outside my main data center
> infrastructure.

Munin is another good choice; it's like a much better implementation of
Cacti. It also comes with quite a few Postgres monitoring graphs
already set up; you just have to enable it to connect to your Postgres
server.

--
john r pierce 37N 122W
somewhere on the middle of the left coast


Andy Colson

andy@squeakycode.net

over 11 years ago

In reply to: Edson Richter (#1)

Re: Database and OS monitoring

On 12/13/2014 12:55 PM, Edson Carlos Ericksson Richter wrote:

> Dear list,
>
> I've been searching the web for guidelines on OS (Linux) and PostgreSQL (9.3.5) active monitoring best practices.
> Can someone share experiences?
> I'm inclined to look at Cacti and Nagios. Any other experiences? Recommended books?
> I don't want to use SaaS for monitoring - I'll have a cloud server hired specifically for this purpose, outside my main data center infrastructure.
>
> Thanks in advance,
>
> Edson

Stats are one thing, but errors are another. I've found my best monitor is rsyslog and a Perl script.

rsyslog.conf contains:

    local0.* action(type="omprog"
                    binary="/usr/local/bin/logMonitor.pl"
                    template="RSYSLOG_TraditionalFileFormat")

The Perl script is sort of like:

    while (<>)
    {
        # emailme() stands for whatever notification routine you use
        emailme() if (/error/);
    }
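
For readers who prefer Python, the same idea can be sketched as below. This is only an illustration under stated assumptions, not Andy's actual script: the names `scan`, `print_alert` and `main` are hypothetical, the match is made case-insensitive (the original `/error/` is case-sensitive), and `print_alert` merely writes to stderr where the original calls `emailme()`.

```python
import re
import sys
from typing import Callable, Iterable

# Assumption: match "error" in any case; the original Perl regex is case-sensitive.
ERROR_PATTERN = re.compile(r"error", re.IGNORECASE)


def scan(lines: Iterable[str], notify: Callable[[str], None]) -> int:
    """Call notify(line) for each line matching ERROR_PATTERN; return the hit count."""
    hits = 0
    for line in lines:
        if ERROR_PATTERN.search(line):
            notify(line.rstrip("\n"))
            hits += 1
    return hits


def print_alert(line: str) -> None:
    # Hypothetical stand-in for emailme(); replace with real mail delivery.
    print(f"ALERT: {line}", file=sys.stderr)


def main() -> None:
    # omprog feeds log messages to the program on standard input, one per line.
    scan(sys.stdin, print_alert)
```

To use it in place of the Perl script, call `main()` at the bottom of the file and point the `binary=` parameter at it. Passing the notifier in as a function keeps the matching logic testable without sending any mail.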

-Andy


Vick Khera

vivek@khera.org

over 11 years ago

In reply to: Edson Richter (#1)

Re: Database and OS monitoring

On Sat, Dec 13, 2014 at 1:55 PM, Edson Carlos Ericksson Richter <
edsonrichter@hotmail.com> wrote:

> I've been searching the web for guidelines on OS (Linux) and PostgreSQL
> (9.3.5) active monitoring best practices.

Recent trends are more toward monitoring response latency by first
establishing a baseline level of activity and latency, then alerting when
those numbers get out of acceptable range.
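
The baseline-then-alert idea can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular tool: the function names are hypothetical, and defining "acceptable range" as the baseline mean plus `k` standard deviations (with `k = 3.0`) is an assumption chosen for simplicity.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def baseline(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the baseline latencies."""
    # stdev() needs at least two samples and raises StatisticsError otherwise.
    return mean(samples), stdev(samples)


def exceeds_baseline(value: float, base_mean: float, base_stdev: float,
                     k: float = 3.0) -> bool:
    """True when value is more than k standard deviations above the baseline mean."""
    # Assumption: only unusually HIGH latency is alert-worthy.
    return value > base_mean + k * base_stdev
```

For example, with baseline latencies of `[10, 12, 11, 9, 10, 11, 12, 10]` milliseconds the threshold comes out a little under 14 ms, so a 13 ms response passes and a 14 ms response triggers an alert. In practice the baseline would be recomputed periodically so that it follows normal changes in load.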

There are some open source tools to collect and sort and report this way
(see Kibana and Grafana and their underlying data stores). I've not seen
alerting tools based on this that are non-commercial, though. Two services
I know of are Ruxit and Circonus.

Personally I still use Nagios to tell my staff when things are down or not
responding, but often that is too late to proactively fix things.
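
One way to get earlier notice out of a Nagios-style check is to report WARNING on slow responses rather than only CRITICAL on no response. The sketch below follows the standard plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN); the function names and the idea of timing a plain TCP connect are illustrative assumptions, not something described in this thread.

```python
import socket
import time
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def connect_latency(host: str, port: int, timeout_s: float) -> Optional[float]:
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:
        return None


def classify(latency_s: Optional[float], warn_s: float, crit_s: float) -> Tuple[int, str]:
    """Map a measured latency to a (plugin exit code, status line) pair."""
    if latency_s is None:
        return CRITICAL, "CRITICAL - no response"
    if latency_s >= crit_s:
        return CRITICAL, f"CRITICAL - connect took {latency_s:.3f}s"
    if latency_s >= warn_s:
        return WARNING, f"WARNING - connect took {latency_s:.3f}s"
    return OK, f"OK - connect took {latency_s:.3f}s"
```

A plugin built on this would print the status line and exit with the code, for example by passing the result of `connect_latency("db.example.com", 5432, 5.0)` (hypothetical host) to `classify` with thresholds of your choosing.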

One thing that'd be really cool is to use the new binary JSON storage in
the upcoming Pg release to store the time series data for use with
Grafana... but then you'd have a chicken/egg problem with monitoring
itself. :)

Tim Smith

randomdev4+postgres@gmail.com

over 11 years ago

In reply to: Vick Khera (#4)

Re: Database and OS monitoring

Try http://brendangregg.com/

Lots of great tidbits there from a guy who really knows his performance
stuff (ex-Sun, now Netflix)


Joseph Kregloh

jkregloh@sproutloud.com

over 11 years ago

In reply to: Tim Smith (#5)

Re: Database and OS monitoring

I use Zabbix a lot. There is a very nice template for Postgres:
http://pg-monz.github.io/pg_monz/index-en.html
