Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

Started by Steeve Boulangerover 1 year ago20 messagesgeneral
Jump to latest
#1Steeve Boulanger
sboulanger29@gmail.com

Hello,

I have noticed recently that our databases (ie most but not all) have their
stats reset on daily basis:

select count(case when stats_reset > (now() - interval '1 day') then 1 else
null end) db_reset_cnt, count(*) db_total_cnt from pg_stat_database;
db_reset_cnt | db_total_cnt
--------------+--------------
77 | 157

Furthermore, the archiver stats also get reset:

select stats_reset from pg_stat_archiver;
stats_reset
-------------------------------
2024-11-21 13:52:35.864855+00

Some facts:
- PostgreSQL v13.16 on Linux Ubuntu 20.04
- There's no evidence in the logs that any pg_stat_reset* functions have
been executed
name | setting
----------------------------+---------
log_min_duration_statement | 0
log_statement | all
- we don't have any user procs with a pg_stat_reset call
- the timestamps of the reset across databases are sometimes close to each
other, but not always

Anyone has any insight on what could be doing this daily stats reset, or
perhaps how to find the root cause?

Regards,
Steeve

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#1)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/21/24 09:29, Steeve Boulanger wrote:

Hello,

I have noticed recently that our databases (ie most but not all) have
their stats reset on daily basis:

select count(case when stats_reset > (now() - interval '1 day') then 1
else null end) db_reset_cnt, count(*) db_total_cnt from pg_stat_database;
 db_reset_cnt | db_total_cnt
--------------+--------------
           77 |          157

Furthermore, the archiver stats also get reset:

select stats_reset from pg_stat_archiver;
          stats_reset
-------------------------------
 2024-11-21 13:52:35.864855+00

Some facts:
- PostgreSQL v13.16 on Linux Ubuntu 20.04
- There's no evidence in the logs that any pg_stat_reset* functions have
been executed
            name            | setting
----------------------------+---------
 log_min_duration_statement | 0
 log_statement              | all
- we don't have any user procs with a pg_stat_reset call
- the timestamps of the reset across databases are sometimes close to
each other, but not always
Anyone has any insight on what could be doing this daily stats reset, or
perhaps how to find the root cause?

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-SETUP

"Cumulative statistics are collected in shared memory. Every PostgreSQL
process collects statistics locally, then updates the shared data at
appropriate intervals. When a server, including a physical replica,
shuts down cleanly, a permanent copy of the statistics data is stored in
the pg_stat subdirectory, so that statistics can be retained across
server restarts. In contrast, when starting from an unclean shutdown
(e.g., after an immediate shutdown, a server crash, starting from a base
backup, and point-in-time recovery), all statistics counters are reset."

Regards,
Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#1)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/21/24 12:34, Steeve Boulanger wrote:

Please reply to list also.

Ccing list.

Thanks Adrian for taking the time to respond. I will review the
documentation once more, just in case I missed anything.

My apologies - I forgot to mention in my original post, that our last
cluster shutdown was over 15 days ago, thus an "unclean"
shutdown would not explain these daily stats reset that we are seeing.
It might also be relevant to mentioned

Then something is resetting the statistics.

1) What is log_min_error_statement set to?

2) Did you reload the server when changing?:

log_min_duration_statement | 0
log_statement | all

-Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Steeve Boulanger
sboulanger29@gmail.com
In reply to: Adrian Klaver (#3)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.

1) What is log_min_error_statement set to?

name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f

2) Did you reload the server when changing?:

yes - pg_reload_conf()

-Steeve

On Thu, Nov 21, 2024 at 2:49 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 11/21/24 12:34, Steeve Boulanger wrote:

Please reply to list also.

Ccing list.

Thanks Adrian for taking the time to respond. I will review the
documentation once more, just in case I missed anything.

My apologies - I forgot to mention in my original post, that our last
cluster shutdown was over 15 days ago, thus an "unclean"
shutdown would not explain these daily stats reset that we are seeing.
It might also be relevant to mentioned

Then something is resetting the statistics.

1) What is log_min_error_statement set to?

2) Did you reload the server when changing?:

log_min_duration_statement | 0
log_statement | all

-Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#4)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/21/24 12:57, Steeve Boulanger wrote:

Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.

1) What is log_min_error_statement set to?

          name           | setting | pending_restart
-------------------------+---------+-----------------
 log_min_error_statement | error   | f

2) Did you reload the server when changing?:

yes - pg_reload_conf()

All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.

-Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com

#6Ron
ronljohnsonjr@gmail.com
In reply to: Adrian Klaver (#5)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On Thu, Nov 21, 2024 at 4:12 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 11/21/24 12:57, Steeve Boulanger wrote:

Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.

1) What is log_min_error_statement set to?

name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f

2) Did you reload the server when changing?:

yes - pg_reload_conf()

All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.

Also, "SELECT pg_postmaster_start_time();" to show the *real* Postgresql
start time, instead of what you think it is.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#7Steeve Boulanger
sboulanger29@gmail.com
In reply to: Adrian Klaver (#5)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.

That was already done, but nothing relevant was found unfortunately.

-Steeve

On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 11/21/24 12:57, Steeve Boulanger wrote:

Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.

1) What is log_min_error_statement set to?

name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f

2) Did you reload the server when changing?:

yes - pg_reload_conf()

All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.

-Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com

#8Steeve Boulanger
sboulanger29@gmail.com
In reply to: Ron (#6)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

Hello Ron,

Also, "SELECT pg_postmaster_start_time();" to show the *real* Postgresql

start time, instead of what you think it is.

Yes you're right - I should have been more explicit.

[local]:5432 postgres@postgres=# select current_timestamp -
pg_postmaster_start_time() as uptime;
uptime
-------------------------
15 days 10:03:44.048383

-Steeve

On Thu, Nov 21, 2024 at 3:30 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

Show quoted text

On Thu, Nov 21, 2024 at 4:12 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 11/21/24 12:57, Steeve Boulanger wrote:

Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.

1) What is log_min_error_statement set to?

name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f

2) Did you reload the server when changing?:

yes - pg_reload_conf()

All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.

Also, "SELECT pg_postmaster_start_time();" to show the *real* Postgresql
start time, instead of what you think it is.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#9Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#7)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/21/24 13:31, Steeve Boulanger wrote:

All I can think to do is look at the logs  around the stats_reset times
for the databases and see if there is anything relevant.

That was already done, but nothing relevant was found unfortunately.

Unless it was not recognized as relevant. Since for the time being I am
eliminating magic as the cause, something concrete is causing this and
it should be leaving a trace. In your post you had this affecting 77 out
of 157 databases in the cluster.

1) Do the 77 share some trait the other 80 don't.

2) Do the OS system logs reveal anything?

3) What was happening in the databases just prior to the time the stats
reset?

4) Do you have external tools accessing these databases?

5) Is the cluster directly open to the world?

-Steeve

On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 11/21/24 12:57, Steeve Boulanger wrote:

  > Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not.

I'm a

little bit of a noob with email distrib lists.

  > 1) What is log_min_error_statement set to?

            name           | setting | pending_restart
-------------------------+---------+-----------------
   log_min_error_statement | error   | f

  > 2) Did you reload the server when changing?:

yes - pg_reload_conf()

All I can think to do is look at the logs  around the stats_reset times
for the databases and see if there is anything relevant.

-Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

#10Steeve Boulanger
sboulanger29@gmail.com
In reply to: Adrian Klaver (#9)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

1) Do the 77 share some trait the other 80 don't.

No pattern found yet .. but still verifying a few things

2) Do the OS system logs reveal anything?

Nothing found in syslog

3) What was happening in the databases just prior to the time the stats

reset?

Here's an example (log extracts) for a stats reset occurrence:

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;

datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>: SELECT
<..>
<--LOGS-->

As you can see from above, the stats for MyDB were reset at ".332" . The
only logs before/after for the db was the connection (at .324), and then
the parse (at .336). NB: I also checked the logs at ".333" in case there
would have been a rounding up, but nothing relevant was found. With that
said, I only verified one occurence - tomorrow I'll check a few more just
to validate.

4) Do you have external tools accessing these databases?

We have internal micro-services accessing the databases, as well as a
monitoring tool (Netdata), and some of the Devs use pgAdmin. I discarded
the scenario where someone would inadvertently do a "pg_stat_reset" via
pgAdmin, just because a lot of databases have their stats reset within a
short period of time.

On the other hand, Netdata does connect to most (if not all) databases
frequently by its nature - so as a test, I stopped the Netdata service
today to see if tomorrow we're still seeing the stats reset or not. I can
report back tomorrow on this.

5) Is the cluster directly open to the world?

No. It's an on-premise installation. Only local applications can connect to
it.

-Steeve

On Thu, Nov 21, 2024 at 4:32 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 11/21/24 13:31, Steeve Boulanger wrote:

All I can think to do is look at the logs around the stats_reset

times

for the databases and see if there is anything relevant.

That was already done, but nothing relevant was found unfortunately.

Unless it was not recognized as relevant. Since for the time being I am
eliminating magic as the cause, something concrete is causing this and
it should be leaving a trace. In your post you had this affecting 77 out
of 157 databases in the cluster.

1) Do the 77 share some trait the other 80 don't.

2) Do the OS system logs reveal anything?

3) What was happening in the databases just prior to the time the stats
reset?

4) Do you have external tools accessing these databases?

5) Is the cluster directly open to the world?

-Steeve

On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 11/21/24 12:57, Steeve Boulanger wrote:

Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not.

I'm a

little bit of a noob with email distrib lists.

1) What is log_min_error_statement set to?

name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f

2) Did you reload the server when changing?:

yes - pg_reload_conf()

All I can think to do is look at the logs around the stats_reset

times

for the databases and see if there is anything relevant.

-Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

#11Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#10)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/21/24 15:50, Steeve Boulanger wrote:

1) Do the 77 share some trait the other 80 don't.

No pattern found yet .. but still verifying a few things

2) Do the OS system logs reveal anything?

Nothing found in syslog

3) What was happening in the databases just prior to the time the stats

reset?

Here's an example (log extracts) for a stats reset occurrence:

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3  limit 1;

    datname     |          stats_reset          |   since_reset
----------------+-------------------------------+-----------------
MyDB           | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>

What is the [2] referring to?

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.

2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG:  duration: 1.071 ms parse <unnamed>:
SELECT <..>

The above is some garden variety select?

--
Adrian Klaver
adrian.klaver@aklaver.com

#12Steeve Boulanger
sboulanger29@gmail.com
In reply to: Adrian Klaver (#11)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

What is the [2] referring to?

Number of the log line for each session or process, starting at 1

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.

Very plausible. I thought the same too.

The above is some garden variety select?

Not 100% sure what the expression "garden variety select" means lol, but
I'll take a guess that it means an "select from an in-house application"
.. and yes it is.

-Steeve

On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 11/21/24 15:50, Steeve Boulanger wrote:

1) Do the 77 share some trait the other 80 don't.

No pattern found yet .. but still verifying a few things

2) Do the OS system logs reveal anything?

Nothing found in syslog

3) What was happening in the databases just prior to the time the

stats

reset?

Here's an example (log extracts) for a stats reset occurrence:

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;

datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>

What is the [2] referring to?

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.

2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>

The above is some garden variety select?

--
Adrian Klaver
adrian.klaver@aklaver.com

#13Ray O'Donnell
ray@rodonnell.ie
In reply to: Steeve Boulanger (#12)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 23/11/2024 13:06, Steeve Boulanger wrote:

The above is some garden variety select?

Not 100% sure what the expression "garden variety select" means lol,
but I'll take a guess that it means an "select from an in-house
application"  .. and yes it is.

Here (Ireland) we sometimes say "common-or-garden variety".... It means
a normal, everyday variety. :-)

Ray.

-Steeve

On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

On 11/21/24 15:50, Steeve Boulanger wrote:

  > 1) Do the 77 share some trait the other 80 don't.

No pattern found yet .. but still verifying a few things

  > 2) Do the OS system logs reveal anything?

Nothing found in syslog

  > 3) What was happening in the databases just prior to the time

the stats

reset?

Here's an example (log extracts) for a stats reset occurrence:

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3  limit 1;

      datname     |          stats_reset          | since_reset
----------------+-------------------------------+-----------------
MyDB           | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized:

user=user1

database=MyDB applicatio
n_name=app1 <..>

What is the [2] referring to?

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp
when
the stats system actually did the reset.

2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG:  duration: 1.071 ms parse <unnamed>:
SELECT <..>

The above is some garden variety select?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie

#14Steeve Boulanger
sboulanger29@gmail.com
In reply to: Ray O'Donnell (#13)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)

I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.

Show quoted text

On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:

On 23/11/2024 13:06, Steeve Boulanger wrote:

The above is some garden variety select?

Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select from an in-house application" .. and yes it is.

Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)

Ray.

-Steeve

On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 11/21/24 15:50, Steeve Boulanger wrote:

1) Do the 77 share some trait the other 80 don't.

No pattern found yet .. but still verifying a few things

2) Do the OS system logs reveal anything?

Nothing found in syslog

3) What was happening in the databases just prior to the time the stats

reset?

Here's an example (log extracts) for a stats reset occurrence:

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;

datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>

What is the [2] referring to?

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.

2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>

The above is some garden variety select?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie

#15Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#14)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/23/24 05:16, Steeve Boulanger wrote:

Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)

I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.

I'm guessing shutting down Netdata did not stop the resets?

You might try setting log_min_messages to info to see if that catches
anything.

On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:

On 23/11/2024 13:06, Steeve Boulanger wrote:

--
Adrian Klaver
adrian.klaver@aklaver.com

#16Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#14)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/23/24 05:16, Steeve Boulanger wrote:

Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)

I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.

One possible scenario:

log_min_messages = info
log_min_error_statement = info
log_statement = 'all'

psql -d test -U postgres -p 5432

CREATE OR REPLACE FUNCTION public.admin_func()
RETURNS void
LANGUAGE plpgsql
AS $function$
BEGIN
PERFORM pg_stat_reset();
RAISE NOTICE 'Reset statistics';
END;
$function$

select datname, stats_reset from pg_stat_database;
datname | stats_reset
--------------+-------------------------------
NULL | NULL
postgres | NULL
template1 | NULL
template0 | NULL
test | 2024-11-23 09:21:49.421552-08
task_manager | NULL
test_psql | NULL
production | NULL
pp_archive | NULL
farm_db | NULL

select admin_func();
NOTICE: Reset statistics
admin_func
------------

(1 row)

select datname, stats_reset from pg_stat_database;
datname | stats_reset
--------------+-------------------------------
NULL | NULL
postgres | NULL
template1 | NULL
template0 | NULL
test | 2024-11-23 09:26:30.749257-08
task_manager | NULL
test_psql | NULL
production | NULL
pp_archive | NULL
farm_db | NULL

2024-11-23 09:26:30.749 PST [14501] postgres@test LOG: statement:
select admin_func();
2024-11-23 09:26:30.749 PST [14501] postgres@test NOTICE: Reset statistics
2024-11-23 09:26:30.749 PST [14501] postgres@test CONTEXT: PL/pgSQL
function admin_func() line 4 at RAISE
2024-11-23 09:26:30.749 PST [14501] postgres@test STATEMENT: select
admin_func();

The issue being that the pg_stat_reset() is buried in a function and
does not show up on its own. The RAISE NOTICE alerts in my logs just so
I could find the function easily. It could be there is a function or
functions in your setup doing something similar.

On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:

On 23/11/2024 13:06, Steeve Boulanger wrote:

The above is some garden variety select?

Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select from an in-house application" .. and yes it is.

Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)

Ray.

-Steeve

On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 11/21/24 15:50, Steeve Boulanger wrote:

1) Do the 77 share some trait the other 80 don't.

No pattern found yet .. but still verifying a few things

2) Do the OS system logs reveal anything?

Nothing found in syslog

3) What was happening in the databases just prior to the time the stats

reset?

Here's an example (log extracts) for a stats reset occurrence:

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;

datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>

What is the [2] referring to?

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.

2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>

The above is some garden variety select?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie

--
Adrian Klaver
adrian.klaver@aklaver.com

#17Greg Sabino Mullane
greg@turnstep.com
In reply to: Steeve Boulanger (#14)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

As a superuser, rename pg_stat_reset inside one of the commonly affected
databases:

alter function pg_stat_reset rename to
hey_stop_running_pg_stat_reset_already;

Then see who starts complaining. Additionally, your server log will get
helpful entries like this:

ERROR: function pg_stat_reset() does not exist

Cheers,
Greg

#18Steeve Boulanger
sboulanger29@gmail.com
In reply to: Adrian Klaver (#15)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

Apologies for the late reply.

I'm guessing shutting down Netdata did not stop the resets?

There was a hiccup yesterday, so I had to redo the test and wait until
today to see if there were any stats resets ... and NONE today!!

reset_status | cnt
--------------------+-----
reset before today | 100
no reset | 57
reset today | 0

This is quite interesting, and a little unexpected... but in order to
confirm this theory, I have done the following just now:

1) Restart Netdata service
2) rename the "pg_stat_reset()" function , as you suggested in another
post - very clever btw! :-)

Now let's wait until tomorrow to see if we have any stats reset and/or some
relevant errors in the PG logs!

Quick note: I did do a quick check in any user functions for any
"pg_stat_reset" calls , but didn't find any. With that said, we have 150+
databases in our cluster, so perhaps I missed something. I also did a grep
on the netdata directories, but nothing as well.

-Steeve

On Sat, Nov 23, 2024 at 10:19 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 11/23/24 05:16, Steeve Boulanger wrote:

Here (Ireland) we sometimes say "common-or-garden variety".... It means

a normal, everyday variety. :-)

I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.

I'm guessing shutting down Netdata did not stop the resets?

You might try setting log_min_messages to info to see if that catches
anything.

On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:

On 23/11/2024 13:06, Steeve Boulanger wrote:

--
Adrian Klaver
adrian.klaver@aklaver.com

#19Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steeve Boulanger (#18)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

On 11/23/24 16:20, Steeve Boulanger wrote:

Apologies for the late reply.

I'm guessing shutting down Netdata did not stop the resets?

There was a hiccup yesterday, so I had to redo the test and wait until
today to see if there were any stats resets ... and NONE today!!

    reset_status    | cnt
--------------------+-----
 reset before today | 100
 no reset           |  57
 reset today        |   0

Getting closer.

This is quite interesting, and a little unexpected... but in order to
confirm this theory, I have done the following just now:

1) Restart Netdata service
2) rename the "pg_stat_reset()" function , as you suggested in another
post  - very clever btw! :-)

Not sure who 'you' refers to, but just to be clear that was Greg Sabino
Mullane.

Now let's wait until tomorrow to see if we have any stats reset and/or
some relevant errors in the PG logs!

Quick note: I did do a quick check in any user functions for any
"pg_stat_reset" calls , but didn't find any. With that said, we have
150+ databases in our cluster, so perhaps I missed something. I also did
a grep on the netdata directories, but nothing as well.

-Steeve

--
Adrian Klaver
adrian.klaver@aklaver.com

#20Steeve Boulanger
sboulanger29@gmail.com
In reply to: Greg Sabino Mullane (#17)
Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

As a superuser, rename pg_stat_reset inside one of the commonly affected

databases:

alter function pg_stat_reset rename to

hey_stop_running_pg_stat_reset_already;

Then see who starts complaining. Additionally, your server log will get

helpful entries like this:

Thanks for the tip Greg - that's very clever! I've done this just now,
and will check back tomorrow to see if there's any corresponding error
message.

NB: I hadn't originally realized that the system functions (ie pg_*) were
created for each database, so I had renamed only the one in the postgres
database, thinking it was global (ie I'll blame this on my Oracle days
;-)).

Now I renamed the function under one database that had been verified as
getting its stats reset on a daily basis. Hopefully tomorrow we'll get more
interesting info!

-Steeve

On Sat, Nov 23, 2024 at 11:50 AM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

Show quoted text

As a superuser, rename pg_stat_reset inside one of the commonly affected
databases:

alter function pg_stat_reset rename to
hey_stop_running_pg_stat_reset_already;

Then see who starts complaining. Additionally, your server log will get
helpful entries like this:

ERROR: function pg_stat_reset() does not exist

Cheers,
Greg