Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?
Hello,
I have noticed recently that our databases (ie most but not all) have their
stats reset on daily basis:
select count(case when stats_reset > (now() - interval '1 day') then 1 else
null end) db_reset_cnt, count(*) db_total_cnt from pg_stat_database;
db_reset_cnt | db_total_cnt
--------------+--------------
77 | 157
Furthermore, the archiver stats also get reset:
select stats_reset from pg_stat_archiver;
stats_reset
-------------------------------
2024-11-21 13:52:35.864855+00
Some facts:
- PostgreSQL v13.16 on Linux Ubuntu 20.04
- There's no evidence in the logs that any pg_stat_reset* functions have
been executed
name | setting
----------------------------+---------
log_min_duration_statement | 0
log_statement | all
- we don't have any user procs with a pg_stat_reset call
- the timestamps of the reset across databases are sometimes close to each
other, but not always
Anyone has any insight on what could be doing this daily stats reset, or
perhaps how to find the root cause?
Regards,
Steeve
On 11/21/24 09:29, Steeve Boulanger wrote:
Hello,
I have noticed recently that our databases (ie most but not all) have
their stats reset on daily basis:select count(case when stats_reset > (now() - interval '1 day') then 1
else null end) db_reset_cnt, count(*) db_total_cnt from pg_stat_database;
db_reset_cnt | db_total_cnt
--------------+--------------
77 | 157Furthermore, the archiver stats also get reset:
select stats_reset from pg_stat_archiver;
stats_reset
-------------------------------
2024-11-21 13:52:35.864855+00Some facts:
- PostgreSQL v13.16 on Linux Ubuntu 20.04
- There's no evidence in the logs that any pg_stat_reset* functions have
been executed
name | setting
----------------------------+---------
log_min_duration_statement | 0
log_statement | all
- we don't have any user procs with a pg_stat_reset call
- the timestamps of the reset across databases are sometimes close to
each other, but not always
Anyone has any insight on what could be doing this daily stats reset, or
perhaps how to find the root cause?
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-SETUP
"Cumulative statistics are collected in shared memory. Every PostgreSQL
process collects statistics locally, then updates the shared data at
appropriate intervals. When a server, including a physical replica,
shuts down cleanly, a permanent copy of the statistics data is stored in
the pg_stat subdirectory, so that statistics can be retained across
server restarts. In contrast, when starting from an unclean shutdown
(e.g., after an immediate shutdown, a server crash, starting from a base
backup, and point-in-time recovery), all statistics counters are reset."
Regards,
Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/21/24 12:34, Steeve Boulanger wrote:
Please reply to list also.
Ccing list.
Thanks Adrian for taking the time to respond. I will review the
documentation once more, just in case I missed anything.My apologies - I forgot to mention in my original post, that our last
cluster shutdown was over 15 days ago, thus an "unclean"
shutdown would not explain these daily stats reset that we are seeing.
It might also be relevant to mentioned
Then something is resetting the statistics.
1) What is log_min_error_statement set to?
2) Did you reload the server when changing?:
log_min_duration_statement | 0
log_statement | all
-Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com
Import Notes
Reply to msg id not found: CAAiSvx_vsiCmL3gBj+1hRpMxo=nOc3HUa+VaGO9_GfOXDWCsrQ@mail.gmail.com
Please reply to list also.
My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.
1) What is log_min_error_statement set to?
name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f
2) Did you reload the server when changing?:
yes - pg_reload_conf()
-Steeve
On Thu, Nov 21, 2024 at 2:49 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 11/21/24 12:34, Steeve Boulanger wrote:
Please reply to list also.
Ccing list.
Thanks Adrian for taking the time to respond. I will review the
documentation once more, just in case I missed anything.My apologies - I forgot to mention in my original post, that our last
cluster shutdown was over 15 days ago, thus an "unclean"
shutdown would not explain these daily stats reset that we are seeing.
It might also be relevant to mentionedThen something is resetting the statistics.
1) What is log_min_error_statement set to?
2) Did you reload the server when changing?:
log_min_duration_statement | 0
log_statement | all-Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/21/24 12:57, Steeve Boulanger wrote:
Please reply to list also.
My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.1) What is log_min_error_statement set to?
name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f2) Did you reload the server when changing?:
yes - pg_reload_conf()
All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.
-Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Nov 21, 2024 at 4:12 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 11/21/24 12:57, Steeve Boulanger wrote:
Please reply to list also.
My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.1) What is log_min_error_statement set to?
name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f2) Did you reload the server when changing?:
yes - pg_reload_conf()
All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.
Also, "SELECT pg_postmaster_start_time();" to show the *real* Postgresql
start time, instead of what you think it is.
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.
That was already done, but nothing relevant was found unfortunately.
-Steeve
On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 11/21/24 12:57, Steeve Boulanger wrote:
Please reply to list also.
My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.1) What is log_min_error_statement set to?
name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f2) Did you reload the server when changing?:
yes - pg_reload_conf()
All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.-Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com
Hello Ron,
Also, "SELECT pg_postmaster_start_time();" to show the *real* Postgresql
start time, instead of what you think it is.
Yes you're right - I should have been more explicit.
[local]:5432 postgres@postgres=# select current_timestamp -
pg_postmaster_start_time() as uptime;
uptime
-------------------------
15 days 10:03:44.048383
-Steeve
On Thu, Nov 21, 2024 at 3:30 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
Show quoted text
On Thu, Nov 21, 2024 at 4:12 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:On 11/21/24 12:57, Steeve Boulanger wrote:
Please reply to list also.
My apologies - I thought I did a "Reply all", but apparently not. I'm a
little bit of a noob with email distrib lists.1) What is log_min_error_statement set to?
name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f2) Did you reload the server when changing?:
yes - pg_reload_conf()
All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.Also, "SELECT pg_postmaster_start_time();" to show the *real* Postgresql
start time, instead of what you think it is.--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 11/21/24 13:31, Steeve Boulanger wrote:
All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.That was already done, but nothing relevant was found unfortunately.
Unless it was not recognized as relevant. Since for the time being I am
eliminating magic as the cause, something concrete is causing this and
it should be leaving a trace. In your post you had this affecting 77 out
of 157 databases in the cluster.
1) Do the 77 share some trait the other 80 don't.
2) Do the OS system logs reveal anything?
3) What was happening in the databases just prior to the time the stats
reset?
4) Do you have external tools accessing these databases?
5) Is the cluster directly open to the world?
-Steeve
On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:On 11/21/24 12:57, Steeve Boulanger wrote:
> Please reply to list also.
My apologies - I thought I did a "Reply all", but apparently not.
I'm a
little bit of a noob with email distrib lists.
> 1) What is log_min_error_statement set to?
name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f> 2) Did you reload the server when changing?:
yes - pg_reload_conf()
All I can think to do is look at the logs around the stats_reset times
for the databases and see if there is anything relevant.-Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
--
Adrian Klaver
adrian.klaver@aklaver.com
1) Do the 77 share some trait the other 80 don't.
No pattern found yet .. but still verifying a few things
2) Do the OS system logs reveal anything?
Nothing found in syslog
3) What was happening in the databases just prior to the time the stats
reset?
Here's an example (log extracts) for a stats reset occurrence:
select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;
datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304
<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>
<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>: SELECT
<..>
<--LOGS-->
As you can see from above, the stats for MyDB were reset at ".332" . The
only logs before/after for the db was the connection (at .324), and then
the parse (at .336). NB: I also checked the logs at ".333" in case there
would have been a rounding up, but nothing relevant was found. With that
said, I only verified one occurence - tomorrow I'll check a few more just
to validate.
4) Do you have external tools accessing these databases?
We have internal micro-services accessing the databases, as well as a
monitoring tool (Netdata), and some of the Devs use pgAdmin. I discarded
the scenario where someone would inadvertently do a "pg_stat_reset" via
pgAdmin, just because a lot of databases have their stats reset within a
short period of time.
On the other hand, Netdata does connect to most (if not all) databases
frequently by its nature - so as a test, I stopped the Netdata service
today to see if tomorrow we're still seeing the stats reset or not. I can
report back tomorrow on this.
5) Is the cluster directly open to the world?
No. It's an on-premise installation. Only local applications can connect to
it.
-Steeve
On Thu, Nov 21, 2024 at 4:32 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 11/21/24 13:31, Steeve Boulanger wrote:
All I can think to do is look at the logs around the stats_reset
times
for the databases and see if there is anything relevant.
That was already done, but nothing relevant was found unfortunately.
Unless it was not recognized as relevant. Since for the time being I am
eliminating magic as the cause, something concrete is causing this and
it should be leaving a trace. In your post you had this affecting 77 out
of 157 databases in the cluster.1) Do the 77 share some trait the other 80 don't.
2) Do the OS system logs reveal anything?
3) What was happening in the databases just prior to the time the stats
reset?4) Do you have external tools accessing these databases?
5) Is the cluster directly open to the world?
-Steeve
On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:On 11/21/24 12:57, Steeve Boulanger wrote:
Please reply to list also.
My apologies - I thought I did a "Reply all", but apparently not.
I'm a
little bit of a noob with email distrib lists.
1) What is log_min_error_statement set to?
name | setting | pending_restart
-------------------------+---------+-----------------
log_min_error_statement | error | f2) Did you reload the server when changing?:
yes - pg_reload_conf()
All I can think to do is look at the logs around the stats_reset
times
for the databases and see if there is anything relevant.
-Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/21/24 15:50, Steeve Boulanger wrote:
1) Do the 77 share some trait the other 80 don't.
No pattern found yet .. but still verifying a few things
2) Do the OS system logs reveal anything?
Nothing found in syslog
3) What was happening in the databases just prior to the time the stats
reset?
Here's an example (log extracts) for a stats reset occurrence:
select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>
What is the [2] referring to?
<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.
2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>
The above is some garden variety select?
--
Adrian Klaver
adrian.klaver@aklaver.com
What is the [2] referring to?
Number of the log line for each session or process, starting at 1
My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.
Very plausible. I thought the same too.
The above is some garden variety select?
Not 100% sure what the expression "garden variety select" means lol, but
I'll take a guess that it means an "select from an in-house application"
.. and yes it is.
-Steeve
On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 11/21/24 15:50, Steeve Boulanger wrote:
1) Do the 77 share some trait the other 80 don't.
No pattern found yet .. but still verifying a few things
2) Do the OS system logs reveal anything?
Nothing found in syslog
3) What was happening in the databases just prior to the time the
stats
reset?
Here's an example (log extracts) for a stats reset occurrence:
select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>What is the [2] referring to?
<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>The above is some garden variety select?
--
Adrian Klaver
adrian.klaver@aklaver.com
On 23/11/2024 13:06, Steeve Boulanger wrote:
The above is some garden variety select?
Not 100% sure what the expression "garden variety select" means lol,
but I'll take a guess that it means an "select from an in-house
application" .. and yes it is.
Here (Ireland) we sometimes say "common-or-garden variety".... It means
a normal, everyday variety. :-)
Ray.
-Steeve
On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver
<adrian.klaver@aklaver.com> wrote:On 11/21/24 15:50, Steeve Boulanger wrote:
> 1) Do the 77 share some trait the other 80 don't.
No pattern found yet .. but still verifying a few things
> 2) Do the OS system logs reveal anything?
Nothing found in syslog
> 3) What was happening in the databases just prior to the time
the stats
reset?
Here's an example (log extracts) for a stats reset occurrence:
select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized:user=user1
database=MyDB applicatio
n_name=app1 <..>What is the [2] referring to?
<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp
when
the stats system actually did the reset.2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>The above is some garden variety select?
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie
Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.
Show quoted text
On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
On 23/11/2024 13:06, Steeve Boulanger wrote:
The above is some garden variety select?
Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select from an in-house application" .. and yes it is.
Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
Ray.
-Steeve
On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 15:50, Steeve Boulanger wrote:
1) Do the 77 share some trait the other 80 don't.
No pattern found yet .. but still verifying a few things
2) Do the OS system logs reveal anything?
Nothing found in syslog
3) What was happening in the databases just prior to the time the stats
reset?
Here's an example (log extracts) for a stats reset occurrence:
select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>What is the [2] referring to?
<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>The above is some garden variety select?
--
Adrian Klaver
adrian.klaver@aklaver.com--
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie
On 11/23/24 05:16, Steeve Boulanger wrote:
Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.
I'm guessing shutting down Netdata did not stop the resets?
You might try setting log_min_messages to info to see if that catches
anything.
On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
On 23/11/2024 13:06, Steeve Boulanger wrote:
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/23/24 05:16, Steeve Boulanger wrote:
Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.
One possible scenario:
log_min_messages = info
log_min_error_statement = info
log_statement = 'all'
psql -d test -U postgres -p 5432
CREATE OR REPLACE FUNCTION public.admin_func()
RETURNS void
LANGUAGE plpgsql
AS $function$
BEGIN
PERFORM pg_stat_reset();
RAISE NOTICE 'Reset statistics';
END;
$function$
select datname, stats_reset from pg_stat_database;
datname | stats_reset
--------------+-------------------------------
NULL | NULL
postgres | NULL
template1 | NULL
template0 | NULL
test | 2024-11-23 09:21:49.421552-08
task_manager | NULL
test_psql | NULL
production | NULL
pp_archive | NULL
farm_db | NULL
select admin_func();
NOTICE: Reset statistics
admin_func
------------
(1 row)
select datname, stats_reset from pg_stat_database;
datname | stats_reset
--------------+-------------------------------
NULL | NULL
postgres | NULL
template1 | NULL
template0 | NULL
test | 2024-11-23 09:26:30.749257-08
task_manager | NULL
test_psql | NULL
production | NULL
pp_archive | NULL
farm_db | NULL
2024-11-23 09:26:30.749 PST [14501] postgres@test LOG: statement:
select admin_func();
2024-11-23 09:26:30.749 PST [14501] postgres@test NOTICE: Reset statistics
2024-11-23 09:26:30.749 PST [14501] postgres@test CONTEXT: PL/pgSQL
function admin_func() line 4 at RAISE
2024-11-23 09:26:30.749 PST [14501] postgres@test STATEMENT: select
admin_func();
The issue being that the pg_stat_reset() is buried in a function and
does not show up on its own. The RAISE NOTICE alerts in my logs just so
I could find the function easily. It could be there is a function or
functions in your setup doing something similar.
On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
On 23/11/2024 13:06, Steeve Boulanger wrote:
The above is some garden variety select?
Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select from an in-house application" .. and yes it is.
Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
Ray.
-Steeve
On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 15:50, Steeve Boulanger wrote:
1) Do the 77 share some trait the other 80 don't.
No pattern found yet .. but still verifying a few things
2) Do the OS system logs reveal anything?
Nothing found in syslog
3) What was happening in the databases just prior to the time the stats
reset?
Here's an example (log extracts) for a stats reset occurrence:
select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3 limit 1;datname | stats_reset | since_reset
----------------+-------------------------------+-----------------
MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
database=MyDB applicatio
n_name=app1 <..>What is the [2] referring to?
<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>:
SELECT <..>The above is some garden variety select?
--
Adrian Klaver
adrian.klaver@aklaver.com--
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie
--
Adrian Klaver
adrian.klaver@aklaver.com
As a superuser, rename pg_stat_reset inside one of the commonly affected
databases:
alter function pg_stat_reset rename to
hey_stop_running_pg_stat_reset_already;
Then see who starts complaining. Additionally, your server log will get
helpful entries like this:
ERROR: function pg_stat_reset() does not exist
Cheers,
Greg
Apologies for the late reply.
I'm guessing shutting down Netdata did not stop the resets?
There was a hiccup yesterday, so I had to redo the test and wait until
today to see if there were any stats resets ... and NONE today!!
reset_status | cnt
--------------------+-----
reset before today | 100
no reset | 57
reset today | 0
This is quite interesting, and a little unexpected... but in order to
confirm this theory, I have done the following just now:
1) Restart Netdata service
2) rename the "pg_stat_reset()" function , as you suggested in another
post - very clever btw! :-)
Now let's wait until tomorrow to see if we have any stats reset and/or some
relevant errors in the PG logs!
Quick note: I did do a quick check in any user functions for any
"pg_stat_reset" calls , but didn't find any. With that said, we have 150+
databases in our cluster, so perhaps I missed something. I also did a grep
on the netdata directories, but nothing as well.
-Steeve
On Sat, Nov 23, 2024 at 10:19 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 11/23/24 05:16, Steeve Boulanger wrote:
Here (Ireland) we sometimes say "common-or-garden variety".... It means
a normal, everyday variety. :-)
I'm afraid that my Irish dialect is limited to "sláinte" only ;-) In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.I'm guessing shutting down Netdata did not stop the resets?
You might try setting log_min_messages to info to see if that catches
anything.On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
On 23/11/2024 13:06, Steeve Boulanger wrote:
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/23/24 16:20, Steeve Boulanger wrote:
Apologies for the late reply.
I'm guessing shutting down Netdata did not stop the resets?
There was a hiccup yesterday, so I had to redo the test and wait until
today to see if there were any stats resets ... and NONE today!!reset_status | cnt
--------------------+-----
reset before today | 100
no reset | 57
reset today | 0
Getting closer.
This is quite interesting, and a little unexpected... but in order to
confirm this theory, I have done the following just now:1) Restart Netdata service
2) rename the "pg_stat_reset()" function , as you suggested in another
post - very clever btw! :-)
Not sure who 'you' refers to, but just to be clear that was Greg Sabino
Mullane.
Now let's wait until tomorrow to see if we have any stats reset and/or
some relevant errors in the PG logs!Quick note: I did do a quick check in any user functions for any
"pg_stat_reset" calls , but didn't find any. With that said, we have
150+ databases in our cluster, so perhaps I missed something. I also did
a grep on the netdata directories, but nothing as well.-Steeve
--
Adrian Klaver
adrian.klaver@aklaver.com
As a superuser, rename pg_stat_reset inside one of the commonly affected
databases:
alter function pg_stat_reset rename to
hey_stop_running_pg_stat_reset_already;
Then see who starts complaining. Additionally, your server log will get
helpful entries like this:
Thanks for the tip Greg - that's very clever! I've done this just now,
and will check back tomorrow to see if there's any corresponding error
message.
NB: I hadn't originally realized that the system functions (ie pg_*) were
created for each database, so I had renamed only the one in the postgres
database, thinking it was global (ie I'll blame this on my Oracle days
;-)).
Now I renamed the function under one database that had been verified as
getting its stats reset on a daily basis. Hopefully tomorrow we'll get more
interesting info!
-Steeve
On Sat, Nov 23, 2024 at 11:50 AM Greg Sabino Mullane <htamfids@gmail.com>
wrote:
Show quoted text
As a superuser, rename pg_stat_reset inside one of the commonly affected
databases:alter function pg_stat_reset rename to
hey_stop_running_pg_stat_reset_already;Then see who starts complaining. Additionally, your server log will get
helpful entries like this:ERROR: function pg_stat_reset() does not exist
Cheers,
Greg