Checksum errors in pg_stat_database
Would it make sense to add a column to pg_stat_database showing the total
number of checksum errors that have occurred in a database?
It's really a ">1 means it's bad", but it's a lot easier to monitor that in
the statistics views, and given how much a lot of people set their systems
out to log, it's far too easy to miss individual checksum matches in the
logs.
If we track it at the database level, I don't think the overhead of adding
one more counter would be very high either.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Fri, Jan 11, 2019 at 5:21 AM Magnus Hagander <magnus@hagander.net> wrote:
Would it make sense to add a column to pg_stat_database showing the total number of checksum errors that have occurred in a database?
It's really a ">1 means it's bad", but it's a lot easier to monitor that in the statistics views, and given how much a lot of people set their systems out to log, it's far too easy to miss individual checksum matches in the logs.
If we track it at the database level, I don't think the overhead of adding one more counter would be very high either.
It's probably not the idea way to track it. If you have a terabyte or
fifty of data, and you see that you have some checksum failures, good
luck finding the offending blocks.
But I'm tentatively in favor of your proposal anyway, because it's
pretty simple and cheap and might help people, and doing something
noticeably better is probably annoyingly complicated.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/11/19 7:40 PM, Robert Haas wrote:
On Fri, Jan 11, 2019 at 5:21 AM Magnus Hagander <magnus@hagander.net> wrote:
Would it make sense to add a column to pg_stat_database showing
the total number of checksum errors that have occurred in a database?It's really a ">1 means it's bad", but it's a lot easier to monitor
that in the statistics views, and given how much a lot of people
set their systems out to log, it's far too easy to miss individual
checksum matches in the logs.If we track it at the database level, I don't think the overhead
of adding one more counter would be very high either.It's probably not the idea way to track it. If you have a terabyte or
fifty of data, and you see that you have some checksum failures, good
luck finding the offending blocks.
Isn't that somewhat similar to deadlocks, which we also track in
pg_stat_database? The number of deadlocks is rather useless on it's own,
you need to dive into the server log to find the details. Same for
checksum errors.
But I'm tentatively in favor of your proposal anyway, because it's
pretty simple and cheap and might help people, and doing something
noticeably better is probably annoyingly complicated.
+1
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 11, 2019 at 9:20 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
On 1/11/19 7:40 PM, Robert Haas wrote:
On Fri, Jan 11, 2019 at 5:21 AM Magnus Hagander <magnus@hagander.net>
wrote:
Would it make sense to add a column to pg_stat_database showing
the total number of checksum errors that have occurred in a database?It's really a ">1 means it's bad", but it's a lot easier to monitor
that in the statistics views, and given how much a lot of people
set their systems out to log, it's far too easy to miss individual
checksum matches in the logs.If we track it at the database level, I don't think the overhead
of adding one more counter would be very high either.It's probably not the idea way to track it. If you have a terabyte or
fifty of data, and you see that you have some checksum failures, good
luck finding the offending blocks.Isn't that somewhat similar to deadlocks, which we also track in
pg_stat_database? The number of deadlocks is rather useless on it's own,
you need to dive into the server log to find the details. Same for
checksum errors.
It is a bit similar yeah. Though a checksum counter is really a "you need
to look at fixing this right away" in a bit more sense than deadlocks. But
yes, the fact that we already tracks deadlocks there is a good example. (Of
course, I believe I added that one at some point as well, so I'm clearly
biased there)
But I'm tentatively in favor of your proposal anyway, because it's
pretty simple and cheap and might help people, and doing something
noticeably better is probably annoyingly complicated.+1
Yeah, that's the idea behind it -- it's cheap, and an
early-warning-indicator.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 1/11/19 10:25 PM, Magnus Hagander wrote:
On Fri, Jan 11, 2019 at 9:20 PM Tomas Vondra
On 1/11/19 7:40 PM, Robert Haas wrote:But I'm tentatively in favor of your proposal anyway, because it's
pretty simple and cheap and might help people, and doing something
noticeably better is probably annoyingly complicated.+1
Yeah, that's the idea behind it -- it's cheap, and an
early-warning-indicator.
+1
--
-David
david@pgmasters.net
On Sat, Jan 12, 2019 at 5:16 AM David Steele <david@pgmasters.net> wrote:
On 1/11/19 10:25 PM, Magnus Hagander wrote:
On Fri, Jan 11, 2019 at 9:20 PM Tomas Vondra
On 1/11/19 7:40 PM, Robert Haas wrote:But I'm tentatively in favor of your proposal anyway, because it's
pretty simple and cheap and might help people, and doing something
noticeably better is probably annoyingly complicated.+1
Yeah, that's the idea behind it -- it's cheap, and an
early-warning-indicator.+1
PFA is a patch to do this.
It tracks things that happen in the general backends. Possibly we should
also consider counting the errors actually found when running base backups?
OTOH, that part of the code doesn't really track things like databases (as
it operates just on the raw data directory underneath), so that
implementation would definitely not be as clean...
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Attachments:
stat_checksums.patchtext/x-patch; charset=US-ASCII; name=stat_checksums.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0e73cdcdda..e2630fd368 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2509,6 +2509,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry>Number of deadlocks detected in this database</entry>
</row>
<row>
+ <entry><structfield>checksum_failures</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum failures detected in this database</entry>
+ </row>
+ <row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
<entry>Time spent reading data file blocks by backends in this database,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3e229c693c..7723f01327 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -823,6 +823,7 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_files(D.oid) AS temp_files,
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 81c6499251..c5b4fab5ae 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -334,6 +334,7 @@ static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
+static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
/* ------------------------------------------------------------
@@ -1518,6 +1519,26 @@ pgstat_report_deadlock(void)
pgstat_send(&msg, sizeof(msg));
}
+
+/* --------
+ * pgstat_report_checksum_failure() -
+ *
+ * Tell the collector about a checksum failure.
+ * --------
+ */
+void
+pgstat_report_checksum_failure(void)
+{
+ PgStat_MsgDeadlock msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
+ msg.m_databaseid = MyDatabaseId;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_tempfile() -
*
@@ -4455,6 +4476,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
break;
+ case PGSTAT_MTYPE_CHECKSUMFAILURE:
+ pgstat_recv_checksum_failure((PgStat_MsgChecksumFailure *) &msg, len);
+ break;
+
default:
break;
}
@@ -4554,6 +4579,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_files = 0;
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
+ dbentry->n_checksum_failures = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6197,6 +6223,22 @@ pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len)
}
/* ----------
+ * pgstat_recv_checksum_failure() -
+ *
+ * Process a DEADLOCK message.
+ * ----------
+ */
+static void
+pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+ dbentry->n_checksum_failures ++;
+}
+
+/* ----------
* pgstat_recv_tempfile() -
*
* Process a TEMPFILE message.
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 517220bc33..14bc61b8ad 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -17,6 +17,7 @@
#include "access/htup_details.h"
#include "access/itup.h"
#include "access/xlog.h"
+#include "pgstat.h"
#include "storage/checksum.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
@@ -151,6 +152,8 @@ PageIsVerified(Page page, BlockNumber blkno)
errmsg("page verification failed, calculated checksum %u but expected %u",
checksum, p->pd_checksum)));
+ pgstat_report_checksum_failure();
+
if (header_sane && ignore_checksum_failure)
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 69f7265779..da1d685c08 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1498,6 +1498,21 @@ pg_stat_get_db_deadlocks(PG_FUNCTION_ARGS)
}
Datum
+pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_checksum_failures);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
Oid dbid = PG_GETARG_OID(0);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a4e173b484..911fad1af9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5233,6 +5233,10 @@
proname => 'pg_stat_get_db_deadlocks', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_deadlocks' },
+{ oid => '3425', descr => 'statistics: checksum failures detected in database',
+ proname => 'pg_stat_get_db_checksum_failures', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_failures' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 88a75fb798..b9e6463f2b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -64,7 +64,8 @@ typedef enum StatMsgType
PGSTAT_MTYPE_FUNCPURGE,
PGSTAT_MTYPE_RECOVERYCONFLICT,
PGSTAT_MTYPE_TEMPFILE,
- PGSTAT_MTYPE_DEADLOCK
+ PGSTAT_MTYPE_DEADLOCK,
+ PGSTAT_MTYPE_CHECKSUMFAILURE
} StatMsgType;
/* ----------
@@ -530,6 +531,17 @@ typedef struct PgStat_MsgDeadlock
Oid m_databaseid;
} PgStat_MsgDeadlock;
+/* ----------
+ * PgStat_MsgChecksumFailure Sent by the backend to tell the collector
+ * about checksum failures noticed.
+ * ----------
+ */
+typedef struct PgStat_MsgChecksumFailure
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+} PgStat_MsgChecksumFailure;
+
/* ----------
* PgStat_Msg Union over all possible messages.
@@ -593,6 +605,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_files;
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
+ PgStat_Counter n_checksum_failures;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
@@ -1200,6 +1213,7 @@ extern void pgstat_report_analyze(Relation rel,
extern void pgstat_report_recovery_conflict(int reason);
extern void pgstat_report_deadlock(void);
+extern void pgstat_report_checksum_failure(void);
extern void pgstat_initialize(void);
extern void pgstat_bestart(void);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 98f417cb57..e0f2c543ef 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1817,6 +1817,7 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_files(d.oid) AS temp_files,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote:
PFA is a patch to do this.
+void
+pgstat_report_checksum_failure(void)
+{
+ PgStat_MsgDeadlock msg;
I think that you meant PgStat_MsgChecksumFailure :)
+/* ----------
+ * pgstat_recv_checksum_failure() -
+ *
+ * Process a DEADLOCK message.
+ * ----------
same here
Otherwise LGTM.
On Fri, Feb 22, 2019 at 3:16 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net>
wrote:PFA is a patch to do this.
+void +pgstat_report_checksum_failure(void) +{ + PgStat_MsgDeadlock msg;I think that you meant PgStat_MsgChecksumFailure :)
+/* ---------- + * pgstat_recv_checksum_failure() - + * + * Process a DEADLOCK message. + * ----------same here
Otherwise LGTM.
Haha, damit, that's embarassing. You can probably guess where I copy/pasted
from :)
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Fri, Feb 22, 2019 at 3:23 PM Magnus Hagander <magnus@hagander.net> wrote:
On Fri, Feb 22, 2019 at 3:16 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net>
wrote:PFA is a patch to do this.
+void +pgstat_report_checksum_failure(void) +{ + PgStat_MsgDeadlock msg;I think that you meant PgStat_MsgChecksumFailure :)
+/* ---------- + * pgstat_recv_checksum_failure() - + * + * Process a DEADLOCK message. + * ----------same here
Otherwise LGTM.
Haha, damit, that's embarassing. You can probably guess where I
copy/pasted from :)
And of course, then I forgot to attach the new file.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Attachments:
stat_checksums2.patchtext/x-patch; charset=US-ASCII; name=stat_checksums2.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0e73cdcdda..e2630fd368 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2509,6 +2509,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry>Number of deadlocks detected in this database</entry>
</row>
<row>
+ <entry><structfield>checksum_failures</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum failures detected in this database</entry>
+ </row>
+ <row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
<entry>Time spent reading data file blocks by backends in this database,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3e229c693c..7723f01327 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -823,6 +823,7 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_files(D.oid) AS temp_files,
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 81c6499251..f09028d208 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -334,6 +334,7 @@ static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
+static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
/* ------------------------------------------------------------
@@ -1518,6 +1519,26 @@ pgstat_report_deadlock(void)
pgstat_send(&msg, sizeof(msg));
}
+
+/* --------
+ * pgstat_report_checksum_failure() -
+ *
+ * Tell the collector about a checksum failure.
+ * --------
+ */
+void
+pgstat_report_checksum_failure(void)
+{
+ PgStat_MsgChecksumFailure msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
+ msg.m_databaseid = MyDatabaseId;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_tempfile() -
*
@@ -4455,6 +4476,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
break;
+ case PGSTAT_MTYPE_CHECKSUMFAILURE:
+ pgstat_recv_checksum_failure((PgStat_MsgChecksumFailure *) &msg, len);
+ break;
+
default:
break;
}
@@ -4554,6 +4579,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_files = 0;
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
+ dbentry->n_checksum_failures = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6197,6 +6223,22 @@ pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len)
}
/* ----------
+ * pgstat_recv_checksum_failure() -
+ *
+ * Process a CHECKSUMFAILURE message.
+ * ----------
+ */
+static void
+pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+ dbentry->n_checksum_failures ++;
+}
+
+/* ----------
* pgstat_recv_tempfile() -
*
* Process a TEMPFILE message.
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 517220bc33..14bc61b8ad 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -17,6 +17,7 @@
#include "access/htup_details.h"
#include "access/itup.h"
#include "access/xlog.h"
+#include "pgstat.h"
#include "storage/checksum.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
@@ -151,6 +152,8 @@ PageIsVerified(Page page, BlockNumber blkno)
errmsg("page verification failed, calculated checksum %u but expected %u",
checksum, p->pd_checksum)));
+ pgstat_report_checksum_failure();
+
if (header_sane && ignore_checksum_failure)
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 69f7265779..da1d685c08 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1498,6 +1498,21 @@ pg_stat_get_db_deadlocks(PG_FUNCTION_ARGS)
}
Datum
+pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_checksum_failures);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
Oid dbid = PG_GETARG_OID(0);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a4e173b484..911fad1af9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5233,6 +5233,10 @@
proname => 'pg_stat_get_db_deadlocks', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_deadlocks' },
+{ oid => '3425', descr => 'statistics: checksum failures detected in database',
+ proname => 'pg_stat_get_db_checksum_failures', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_failures' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 88a75fb798..b9e6463f2b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -64,7 +64,8 @@ typedef enum StatMsgType
PGSTAT_MTYPE_FUNCPURGE,
PGSTAT_MTYPE_RECOVERYCONFLICT,
PGSTAT_MTYPE_TEMPFILE,
- PGSTAT_MTYPE_DEADLOCK
+ PGSTAT_MTYPE_DEADLOCK,
+ PGSTAT_MTYPE_CHECKSUMFAILURE
} StatMsgType;
/* ----------
@@ -530,6 +531,17 @@ typedef struct PgStat_MsgDeadlock
Oid m_databaseid;
} PgStat_MsgDeadlock;
+/* ----------
+ * PgStat_MsgChecksumFailure Sent by the backend to tell the collector
+ * about checksum failures noticed.
+ * ----------
+ */
+typedef struct PgStat_MsgChecksumFailure
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+} PgStat_MsgChecksumFailure;
+
/* ----------
* PgStat_Msg Union over all possible messages.
@@ -593,6 +605,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_files;
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
+ PgStat_Counter n_checksum_failures;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
@@ -1200,6 +1213,7 @@ extern void pgstat_report_analyze(Relation rel,
extern void pgstat_report_recovery_conflict(int reason);
extern void pgstat_report_deadlock(void);
+extern void pgstat_report_checksum_failure(void);
extern void pgstat_initialize(void);
extern void pgstat_bestart(void);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 98f417cb57..e0f2c543ef 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1817,6 +1817,7 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_files(d.oid) AS temp_files,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
On Fri, Feb 22, 2019 at 3:25 PM Magnus Hagander <magnus@hagander.net> wrote:
On Fri, Feb 22, 2019 at 3:23 PM Magnus Hagander <magnus@hagander.net> wrote:
On Fri, Feb 22, 2019 at 3:16 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote:
PFA is a patch to do this.
+void +pgstat_report_checksum_failure(void) +{ + PgStat_MsgDeadlock msg;I think that you meant PgStat_MsgChecksumFailure :)
+/* ---------- + * pgstat_recv_checksum_failure() - + * + * Process a DEADLOCK message. + * ----------same here
Otherwise LGTM.
Haha, damit, that's embarassing. You can probably guess where I copy/pasted from :)
heh :)
And of course, then I forgot to attach the new file.
It all looks fine. One minor nitpicking issue I just noticed, there's
an extra space there:
+ dbentry->n_checksum_failures ++;
I'm marking it as ready for committer!
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote:
It tracks things that happen in the general backends. Possibly we should also consider counting the errors actually found when running base backups? OTOH, that part of the code doesn't really track things like databases (as it operates just on the raw data directory underneath), so that implementation would definitely not be as clean...
Sorry I just realized that I totally forgot this part of the thread.
While it's true that we operate on raw directory, I see that sendDir()
already setup a isDbDir var, and if this is true lastDir should
contain the oid of the underlying database. Wouldn't it be enough to
call sendFile() using this, something like (untested):
if (!sizeonly)
- sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true);
+ sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true,
isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);
and accordingly report any checksum error from sendFile()?
On Mon, Mar 4, 2019 at 8:31 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote:
It tracks things that happen in the general backends. Possibly we should also consider counting the errors actually found when running base backups? OTOH, that part of the code doesn't really track things like databases (as it operates just on the raw data directory underneath), so that implementation would definitely not be as clean...
Sorry I just realized that I totally forgot this part of the thread.
While it's true that we operate on raw directory, I see that sendDir()
already setup a isDbDir var, and if this is true lastDir should
contain the oid of the underlying database. Wouldn't it be enough to
call sendFile() using this, something like (untested):if (!sizeonly) - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true); + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);and accordingly report any checksum error from sendFile()?
So this seem to work just fine without adding much code. PFA v3 of
Magnus' patch including error reporting for BASE_BACKUP.
Attachments:
stat_checksums_v3.diffapplication/octet-stream; name=stat_checksums_v3.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0e73cdcdda..e2630fd368 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2508,6 +2508,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Number of deadlocks detected in this database</entry>
</row>
+ <row>
+ <entry><structfield>checksum_failures</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum failures detected in this database</entry>
+ </row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3e229c693c..7723f01327 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -823,6 +823,7 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_files(D.oid) AS temp_files,
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 81c6499251..ce4e2a7b2d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -334,6 +334,7 @@ static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
+static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
/* ------------------------------------------------------------
@@ -1518,6 +1519,27 @@ pgstat_report_deadlock(void)
pgstat_send(&msg, sizeof(msg));
}
+
+/* --------
+ * pgstat_report_checksum_failure() -
+ *
+ * Tell the collector about a checksum failure.
+ * --------
+ */
+void
+pgstat_report_checksum_failure(Oid dbid, int nfailures)
+{
+ PgStat_MsgChecksumFailure msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
+ msg.m_databaseid = dbid;
+ msg.m_nfailures = nfailures;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_tempfile() -
*
@@ -4455,6 +4477,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
break;
+ case PGSTAT_MTYPE_CHECKSUMFAILURE:
+ pgstat_recv_checksum_failure((PgStat_MsgChecksumFailure *) &msg, len);
+ break;
+
default:
break;
}
@@ -4554,6 +4580,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_files = 0;
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
+ dbentry->n_checksum_failures = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6196,6 +6223,22 @@ pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len)
dbentry->n_deadlocks++;
}
+/* ----------
+ * pgstat_recv_checksum_failure() -
+ *
+ * Process a CHECKSUMFAILURE message.
+ * ----------
+ */
+static void
+pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+ dbentry->n_checksum_failures+= msg->m_nfailures;
+}
+
/* ----------
* pgstat_recv_tempfile() -
*
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index def6c03dd0..2236888eb5 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -58,7 +58,7 @@ typedef struct
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
- struct stat *statbuf, bool missing_ok);
+ struct stat *statbuf, bool missing_ok, Oid dbid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
@@ -342,7 +342,8 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false);
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false,
+ InvalidOid);
}
else
sendTablespace(ti->path, false);
@@ -592,7 +593,7 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false);
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
@@ -1302,7 +1303,8 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (!sizeonly)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
- true);
+ true,
+ isDbDir ? pg_atoi(lastDir + 1, 4, 0) : InvalidOid);
if (sent || sizeonly)
{
@@ -1357,13 +1359,14 @@ is_checksummed_file(const char *fullpath, const char *filename)
* Given the member, write the TAR header & send the file.
*
* If 'missing_ok' is true, will not throw an error if the file is not found.
+ * The dbid field is used to report checksum error that could happen.
*
* Returns true if the file was successfully sent, false if 'missing_ok',
* and the file did not exist.
*/
static bool
-sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf,
- bool missing_ok)
+sendFile(const char *readfilename, const char *tarfilename,
+ struct stat *statbuf, bool missing_ok, Oid dbid)
{
FILE *fp;
BlockNumber blkno = 0;
@@ -1583,6 +1586,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
}
total_checksum_failures += checksum_failures;
+ if (checksum_failures != 0 && dbid != InvalidOid)
+ pgstat_report_checksum_failure(dbid, checksum_failures);
+
return true;
}
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 517220bc33..c1e61b318b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -17,6 +17,8 @@
#include "access/htup_details.h"
#include "access/itup.h"
#include "access/xlog.h"
+#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/checksum.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
@@ -151,6 +153,8 @@ PageIsVerified(Page page, BlockNumber blkno)
errmsg("page verification failed, calculated checksum %u but expected %u",
checksum, p->pd_checksum)));
+ pgstat_report_checksum_failure(MyDatabaseId, 1);
+
if (header_sane && ignore_checksum_failure)
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 69f7265779..da1d685c08 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1497,6 +1497,21 @@ pg_stat_get_db_deadlocks(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_checksum_failures);
+
+ PG_RETURN_INT64(result);
+}
+
Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e9638660ff..2768926716 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5227,6 +5227,10 @@
proname => 'pg_stat_get_db_deadlocks', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_deadlocks' },
+{ oid => '3425', descr => 'statistics: checksum failures detected in database',
+ proname => 'pg_stat_get_db_checksum_failures', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_failures' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 88a75fb798..6ba20e88fb 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -64,7 +64,8 @@ typedef enum StatMsgType
PGSTAT_MTYPE_FUNCPURGE,
PGSTAT_MTYPE_RECOVERYCONFLICT,
PGSTAT_MTYPE_TEMPFILE,
- PGSTAT_MTYPE_DEADLOCK
+ PGSTAT_MTYPE_DEADLOCK,
+ PGSTAT_MTYPE_CHECKSUMFAILURE
} StatMsgType;
/* ----------
@@ -530,6 +531,18 @@ typedef struct PgStat_MsgDeadlock
Oid m_databaseid;
} PgStat_MsgDeadlock;
+/* ----------
+ * PgStat_MsgChecksumFailure Sent by the backend to tell the collector
+ * about checksum failures noticed.
+ * ----------
+ */
+typedef struct PgStat_MsgChecksumFailure
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ int m_nfailures;
+} PgStat_MsgChecksumFailure;
+
/* ----------
* PgStat_Msg Union over all possible messages.
@@ -593,6 +606,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_files;
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
+ PgStat_Counter n_checksum_failures;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
@@ -1200,6 +1214,7 @@ extern void pgstat_report_analyze(Relation rel,
extern void pgstat_report_recovery_conflict(int reason);
extern void pgstat_report_deadlock(void);
+extern void pgstat_report_checksum_failure(Oid dbid, int nfailures);
extern void pgstat_initialize(void);
extern void pgstat_bestart(void);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 98f417cb57..e0f2c543ef 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1817,6 +1817,7 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_files(d.oid) AS temp_files,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
On Mon, Mar 4, 2019 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net>
wrote:It tracks things that happen in the general backends. Possibly we should
also consider counting the errors actually found when running base backups?
OTOH, that part of the code doesn't really track things like databases (as
it operates just on the raw data directory underneath), so that
implementation would definitely not be as clean...Sorry I just realized that I totally forgot this part of the thread.
While it's true that we operate on raw directory, I see that sendDir()
already setup a isDbDir var, and if this is true lastDir should
contain the oid of the underlying database. Wouldn't it be enough to
call sendFile() using this, something like (untested):if (!sizeonly) - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true); + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);and accordingly report any checksum error from sendFile()?
That seems it was easy enough. PFA an updated patch that does this, and
also rebased so it doesn't conflict on oid.
(yes, while moving this from draft to publish after lunch, I realized that
you put a patch togerher for about the same. So let's merge it)
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Attachments:
stat_checksums_2.patchtext/x-patch; charset=US-ASCII; name=stat_checksums_2.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0e73cdcdda..e2630fd368 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2509,6 +2509,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry>Number of deadlocks detected in this database</entry>
</row>
<row>
+ <entry><structfield>checksum_failures</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum failures detected in this database</entry>
+ </row>
+ <row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
<entry>Time spent reading data file blocks by backends in this database,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3e229c693c..7723f01327 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -823,6 +823,7 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_files(D.oid) AS temp_files,
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 81c6499251..43ec33834b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -334,6 +334,7 @@ static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
+static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
/* ------------------------------------------------------------
@@ -1518,6 +1519,40 @@ pgstat_report_deadlock(void)
pgstat_send(&msg, sizeof(msg));
}
+
+
+/* --------
+ * pgstat_report_checksum_failures_in_db(dboid, failure_count) -
+ *
+ * Tell the collector about one or more checksum failures.
+ * --------
+ */
+void
+pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
+{
+ PgStat_MsgChecksumFailure msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
+ msg.m_databaseid = dboid;
+ msg.m_failurecount = failurecount;
+ pgstat_send(&msg, sizeof(msg));
+}
+
+/* --------
+ * pgstat_report_checksum_failure() -
+ *
+ * Tell the collector about a checksum failure.
+ * --------
+ */
+void
+pgstat_report_checksum_failure(void)
+{
+ pgstat_report_checksum_failures_in_db(MyDatabaseId, 1);
+}
+
/* --------
* pgstat_report_tempfile() -
*
@@ -4455,6 +4490,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
break;
+ case PGSTAT_MTYPE_CHECKSUMFAILURE:
+ pgstat_recv_checksum_failure((PgStat_MsgChecksumFailure *) &msg, len);
+ break;
+
default:
break;
}
@@ -4554,6 +4593,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_files = 0;
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
+ dbentry->n_checksum_failures = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6197,6 +6237,22 @@ pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len)
}
/* ----------
+ * pgstat_recv_checksum_failure() -
+ *
+ * Process a CHECKSUMFAILURE message.
+ * ----------
+ */
+static void
+pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+ dbentry->n_checksum_failures += msg->m_failurecount;
+}
+
+/* ----------
* pgstat_recv_tempfile() -
*
* Process a TEMPFILE message.
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index def6c03dd0..6c324a6661 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -58,7 +58,7 @@ typedef struct
static int64 sendDir(const char *path, int basepathlen, bool sizeonly,
List *tablespaces, bool sendtblspclinks);
static bool sendFile(const char *readfilename, const char *tarfilename,
- struct stat *statbuf, bool missing_ok);
+ struct stat *statbuf, bool missing_ok, Oid dboid);
static void sendFileWithContent(const char *filename, const char *content);
static int64 _tarWriteHeader(const char *filename, const char *linktarget,
struct stat *statbuf, bool sizeonly);
@@ -342,7 +342,7 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m",
XLOG_CONTROL_FILE)));
- sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false);
+ sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false, InvalidOid);
}
else
sendTablespace(ti->path, false);
@@ -592,7 +592,7 @@ perform_base_backup(basebackup_options *opt)
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", pathbuf)));
- sendFile(pathbuf, pathbuf, &statbuf, false);
+ sendFile(pathbuf, pathbuf, &statbuf, false, InvalidOid);
/* unconditionally mark file as archived */
StatusFilePath(pathbuf, fname, ".done");
@@ -1302,7 +1302,7 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
if (!sizeonly)
sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf,
- true);
+ true, isDbDir ? pg_atoi(lastDir + 1, sizeof(Oid), 0) : InvalidOid);
if (sent || sizeonly)
{
@@ -1358,12 +1358,15 @@ is_checksummed_file(const char *fullpath, const char *filename)
*
* If 'missing_ok' is true, will not throw an error if the file is not found.
*
+ * If dboid is anything other than InvalidOid then any checksum failures detected
+ * will get reported to the stats collector.
+ *
* Returns true if the file was successfully sent, false if 'missing_ok',
* and the file did not exist.
*/
static bool
sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf,
- bool missing_ok)
+ bool missing_ok, Oid dboid)
{
FILE *fp;
BlockNumber blkno = 0;
@@ -1580,6 +1583,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
ereport(WARNING,
(errmsg("file \"%s\" has a total of %d checksum verification "
"failures", readfilename, checksum_failures)));
+
+ if (dboid != InvalidOid)
+ pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
}
total_checksum_failures += checksum_failures;
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 517220bc33..14bc61b8ad 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -17,6 +17,7 @@
#include "access/htup_details.h"
#include "access/itup.h"
#include "access/xlog.h"
+#include "pgstat.h"
#include "storage/checksum.h"
#include "utils/memdebug.h"
#include "utils/memutils.h"
@@ -151,6 +152,8 @@ PageIsVerified(Page page, BlockNumber blkno)
errmsg("page verification failed, calculated checksum %u but expected %u",
checksum, p->pd_checksum)));
+ pgstat_report_checksum_failure();
+
if (header_sane && ignore_checksum_failure)
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 69f7265779..da1d685c08 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1498,6 +1498,21 @@ pg_stat_get_db_deadlocks(PG_FUNCTION_ARGS)
}
Datum
+pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_checksum_failures);
+
+ PG_RETURN_INT64(result);
+}
+
+Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
Oid dbid = PG_GETARG_OID(0);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e9638660ff..562c5408f8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5227,6 +5227,10 @@
proname => 'pg_stat_get_db_deadlocks', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_deadlocks' },
+{ oid => '3426', descr => 'statistics: checksum failures detected in database',
+ proname => 'pg_stat_get_db_checksum_failures', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_failures' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 88a75fb798..725c8b0d64 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -64,7 +64,8 @@ typedef enum StatMsgType
PGSTAT_MTYPE_FUNCPURGE,
PGSTAT_MTYPE_RECOVERYCONFLICT,
PGSTAT_MTYPE_TEMPFILE,
- PGSTAT_MTYPE_DEADLOCK
+ PGSTAT_MTYPE_DEADLOCK,
+ PGSTAT_MTYPE_CHECKSUMFAILURE
} StatMsgType;
/* ----------
@@ -530,6 +531,18 @@ typedef struct PgStat_MsgDeadlock
Oid m_databaseid;
} PgStat_MsgDeadlock;
+/* ----------
+ * PgStat_MsgChecksumFailure Sent by the backend to tell the collector
+ * about checksum failures noticed.
+ * ----------
+ */
+typedef struct PgStat_MsgChecksumFailure
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ int m_failurecount;
+} PgStat_MsgChecksumFailure;
+
/* ----------
* PgStat_Msg Union over all possible messages.
@@ -593,6 +606,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_files;
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
+ PgStat_Counter n_checksum_failures;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
@@ -1200,6 +1214,8 @@ extern void pgstat_report_analyze(Relation rel,
extern void pgstat_report_recovery_conflict(int reason);
extern void pgstat_report_deadlock(void);
+extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
+extern void pgstat_report_checksum_failure(void);
extern void pgstat_initialize(void);
extern void pgstat_bestart(void);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 98f417cb57..e0f2c543ef 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1817,6 +1817,7 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_files(d.oid) AS temp_files,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
+ pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
On Sat, Mar 9, 2019 at 12:35 AM Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Mar 4, 2019 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote:
It tracks things that happen in the general backends. Possibly we should also consider counting the errors actually found when running base backups? OTOH, that part of the code doesn't really track things like databases (as it operates just on the raw data directory underneath), so that implementation would definitely not be as clean...
Sorry I just realized that I totally forgot this part of the thread.
While it's true that we operate on raw directory, I see that sendDir()
already setup a isDbDir var, and if this is true lastDir should
contain the oid of the underlying database. Wouldn't it be enough to
call sendFile() using this, something like (untested):if (!sizeonly) - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true); + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);and accordingly report any checksum error from sendFile()?
That seems it was easy enough. PFA an updated patch that does this, and also rebased so it doesn't conflict on oid.
(yes, while moving this from draft to publish after lunch, I realized that you put a patch togerher for about the same. So let's merge it)
Thanks! Our implementations are quite similar, so I'm fine with most
of the changes :) I'm just not sure about having two distinct
functions for reporting failures, given that there's only one caller
for each. On the other hand it avoids to include miscadmin.h in
bufpage.c.
That's just a detail, so I'm marking it (again) as ready for committer!
On Sat, Mar 9, 2019 at 9:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 12:35 AM Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Mar 4, 2019 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net> wrote:
It tracks things that happen in the general backends. Possibly we should also consider counting the errors actually found when running base backups? OTOH, that part of the code doesn't really track things like databases (as it operates just on the raw data directory underneath), so that implementation would definitely not be as clean...
Sorry I just realized that I totally forgot this part of the thread.
While it's true that we operate on raw directory, I see that sendDir()
already setup a isDbDir var, and if this is true lastDir should
contain the oid of the underlying database. Wouldn't it be enough to
call sendFile() using this, something like (untested):if (!sizeonly) - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true); + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);and accordingly report any checksum error from sendFile()?
That seems it was easy enough. PFA an updated patch that does this, and also rebased so it doesn't conflict on oid.
Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects while we're
at it. They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!
What about adding a new field in PgStat_GlobalStats for that? We can
use the same lastDir to easily detect such objects and slightly adapt
sendFile again, which seems quite straightforward.
On Sat, Mar 9, 2019 at 12:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 12:35 AM Magnus Hagander <magnus@hagander.net>
wrote:On Mon, Mar 4, 2019 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com>
wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net>
wrote:
It tracks things that happen in the general backends. Possibly we
should also consider counting the errors actually found when running base
backups? OTOH, that part of the code doesn't really track things like
databases (as it operates just on the raw data directory underneath), so
that implementation would definitely not be as clean...Sorry I just realized that I totally forgot this part of the thread.
While it's true that we operate on raw directory, I see that sendDir()
already setup a isDbDir var, and if this is true lastDir should
contain the oid of the underlying database. Wouldn't it be enough to
call sendFile() using this, something like (untested):if (!sizeonly) - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true); + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);and accordingly report any checksum error from sendFile()?
That seems it was easy enough. PFA an updated patch that does this, and
also rebased so it doesn't conflict on oid.
(yes, while moving this from draft to publish after lunch, I realized
that you put a patch togerher for about the same. So let's merge it)
Thanks! Our implementations are quite similar, so I'm fine with most
of the changes :) I'm just not sure about having two distinct
functions for reporting failures, given that there's only one caller
for each. On the other hand it avoids to include miscadmin.h in
bufpage.c.
Yeah, and it brings "cosistence" to at least the calling point(s) within
regular backends.
That's just a detail, so I'm marking it (again) as ready for committer!
Thanks, and pushed :)
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 9:34 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 12:35 AM Magnus Hagander <magnus@hagander.net>
wrote:
On Mon, Mar 4, 2019 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com>
wrote:
On Fri, Feb 22, 2019 at 3:01 PM Magnus Hagander <magnus@hagander.net>
wrote:
It tracks things that happen in the general backends. Possibly we
should also consider counting the errors actually found when running base
backups? OTOH, that part of the code doesn't really track things like
databases (as it operates just on the raw data directory underneath), so
that implementation would definitely not be as clean...Sorry I just realized that I totally forgot this part of the thread.
While it's true that we operate on raw directory, I see that sendDir()
already setup a isDbDir var, and if this is true lastDir should
contain the oid of the underlying database. Wouldn't it be enough to
call sendFile() using this, something like (untested):if (!sizeonly) - sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true); + sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, true, isDbDir ? pg_atoi(lastDir+1, 4) : InvalidOid);and accordingly report any checksum error from sendFile()?
That seems it was easy enough. PFA an updated patch that does this,
and also rebased so it doesn't conflict on oid.
Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects while we're
at it. They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!What about adding a new field in PgStat_GlobalStats for that? We can
use the same lastDir to easily detect such objects and slightly adapt
sendFile again, which seems quite straightforward.
Ah, didn't spot that one until after I pushed :/ Sorry about that.
Hmm. That's an interesting thought. And then add a column to
pg_stat_bgwriter, I assume? (Which is an ever increasingly bad name for the
view, but that's unrelated to this)
Question is then what number that should show -- only the checksum counter
in non-database-fields, or the total number across the cluster?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Sat, Mar 9, 2019 at 7:48 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Mar 9, 2019 at 12:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Thanks! Our implementations are quite similar, so I'm fine with most
of the changes :) I'm just not sure about having two distinct
functions for reporting failures, given that there's only one caller
for each. On the other hand it avoids to include miscadmin.h in
bufpage.c.Yeah, and it brings "cosistence" to at least the calling point(s) within regular backends.
That's just a detail, so I'm marking it (again) as ready for committer!
Thanks, and pushed :)
Thanks :)
On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects while we're
at it. They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!What about adding a new field in PgStat_GlobalStats for that? We can
use the same lastDir to easily detect such objects and slightly adapt
sendFile again, which seems quite straightforward.Ah, didn't spot that one until after I pushed :/ Sorry about that.
No problem, I should have thought about it sooner anyway.
Hmm. That's an interesting thought. And then add a column to pg_stat_bgwriter, I assume?
Yes, and a new entry for PgStat_Shared_Reset_Target I guess.
(Which is an ever increasingly bad name for the view, but that's
unrelated to this)
yeah :/
Question is then what number that should show -- only the checksum counter in non-database-fields, or the total number across the cluster?
I'd say only for non-database-fields errors, especially if we can
reset each counters separately. If necessary, we can add a new view
to give a global overview of checksum errors for DBA convenience.
On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects while we're
at it. They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!What about adding a new field in PgStat_GlobalStats for that? We can
use the same lastDir to easily detect such objects and slightly adapt
sendFile again, which seems quite straightforward.Question is then what number that should show -- only the checksum counter in non-database-fields, or the total number across the cluster?
I'd say only for non-database-fields errors, especially if we can
reset each counters separately. If necessary, we can add a new view
to give a global overview of checksum errors for DBA convenience.
So, after reading current implementation, I don't think that
PgStat_GlobalStats is the right place. It's already enough of a mess,
and clearly pg_stat_reset_shared('bgwriter') would not make any sense
if it did reset the shared relations checksum errors (though arguably
the fact that's it's resetting checkpointer stats right now is hardly
better), and handling a different target to reset part of
PgStat_GlobalStats counters would be an ugly kludge.
I'm considering adding a new PgStat_ChecksumStats for that purpose
instead, but I don't know if that's acceptable to do so in the last
commitfest. It seems worthwhile to add it eventually, since we'll
probably end up having more things to report to users related to
checksum. Online enabling of checksum could be the most immediate
potential target.
If that's acceptable, I was thinking this new stat could have those
fields with the first drop:
- number of non-db-related checksum checks done
- number of non-db-related checksum checks failed
- last stats reset
(and adding the number of checks for db-related blocks done with the
current checksum errors counter). Maybe also adding a
pg_checksum_stats view that would summarise all the counters in one
place.
It'll obviously add some traffic to the stats collector, but I'd hope
not too much, since BufferAlloc shouldn't be that frequent, and
BASE_BACKUP reports stats only once per file.
On Sun, Mar 10, 2019 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects while we're
at it. They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!What about adding a new field in PgStat_GlobalStats for that? We can
use the same lastDir to easily detect such objects and slightly adapt
sendFile again, which seems quite straightforward.Question is then what number that should show -- only the checksum counter in non-database-fields, or the total number across the cluster?
I'd say only for non-database-fields errors, especially if we can
reset each counters separately. If necessary, we can add a new view
to give a global overview of checksum errors for DBA convenience.I'm considering adding a new PgStat_ChecksumStats for that purpose
instead, but I don't know if that's acceptable to do so in the last
commitfest. It seems worthwhile to add it eventually, since we'll
probably end up having more things to report to users related to
checksum. Online enabling of checksum could be the most immediate
potential target.
I wasn't aware that we were already storing informations about shared
objects in PgStat_StatDBEntry, with an InvalidOid as databaseid
(though we don't have any system view that are actually showing
information for such objects).
As a result I ended up simply adding counters for the number of total
checks and the timestamp of the last failure in PgStat_StatDBEntry,
making attached patch very lightweight. I moved all the checksum
related counters out of pg_stat_database in a new pg_stat_checksum
view. It avoids to make pg_stat_database too wide, and also allows to
display information about shared object in this new view (some of the
other counters don't really make sense for shared objects or could
break existing monitoring query). While at it, I tried to add a
little bit of documentation wrt. checksum monitoring.
On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Mar 10, 2019 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects while we're
at it. They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!What about adding a new field in PgStat_GlobalStats for that? We can
use the same lastDir to easily detect such objects and slightly adapt
sendFile again, which seems quite straightforward.Question is then what number that should show -- only the checksum counter in non-database-fields, or the total number across the cluster?
I'd say only for non-database-fields errors, especially if we can
reset each counters separately. If necessary, we can add a new view
to give a global overview of checksum errors for DBA convenience.I'm considering adding a new PgStat_ChecksumStats for that purpose
instead, but I don't know if that's acceptable to do so in the last
commitfest. It seems worthwhile to add it eventually, since we'll
probably end up having more things to report to users related to
checksum. Online enabling of checksum could be the most immediate
potential target.I wasn't aware that we were already storing informations about shared
objects in PgStat_StatDBEntry, with an InvalidOid as databaseid
(though we don't have any system view that are actually showing
information for such objects).As a result I ended up simply adding counters for the number of total
checks and the timestamp of the last failure in PgStat_StatDBEntry,
making attached patch very lightweight. I moved all the checksum
related counters out of pg_stat_database in a new pg_stat_checksum
view. It avoids to make pg_stat_database too wide, and also allows to
display information about shared object in this new view (some of the
other counters don't really make sense for shared objects or could
break existing monitoring query). While at it, I tried to add a
little bit of documentation wrt. checksum monitoring.
and of course I forgot to attach the patch.
Attachments:
pg_stat_checksum-v1.diffapplication/octet-stream; name=pg_stat_checksum-v1.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e2630fd368..44ce5a045a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -376,6 +376,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>
+ <row>
+ <entry><structname>pg_stat_checksum</structname><indexterm><primary>pg_stat_checksum</primary></indexterm></entry>
+ <entry>One row per database, plus one for the shared objects, showing
+ database-wide checksums statistics. See
+ <xref linkend="pg-stat-checksum-view"/> for details.
+ </entry>
+ </row>
+
<row>
<entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
<entry>One row per database, showing database-wide statistics. See
@@ -2397,6 +2405,59 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
single row, containing global data for the cluster.
</para>
+ <table id="pg-stat-checksum-view" xreflabel="pg_stat_checksum">
+ <title><structname>pg_stat_checksum</structname> View</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><structfield>datid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry>OID of a database, or 0 for objects belonging to a shared relation</entry>
+ </row>
+ <row>
+ <entry><structfield>datname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry>Name of this database, or <literal><shared_objects></literal></entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_checks</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum checks processed in this database</entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_failures</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum failures detected in this
+ database</entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_last_failure</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which the last data page checksum failures was detected in
+ this database</entry>
+ </row>
+ <row>
+ <entry><structfield>stats_reset</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which these statistics were last reset</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The <structname>pg_stat_database</structname> view will contain one row
+ for each database in the cluster, showing database-wide statistics.
+ </para>
+
<table id="pg-stat-database-view" xreflabel="pg_stat_database">
<title><structname>pg_stat_database</structname> View</title>
<tgroup cols="3">
@@ -2508,11 +2569,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Number of deadlocks detected in this database</entry>
</row>
- <row>
- <entry><structfield>checksum_failures</structfield></entry>
- <entry><type>bigint</type></entry>
- <entry>Number of data page checksum failures detected in this database</entry>
- </row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 84fb37c293..7d4fed9a91 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -218,7 +218,9 @@ PostgreSQL documentation
I/O system that would otherwise be silent. Enabling checksums
may incur a noticeable performance penalty. This option can only
be set during initialization, and cannot be changed later. If
- set, checksums are calculated for all objects, in all databases.
+ set, checksums are calculated for all objects, in all databases. All
+ checksum activity will be reported in the <xref
+ linkend="pg-stat-checksum-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c4f3950e5b..82a2f8999d 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -531,7 +531,8 @@ PostgreSQL documentation
By default, checksums are verified and checksum failures will result
in a non-zero exit status. However, the base backup will not be
removed in such a case, as if the <option>--no-clean</option> option
- had been used.
+ had been used. Checksum verifications will also be reported in the
+ <xref linkend="pg-stat-checksum-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..c207d7db0c 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -38,8 +38,10 @@ PostgreSQL documentation
<para>
<application>pg_checksums</application> verifies data checksums in a
<productname>PostgreSQL</productname> cluster. The server must be shut
- down cleanly before running <application>pg_checksums</application>.
- The exit status is zero if there are no checksum errors, otherwise nonzero.
+ down cleanly before running <application>pg_checksums</application>. As a
+ consequence, the <structname>pg_stat_checksum</structname> view won't
+ reflect this activity. The exit status is zero if there are no checksum
+ errors, otherwise nonzero.
</para>
</refsect1>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7723f01327..68acc21604 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -823,12 +823,25 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_files(D.oid) AS temp_files,
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
- pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
FROM pg_database D;
+CREATE VIEW pg_stat_checksum AS
+ SELECT
+ D.oid AS datid,
+ D.datname AS datname,
+ pg_stat_get_db_checksum_checks(D.oid) AS checksum_checks,
+ pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
+ pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
+ FROM (
+ SELECT oid, datname FROM pg_database
+ UNION ALL
+ SELECT 0, '<shared objects>'
+ ) D;
+
CREATE VIEW pg_stat_database_conflicts AS
SELECT
D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ba31f532ea..a18bd00dfc 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -335,7 +335,7 @@ static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
-static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
+static void pgstat_recv_checksum(PgStat_MsgChecksum *msg, int len);
static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
/* ------------------------------------------------------------
@@ -1523,35 +1523,42 @@ pgstat_report_deadlock(void)
/* --------
- * pgstat_report_checksum_failures_in_db(dboid, failure_count) -
+ * pgstat_report_checksum_in_db(dboid, failure_count) -
*
- * Tell the collector about one or more checksum failures.
+ * Tell the collector about one or more checksum event.
* --------
*/
void
-pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
+pgstat_report_checksum_in_db(Oid dboid, int checkcount, int failurecount)
{
- PgStat_MsgChecksumFailure msg;
+ PgStat_MsgChecksum msg;
if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
return;
- pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUM);
msg.m_databaseid = dboid;
+ msg.m_checkcount = checkcount;
msg.m_failurecount = failurecount;
+
+ if (failurecount > 0)
+ msg.m_failure_time = GetCurrentTimestamp();
+ else
+ msg.m_failure_time = 0;
+
pgstat_send(&msg, sizeof(msg));
}
/* --------
- * pgstat_report_checksum_failure() -
+ * pgstat_report_checksum() -
*
- * Tell the collector about a checksum failure.
+ * Tell the collector about a checksum verification.
* --------
*/
void
-pgstat_report_checksum_failure(void)
+pgstat_report_checksum(bool failed)
{
- pgstat_report_checksum_failures_in_db(MyDatabaseId, 1);
+ pgstat_report_checksum_in_db(MyDatabaseId, 1, failed ? 1 : 0);
}
/* --------
@@ -4491,8 +4498,8 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
break;
- case PGSTAT_MTYPE_CHECKSUMFAILURE:
- pgstat_recv_checksum_failure((PgStat_MsgChecksumFailure *) &msg, len);
+ case PGSTAT_MTYPE_CHECKSUM:
+ pgstat_recv_checksum((PgStat_MsgChecksum *) &msg, len);
break;
default:
@@ -4594,7 +4601,9 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_files = 0;
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
+ dbentry->n_checksum_checks = 0;
dbentry->n_checksum_failures = 0;
+ dbentry->last_checksum_failure = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6240,17 +6249,24 @@ pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len)
/* ----------
* pgstat_recv_checksum_failure() -
*
- * Process a CHECKSUMFAILURE message.
+ * Process a CHECKSUM message.
* ----------
*/
static void
-pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
+pgstat_recv_checksum(PgStat_MsgChecksum *msg, int len)
{
PgStat_StatDBEntry *dbentry;
dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+ dbentry->n_checksum_checks += msg->m_checkcount;
dbentry->n_checksum_failures += msg->m_failurecount;
+
+ if (msg->m_failurecount > 0)
+ {
+ Assert(msg->m_failure_time != 0);
+ dbentry->last_checksum_failure = msg->m_failure_time;
+ }
}
/* ----------
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 537f09e342..5bd5fdb78c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1373,7 +1373,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
bool block_retry = false;
char buf[TAR_SEND_SIZE];
uint16 checksum;
- int checksum_failures = 0;
+ int checksum_checks = 0, checksum_failures = 0;
off_t cnt;
int i;
pgoff_t len = 0;
@@ -1527,6 +1527,8 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
"failures in file \"%s\" will not "
"be reported", readfilename)));
}
+ else if (block_retry == false)
+ checksum_checks++;
}
block_retry = false;
blkno++;
@@ -1583,10 +1585,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
ereport(WARNING,
(errmsg("file \"%s\" has a total of %d checksum verification "
"failures", readfilename, checksum_failures)));
-
- if (dboid != InvalidOid)
- pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
}
+
+ pgstat_report_checksum_in_db(dboid, checksum_checks, checksum_failures);
total_checksum_failures += checksum_failures;
return true;
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 14bc61b8ad..9428e7cfa1 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -152,11 +152,13 @@ PageIsVerified(Page page, BlockNumber blkno)
errmsg("page verification failed, calculated checksum %u but expected %u",
checksum, p->pd_checksum)));
- pgstat_report_checksum_failure();
+ pgstat_report_checksum(true);
if (header_sane && ignore_checksum_failure)
return true;
}
+ else
+ pgstat_report_checksum(false);
return false;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index da1d685c08..5faf8c156c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1497,6 +1497,21 @@ pg_stat_get_db_deadlocks(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_db_checksum_checks(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (dbentry->n_checksum_checks);
+
+ PG_RETURN_INT64(result);
+}
+
Datum
pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
{
@@ -1512,6 +1527,24 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ TimestampTz result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = dbentry->last_checksum_failure;
+
+ if (result == 0)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_TIMESTAMPTZ(result);
+}
+
Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c4b012cf4c..43b93698b7 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5253,10 +5253,20 @@
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_deadlocks' },
{ oid => '3426',
+ descr => 'statistics: checksum checks done in database',
+ proname => 'pg_stat_get_db_checksum_checks', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_checks' },
+{ oid => '3427',
descr => 'statistics: checksum failures detected in database',
proname => 'pg_stat_get_db_checksum_failures', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_checksum_failures' },
+{ oid => '3428',
+ descr => 'statistics: last checksum failure detected in database',
+ proname => 'pg_stat_get_db_checksum_last_failure', provolatile => 's',
+ proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_last_failure' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 725c8b0d64..7f19e2e451 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -65,7 +65,7 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RECOVERYCONFLICT,
PGSTAT_MTYPE_TEMPFILE,
PGSTAT_MTYPE_DEADLOCK,
- PGSTAT_MTYPE_CHECKSUMFAILURE
+ PGSTAT_MTYPE_CHECKSUM
} StatMsgType;
/* ----------
@@ -532,16 +532,18 @@ typedef struct PgStat_MsgDeadlock
} PgStat_MsgDeadlock;
/* ----------
- * PgStat_MsgChecksumFailure Sent by the backend to tell the collector
- * about checksum failures noticed.
+ * PgStat_MsgChecksum Sent by the backend to tell the collector
+ * about checksum verficiations done.
* ----------
*/
-typedef struct PgStat_MsgChecksumFailure
+typedef struct PgStat_MsgChecksum
{
PgStat_MsgHdr m_hdr;
Oid m_databaseid;
+ int m_checkcount;
int m_failurecount;
-} PgStat_MsgChecksumFailure;
+ TimestampTz m_failure_time;
+} PgStat_MsgChecksum;
/* ----------
@@ -606,7 +608,9 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_files;
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
+ PgStat_Counter n_checksum_checks;
PgStat_Counter n_checksum_failures;
+ TimestampTz last_checksum_failure;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
@@ -678,14 +682,14 @@ typedef struct PgStat_StatFuncEntry
*/
typedef struct PgStat_ArchiverStats
{
- PgStat_Counter archived_count; /* archival successes */
- char last_archived_wal[MAX_XFN_CHARS + 1]; /* last WAL file
+ PgStat_Counter archived_count; /* archival successes */
+ char last_archived_wal[MAX_XFN_CHARS + 1]; /* last WAL file
* archived */
- TimestampTz last_archived_timestamp; /* last archival success time */
- PgStat_Counter failed_count; /* failed archival attempts */
- char last_failed_wal[MAX_XFN_CHARS + 1]; /* WAL file involved in
+ TimestampTz last_archived_timestamp; /* last archival success time */
+ PgStat_Counter failed_count; /* failed archival attempts */
+ char last_failed_wal[MAX_XFN_CHARS + 1]; /* WAL file involved in
* last failure */
- TimestampTz last_failed_timestamp; /* last archival failure time */
+ TimestampTz last_failed_timestamp; /* last archival failure time */
TimestampTz stat_reset_timestamp;
} PgStat_ArchiverStats;
@@ -1214,8 +1218,8 @@ extern void pgstat_report_analyze(Relation rel,
extern void pgstat_report_recovery_conflict(int reason);
extern void pgstat_report_deadlock(void);
-extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
-extern void pgstat_report_checksum_failure(void);
+extern void pgstat_report_checksum_in_db(Oid dboid, int checkcount, int failurecount);
+extern void pgstat_report_checksum(bool failed);
extern void pgstat_initialize(void);
extern void pgstat_bestart(void);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index f104dc4a62..c73f1c5edd 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1801,6 +1801,18 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
pg_stat_get_buf_alloc() AS buffers_alloc,
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+pg_stat_checksum| SELECT d.oid AS datid,
+ d.datname,
+ pg_stat_get_db_checksum_checks(d.oid) AS checksum_checks,
+ pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(d.oid) AS checksum_last_failure,
+ pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
+ FROM ( SELECT pg_database.oid,
+ pg_database.datname
+ FROM pg_database
+ UNION ALL
+ SELECT 0,
+ '<shared objects>'::name) d;
pg_stat_database| SELECT d.oid AS datid,
d.datname,
pg_stat_get_db_numbackends(d.oid) AS numbackends,
@@ -1817,7 +1829,6 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_files(d.oid) AS temp_files,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
- pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
On Wed, Mar 13, 2019 at 4:54 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sun, Mar 10, 2019 at 1:13 PM Julien Rouhaud <rjuju123@gmail.com>
wrote:
On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju123@gmail.com>
wrote:
On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus@hagander.net>
wrote:
On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123@gmail.com>
wrote:
Sorry, I have again new comments after a little bit more thinking.
I'm wondering if we can do something about shared objects whilewe're
at it. They don't belong to any database, so it's a little bit
orthogonal to this proposal, but it seems quite important to track
error on those too!What about adding a new field in PgStat_GlobalStats for that? We
can
use the same lastDir to easily detect such objects and slightly
adapt
sendFile again, which seems quite straightforward.
Question is then what number that should show -- only the checksum
counter in non-database-fields, or the total number across the cluster?
I'd say only for non-database-fields errors, especially if we can
reset each counters separately. If necessary, we can add a new view
to give a global overview of checksum errors for DBA convenience.I'm considering adding a new PgStat_ChecksumStats for that purpose
instead, but I don't know if that's acceptable to do so in the last
commitfest. It seems worthwhile to add it eventually, since we'll
probably end up having more things to report to users related to
checksum. Online enabling of checksum could be the most immediate
potential target.I wasn't aware that we were already storing informations about shared
objects in PgStat_StatDBEntry, with an InvalidOid as databaseid
(though we don't have any system view that are actually showing
information for such objects).As a result I ended up simply adding counters for the number of total
checks and the timestamp of the last failure in PgStat_StatDBEntry,
making attached patch very lightweight. I moved all the checksum
related counters out of pg_stat_database in a new pg_stat_checksum
view. It avoids to make pg_stat_database too wide, and also allows to
display information about shared object in this new view (some of the
other counters don't really make sense for shared objects or could
break existing monitoring query). While at it, I tried to add a
little bit of documentation wrt. checksum monitoring.and of course I forgot to attach the patch.
Does it really make any sense to track "number of checksum checks"? In any
sort of interesting database that's just going to be an insanely high
number, isn't it? (And also, to stay consistent with checksum failures, we
should of course also count the checks done in base backups, which is not
in the patch. But I'm more thinking we should drop it)
I do like the addition of the "last failure" column, that's really useful.
Having thought some more about this, I wonder if the right thing to do is
to actually add a row to pg_stat_database for the global stats, rather than
invent a separate view. I can see the argument going both ways, but
particularly with the name pg_stat_checksums we are setting a pattern that
will create one view for each counter. That's not very good, I think.
In the end I'm somewhat split on the idea of pg_stat_database with a NULL
row or pg_stat_checkpoints. What do others think?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Sat, Mar 30, 2019 at 2:33 PM Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 13, 2019 at 4:54 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
As a result I ended up simply adding counters for the number of total
checks and the timestamp of the last failure in PgStat_StatDBEntry,
making attached patch very lightweight. I moved all the checksum
related counters out of pg_stat_database in a new pg_stat_checksum
view. It avoids to make pg_stat_database too wide, and also allows to
display information about shared object in this new view (some of the
other counters don't really make sense for shared objects or could
break existing monitoring query). While at it, I tried to add a
little bit of documentation wrt. checksum monitoring.and of course I forgot to attach the patch.
Does it really make any sense to track "number of checksum checks"? In any sort of interesting database that's just going to be an insanely high number, isn't it? (And also, to stay consistent with checksum failures, we should of course also count the checks done in base backups, which is not in the patch. But I'm more thinking we should drop it)
Thanks for looking at it!
It's surely going to be a huge number on databases with a large number
of buffer eviction and/or frequent pg_basebackup. The idea was to be
able to know if the possible lack of failure was due to lack of check
at all or because the server appears to be healthy, without spamming
gettimeofday calls. If having a last_check() is better, I'm fine with
it. If it's useless, let's drop it.
The number of checks was supposed to also be tracked in base_backups, with
@@ -1527,6 +1527,8 @@ sendFile(const char *readfilename, const char
*tarfilename, struct stat *statbuf
"failures in file \"%s\" will not "
"be reported", readfilename)));
}
+ else if (block_retry == false)
+ checksum_checks++;
Having thought some more about this, I wonder if the right thing to do is to actually add a row to pg_stat_database for the global stats, rather than invent a separate view. I can see the argument going both ways, but particularly with the name pg_stat_checksums we are setting a pattern that will create one view for each counter. That's not very good, I think.
In the end I'm somewhat split on the idea of pg_stat_database with a NULL row or pg_stat_checkpoints. What do others think?
I agree that having a separate view for each counter if a bad idea.
But what I was thinking is that we'll probably end up with a view to
track per-db online checksum activation progress/activity/status at
some point (similar to pg_stat_progress_vacuum), so why not starting
with this dedicated view right now and add new counters later, either
in pgstat and/or some shmem, as long as we keep the view name as SQL
interface.
Anyway, I don't have a strong preference for any implementation, so
I'll be happy to send an updated patch with what ends up being
preferred.
On Sat, Mar 30, 2019 at 3:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 30, 2019 at 2:33 PM Magnus Hagander <magnus@hagander.net>
wrote:On Wed, Mar 13, 2019 at 4:54 PM Julien Rouhaud <rjuju123@gmail.com>
wrote:
On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju123@gmail.com>
wrote:
As a result I ended up simply adding counters for the number of total
checks and the timestamp of the last failure in PgStat_StatDBEntry,
making attached patch very lightweight. I moved all the checksum
related counters out of pg_stat_database in a new pg_stat_checksum
view. It avoids to make pg_stat_database too wide, and also allows to
display information about shared object in this new view (some of the
other counters don't really make sense for shared objects or could
break existing monitoring query). While at it, I tried to add a
little bit of documentation wrt. checksum monitoring.and of course I forgot to attach the patch.
Does it really make any sense to track "number of checksum checks"? In
any sort of interesting database that's just going to be an insanely high
number, isn't it? (And also, to stay consistent with checksum failures, we
should of course also count the checks done in base backups, which is not
in the patch. But I'm more thinking we should drop it)Thanks for looking at it!
It's surely going to be a huge number on databases with a large number
of buffer eviction and/or frequent pg_basebackup. The idea was to be
able to know if the possible lack of failure was due to lack of check
at all or because the server appears to be healthy, without spamming
gettimeofday calls. If having a last_check() is better, I'm fine with
it. If it's useless, let's drop it.
I'm not sure either of them are really useful, but would be happy to take
input from others :)
The number of checks was supposed to also be tracked in base_backups, with
Oh, that's a sloppy review. I see it's there. However, it doesn't appear to
count up in the *normal* backend path...
My vote is still to drop it completely, but if we're keeping it, it has to
go in both paths.
Having thought some more about this, I wonder if the right thing to do is
to actually add a row to pg_stat_database for the global stats, rather than
invent a separate view. I can see the argument going both ways, but
particularly with the name pg_stat_checksums we are setting a pattern that
will create one view for each counter. That's not very good, I think.In the end I'm somewhat split on the idea of pg_stat_database with a
NULL row or pg_stat_checkpoints. What do others think?
I agree that having a separate view for each counter if a bad idea.
But what I was thinking is that we'll probably end up with a view to
track per-db online checksum activation progress/activity/status at
some point (similar to pg_stat_progress_vacuum), so why not starting
with this dedicated view right now and add new counters later, either
in pgstat and/or some shmem, as long as we keep the view name as SQL
interface.
Technically, that should be in pg_stat_progress_checksums to be consistent
:) So whichever way we turn, it's going to be inconsistent with something.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Sorry for delay, I had to catch a train.
On Sat, Mar 30, 2019 at 4:02 PM Magnus Hagander <magnus@hagander.net> wrote:
My vote is still to drop it completely, but if we're keeping it, it has to go in both paths.
Ok. For now I'm attaching v2, which drops this field, rename the view
to pg_stat_checksums (terminal s), and use the policy for choosing
random oid in the 8000..9999 range for new functions.
I'd also have to get more feedback on this. For now, I'll add this
thread to the pg12 open items, as a follow up of the initial code
drop.
Technically, that should be in pg_stat_progress_checksums to be consistent :) So whichever way we turn, it's going to be inconsistent with something.
Indeed :)
Attachments:
pg_stat_checksums-v2.difftext/x-patch; charset=US-ASCII; name=pg_stat_checksums-v2.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f1df14bdea..30674f61ce 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -384,6 +384,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>
+ <row>
+ <entry><structname>pg_stat_checksums</structname><indexterm><primary>pg_stat_checksums</primary></indexterm></entry>
+ <entry>One row per database, plus one for the shared objects, showing
+ database-wide checksums statistics. See
+ <xref linkend="pg-stat-checksums-view"/> for details.
+ </entry>
+ </row>
+
<row>
<entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
<entry>One row per database, showing database-wide statistics. See
@@ -2418,6 +2426,54 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
single row, containing global data for the cluster.
</para>
+ <table id="pg-stat-checksums-view" xreflabel="pg_stat_checksums">
+ <title><structname>pg_stat_checksums</structname> View</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><structfield>datid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry>OID of a database, or 0 for objects belonging to a shared relation</entry>
+ </row>
+ <row>
+ <entry><structfield>datname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry>Name of this database, or <literal><shared_objects></literal></entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_failures</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum failures detected in this
+ database</entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_last_failure</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which the last data page checksum failures was detected in
+ this database</entry>
+ </row>
+ <row>
+ <entry><structfield>stats_reset</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which these statistics were last reset</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The <structname>pg_stat_database</structname> view will contain one row
+ for each database in the cluster, showing database-wide statistics.
+ </para>
+
<table id="pg-stat-database-view" xreflabel="pg_stat_database">
<title><structname>pg_stat_database</structname> View</title>
<tgroup cols="3">
@@ -2529,11 +2585,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Number of deadlocks detected in this database</entry>
</row>
- <row>
- <entry><structfield>checksum_failures</structfield></entry>
- <entry><type>bigint</type></entry>
- <entry>Number of data page checksum failures detected in this database</entry>
- </row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 84fb37c293..e063dd972e 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -218,7 +218,9 @@ PostgreSQL documentation
I/O system that would otherwise be silent. Enabling checksums
may incur a noticeable performance penalty. This option can only
be set during initialization, and cannot be changed later. If
- set, checksums are calculated for all objects, in all databases.
+ set, checksums are calculated for all objects, in all databases. All
+ checksum failures will be reported in the <xref
+ linkend="pg-stat-checksums-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c4f3950e5b..8179db5f2a 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -531,7 +531,8 @@ PostgreSQL documentation
By default, checksums are verified and checksum failures will result
in a non-zero exit status. However, the base backup will not be
removed in such a case, as if the <option>--no-clean</option> option
- had been used.
+ had been used. Checksum verifications failures will also be reported
+ in the <xref linkend="pg-stat-checksums-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 70339eaec9..65e067a8aa 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -39,10 +39,11 @@ PostgreSQL documentation
<application>pg_checksums</application> checks, enables or disables data
checksums in a <productname>PostgreSQL</productname> cluster. The server
must be shut down cleanly before running
- <application>pg_checksums</application>. The exit status is zero if there
- are no checksum errors when checking them, and nonzero if at least one
- checksum failure is detected. If enabling or disabling checksums, the
- exit status is nonzero if the operation failed.
+ <application>pg_checksums</application>. As a consequence, the
+ <structname>pg_stat_checksums</structname> view won't reflect this activity.
+ The exit status is zero if there are no checksum errors when checking them,
+ and nonzero if at least one checksum failure is detected. If enabling or
+ disabling checksums, the exit status is nonzero if the operation failed.
</para>
<para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b89df70653..ffe53bf888 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -823,12 +823,24 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_files(D.oid) AS temp_files,
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
- pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
FROM pg_database D;
+CREATE VIEW pg_stat_checksums AS
+ SELECT
+ D.oid AS datid,
+ D.datname AS datname,
+ pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
+ pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
+ FROM (
+ SELECT oid, datname FROM pg_database
+ UNION ALL
+ SELECT 0, '<shared objects>'
+ ) D;
+
CREATE VIEW pg_stat_database_conflicts AS
SELECT
D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 2a8472b91a..4bf5abab5d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
msg.m_databaseid = dboid;
msg.m_failurecount = failurecount;
+ msg.m_failure_time = GetCurrentTimestamp();
+
pgstat_send(&msg, sizeof(msg));
}
@@ -4601,6 +4603,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
dbentry->n_checksum_failures = 0;
+ dbentry->last_checksum_failure = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6257,6 +6260,7 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
dbentry->n_checksum_failures += msg->m_failurecount;
+ dbentry->last_checksum_failure = msg->m_failure_time;
}
/* ----------
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 537f09e342..36dcb28754 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1584,9 +1584,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
(errmsg("file \"%s\" has a total of %d checksum verification "
"failures", readfilename, checksum_failures)));
- if (dboid != InvalidOid)
- pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
+ pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
}
+
total_checksum_failures += checksum_failures;
return true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 90a817a25c..b28d9ab216 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1514,6 +1514,24 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ TimestampTz result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = dbentry->last_checksum_failure;
+
+ if (result == 0)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_TIMESTAMPTZ(result);
+}
+
Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index eac909109c..c4aeeee810 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5281,6 +5281,11 @@
proname => 'pg_stat_get_db_checksum_failures', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_checksum_failures' },
+{ oid => '8394',
+ descr => 'statistics: when last checksum failure was detected in database',
+ proname => 'pg_stat_get_db_checksum_last_failure', provolatile => 's',
+ proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_last_failure' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c080fa6388..d6d1772706 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -541,6 +541,7 @@ typedef struct PgStat_MsgChecksumFailure
PgStat_MsgHdr m_hdr;
Oid m_databaseid;
int m_failurecount;
+ TimestampTz m_failure_time;
} PgStat_MsgChecksumFailure;
@@ -607,6 +608,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
PgStat_Counter n_checksum_failures;
+ TimestampTz last_checksum_failure;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index d5f309fbfb..193de82cc3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1801,6 +1801,17 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
pg_stat_get_buf_alloc() AS buffers_alloc,
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+pg_stat_checksums| SELECT d.oid AS datid,
+ d.datname,
+ pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(d.oid) AS checksum_last_failure,
+ pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
+ FROM ( SELECT pg_database.oid,
+ pg_database.datname
+ FROM pg_database
+ UNION ALL
+ SELECT 0,
+ '<shared objects>'::name) d;
pg_stat_database| SELECT d.oid AS datid,
d.datname,
pg_stat_get_db_numbackends(d.oid) AS numbackends,
@@ -1817,7 +1828,6 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_files(d.oid) AS temp_files,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
- pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
On Sat, Mar 30, 2019 at 06:15:11PM +0100, Julien Rouhaud wrote:
I'd also have to get more feedback on this. For now, I'll add this
thread to the pg12 open items, as a follow up of the initial code
drop.
Catching up here... I think that having a completely separate view
with one row for each database and one row for shared objects makes
the most sense based on what has been proposed on this thread. Being
able to track checksum failures for shared catalogs is really
something I'd like to be able to see easily, and I have seen
corruption involving such objects from time to time. I think that we
should have a design which is extensible. One thing which is not
proposed on this patch, and I am fine with it as a first draft, is
that we don't have any information about the broken block number and
the file involved. My gut tells me that we'd want a separate view,
like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
blck) to be complete. But that's just for future work.
For the progress part, we would most likely have a separate view for
that as well, as the view should show no rows if there is no operation
in progress.
The patch looks rather clean to me, I have some comments.
- <application>pg_checksums</application>. The exit status is zero if there
- are no checksum errors when checking them, and nonzero if at least one
- checksum failure is detected. If enabling or disabling checksums, the
- exit status is nonzero if the operation failed.
+ <application>pg_checksums</application>. As a consequence, the
+ <structname>pg_stat_checksums</structnameview won't reflect this activity.
+ The exit status is zero if there are no checksum errors when checking them,
+ and nonzero if at least one checksum failure is detected. If enabling or
+ disabling checksums, the exit status is nonzero if the operation failed.
The docs of pg_checksums already clearly state that the cluster needs
to be offline, so I am not sure that this addition is necessary.
@@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid,
int failurecount)
Please note that there is no need to have the list of arguments in the
comment block at the top of pgstat_report_checksum_failures_in_db().
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = dbentry->last_checksum_failure;
+
+ if (result == 0)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_TIMESTAMPTZ(result);
+}
No need for two ifs here. What about just that?
if (NULL)
PG_RETURN_NULL();
else
PG_RETURN_TIMESTAMPTZ(last_checksum_failure);
--
Michael
On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Mar 30, 2019 at 06:15:11PM +0100, Julien Rouhaud wrote:
I'd also have to get more feedback on this. For now, I'll add this
thread to the pg12 open items, as a follow up of the initial code
drop.Catching up here... I think that having a completely separate view
with one row for each database and one row for shared objects makes
the most sense based on what has been proposed on this thread. Being
able to track checksum failures for shared catalogs is really
something I'd like to be able to see easily, and I have seen
corruption involving such objects from time to time. I think that we
should have a design which is extensible.
Ok!
One thing which is not
proposed on this patch, and I am fine with it as a first draft, is
that we don't have any information about the broken block number and
the file involved. My gut tells me that we'd want a separate view,
like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
blck) to be complete. But that's just for future work.
That could indeed be nice.
For the progress part, we would most likely have a separate view for
that as well, as the view should show no rows if there is no operation
in progress.
Ok.
The patch looks rather clean to me, I have some comments.
- <application>pg_checksums</application>. The exit status is zero if there - are no checksum errors when checking them, and nonzero if at least one - checksum failure is detected. If enabling or disabling checksums, the - exit status is nonzero if the operation failed. + <application>pg_checksums</application>. As a consequence, the + <structname>pg_stat_checksums</structnameview won't reflect this activity. + The exit status is zero if there are no checksum errors when checking them, + and nonzero if at least one checksum failure is detected. If enabling or + disabling checksums, the exit status is nonzero if the operation failed.The docs of pg_checksums already clearly state that the cluster needs
to be offline, so I am not sure that this addition is necessary.
Agreed, removed.
@@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid,
int failurecount)Please note that there is no need to have the list of arguments in the
comment block at the top of pgstat_report_checksum_failures_in_db().
Indeed, fixed.
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL) + result = 0; + else + result = dbentry->last_checksum_failure; + + if (result == 0) + PG_RETURN_NULL(); + else + PG_RETURN_TIMESTAMPTZ(result); +}No need for two ifs here. What about just that?
if (NULL)
PG_RETURN_NULL();
else
PG_RETURN_TIMESTAMPTZ(last_checksum_failure);
I do agree, but this is done like this everywhere in pgstatfuncs.c, so
I think it's better to keep it as-is for consistency.
Attachments:
pg_stat_checksums-v3.difftext/x-patch; charset=US-ASCII; name=pg_stat_checksums-v3.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f1df14bdea..30674f61ce 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -384,6 +384,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>
+ <row>
+ <entry><structname>pg_stat_checksums</structname><indexterm><primary>pg_stat_checksums</primary></indexterm></entry>
+ <entry>One row per database, plus one for the shared objects, showing
+ database-wide checksums statistics. See
+ <xref linkend="pg-stat-checksums-view"/> for details.
+ </entry>
+ </row>
+
<row>
<entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
<entry>One row per database, showing database-wide statistics. See
@@ -2418,6 +2426,54 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
single row, containing global data for the cluster.
</para>
+ <table id="pg-stat-checksums-view" xreflabel="pg_stat_checksums">
+ <title><structname>pg_stat_checksums</structname> View</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry>Type</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><structfield>datid</structfield></entry>
+ <entry><type>oid</type></entry>
+ <entry>OID of a database, or 0 for objects belonging to a shared relation</entry>
+ </row>
+ <row>
+ <entry><structfield>datname</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry>Name of this database, or <literal><shared_objects></literal></entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_failures</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Number of data page checksum failures detected in this
+ database</entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_last_failure</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which the last data page checksum failures was detected in
+ this database</entry>
+ </row>
+ <row>
+ <entry><structfield>stats_reset</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which these statistics were last reset</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ The <structname>pg_stat_database</structname> view will contain one row
+ for each database in the cluster, showing database-wide statistics.
+ </para>
+
<table id="pg-stat-database-view" xreflabel="pg_stat_database">
<title><structname>pg_stat_database</structname> View</title>
<tgroup cols="3">
@@ -2529,11 +2585,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Number of deadlocks detected in this database</entry>
</row>
- <row>
- <entry><structfield>checksum_failures</structfield></entry>
- <entry><type>bigint</type></entry>
- <entry>Number of data page checksum failures detected in this database</entry>
- </row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
<entry><type>double precision</type></entry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 84fb37c293..e063dd972e 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -218,7 +218,9 @@ PostgreSQL documentation
I/O system that would otherwise be silent. Enabling checksums
may incur a noticeable performance penalty. This option can only
be set during initialization, and cannot be changed later. If
- set, checksums are calculated for all objects, in all databases.
+ set, checksums are calculated for all objects, in all databases. All
+ checksum failures will be reported in the <xref
+ linkend="pg-stat-checksums-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c4f3950e5b..8179db5f2a 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -531,7 +531,8 @@ PostgreSQL documentation
By default, checksums are verified and checksum failures will result
in a non-zero exit status. However, the base backup will not be
removed in such a case, as if the <option>--no-clean</option> option
- had been used.
+ had been used. Checksum verifications failures will also be reported
+ in the <xref linkend="pg-stat-checksums-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b89df70653..ffe53bf888 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -823,12 +823,24 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_files(D.oid) AS temp_files,
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
- pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
FROM pg_database D;
+CREATE VIEW pg_stat_checksums AS
+ SELECT
+ D.oid AS datid,
+ D.datname AS datname,
+ pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
+ pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
+ FROM (
+ SELECT oid, datname FROM pg_database
+ UNION ALL
+ SELECT 0, '<shared objects>'
+ ) D;
+
CREATE VIEW pg_stat_database_conflicts AS
SELECT
D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 2a8472b91a..ed877d6d95 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1523,7 +1523,7 @@ pgstat_report_deadlock(void)
/* --------
- * pgstat_report_checksum_failures_in_db(dboid, failure_count) -
+ * pgstat_report_checksum_failures_in_db() -
*
* Tell the collector about one or more checksum failures.
* --------
@@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
msg.m_databaseid = dboid;
msg.m_failurecount = failurecount;
+ msg.m_failure_time = GetCurrentTimestamp();
+
pgstat_send(&msg, sizeof(msg));
}
@@ -4601,6 +4603,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
dbentry->n_checksum_failures = 0;
+ dbentry->last_checksum_failure = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6257,6 +6260,7 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
dbentry->n_checksum_failures += msg->m_failurecount;
+ dbentry->last_checksum_failure = msg->m_failure_time;
}
/* ----------
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 537f09e342..36dcb28754 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1584,9 +1584,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
(errmsg("file \"%s\" has a total of %d checksum verification "
"failures", readfilename, checksum_failures)));
- if (dboid != InvalidOid)
- pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
+ pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
}
+
total_checksum_failures += checksum_failures;
return true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 90a817a25c..b28d9ab216 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1514,6 +1514,24 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ TimestampTz result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = dbentry->last_checksum_failure;
+
+ if (result == 0)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_TIMESTAMPTZ(result);
+}
+
Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index eac909109c..c4aeeee810 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5281,6 +5281,11 @@
proname => 'pg_stat_get_db_checksum_failures', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_checksum_failures' },
+{ oid => '8394',
+ descr => 'statistics: when last checksum failure was detected in database',
+ proname => 'pg_stat_get_db_checksum_last_failure', provolatile => 's',
+ proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_last_failure' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c080fa6388..d6d1772706 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -541,6 +541,7 @@ typedef struct PgStat_MsgChecksumFailure
PgStat_MsgHdr m_hdr;
Oid m_databaseid;
int m_failurecount;
+ TimestampTz m_failure_time;
} PgStat_MsgChecksumFailure;
@@ -607,6 +608,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
PgStat_Counter n_checksum_failures;
+ TimestampTz last_checksum_failure;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index d5f309fbfb..193de82cc3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1801,6 +1801,17 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
pg_stat_get_buf_alloc() AS buffers_alloc,
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+pg_stat_checksums| SELECT d.oid AS datid,
+ d.datname,
+ pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(d.oid) AS checksum_last_failure,
+ pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
+ FROM ( SELECT pg_database.oid,
+ pg_database.datname
+ FROM pg_database
+ UNION ALL
+ SELECT 0,
+ '<shared objects>'::name) d;
pg_stat_database| SELECT d.oid AS datid,
d.datname,
pg_stat_get_db_numbackends(d.oid) AS numbackends,
@@ -1817,7 +1828,6 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_files(d.oid) AS temp_files,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
- pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
One thing which is not
proposed on this patch, and I am fine with it as a first draft, is
that we don't have any information about the broken block number and
the file involved. My gut tells me that we'd want a separate view,
like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
blck) to be complete. But that's just for future work.That could indeed be nice.
Actually, backpedaling on this one... pg_stat_checksums_details may
be a bad idea as we could finish with one row per broken block. If
a corruption is spreading quickly, pgstat would not be able to sustain
that amount of objects. Having pg_stat_checksums would allow us to
plugin more data easily based on the last failure state:
- last relid of failure
- last fork type of failure
- last block number of failure.
Not saying to do that now, but having that in pg_stat_database does
not seem very natural to me. And on top of that we would have an
extra row full of NULLs for shared objects in pg_stat_database if we
adopt the unique view approach... I find that rather ugly.
No need for two ifs here. What about just that?
if (NULL)
PG_RETURN_NULL();
else
PG_RETURN_TIMESTAMPTZ(last_checksum_failure);I do agree, but this is done like this everywhere in pgstatfuncs.c, so
I think it's better to keep it as-is for consistency.
Okay, this is not an issue for me.
The patch looks fine to me as-is. Let's see what Magnus or others have to
say about it. I can take care of this open item if necessary but
that's not my commit so I'd rather not step on Magnus' toes.
--
Michael
On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz>
wrote:
One thing which is not
proposed on this patch, and I am fine with it as a first draft, is
that we don't have any information about the broken block number and
the file involved. My gut tells me that we'd want a separate view,
like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
blck) to be complete. But that's just for future work.That could indeed be nice.
Actually, backpedaling on this one... pg_stat_checksums_details may
be a bad idea as we could finish with one row per broken block. If
a corruption is spreading quickly, pgstat would not be able to sustain
that amount of objects. Having pg_stat_checksums would allow us to
plugin more data easily based on the last failure state:
- last relid of failure
- last fork type of failure
- last block number of failure.
Not saying to do that now, but having that in pg_stat_database does
not seem very natural to me. And on top of that we would have an
extra row full of NULLs for shared objects in pg_stat_database if we
adopt the unique view approach... I find that rather ugly.
I think that tracking each and every block is of course a non-starter, as
you've noticed.
I'm really not sure how much those three extra fields help, TBH. As I see
it the real usecase for this is automated monitoring and quick-checks of
the kind of "is my db currently broken somewhere", in combination with "did
this occur recently" (for people who have never looked at their stats).
This gives people enough information to know where to go look in the logs.
I mean, what's the actual usecase for tracking relid/fork/block of the
*last* failure only? To monitor and see if it changes? What do I do when I
have 10 failures, and I only know about the last one? (I have to go to the
logs anyway)
I think having the count and hte last time make sense, but I'm very
sceptical about the rest.
I can somewhat agree that splitting it on a per database level might even
at that be overdoing it. What might actually be more interesting from a
failure-location perspective would be tablespace, rather than any of the
others. Or we could reduce it down to just putting it in pg_stat_bgwriter
and only count global values perhaps, if in the end we don't think the
split-per-database is reasonable?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Tue, Apr 02, 2019 at 07:06:35PM +0200, Magnus Hagander wrote:
I think having the count and hte last time make sense, but I'm very
sceptical about the rest.
There may be other things which we are not considering on this
thread. I don't know.
I can somewhat agree that splitting it on a per database level might even
at that be overdoing it. What might actually be more interesting from a
failure-location perspective would be tablespace, rather than any of the
others. Or we could reduce it down to just putting it in pg_stat_bgwriter
and only count global values perhaps, if in the end we don't think the
split-per-database is reasonable?
A split per database or per tablespace is I think a very good thing.
This helps in tracking down which partitions have gone crazy, and a
single global counter does not allow that.
--
Michael
On Wed, Apr 3, 2019 at 3:43 AM Michael Paquier <michael@paquier.xyz> wrote:
I can somewhat agree that splitting it on a per database level might even
at that be overdoing it. What might actually be more interesting from a
failure-location perspective would be tablespace, rather than any of the
others. Or we could reduce it down to just putting it in pg_stat_bgwriter
and only count global values perhaps, if in the end we don't think the
split-per-database is reasonable?A split per database or per tablespace is I think a very good thing.
This helps in tracking down which partitions have gone crazy, and a
single global counter does not allow that.
Indeed, a per-tablespace would be much more convenient to track the
problem down at the physical level, but we don't have the required
infrastructure for that yet, and it seems quite late to add it now.
IMHO, a per-database has also some value, as it can help to track down
issues at the application level.
Maybe we could add a new column to the view (for instance "source")
which would always be 'database', and we could later add
per-tablespace counters, keeping the view compatibility.
On Wed, Apr 3, 2019 at 10:44 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Apr 3, 2019 at 3:43 AM Michael Paquier <michael@paquier.xyz>
wrote:I can somewhat agree that splitting it on a per database level might
even
at that be overdoing it. What might actually be more interesting from a
failure-location perspective would be tablespace, rather than any ofthe
others. Or we could reduce it down to just putting it in
pg_stat_bgwriter
and only count global values perhaps, if in the end we don't think the
split-per-database is reasonable?A split per database or per tablespace is I think a very good thing.
This helps in tracking down which partitions have gone crazy, and a
single global counter does not allow that.Indeed, a per-tablespace would be much more convenient to track the
problem down at the physical level, but we don't have the required
infrastructure for that yet, and it seems quite late to add it now.
IMHO, a per-database has also some value, as it can help to track down
issues at the application level.Maybe we could add a new column to the view (for instance "source")
which would always be 'database', and we could later add
per-tablespace counters, keeping the view compatibility.
Ugh.
If we wanted per tablespace counters, shouldn't we have a
pg_stat_tablespace instead? So we'd have a checksum failures counter in
pg_state_database separated by database, and one in pg_stat_tablespace
separated by tablespace? (Along with probably a bunch of other entries for
tablespaces)
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Wed, Apr 3, 2019 at 11:31 AM Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Apr 3, 2019 at 10:44 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Apr 3, 2019 at 3:43 AM Michael Paquier <michael@paquier.xyz> wrote:
I can somewhat agree that splitting it on a per database level might even
at that be overdoing it. What might actually be more interesting from a
failure-location perspective would be tablespace, rather than any of the
others. Or we could reduce it down to just putting it in pg_stat_bgwriter
and only count global values perhaps, if in the end we don't think the
split-per-database is reasonable?A split per database or per tablespace is I think a very good thing.
This helps in tracking down which partitions have gone crazy, and a
single global counter does not allow that.Indeed, a per-tablespace would be much more convenient to track the
problem down at the physical level, but we don't have the required
infrastructure for that yet, and it seems quite late to add it now.
IMHO, a per-database has also some value, as it can help to track down
issues at the application level.Maybe we could add a new column to the view (for instance "source")
which would always be 'database', and we could later add
per-tablespace counters, keeping the view compatibility.Ugh.
If we wanted per tablespace counters, shouldn't we have a pg_stat_tablespace instead? So we'd have a checksum failures counter in pg_state_database separated by database, and one in pg_stat_tablespace separated by tablespace? (Along with probably a bunch of other entries for tablespaces)
But there's still the problem of reporting errors on shared relation,
so pg_stat_database doesn't really fit for that. If we go with a
checksum centric view, it'd be strange to have some of the counters in
another view.
On Wed, Apr 03, 2019 at 11:56:14AM +0200, Julien Rouhaud wrote:
But there's still the problem of reporting errors on shared relation,
so pg_stat_database doesn't really fit for that. If we go with a
checksum centric view, it'd be strange to have some of the counters in
another view.
Having pg_stat_database filled with a phantom row full of NULLs to
track checksum failures of shared objects would be confusing I think.
I personally quite like the separate view approach, with one row per
database, but one opinion does not stand as an agreement.
Anyway, even if we have no agreement on the shape of what we'd like to
do, I don't think that HEAD is in a proper shape now because we just
don't track a portion of the objects which could have checksum
failures. So we should either revert the patch currently committed,
or add tracking for shared objects, but definitely not keep the code
in a state in-between.
--
Michael
On Thu, Apr 4, 2019 at 6:22 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Apr 03, 2019 at 11:56:14AM +0200, Julien Rouhaud wrote:
But there's still the problem of reporting errors on shared relation,
so pg_stat_database doesn't really fit for that. If we go with a
checksum centric view, it'd be strange to have some of the counters in
another view.Having pg_stat_database filled with a phantom row full of NULLs to
track checksum failures of shared objects would be confusing I think.
I personally quite like the separate view approach, with one row per
database, but one opinion does not stand as an agreement.
It wouldn't be just that, but it would make sense to include things like
blks_read/blks_hit there as well, wouldn't it? As well as read/write time.
Things we don't track today, but it could be useful to do so.
But yeah, I'm not strongly in either direction, so if others feel strongly
a separate view is better, then we should do a separate view.
Anyway, even if we have no agreement on the shape of what we'd like to
do, I don't think that HEAD is in a proper shape now because we just
don't track a portion of the objects which could have checksum
failures. So we should either revert the patch currently committed,
or add tracking for shared objects, but definitely not keep the code
in a state in-between.
Definitely. That's why we're discussing it now :) Maybe we should put it on
the open items list, because we definitely don't want to ship it one way
and then change our mind in the next version.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Thu, Apr 4, 2019 at 10:25 AM Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Apr 4, 2019 at 6:22 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Apr 03, 2019 at 11:56:14AM +0200, Julien Rouhaud wrote:
But there's still the problem of reporting errors on shared relation,
so pg_stat_database doesn't really fit for that. If we go with a
checksum centric view, it'd be strange to have some of the counters in
another view.Having pg_stat_database filled with a phantom row full of NULLs to
track checksum failures of shared objects would be confusing I think.
I personally quite like the separate view approach, with one row per
database, but one opinion does not stand as an agreement.It wouldn't be just that, but it would make sense to include things like blks_read/blks_hit there as well, wouldn't it? As well as read/write time. Things we don't track today, but it could be useful to do so.
Actually we do track counters for shared relations (see
pgstat_report_stat), we just don't expose them in any view. But it's
still possible to get the counters manually:
# select pg_stat_get_db_blocks_hit(0);
pg_stat_get_db_blocks_hit
---------------------------
2710329
(1 row)
My main concern is that pg_stat_get_db_numbackends(0) report something
like the total number of backend (though it seems that there's an
extra connection accounted for, I don't know which process it's), so
if we expose it in pg_stat_database, sum(numbackends) won't make sense
anymore.
Anyway, even if we have no agreement on the shape of what we'd like to
do, I don't think that HEAD is in a proper shape now because we just
don't track a portion of the objects which could have checksum
failures. So we should either revert the patch currently committed,
or add tracking for shared objects, but definitely not keep the code
in a state in-between.Definitely. That's why we're discussing it now :) Maybe we should put it on the open items list, because we definitely don't want to ship it one way and then change our mind in the next version.
I already added an open item for that.
On Thu, Apr 4, 2019 at 10:47 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Apr 4, 2019 at 10:25 AM Magnus Hagander <magnus@hagander.net>
wrote:On Thu, Apr 4, 2019 at 6:22 AM Michael Paquier <michael@paquier.xyz>
wrote:
On Wed, Apr 03, 2019 at 11:56:14AM +0200, Julien Rouhaud wrote:
But there's still the problem of reporting errors on shared relation,
so pg_stat_database doesn't really fit for that. If we go with a
checksum centric view, it'd be strange to have some of the counters in
another view.Having pg_stat_database filled with a phantom row full of NULLs to
track checksum failures of shared objects would be confusing I think.
I personally quite like the separate view approach, with one row per
database, but one opinion does not stand as an agreement.It wouldn't be just that, but it would make sense to include things like
blks_read/blks_hit there as well, wouldn't it? As well as read/write time.
Things we don't track today, but it could be useful to do so.Actually we do track counters for shared relations (see
pgstat_report_stat), we just don't expose them in any view. But it's
still possible to get the counters manually:# select pg_stat_get_db_blocks_hit(0);
pg_stat_get_db_blocks_hit
---------------------------
2710329
(1 row)
Oh, right, we do actually collect it, we just don't show is. So that's
another argument *for* having it in pg_stat_database. Or at least not for
having it in a checksum specific view, because then we should really make a
separate view for this as well.
My main concern is that pg_stat_get_db_numbackends(0) report something
like the total number of backend (though it seems that there's an
extra connection accounted for, I don't know which process it's), so
if we expose it in pg_stat_database, sum(numbackends) won't make sense
anymore.
We could also just hardcoded it so that one always shows 0?
Anyway, even if we have no agreement on the shape of what we'd like to
do, I don't think that HEAD is in a proper shape now because we just
don't track a portion of the objects which could have checksum
failures. So we should either revert the patch currently committed,
or add tracking for shared objects, but definitely not keep the code
in a state in-between.Definitely. That's why we're discussing it now :) Maybe we should put it
on the open items list, because we definitely don't want to ship it one way
and then change our mind in the next version.I already added an open item for that.
Good.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Thu, Apr 4, 2019 at 1:25 PM Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Apr 4, 2019 at 10:47 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Actually we do track counters for shared relations (see
pgstat_report_stat), we just don't expose them in any view. But it's
still possible to get the counters manually:# select pg_stat_get_db_blocks_hit(0);
pg_stat_get_db_blocks_hit
---------------------------
2710329
(1 row)Oh, right, we do actually collect it, we just don't show is. So that's another argument *for* having it in pg_stat_database. Or at least not for having it in a checksum specific view, because then we should really make a separate view for this as well.
Ok, so let's expose all the shared counters in pg_stat_database and
remove the pg_stat_checksum view.
My main concern is that pg_stat_get_db_numbackends(0) report something
like the total number of backend (though it seems that there's an
extra connection accounted for, I don't know which process it's), so
if we expose it in pg_stat_database, sum(numbackends) won't make sense
anymore.We could also just hardcoded it so that one always shows 0?
That's a bit hacky, but that's probably the best compromise. Attached
v4 with all those changes.
Attachments:
pg_stat_checksums-v4.difftext/x-patch; charset=US-ASCII; name=pg_stat_checksums-v4.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b946e13fdc..272f860a2a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2498,20 +2498,22 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<row>
<entry><structfield>datid</structfield></entry>
<entry><type>oid</type></entry>
- <entry>OID of a database</entry>
+ <entry>OID of this database, or 0 for objects belonging to a shared
+ relation</entry>
</row>
<row>
<entry><structfield>datname</structfield></entry>
<entry><type>name</type></entry>
- <entry>Name of this database</entry>
+ <entry>Name of this database, or <literal><shared_objects>
+ </literal></entry>
</row>
<row>
<entry><structfield>numbackends</structfield></entry>
<entry><type>integer</type></entry>
- <entry>Number of backends currently connected to this database.
- This is the only column in this view that returns a value reflecting
- current state; all other columns return the accumulated values since
- the last reset.</entry>
+ <entry>Number of backends currently connected to this database, or 0 for
+ the shared objects. This is the only column in this view that returns a
+ value reflecting current state; all other columns return the accumulated
+ values since the last reset.</entry>
</row>
<row>
<entry><structfield>xact_commit</structfield></entry>
@@ -2597,7 +2599,14 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<row>
<entry><structfield>checksum_failures</structfield></entry>
<entry><type>bigint</type></entry>
- <entry>Number of data page checksum failures detected in this database</entry>
+ <entry>Number of data page checksum failures detected in this
+ database</entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_last_failure</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which the last data page checksum failures was detected in
+ this database</entry>
</row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
@@ -2622,7 +2631,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<para>
The <structname>pg_stat_database</structname> view will contain one row
- for each database in the cluster, showing database-wide statistics.
+ for each database in the cluster, plus one for the shared objects, showing
+ database-wide statistics.
</para>
<table id="pg-stat-database-conflicts-view" xreflabel="pg_stat_database_conflicts">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 7f32310308..7fc3152c6d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -218,7 +218,9 @@ PostgreSQL documentation
I/O system that would otherwise be silent. Enabling checksums
may incur a noticeable performance penalty. This option can only
be set during initialization, and cannot be changed later. If
- set, checksums are calculated for all objects, in all databases.
+ set, checksums are calculated for all objects, in all databases. All
+ checksum failures will be reported in the <xref
+ linkend="pg-stat-database-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index dd6bce57d2..3c7baf1fb4 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -531,7 +531,8 @@ PostgreSQL documentation
By default, checksums are verified and checksum failures will result
in a non-zero exit status. However, the base backup will not be
removed in such a case, as if the <option>--no-clean</option> option
- had been used.
+ had been used. Checksum verifications failures will also be reported
+ in the <xref linkend="pg-stat-database-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 72f786d6f8..fbf64775ae 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -816,7 +816,10 @@ CREATE VIEW pg_stat_database AS
SELECT
D.oid AS datid,
D.datname AS datname,
- pg_stat_get_db_numbackends(D.oid) AS numbackends,
+ CASE
+ WHEN (D.oid = (0)::oid) THEN 0
+ ELSE pg_stat_get_db_numbackends(D.oid)
+ END AS numbackends,
pg_stat_get_db_xact_commit(D.oid) AS xact_commit,
pg_stat_get_db_xact_rollback(D.oid) AS xact_rollback,
pg_stat_get_db_blocks_fetched(D.oid) -
@@ -832,10 +835,15 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
- FROM pg_database D;
+ FROM (
+ SELECT 0 AS oid, '<shared objects>'::name AS datname
+ UNION ALL
+ SELECT oid, datname FROM pg_database
+ ) D;
CREATE VIEW pg_stat_database_conflicts AS
SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0355fa65fb..fc489e1b53 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1523,7 +1523,7 @@ pgstat_report_deadlock(void)
/* --------
- * pgstat_report_checksum_failures_in_db(dboid, failure_count) -
+ * pgstat_report_checksum_failures_in_db() -
*
* Tell the collector about one or more checksum failures.
* --------
@@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
msg.m_databaseid = dboid;
msg.m_failurecount = failurecount;
+ msg.m_failure_time = GetCurrentTimestamp();
+
pgstat_send(&msg, sizeof(msg));
}
@@ -4647,6 +4649,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
dbentry->n_checksum_failures = 0;
+ dbentry->last_checksum_failure = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6303,6 +6306,7 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
dbentry->n_checksum_failures += msg->m_failurecount;
+ dbentry->last_checksum_failure = msg->m_failure_time;
}
/* ----------
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 537f09e342..36dcb28754 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1584,9 +1584,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
(errmsg("file \"%s\" has a total of %d checksum verification "
"failures", readfilename, checksum_failures)));
- if (dboid != InvalidOid)
- pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
+ pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
}
+
total_checksum_failures += checksum_failures;
return true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9a1d07bee3..97f41fb46c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1534,6 +1534,24 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ TimestampTz result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = dbentry->last_checksum_failure;
+
+ if (result == 0)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_TIMESTAMPTZ(result);
+}
+
Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fb257c17c8..b5e0fc77bd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5285,6 +5285,11 @@
proname => 'pg_stat_get_db_checksum_failures', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_checksum_failures' },
+{ oid => '8394',
+ descr => 'statistics: when last checksum failure was detected in database',
+ proname => 'pg_stat_get_db_checksum_last_failure', provolatile => 's',
+ proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_last_failure' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5888242f75..ef2e81d676 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -541,6 +541,7 @@ typedef struct PgStat_MsgChecksumFailure
PgStat_MsgHdr m_hdr;
Oid m_databaseid;
int m_failurecount;
+ TimestampTz m_failure_time;
} PgStat_MsgChecksumFailure;
@@ -607,6 +608,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
PgStat_Counter n_checksum_failures;
+ TimestampTz last_checksum_failure;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index bf7fca54ee..31633fc382 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1803,7 +1803,10 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_database| SELECT d.oid AS datid,
d.datname,
- pg_stat_get_db_numbackends(d.oid) AS numbackends,
+ CASE
+ WHEN (d.oid = (0)::oid) THEN 0
+ ELSE pg_stat_get_db_numbackends(d.oid)
+ END AS numbackends,
pg_stat_get_db_xact_commit(d.oid) AS xact_commit,
pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback,
(pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read,
@@ -1818,10 +1821,16 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(d.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
- FROM pg_database d;
+ FROM ( SELECT 0 AS oid,
+ '<shared objects>'::name AS datname
+ UNION ALL
+ SELECT pg_database.oid,
+ pg_database.datname
+ FROM pg_database) d;
pg_stat_database_conflicts| SELECT d.oid AS datid,
d.datname,
pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace,
On Thu, Apr 4, 2019 at 2:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Apr 4, 2019 at 1:25 PM Magnus Hagander <magnus@hagander.net>
wrote:On Thu, Apr 4, 2019 at 10:47 AM Julien Rouhaud <rjuju123@gmail.com>
wrote:
Actually we do track counters for shared relations (see
pgstat_report_stat), we just don't expose them in any view. But it's
still possible to get the counters manually:# select pg_stat_get_db_blocks_hit(0);
pg_stat_get_db_blocks_hit
---------------------------
2710329
(1 row)Oh, right, we do actually collect it, we just don't show is. So that's
another argument *for* having it in pg_stat_database. Or at least not for
having it in a checksum specific view, because then we should really make a
separate view for this as well.Ok, so let's expose all the shared counters in pg_stat_database and
remove the pg_stat_checksum view.My main concern is that pg_stat_get_db_numbackends(0) report something
like the total number of backend (though it seems that there's an
extra connection accounted for, I don't know which process it's), so
if we expose it in pg_stat_database, sum(numbackends) won't make sense
anymore.We could also just hardcoded it so that one always shows 0?
That's a bit hacky, but that's probably the best compromise. Attached
v4 with all those changes.
I'm not sure I like the idea of using "<shared_objects>" as the database
name. It's not very likely that somebody would be using that as a name for
their database, but i's not impossible. But it also just looks strrange.
Wouldn't NULL be a more appropriate choice?
Likewise, shouldn't we return NULL as the number of backends for the shared
counters, rather than 0?
Micro-nit:
+ <entry>Time at which the last data page checksum failures was
detected in
s/failures/failure/
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Thanks for looking it it!
On Sun, Apr 7, 2019 at 4:36 PM Magnus Hagander <magnus@hagander.net> wrote:
I'm not sure I like the idea of using "<shared_objects>" as the database name. It's not very likely that somebody would be using that as a name for their database, but i's not impossible. But it also just looks strrange. Wouldn't NULL be a more appropriate choice?
Likewise, shouldn't we return NULL as the number of backends for the shared counters, rather than 0?
I wanted to make things more POLA-compliant, but maybe it was a bad
idea. I changed it for NULL here and for numbackends.
Micro-nit:
+ <entry>Time at which the last data page checksum failures was detected in
s/failures/failure/
Oops.
v5 attached.
Attachments:
pg_stat_checksums-v5.difftext/x-patch; charset=US-ASCII; name=pg_stat_checksums-v5.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4eb43f2de9..6bad265413 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2498,20 +2498,22 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<row>
<entry><structfield>datid</structfield></entry>
<entry><type>oid</type></entry>
- <entry>OID of a database</entry>
+ <entry>OID of this database, or 0 for objects belonging to a shared
+ relation</entry>
</row>
<row>
<entry><structfield>datname</structfield></entry>
<entry><type>name</type></entry>
- <entry>Name of this database</entry>
+ <entry>Name of this database, or <literal>NULL</literal> for the shared
+ objects.</entry>
</row>
<row>
<entry><structfield>numbackends</structfield></entry>
<entry><type>integer</type></entry>
- <entry>Number of backends currently connected to this database.
- This is the only column in this view that returns a value reflecting
- current state; all other columns return the accumulated values since
- the last reset.</entry>
+ <entry>Number of backends currently connected to this database, or
+ <literal>NULL</literal> for the shared objects. This is the only column
+ in this view that returns a value reflecting current state; all other
+ columns return the accumulated values since the last reset.</entry>
</row>
<row>
<entry><structfield>xact_commit</structfield></entry>
@@ -2597,7 +2599,14 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<row>
<entry><structfield>checksum_failures</structfield></entry>
<entry><type>bigint</type></entry>
- <entry>Number of data page checksum failures detected in this database</entry>
+ <entry>Number of data page checksum failures detected in this
+ database</entry>
+ </row>
+ <row>
+ <entry><structfield>checksum_last_failure</structfield></entry>
+ <entry><type>timestamp with time zone</type></entry>
+ <entry>Time at which the last data page checksum failure was detected in
+ this database, or on a shared object.</entry>
</row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
@@ -2622,7 +2631,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<para>
The <structname>pg_stat_database</structname> view will contain one row
- for each database in the cluster, showing database-wide statistics.
+ for each database in the cluster, plus one for the shared objects, showing
+ database-wide statistics.
</para>
<table id="pg-stat-database-conflicts-view" xreflabel="pg_stat_database_conflicts">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 7f32310308..7fc3152c6d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -218,7 +218,9 @@ PostgreSQL documentation
I/O system that would otherwise be silent. Enabling checksums
may incur a noticeable performance penalty. This option can only
be set during initialization, and cannot be changed later. If
- set, checksums are calculated for all objects, in all databases.
+ set, checksums are calculated for all objects, in all databases. All
+ checksum failures will be reported in the <xref
+ linkend="pg-stat-database-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index dd6bce57d2..3c7baf1fb4 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -531,7 +531,8 @@ PostgreSQL documentation
By default, checksums are verified and checksum failures will result
in a non-zero exit status. However, the base backup will not be
removed in such a case, as if the <option>--no-clean</option> option
- had been used.
+ had been used. Checksum verifications failures will also be reported
+ in the <xref linkend="pg-stat-database-view"/> view.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 16e456a7d9..e90d251a2c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -816,7 +816,10 @@ CREATE VIEW pg_stat_database AS
SELECT
D.oid AS datid,
D.datname AS datname,
- pg_stat_get_db_numbackends(D.oid) AS numbackends,
+ CASE
+ WHEN (D.oid = (0)::oid) THEN NULL::integer
+ ELSE pg_stat_get_db_numbackends(D.oid)
+ END AS numbackends,
pg_stat_get_db_xact_commit(D.oid) AS xact_commit,
pg_stat_get_db_xact_rollback(D.oid) AS xact_rollback,
pg_stat_get_db_blocks_fetched(D.oid) -
@@ -832,10 +835,15 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(D.oid) AS deadlocks,
pg_stat_get_db_checksum_failures(D.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
- FROM pg_database D;
+ FROM (
+ SELECT 0 AS oid, NULL::name AS datname
+ UNION ALL
+ SELECT oid, datname FROM pg_database
+ ) D;
CREATE VIEW pg_stat_database_conflicts AS
SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0355fa65fb..fc489e1b53 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1523,7 +1523,7 @@ pgstat_report_deadlock(void)
/* --------
- * pgstat_report_checksum_failures_in_db(dboid, failure_count) -
+ * pgstat_report_checksum_failures_in_db() -
*
* Tell the collector about one or more checksum failures.
* --------
@@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CHECKSUMFAILURE);
msg.m_databaseid = dboid;
msg.m_failurecount = failurecount;
+ msg.m_failure_time = GetCurrentTimestamp();
+
pgstat_send(&msg, sizeof(msg));
}
@@ -4647,6 +4649,7 @@ reset_dbentry_counters(PgStat_StatDBEntry *dbentry)
dbentry->n_temp_bytes = 0;
dbentry->n_deadlocks = 0;
dbentry->n_checksum_failures = 0;
+ dbentry->last_checksum_failure = 0;
dbentry->n_block_read_time = 0;
dbentry->n_block_write_time = 0;
@@ -6303,6 +6306,7 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
dbentry->n_checksum_failures += msg->m_failurecount;
+ dbentry->last_checksum_failure = msg->m_failure_time;
}
/* ----------
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 537f09e342..36dcb28754 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1584,9 +1584,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
(errmsg("file \"%s\" has a total of %d checksum verification "
"failures", readfilename, checksum_failures)));
- if (dboid != InvalidOid)
- pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
+ pgstat_report_checksum_failures_in_db(dboid, checksum_failures);
}
+
total_checksum_failures += checksum_failures;
return true;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9a1d07bee3..97f41fb46c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1534,6 +1534,24 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
+{
+ Oid dbid = PG_GETARG_OID(0);
+ TimestampTz result;
+ PgStat_StatDBEntry *dbentry;
+
+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = dbentry->last_checksum_failure;
+
+ if (result == 0)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_TIMESTAMPTZ(result);
+}
+
Datum
pg_stat_get_db_blk_read_time(PG_FUNCTION_ARGS)
{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ad4519e001..73ebfdf970 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5285,6 +5285,11 @@
proname => 'pg_stat_get_db_checksum_failures', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_checksum_failures' },
+{ oid => '8394',
+ descr => 'statistics: when last checksum failure was detected in database',
+ proname => 'pg_stat_get_db_checksum_last_failure', provolatile => 's',
+ proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_checksum_last_failure' },
{ oid => '3074', descr => 'statistics: last reset for a database',
proname => 'pg_stat_get_db_stat_reset_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5888242f75..ef2e81d676 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -541,6 +541,7 @@ typedef struct PgStat_MsgChecksumFailure
PgStat_MsgHdr m_hdr;
Oid m_databaseid;
int m_failurecount;
+ TimestampTz m_failure_time;
} PgStat_MsgChecksumFailure;
@@ -607,6 +608,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter n_temp_bytes;
PgStat_Counter n_deadlocks;
PgStat_Counter n_checksum_failures;
+ TimestampTz last_checksum_failure;
PgStat_Counter n_block_read_time; /* times in microseconds */
PgStat_Counter n_block_write_time;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 4638374a76..7b3c4c43b8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1803,7 +1803,10 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
pg_stat_database| SELECT d.oid AS datid,
d.datname,
- pg_stat_get_db_numbackends(d.oid) AS numbackends,
+ CASE
+ WHEN (d.oid = (0)::oid) THEN NULL::integer
+ ELSE pg_stat_get_db_numbackends(d.oid)
+ END AS numbackends,
pg_stat_get_db_xact_commit(d.oid) AS xact_commit,
pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback,
(pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read,
@@ -1818,10 +1821,16 @@ pg_stat_database| SELECT d.oid AS datid,
pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes,
pg_stat_get_db_deadlocks(d.oid) AS deadlocks,
pg_stat_get_db_checksum_failures(d.oid) AS checksum_failures,
+ pg_stat_get_db_checksum_last_failure(d.oid) AS checksum_last_failure,
pg_stat_get_db_blk_read_time(d.oid) AS blk_read_time,
pg_stat_get_db_blk_write_time(d.oid) AS blk_write_time,
pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset
- FROM pg_database d;
+ FROM ( SELECT 0 AS oid,
+ NULL::name AS datname
+ UNION ALL
+ SELECT pg_database.oid,
+ pg_database.datname
+ FROM pg_database) d;
pg_stat_database_conflicts| SELECT d.oid AS datid,
d.datname,
pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace,
On Sun, Apr 7, 2019 at 6:28 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Thanks for looking it it!
On Sun, Apr 7, 2019 at 4:36 PM Magnus Hagander <magnus@hagander.net>
wrote:I'm not sure I like the idea of using "<shared_objects>" as the database
name. It's not very likely that somebody would be using that as a name for
their database, but i's not impossible. But it also just looks strrange.
Wouldn't NULL be a more appropriate choice?Likewise, shouldn't we return NULL as the number of backends for the
shared counters, rather than 0?
I wanted to make things more POLA-compliant, but maybe it was a bad
idea. I changed it for NULL here and for numbackends.Micro-nit:
+ <entry>Time at which the last data page checksum failures wasdetected in
s/failures/failure/
Oops.
v5 attached.
Thanks. Pushed!
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Fri, Apr 12, 2019 at 2:18 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sun, Apr 7, 2019 at 6:28 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
v5 attached.
Thanks. Pushed!
Thanks!
I started looking at this the other night but I see Magnus beat me in
committing it...
On Fri, Apr 12, 2019 at 8:18 AM Magnus Hagander <magnus@hagander.net> wrote:
On Sun, Apr 7, 2019 at 6:28 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Thanks for looking it it!
On Sun, Apr 7, 2019 at 4:36 PM Magnus Hagander <magnus@hagander.net> wrote:I'm not sure I like the idea of using "<shared_objects>" as the database name. It's not very likely that somebody would be using that as a name for their database, but i's not impossible. But it also just looks strrange. Wouldn't NULL be a more appropriate choice?
Likewise, shouldn't we return NULL as the number of backends for the shared counters, rather than 0?
I wanted to make things more POLA-compliant, but maybe it was a bad
idea. I changed it for NULL here and for numbackends.
ISTM the argument here is go with zero since you have zero connections
vs go with null since you can't actually connect, so it doesn't make
sense. (There is a third argument about making it -1 since you can't
connect, but that breaks sum(numbackends) so it's easily dismissed.) I
think I would have gone for 0 personally, but what ended up surprising
me was that a bunch of other stuff like xact_commit show zero when
AFAICT the above reasoning would apply the same to those columns.
(unless there is a way to commit a transaction in the global objects
that I don't know about).
Micro-nit:
+ <entry>Time at which the last data page checksum failures was detected in
s/failures/failure/Oops.
v5 attached.
What originally got me looking at this was the idea of returning -1
(or maybe null) for checksum failures for cases when checksums are not
enabled. This seems a little more complicated to set up, but seems
like it might ward off people thinking they are safe due to no
checksum error reports when they actually aren't.
Robert Treat
https://xzilla.net
On Sat, Apr 13, 2019 at 8:46 PM Robert Treat <rob@xzilla.net> wrote:
On Fri, Apr 12, 2019 at 8:18 AM Magnus Hagander <magnus@hagander.net>
wrote:On Sun, Apr 7, 2019 at 6:28 PM Julien Rouhaud <rjuju123@gmail.com>
wrote:
Thanks for looking it it!
On Sun, Apr 7, 2019 at 4:36 PM Magnus Hagander <magnus@hagander.net>wrote:
I'm not sure I like the idea of using "<shared_objects>" as the
database name. It's not very likely that somebody would be using that as a
name for their database, but i's not impossible. But it also just looks
strrange. Wouldn't NULL be a more appropriate choice?Likewise, shouldn't we return NULL as the number of backends for the
shared counters, rather than 0?
I wanted to make things more POLA-compliant, but maybe it was a bad
idea. I changed it for NULL here and for numbackends.ISTM the argument here is go with zero since you have zero connections
vs go with null since you can't actually connect, so it doesn't make
sense. (There is a third argument about making it -1 since you can't
connect, but that breaks sum(numbackends) so it's easily dismissed.) I
think I would have gone for 0 personally, but what ended up surprising
me was that a bunch of other stuff like xact_commit show zero when
AFAICT the above reasoning would apply the same to those columns.
(unless there is a way to commit a transaction in the global objects
that I don't know about).
That's a good point. I mean, you can commit a transaction that involves
changes of global objects, but it counts in the database that you were
conneced to.
We should probably at least make it consistent and make it NULL in all or 0
in all.
I'm -1 for using -1 (!), for the very reason that you mention. But either
changing the numbackends to 0, or the others to NULL would work for
consistency. I'm leaning towards the 0 as well.
Micro-nit:
+ <entry>Time at which the last data page checksum failures was
detected in
s/failures/failure/
Oops.
v5 attached.
What originally got me looking at this was the idea of returning -1
(or maybe null) for checksum failures for cases when checksums are not
enabled. This seems a little more complicated to set up, but seems
like it might ward off people thinking they are safe due to no
checksum error reports when they actually aren't.
NULL seems like the reasonable thing to return there. I'm not sure what
you're referring to with a little more complicated to set up, thought? Do
you mean somehow for the end user?
Code-wise it seems it should be simple -- just do an "if checksums disabled
then return null" in the two functions.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Sorry for late reply,
On Sun, Apr 14, 2019 at 7:12 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Apr 13, 2019 at 8:46 PM Robert Treat <rob@xzilla.net> wrote:
On Fri, Apr 12, 2019 at 8:18 AM Magnus Hagander <magnus@hagander.net> wrote:
ISTM the argument here is go with zero since you have zero connections
vs go with null since you can't actually connect, so it doesn't make
sense. (There is a third argument about making it -1 since you can't
connect, but that breaks sum(numbackends) so it's easily dismissed.) I
think I would have gone for 0 personally, but what ended up surprising
me was that a bunch of other stuff like xact_commit show zero when
AFAICT the above reasoning would apply the same to those columns.
(unless there is a way to commit a transaction in the global objects
that I don't know about).That's a good point. I mean, you can commit a transaction that involves changes of global objects, but it counts in the database that you were conneced to.
We should probably at least make it consistent and make it NULL in all or 0 in all.
I'm -1 for using -1 (!), for the very reason that you mention. But either changing the numbackends to 0, or the others to NULL would work for consistency. I'm leaning towards the 0 as well.
+1 for 0 :) Especially since it's less code in the view.
What originally got me looking at this was the idea of returning -1
(or maybe null) for checksum failures for cases when checksums are not
enabled. This seems a little more complicated to set up, but seems
like it might ward off people thinking they are safe due to no
checksum error reports when they actually aren't.NULL seems like the reasonable thing to return there. I'm not sure what you're referring to with a little more complicated to set up, thought? Do you mean somehow for the end user?
Code-wise it seems it should be simple -- just do an "if checksums disabled then return null" in the two functions.
That's indeed a good point! Lack of checksum error is distinct from
checksums not activated and we should make it obvious.
I don't know if that counts as an open item, but I attach a patch for
all points discussed here.
Attachments:
checksums_reporting_fix_v1.difftext/x-patch; charset=US-ASCII; name=checksums_reporting_fix_v1.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 547fe4cce9..bf122f861a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2600,13 +2600,14 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><structfield>checksum_failures</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of data page checksum failures detected in this
- database</entry>
+ database, or NULL is data checksums are not enabled.</entry>
</row>
<row>
<entry><structfield>checksum_last_failure</structfield></entry>
<entry><type>timestamp with time zone</type></entry>
<entry>Time at which the last data page checksum failure was detected in
- this database, or on a shared object.</entry>
+ this database (or on a shared object), or NULL is data checksums are not
+ enabled.</entry>
</row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 161bad6c90..566100d6df 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -817,7 +817,7 @@ CREATE VIEW pg_stat_database AS
D.oid AS datid,
D.datname AS datname,
CASE
- WHEN (D.oid = (0)::oid) THEN NULL::integer
+ WHEN (D.oid = (0)::oid) THEN 0
ELSE pg_stat_get_db_numbackends(D.oid)
END AS numbackends,
pg_stat_get_db_xact_commit(D.oid) AS xact_commit,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97f41fb46c..05240bfd14 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/xlog.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_type.h"
#include "common/ip.h"
@@ -1526,6 +1527,9 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
int64 result;
PgStat_StatDBEntry *dbentry;
+ if (!DataChecksumsEnabled())
+ PG_RETURN_NULL();
+
if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
result = 0;
else
@@ -1541,6 +1545,9 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
TimestampTz result;
PgStat_StatDBEntry *dbentry;
+ if (!DataChecksumsEnabled())
+ PG_RETURN_NULL();
+
if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
result = 0;
else
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 30973904c5..0c392e51e2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1806,7 +1806,7 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
pg_stat_database| SELECT d.oid AS datid,
d.datname,
CASE
- WHEN (d.oid = (0)::oid) THEN NULL::integer
+ WHEN (d.oid = (0)::oid) THEN 0
ELSE pg_stat_get_db_numbackends(d.oid)
END AS numbackends,
pg_stat_get_db_xact_commit(d.oid) AS xact_commit,
On Mon, Apr 15, 2019 at 3:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Sorry for late reply,
On Sun, Apr 14, 2019 at 7:12 PM Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Apr 13, 2019 at 8:46 PM Robert Treat <rob@xzilla.net> wrote:
On Fri, Apr 12, 2019 at 8:18 AM Magnus Hagander <magnus@hagander.net> wrote:
ISTM the argument here is go with zero since you have zero connections
vs go with null since you can't actually connect, so it doesn't make
sense. (There is a third argument about making it -1 since you can't
connect, but that breaks sum(numbackends) so it's easily dismissed.) I
think I would have gone for 0 personally, but what ended up surprising
me was that a bunch of other stuff like xact_commit show zero when
AFAICT the above reasoning would apply the same to those columns.
(unless there is a way to commit a transaction in the global objects
that I don't know about).That's a good point. I mean, you can commit a transaction that involves changes of global objects, but it counts in the database that you were conneced to.
We should probably at least make it consistent and make it NULL in all or 0 in all.
I'm -1 for using -1 (!), for the very reason that you mention. But either changing the numbackends to 0, or the others to NULL would work for consistency. I'm leaning towards the 0 as well.
+1 for 0 :) Especially since it's less code in the view.
+1 for 0
What originally got me looking at this was the idea of returning -1
(or maybe null) for checksum failures for cases when checksums are not
enabled. This seems a little more complicated to set up, but seems
like it might ward off people thinking they are safe due to no
checksum error reports when they actually aren't.NULL seems like the reasonable thing to return there. I'm not sure what you're referring to with a little more complicated to set up, thought? Do you mean somehow for the end user?
Code-wise it seems it should be simple -- just do an "if checksums disabled then return null" in the two functions.
That's indeed a good point! Lack of checksum error is distinct from
checksums not activated and we should make it obvious.I don't know if that counts as an open item, but I attach a patch for
all points discussed here.
ISTM we should mention shared objects in both places in the docs, and
want "NULL if data checksums" rather than "NULL is data checksums".
Attaching slightly modified patch with those changes, but otherwise
LGTM.
Robert Treat
https://xzilla.net
Attachments:
checksums_reporting_fix_v2.diffapplication/octet-stream; name=checksums_reporting_fix_v2.diffDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 547fe4cce9..bf122f861a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2600,13 +2600,14 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><structfield>checksum_failures</structfield></entry>
<entry><type>bigint</type></entry>
<entry>Number of data page checksum failures detected in this
- database</entry>
+ database (or on a shared object), or NULL if data checksums are not
+ enabled.</entry>
</row>
<row>
<entry><structfield>checksum_last_failure</structfield></entry>
<entry><type>timestamp with time zone</type></entry>
<entry>Time at which the last data page checksum failure was detected in
- this database, or on a shared object.</entry>
+ this database (or on a shared object), or NULL if data checksums are not
+ enabled.</entry>
</row>
<row>
<entry><structfield>blk_read_time</structfield></entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 161bad6c90..566100d6df 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -817,7 +817,7 @@ CREATE VIEW pg_stat_database AS
D.oid AS datid,
D.datname AS datname,
CASE
- WHEN (D.oid = (0)::oid) THEN NULL::integer
+ WHEN (D.oid = (0)::oid) THEN 0
ELSE pg_stat_get_db_numbackends(D.oid)
END AS numbackends,
pg_stat_get_db_xact_commit(D.oid) AS xact_commit,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97f41fb46c..05240bfd14 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -15,6 +15,7 @@
#include "postgres.h"
#include "access/htup_details.h"
+#include "access/xlog.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_type.h"
#include "common/ip.h"
@@ -1526,6 +1527,9 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
int64 result;
PgStat_StatDBEntry *dbentry;
+ if (!DataChecksumsEnabled())
+ PG_RETURN_NULL();
+
if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
result = 0;
else
@@ -1541,6 +1545,9 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
TimestampTz result;
PgStat_StatDBEntry *dbentry;
+ if (!DataChecksumsEnabled())
+ PG_RETURN_NULL();
+
if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
result = 0;
else
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 30973904c5..0c392e51e2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1806,7 +1806,7 @@ pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints
pg_stat_database| SELECT d.oid AS datid,
d.datname,
CASE
- WHEN (d.oid = (0)::oid) THEN NULL::integer
+ WHEN (d.oid = (0)::oid) THEN 0
ELSE pg_stat_get_db_numbackends(d.oid)
END AS numbackends,
pg_stat_get_db_xact_commit(d.oid) AS xact_commit,
On Tue, Apr 16, 2019 at 5:39 PM Robert Treat <rob@xzilla.net> wrote:
On Mon, Apr 15, 2019 at 3:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Sorry for late reply,
On Sun, Apr 14, 2019 at 7:12 PM Magnus Hagander <magnus@hagander.net>
wrote:
On Sat, Apr 13, 2019 at 8:46 PM Robert Treat <rob@xzilla.net> wrote:
On Fri, Apr 12, 2019 at 8:18 AM Magnus Hagander <magnus@hagander.net>
wrote:
ISTM the argument here is go with zero since you have zero connections
vs go with null since you can't actually connect, so it doesn't make
sense. (There is a third argument about making it -1 since you can't
connect, but that breaks sum(numbackends) so it's easily dismissed.) I
think I would have gone for 0 personally, but what ended up surprising
me was that a bunch of other stuff like xact_commit show zero when
AFAICT the above reasoning would apply the same to those columns.
(unless there is a way to commit a transaction in the global objects
that I don't know about).That's a good point. I mean, you can commit a transaction that
involves changes of global objects, but it counts in the database that you
were conneced to.We should probably at least make it consistent and make it NULL in all
or 0 in all.
I'm -1 for using -1 (!), for the very reason that you mention. But
either changing the numbackends to 0, or the others to NULL would work for
consistency. I'm leaning towards the 0 as well.+1 for 0 :) Especially since it's less code in the view.
+1 for 0
What originally got me looking at this was the idea of returning -1
(or maybe null) for checksum failures for cases when checksums are not
enabled. This seems a little more complicated to set up, but seems
like it might ward off people thinking they are safe due to no
checksum error reports when they actually aren't.NULL seems like the reasonable thing to return there. I'm not sure
what you're referring to with a little more complicated to set up, thought?
Do you mean somehow for the end user?Code-wise it seems it should be simple -- just do an "if checksums
disabled then return null" in the two functions.
That's indeed a good point! Lack of checksum error is distinct from
checksums not activated and we should make it obvious.I don't know if that counts as an open item, but I attach a patch for
all points discussed here.ISTM we should mention shared objects in both places in the docs, and
want "NULL if data checksums" rather than "NULL is data checksums".
Attaching slightly modified patch with those changes, but otherwise
LGTM.
Interestingly enough, that patch comes out as corrupt. I have no idea why
though :) v1 is fine.
So I tried merging back your changes into it, and then pushing. Please
doublecheck I didn't miss something :)
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Wed, Apr 17, 2019 at 1:55 PM Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Apr 16, 2019 at 5:39 PM Robert Treat <rob@xzilla.net> wrote:
On Mon, Apr 15, 2019 at 3:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I don't know if that counts as an open item, but I attach a patch for
all points discussed here.ISTM we should mention shared objects in both places in the docs, and
want "NULL if data checksums" rather than "NULL is data checksums".
Attaching slightly modified patch with those changes, but otherwise
LGTM.
Thanks, that's indeed embarassing typos. And agreed for mentioning
shared objects in both places.
Interestingly enough, that patch comes out as corrupt. I have no idea why though :) v1 is fine.
So I tried merging back your changes into it, and then pushing. Please doublecheck I didn't miss something :)
Thanks! I double checked and it all looks fine.
On Wed, Apr 17, 2019 at 9:07 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Apr 17, 2019 at 1:55 PM Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Apr 16, 2019 at 5:39 PM Robert Treat <rob@xzilla.net> wrote:
On Mon, Apr 15, 2019 at 3:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
I don't know if that counts as an open item, but I attach a patch for
all points discussed here.ISTM we should mention shared objects in both places in the docs, and
want "NULL if data checksums" rather than "NULL is data checksums".
Attaching slightly modified patch with those changes, but otherwise
LGTM.Thanks, that's indeed embarassing typos. And agreed for mentioning
shared objects in both places.Interestingly enough, that patch comes out as corrupt. I have no idea why though :) v1 is fine.
So I tried merging back your changes into it, and then pushing. Please doublecheck I didn't miss something :)
Thanks! I double checked and it all looks fine.
+1
Robert Treat
https://xzilla.net
On 4/2/19 7:06 PM, Magnus Hagander wrote:
On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <michael@paquier.xyz <mailto:michael@paquier.xyz>> wrote:
On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz <mailto:michael@paquier.xyz>> wrote:
One thing which is not
proposed on this patch, and I am fine with it as a first draft, is
that we don't have any information about the broken block number and
the file involved. My gut tells me that we'd want a separate view,
like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
blck) to be complete. But that's just for future work.That could indeed be nice.
Actually, backpedaling on this one... pg_stat_checksums_details may
be a bad idea as we could finish with one row per broken block. If
a corruption is spreading quickly, pgstat would not be able to sustain
that amount of objects. Having pg_stat_checksums would allow us to
plugin more data easily based on the last failure state:
- last relid of failure
- last fork type of failure
- last block number of failure.
Not saying to do that now, but having that in pg_stat_database does
not seem very natural to me. And on top of that we would have an
extra row full of NULLs for shared objects in pg_stat_database if we
adopt the unique view approach... I find that rather ugly.I think that tracking each and every block is of course a non-starter, as you've noticed.
I think that's less of a concern now that the stats collector process has gone and that the stats are now collected in shared memory, what do you think?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand <
bertranddrouvot.pg@gmail.com> wrote:
On 4/2/19 7:06 PM, Magnus Hagander wrote:
On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <michael@paquier.xyz
<mailto:michael@paquier.xyz>> wrote:
On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <
michael@paquier.xyz <mailto:michael@paquier.xyz>> wrote:
One thing which is not
proposed on this patch, and I am fine with it as a first draft,is
that we don't have any information about the broken block number
and
the file involved. My gut tells me that we'd want a separate
view,
like pg_stat_checksums_details with one tuple per (dboid, rel,
fork,
blck) to be complete. But that's just for future work.
That could indeed be nice.
Actually, backpedaling on this one... pg_stat_checksums_details may
be a bad idea as we could finish with one row per broken block. If
a corruption is spreading quickly, pgstat would not be able tosustain
that amount of objects. Having pg_stat_checksums would allow us to
plugin more data easily based on the last failure state:
- last relid of failure
- last fork type of failure
- last block number of failure.
Not saying to do that now, but having that in pg_stat_database does
not seem very natural to me. And on top of that we would have an
extra row full of NULLs for shared objects in pg_stat_database if we
adopt the unique view approach... I find that rather ugly.I think that tracking each and every block is of course a non-starter,
as you've noticed.
I think that's less of a concern now that the stats collector process has
gone and that the stats are now collected in shared memory, what do you
think?
It would be less of a concern yes, but I think it still would be a concern.
If you have a large amount of corruption you could quickly get to millions
of rows to keep track of which would definitely be a problem in shared
memory as well, wouldn't it?
But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
It would be less of a concern yes, but I think it still would be a concern.
If you have a large amount of corruption you could quickly get to millions
of rows to keep track of which would definitely be a problem in shared
memory as well, wouldn't it?
Yes. I have discussed this item with Bertrand off-list and I share
the same concern. This would lead to an lot of extra workload on a
large seqscan for a corrupted relation when the stats are written
(shutdown delay) while bloating shared memory with potentially
millions of items even if variable lists are handled through a dshash
and DSM.
But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?
Applying a threshold is one solution. Now, a second thing I have seen
in the past is that some disk partitions were busted but not others,
and the current database-level counters are not enough to make a
difference when it comes to grab patterns in this area. A list of the
last N failures may be able to show some pattern, but that would be
like analyzing things with a lot of noise without a clear conclusion.
Anyway, the workload caused by the threshold number had better be
measured before being decided (large set of relation files with a full
range of blocks corrupted, much better if these are in the OS cache
when scanned), which does not change the need of a benchmark.
What about just adding a counter tracking the number of checksum
failures for relfilenodes in a new structure related to them (note
that I did not write PgStat_StatTabEntry)?
If we do that, it is then possible to cross-check the failures with
tablespaces, which would point to disk areas that are more sensitive
to corruption.
--
Michael
Hi,
On 2022-12-12 08:40:04 +0900, Michael Paquier wrote:
What about just adding a counter tracking the number of checksum
failures for relfilenodes in a new structure related to them (note
that I did not write PgStat_StatTabEntry)?
Why were you thinking of tracking it separately from PgStat_StatTabEntry?
I think there's a good argument for starting to track some stats based on the
relfilenode, rather the oid, because it'd allow us to track e.g. the number of
writes for a relation too (we don't have the oid when writing out
buffers). But that's a relatively large change...
Greetings,
Andres Freund
On Sun, Dec 11, 2022 at 04:51:49PM -0800, Andres Freund wrote:
Why were you thinking of tracking it separately from PgStat_StatTabEntry?
We only know the relfilenode when loading the page on a checksum
failure, not its parent relation, and there are things like physical
base backups where we would not know them anyway because we may not be
connected to a database. Or perhaps it would be possible to link
table entries with their relfilenodes using some tweaks in the stat
APIs? I am sure that you know the business in this area better than I
do currently :)
I think there's a good argument for starting to track some stats based on the
relfilenode, rather the oid, because it'd allow us to track e.g. the number of
writes for a relation too (we don't have the oid when writing out
buffers). But that's a relatively large change...
Yeah. I was thinking among the lines of sync requests and sync
failures, as well.
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Sun, Dec 11, 2022 at 04:51:49PM -0800, Andres Freund wrote:
I think there's a good argument for starting to track some stats based on the
relfilenode, rather the oid, because it'd allow us to track e.g. the number of
writes for a relation too (we don't have the oid when writing out
buffers). But that's a relatively large change...
Yeah. I was thinking among the lines of sync requests and sync
failures, as well.
I think a stats table indexed solely by relfilenode wouldn't be a great
idea, because you'd lose all the stats about a table as soon as you
do vacuum full/cluster/rewriting-alter-table/etc. Can we create two
index structures over the same stats table entries, so you can look
up by either relfilenode or OID? I'm not quite sure how to manage
rewrites, where you transiently have two relfilenodes for "the
same" table ... maybe we could allow multiple pointers to the same
stats entry??
regards, tom lane
On Sun, Dec 11, 2022 at 08:48:15PM -0500, Tom Lane wrote:
I think a stats table indexed solely by relfilenode wouldn't be a great
idea, because you'd lose all the stats about a table as soon as you
do vacuum full/cluster/rewriting-alter-table/etc. Can we create two
index structures over the same stats table entries, so you can look
up by either relfilenode or OID? I'm not quite sure how to manage
rewrites, where you transiently have two relfilenodes for "the
same" table ... maybe we could allow multiple pointers to the same
stats entry??
FWIW, I am not sure that I would care much if we were to dropped the
stats associated to a relfilenode on a rewrite. In terms of checksum
failures, tuples are deformed so if there is one checksum failure a
rewrite would just not happen. The potential complexity is not really
appealing compared to the implementation simplicity and its gains, and
rewrites are lock-heavy so I'd like to think that people avoid them
(cough)..
--
Michael
On 12/12/22 12:40 AM, Michael Paquier wrote:
On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
It would be less of a concern yes, but I think it still would be a concern.
If you have a large amount of corruption you could quickly get to millions
of rows to keep track of which would definitely be a problem in shared
memory as well, wouldn't it?Yes. I have discussed this item with Bertrand off-list and I share
the same concern. This would lead to an lot of extra workload on a
large seqscan for a corrupted relation when the stats are written
(shutdown delay) while bloating shared memory with potentially
millions of items even if variable lists are handled through a dshash
and DSM.But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?Applying a threshold is one solution. Now, a second thing I have seen
in the past is that some disk partitions were busted but not others,
and the current database-level counters are not enough to make a
difference when it comes to grab patterns in this area. A list of the
last N failures may be able to show some pattern, but that would be
like analyzing things with a lot of noise without a clear conclusion.
Anyway, the workload caused by the threshold number had better be
measured before being decided (large set of relation files with a full
range of blocks corrupted, much better if these are in the OS cache
when scanned), which does not change the need of a benchmark.What about just adding a counter tracking the number of checksum
failures for relfilenodes
Agree about your concern for tracking the corruption for every single block.
I like this idea for relfilenodes tracking instead. Indeed it looks like this is enough useful historical information to work with.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 12/12/22 5:09 AM, Michael Paquier wrote:
On Sun, Dec 11, 2022 at 08:48:15PM -0500, Tom Lane wrote:
I think a stats table indexed solely by relfilenode wouldn't be a great
idea, because you'd lose all the stats about a table as soon as you
do vacuum full/cluster/rewriting-alter-table/etc. Can we create two
index structures over the same stats table entries, so you can look
up by either relfilenode or OID? I'm not quite sure how to manage
rewrites, where you transiently have two relfilenodes for "the
same" table ... maybe we could allow multiple pointers to the same
stats entry??FWIW, I am not sure that I would care much if we were to dropped the
stats associated to a relfilenode on a rewrite. In terms of checksum
failures, tuples are deformed so if there is one checksum failure a
rewrite would just not happen. The potential complexity is not really
appealing compared to the implementation simplicity and its gains, and
rewrites are lock-heavy so I'd like to think that people avoid them
(cough)..
Agree that this is less "problematic" for the checksum use case.
On the other hand, losing IO stats (as the ones we could add later on, suggested by Andres up-thread) looks more of a concern to me.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, Dec 12, 2022 at 12:40 AM Michael Paquier <michael@paquier.xyz>
wrote:
On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
It would be less of a concern yes, but I think it still would be a
concern.
If you have a large amount of corruption you could quickly get to
millions
of rows to keep track of which would definitely be a problem in shared
memory as well, wouldn't it?Yes. I have discussed this item with Bertrand off-list and I share
the same concern. This would lead to an lot of extra workload on a
large seqscan for a corrupted relation when the stats are written
(shutdown delay) while bloating shared memory with potentially
millions of items even if variable lists are handled through a dshash
and DSM.But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?Applying a threshold is one solution. Now, a second thing I have seen
in the past is that some disk partitions were busted but not others,
and the current database-level counters are not enough to make a
difference when it comes to grab patterns in this area. A list of the
last N failures may be able to show some pattern, but that would be
like analyzing things with a lot of noise without a clear conclusion.
Anyway, the workload caused by the threshold number had better be
measured before being decided (large set of relation files with a full
range of blocks corrupted, much better if these are in the OS cache
when scanned), which does not change the need of a benchmark.What about just adding a counter tracking the number of checksum
failures for relfilenodes in a new structure related to them (note
that I did not write PgStat_StatTabEntry)?If we do that, it is then possible to cross-check the failures with
tablespaces, which would point to disk areas that are more sensitive
to corruption.
If that's the concern, then perhaps the level we should be tracking things
on is tablespace? We don't have any stats per tablespace today I believe,
but that doesn't mean we couldn't create that.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 12/12/22 8:15 AM, Drouvot, Bertrand wrote:
On 12/12/22 5:09 AM, Michael Paquier wrote:
On Sun, Dec 11, 2022 at 08:48:15PM -0500, Tom Lane wrote:
I think a stats table indexed solely by relfilenode wouldn't be a great
idea, because you'd lose all the stats about a table as soon as you
do vacuum full/cluster/rewriting-alter-table/etc. Can we create two
index structures over the same stats table entries, so you can look
up by either relfilenode or OID? I'm not quite sure how to manage
rewrites, where you transiently have two relfilenodes for "the
same" table ... maybe we could allow multiple pointers to the same
stats entry??FWIW, I am not sure that I would care much if we were to dropped the
stats associated to a relfilenode on a rewrite. In terms of checksum
failures, tuples are deformed so if there is one checksum failure a
rewrite would just not happen. The potential complexity is not really
appealing compared to the implementation simplicity and its gains, and
rewrites are lock-heavy so I'd like to think that people avoid them
(cough)..Agree that this is less "problematic" for the checksum use case.
On the other hand, losing IO stats (as the ones we could add later on, suggested by Andres up-thread) looks more of a concern to me.
One option could be to have a dedicated PgStat_StatRelFileNodeEntry and populate the related PgStat_StatTabEntry when calling the new to be created pgstat_relfilenode_flush_cb()? (That's what we are doing currently to
flush some of the table stats to the database stats for example).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On 2022-12-11 20:48:15 -0500, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Sun, Dec 11, 2022 at 04:51:49PM -0800, Andres Freund wrote:
I think there's a good argument for starting to track some stats based on the
relfilenode, rather the oid, because it'd allow us to track e.g. the number of
writes for a relation too (we don't have the oid when writing out
buffers). But that's a relatively large change...Yeah. I was thinking among the lines of sync requests and sync
failures, as well.I think a stats table indexed solely by relfilenode wouldn't be a great
idea, because you'd lose all the stats about a table as soon as you
do vacuum full/cluster/rewriting-alter-table/etc.
I don't think that'd be a huge issue - we already have code to keep some
stats as part of rewrites that change the oid of a relation. We could do
the same for rewrites that just change the relfilenode.
However, I'm not sure it's a good idea to keep the stats during
rewrites. Most rewrites end up not copying dead tuples, for example, so
keeping the old counts of updated tuples doesn't really make sense.
Can we create two index structures over the same stats table entries,
so you can look up by either relfilenode or OID?
We could likely do that, yes. I think we'd have one "stats body" and
multiple hash table entries pointing to one. The complicated bit would
likely be that we'd need some additional refcounting to know when
there's no references to the "stats body" left.
Greetings,
Andres Freund