Some regression tests for the pg_control_*() functions

Started by Michael Paquierabout 3 years ago5 messages

michael@paquier.xyz

about 3 years ago

1 attachment(s)

Hi all,

As mentioned in [1]/messages/by-id/YzY0iLxNbmaxHpbs@paquier.xyz -- Michael, there is no regression tests for the SQL control
functions: pg_control_checkpoint, pg_control_recovery,
pg_control_system and pg_control_init.

It would be minimal to check their execution, as of a "SELECT FROM
func()", still some validation can be done on its output as long as
the test is portable enough (needs transparency for wal_level, commit
timestamps, etc.).

Attached is a proposal to provide some coverage. Some of the checks
could be just removed, like the ones for non-NULL fields, but I have
written out everything to show how much could be done.

Thoughts?

[1]: /messages/by-id/YzY0iLxNbmaxHpbs@paquier.xyz -- Michael
--
Michael

Attachments:

controldata-regression-3.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 9f106c2a10..a5df523109 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -594,3 +594,89 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
          Index Cond: (unique1 = g.g)
 (4 rows)
 
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+    redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+    redo_wal_file IS NOT NULL AS redo_wal_file,
+    timeline_id > 0 AS timeline_id,
+    prev_timeline_id > 0 AS prev_timeline_id,
+    next_xid IS NOT NULL AS next_xid,
+    next_oid > 0 AS next_oid,
+    next_multixact_id != '0'::xid AS next_multixact_id,
+    next_multi_offset IS NOT NULL AS next_multi_offset,
+    oldest_xid != '0'::xid AS oldest_xid,
+    oldest_xid_dbid > 0 AS oldest_xid_dbid,
+    oldest_active_xid IS NOT NULL AS oldest_active_xid,
+    oldest_multi_xid != '0'::xid AS oldest_multi_xid,
+    oldest_multi_dbid > 0 AS oldest_multi_dbid,
+    oldest_commit_ts_xid IS NOT NULL AS oldest_commit_ts_xid,
+    newest_commit_ts_xid IS NOT NULL AS newest_commit_ts_xid
+  FROM pg_control_checkpoint();
+-[ RECORD 1 ]--------+--
+checkpoint_lsn       | t
+redo_lsn             | t
+redo_wal_file        | t
+timeline_id          | t
+prev_timeline_id     | t
+next_xid             | t
+next_oid             | t
+next_multixact_id    | t
+next_multi_offset    | t
+oldest_xid           | t
+oldest_xid_dbid      | t
+oldest_active_xid    | t
+oldest_multi_xid     | t
+oldest_multi_dbid    | t
+oldest_commit_ts_xid | t
+newest_commit_ts_xid | t
+
+SELECT max_data_alignment > 0 AS max_data_alignment,
+    database_block_size > 0 AS database_block_size,
+    blocks_per_segment > 0 AS blocks_per_segment,
+    wal_block_size > 0 AS wal_block_size,
+    max_identifier_length > 0 AS max_identifier_length,
+    max_index_columns > 0 AS max_index_columns,
+    max_toast_chunk_size > 0 AS max_toast_chunk_size,
+    large_object_chunk_size > 0 AS large_object_chunk_size,
+    float8_pass_by_value IS NOT NULL AS float8_pass_by_value,
+    data_page_checksum_version >= 0 AS data_page_checksum_version
+  FROM pg_control_init();
+-[ RECORD 1 ]--------------+--
+max_data_alignment         | t
+database_block_size        | t
+blocks_per_segment         | t
+wal_block_size             | t
+max_identifier_length      | t
+max_index_columns          | t
+max_toast_chunk_size       | t
+large_object_chunk_size    | t
+float8_pass_by_value       | t
+data_page_checksum_version | t
+
+SELECT min_recovery_end_lsn >= '0/0'::pg_lsn AS min_recovery_end_lsn,
+    min_recovery_end_timeline >= 0 AS min_recovery_end_timeline,
+    backup_start_lsn >= '0/0'::pg_lsn AS backup_start_lsn,
+    backup_end_lsn >= '0/0'::pg_lsn AS backup_end_lsn,
+    end_of_backup_record_required IS NOT NULL AS end_of_backup_record_required
+  FROM pg_control_recovery();
+-[ RECORD 1 ]-----------------+--
+min_recovery_end_lsn          | t
+min_recovery_end_timeline     | t
+backup_start_lsn              | t
+backup_end_lsn                | t
+end_of_backup_record_required | t
+
+SELECT pg_control_version > 0 AS pg_control_version,
+    catalog_version_no > 0 AS catalog_version_no,
+    system_identifier >= 0 AS system_identifier,
+    pg_control_last_modified <= now() AS pg_control_last_modified
+  FROM pg_control_system();
+-[ RECORD 1 ]------------+--
+pg_control_version       | t
+catalog_version_no       | t
+system_identifier        | t
+pg_control_last_modified | t
+
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 639e9b352c..e5e75b82f3 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -223,3 +223,47 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
+
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+    redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+    redo_wal_file IS NOT NULL AS redo_wal_file,
+    timeline_id > 0 AS timeline_id,
+    prev_timeline_id > 0 AS prev_timeline_id,
+    next_xid IS NOT NULL AS next_xid,
+    next_oid > 0 AS next_oid,
+    next_multixact_id != '0'::xid AS next_multixact_id,
+    next_multi_offset IS NOT NULL AS next_multi_offset,
+    oldest_xid != '0'::xid AS oldest_xid,
+    oldest_xid_dbid > 0 AS oldest_xid_dbid,
+    oldest_active_xid IS NOT NULL AS oldest_active_xid,
+    oldest_multi_xid != '0'::xid AS oldest_multi_xid,
+    oldest_multi_dbid > 0 AS oldest_multi_dbid,
+    oldest_commit_ts_xid IS NOT NULL AS oldest_commit_ts_xid,
+    newest_commit_ts_xid IS NOT NULL AS newest_commit_ts_xid
+  FROM pg_control_checkpoint();
+SELECT max_data_alignment > 0 AS max_data_alignment,
+    database_block_size > 0 AS database_block_size,
+    blocks_per_segment > 0 AS blocks_per_segment,
+    wal_block_size > 0 AS wal_block_size,
+    max_identifier_length > 0 AS max_identifier_length,
+    max_index_columns > 0 AS max_index_columns,
+    max_toast_chunk_size > 0 AS max_toast_chunk_size,
+    large_object_chunk_size > 0 AS large_object_chunk_size,
+    float8_pass_by_value IS NOT NULL AS float8_pass_by_value,
+    data_page_checksum_version >= 0 AS data_page_checksum_version
+  FROM pg_control_init();
+SELECT min_recovery_end_lsn >= '0/0'::pg_lsn AS min_recovery_end_lsn,
+    min_recovery_end_timeline >= 0 AS min_recovery_end_timeline,
+    backup_start_lsn >= '0/0'::pg_lsn AS backup_start_lsn,
+    backup_end_lsn >= '0/0'::pg_lsn AS backup_end_lsn,
+    end_of_backup_record_required IS NOT NULL AS end_of_backup_record_required
+  FROM pg_control_recovery();
+SELECT pg_control_version > 0 AS pg_control_version,
+    catalog_version_no > 0 AS catalog_version_no,
+    system_identifier >= 0 AS system_identifier,
+    pg_control_last_modified <= now() AS pg_control_last_modified
+  FROM pg_control_system();

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 3 years ago

In reply to: Michael Paquier (#1)

Re: Some regression tests for the pg_control_*() functions

On Tue, Oct 25, 2022 at 11:07 AM Michael Paquier <michael@paquier.xyz> wrote:

Hi all,

As mentioned in [1], there is no regression tests for the SQL control
functions: pg_control_checkpoint, pg_control_recovery,
pg_control_system and pg_control_init.

It would be minimal to check their execution, as of a "SELECT FROM
func()", still some validation can be done on its output as long as
the test is portable enough (needs transparency for wal_level, commit
timestamps, etc.).

Attached is a proposal to provide some coverage. Some of the checks
could be just removed, like the ones for non-NULL fields, but I have
written out everything to show how much could be done.

Thoughts?

[1]: /messages/by-id/YzY0iLxNbmaxHpbs@paquier.xyz

+1 for improving the test coverage. Is there a strong reason to
validate individual output columns rather than select count(*) > 0
from pg_control_XXXX(); sort of tests? If the intention is to validate
the pg_controlfile contents, we have pg_controldata to look at and
pg_control_XXXX() functions doing crc checks. If this isn't enough, we
can have the pg_control_validate() function to do all the necessary
checks and simplify the tests, no?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Bharath Rupireddy (#2)

Re: Some regression tests for the pg_control_*() functions

On Wed, Oct 26, 2022 at 10:13:29AM +0530, Bharath Rupireddy wrote:

+1 for improving the test coverage. Is there a strong reason to
validate individual output columns rather than select count(*) > 0
from pg_control_XXXX(); sort of tests? If the intention is to validate
the pg_controlfile contents, we have pg_controldata to look at and
pg_control_XXXX() functions doing crc checks.

And it could be possible that the control file finishes by writing
some incorrect data due to a bug in the backend. Adding a count(*) or
similar to get the number of fields of the function is basically the
same as checking its execution, still I'd like to think that having a
minimum set of checks would be kind of nice on top of that. Among all
the ones I wrote in the patch upthread, the following ones would be in
my minimalistic list:
- timeline_id > 0
- timeline_id >= prev_timeline_id
- checkpoint_lsn >= redo_lsn
- data_page_checksum_version >= 0
- Perhaps the various fields of pg_control_init() using their
lower-bound values.
- Perhaps pg_control_version and/or catalog_version_no > NN

If this isn't enough, we
can have the pg_control_validate() function to do all the necessary
checks and simplify the tests, no?

There is no function like that. Perhaps that you mean to introduce
something like that at the C level, but that does not seem necessary
to me as long as a SQL is able to do the job for the most meaningful
parts.
--
Michael

Bharath Rupireddy

bharath.rupireddyforpostgres@gmail.com

about 3 years ago

In reply to: Michael Paquier (#3)

Re: Some regression tests for the pg_control_*() functions

On Wed, Oct 26, 2022 at 12:48 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Oct 26, 2022 at 10:13:29AM +0530, Bharath Rupireddy wrote:

+1 for improving the test coverage. Is there a strong reason to
validate individual output columns rather than select count(*) > 0
from pg_control_XXXX(); sort of tests? If the intention is to validate
the pg_controlfile contents, we have pg_controldata to look at and
pg_control_XXXX() functions doing crc checks.

And it could be possible that the control file finishes by writing
some incorrect data due to a bug in the backend.

We will have bigger problems when a backend corrupts the pg_control
file, no? The bigger problems could be that the server won't come up
or it behaves abnormally or some other.

Adding a count(*) or
similar to get the number of fields of the function is basically the
same as checking its execution, still I'd like to think that having a
minimum set of checks would be kind of nice on top of that. Among all
the ones I wrote in the patch upthread, the following ones would be in
my minimalistic list:
- timeline_id > 0
- timeline_id >= prev_timeline_id
- checkpoint_lsn >= redo_lsn
- data_page_checksum_version >= 0
- Perhaps the various fields of pg_control_init() using their
lower-bound values.
- Perhaps pg_control_version and/or catalog_version_no > NN

Can't the CRC check detect any of the above corruptions? Do we have
any evidence of backend corrupting the pg_control file or any of the
above variables while running regression tests?

If the concern is backend corrupting the pg_control file and CRC check
can't detect it, then the extra checks (as proposed in the patch) must
be placed within the core (perhaps before writing/after reading the
pg_control file), not in regression tests for sure.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Bharath Rupireddy (#4)

Re: Some regression tests for the pg_control_*() functions

On Wed, Oct 26, 2022 at 01:41:12PM +0530, Bharath Rupireddy wrote:

We will have bigger problems when a backend corrupts the pg_control
file, no? The bigger problems could be that the server won't come up
or it behaves abnormally or some other.

Possibly, yes.

Can't the CRC check detect any of the above corruptions? Do we have
any evidence of backend corrupting the pg_control file or any of the
above variables while running regression tests?

It could be possible that the backend writes an incorrect data
combination though its APIs, where the CRC is correct but the data is
not (say a TLI of 0, as one example).

If the concern is backend corrupting the pg_control file and CRC check
can't detect it, then the extra checks (as proposed in the patch) must
be placed within the core (perhaps before writing/after reading the
pg_control file), not in regression tests for sure.

Well, that depends on the level of protection you want. Now there are
things in place already when it comes to recovery or at startup.
Anyway, the recent experience with the 56-bit relfilenode thread is
really that we don't check the execution of these functions at all,
and that's the actual minimal requirement, so I have applied a patch
based on count(*) > 0 for now to cover that. I am not sure if any of
the checks for the control file fields are valuable, perhaps some
are..
--
Michael