regression test failed when enabling checksum

Started by Fujii Masaoalmost 13 years ago15 messages
#1Fujii Masao
masao.fujii@gmail.com
1 attachment(s)

Hi,

I found that the regression test failed when I created the database
cluster with the checksum and set wal_level to archive. I think that
there are some bugs around checksum feature. Attached is the regression.diff.

Regards,

--
Fujii Masao

Attachments:

regression.diffsapplication/octet-stream; name=regression.diffsDownload
*** /Users/postgres/pgsql/head/src/test/regress/expected/tablespace.out	2013-03-26 02:40:25.000000000 +0900
--- /Users/postgres/pgsql/head/src/test/regress/results/tablespace.out	2013-03-26 02:40:25.000000000 +0900
***************
*** 54,68 ****
  ALTER TABLE testschema.atable SET TABLESPACE testspace;
  ALTER INDEX testschema.anindex SET TABLESPACE testspace;
  INSERT INTO testschema.atable VALUES(3);	-- ok
  INSERT INTO testschema.atable VALUES(1);	-- fail (checks index)
! ERROR:  duplicate key value violates unique constraint "anindex"
! DETAIL:  Key (column1)=(1) already exists.
  SELECT COUNT(*) FROM testschema.atable;		-- checks heap
!  count 
! -------
!      3
! (1 row)
! 
  -- Will fail with bad path
  CREATE TABLESPACE badspace LOCATION '/no/such/location';
  ERROR:  directory "/no/such/location" does not exist
--- 54,67 ----
  ALTER TABLE testschema.atable SET TABLESPACE testspace;
  ALTER INDEX testschema.anindex SET TABLESPACE testspace;
  INSERT INTO testschema.atable VALUES(3);	-- ok
+ WARNING:  page verification failed, calculated checksum 33140 but expected 46276
+ ERROR:  invalid page in block 0 of relation pg_tblspc/16385/PG_9.3_201303201/16384/16401
  INSERT INTO testschema.atable VALUES(1);	-- fail (checks index)
! WARNING:  page verification failed, calculated checksum 33140 but expected 46276
! ERROR:  invalid page in block 0 of relation pg_tblspc/16385/PG_9.3_201303201/16384/16401
  SELECT COUNT(*) FROM testschema.atable;		-- checks heap
! WARNING:  page verification failed, calculated checksum 11593 but expected 59263
! ERROR:  invalid page in block 0 of relation pg_tblspc/16385/PG_9.3_201303201/16384/16402
  -- Will fail with bad path
  CREATE TABLESPACE badspace LOCATION '/no/such/location';
  ERROR:  directory "/no/such/location" does not exist

======================================================================

*** /Users/postgres/pgsql/head/src/test/regress/expected/vacuum.out	2012-07-13 00:41:16.000000000 +0900
--- /Users/postgres/pgsql/head/src/test/regress/results/vacuum.out	2013-03-26 02:40:38.000000000 +0900
***************
*** 30,71 ****
  
  VACUUM FULL vactst;
  UPDATE vactst SET i = i + 1;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst SELECT * FROM vactst;
  INSERT INTO vactst VALUES (0);
  SELECT count(*) FROM vactst;
!  count 
! -------
!   2049
! (1 row)
! 
  DELETE FROM vactst WHERE i != 0;
  VACUUM (FULL) vactst;
  DELETE FROM vactst;
  SELECT * FROM vactst;
!  i 
! ---
! (0 rows)
! 
  VACUUM (FULL, FREEZE) vactst;
  VACUUM (ANALYZE, FULL) vactst;
  CREATE TABLE vaccluster (i INT PRIMARY KEY);
  ALTER TABLE vaccluster CLUSTER ON vaccluster_pkey;
  INSERT INTO vaccluster SELECT * FROM vactst;
  CLUSTER vaccluster;
  VACUUM FULL pg_am;
  VACUUM FULL pg_class;
  VACUUM FULL pg_database;
  VACUUM FULL vaccluster;
  VACUUM FULL vactst;
  DROP TABLE vaccluster;
  DROP TABLE vactst;
--- 30,113 ----
  
  VACUUM FULL vactst;
  UPDATE vactst SET i = i + 1;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  INSERT INTO vactst VALUES (0);
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  SELECT count(*) FROM vactst;
! WARNING:  page verification failed, calculated checksum 32640 but expected 34086
! ERROR:  invalid page in block 0 of relation base/16384/28634
  DELETE FROM vactst WHERE i != 0;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  VACUUM (FULL) vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  DELETE FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  SELECT * FROM vactst;
! WARNING:  page verification failed, calculated checksum 32640 but expected 34086
! ERROR:  invalid page in block 0 of relation base/16384/28634
  VACUUM (FULL, FREEZE) vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  VACUUM (ANALYZE, FULL) vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  CREATE TABLE vaccluster (i INT PRIMARY KEY);
  ALTER TABLE vaccluster CLUSTER ON vaccluster_pkey;
  INSERT INTO vaccluster SELECT * FROM vactst;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  CLUSTER vaccluster;
  VACUUM FULL pg_am;
+ WARNING:  page verification failed, calculated checksum 42300 but expected 6194
+ ERROR:  invalid page in block 0 of relation base/16384/28655
  VACUUM FULL pg_class;
+ WARNING:  page verification failed, calculated checksum 40542 but expected 53329
+ ERROR:  invalid page in block 0 of relation base/16384/28658
  VACUUM FULL pg_database;
+ ERROR:  could not read block 4 in file "base/16384/28658": read only 0 of 8192 bytes
  VACUUM FULL vaccluster;
+ ERROR:  could not read block 14 in file "base/16384/28658": read only 0 of 8192 bytes
  VACUUM FULL vactst;
+ ERROR:  could not read block 14 in file "base/16384/28658": read only 0 of 8192 bytes
  DROP TABLE vaccluster;
+ ERROR:  could not read block 14 in file "base/16384/28658": read only 0 of 8192 bytes
  DROP TABLE vactst;
+ ERROR:  could not read block 14 in file "base/16384/28658": read only 0 of 8192 bytes

======================================================================

*** /Users/postgres/pgsql/head/src/test/regress/expected/sanity_check.out	2012-08-20 07:37:38.000000000 +0900
--- /Users/postgres/pgsql/head/src/test/regress/results/sanity_check.out	2013-03-26 02:40:39.000000000 +0900
***************
*** 1,4 ****
--- 1,6 ----
  VACUUM;
+ WARNING:  page verification failed, calculated checksum 32640 but expected 34086
+ ERROR:  invalid page in block 0 of relation base/16384/28634
  --
  -- sanity check, if we don't have indices the test will take years to
  -- complete.  But skip TOAST relations (since they will have varying
***************
*** 165,172 ****
   timestamptz_tbl         | f
   timetz_tbl              | f
   tinterval_tbl           | f
   varchar_tbl             | f
! (155 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have
--- 167,176 ----
   timestamptz_tbl         | f
   timetz_tbl              | f
   tinterval_tbl           | f
+  vaccluster              | t
+  vactst                  | f
   varchar_tbl             | f
! (157 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have

======================================================================

*** /Users/postgres/pgsql/head/src/test/regress/expected/misc.out	2013-03-26 02:40:25.000000000 +0900
--- /Users/postgres/pgsql/head/src/test/regress/results/misc.out	2013-03-26 02:40:44.000000000 +0900
***************
*** 694,702 ****
   tvv
   tvvm
   tvvmv
   varchar_tbl
   xacttest
! (118 rows)
  
  SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
   name 
--- 694,704 ----
   tvv
   tvvm
   tvvmv
+  vaccluster
+  vactst
   varchar_tbl
   xacttest
! (120 rows)
  
  SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer')));
   name 

======================================================================

*** /Users/postgres/pgsql/head/src/test/regress/expected/cluster.out	2012-07-13 00:41:16.000000000 +0900
--- /Users/postgres/pgsql/head/src/test/regress/results/cluster.out	2013-03-26 02:40:47.000000000 +0900
***************
*** 53,93 ****
  -- This entry is needed to test that TOASTED values are copied correctly.
  INSERT INTO clstr_tst (b, c, d) VALUES (6, 'seis', repeat('xyzzy', 100000));
  CLUSTER clstr_tst_c ON clstr_tst;
  SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
   a  | b  |       c       |           substring            | length 
  ----+----+---------------+--------------------------------+--------
-  10 | 14 | catorce       |                                |       
-  18 |  5 | cinco         |                                |       
-   9 |  4 | cuatro        |                                |       
-  26 | 19 | diecinueve    |                                |       
-  12 | 18 | dieciocho     |                                |       
-  30 | 16 | dieciseis     |                                |       
-  24 | 17 | diecisiete    |                                |       
-   2 | 10 | diez          |                                |       
-  23 | 12 | doce          |                                |       
-  11 |  2 | dos           |                                |       
-  25 |  9 | nueve         |                                |       
-  31 |  8 | ocho          |                                |       
    1 | 11 | once          |                                |       
!  28 | 15 | quince        |                                |       
!  32 |  6 | seis          | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
!  29 |  7 | siete         |                                |       
!  15 | 13 | trece         |                                |       
!  22 | 30 | treinta       |                                |       
!  17 | 32 | treinta y dos |                                |       
    3 | 31 | treinta y uno |                                |       
    5 |  3 | tres          |                                |       
-  20 |  1 | uno           |                                |       
    6 | 20 | veinte        |                                |       
   14 | 25 | veinticinco   |                                |       
!  21 | 24 | veinticuatro  |                                |       
!   4 | 22 | veintidos     |                                |       
!  19 | 29 | veintinueve   |                                |       
   16 | 28 | veintiocho    |                                |       
   27 | 26 | veintiseis    |                                |       
!  13 | 27 | veintisiete   |                                |       
!   7 | 23 | veintitres    |                                |       
!   8 | 21 | veintiuno     |                                |       
  (32 rows)
  
  SELECT a,b,c,substring(d for 30), length(d) from clstr_tst ORDER BY a;
--- 53,95 ----
  -- This entry is needed to test that TOASTED values are copied correctly.
  INSERT INTO clstr_tst (b, c, d) VALUES (6, 'seis', repeat('xyzzy', 100000));
  CLUSTER clstr_tst_c ON clstr_tst;
+ WARNING:  page verification failed, calculated checksum 29960 but expected 36001
+ ERROR:  invalid page in block 0 of relation base/16384/31088
  SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
   a  | b  |       c       |           substring            | length 
  ----+----+---------------+--------------------------------+--------
    1 | 11 | once          |                                |       
!   2 | 10 | diez          |                                |       
    3 | 31 | treinta y uno |                                |       
+   4 | 22 | veintidos     |                                |       
    5 |  3 | tres          |                                |       
    6 | 20 | veinte        |                                |       
+   7 | 23 | veintitres    |                                |       
+   8 | 21 | veintiuno     |                                |       
+   9 |  4 | cuatro        |                                |       
+  10 | 14 | catorce       |                                |       
+  11 |  2 | dos           |                                |       
+  12 | 18 | dieciocho     |                                |       
+  13 | 27 | veintisiete   |                                |       
   14 | 25 | veinticinco   |                                |       
!  15 | 13 | trece         |                                |       
   16 | 28 | veintiocho    |                                |       
+  17 | 32 | treinta y dos |                                |       
+  18 |  5 | cinco         |                                |       
+  19 | 29 | veintinueve   |                                |       
+  20 |  1 | uno           |                                |       
+  21 | 24 | veinticuatro  |                                |       
+  22 | 30 | treinta       |                                |       
+  23 | 12 | doce          |                                |       
+  24 | 17 | diecisiete    |                                |       
+  25 |  9 | nueve         |                                |       
+  26 | 19 | diecinueve    |                                |       
   27 | 26 | veintiseis    |                                |       
!  28 | 15 | quince        |                                |       
!  29 |  7 | siete         |                                |       
!  30 | 16 | dieciseis     |                                |       
!  31 |  8 | ocho          |                                |       
!  32 |  6 | seis          | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
  (32 rows)
  
  SELECT a,b,c,substring(d for 30), length(d) from clstr_tst ORDER BY a;
***************
*** 206,243 ****
  SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
   a  |  b  |       c        |           substring            | length 
  ----+-----+----------------+--------------------------------+--------
-  10 |  14 | catorce        |                                |       
-  18 |   5 | cinco          |                                |       
-   9 |   4 | cuatro         |                                |       
-  26 |  19 | diecinueve     |                                |       
-  12 |  18 | dieciocho      |                                |       
-  30 |  16 | dieciseis      |                                |       
-  24 |  17 | diecisiete     |                                |       
-   2 |  10 | diez           |                                |       
-  23 |  12 | doce           |                                |       
-  11 |   2 | dos            |                                |       
-  25 |   9 | nueve          |                                |       
-  31 |   8 | ocho           |                                |       
    1 |  11 | once           |                                |       
!  28 |  15 | quince         |                                |       
!  32 |   6 | seis           | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
!  29 |   7 | siete          |                                |       
!  15 |  13 | trece          |                                |       
!  22 |  30 | treinta        |                                |       
!  17 |  32 | treinta y dos  |                                |       
    3 |  31 | treinta y uno  |                                |       
    5 |   3 | tres           |                                |       
-  20 |   1 | uno            |                                |       
    6 |  20 | veinte         |                                |       
   14 |  25 | veinticinco    |                                |       
!  21 |  24 | veinticuatro   |                                |       
!   4 |  22 | veintidos      |                                |       
!  19 |  29 | veintinueve    |                                |       
   16 |  28 | veintiocho     |                                |       
   27 |  26 | veintiseis     |                                |       
!  13 |  27 | veintisiete    |                                |       
!   7 |  23 | veintitres     |                                |       
!   8 |  21 | veintiuno      |                                |       
    0 | 100 | in child table |                                |       
  (33 rows)
  
--- 208,245 ----
  SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
   a  |  b  |       c        |           substring            | length 
  ----+-----+----------------+--------------------------------+--------
    1 |  11 | once           |                                |       
!   2 |  10 | diez           |                                |       
    3 |  31 | treinta y uno  |                                |       
+   4 |  22 | veintidos      |                                |       
    5 |   3 | tres           |                                |       
    6 |  20 | veinte         |                                |       
+   7 |  23 | veintitres     |                                |       
+   8 |  21 | veintiuno      |                                |       
+   9 |   4 | cuatro         |                                |       
+  10 |  14 | catorce        |                                |       
+  11 |   2 | dos            |                                |       
+  12 |  18 | dieciocho      |                                |       
+  13 |  27 | veintisiete    |                                |       
   14 |  25 | veinticinco    |                                |       
!  15 |  13 | trece          |                                |       
   16 |  28 | veintiocho     |                                |       
+  17 |  32 | treinta y dos  |                                |       
+  18 |   5 | cinco          |                                |       
+  19 |  29 | veintinueve    |                                |       
+  20 |   1 | uno            |                                |       
+  21 |  24 | veinticuatro   |                                |       
+  22 |  30 | treinta        |                                |       
+  23 |  12 | doce           |                                |       
+  24 |  17 | diecisiete     |                                |       
+  25 |   9 | nueve          |                                |       
+  26 |  19 | diecinueve     |                                |       
   27 |  26 | veintiseis     |                                |       
!  28 |  15 | quince         |                                |       
!  29 |   7 | siete          |                                |       
!  30 |  16 | dieciseis      |                                |       
!  31 |   8 | ocho           |                                |       
!  32 |   6 | seis           | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
    0 | 100 | in child table |                                |       
  (33 rows)
  
***************
*** 277,286 ****
  	AND indrelid=pg_class_2.oid
  	AND pg_class_2.relname = 'clstr_tst'
  	AND indisclustered;
!    relname   
! -------------
!  clstr_tst_c
! (1 row)
  
  -- Try changing indisclustered
  ALTER TABLE clstr_tst CLUSTER ON clstr_tst_b_c;
--- 279,287 ----
  	AND indrelid=pg_class_2.oid
  	AND pg_class_2.relname = 'clstr_tst'
  	AND indisclustered;
!  relname 
! ---------
! (0 rows)
  
  -- Try changing indisclustered
  ALTER TABLE clstr_tst CLUSTER ON clstr_tst_b_c;
***************
*** 323,338 ****
  CLUSTER clstr_2;
  ERROR:  there is no previously clustered index for table "clstr_2"
  CLUSTER clstr_1_pkey ON clstr_1;
  CLUSTER clstr_2 USING clstr_2_pkey;
  SELECT * FROM clstr_1 UNION ALL
    SELECT * FROM clstr_2 UNION ALL
    SELECT * FROM clstr_3;
   a 
  ---
-  1
   2
   1
   2
   2
   1
  (6 rows)
--- 324,343 ----
  CLUSTER clstr_2;
  ERROR:  there is no previously clustered index for table "clstr_2"
  CLUSTER clstr_1_pkey ON clstr_1;
+ WARNING:  page verification failed, calculated checksum 48216 but expected 21420
+ ERROR:  invalid page in block 0 of relation base/16384/31111
  CLUSTER clstr_2 USING clstr_2_pkey;
+ WARNING:  page verification failed, calculated checksum 41924 but expected 46647
+ ERROR:  invalid page in block 0 of relation base/16384/31115
  SELECT * FROM clstr_1 UNION ALL
    SELECT * FROM clstr_2 UNION ALL
    SELECT * FROM clstr_3;
   a 
  ---
   2
   1
   2
+  1
   2
   1
  (6 rows)
***************
*** 356,363 ****
    SELECT * FROM clstr_3;
   a 
  ---
-  1
   2
   2
   1
   2
--- 361,368 ----
    SELECT * FROM clstr_3;
   a 
  ---
   2
+  1
   2
   1
   2
***************
*** 369,379 ****
  INSERT INTO clstr_1 VALUES (2);
  INSERT INTO clstr_1 VALUES (1);
  CLUSTER clstr_1;
  SELECT * FROM clstr_1;
   a 
  ---
-  1
   2
  (2 rows)
  
  -- Test MVCC-safety of cluster. There isn't much we can do to verify the
--- 374,385 ----
  INSERT INTO clstr_1 VALUES (2);
  INSERT INTO clstr_1 VALUES (1);
  CLUSTER clstr_1;
+ ERROR:  there is no previously clustered index for table "clstr_1"
  SELECT * FROM clstr_1;
   a 
  ---
   2
+  1
  (2 rows)
  
  -- Test MVCC-safety of cluster. There isn't much we can do to verify the
***************
*** 405,429 ****
  (5 rows)
  
  CLUSTER clustertest_pkey ON clustertest;
  SELECT * FROM clustertest;
!  key 
! -----
!   20
!   30
!   35
!   80
!  100
! (5 rows)
! 
  COMMIT;
  SELECT * FROM clustertest;
   key 
  -----
    20
    30
!   35
!   80
!  100
  (5 rows)
  
  -- check that temp tables can be clustered
--- 411,429 ----
  (5 rows)
  
  CLUSTER clustertest_pkey ON clustertest;
+ WARNING:  page verification failed, calculated checksum 23672 but expected 3264
+ ERROR:  invalid page in block 0 of relation base/16384/31124
  SELECT * FROM clustertest;
! ERROR:  current transaction is aborted, commands ignored until end of transaction block
  COMMIT;
  SELECT * FROM clustertest;
   key 
  -----
+   10
    20
    30
!   40
!   50
  (5 rows)
  
  -- check that temp tables can be clustered

======================================================================

#2Jeff Davis
pgsql@j-davis.com
In reply to: Fujii Masao (#1)
1 attachment(s)
Re: regression test failed when enabling checksum

On Tue, 2013-03-26 at 02:50 +0900, Fujii Masao wrote:

Hi,

I found that the regression test failed when I created the database
cluster with the checksum and set wal_level to archive. I think that
there are some bugs around checksum feature. Attached is the regression.diff.

Thank you for the report. This was a significant oversight, but simple
to diagnose and fix.

There were several places that were doing something like:

PageSetChecksumInplace
if (use_wal)
log_newpage
smgrextend

Which is obviously wrong, because log_newpage set the LSN of the page,
invalidating the checksum. We need to set the checksum after
log_newpage.

Also, I noticed that copy_relation_data was doing smgrread without
validating the checksum (or page header, for that matter), so I also
fixed that.

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

Regards,
Jeff Davis

Attachments:

checksums-fix.patchtext/x-patch; charset=ISO-8859-1; name=checksums-fix.patchDownload
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
***************
*** 273,286 **** end_heap_rewrite(RewriteState state)
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
- 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
- 
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
--- 273,287 ----
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
+ 
+ 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
+ 
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
***************
*** 616,623 **** raw_heap_insert(RewriteState state, HeapTuple tup)
  		{
  			/* Doesn't fit, so write out the existing page */
  
- 			PageSetChecksumInplace(page, state->rs_blockno);
- 
  			/* XLOG stuff */
  			if (state->rs_use_wal)
  				log_newpage(&state->rs_new_rel->rd_node,
--- 617,622 ----
***************
*** 632,637 **** raw_heap_insert(RewriteState state, HeapTuple tup)
--- 631,639 ----
  			 * end_heap_rewrite.
  			 */
  			RelationOpenSmgr(state->rs_new_rel);
+ 
+ 			PageSetChecksumInplace(page, state->rs_blockno);
+ 
  			smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
  					   state->rs_blockno, (char *) page, true);
  
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 51,56 ****
--- 51,57 ----
  #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "commands/typecmds.h"
+ #include "common/relpath.h"
  #include "executor/executor.h"
  #include "foreign/foreign.h"
  #include "miscadmin.h"
***************
*** 8902,8913 **** copy_relation_data(SMgrRelation src, SMgrRelation dst,
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		PageSetChecksumInplace(page, blkno);
  
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
  
  		/*
  		 * Now write the page.	We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this
--- 8903,8923 ----
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		if (!PageIsVerified(page, blkno))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATA_CORRUPTED),
! 					 errmsg("invalid page in block %u of relation %s",
! 							blkno,
! 							relpathbackend(src->smgr_rnode.node,
! 										   src->smgr_rnode.backend,
! 										   forkNum))));
  
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
  
+ 		PageSetChecksumInplace(page, blkno);
+ 
  		/*
  		 * Now write the page.	We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this
#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#1)
Re: regression test failed when enabling checksum

On 25 March 2013 17:50, Fujii Masao <masao.fujii@gmail.com> wrote:

I found that the regression test failed when I created the database
cluster with the checksum and set wal_level to archive. I think that
there are some bugs around checksum feature. Attached is the regression.diff.

Apologies for not responding to your original email, I must have missed that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Davis (#2)
Re: regression test failed when enabling checksum

On 26 March 2013 23:23, Jeff Davis <pgsql@j-davis.com> wrote:

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

Thanks, I'll look at that tomorrow.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Davis (#2)
Re: regression test failed when enabling checksum

On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2013-03-26 at 02:50 +0900, Fujii Masao wrote:

Hi,

I found that the regression test failed when I created the database
cluster with the checksum and set wal_level to archive. I think that
there are some bugs around checksum feature. Attached is the

regression.diff.

Thank you for the report. This was a significant oversight, but simple
to diagnose and fix.

There were several places that were doing something like:

PageSetChecksumInplace
if (use_wal)
log_newpage
smgrextend

Which is obviously wrong, because log_newpage set the LSN of the page,
invalidating the checksum. We need to set the checksum after
log_newpage.

Also, I noticed that copy_relation_data was doing smgrread without
validating the checksum (or page header, for that matter), so I also
fixed that.

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

After applying your patch, I could run the stress test described here:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

But altered to make use of initdb -k, of course.

Over 10,000 cycles of crash and recovery, I encountered two cases of
checksum failures after recovery, example:

14264 SELECT 2013-03-28 13:08:38.980 PDT:WARNING: page verification
failed, calculated checksum 7017 but expected 1098
14264 SELECT 2013-03-28 13:08:38.980 PDT:ERROR: invalid page in block 77
of relation base/16384/2088965
14264 SELECT 2013-03-28 13:08:38.980 PDT:STATEMENT: select sum(count) from
foo

In both cases, the bad block (77 in this case) is the same block that was
intentionally partially-written during the "crash". However, that block
should have been restored from the WAL FPW, so its fragmented nature should
not have been present in order to be detected. Any idea what is going on?

Unfortunately I already cleaned up the data directory before noticing the
problem, so I have nothing to post for forensic analysis. I'll try to
reproduce the problem.

Without the initdb -k option, I ran it for 30,000 cycles and found no
problems. I don't think this is because the problem exists but is going
undetected, because my test is designed to detect such problems--if the
block is fragmented but not overwritten by WAL FPW, that should
occasionally lead to detectable inconsistent tuples.

I don't think your patch caused this particular problem, but it merely
fixed a problem that was previously preventing me from running my test.

Cheers,

Jeff

#6Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Janes (#5)
Re: regression test failed when enabling checksum

On Mon, 2013-04-01 at 10:37 -0700, Jeff Janes wrote:

Over 10,000 cycles of crash and recovery, I encountered two cases of
checksum failures after recovery, example:

14264 SELECT 2013-03-28 13:08:38.980 PDT:WARNING: page verification
failed, calculated checksum 7017 but expected 1098
14264 SELECT 2013-03-28 13:08:38.980 PDT:ERROR: invalid page in block
77 of relation base/16384/2088965

14264 SELECT 2013-03-28 13:08:38.980 PDT:STATEMENT: select sum(count)
from foo

It would be nice to know whether that's an index or a heap page.

In both cases, the bad block (77 in this case) is the same block that
was intentionally partially-written during the "crash". However, that
block should have been restored from the WAL FPW, so its fragmented
nature should not have been present in order to be detected. Any idea
what is going on?

Not right now. My primary suspect is what's going on in
visibilitymap_set() and heap_xlog_visible(), which is more complex than
some of the other code. That would require some VACUUM activity, which
isn't in your workload -- do you think autovacuum may kick in sometimes?

Thank you for testing! I will try to reproduce it, as well.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#5)

On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis <pgsql@j-davis.com> wrote:

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

After applying your patch, I could run the stress test described here:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

But altered to make use of initdb -k, of course.

Over 10,000 cycles of crash and recovery, I encountered two cases of
checksum failures after recovery, example:
...

Unfortunately I already cleaned up the data directory before noticing the
problem, so I have nothing to post for forensic analysis. I'll try to
reproduce the problem.

I've reproduced the problem, this time in block 74 of relation
base/16384/4931589, and a tarball of the data directory is here:

https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing

(the table is in database jjanes under role jjanes, the binary is commit
9ad27c215362df436f8c)

What I would probably really want is the data as it existed after the crash
but before recovery started, but since the postmaster immediately starts
recovery after the crash, I don't know of a good way to capture this.

I guess one thing to do would be to extract from the WAL the most recent
FPW for block 74 of relation base/16384/4931589 (assuming it hasn't been
recycled already) and see if it matches what is actually in that block of
that data file, but I don't currently know how to do that.

11500 SELECT 2013-04-01 12:01:56.926 PDT:WARNING: page verification
failed, calculated checksum 54570 but expected 34212
11500 SELECT 2013-04-01 12:01:56.926 PDT:ERROR: invalid page in block 74
of relation base/16384/4931589
11500 SELECT 2013-04-01 12:01:56.926 PDT:STATEMENT: select sum(count) from
foo

Cheers,

Jeff

#8Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Davis (#6)
Re: regression test failed when enabling checksum

On Monday, April 1, 2013, Jeff Davis wrote:

On Mon, 2013-04-01 at 10:37 -0700, Jeff Janes wrote:

Over 10,000 cycles of crash and recovery, I encountered two cases of
checksum failures after recovery, example:

14264 SELECT 2013-03-28 13:08:38.980 PDT:WARNING: page verification
failed, calculated checksum 7017 but expected 1098
14264 SELECT 2013-03-28 13:08:38.980 PDT:ERROR: invalid page in block
77 of relation base/16384/2088965

14264 SELECT 2013-03-28 13:08:38.980 PDT:STATEMENT: select sum(count)
from foo

It would be nice to know whether that's an index or a heap page.

It is a heap page for the table jjanes.public.foo.

In both cases, the bad block (77 in this case) is the same block that
was intentionally partially-written during the "crash". However, that
block should have been restored from the WAL FPW, so its fragmented
nature should not have been present in order to be detected. Any idea
what is going on?

Not right now. My primary suspect is what's going on in
visibilitymap_set() and heap_xlog_visible(), which is more complex than
some of the other code. That would require some VACUUM activity, which
isn't in your workload -- do you think autovacuum may kick in sometimes?

Yes, a modification to my test harness that I failed to mention is that it
now sleeps for 2 minutes after every 100 rounds of crash/recovery,
specifically so that autovac has a chance to kick in and run to completion.
I made that change so as to avoid wrap-around shut-downs on long running
tests. However "foo" is truncated at the beginning of every test, so I
don't think this would be relevant to that table, as any poisoned fruits of
the autovac would be discarded with the truncation.

Cheers,

Jeff

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Jeff Davis (#6)
Re: regression test failed when enabling checksum

On 2 April 2013 02:53, Jeff Davis <pgsql@j-davis.com> wrote:

Any idea
what is going on?

Not right now.

Since I'm now responsible for the quality of this patch, I need to say
this before someone else does: we have until the end of the week to
fix this conclusively, or I will need to consider whether to revoke
the patch in this release. Thanks for your time and trouble to locate
problems.

It may be that the patch is revealing an underlying bug, so we don't
yet have evidence for the source of the bug.

My primary suspect is what's going on in
visibilitymap_set() and heap_xlog_visible(), which is more complex than
some of the other code. That would require some VACUUM activity, which
isn't in your workload -- do you think autovacuum may kick in sometimes?

Other candidates might be:

* page hole is non-empty, so the overwrite of the "Full" page write
may leave the block in an intermediate state. Tom recently fixed
RestoreBkpBlock() to avoid it zeroing the contents first before
applying the page.

* something to do with access to GetPageLSN()

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@2ndquadrant.com
In reply to: Jeff Janes (#7)
1 attachment(s)
Re: regression test failed when enabling checksum

On 2013-04-01 19:51:19 -0700, Jeff Janes wrote:

On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis <pgsql@j-davis.com> wrote:

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

After applying your patch, I could run the stress test described here:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

But altered to make use of initdb -k, of course.

Over 10,000 cycles of crash and recovery, I encountered two cases of
checksum failures after recovery, example:
...

Unfortunately I already cleaned up the data directory before noticing the
problem, so I have nothing to post for forensic analysis. I'll try to
reproduce the problem.

I've reproduced the problem, this time in block 74 of relation
base/16384/4931589, and a tarball of the data directory is here:

https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing

(the table is in database jjanes under role jjanes, the binary is commit
9ad27c215362df436f8c)

What I would probably really want is the data as it existed after the crash
but before recovery started, but since the postmaster immediately starts
recovery after the crash, I don't know of a good way to capture this.

I guess one thing to do would be to extract from the WAL the most recent
FPW for block 74 of relation base/16384/4931589 (assuming it hasn't been
recycled already) and see if it matches what is actually in that block of
that data file, but I don't currently know how to do that.

Since I bragged somewhere else recently that it should be easy to do now that
we have pg_xlogdump I hacked it up so it dumps all the full page writes into
the directory specified by --dump-bkp=PATH. It currently overwrites previous
full page writes to the same page but that should be trivial to change if you
want by adding %X.%X for the lsn into the path sprintf.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-pg_xlogdump-add-option-for-dumping-full-page-writes-.patchtext/x-patch; charset=us-asciiDownload
>From 09cdce611dc74082901ca1a646135a5ea1af709c Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 2 Apr 2013 11:43:23 +0200
Subject: [PATCH] pg_xlogdump: add option for dumping full page writes into a
 directory

---
 contrib/pg_xlogdump/pg_xlogdump.c |   56 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index d6d5498..a5e4186 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -40,6 +40,7 @@ typedef struct XLogDumpConfig
 	bool		bkp_details;
 	int			stop_after_records;
 	int			already_displayed_records;
+	char		*dump_bkp;
 
 	/* filter options */
 	int			filter_by_rmgr;
@@ -373,6 +374,46 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogRecPtr ReadRecPtr, XLogRecord
 				   bkpb.block, bkpb.hole_offset, bkpb.hole_length);
 		}
 	}
+
+	if (config->dump_bkp != NULL)
+	{
+		int			bkpnum;
+		char	   *blk = (char *) XLogRecGetData(record) + record->xl_len;
+
+		for (bkpnum = 0; bkpnum < XLR_MAX_BKP_BLOCKS; bkpnum++)
+		{
+			BkpBlock	bkpb;
+			char		bkpbdata[BLCKSZ];
+			int			outfd;
+			char		outpath[MAXPGPATH];
+
+			if (!(XLR_BKP_BLOCK(bkpnum) & record->xl_info))
+				continue;
+
+			memcpy(&bkpb, blk, sizeof(BkpBlock));
+			blk += sizeof(BkpBlock);
+
+			memcpy(bkpbdata, blk, bkpb.hole_offset);
+			memset(bkpbdata + bkpb.hole_offset, 0, bkpb.hole_length);
+			memcpy(bkpbdata + bkpb.hole_offset + bkpb.hole_length,
+			       blk + bkpb.hole_offset,
+			       BLCKSZ - bkpb.hole_offset - bkpb.hole_length);
+
+			blk += BLCKSZ - bkpb.hole_length;
+
+			sprintf(outpath, "%s/%u-%u-%u:%u_%s", config->dump_bkp,
+			        bkpb.node.spcNode, bkpb.node.dbNode, bkpb.node.relNode,
+			        bkpb.block, forkNames[bkpb.fork]);
+			outfd = open(outpath, O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR);
+			if (outfd < 0)
+				fatal_error("could not open output file %s: %s",
+				            outpath, strerror(errno));
+			if (write(outfd, bkpbdata, BLCKSZ) != BLCKSZ)
+				fatal_error("could not successfully write output block %s: %s",
+				            outpath, strerror(errno));
+			close(outfd);
+		}
+	}
 }
 
 static void
@@ -387,6 +428,7 @@ usage(void)
 	printf("  -?, --help             show this help, then exit\n");
 	printf("\nContent options:\n");
 	printf("  -b, --bkp-details      output detailed information about backup blocks\n");
+	printf("  -d, --dump-bkp=path    dump all full page images into PATH\n");
 	printf("  -e, --end=RECPTR       stop reading at log position RECPTR\n");
 	printf("  -n, --limit=N          number of records to display\n");
 	printf("  -p, --path=PATH        directory in which to find log segment files\n");
@@ -413,6 +455,7 @@ main(int argc, char **argv)
 
 	static struct option long_options[] = {
 		{"bkp-details", no_argument, NULL, 'b'},
+		{"dump-bkp", required_argument, NULL, 'd'},
 		{"end", required_argument, NULL, 'e'},
 		{"help", no_argument, NULL, '?'},
 		{"limit", required_argument, NULL, 'n'},
@@ -438,6 +481,7 @@ main(int argc, char **argv)
 	private.endptr = InvalidXLogRecPtr;
 
 	config.bkp_details = false;
+	config.dump_bkp = NULL;
 	config.stop_after_records = -1;
 	config.already_displayed_records = 0;
 	config.filter_by_rmgr = -1;
@@ -450,7 +494,7 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	while ((option = getopt_long(argc, argv, "be:?n:p:r:s:t:Vx:",
+	while ((option = getopt_long(argc, argv, "bd:e:?n:p:r:s:t:Vx:",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -458,6 +502,16 @@ main(int argc, char **argv)
 			case 'b':
 				config.bkp_details = true;
 				break;
+			case 'd':
+				config.dump_bkp = pg_strdup(optarg);
+				if (!verify_directory(config.dump_bkp))
+				{
+					fprintf(stderr,
+					        "%s: path \"%s\" cannot be opened: %s",
+					        progname, config.dump_bkp, strerror(errno));
+					goto bad_argument;
+				}
+				break;
 			case 'e':
 				if (sscanf(optarg, "%X/%X", &xlogid, &xrecoff) != 2)
 				{
-- 
1.7.10.4

#11Andres Freund
andres@2ndquadrant.com
In reply to: Jeff Janes (#7)
Re: regression test failed when enabling checksum

On 2013-04-01 19:51:19 -0700, Jeff Janes wrote:

On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis <pgsql@j-davis.com> wrote:

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

After applying your patch, I could run the stress test described here:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

But altered to make use of initdb -k, of course.

Over 10,000 cycles of crash and recovery, I encountered two cases of
checksum failures after recovery, example:
...

Unfortunately I already cleaned up the data directory before noticing the
problem, so I have nothing to post for forensic analysis. I'll try to
reproduce the problem.

I've reproduced the problem, this time in block 74 of relation
base/16384/4931589, and a tarball of the data directory is here:

https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing

(the table is in database jjanes under role jjanes, the binary is commit
9ad27c215362df436f8c)

What I would probably really want is the data as it existed after the crash
but before recovery started, but since the postmaster immediately starts
recovery after the crash, I don't know of a good way to capture this.

I guess one thing to do would be to extract from the WAL the most recent
FPW for block 74 of relation base/16384/4931589 (assuming it hasn't been
recycled already) and see if it matches what is actually in that block of
that data file, but I don't currently know how to do that.

11500 SELECT 2013-04-01 12:01:56.926 PDT:WARNING: page verification
failed, calculated checksum 54570 but expected 34212
11500 SELECT 2013-04-01 12:01:56.926 PDT:ERROR: invalid page in block 74
of relation base/16384/4931589
11500 SELECT 2013-04-01 12:01:56.926 PDT:STATEMENT: select sum(count) from
foo

I just checked and unfortunately your dump doesn't contain all that much
valid WAL:
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 7/AB000028, prev 7/AA000090, bkp: 0000, desc: checkpoint: redo 7/AB000028; tli 1; prev tli 1; fpw true; xid 0/156747297; oid 4939781; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 7/AB000090, prev 7/AB000028, bkp: 0000, desc: checkpoint: redo 7/AB000090; tli 1; prev tli 1; fpw true; xid 0/156747297; oid 4939781; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
pg_xlogdump: FATAL: error in WAL record at 7/AB000090: record with zero length at 7/AB0000F8

So just two checkpoint records.

Unfortunately I fear that won't be enough to diagnose the problem,
could you reproduce it with a higher wal_keep_segments?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jeff Janes
jeff.janes@gmail.com
In reply to: Andres Freund (#11)
Re: regression test failed when enabling checksum

On Wed, Apr 3, 2013 at 2:31 AM, Andres Freund <andres@2ndquadrant.com>wrote:

I just checked and unfortunately your dump doesn't contain all that much

valid WAL:
...

So just two checkpoint records.

Unfortunately I fear that won't be enough to diagnose the problem,
could you reproduce it with a higher wal_keep_segments?

I've been trying, but see message "commit dfda6ebaec67 versus
wal_keep_segments".

Looking at some of the log files more, I see that vacuum is involved, but
in some way I don't understand. The crash always happens on a test
cycle immediately after the sleep that allows the autovac to kick in and
finish. So the events goes something like this:

...
run the frantic updating of "foo" until crash
recovery
query "foo" and verify the results are consistent with expectations
sleep to allow autovac to do its job.
truncate "foo" and repopulate it.
run the frantic updating of "foo" until crash
recovery
attempt to query "foo" but get the checksum failure.

What the vacuum is doing that corrupts the system in a way that survives
the truncate is a mystery to me.

Also, at one point I had the harness itself exit as soon as it detected the
problem, but I failed to have it shut down the server. So the server keep
running idle and having autovac do its thing, which produced some
interesting log output:

WARNING: relation "foo" page 45 is uninitialized --- fixing
WARNING: relation "foo" page 46 is uninitialized --- fixing
...
WARNING: relation "foo" page 72 is uninitialized --- fixing
WARNING: relation "foo" page 73 is uninitialized --- fixing
WARNING: page verification failed, calculated checksum 54570 but expected
34212
ERROR: invalid page in block 74 of relation base/16384/4931589

This happened 3 times. Every time, the warnings started on page 45, and
they continued up until the invalid page was found (which varied, being 74,
86, and 74 again)

I wonder if the bug is in checksums, or if the checksums are doing their
job by finding some other bug. And why did those uninitialized pages
trigger warnings when they were autovacced, but not when they were seq
scanned in a query?

Cheers,

Jeff

#13Andres Freund
andres@2ndquadrant.com
In reply to: Jeff Janes (#12)
Re: regression test failed when enabling checksum

On 2013-04-03 09:48:54 -0700, Jeff Janes wrote:

On Wed, Apr 3, 2013 at 2:31 AM, Andres Freund <andres@2ndquadrant.com>wrote:

I just checked and unfortunately your dump doesn't contain all that much

valid WAL:
...

So just two checkpoint records.

Unfortunately I fear that won't be enough to diagnose the problem,
could you reproduce it with a higher wal_keep_segments?

I've been trying, but see message "commit dfda6ebaec67 versus
wal_keep_segments".

Setting up an archive_command could help if its affordable from the
space perspective :/

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Janes (#12)
Re: regression test failed when enabling checksum

On Wed, 2013-04-03 at 09:48 -0700, Jeff Janes wrote:

And why did those uninitialized pages trigger warnings when they were
autovacced, but not when they were seq scanned in a query?

A scan won't trigger that warning. Empty pages are sometimes the
expected result of a crash when the file is extended but the page is not
written yet. So empty pages aren't necessarily an indication of
corruption (though I'd like to fix that eventually, because sometimes
zeroing is corruption).

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Janes (#7)
Re: regression test failed when enabling checksum

On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:

I've reproduced the problem, this time in block 74 of relation
base/16384/4931589, and a tarball of the data directory is here:

https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing

(the table is in database jjanes under role jjanes, the binary is
commit 9ad27c215362df436f8c)

What I would probably really want is the data as it existed after the
crash but before recovery started, but since the postmaster
immediately starts recovery after the crash, I don't know of a good
way to capture this.

Can you just turn off the restart_after_crash GUC? I had a chance to
look at this, and seeing the block before and after recovery would be
nice. I didn't see a log file in the data directory, but it didn't go
through recovery, so I assume it already did that.

The block is corrupt as far as I can tell. The first third is written,
and the remainder is all zeros. The header looks like this:

(These numbers are mostly from pageinspect. The checksum value that
comes from pageinspect needed to be cast back to an unsigned short to
match the error message above -- should that be changed in
pageinspect?).

lsn: 7/252E4080
checksum: 34212
flags: 1
lower: 1188
upper: 5952
special: 8192
pagesize: 8192
version: 4
prune_xid: 156833911

So the header looks good, but most of the page data is missing. I tried
with pg_filedump (the 9.2.0 version, which should be fine), and it
agrees with me that the page is corrupt.

Interestingly, that doesn't result in a user-visible error when
ignore_checksum_failure=on. That's because the item pointers appear to
all be either not normal or they point to the zeroed region. Testing the
visibility of a zeroed tuple header is always false, so no problem.

You'd still think this would cause incorrect results, but I think what's
happening is that this is a new page (otherwise it would have been
written with something other than zeroes before). So, the tuples that
are supposed to be there may be uncommitted anyway.

So, the page may be corrupt without checksums as well, but it just
happens to be hidden for the same reason. Can you try to reproduce it
without -k? And on the checkin right before checksums were added?
Without checksums, you'll need to use pg_filedump (or similar) to find
whether an error has happened.

To start speculating about the root cause: something is violating the
WAL before data rule, or not writing a FPI when it's supposed to, or not
properly restoring the FPI during recovery, or something sets the wrong
LSN. This could still be caused by the checksums patch, but it seems a
little less likely now. The reason I say that is because it's a new page
with tuples on it, so that means something in the insert/update path
ended up not writing the FPI before writing the page.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers