DDL result is lost by CREATE DATABASE with WAL_LOG strategy

Started by Ryo Matsumura (Fujitsu)almost 3 years ago21 messages

matsumura.ryo@fujitsu.com

almost 3 years ago

Hi, hackers.

I found that CREATE DATABASE occurs lost of DDL result after crash server.
The direct cause is that the checkpoint skips sync for page of template1's main fork
because buffer status is not marked as BM_PERMANENT in BufferAlloc().

Have you any knowledge about it?

Reproduction:
1) Do initdb.
2) Start server.
3) Connect to 'postgres' database.
4) Execute CREATE DATABASE statement with WAL_LOG strategy.
5) Connect to 'template1' database.
6) Update pg_class by executing some DDL.
7) Do checkpoint.
8) Crash server.

[src/backend/storage/buffer/bufmgr.c]
1437 if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
1438 buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
1439 else
!!! walk this route !!!
1440 buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;

The above is occured by the following call.
The argument 'permanent' of ReadBufferWithoutRelcache() is passed to
BufferAlloc() as 'relpersistence'.

[src/backend/commands/]
298 buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno,
299 RBM_NORMAL, bstrategy, false);

Regards
Ryo Matsumura

Justin Pryzby

pryzby@telsasoft.com

almost 3 years ago

In reply to: Ryo Matsumura (Fujitsu) (#1)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Wed, Feb 15, 2023 at 04:49:38AM +0000, Ryo Matsumura (Fujitsu) wrote:

Hi, hackers.

I found that CREATE DATABASE occurs lost of DDL result after crash server.
The direct cause is that the checkpoint skips sync for page of template1's main fork
because buffer status is not marked as BM_PERMANENT in BufferAlloc().

I had some trouble reproducing this when running the commands by hand.

But it reproduces fine like this:

$ ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data& sleep 2; psql -h /tmp postgres -c "DROP DATABASE IF EXISTS j" -c "CREATE DATABASE j STRATEGY wal_log" && psql -h /tmp template1 -c "CREATE TABLE t(i int)" -c "INSERT INTO t SELECT generate_series(1,9)" -c CHECKPOINT; kill -9 %1; wait; ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data& sleep 9; psql -h /tmp template1 -c "table t"; kill %1
[1]: 29088 2023-02-15 10:10:31.664 CST postmaster[29088] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit 2023-02-15 10:10:31.665 CST postmaster[29088] LOG: listening on IPv4 address "127.0.0.1", port 5432 2023-02-15 10:10:31.724 CST postmaster[29088] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2023-02-15 10:10:31.780 CST startup[29094] LOG: database system was interrupted; last known up at 2023-02-15 10:10:30 CST 2023-02-15 10:10:33.888 CST startup[29094] LOG: database system was not properly shut down; automatic recovery in progress 2023-02-15 10:10:33.934 CST startup[29094] LOG: redo starts at 0/BA9E868 2023-02-15 10:10:33.934 CST startup[29094] LOG: invalid record length at 0/BA9E918: wanted 24, got 0 2023-02-15 10:10:33.934 CST startup[29094] LOG: redo done at 0/BA9E8A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2023-02-15 10:10:34.073 CST checkpointer[29092] LOG: checkpoint starting: end-of-recovery immediate wait 2023-02-15 10:10:34.275 CST checkpointer[29092] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.035 s, sync=0.026 s, total=0.257 s; sync files=2, longest=0.019 s, average=0.013 s; distance=0 kB, estimate=0 kB; lsn=0/BA9E918, redo lsn=0/BA9E918 2023-02-15 10:10:34.321 CST postmaster[29088] LOG: database system is ready to accept connections 2023-02-15 10:10:39.893 CST client backend[29110] psql ERROR: relation "t" does not exist at character 7 2023-02-15 10:10:39.893 CST client backend[29110] psql STATEMENT: table t ERROR: relation "t" does not exist
2023-02-15 10:10:27.584 CST postmaster[29069] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit
2023-02-15 10:10:27.584 CST postmaster[29069] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-02-15 10:10:27.663 CST postmaster[29069] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-02-15 10:10:27.728 CST startup[29074] LOG: database system was shut down at 2023-02-15 10:10:13 CST
2023-02-15 10:10:27.780 CST postmaster[29069] LOG: database system is ready to accept connections
NOTICE: database "j" does not exist, skipping
DROP DATABASE
CREATE DATABASE
CREATE TABLE
INSERT 0 9
2023-02-15 10:10:30.160 CST checkpointer[29072] LOG: checkpoint starting: immediate force wait
2023-02-15 10:10:30.740 CST checkpointer[29072] LOG: checkpoint complete: wrote 943 buffers (5.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.070 s, sync=0.369 s, total=0.581 s; sync files=268, longest=0.274 s, average=0.002 s; distance=4322 kB, estimate=4322 kB; lsn=0/BA9E8A0, redo lsn=0/BA9E868
CHECKPOINT
[1]: 29088 2023-02-15 10:10:31.664 CST postmaster[29088] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit 2023-02-15 10:10:31.665 CST postmaster[29088] LOG: listening on IPv4 address "127.0.0.1", port 5432 2023-02-15 10:10:31.724 CST postmaster[29088] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2023-02-15 10:10:31.780 CST startup[29094] LOG: database system was interrupted; last known up at 2023-02-15 10:10:30 CST 2023-02-15 10:10:33.888 CST startup[29094] LOG: database system was not properly shut down; automatic recovery in progress 2023-02-15 10:10:33.934 CST startup[29094] LOG: redo starts at 0/BA9E868 2023-02-15 10:10:33.934 CST startup[29094] LOG: invalid record length at 0/BA9E918: wanted 24, got 0 2023-02-15 10:10:33.934 CST startup[29094] LOG: redo done at 0/BA9E8A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2023-02-15 10:10:34.073 CST checkpointer[29092] LOG: checkpoint starting: end-of-recovery immediate wait 2023-02-15 10:10:34.275 CST checkpointer[29092] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.035 s, sync=0.026 s, total=0.257 s; sync files=2, longest=0.019 s, average=0.013 s; distance=0 kB, estimate=0 kB; lsn=0/BA9E918, redo lsn=0/BA9E918 2023-02-15 10:10:34.321 CST postmaster[29088] LOG: database system is ready to accept connections 2023-02-15 10:10:39.893 CST client backend[29110] psql ERROR: relation "t" does not exist at character 7 2023-02-15 10:10:39.893 CST client backend[29110] psql STATEMENT: table t ERROR: relation "t" does not exist
[1]: 29088 2023-02-15 10:10:31.664 CST postmaster[29088] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit 2023-02-15 10:10:31.665 CST postmaster[29088] LOG: listening on IPv4 address "127.0.0.1", port 5432 2023-02-15 10:10:31.724 CST postmaster[29088] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2023-02-15 10:10:31.780 CST startup[29094] LOG: database system was interrupted; last known up at 2023-02-15 10:10:30 CST 2023-02-15 10:10:33.888 CST startup[29094] LOG: database system was not properly shut down; automatic recovery in progress 2023-02-15 10:10:33.934 CST startup[29094] LOG: redo starts at 0/BA9E868 2023-02-15 10:10:33.934 CST startup[29094] LOG: invalid record length at 0/BA9E918: wanted 24, got 0 2023-02-15 10:10:33.934 CST startup[29094] LOG: redo done at 0/BA9E8A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2023-02-15 10:10:34.073 CST checkpointer[29092] LOG: checkpoint starting: end-of-recovery immediate wait 2023-02-15 10:10:34.275 CST checkpointer[29092] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.035 s, sync=0.026 s, total=0.257 s; sync files=2, longest=0.019 s, average=0.013 s; distance=0 kB, estimate=0 kB; lsn=0/BA9E918, redo lsn=0/BA9E918 2023-02-15 10:10:34.321 CST postmaster[29088] LOG: database system is ready to accept connections 2023-02-15 10:10:39.893 CST client backend[29110] psql ERROR: relation "t" does not exist at character 7 2023-02-15 10:10:39.893 CST client backend[29110] psql STATEMENT: table t ERROR: relation "t" does not exist
2023-02-15 10:10:31.664 CST postmaster[29088] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit
2023-02-15 10:10:31.665 CST postmaster[29088] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-02-15 10:10:31.724 CST postmaster[29088] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-02-15 10:10:31.780 CST startup[29094] LOG: database system was interrupted; last known up at 2023-02-15 10:10:30 CST
2023-02-15 10:10:33.888 CST startup[29094] LOG: database system was not properly shut down; automatic recovery in progress
2023-02-15 10:10:33.934 CST startup[29094] LOG: redo starts at 0/BA9E868
2023-02-15 10:10:33.934 CST startup[29094] LOG: invalid record length at 0/BA9E918: wanted 24, got 0
2023-02-15 10:10:33.934 CST startup[29094] LOG: redo done at 0/BA9E8A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-02-15 10:10:34.073 CST checkpointer[29092] LOG: checkpoint starting: end-of-recovery immediate wait
2023-02-15 10:10:34.275 CST checkpointer[29092] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.035 s, sync=0.026 s, total=0.257 s; sync files=2, longest=0.019 s, average=0.013 s; distance=0 kB, estimate=0 kB; lsn=0/BA9E918, redo lsn=0/BA9E918
2023-02-15 10:10:34.321 CST postmaster[29088] LOG: database system is ready to accept connections
2023-02-15 10:10:39.893 CST client backend[29110] psql ERROR: relation "t" does not exist at character 7
2023-02-15 10:10:39.893 CST client backend[29110] psql STATEMENT: table t
ERROR: relation "t" does not exist

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Ryo Matsumura (Fujitsu) (#1)

1 attachment(s)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Wed, Feb 15, 2023 at 04:49:38AM +0000, Ryo Matsumura (Fujitsu) wrote:

The above is occured by the following call.
The argument 'permanent' of ReadBufferWithoutRelcache() is passed to
BufferAlloc() as 'relpersistence'.

[src/backend/commands/]
298 buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno,
299 RBM_NORMAL, bstrategy, false);

Indeed, setting that to true (as per the attached patch) seems to fix this.
I don't see any reason this _shouldn't_ be true from what I've read so far.
We're reading pg_class, which will probably never be UNLOGGED.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

create_db_fix.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef05633bb0..a0259cc593 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 		CHECK_FOR_INTERRUPTS();
 
 		buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno,
-										RBM_NORMAL, bstrategy, false);
+										RBM_NORMAL, bstrategy, true);
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buf);

Dilip Kumar

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Nathan Bossart (#3)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 5:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Wed, Feb 15, 2023 at 04:49:38AM +0000, Ryo Matsumura (Fujitsu) wrote:

The above is occured by the following call.
The argument 'permanent' of ReadBufferWithoutRelcache() is passed to
BufferAlloc() as 'relpersistence'.

[src/backend/commands/]
298 buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno,
299 RBM_NORMAL, bstrategy, false);

Indeed, setting that to true (as per the attached patch) seems to fix this.
I don't see any reason this _shouldn't_ be true from what I've read so far.
We're reading pg_class, which will probably never be UNLOGGED.

Yes, there is no reason to pass this as false, seems like this is
passed false by mistake. And your patch fixes the issue.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Dilip Kumar (#4)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 10:24:13AM +0530, Dilip Kumar wrote:

Yes, there is no reason to pass this as false, seems like this is
passed false by mistake. And your patch fixes the issue.

So, if I am understanding this stuff right, this issue can create data
corruption once a DDL updates any pages of pg_class stored in a
template database that gets copied by this routine. In this case, the
patch sent makes sure that any page copied will get written once a
checkpoint kicks in. Shouldn't we have at least a regression test for
such a scenario? The issue can happen when updating a template
database after creating a database from it, which is less worrying
than the initial impression I got, still I'd like to think that we
should have some coverage as of the special logic this code path
relies on for pg_class when reading its buffers.

I have not given much attention to this area, but I am a bit
suspicious that enforcing the default as WAL_LOG was a good idea for
15~, TBH. We are usually much more conservative when it comes to
such choices, switching to the new behavior after a few years would
have been wiser..
--
Michael

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Michael Paquier (#5)

1 attachment(s)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 02:26:55PM +0900, Michael Paquier wrote:

So, if I am understanding this stuff right, this issue can create data
corruption once a DDL updates any pages of pg_class stored in a
template database that gets copied by this routine. In this case, the
patch sent makes sure that any page copied will get written once a
checkpoint kicks in. Shouldn't we have at least a regression test for
such a scenario? The issue can happen when updating a template
database after creating a database from it, which is less worrying
than the initial impression I got, still I'd like to think that we
should have some coverage as of the special logic this code path
relies on for pg_class when reading its buffers.

I was able to quickly hack together a TAP test that seems to reliably fail
before the fix and pass afterwards. There's probably a better place for
it, though...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

create_db_fix_v2.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef05633bb0..a0259cc593 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 		CHECK_FOR_INTERRUPTS();
 
 		buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno,
-										RBM_NORMAL, bstrategy, false);
+										RBM_NORMAL, bstrategy, true);
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buf);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 209118a639..6e9f8a7c7f 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -39,6 +39,7 @@ tests += {
       't/031_recovery_conflict.pl',
       't/032_relfilenode_reuse.pl',
       't/033_replay_tsp_drops.pl',
+      't/100_bugs.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/100_bugs.pl b/src/test/recovery/t/100_bugs.pl
new file mode 100644
index 0000000000..6429a352e9
--- /dev/null
+++ b/src/test/recovery/t/100_bugs.pl
@@ -0,0 +1,24 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for various bugs found over time
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# DDL on template is persisted after creating database using WAL_LOG strategy
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->start;
+$node->safe_psql("postgres", "CREATE DATABASE test STRATEGY WAL_LOG;");
+$node->safe_psql("template1", "CREATE TABLE test (a INT);");
+$node->safe_psql("postgres", "CHECKPOINT;");
+$node->stop('immediate');
+$node->start;
+my $result = $node->safe_psql("template1", "SELECT count(*) FROM test;");
+is($result, "0", "check table still exists");
+$node->stop;
+
+done_testing();

Dilip Kumar

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Nathan Bossart (#6)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 12:11 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Thu, Feb 16, 2023 at 02:26:55PM +0900, Michael Paquier wrote:

So, if I am understanding this stuff right, this issue can create data
corruption once a DDL updates any pages of pg_class stored in a
template database that gets copied by this routine. In this case, the
patch sent makes sure that any page copied will get written once a
checkpoint kicks in. Shouldn't we have at least a regression test for
such a scenario? The issue can happen when updating a template
database after creating a database from it, which is less worrying
than the initial impression I got, still I'd like to think that we
should have some coverage as of the special logic this code path
relies on for pg_class when reading its buffers.

I was able to quickly hack together a TAP test that seems to reliably fail
before the fix and pass afterwards. There's probably a better place for
it, though...

I think the below change is not relevant to this bug right?

diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 209118a639..6e9f8a7c7f 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -39,6 +39,7 @@ tests += {
       't/031_recovery_conflict.pl',
       't/032_relfilenode_reuse.pl',
       't/033_replay_tsp_drops.pl',
+      't/100_bugs.pl',
     ],
   },
 }

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Robert Haas

robertmhaas@gmail.com

almost 3 years ago

In reply to: Dilip Kumar (#7)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I think the below change is not relevant to this bug right?

diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 209118a639..6e9f8a7c7f 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -39,6 +39,7 @@ tests += {
't/031_recovery_conflict.pl',
't/032_relfilenode_reuse.pl',
't/033_replay_tsp_drops.pl',
+      't/100_bugs.pl',
],
},
}

Why not? The patch creates 100_bugs.pl so it also adds it to meson.build.

--
Robert Haas
EDB: http://www.enterprisedb.com

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Robert Haas (#8)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 12:37:57PM +0530, Robert Haas wrote:

Why not? The patch creates 100_bugs.pl so it also adds it to meson.build.

It would not matter for REL_15_STABLE, but on HEAD all the new TAP
test files have to be added in their respective meson.build. If you
don't do that, the CFBot would not test it, neither would a local
build with meson.

Adding a test in src/test/recovery/ is OK by me. This is a recovery
case, and that's a.. Bug. Another possible name would be something
like 0XX_create_database.pl, now that's me being picky.
--
Michael

#10

Dilip Kumar

dilipbalaut@gmail.com

almost 3 years ago

In reply to: Robert Haas (#8)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 12:38 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 16, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I think the below change is not relevant to this bug right?
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 209118a639..6e9f8a7c7f 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -39,6 +39,7 @@ tests += {
't/031_recovery_conflict.pl',
't/032_relfilenode_reuse.pl',
't/033_replay_tsp_drops.pl',
+      't/100_bugs.pl',
],
},
}
Why not? The patch creates 100_bugs.pl so it also adds it to meson.build.

Yeah my bad, I somehow assumed this was an existing file.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#11

Andres Freund

andres@anarazel.de

almost 3 years ago

In reply to: Robert Haas (#8)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On 2023-02-16 12:37:57 +0530, Robert Haas wrote:

The patch creates 100_bugs.pl

What's the story behind 100_bugs.pl? This name clearly is copied from
src/test/subscription/t/100_bugs.pl - but I've never understood why that is
outside of the normal numbering space.

#12

Amit Kapila

amit.kapila16@gmail.com

almost 3 years ago

In reply to: Andres Freund (#11)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Fri, Feb 17, 2023 at 2:59 AM Andres Freund <andres@anarazel.de> wrote:

On 2023-02-16 12:37:57 +0530, Robert Haas wrote:

The patch creates 100_bugs.pl

What's the story behind 100_bugs.pl? This name clearly is copied from
src/test/subscription/t/100_bugs.pl - but I've never understood why that is
outside of the normal numbering space.

Yeah, I have also previously wondered about this name for
src/test/subscription/t/100_bugs.pl. My guess is that it has been kept
to distinguish it from the other feature tests which have numbering
starting from 001.

--
With Regards,
Amit Kapila.

#13

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Andres Freund (#11)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On 16.02.23 22:29, Andres Freund wrote:

What's the story behind 100_bugs.pl? This name clearly is copied from
src/test/subscription/t/100_bugs.pl - but I've never understood why that is
outside of the normal numbering space.

Mainly to avoid awkwardness for backpatching. The number of tests in
src/test/subscription/ varies quite a bit across branches.

#14

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Peter Eisentraut (#13)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Fri, Feb 17, 2023 at 03:13:32PM +0100, Peter Eisentraut wrote:

On 16.02.23 22:29, Andres Freund wrote:

What's the story behind 100_bugs.pl? This name clearly is copied from
src/test/subscription/t/100_bugs.pl - but I've never understood why that is
outside of the normal numbering space.

Mainly to avoid awkwardness for backpatching. The number of tests in
src/test/subscription/ varies quite a bit across branches.

I'm happy to move this new test to wherever folks think it should go. I'll
look around to see if I can find a better place, too.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#15

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Nathan Bossart (#14)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Fri, Feb 17, 2023 at 02:35:30PM -0800, Nathan Bossart wrote:

I'm happy to move this new test to wherever folks think it should go. I'll
look around to see if I can find a better place, too.

I think that src/test/recovery/ is the best fit, because this stresses
a code path for WAL replay on pg_class for the template db. The name
is not specific enough, though, why not just using something like
0NN_create_database.pl?
--
Michael

#16

Ryo Matsumura (Fujitsu)

matsumura.ryo@fujitsu.com

almost 3 years ago

In reply to: Michael Paquier (#5)

RE: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Thu, Feb 16, 2023 at 10:24:13AM +0530, Dilip Kumar wrote:

Yes, there is no reason to pass this as false, seems like this is
passed false by mistake. And your patch fixes the issue.

On Thu, Feb 16, 2023 at 02:26:55PM +0900, Michael Paquier wrote:

So, if I am understanding this stuff right, this issue can create data
corruption once a DDL updates any pages of pg_class stored in a
template database that gets copied by this routine. In this case, the
patch sent makes sure that any page copied will get written once a
checkpoint kicks in.

Thank you for comment and patch.
I think that the patch for dbcommand.c is fixed.
So I apply to my environment.

I have not given much attention to this area, but I am a bit
suspicious that enforcing the default as WAL_LOG was a good idea for
15~, TBH. We are usually much more conservative when it comes to
such choices, switching to the new behavior after a few years would
have been wiser..

I think so too. I was surprised that new strategy is default.

Regards
Ryo Matsumura

#17

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Michael Paquier (#15)

1 attachment(s)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Mon, Feb 20, 2023 at 05:02:01PM +0900, Michael Paquier wrote:

On Fri, Feb 17, 2023 at 02:35:30PM -0800, Nathan Bossart wrote:

I'm happy to move this new test to wherever folks think it should go. I'll
look around to see if I can find a better place, too.

I think that src/test/recovery/ is the best fit, because this stresses
a code path for WAL replay on pg_class for the template db. The name
is not specific enough, though, why not just using something like
0NN_create_database.pl?

Okay. I've renamed the test file as suggested in v3.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

create_db_fix_v3.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef05633bb0..a0259cc593 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 		CHECK_FOR_INTERRUPTS();
 
 		buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno,
-										RBM_NORMAL, bstrategy, false);
+										RBM_NORMAL, bstrategy, true);
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buf);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 209118a639..59465b97f3 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -39,6 +39,7 @@ tests += {
       't/031_recovery_conflict.pl',
       't/032_relfilenode_reuse.pl',
       't/033_replay_tsp_drops.pl',
+      't/034_create_database.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/034_create_database.pl b/src/test/recovery/t/034_create_database.pl
new file mode 100644
index 0000000000..872969ad85
--- /dev/null
+++ b/src/test/recovery/t/034_create_database.pl
@@ -0,0 +1,23 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# DDL on template is persisted after creating database using WAL_LOG strategy
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->start;
+$node->safe_psql("postgres", "CREATE DATABASE test STRATEGY WAL_LOG;");
+$node->safe_psql("template1", "CREATE TABLE test (a INT);");
+$node->safe_psql("postgres", "CHECKPOINT;");
+$node->stop('immediate');
+$node->start;
+my $result = $node->safe_psql("template1", "SELECT count(*) FROM test;");
+is($result, "0", "check table still exists");
+$node->stop;
+
+done_testing();

#18

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Nathan Bossart (#17)

1 attachment(s)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Mon, Feb 20, 2023 at 04:43:22PM -0800, Nathan Bossart wrote:

On Mon, Feb 20, 2023 at 05:02:01PM +0900, Michael Paquier wrote:

On Fri, Feb 17, 2023 at 02:35:30PM -0800, Nathan Bossart wrote:

I'm happy to move this new test to wherever folks think it should go. I'll
look around to see if I can find a better place, too.

I think that src/test/recovery/ is the best fit, because this stresses
a code path for WAL replay on pg_class for the template db. The name
is not specific enough, though, why not just using something like
0NN_create_database.pl?

Okay. I've renamed the test file as suggested in v3.

The test enforces a checkpoint after the table creation on the
template, so what about testing it also without a checkpoint, like the
extended version attached? I have tweaked a few things in the test,
while on it.
--
Michael

Attachments:

create_db_fix_v4.patchtext/x-diff; charset=us-asciiDownload

From 4404a480bac81db7df3c1f7c9e49bbeab53f8a0b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 21 Feb 2023 15:06:42 +0900
Subject: [PATCH] Fix persistence of database template's pg_class after a crash

---
 src/backend/commands/dbcommands.c          |  2 +-
 src/test/recovery/meson.build              |  1 +
 src/test/recovery/t/034_create_database.pl | 64 ++++++++++++++++++++++
 3 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 src/test/recovery/t/034_create_database.pl

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef05633bb0..a0259cc593 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 		CHECK_FOR_INTERRUPTS();
 
 		buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno,
-										RBM_NORMAL, bstrategy, false);
+										RBM_NORMAL, bstrategy, true);
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buf);
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 209118a639..59465b97f3 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -39,6 +39,7 @@ tests += {
       't/031_recovery_conflict.pl',
       't/032_relfilenode_reuse.pl',
       't/033_replay_tsp_drops.pl',
+      't/034_create_database.pl',
     ],
   },
 }
diff --git a/src/test/recovery/t/034_create_database.pl b/src/test/recovery/t/034_create_database.pl
new file mode 100644
index 0000000000..38062662df
--- /dev/null
+++ b/src/test/recovery/t/034_create_database.pl
@@ -0,0 +1,64 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test WAL replay for CREATE DATABASE .. STRATEGY WAL_LOG.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('node');
+$node->init;
+$node->start;
+
+# This checks that any DDLs run on the template database that modify pg_class
+# are persisted after creating a database from it using WAL_LOG strategy,
+# as a direct copy of the template database's pg_class is used in this case.
+my $db_template = "template1";
+
+# Create table.  It should persist on the template database.
+$node->safe_psql("postgres",
+	"CREATE DATABASE test_db_1 STRATEGY WAL_LOG TEMPLATE $db_template;");
+
+$node->safe_psql($db_template, "CREATE TABLE tab_db_after_create_1 (a INT);");
+$node->safe_psql("postgres",   "CHECKPOINT;");
+
+$node->stop('immediate');
+$node->start;
+my $result = $node->safe_psql($db_template,
+	"SELECT count(*) FROM pg_class WHERE relname LIKE 'tab_db_%';");
+is($result, "1",
+	"check that table exists on template after crash, with checkpoint");
+
+# The new database should have no tables.
+$result = $node->safe_psql("test_db_1",
+	"SELECT count(*) FROM pg_class WHERE relname LIKE 'tab_db_%';");
+is($result, "0",
+	"check that there are no tables from template on new database after crash"
+);
+
+# Again, but without a checkpoint.
+$node->safe_psql("postgres",
+	"CREATE DATABASE test_db_2 STRATEGY WAL_LOG TEMPLATE $db_template;");
+$node->safe_psql($db_template, "CREATE TABLE tab_db_after_create_2 (a INT);");
+
+$node->stop('immediate');
+$node->start;
+
+$result = $node->safe_psql($db_template,
+	"SELECT count(*) FROM pg_class WHERE relname LIKE 'tab_db_%';");
+is($result, "2",
+	"check that all tables exist on template after crash, without checkpoint"
+);
+
+# The table from the first case should also exist on the second, new,
+# database.
+$result = $node->safe_psql("test_db_2",
+	"SELECT count(*) FROM pg_class WHERE relname LIKE 'tab_db_%';");
+is($result, "1",
+	"check that the table from template exists on new database after crash, without checkpoint"
+);
+
+done_testing();
-- 
2.39.2

#19

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Michael Paquier (#18)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Tue, Feb 21, 2023 at 03:13:10PM +0900, Michael Paquier wrote:

The test enforces a checkpoint after the table creation on the
template, so what about testing it also without a checkpoint, like the
extended version attached? I have tweaked a few things in the test,
while on it.

What is the purpose of testing it without the checkpoint? Other than that
question, the patch looks reasonable to me.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#20

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Nathan Bossart (#19)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Tue, Feb 21, 2023 at 10:00:11AM -0800, Nathan Bossart wrote:

What is the purpose of testing it without the checkpoint?

Perhaps none, I was wondering whether it would be worth testing that
with the flush phase, but perhaps that's just extra cycles wasted at
this point.

Other than that
question, the patch looks reasonable to me.

Okay, applied and backpatched with a minimal test set, then. I have
kept the tweaks I did to the tests with extra comments.
--
Michael

#21

Nathan Bossart

nathandbossart@gmail.com

almost 3 years ago

In reply to: Michael Paquier (#20)

Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

On Wed, Feb 22, 2023 at 12:30:20PM +0900, Michael Paquier wrote:

Okay, applied and backpatched with a minimal test set, then. I have
kept the tweaks I did to the tests with extra comments.

Thanks!

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com