Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

Started by Justin Pryzbyover 3 years ago3 messages
#1Justin Pryzby
pryzby@telsasoft.com

On Tue, Aug 02, 2022 at 12:50:43PM -0500, Justin Pryzby wrote:

Also, if I understand correctly, this patch seems to assume that nobody is
connected to the source database. But what's actually enforced is just that
nobody *else* is connected. Is it any issue that the current DB can be used as
a source? Anyway, both of the above problems are reproducible using a
different database.

|postgres=# CREATE DATABASE new TEMPLATE postgres STRATEGY wal_log;
|CREATE DATABASE

On Thu, Aug 04, 2022 at 05:16:04PM -0500, Justin Pryzby wrote:

On Thu, Aug 04, 2022 at 06:02:50PM -0400, Tom Lane wrote:

The "invalidation" comment bothered me for awhile, but I think it's
fine: we know that no other backend can connect to the source DB
because we have it locked,

About that - is it any problem that the currently-connected db can be used as a
template? It's no issue for 2-phase commit, because "create database" cannot
run in an txn.

Would anybody want to comment on this ?
Is it okay that the *current* DB can be used as a template ?

#2Dilip Kumar
dilipbalaut@gmail.com
In reply to: Justin Pryzby (#1)

On Fri, Sep 2, 2022 at 5:25 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Aug 02, 2022 at 12:50:43PM -0500, Justin Pryzby wrote:

Also, if I understand correctly, this patch seems to assume that nobody is
connected to the source database. But what's actually enforced is just that
nobody *else* is connected. Is it any issue that the current DB can be used as
a source? Anyway, both of the above problems are reproducible using a
different database.

|postgres=# CREATE DATABASE new TEMPLATE postgres STRATEGY wal_log;
|CREATE DATABASE

On Thu, Aug 04, 2022 at 05:16:04PM -0500, Justin Pryzby wrote:

On Thu, Aug 04, 2022 at 06:02:50PM -0400, Tom Lane wrote:

The "invalidation" comment bothered me for awhile, but I think it's
fine: we know that no other backend can connect to the source DB
because we have it locked,

About that - is it any problem that the currently-connected db can be used as a
template? It's no issue for 2-phase commit, because "create database" cannot
run in an txn.

Would anybody want to comment on this ?
Is it okay that the *current* DB can be used as a template ?

I don't think there should be any problem with that. The main problem
could have been that since we are reading the pg_class tuple block by
block there could be an issue if someone concurrently modifies the
pg_class or there are some tuples that are inserted by the prepared
transaction. But in this case, the same backend can not have an open
prepared transaction while creating a database and that backend of
course can not perform any parallel operation as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#3Yoshikazu Imai (Fujitsu)
imai.yoshikazu@fujitsu.com
In reply to: Dilip Kumar (#2)
RE: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

Hi.

While investigating the codes in RelationCopyStorageUsingBuffer, I wonder that
there is any reason to use RBM_NORMAL for acquiring destination buffer.
I think we can use RBM_ZERO_AND_LOCK here instead of RBM_NORMAL.

When we use RBM_NORMAL, a destination block is read by smgrread and it's
content is verified with PageIsVerifiedExtended in ReadBuffer_common.
A page verification normally succeeds because destination blocks are
zero-filled by using smgrextend, but do we trust that it is surely zero-filled?

On Fri, Aug 5, 2022 at 00:32 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

Hmm ... makes sense. We'd be using smgrextend to write just the last page
of the file, relying on its API spec "Note that we assume writing a block
beyond current EOF causes intervening file space to become filled with
zeroes". I don't know that we're using that assumption aggressively
today, but as long as it doesn't confuse the kernel it'd probably be fine.

If there is a block which is not zero-filled, page verification would fail and
eventually CREATE DATABASE would fail.
(I also think that originally we don't need page verification for destination blocks
here because those blocks are just overwritten by source buffer.)

Also, from performance POV, I think it is good to use RBM_ZERO_AND_LOCK.
In RBM_NORMAL, destination blocks are read from disk by smgrread each time, but
in RBM_ZERO_AND_LOCK, we only set buffers zero-filled by MemSet and don't
access the disk which makes performance better.
If we expect the destination buffer is always zero-filled, we can use
RBM_ZERO_AND_LOCK.

Apart from above, there seems to be an old comment which is for the old codes
when acquiring destination buffer by using P_NEW.

"/* Use P_NEW to extend the destination relation. */"

--
Yoshikazu Imai

Show quoted text

-----Original Message-----
From: Dilip Kumar <dilipbalaut@gmail.com>
Sent: Friday, September 2, 2022 11:22 PM
To: Justin Pryzby <pryzby@telsasoft.com>
Cc: Robert Haas <robertmhaas@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; Andres Freund <andres@anarazel.de>;
Ashutosh Sharma <ashu.coek88@gmail.com>; Maciek Sakrejda <m.sakrejda@gmail.com>; Bruce Momjian
<bruce@momjian.us>; Alvaro Herrera <alvherre@alvh.no-ip.org>; Andrew Dunstan <andrew@dunslane.net>; Heikki
Linnakangas <hlinnaka@iki.fi>; pgsql-hackers@lists.postgresql.org; Thomas Munro <thomas.munro@gmail.com>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

On Fri, Sep 2, 2022 at 5:25 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Aug 02, 2022 at 12:50:43PM -0500, Justin Pryzby wrote:

Also, if I understand correctly, this patch seems to assume that
nobody is connected to the source database. But what's actually
enforced is just that nobody *else* is connected. Is it any issue
that the current DB can be used as a source? Anyway, both of the
above problems are reproducible using a different database.

|postgres=# CREATE DATABASE new TEMPLATE postgres STRATEGY wal_log;
|CREATE DATABASE

On Thu, Aug 04, 2022 at 05:16:04PM -0500, Justin Pryzby wrote:

On Thu, Aug 04, 2022 at 06:02:50PM -0400, Tom Lane wrote:

The "invalidation" comment bothered me for awhile, but I think
it's
fine: we know that no other backend can connect to the source DB
because we have it locked,

About that - is it any problem that the currently-connected db can
be used as a template? It's no issue for 2-phase commit, because
"create database" cannot run in an txn.

Would anybody want to comment on this ?
Is it okay that the *current* DB can be used as a template ?

I don't think there should be any problem with that. The main problem could have been that since we are reading the
pg_class tuple block by block there could be an issue if someone concurrently modifies the pg_class or there are some
tuples that are inserted by the prepared transaction. But in this case, the same backend can not have an open prepared
transaction while creating a database and that backend of course can not perform any parallel operation as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com