Recovery performance of DROP DATABASE with many tablespaces
Hi,
My colleague ran into a problem where WAL replay took a long time
on a standby with large shared_buffers when he dropped a database
using many tablespaces. As far as I can tell from the code, this
happens because DROP DATABASE generates as many XLOG_DBASE_DROP WAL
records as the number of tablespaces the database uses, and WAL replay
of each XLOG_DBASE_DROP record causes a full scan of shared_buffers.
That is, during recovery, DROP DATABASE scans shared_buffers once per
tablespace.
Since the first scan, caused by the first XLOG_DBASE_DROP record,
invalidates all pages in shared_buffers related to the database being
dropped, the subsequent scans triggered by the remaining records are
basically useless. So I'd like to change the code to avoid those
unnecessary scans and reduce the recovery time of DROP DATABASE.
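To illustrate the cost, here is a simplified sketch of the scan that
replaying one XLOG_DBASE_DROP record triggers. It is loosely modeled on
bufmgr.c's DropDatabaseBuffers(); the function name and details below
are simplified for illustration, not the actual source.

/*
 * Simplified sketch, loosely modeled on DropDatabaseBuffers().
 * Invalidating the dropped database's pages means inspecting every
 * buffer header, so the cost is O(NBuffers) per XLOG_DBASE_DROP
 * record, even when few buffers actually belong to that database.
 */
static void
drop_database_buffers_sketch(Oid dbid)
{
	int		i;

	for (i = 0; i < NBuffers; i++)
	{
		BufferDesc *bufHdr = GetBufferDescriptor(i);

		/* Every buffer is examined; only matching ones are invalidated. */
		if (bufHdr->tag.rnode.dbNode == dbid)
			InvalidateBuffer(bufHdr);	/* simplified; the real code locks the header */
	}
}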
Generally, the recovery performance of DROP DATABASE is not critical
for most users. But unfortunately my colleague's project may sometimes
need to drop a database using multiple tablespaces. So, if the fix is
not too complicated, I think it's worth applying.
The straightforward approach to avoiding those unnecessary scans is to
change DROP DATABASE so that it generates only a single XLOG_DBASE_DROP
record that registers the information for all the tablespaces. WAL
replay of that one record then scans shared_buffers once and deletes
all the tablespace directories. A POC patch is attached.
Thoughts?
Regards,
--
Fujii Masao
Attachment: xlog-dbase-drop_v1.patch (application/octet-stream)
*** a/src/backend/access/rmgrdesc/dbasedesc.c
--- b/src/backend/access/rmgrdesc/dbasedesc.c
***************
*** 35,43 **** dbase_desc(StringInfo buf, XLogReaderState *record)
else if (info == XLOG_DBASE_DROP)
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) rec;
! appendStringInfo(buf, "dir %u/%u",
! xlrec->db_id, xlrec->tablespace_id);
}
}
--- 35,46 ----
else if (info == XLOG_DBASE_DROP)
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) rec;
+ int i;
! appendStringInfo(buf, "dir");
! for (i = 0; i < xlrec->tablespace_num; i++)
! appendStringInfo(buf, " %u/%u",
! xlrec->db_id, xlrec->tablespace_ids[i]);
}
}
*** a/src/backend/commands/dbcommands.c
--- b/src/backend/commands/dbcommands.c
***************
*** 1362,1371 **** movedb(const char *dbname, const char *tblspcname)
xl_dbase_drop_rec xlrec;
xlrec.db_id = db_id;
! xlrec.tablespace_id = src_tblspcoid;
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));
(void) XLogInsert(RM_DBASE_ID,
XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
--- 1362,1372 ----
xl_dbase_drop_rec xlrec;
xlrec.db_id = db_id;
! xlrec.tablespace_num = 1;
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));
+ XLogRegisterData((char *) &src_tblspcoid, sizeof(Oid));
(void) XLogInsert(RM_DBASE_ID,
XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
***************
*** 1873,1878 **** remove_dbtablespaces(Oid db_id)
--- 1874,1884 ----
Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
+ List *ltblspc = NIL;
+ ListCell *cell;
+ int ntblspc;
+ int i;
+ Oid *tablespace_ids;
rel = heap_open(TableSpaceRelationId, AccessShareLock);
scan = heap_beginscan_catalog(rel, 0, NULL);
***************
*** 1900,1922 **** remove_dbtablespaces(Oid db_id)
(errmsg("some useless files may be left behind in old database directory \"%s\"",
dstpath)));
! /* Record the filesystem change in XLOG */
! {
! xl_dbase_drop_rec xlrec;
! xlrec.db_id = db_id;
! xlrec.tablespace_id = dsttablespace;
! XLogBeginInsert();
! XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));
! (void) XLogInsert(RM_DBASE_ID,
! XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
! }
! pfree(dstpath);
}
heap_endscan(scan);
heap_close(rel, AccessShareLock);
}
--- 1906,1940 ----
(errmsg("some useless files may be left behind in old database directory \"%s\"",
dstpath)));
! ltblspc = lappend_oid(ltblspc, dsttablespace);
! pfree(dstpath);
! }
! ntblspc = list_length(ltblspc);
! Assert(ntblspc > 0);
! tablespace_ids = (Oid *) palloc(ntblspc * sizeof(Oid));
! i = 0;
! foreach(cell, ltblspc)
! tablespace_ids[i++] = lfirst_oid(cell);
! /* Record the filesystem change in XLOG */
! {
! xl_dbase_drop_rec xlrec;
! xlrec.db_id = db_id;
! xlrec.tablespace_num = ntblspc;
! XLogBeginInsert();
! XLogRegisterData((char *) &xlrec, MinSizeOfDbaseDropRec);
! XLogRegisterData((char *) tablespace_ids, ntblspc * sizeof(Oid));
!
! (void) XLogInsert(RM_DBASE_ID,
! XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
}
+ list_free(ltblspc);
+ pfree(tablespace_ids);
+
heap_endscan(scan);
heap_close(rel, AccessShareLock);
}
***************
*** 2122,2129 **** dbase_redo(XLogReaderState *record)
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
char *dst_path;
!
! dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
if (InHotStandby)
{
--- 2140,2146 ----
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
char *dst_path;
! int i;
if (InHotStandby)
{
***************
*** 2153,2163 **** dbase_redo(XLogReaderState *record)
/* Clean out the xlog relcache too */
XLogDropDatabase(xlrec->db_id);
! /* And remove the physical files */
! if (!rmtree(dst_path, true))
! ereport(WARNING,
! (errmsg("some useless files may be left behind in old database directory \"%s\"",
! dst_path)));
if (InHotStandby)
{
--- 2170,2186 ----
/* Clean out the xlog relcache too */
XLogDropDatabase(xlrec->db_id);
! for (i = 0; i < xlrec->tablespace_num; i++)
! {
! dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]);
!
! /* And remove the physical files */
! if (!rmtree(dst_path, true))
! ereport(WARNING,
! (errmsg("some useless files may be left behind in old database directory \"%s\"",
! dst_path)));
! pfree(dst_path);
! }
if (InHotStandby)
{
*** a/src/include/commands/dbcommands_xlog.h
--- b/src/include/commands/dbcommands_xlog.h
***************
*** 32,41 **** typedef struct xl_dbase_create_rec
typedef struct xl_dbase_drop_rec
{
- /* Records dropping of a single subdirectory incl. contents */
Oid db_id;
! Oid tablespace_id;
} xl_dbase_drop_rec;
extern void dbase_redo(XLogReaderState *rptr);
extern void dbase_desc(StringInfo buf, XLogReaderState *rptr);
--- 32,42 ----
typedef struct xl_dbase_drop_rec
{
Oid db_id;
! int tablespace_num; /* number of tablespace IDs */
! Oid tablespace_ids[FLEXIBLE_ARRAY_MEMBER];
} xl_dbase_drop_rec;
+ #define MinSizeOfDbaseDropRec offsetof(xl_dbase_drop_rec, tablespace_ids)
extern void dbase_redo(XLogReaderState *rptr);
extern void dbase_desc(StringInfo buf, XLogReaderState *rptr);
On Mon, Jun 4, 2018 at 9:46 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Generally, the recovery performance of DROP DATABASE is not critical
for most users. But unfortunately my colleague's project may sometimes
need to drop a database using multiple tablespaces. So, if the fix is
not too complicated, I think it's worth applying.
Agreed; in isolation the need for this improvement is not strongly
felt, but any improvement to the single serialized replay process is
definitely helpful.
The straightforward approach to avoiding those unnecessary scans is to
change DROP DATABASE so that it generates only a single XLOG_DBASE_DROP
record that registers the information for all the tablespaces. WAL
replay of that one record then scans shared_buffers once and deletes
all the tablespace directories. A POC patch is attached.
Also, irrespective of the performance improvement, it looks better to
just have a single xlog record for this.
Hi, Fujii-san
I came across this post and got interested in it, so I tried to
apply/test the patch, but I am not sure if I did it correctly.
I set up master-slave sync, 200GB shared_buffers, 20000
max_locks_per_transaction, and 1 DB with 500 table partitions spread
evenly across 5 tablespaces.
After dropping the DB, there was no difference in recovery performance
with or without the patch, so maybe I made a mistake somewhere. But
anyway, here are the results.
======WITHOUT PATCH=======
[200GB shared buffers]
DROPDB only (skipped DROP TABLE and DROP TABLESPACE)
2018/07/04_13:35:00.161
dropdb
2018/07/04_13:35:05.591 5.591 sec
[200GB shared_buffers]
DROPDB (including DROP TABLE and DROP TABLESPACE)
real 3m19.717s
user 0m0.001s
sys 0m0.001s
======WITH PATCH=======
[200GB shared_buffers]
DROPDB only (skipped DROP TABLE and DROP TABLESPACE)
2018/07/04_14:19:47.128
dropdb
2018/07/04_14:19:53.177 6.049 sec
[200GB shared_buffers]
DROPDB (included the DROP TABLE and DROP TABLESPACE commands)
real 3m51.834s
user 0m0.001s
sys 0m0.002s
Just in case, do you also have some performance test numbers/cases
to show the recovery perf improvement when dropping a database that
contains multiple tablespaces?
Regards,
Kirk Jamison
On Wed, Jul 4, 2018 at 4:47 PM, Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
I tried to apply/test the patch, but I am not sure if I did it
correctly. [...]
Just in case, do you also have some performance test numbers/cases
to show the recovery perf improvement when dropping a database that
contains multiple tablespaces?
Thanks for testing!
TBH, I have no numbers from such a test.
One question about your test: how did you measure the *recovery time*
of DROP DATABASE? Since it's *recovery* performance, it's basically
not easy to measure.
Regards,
--
Fujii Masao
On 4 June 2018 at 17:46, Fujii Masao <masao.fujii@gmail.com> wrote:
Hi,
My colleague ran into a problem where WAL replay took a long time
on a standby with large shared_buffers when he dropped a database
using many tablespaces. As far as I can tell from the code, this
happens because DROP DATABASE generates as many XLOG_DBASE_DROP WAL
records as the number of tablespaces the database uses, and WAL replay
of each XLOG_DBASE_DROP record causes a full scan of shared_buffers.
That is, during recovery, DROP DATABASE scans shared_buffers once per
tablespace.
Since the first scan, caused by the first XLOG_DBASE_DROP record,
invalidates all pages in shared_buffers related to the database being
dropped, the subsequent scans triggered by the remaining records are
basically useless. So I'd like to change the code to avoid those
unnecessary scans and reduce the recovery time of DROP DATABASE.
+1
Generally, the recovery performance of DROP DATABASE is not critical
for most users. But unfortunately my colleague's project may sometimes
need to drop a database using multiple tablespaces. So, if the fix is
not too complicated, I think it's worth applying.
Agreed
The straightforward approach to avoiding those unnecessary scans is to
change DROP DATABASE so that it generates only a single XLOG_DBASE_DROP
record that registers the information for all the tablespaces. WAL
replay of that one record then scans shared_buffers once and deletes
all the tablespace directories. A POC patch is attached.
Seems clear on a read of the patch, but I have not tested it.
Please replace tablespace_num with ntablespaces so it's clearer and
consistent with other WAL records.
Cheers
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jul 05, 2018 at 01:42:20AM +0900, Fujii Masao wrote:
TBH, I have no numbers from such a test.
One question about your test: how did you measure the *recovery time*
of DROP DATABASE? Since it's *recovery* performance, it's basically
not easy to measure.
It would be simple to measure the time it takes to replay this single
DROP DATABASE record by putting two gettimeofday() calls around it and
taking the time difference. There are many methods you could use here,
and I suppose that with a couple of GBs of shared_buffers you would see
a measurable difference with a dozen tablespaces or so. You could also
take a base backup after creating all the tablespaces, connect the
standby, and then drop the database on the primary to see the actual
time it takes. Your patch looks logically correct to me because
DropDatabaseBuffers is a *bottleneck* with large shared_buffers, and it
would be nice to see numbers.
--
Michael
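A minimal sketch of the timing approach suggested above; the wrapper
function and its placement are hypothetical, not part of the patch:

#include <sys/time.h>

/* Hypothetical wrapper: time the replay of one XLOG_DBASE_DROP record
 * with two gettimeofday() calls and log the difference.  Assumes the
 * usual PostgreSQL internal headers for XLogReaderState and elog(). */
static void
timed_dbase_redo(XLogReaderState *record)
{
	struct timeval start, end;
	long		elapsed_us;

	gettimeofday(&start, NULL);
	dbase_redo(record);			/* the replay under test */
	gettimeofday(&end, NULL);

	elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L +
				 (end.tv_usec - start.tv_usec);
	elog(LOG, "DROP DATABASE redo took %ld us", elapsed_us);
}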
On Tue, Jul 10, 2018 at 03:04:05PM +0900, Michael Paquier wrote:
It would be simple to measure the time it takes to replay this single
DROP DATABASE record by putting two gettimeofday() calls around it and
taking the time difference. [...] it would be nice to see numbers.
This was a couple of months ago, and nothing has happened since while
the patch has been waiting on its author, so the patch is marked as
returned with feedback.
--
Michael
On Thu, Jul 5, 2018 at 5:15 PM Simon Riggs <simon@2ndquadrant.com> wrote:
Seems clear on a read of the patch, but I have not tested it.
Please replace tablespace_num with ntablespaces so it's clearer and
consistent with other WAL records.
Thanks for the review! I changed the patch that way, and also rebased
it on master.
Attached is the latest version of the patch.
I will add this patch to the next CommitFest.
Regards,
--
Fujii Masao
Attachment: xlog_dbase_drop_v2.patch (application/octet-stream)
diff --git a/src/backend/access/rmgrdesc/dbasedesc.c b/src/backend/access/rmgrdesc/dbasedesc.c
index c7d60ce10d..d08c575872 100644
*** a/src/backend/access/rmgrdesc/dbasedesc.c
--- b/src/backend/access/rmgrdesc/dbasedesc.c
***************
*** 35,43 **** dbase_desc(StringInfo buf, XLogReaderState *record)
else if (info == XLOG_DBASE_DROP)
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) rec;
! appendStringInfo(buf, "dir %u/%u",
! xlrec->tablespace_id, xlrec->db_id);
}
}
--- 35,46 ----
else if (info == XLOG_DBASE_DROP)
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) rec;
+ int i;
! appendStringInfo(buf, "dir");
! for (i = 0; i < xlrec->ntablespaces; i++)
! appendStringInfo(buf, " %u/%u",
! xlrec->tablespace_ids[i], xlrec->db_id);
}
}
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 01d66212e9..643a539efe 100644
*** a/src/backend/commands/dbcommands.c
--- b/src/backend/commands/dbcommands.c
***************
*** 1404,1413 **** movedb(const char *dbname, const char *tblspcname)
xl_dbase_drop_rec xlrec;
xlrec.db_id = db_id;
! xlrec.tablespace_id = src_tblspcoid;
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));
(void) XLogInsert(RM_DBASE_ID,
XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
--- 1404,1414 ----
xl_dbase_drop_rec xlrec;
xlrec.db_id = db_id;
! xlrec.ntablespaces = 1;
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));
+ XLogRegisterData((char *) &src_tblspcoid, sizeof(Oid));
(void) XLogInsert(RM_DBASE_ID,
XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
***************
*** 1915,1920 **** remove_dbtablespaces(Oid db_id)
--- 1916,1926 ----
Relation rel;
TableScanDesc scan;
HeapTuple tuple;
+ List *ltblspc = NIL;
+ ListCell *cell;
+ int ntblspc;
+ int i;
+ Oid *tablespace_ids;
rel = table_open(TableSpaceRelationId, AccessShareLock);
scan = table_beginscan_catalog(rel, 0, NULL);
***************
*** 1943,1965 **** remove_dbtablespaces(Oid db_id)
(errmsg("some useless files may be left behind in old database directory \"%s\"",
dstpath)));
! /* Record the filesystem change in XLOG */
! {
! xl_dbase_drop_rec xlrec;
! xlrec.db_id = db_id;
! xlrec.tablespace_id = dsttablespace;
! XLogBeginInsert();
! XLogRegisterData((char *) &xlrec, sizeof(xl_dbase_drop_rec));
! (void) XLogInsert(RM_DBASE_ID,
! XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
! }
! pfree(dstpath);
}
table_endscan(scan);
table_close(rel, AccessShareLock);
}
--- 1949,1989 ----
(errmsg("some useless files may be left behind in old database directory \"%s\"",
dstpath)));
! ltblspc = lappend_oid(ltblspc, dsttablespace);
! pfree(dstpath);
! }
! ntblspc = list_length(ltblspc);
! if (ntblspc == 0)
! {
! table_endscan(scan);
! table_close(rel, AccessShareLock);
! return;
! }
! tablespace_ids = (Oid *) palloc(ntblspc * sizeof(Oid));
! i = 0;
! foreach(cell, ltblspc)
! tablespace_ids[i++] = lfirst_oid(cell);
! /* Record the filesystem change in XLOG */
! {
! xl_dbase_drop_rec xlrec;
! xlrec.db_id = db_id;
! xlrec.ntablespaces = ntblspc;
!
! XLogBeginInsert();
! XLogRegisterData((char *) &xlrec, MinSizeOfDbaseDropRec);
! XLogRegisterData((char *) tablespace_ids, ntblspc * sizeof(Oid));
!
! (void) XLogInsert(RM_DBASE_ID,
! XLOG_DBASE_DROP | XLR_SPECIAL_REL_UPDATE);
}
+ list_free(ltblspc);
+ pfree(tablespace_ids);
+
table_endscan(scan);
table_close(rel, AccessShareLock);
}
***************
*** 2166,2173 **** dbase_redo(XLogReaderState *record)
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
char *dst_path;
!
! dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
if (InHotStandby)
{
--- 2190,2196 ----
{
xl_dbase_drop_rec *xlrec = (xl_dbase_drop_rec *) XLogRecGetData(record);
char *dst_path;
! int i;
if (InHotStandby)
{
***************
*** 2197,2207 **** dbase_redo(XLogReaderState *record)
/* Clean out the xlog relcache too */
XLogDropDatabase(xlrec->db_id);
! /* And remove the physical files */
! if (!rmtree(dst_path, true))
! ereport(WARNING,
! (errmsg("some useless files may be left behind in old database directory \"%s\"",
! dst_path)));
if (InHotStandby)
{
--- 2220,2236 ----
/* Clean out the xlog relcache too */
XLogDropDatabase(xlrec->db_id);
! for (i = 0; i < xlrec->ntablespaces; i++)
! {
! dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_ids[i]);
!
! /* And remove the physical files */
! if (!rmtree(dst_path, true))
! ereport(WARNING,
! (errmsg("some useless files may be left behind in old database directory \"%s\"",
! dst_path)));
! pfree(dst_path);
! }
if (InHotStandby)
{
diff --git a/src/include/commands/dbcommands_xlog.h b/src/include/commands/dbcommands_xlog.h
index 46be8a615a..4e8a442c89 100644
*** a/src/include/commands/dbcommands_xlog.h
--- b/src/include/commands/dbcommands_xlog.h
***************
*** 32,41 **** typedef struct xl_dbase_create_rec
typedef struct xl_dbase_drop_rec
{
- /* Records dropping of a single subdirectory incl. contents */
Oid db_id;
! Oid tablespace_id;
} xl_dbase_drop_rec;
extern void dbase_redo(XLogReaderState *rptr);
extern void dbase_desc(StringInfo buf, XLogReaderState *rptr);
--- 32,42 ----
typedef struct xl_dbase_drop_rec
{
Oid db_id;
! int ntablespaces; /* number of tablespace IDs */
! Oid tablespace_ids[FLEXIBLE_ARRAY_MEMBER];
} xl_dbase_drop_rec;
+ #define MinSizeOfDbaseDropRec offsetof(xl_dbase_drop_rec, tablespace_ids)
extern void dbase_redo(XLogReaderState *rptr);
extern void dbase_desc(StringInfo buf, XLogReaderState *rptr);
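As an aside, the sizing arithmetic behind MinSizeOfDbaseDropRec: the
fixed part of the record runs up to the flexible array member, and the
Oid array payload follows it, which is why the patch registers the
record with two XLogRegisterData() calls. A hypothetical helper, purely
to spell out the arithmetic:

/* Hypothetical helper (not in the patch): total size of a
 * variable-length xl_dbase_drop_rec carrying ntablespaces OIDs. */
static Size
dbase_drop_rec_size(int ntablespaces)
{
	return MinSizeOfDbaseDropRec + ntablespaces * sizeof(Oid);
}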
On Tue, Jul 10, 2018 at 3:04 PM Michael Paquier <michael@paquier.xyz> wrote:
It would be simple to measure the time it takes to replay this single
DROP DATABASE record by putting two gettimeofday() calls around it and
taking the time difference. [...] Your patch looks logically correct to
me because DropDatabaseBuffers is a *bottleneck* with large
shared_buffers, and it would be nice to see numbers.
Thanks for the comment!
I measured how long it takes to replay DROP DATABASE with 1000
tablespaces, in the master and patched versions. shared_buffers was set
to 16GB.
[master]
It took 8 seconds to replay DROP DATABASE with 1000 tablespaces,
as follows. In this case, 16GB shared_buffers was fully scanned 1000 times.
2019-10-02 16:50:14 JST LOG: redo starts at 0/2000028
2019-10-02 16:50:22 JST LOG: redo done at 0/300A298
[patched]
It took less than 1 second to replay DROP DATABASE with 1000 tablespaces,
as follows. In this case, 16GB shared_buffers was scanned only once.
2019-10-02 16:47:03 JST LOG: redo starts at 0/2000028
2019-10-02 16:47:03 JST LOG: redo done at 0/3001588
Regards,
--
Fujii Masao
On Wed, Oct. 2, 2019 5:40 PM, Fujii Masao wrote:
I measured how long it takes to replay DROP DATABASE with 1000
tablespaces, in the master and patched versions. shared_buffers was set
to 16GB. [...]
Hi Fujii-san,
It's been a while, so I checked the patch once again.
It's fairly straightforward and I saw no problems nor bugs in the code.
[patched]
It took less than 1 second to replay DROP DATABASE with 1000 tablespaces,
The results are good.
I want to replicate the performance to confirm the results as well.
Could you share how you measured the recovery replay?
Did you actually execute a failover?
Regards,
Kirk Jamison
On Wed, Nov 13, 2019 at 3:57 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
It's been a while, so I checked the patch once again.
It's fairly straightforward and I saw no problems nor bugs in the code.
Thanks for the review!
The results are good.
I want to replicate the performance to confirm the results as well.
Could you share how you measured the recovery replay?
I forget the actual steps I used for the measurement, but I think they
were something like:
1. create database "hoge"
2. create 1,000 tablespaces
3. create 1,000 tables on the database "hoge",
   each table placed in a different tablespace
4. take a base backup
5. drop database "hoge"
6. shut down the server with immediate mode
7. start an archive recovery from the backup taken at #4
8. measure how long it takes to apply the DROP DATABASE record by
   checking the timestamps at REDO start and REDO end
I think that I performed the above steps on the master and
the patched version.
Did you actually execute a failover?
No.
Regards,
--
Fujii Masao
On Wed, Nov 13, 2019 5:34PM (GMT+9), Fujii Masao wrote:
On Wed, Nov 13, 2019 at 3:57 PM k.jamison@fujitsu.com <k.jamison@fujitsu.com>
wrote:On Wed, Oct. 2, 2019 5:40 PM, Fujii Masao wrote:
On Tue, Jul 10, 2018 at 3:04 PM Michael Paquier <michael@paquier.xyz>
wrote:
On Thu, Jul 05, 2018 at 01:42:20AM +0900, Fujii Masao wrote:
TBH, I have no numbers measured by the test.
One question about your test is; how did you measure the
*recovery
time* of DROP DATABASE? Since it's *recovery* performance,
basically it's not easy to measure that.It would be simple to measure the time it takes to replay this
single DROP DATABASE record by putting two gettimeofday() calls or
such things and then take the time difference. There are many
methods that you could use here, and I suppose that with a shared
buffer setting of a couple of GBs of shared buffers you would see
a measurable difference with a dozen of tablespaces or so. You
could also take a base backup after creating all the tablespaces,
connect the standby and then drop the database on the primary to
see the actual time it takes. Your patch looks logically correct
to me because DropDatabaseBuffers is a
*bottleneck* with large shared_buffers, and it would be nice to
see numbers.Thanks for the comment!
I measured how long it takes to replay DROP DATABASE with 1000
tablespaces, in master and patched version. shared_buffers was set to16GB.
[master]
It took 8 seconds to replay DROP DATABASE with 1000 tablespaces, as follows.
In this case, 16GB shared_buffers was fully scanned 1000 times.2019-10-02 16:50:14 JST LOG: redo starts at 0/2000028
2019-10-02 16:50:22 JST LOG: redo done at 0/300A298[patched]
It took less than 1 second to replay DROP DATABASE with 1000
tablespaces, as follows. In this case, 16GB shared_buffers was scannedonly one time.
2019-10-02 16:47:03 JST LOG: redo starts at 0/2000028
2019-10-02 16:47:03 JST LOG: redo done at 0/3001588Hi Fujii-san,
It's been a while, so I checked the patch once again.
It's fairly straightforward and I saw no problems nor bug in the code.Thanks for the review!
[patched]
It took less than 1 second to replay DROP DATABASE with 1000
tablespaces,The results are good.
I want to replicate the performance to confirm the results as well.
Could you share how you measured the recovery replay?I forgot the actual steps that I used for the measurement.
But I think they are something like1. create database "hoge"
2. create 1,000 tablespaces
3. create 1,000 tables on the database "hoge".
each table should be placed in different tablespace.
4. take a base backup
5. drop database "hoge"
6. shutdown the server with immediate mode 7. start an archive recovery from
the backup taken at #4 8. measure how long it takes to apply DROP DATABASE
record by
checking the timestamp at REDO start and REDO end.I think that I performed the above steps on the master and the patched version.
Did you actually execute a failover?
No.
I'm sorry for the late reply, and thanks for the guide above.
I replicated the same recovery test above on a standalone server
and have confirmed with the logs that the patch made the recovery faster.
[MASTER/UNPATCHED] ~10 seconds
2019-11-19 15:25:23.891 JST [23042] LOG: redo starts at 0/180006A0
...
2019-11-19 15:25:34.492 JST [23042] LOG: redo done at 0/1800A478
[PATCHED] ~less than 1 sec
2019-11-19 15:31:59.415 JST [17625] LOG: redo starts at 0/40005B8
...
2019-11-19 15:32:00.159 JST [17625] CONTEXT: WAL redo at 0/4000668 for Database/DROP: dir 1663/16384 16385/16384 ... //further details omitted// ...
...
2019-11-19 15:32:00.159 JST [17625] LOG: redo done at 0/4001638
I believe there are no problems, so I am marking this patch now
as "Ready for Committer".
Regards,
Kirk Jamison
On Tue, Nov 19, 2019 at 3:39 PM k.jamison@fujitsu.com
<k.jamison@fujitsu.com> wrote:
I replicated the same recovery test above on a standalone server
and have confirmed with the logs that the patch made the recovery
faster. [...]
I believe there are no problems, so I am marking this patch now
as "Ready for Committer".
Thanks for the review! Committed.
Regards,
--
Fujii Masao