Clear empty space in a page.

Started by Yura Sokolov over 4 years ago, 5 messages
#1 Yura Sokolov
y.sokolov@postgrespro.ru
1 attachment(s)

Good day.

A long time ago I played with a proprietary "compressed storage"
patch on a heavily updated table, and found that empty pages (i.e. pages
cleaned by vacuum) do not compress well enough.

When a table is stress-updated, pages for new row versions are allocated
in a round-robin fashion, so some 1GB segments contain almost
no live tuples. Vacuum removes the dead tuples, but the segments remain large
after compression (>400MB), as if they were still full.

After some investigation I found this is because PageRepairFragmentation
and PageIndex*Delete* don't clear the space that has just become empty, so it
still contains garbage data. Clearing it with memset greatly increases the
compression ratio: some compressed relation segments shrink to 30-60MB right
after vacuum removes the tuples in them.
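
To illustrate the effect described above, here is a minimal sketch that is
not part of the patch. It compares how well a "vacuumed" page compresses
with and without zeroing the freed area. All toy_* names, the 24-byte header
size and the naive run-length encoder (a stand-in for a real compressor) are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192

/*
 * Size of a naive run-length encoding: 2 bytes (count, value) per run of
 * up to 255 identical bytes. Only a stand-in for a real compressor.
 */
static size_t
toy_rle_size(const unsigned char *buf, size_t len)
{
	size_t		out = 0;
	size_t		i = 0;

	while (i < len)
	{
		size_t		run = 1;

		while (i + run < len && run < 255 && buf[i + run] == buf[i])
			run++;
		out += 2;
		i += run;
	}
	return out;
}

/* Fill a buffer with pseudo-random bytes that play the role of old tuples. */
static void
toy_fill_garbage(unsigned char *buf, size_t len)
{
	uint32_t	state = 12345;

	for (size_t i = 0; i < len; i++)
	{
		state = state * 1664525u + 1013904223u;	/* LCG step */
		buf[i] = (unsigned char) (state >> 24);
	}
}

/*
 * RLE size of a page whose tuples have all been removed. If zero_hole is
 * nonzero, everything after the (assumed) header is cleared first.
 */
size_t
toy_vacuumed_page_size(int zero_hole)
{
	unsigned char page[TOY_PAGE_SIZE];
	const size_t header = 24;	/* assumed header size for this toy */

	assert(header < sizeof(page));
	toy_fill_garbage(page, sizeof(page));
	if (zero_hole)
		memset(page + header, 0, sizeof(page) - header);
	return toy_rle_size(page, sizeof(page));
}
```

Comparing toy_vacuumed_page_size(0) with toy_vacuumed_page_size(1) shows the
same trend as reported here: the page with stale bytes barely compresses,
while the zeroed one collapses to a few runs. The actual ratios depend on
the compressor and the data.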

While this result does not apply directly to stock PostgreSQL, I believe
page compression is important for full_page_writes with wal_compression
enabled, and probably also when PostgreSQL runs on a filesystem with
compression enabled (ZFS?).

Therefore I propose clearing a page's empty space with zeroes in
PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
PageIndexTupleDeleteNoCompact.

Sorry, I haven't measured the impact on raw performance yet.

regards,
Yura Sokolov aka funny_falcon

Attachments:

clear_page.patch (text/x-diff)
commit 6abfcaeb87fcb396c5e2dccd434ce2511314ff76
Author: Yura Sokolov <y.sokolov@postgrespro.ru>
Date:   Sun May 30 02:39:17 2021 +0300

    Clear empty space in a page
    
    Write zeroes to just cleared space in PageRepairFragmentation,
    PageIndexTupleDelete, PageIndexMultiDelete and PageIndexTupleDeleteNoCompact.
    
    It helps increase compression ratio on compression enabled filesystems
    and with full_page_writes and wal_compression enabled.

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 82ca91f5977..7deb6cc71a4 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -681,6 +681,17 @@ compactify_tuples(itemIdCompact itemidbase, int nitems, Page page, bool presorte
 	phdr->pd_upper = upper;
 }
 
+/*
+ * Clean up space between pd_lower and pd_upper for better page compression.
+ */
+static void
+memset_hole(Page page, LocationIndex old_pd_upper)
+{
+	PageHeader	phdr = (PageHeader) page;
+	if (phdr->pd_upper > old_pd_upper)
+		MemSet((char *)page + old_pd_upper, 0, phdr->pd_upper - old_pd_upper);
+}
+
 /*
  * PageRepairFragmentation
  *
@@ -797,6 +808,7 @@ PageRepairFragmentation(Page page)
 
 		compactify_tuples(itemidbase, nstorage, page, presorted);
 	}
+	memset_hole(page, pd_upper);
 
 	/* Set hint bit for PageAddItemExtended */
 	if (nunused > 0)
@@ -1114,6 +1126,7 @@ PageIndexTupleDelete(Page page, OffsetNumber offnum)
 
 	if (offset > phdr->pd_upper)
 		memmove(addr + size, addr, offset - phdr->pd_upper);
+	MemSet(addr, 0, size);
 
 	/* adjust free space boundary pointers */
 	phdr->pd_upper += size;
@@ -1271,6 +1284,7 @@ PageIndexMultiDelete(Page page, OffsetNumber *itemnos, int nitems)
 		compactify_tuples(itemidbase, nused, page, presorted);
 	else
 		phdr->pd_upper = pd_special;
+	memset_hole(page, pd_upper);
 }
 
 
@@ -1351,6 +1365,7 @@ PageIndexTupleDeleteNoCompact(Page page, OffsetNumber offnum)
 
 	if (offset > phdr->pd_upper)
 		memmove(addr + size, addr, offset - phdr->pd_upper);
+	MemSet(addr, 0, size);
 
 	/* adjust free space boundary pointer */
 	phdr->pd_upper += size;
#2 Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Yura Sokolov (#1)
Re: Clear empty space in a page.

Hello Yura,

> didn't measure impact on raw performance yet.

Must be done. There c/should be a guc to control this behavior if the
performance impact is noticeable.

--
Fabien.

#3 Omar Kilani
omar.kilani@gmail.com
In reply to: Fabien COELHO (#2)
Re: Clear empty space in a page.

Hi,

I happened to be running some Postgres-on-ZFS tests on Linux/aarch64
and tested this patch.

Kernel: 4.18.0-305.el8.aarch64
CPU: 16x3.0GHz Ampere Altra / Arm Neoverse N1 cores

ZFS: 2.1.0-rc6
ZFS options: options spl spl_kmem_cache_slab_limit=65536 (see:
https://github.com/openzfs/zfs/issues/12150)

Postgres: 13.3 with and without the patch
Postgres config:

full_page_writes = on
wal_compression = on

Without patch:

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 612557228
latency average = 2.257 ms
tps = 14179.551402 (including connections establishing)
tps = 14179.553286 (excluding connections establishing)

With patch:

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 606967295
latency average = 2.278 ms
tps = 14050.164370 (including connections establishing)
tps = 14050.166007 (excluding connections establishing)

It does seem to help with on-disk compression, but it *might* have
caused more fragmentation.
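
For reference, the difference between the two pgbench runs above can be
worked out with a small helper. This is a sketch only; the function name
relative_change_pct is made up for illustration.

```c
#include <assert.h>

/*
 * Relative change from baseline to patched, in percent.
 * Negative means the patched value is lower than the baseline.
 */
double
relative_change_pct(double baseline, double patched)
{
	assert(baseline > 0.0);
	return (patched - baseline) / baseline * 100.0;
}
```

Called as relative_change_pct(14179.551402, 14050.164370), it gives about
-0.91, i.e. roughly 0.9% fewer transactions per second with the patch. Each
configuration was run once, so these numbers alone do not show whether the
difference is outside run-to-run noise.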

Regards,
Omar

On Sat, May 29, 2021 at 10:22 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> Hello Yura,
>
>> didn't measure impact on raw performance yet.
>
> Must be done. There c/should be a guc to control this behavior if the
> performance impact is noticeable.
>
> --
> Fabien.

#4 Andres Freund
andres@anarazel.de
In reply to: Yura Sokolov (#1)
Re: Clear empty space in a page.

Hi,

On 2021-05-30 03:10:26 +0300, Yura Sokolov wrote:

> While this result does not apply directly to stock PostgreSQL, I believe
> page compression is important for full_page_writes with wal_compression
> enabled, and probably also when PostgreSQL runs on a filesystem with
> compression enabled (ZFS?).

I don't think the former is relevant, because the hole is skipped in wal page
compression (at some cost).
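
The point about the hole being skipped can be pictured with a toy sketch.
This is not PostgreSQL's WAL code; toy_copy_without_hole and TOY_BLCKSZ are
hypothetical names. The idea is that the page image is assembled from the
bytes before pd_lower and the bytes from pd_upper onward, so whatever sits in
the hole never reaches the compressor. The extra copy is one plausible
reading of "at some cost".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLCKSZ 8192

/*
 * Copy a page image into dst while omitting the bytes in [lower, upper).
 * Returns the number of bytes written. dst must hold TOY_BLCKSZ bytes.
 */
size_t
toy_copy_without_hole(char *dst, const char *page, size_t lower, size_t upper)
{
	assert(lower <= upper && upper <= TOY_BLCKSZ);

	/* header and line pointers */
	memcpy(dst, page, lower);
	/* tuple data and special space */
	memcpy(dst + lower, page + upper, TOY_BLCKSZ - upper);

	return lower + (TOY_BLCKSZ - upper);
}
```

Because the hole is dropped before compression, zeroing it does not change
the compressed size of such an image.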

> Therefore I propose clearing a page's empty space with zeroes in
> PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
> PageIndexTupleDeleteNoCompact.
>
> Sorry, I haven't measured the impact on raw performance yet.

I'm worried that this might cause O(n^2) behaviour in some cases, by
repeatedly memset'ing the same mostly already zeroed space to 0. Why do we
ever need to do memset_hole() instead of accurately just zeroing out the space
that was just vacated?
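
The concern can be made concrete with a small counting sketch. It is a
model, not a measurement of the patch, and the function names are made up.
Assume n deletions, each freeing tuple_size bytes from an initially full
page, and count how many bytes are written as zeroes under the two
strategies.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes zeroed if every deletion re-zeroes the whole hole. The hole
 * starts empty and grows by tuple_size with each deletion.
 */
size_t
toy_bytes_zeroed_whole_hole(size_t ndeletes, size_t tuple_size)
{
	size_t		total = 0;
	size_t		hole = 0;

	for (size_t i = 0; i < ndeletes; i++)
	{
		hole += tuple_size;
		total += hole;		/* whole hole rewritten each time */
	}
	return total;
}

/* Bytes zeroed if each deletion clears only the space it just vacated. */
size_t
toy_bytes_zeroed_vacated_only(size_t ndeletes, size_t tuple_size)
{
	assert(tuple_size == 0 || ndeletes <= (size_t) -1 / tuple_size);
	return ndeletes * tuple_size;
}
```

For 100 deletions of 64-byte tuples the first strategy writes 323200 bytes
and the second 6400: quadratic versus linear growth in the number of
deletions.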

Greetings,

Andres Freund

#5 Yura Sokolov
y.sokolov@postgrespro.ru
In reply to: Andres Freund (#4)
Re: Clear empty space in a page.

Hi,

Andres Freund wrote 2021-05-31 00:07:

> Hi,
>
> On 2021-05-30 03:10:26 +0300, Yura Sokolov wrote:
>> While this result does not apply directly to stock PostgreSQL, I believe
>> page compression is important for full_page_writes with wal_compression
>> enabled, and probably also when PostgreSQL runs on a filesystem with
>> compression enabled (ZFS?).
>
> I don't think the former is relevant, because the hole is skipped in
> wal page compression (at some cost).

Ah, I forgot about that. Yep, you are right.

>> Therefore I propose clearing a page's empty space with zeroes in
>> PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
>> PageIndexTupleDeleteNoCompact.
>>
>> Sorry, I haven't measured the impact on raw performance yet.
>
> I'm worried that this might cause O(n^2) behaviour in some cases, by
> repeatedly memset'ing the same mostly already zeroed space to 0. Why do we
> ever need to do memset_hole() instead of accurately just zeroing out the
> space that was just vacated?

It is done exactly this way: memset_hole accepts "old_pd_upper" and
clears only the space between the old and the new pd_upper.
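
A self-contained toy version of that behaviour is sketched here. ToyPage and
toy_memset_hole are simplified stand-ins, not PostgreSQL's real Page and
PageHeader types. Only the bytes in [old_pd_upper, pd_upper) are touched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192

/* Hypothetical, simplified page: two free-space bounds plus payload. */
typedef struct ToyPage
{
	uint16_t	pd_lower;		/* end of line pointer array */
	uint16_t	pd_upper;		/* start of tuple data */
	unsigned char data[TOY_BLCKSZ - 2 * sizeof(uint16_t)];
} ToyPage;

/*
 * Zero only the bytes vacated since old_pd_upper was recorded, i.e. the
 * range [old_pd_upper, pd_upper). Offsets are counted from the page start.
 */
void
toy_memset_hole(ToyPage *page, uint16_t old_pd_upper)
{
	assert(page->pd_upper <= TOY_BLCKSZ);

	if (page->pd_upper > old_pd_upper)
		memset((unsigned char *) page + old_pd_upper, 0,
			   page->pd_upper - old_pd_upper);
}
```

If pd_upper did not move, nothing is written, and space below old_pd_upper
that was cleared by an earlier call is not written again.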

regards,
Yura