Setting pd_lower in GIN metapage
What are some arguments against setting pd_lower in the GIN metapage as
follows?
GinMetaPageData *metad = GinPageGetMeta(page);
((PageHeader) page)->pd_lower =
((char *) metad + sizeof(GinMetaPageData)) - (char *) page;
I saw that _bt_initmetapage() does it, so was wondering why doesn't GIN.
How about porting such a change to the back-branches if we do this at all?
I couldn't find any discussion in the archives about this. I read in
comments that server versions older than 9.4 didn't set pd_lower correctly
in any of GIN index pages, so relying on pd_lower value of GIN pages is
unreliable in general.
The reason I'm asking is that a certain backup tool relies on pd_lower
values of data pages (disk blocks in relation files that are known to have
a valid PageHeaderData) to be correct to discard the portion of every page
that supposedly does not contain any useful information. The assumption
doesn't hold in the case of GIN metapage, so any GIN indexes contain
corrupted metapage after recovery (metadata overwritten with zeros).
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
What are some arguments against setting pd_lower in the GIN metapage as
follows?GinMetaPageData *metad = GinPageGetMeta(page);
((PageHeader) page)->pd_lower =
((char *) metad + sizeof(GinMetaPageData)) - (char *) page;I saw that _bt_initmetapage() does it, so was wondering why doesn't GIN.
Actually, hash index also has similar code (See _hash_init_metabuffer)
and I see no harm in doing this at similar other places. This helps
in compressing the hole in metapage during WAL writing. I think that
in itself might not be an argument in favor of doing this because
there is only one meta page for index and saving ~7K WAL is not huge
but OTOH I don't see a reason for not doing it.
How about porting such a change to the back-branches if we do this at all?
I couldn't find any discussion in the archives about this. I read in
comments that server versions older than 9.4 didn't set pd_lower correctly
in any of GIN index pages, so relying on pd_lower value of GIN pages is
unreliable in general.The reason I'm asking is that a certain backup tool relies on pd_lower
values of data pages (disk blocks in relation files that are known to have
a valid PageHeaderData) to be correct to discard the portion of every page
that supposedly does not contain any useful information. The assumption
doesn't hold in the case of GIN metapage, so any GIN indexes contain
corrupted metapage after recovery (metadata overwritten with zeros).
Why can't you do a special check for metapage identification? It
should be easy to add such a check. I have seen one of such tools
(pg_filedump) has similar check to skip metapage in certain code
paths.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes:
On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:What are some arguments against setting pd_lower in the GIN metapage as
follows?
Actually, hash index also has similar code (See _hash_init_metabuffer)
and I see no harm in doing this at similar other places.
Seems reasonable.
How about porting such a change to the back-branches if we do this at all?
The reason I'm asking is that a certain backup tool relies on pd_lower
values of data pages (disk blocks in relation files that are known to have
a valid PageHeaderData) to be correct to discard the portion of every page
that supposedly does not contain any useful information. The assumption
doesn't hold in the case of GIN metapage, so any GIN indexes contain
corrupted metapage after recovery (metadata overwritten with zeros).
I'm not in favor of back-porting such a change. Even if we did, it would
only affect subsequently-created indexes not existing ones. That means
your tool has to cope with an unset pd_lower in any case --- and will for
the foreseeable future, because of pg_upgrade.
I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
then don't trust it, but assume all of the page is valid data".
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/19 22:59, Amit Kapila wrote:
On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:What are some arguments against setting pd_lower in the GIN metapage as
follows?GinMetaPageData *metad = GinPageGetMeta(page);
((PageHeader) page)->pd_lower =
((char *) metad + sizeof(GinMetaPageData)) - (char *) page;I saw that _bt_initmetapage() does it, so was wondering why doesn't GIN.
Actually, hash index also has similar code (See _hash_init_metabuffer)
and I see no harm in doing this at similar other places. This helps
in compressing the hole in metapage during WAL writing. I think that
in itself might not be an argument in favor of doing this because
there is only one meta page for index and saving ~7K WAL is not huge
but OTOH I don't see a reason for not doing it.
I agree. Will write a patch.
How about porting such a change to the back-branches if we do this at all?
I couldn't find any discussion in the archives about this. I read in
comments that server versions older than 9.4 didn't set pd_lower correctly
in any of GIN index pages, so relying on pd_lower value of GIN pages is
unreliable in general.The reason I'm asking is that a certain backup tool relies on pd_lower
values of data pages (disk blocks in relation files that are known to have
a valid PageHeaderData) to be correct to discard the portion of every page
that supposedly does not contain any useful information. The assumption
doesn't hold in the case of GIN metapage, so any GIN indexes contain
corrupted metapage after recovery (metadata overwritten with zeros).Why can't you do a special check for metapage identification? It
should be easy to add such a check. I have seen one of such tools
(pg_filedump) has similar check to skip metapage in certain code
paths.
I tried to add a metapage check, but getting such code to compile requires
it to include headers like gin_private.h (in PG versions < 10), which in
turn includes other headers that are forbidden to be included in what's
supposed to be FRONTEND code. (thread at [1]/messages/by-id/CA+TgmoZ=F=GkxV0YEv-A8tb+AEGy_Qa7GSiJ8deBKFATnzfEug@mail.gmail.com seems relevant in this
regard.) Another way to fix this might be to copy the GinMetaPageData
struct definition and a few other symbols into the tool's code and make
necessary checks using the same, instead of including gin_private.h.
Thanks,
Amit
[1]: /messages/by-id/CA+TgmoZ=F=GkxV0YEv-A8tb+AEGy_Qa7GSiJ8deBKFATnzfEug@mail.gmail.com
/messages/by-id/CA+TgmoZ=F=GkxV0YEv-A8tb+AEGy_Qa7GSiJ8deBKFATnzfEug@mail.gmail.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/19 23:31, Tom Lane wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:What are some arguments against setting pd_lower in the GIN metapage as
follows?Actually, hash index also has similar code (See _hash_init_metabuffer)
and I see no harm in doing this at similar other places.Seems reasonable.
Here is a patch that does it for the GIN metapage. (I am not sure if the
changes to gin_mask() that are included in the patch are really necessary.)
How about porting such a change to the back-branches if we do this at all?
The reason I'm asking is that a certain backup tool relies on pd_lower
values of data pages (disk blocks in relation files that are known to have
a valid PageHeaderData) to be correct to discard the portion of every page
that supposedly does not contain any useful information. The assumption
doesn't hold in the case of GIN metapage, so any GIN indexes contain
corrupted metapage after recovery (metadata overwritten with zeros).I'm not in favor of back-porting such a change. Even if we did, it would
only affect subsequently-created indexes not existing ones. That means
your tool has to cope with an unset pd_lower in any case --- and will for
the foreseeable future, because of pg_upgrade.I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
then don't trust it, but assume all of the page is valid data".
Actually, such a check is already in place in the tool, whose condition
looks like:
if (PageGetPageSize(header) == BLCKSZ &&
PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
(header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
header->pd_lower >= SizeOfPageHeaderData &&
header->pd_lower <= header->pd_upper &&
header->pd_upper <= header->pd_special &&
header->pd_special <= BLCKSZ &&
header->pd_special == MAXALIGN(header->pd_special) && ...
which even GIN metapage passes, making it an eligible data page and hence
for omitting the hole between pd_lower and pd_upper.
That's because a GIN metapage will always have undergone PageInit() that
sets pd_lower to SizeOfPageHeaderData. Which means the tool has to look
beyond the standard PageHeaderData to determine whether the area between
pd_lower and pd_upper is really a hole. Amit K also suggested the same,
but that seems to require either duplicating GIN's private struct
definition (of GinMetaPageData) in the tool or including backend's
gin_private.h, either of which doesn't seem to be a good thing to do in
what is FRONTEND code, but maybe there is no other way. Am I missing
something?
Thanks,
Amit
Attachments:
set-gin-metapage-pd_lower-v1.patchtext/plain; charset=UTF-8; name=set-gin-metapage-pd_lower-v1.patchDownload
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index d03d59da6a..b1755c6f1c 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -372,6 +372,13 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..95b842bef0 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -776,18 +776,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else
+ mask_unused_space(page);
}
On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/19 23:31, Tom Lane wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Mon, Jun 19, 2017 at 11:37 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:What are some arguments against setting pd_lower in the GIN metapage as
follows?Actually, hash index also has similar code (See _hash_init_metabuffer)
and I see no harm in doing this at similar other places.Seems reasonable.
Here is a patch that does it for the GIN metapage. (I am not sure if the
changes to gin_mask() that are included in the patch are really necessary.)How about porting such a change to the back-branches if we do this at all?
The reason I'm asking is that a certain backup tool relies on pd_lower
values of data pages (disk blocks in relation files that are known to have
a valid PageHeaderData) to be correct to discard the portion of every page
that supposedly does not contain any useful information. The assumption
doesn't hold in the case of GIN metapage, so any GIN indexes contain
corrupted metapage after recovery (metadata overwritten with zeros).I'm not in favor of back-porting such a change. Even if we did, it would
only affect subsequently-created indexes not existing ones. That means
your tool has to cope with an unset pd_lower in any case --- and will for
the foreseeable future, because of pg_upgrade.I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
then don't trust it, but assume all of the page is valid data".Actually, such a check is already in place in the tool, whose condition
looks like:if (PageGetPageSize(header) == BLCKSZ &&
PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
(header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
header->pd_lower >= SizeOfPageHeaderData &&
header->pd_lower <= header->pd_upper &&
header->pd_upper <= header->pd_special &&
header->pd_special <= BLCKSZ &&
header->pd_special == MAXALIGN(header->pd_special) && ...which even GIN metapage passes, making it an eligible data page and hence
for omitting the hole between pd_lower and pd_upper.
Won't checking for GIN_META in header->pd_flags gives you what you want?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/20 20:37, Amit Kapila wrote:
On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/19 23:31, Tom Lane wrote:
I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
then don't trust it, but assume all of the page is valid data".Actually, such a check is already in place in the tool, whose condition
looks like:if (PageGetPageSize(header) == BLCKSZ &&
PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
(header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
header->pd_lower >= SizeOfPageHeaderData &&
header->pd_lower <= header->pd_upper &&
header->pd_upper <= header->pd_special &&
header->pd_special <= BLCKSZ &&
header->pd_special == MAXALIGN(header->pd_special) && ...which even GIN metapage passes, making it an eligible data page and hence
for omitting the hole between pd_lower and pd_upper.Won't checking for GIN_META in header->pd_flags gives you what you want?
GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags,
which still requires including GIN's private header.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 21, 2017 at 9:42 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/20 20:37, Amit Kapila wrote:
On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/19 23:31, Tom Lane wrote:
I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
then don't trust it, but assume all of the page is valid data".Actually, such a check is already in place in the tool, whose condition
looks like:if (PageGetPageSize(header) == BLCKSZ &&
PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
(header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
header->pd_lower >= SizeOfPageHeaderData &&
header->pd_lower <= header->pd_upper &&
header->pd_upper <= header->pd_special &&
header->pd_special <= BLCKSZ &&
header->pd_special == MAXALIGN(header->pd_special) && ...which even GIN metapage passes, making it an eligible data page and hence
for omitting the hole between pd_lower and pd_upper.Won't checking for GIN_META in header->pd_flags gives you what you want?
GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags,
which still requires including GIN's private header.
Did you check this patch with wal_consistency_checking? I am getting
failures so your patch does not have the masking of GIN pages
completely right:
FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
That's easily reproducible with installcheck and a standby replaying
the changes. I did not look at the code in details to see what you may
be missing here.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/22 16:56, Michael Paquier wrote:
On Wed, Jun 21, 2017 at 9:42 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/20 20:37, Amit Kapila wrote:
On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/19 23:31, Tom Lane wrote:
I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
then don't trust it, but assume all of the page is valid data".Actually, such a check is already in place in the tool, whose condition
looks like:if (PageGetPageSize(header) == BLCKSZ &&
PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
(header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
header->pd_lower >= SizeOfPageHeaderData &&
header->pd_lower <= header->pd_upper &&
header->pd_upper <= header->pd_special &&
header->pd_special <= BLCKSZ &&
header->pd_special == MAXALIGN(header->pd_special) && ...which even GIN metapage passes, making it an eligible data page and hence
for omitting the hole between pd_lower and pd_upper.Won't checking for GIN_META in header->pd_flags gives you what you want?
GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags,
which still requires including GIN's private header.Did you check this patch with wal_consistency_checking? I am getting
failures so your patch does not have the masking of GIN pages
completely right:
FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
That's easily reproducible with installcheck and a standby replaying
the changes. I did not look at the code in details to see what you may
be missing here.
Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking.
Actually, the WAL consistency check fails even without patching
gin_mask(), so the problem may be with the main patch itself. That is,
the patch needs to do something else other than just teaching
GinInitMetabuffer() to initialize pd_lower. Will look into that.
Thanks,
Amit
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/22 16:56, Michael Paquier wrote:
On Wed, Jun 21, 2017 at 9:42 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/20 20:37, Amit Kapila wrote:
On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/19 23:31, Tom Lane wrote:
I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
then don't trust it, but assume all of the page is valid data".Actually, such a check is already in place in the tool, whose condition
looks like:if (PageGetPageSize(header) == BLCKSZ &&
PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
(header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
header->pd_lower >= SizeOfPageHeaderData &&
header->pd_lower <= header->pd_upper &&
header->pd_upper <= header->pd_special &&
header->pd_special <= BLCKSZ &&
header->pd_special == MAXALIGN(header->pd_special) && ...which even GIN metapage passes, making it an eligible data page and hence
for omitting the hole between pd_lower and pd_upper.Won't checking for GIN_META in header->pd_flags gives you what you want?
GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags,
which still requires including GIN's private header.Did you check this patch with wal_consistency_checking? I am getting
failures so your patch does not have the masking of GIN pages
completely right:
FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
That's easily reproducible with installcheck and a standby replaying
the changes. I did not look at the code in details to see what you may
be missing here.Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking.
Actually, the WAL consistency check fails even without patching
gin_mask(), so the problem may be with the main patch itself. That is,
the patch needs to do something else other than just teaching
GinInitMetabuffer() to initialize pd_lower. Will look into that.
I've not read the code deeply but I guess we should use
GinInitMetabuffer() in ginRedoUpdateMetapage() instead of
GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is
the same.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/23 10:22, Masahiko Sawada wrote:
On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/22 16:56, Michael Paquier wrote:
Did you check this patch with wal_consistency_checking? I am getting
failures so your patch does not have the masking of GIN pages
completely right:
FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
That's easily reproducible with installcheck and a standby replaying
the changes. I did not look at the code in details to see what you may
be missing here.Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking.
Actually, the WAL consistency check fails even without patching
gin_mask(), so the problem may be with the main patch itself. That is,
the patch needs to do something else other than just teaching
GinInitMetabuffer() to initialize pd_lower. Will look into that.I've not read the code deeply but I guess we should use
GinInitMetabuffer() in ginRedoUpdateMetapage() instead of
GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is
the same.
That was it, thanks for the pointer.
Attached updated patch, which I confirmed, passes wal_consistency_check = gin.
Thanks,
Amit
Attachments:
set-gin-metapage-pd_lower-v2.patchtext/plain; charset=UTF-8; name=set-gin-metapage-pd_lower-v2.patchDownload
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 91e4a8cf70..52de599b29 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -372,6 +372,13 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..6d2e528852 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -776,18 +776,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else
+ mask_unused_space(page);
}
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/23 10:22, Masahiko Sawada wrote:
On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/22 16:56, Michael Paquier wrote:
Did you check this patch with wal_consistency_checking? I am getting
failures so your patch does not have the masking of GIN pages
completely right:
FATAL: inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
CONTEXT: WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
That's easily reproducible with installcheck and a standby replaying
the changes. I did not look at the code in details to see what you may
be missing here.Oh, wasn't sure about the gin_mask() changes myself. Thanks for checking.
Actually, the WAL consistency check fails even without patching
gin_mask(), so the problem may be with the main patch itself. That is,
the patch needs to do something else other than just teaching
GinInitMetabuffer() to initialize pd_lower. Will look into that.I've not read the code deeply but I guess we should use
GinInitMetabuffer() in ginRedoUpdateMetapage() instead of
GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is
the same.That was it, thanks for the pointer.
Attached updated patch, which I confirmed, passes wal_consistency_check = gin.
Thank you for updating the patch. It looks good to me.
BTW I'm inclined to have a regression test case where doing 'make
check' to the streaming replication environment with
wal_consistency_check on standby server so that we can detect a bug
around the wal.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 23, 2017 at 2:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for updating the patch. It looks good to me.
BTW I'm inclined to have a regression test case where doing 'make
check' to the streaming replication environment with
wal_consistency_check on standby server so that we can detect a bug
around the wal.
This would be very costly. A single run of the main regression tests
generates 15 to 20GB of FPWs if I recall correctly. Tiny buildfarm
members would suffer on that. *cough*
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/23 15:07, Michael Paquier wrote:
On Fri, Jun 23, 2017 at 2:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for updating the patch. It looks good to me.
BTW I'm inclined to have a regression test case where doing 'make
check' to the streaming replication environment with
wal_consistency_check on standby server so that we can detect a bug
around the wal.This would be very costly. A single run of the main regression tests
generates 15 to 20GB of FPWs if I recall correctly. Tiny buildfarm
members would suffer on that. *cough*
Initially, I had naively set wal_consistency_check = all before running
make installcheck and then had to wait for a long time to confirm that WAL
generated by the gin test indeed caused consistency check failure on the
standby with the v1 patch.
But I can see Sawada-san's point that there should be some way for
developers writing code that better had gone through WAL consistency
checking facility to do it without much hassle. But then again, it may
not be that frequent to need that.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Initially, I had naively set wal_consistency_check = all before running
make installcheck and then had to wait for a long time to confirm that WAL
generated by the gin test indeed caused consistency check failure on the
standby with the v1 patch.
wal_consistency_check = gin would have saved you a lot of I/O.
But I can see Sawada-san's point that there should be some way for
developers writing code that better had gone through WAL consistency
checking facility to do it without much hassle. But then again, it may
not be that frequent to need that.
Yeah, I have my own set of generic scripts for that. There could be a
set of modules out of the main check-world, the risk that those finish
rotting is high though...
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Initially, I had naively set wal_consistency_check = all before running
make installcheck and then had to wait for a long time to confirm that WAL
generated by the gin test indeed caused consistency check failure on the
standby with the v1 patch.wal_consistency_check = gin would have saved you a lot of I/O.
But I can see Sawada-san's point that there should be some way for
developers writing code that better had gone through WAL consistency
checking facility to do it without much hassle. But then again, it may
not be that frequent to need that.
Yeah, it should be optional. I imagined providing such an option of
pg_regress or TAP test for the developers.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Masahiko Sawada wrote:
On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Initially, I had naively set wal_consistency_check = all before running
make installcheck and then had to wait for a long time to confirm that WAL
generated by the gin test indeed caused consistency check failure on the
standby with the v1 patch.wal_consistency_check = gin would have saved you a lot of I/O.
But I can see Sawada-san's point that there should be some way for
developers writing code that better had gone through WAL consistency
checking facility to do it without much hassle. But then again, it may
not be that frequent to need that.Yeah, it should be optional. I imagined providing such an option of
pg_regress or TAP test for the developers.
As far as I know it is possible to have third-party modules that extend
the buildfarm client script so that it runs additional tests that the
standard ones. You could have a custom module installed in some
powerful machine of yours that runs the WAL consistency check and report
the results to the buildfarm. A single animal running that test should
be enough, right?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
That was it, thanks for the pointer.
GinInitMetabuffer() sets up pd_lower and pd_upper anyway using
PageInit so the check of PageIsVerified is guaranteed to work in any
case. Upgraded pages will still have their pd_lower set to the
previous values, and new pages will have the optimization. So this
patch is actually harmless for past pages, while newer ones are seen
as more compressible.
Attached updated patch, which I confirmed, passes wal_consistency_check = gin.
I have spent some time looking at this patch, playing with pg_upgrade
to check the state of the page upgraded. And this looks good to me.
One thing that I noticed is that this optimization could as well
happen for spgist meta pages. What do others think?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jun 24, 2017 at 1:35 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Masahiko Sawada wrote:
On Fri, Jun 23, 2017 at 3:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Fri, Jun 23, 2017 at 3:17 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Initially, I had naively set wal_consistency_check = all before running
make installcheck and then had to wait for a long time to confirm that WAL
generated by the gin test indeed caused consistency check failure on the
standby with the v1 patch.wal_consistency_check = gin would have saved you a lot of I/O.
But I can see Sawada-san's point that there should be some way for
developers writing code that better had gone through WAL consistency
checking facility to do it without much hassle. But then again, it may
not be that frequent to need that.Yeah, it should be optional. I imagined providing such an option of
pg_regress or TAP test for the developers.As far as I know it is possible to have third-party modules that extend
the buildfarm client script so that it runs additional tests that the
standard ones. You could have a custom module installed in some
powerful machine of yours that runs the WAL consistency check and report
the results to the buildfarm. A single animal running that test should
be enough, right?
Yes, thank you for the information. It's a good idea. I'll try it.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 26, 2017 at 10:54 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:That was it, thanks for the pointer.
GinInitMetabuffer() sets up pd_lower and pd_upper anyway using
PageInit so the check of PageIsVerified is guaranteed to work in any
case. Upgraded pages will still have their pd_lower set to the
previous values, and new pages will have the optimization. So this
patch is actually harmless for past pages, while newer ones are seen
as more compressible.Attached updated patch, which I confirmed, passes wal_consistency_check = gin.
I have spent some time looking at this patch, playing with pg_upgrade
to check the state of the page upgraded. And this looks good to me.
One thing that I noticed is that this optimization could as well
happen for spgist meta pages. What do others think?
Good point. I think it could happen for brin meta page as well.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/26 10:54, Michael Paquier wrote:
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:That was it, thanks for the pointer.
GinInitMetabuffer() sets up pd_lower and pd_upper anyway using
PageInit so the check of PageIsVerified is guaranteed to work in any
case. Upgraded pages will still have their pd_lower set to the
previous values, and new pages will have the optimization. So this
patch is actually harmless for past pages, while newer ones are seen
as more compressible.
Right.
Attached updated patch, which I confirmed, passes wal_consistency_check = gin.
I have spent some time looking at this patch, playing with pg_upgrade
to check the state of the page upgraded. And this looks good to me.
Thanks for the review.
One thing that I noticed is that this optimization could as well
happen for spgist meta pages. What do others think?
I agree. As Sawada-san mentioned, brin metapage code can use a similar patch.
So attached are three patches for gin, brin, and sp-gist respectively.
Both brin and sp-gist cases didn't require any special consideration for
passing wal_consistency_checking, as the patch didn't cause brin and
sp-gist metapages to become invalid when recreated on standby (remember
that patch 0001 needed to be updated for that).
Thanks,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From 05d15c1ed3a49dda0e82f9fc18c9af3c6c15ebff Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.
---
src/backend/access/gin/ginutil.c | 7 +++++++
src/backend/access/gin/ginxlog.c | 23 ++++++++---------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 91e4a8cf70..52de599b29 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -372,6 +372,13 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..6d2e528852 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -776,18 +776,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From 260b2536520ab665c44ccc1a78053013dc202c6f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage
---
src/backend/access/brin/brin_pageops.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..8762356433 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,13 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is to log full
+ * page image of metapage in xloginsert.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From 8ce6e5493879b4d381628f1199498c7b0de82c59 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/3] Set pd_lower correctly in the SP-GiST index metapage
---
src/backend/access/spgist/spgutils.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 8656af453c..f031c8c68c 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -534,6 +534,13 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is to log full
+ * page image of metapage in xloginsert.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
--
2.11.0
On Mon, Jun 26, 2017 at 3:54 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/26 10:54, Michael Paquier wrote:
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:That was it, thanks for the pointer.
GinInitMetabuffer() sets up pd_lower and pd_upper anyway using
PageInit so the check of PageIsVerified is guaranteed to work in any
case. Upgraded pages will still have their pd_lower set to the
previous values, and new pages will have the optimization. So this
patch is actually harmless for past pages, while newer ones are seen
as more compressible.Right.
Attached updated patch, which I confirmed, passes wal_consistency_check = gin.
I have spent some time looking at this patch, playing with pg_upgrade
to check the state of the page upgraded. And this looks good to me.Thanks for the review.
One thing that I noticed is that this optimization could as well
happen for spgist meta pages. What do others think?I agree. As Sawada-san mentioned, brin metapage code can use a similar patch.
So attached are three patches for gin, brin, and sp-gist respectively.
Both brin and sp-gist cases didn't require any special consideration for
passing wal_consistency_checking, as the patch didn't cause brin and
sp-gist metapages to become invalid when recreated on standby (remember
that patch 0001 needed to be updated for that).
Thank you for the patches! I checked additional patches for brin and
spgist. They look good to me.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for the patches! I checked additional patches for brin and
spgist. They look good to me.
Last versions are still missing something: brin_mask() and spg_mask()
can be updated so as mask_unused_space() is called for meta pages.
Except that the patches look to be on the right track.
By the way, as this is an optimization and not an actual bug, could
you add this patch to the next commit fest? I don't think that this
should get into 10. The focus is to stabilize things lately.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/06/27 10:22, Michael Paquier wrote:
On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for the patches! I checked additional patches for brin and
spgist. They look good to me.Last versions are still missing something: brin_mask() and spg_mask()
can be updated so as mask_unused_space() is called for meta pages.
Except that the patches look to be on the right track.
Thanks for the review.
I updated brin_mask() and spg_mask() in the attached updated patches so
that they consider meta pages as containing unused space.
By the way, as this is an optimization and not an actual bug, could
you add this patch to the next commit fest? I don't think that this
should get into 10. The focus is to stabilize things lately.
Sure, done.
Thanks,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage_v2.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage_v2.patchDownload
From b05760739a36f3247019e75a9029ece7ab0bb09c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.
---
src/backend/access/gin/ginutil.c | 7 +++++++
src/backend/access/gin/ginxlog.c | 23 ++++++++---------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 91e4a8cf70..52de599b29 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -372,6 +372,13 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..6d2e528852 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -776,18 +776,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage_v2.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage_v2.patchDownload
From 3f197e13dcef968e8f0a5ff17cf0c328458cde34 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage
---
src/backend/access/brin/brin_pageops.c | 7 +++++++
src/backend/access/brin/brin_xlog.c | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..8762356433 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,13 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is to log full
+ * page image of metapage in xloginsert.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index dff7198a39..36e6e1411a 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -336,7 +336,7 @@ brin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ if (BRIN_IS_REGULAR_PAGE(page) || BRIN_IS_META_PAGE(page))
{
/* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage_v2.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage_v2.patchDownload
From 32c5f42c17e9fae57ed12c2326470de5e20f964c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/3] Set pd_lower correctly in the SP-GiST index metapage
---
src/backend/access/spgist/spgutils.c | 7 +++++++
src/backend/access/spgist/spgxlog.c | 7 +------
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 8656af453c..f031c8c68c 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -534,6 +534,13 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is to log full
+ * page image of metapage in xloginsert.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index c440d21715..9ea3fd8e8d 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1038,10 +1038,5 @@ spg_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
- /*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
- */
- if (!SpGistPageIsMeta(page))
- mask_unused_space(page);
+ mask_unused_space(page);
}
--
2.11.0
On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/06/27 10:22, Michael Paquier wrote:
On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for the patches! I checked additional patches for brin and
spgist. They look good to me.Last versions are still missing something: brin_mask() and spg_mask()
can be updated so as mask_unused_space() is called for meta pages.
Except that the patches look to be on the right track.Thanks for the review.
I updated brin_mask() and spg_mask() in the attached updated patches so
that they consider meta pages as containing unused space.
Thanks for the new version. I had an extra look at those patches, and
I am happy with its shape. I also have been doing more testing with
pg_upgrade and wal_consistency_checking, and found no issues. So
switched status as ready for committer. Everything could be put into a
single commit.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/08/22 9:39, Michael Paquier wrote:
On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/06/27 10:22, Michael Paquier wrote:
On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for the patches! I checked additional patches for brin and
spgist. They look good to me.Last versions are still missing something: brin_mask() and spg_mask()
can be updated so as mask_unused_space() is called for meta pages.
Except that the patches look to be on the right track.Thanks for the review.
I updated brin_mask() and spg_mask() in the attached updated patches so
that they consider meta pages as containing unused space.Thanks for the new version. I had an extra look at those patches, and
I am happy with its shape. I also have been doing more testing with
pg_upgrade and wal_consistency_checking, and found no issues. So
switched status as ready for committer. Everything could be put into a
single commit.
Thanks for the review. Agreed about committing these together.
Regards,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/08/22 9:39, Michael Paquier wrote:
On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:I updated brin_mask() and spg_mask() in the attached updated patches so
that they consider meta pages as containing unused space.
I looked briefly at these patches. I'm not sure that it's safe for the
mask functions to assume that meta pages always have valid pd_lower.
What happens when replaying log data concerning an old index that doesn't
have that field filled?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/08/22 9:39, Michael Paquier wrote:
On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:I updated brin_mask() and spg_mask() in the attached updated patches so
that they consider meta pages as containing unused space.I looked briefly at these patches. I'm not sure that it's safe for the
mask functions to assume that meta pages always have valid pd_lower.
What happens when replaying log data concerning an old index that doesn't
have that field filled?
There will be inconsistency between the pages, and the masking check
will complain. My point here is that wal_consistency_checking is
primarily used by developers on newly-deployed clusters to check WAL
consistency by using installcheck. So an upgraded cluster would see
diffs because of that, but I would think that nobody would really face
them. Perhaps we should document this point for wal_consistency_check?
Getting the patch discussed on this thread into v10 would have saved
the day here, but this boat has sailed already.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/07 8:51, Michael Paquier wrote:
On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/08/22 9:39, Michael Paquier wrote:
On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:I updated brin_mask() and spg_mask() in the attached updated patches so
that they consider meta pages as containing unused space.I looked briefly at these patches. I'm not sure that it's safe for the
mask functions to assume that meta pages always have valid pd_lower.
What happens when replaying log data concerning an old index that doesn't
have that field filled?There will be inconsistency between the pages, and the masking check
will complain. My point here is that wal_consistency_checking is
primarily used by developers on newly-deployed clusters to check WAL
consistency by using installcheck. So an upgraded cluster would see
diffs because of that, but I would think that nobody would really face
them.
I too tend to think that any users who use this masking facility would
know to expect to get these failures on upgraded clusters with invalid
pd_lower in meta pages.
(PS: I wonder if it is reasonable to allow configuring the error level
used when a masking failure occurs? Currently, checkXLogConsistency()
will abort the process (FATAL))
Perhaps we should document this point for wal_consistency_check?
Do you mean permanently under wal_consistency_check parameter
documentation or in the release notes under incompatibilities for the
affected index types?
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 7, 2017 at 10:42 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I too tend to think that any users who use this masking facility would
know to expect to get these failures on upgraded clusters with invalid
pd_lower in meta pages.
Yes, I don't think that an optimization reducing WAL that impacts all
users should be stopped by a small set of users who use an option for
development purposes.
(PS: I wonder if it is reasonable to allow configuring the error level
used when a masking failure occurs? Currently, checkXLogConsistency()
will abort the process (FATAL))
It definitely is worth it in my opinion, perhaps with an on/off switch
to trigger a warning instead. The reason why we use FATAL now is to
trigger more easily red flags for any potential buildfarm runs: a
startup process facing FATAL takes down the standby.
Perhaps we should document this point for wal_consistency_check?
Do you mean permanently under wal_consistency_check parameter
documentation or in the release notes under incompatibilities for the
affected index types?
Under the parameter itself, in the spirit of a don't-do-that from an
upgraded instance.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I looked briefly at these patches. I'm not sure that it's safe for the
mask functions to assume that meta pages always have valid pd_lower.
What happens when replaying log data concerning an old index that doesn't
have that field filled?
There will be inconsistency between the pages, and the masking check
will complain.
That doesn't seem like a pleasant outcome to me. The WAL consistency
check code is supposed to complain if there's some kind of replication
or replay failure, and this cannot be categorized as either.
The idea I'd had was to apply the masking only if pd_lower >=
SizeOfPageHeaderData, or if you wanted to be stricter, only if
pd_lower != 0.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The idea I'd had was to apply the masking only if pd_lower >=
SizeOfPageHeaderData, or if you wanted to be stricter, only if
pd_lower != 0.
If putting a check, it seems to me that the stricter one makes the
most sense. pd_lower should remain at 0 on pre-10 servers.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/07 13:09, Michael Paquier wrote:
On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The idea I'd had was to apply the masking only if pd_lower >=
SizeOfPageHeaderData, or if you wanted to be stricter, only if
pd_lower != 0.If putting a check, it seems to me that the stricter one makes the
most sense.
OK, patches updated that way.
pd_lower should remain at 0 on pre-10 servers.
Doesn't PageInit(), which is where any page gets initialized, has always
set pd_lower to SizeOfPageHeaderData?
Thanks,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From 681c0ea82d0d9acd37f1979ebe1918d8636b0d26 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.
---
src/backend/access/gin/ginutil.c | 7 +++++++
src/backend/access/gin/ginxlog.c | 24 +++++++++---------------
2 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..ebac391818 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,13 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..f5c11b2d9a 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
gin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
GinPageOpaque opaque;
mask_page_lsn(page);
@@ -776,18 +777,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower != 0)
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From a605ae99664138aa827500216567137be97d0460 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage
---
src/backend/access/brin/brin_pageops.c | 7 +++++++
src/backend/access/brin/brin_xlog.c | 9 +++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..8762356433 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,13 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is to log full
+ * page image of metapage in xloginsert.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index dff7198a39..1309d44b04 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -331,14 +331,19 @@ void
brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ /*
+ * Regular brin pages contain unused space which needs to be masked.
+ * Similarly for meta pages, but only if pd_lower has been set.
+ */
+ if (BRIN_IS_REGULAR_PAGE(page) ||
+ (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower != 0))
{
- /* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
}
}
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From e01d37b77b3e761a499850aa80de1c403f5dab86 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/3] Set pd_lower correctly in the SP-GiST index metapage
---
src/backend/access/spgist/spgutils.c | 7 +++++++
src/backend/access/spgist/spgxlog.c | 8 +++-----
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..2f21f623da 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -534,6 +534,13 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is to log full
+ * page image of metapage in xloginsert.c.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index c440d21715..bf209416f2 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,13 @@ void
spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- /*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
- */
- if (!SpGistPageIsMeta(page))
+ /* Mask the unused space, provided the page's pd_lower is set. */
+ if (pagehdr->pd_lower != 0)
mask_unused_space(page);
}
--
2.11.0
On Thu, Sep 7, 2017 at 2:40 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/07 13:09, Michael Paquier wrote:
pd_lower should remain at 0 on pre-10 servers.
Doesn't PageInit(), which is where any page gets initialized, has always
set pd_lower to SizeOfPageHeaderData?
Yes, sorry. I had a mind slippery here..
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 6, 2017 at 9:42 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I too tend to think that any users who use this masking facility would
know to expect to get these failures on upgraded clusters with invalid
pd_lower in meta pages.
I strongly disagree.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
On Thu, Sep 7, 2017 at 2:40 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/09/07 13:09, Michael Paquier wrote:
pd_lower should remain at 0 on pre-10 servers.
Doesn't PageInit(), which is where any page gets initialized, has always
set pd_lower to SizeOfPageHeaderData?
Yes, sorry. I had a mind slippery here..
Indeed, which raises the question of how did any of this code work at all?
Up to now, these pages have been initialized with pd_lower =
SizeOfPageHeaderData, *not* zero. Which means that the FPI code would
have considered the metadata to be part of a valid "hole" and not stored
it, if we'd ever told the FPI code that the metadata page was of standard
format. I've just looked through the code for these three index types and
can't find any place where they do that --- for instance, they don't pass
REGBUF_STANDARD to XLogRegisterBuffer when writing their metapages, and
they pass page_std = false to log_newpage when using that for metapages.
Good thing, because they'd be completely broken otherwise.
This means that the premise of this patch is wrong. Adjusting pd_lower,
by itself, would accomplish precisely zip for WAL compression, because
we'd still not be telling the WAL code to compress out the hole.
To get any benefit, I think we'd need to do all of the following:
1. Initialize pd_lower correctly in the metapage init functions, as here.
2. Any place we are about to write the metapage, set its pd_lower to the
correct value, in case it's an old index that doesn't have that set
correctly. Fortunately this is cheap enough that we might as well just
do it unconditionally.
3. Adjust all the xlog calls to tell them the page is of standard format.
Now, one advantage we'd get from this is that we could say confidently
that any index metapage appearing in a WAL stream generated by v11 or
later has got the right pd_lower; therefore we could dispense with
checking for wrong pd_lower in the mask functions (unless they're used
in some way I don't know about, which is surely possible).
BTW, while nbtree correctly initializes pd_lower, it looks to me like it
is not exploiting that, because it seems never to pass REGBUF_STANDARD for
the metapage anyway. I think this doesn't matter performance-wise for
nbtree, because it seems to always pass REGBUF_WILL_INIT instead.
But if we do likewise in other index types then they aren't really going
to win anyway. GIN at least looks like it might do that; I have not
gone through any of the index types in detail.
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.
I'll set the CF entry back to Waiting on Author.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This means that the premise of this patch is wrong. Adjusting pd_lower,
by itself, would accomplish precisely zip for WAL compression, because
we'd still not be telling the WAL code to compress out the hole.To get any benefit, I think we'd need to do all of the following:
1. Initialize pd_lower correctly in the metapage init functions, as here.
[...]
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.I'll set the CF entry back to Waiting on Author.
I did some measurements of the compressibility of the GIN meta page,
looking at its FPWs with and without wal_compression and you are
right: there is no direct compressibility effect when setting pd_lower
on the meta page. However, it seems to me that there is an argument
still pleading on favor of this patch for wal_consistency_checking.
On HEAD pd_lower gets set to 24 and pd_upper to 8184 for GIN meta
pages. With the patch, it gets at 80. On top of cleaning up the
masking functions GIN, BRIN and SpGist by removing some exceptions in
their handling, we are able to get a better masked page because it is
possible to mask a portion that we *know* is unused. So even if there
are no compressibility benefits, I think that it actually helps in
tracking down inconsistencies in meta pages by having a better
precision lookup. So I would still vote for integrating the patch
as-is, with the addition of a comment to mention that the
compressibility optimization is not used yet, though this is helpful
when masking the page. The same comment ought to be mentioned for
btree.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.
I did some measurements of the compressibility of the GIN meta page,
looking at its FPWs with and without wal_compression and you are
right: there is no direct compressibility effect when setting pd_lower
on the meta page. However, it seems to me that there is an argument
still pleading on favor of this patch for wal_consistency_checking.
I think that would be true if we did both my point 1 and 2, so that
the wal replay functions could trust pd_lower to be sane in all cases.
But really, if you have to touch all the places that write these
metapages, you might as well mark them REGBUF_STANDARD while at it.
The same comment ought to be mentioned for btree.
Yeah, I was wondering if we ought not clean up btree/hash while at it.
At the very least, their existing comments saying that it's inessential
to set pd_lower could use some more detail about why or why not.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 8, 2017 at 1:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Thu, Sep 7, 2017 at 2:40 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/09/07 13:09, Michael Paquier wrote:
pd_lower should remain at 0 on pre-10 servers.
Doesn't PageInit(), which is where any page gets initialized, has always
set pd_lower to SizeOfPageHeaderData?Yes, sorry. I had a mind slippery here..
Indeed, which raises the question of how did any of this code work at all?
Up to now, these pages have been initialized with pd_lower =
SizeOfPageHeaderData, *not* zero. Which means that the FPI code would
have considered the metadata to be part of a valid "hole" and not stored
it, if we'd ever told the FPI code that the metadata page was of standard
format. I've just looked through the code for these three index types and
can't find any place where they do that --- for instance, they don't pass
REGBUF_STANDARD to XLogRegisterBuffer when writing their metapages, and
they pass page_std = false to log_newpage when using that for metapages.
Good thing, because they'd be completely broken otherwise.This means that the premise of this patch is wrong. Adjusting pd_lower,
by itself, would accomplish precisely zip for WAL compression, because
we'd still not be telling the WAL code to compress out the hole.To get any benefit, I think we'd need to do all of the following:
1. Initialize pd_lower correctly in the metapage init functions, as here.
2. Any place we are about to write the metapage, set its pd_lower to the
correct value, in case it's an old index that doesn't have that set
correctly. Fortunately this is cheap enough that we might as well just
do it unconditionally.3. Adjust all the xlog calls to tell them the page is of standard format.
Now, one advantage we'd get from this is that we could say confidently
that any index metapage appearing in a WAL stream generated by v11 or
later has got the right pd_lower; therefore we could dispense with
checking for wrong pd_lower in the mask functions (unless they're used
in some way I don't know about, which is surely possible).BTW, while nbtree correctly initializes pd_lower, it looks to me like it
is not exploiting that, because it seems never to pass REGBUF_STANDARD for
the metapage anyway.
During the creation of the index, it uses log_newpage to log metapage
and there it uses REGBUF_STANDARD. So, there seems to be some use of
it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.I did some measurements of the compressibility of the GIN meta page,
looking at its FPWs with and without wal_compression and you are
right: there is no direct compressibility effect when setting pd_lower
on the meta page. However, it seems to me that there is an argument
still pleading on favor of this patch for wal_consistency_checking.I think that would be true if we did both my point 1 and 2, so that
the wal replay functions could trust pd_lower to be sane in all cases.
But really, if you have to touch all the places that write these
metapages, you might as well mark them REGBUF_STANDARD while at it.The same comment ought to be mentioned for btree.
Yeah, I was wondering if we ought not clean up btree/hash while at it.
At the very least, their existing comments saying that it's inessential
to set pd_lower could use some more detail about why or why not.
+1. I think we can even use REGBUF_STANDARD in the hash for metapage
where currently it is not used. I can give a try to write a patch for
hash/btree part if you want.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.I did some measurements of the compressibility of the GIN meta page,
looking at its FPWs with and without wal_compression and you are
right: there is no direct compressibility effect when setting pd_lower
on the meta page. However, it seems to me that there is an argument
still pleading on favor of this patch for wal_consistency_checking.I think that would be true if we did both my point 1 and 2, so that
the wal replay functions could trust pd_lower to be sane in all cases.
But really, if you have to touch all the places that write these
metapages, you might as well mark them REGBUF_STANDARD while at it.The same comment ought to be mentioned for btree.
Yeah, I was wondering if we ought not clean up btree/hash while at it.
At the very least, their existing comments saying that it's inessential
to set pd_lower could use some more detail about why or why not.+1. I think we can even use REGBUF_STANDARD in the hash for metapage
where currently it is not used. I can give a try to write a patch for
hash/btree part if you want.
Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hash, would you, Amit L, write the part
for GIN, BRIN and SpGist? This needs a careful lookup as many code
paths need a lookup so it may take time. Please note that I don't mind
double-checking this part if you don't have enough room to do so.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.I did some measurements of the compressibility of the GIN meta page,
looking at its FPWs with and without wal_compression and you are
right: there is no direct compressibility effect when setting pd_lower
on the meta page. However, it seems to me that there is an argument
still pleading on favor of this patch for wal_consistency_checking.I think that would be true if we did both my point 1 and 2, so that
the wal replay functions could trust pd_lower to be sane in all cases.
But really, if you have to touch all the places that write these
metapages, you might as well mark them REGBUF_STANDARD while at it.The same comment ought to be mentioned for btree.
Yeah, I was wondering if we ought not clean up btree/hash while at it.
At the very least, their existing comments saying that it's inessential
to set pd_lower could use some more detail about why or why not.+1. I think we can even use REGBUF_STANDARD in the hash for metapage
where currently it is not used. I can give a try to write a patch for
hash/btree part if you want.Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hash
I think here we should first agree on what we want to do. Based on
Tom's comment, I was thinking of changing comments in btree/hash part
and additionally for hash indexes, I can see if we can pass
REGBUF_STANDARD for all usages of metapage. I am not sure if we want
similar exercise for btree as well.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 3:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think here we should first agree on what we want to do. Based on
Tom's comment, I was thinking of changing comments in btree/hash part
and additionally for hash indexes, I can see if we can pass
REGBUF_STANDARD for all usages of metapage. I am not sure if we want
similar exercise for btree as well.
Changing only comments for now looks like a good idea to me.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes:
On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hash
I think here we should first agree on what we want to do. Based on
Tom's comment, I was thinking of changing comments in btree/hash part
and additionally for hash indexes, I can see if we can pass
REGBUF_STANDARD for all usages of metapage. I am not sure if we want
similar exercise for btree as well.
FWIW, now that we've noticed the discrepancy, I'm for using
REGBUF_STANDARD or equivalent for all metapage calls. Even if it
saves no space, inconsistency is bad because it's confusing. And
Michael is correct to point out that we can exploit this to
improve WAL consistency checking.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 1:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
FWIW, now that we've noticed the discrepancy, I'm for using
REGBUF_STANDARD or equivalent for all metapage calls. Even if it
saves no space, inconsistency is bad because it's confusing.
OK, I don't mind having a more aggressive approach, but it would
definitely be nicer if the REGBUF additions are done in a second patch
that can be applied on top of the one setting pd_lower where needed.
Both things are linked, still are aimed to solve different problems.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Sep 10, 2017 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hashI think here we should first agree on what we want to do. Based on
Tom's comment, I was thinking of changing comments in btree/hash part
and additionally for hash indexes, I can see if we can pass
REGBUF_STANDARD for all usages of metapage. I am not sure if we want
similar exercise for btree as well.FWIW, now that we've noticed the discrepancy, I'm for using
REGBUF_STANDARD or equivalent for all metapage calls. Even if it
saves no space, inconsistency is bad because it's confusing.
Agreed. However, I feel there is no harm in doing in two patches, one
for hash/btree and second for all other indexes (or maybe separate
patches for them as well; I haven't yet looked into the work involved
for other indexes) unless you prefer to do it all at a one-shot.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 7:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Sep 10, 2017 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Sun, Sep 10, 2017 at 11:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hashI think here we should first agree on what we want to do. Based on
Tom's comment, I was thinking of changing comments in btree/hash part
and additionally for hash indexes, I can see if we can pass
REGBUF_STANDARD for all usages of metapage. I am not sure if we want
similar exercise for btree as well.FWIW, now that we've noticed the discrepancy, I'm for using
REGBUF_STANDARD or equivalent for all metapage calls. Even if it
saves no space, inconsistency is bad because it's confusing.Agreed. However, I feel there is no harm in doing in two patches, one
for hash/btree and second for all other indexes (or maybe separate
patches for them as well; I haven't yet looked into the work involved
for other indexes) unless you prefer to do it all at a one-shot.
I have prepared separate patches for hash and btree index. I think
for another type of indexes, it is better to first fix the pd_lower
issue.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
change_metapage_usage_hash-v1.patchapplication/octet-stream; name=change_metapage_usage_hash-v1.patchDownload
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 0579841..ad0ef55 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -403,7 +403,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHashInitMetaPage);
- XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_HASH_ID, XLOG_HASH_INIT_META_PAGE);
@@ -592,8 +592,13 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
metap->hashm_firstfree = 0;
/*
- * Set pd_lower just past the end of the metadata. This is to log full
- * page image of metapage in xloginsert.c.
+ * Set pd_lower just past the end of the metadata. This is to avoid
+ * logging hole when we backup full page image of metapage in
+ * xloginsert.c.
+ *
+ * This won't be of any help unless we use option like REGBUF_STANDARD
+ * while registering the buffer for metapage as otherwise, it won't try to
+ * remove the hole while logging the full page image.
*/
((PageHeader) page)->pd_lower =
((char *) metap + sizeof(HashMetaPageData)) - (char *) page;
change_metapage_usage_btree-v1.patchapplication/octet-stream; name=change_metapage_usage_btree-v1.patchDownload
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index bf963fc..5cbaba1 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -898,7 +898,7 @@ _bt_insertonpg(Relation rel,
xlmeta.fastroot = metad->btm_fastroot;
xlmeta.fastlevel = metad->btm_fastlevel;
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBufData(2, (char *) &xlmeta, sizeof(xl_btree_metadata));
xlinfo = XLOG_BTREE_INSERT_META;
@@ -2032,7 +2032,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
XLogRegisterBuffer(1, lbuf, REGBUF_STANDARD);
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
md.root = rootblknum;
md.level = metad->btm_level;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 5c817b6..8f38610 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -67,6 +67,10 @@ _bt_initmetapage(Page page, BlockNumber rootbknum, uint32 level)
/*
* Set pd_lower just past the end of the metadata. This is not essential
* but it makes the page look compressible to xlog.c.
+ *
+ * This won't be of any help unless we use option like REGBUF_STANDARD
+ * while registering the buffer for metapage as otherwise, it won't try to
+ * remove the hole while logging the full page image.
*/
((PageHeader) page)->pd_lower =
((char *) metad + sizeof(BTMetaPageData)) - (char *) page;
@@ -241,7 +245,7 @@ _bt_getroot(Relation rel, int access)
XLogBeginInsert();
XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
md.root = rootblkno;
md.level = 0;
@@ -1827,7 +1831,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
if (BufferIsValid(metabuf))
{
- XLogRegisterBuffer(4, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(4, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
xlmeta.root = metad->btm_root;
xlmeta.level = metad->btm_level;
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 3610c7c..81eb359 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -108,7 +108,8 @@ _bt_restore_meta(XLogReaderState *record, uint8 block_id)
/*
* Set pd_lower just past the end of the metadata. This is not essential
- * but it makes the page look compressible to xlog.c.
+ * but it makes the page look compressible to xlog.c. See
+ * _bt_initmetapage.
*/
((PageHeader) metapg)->pd_lower =
((char *) md + sizeof(BTMetaPageData)) - (char *) metapg;
On Mon, Sep 11, 2017 at 4:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I have prepared separate patches for hash and btree index. I think
for another type of indexes, it is better to first fix the pd_lower
issue.
Just wondering (sorry I have not looked at your patch in details)...
Have you tested the compressibility effects of this patch on FPWs with
and without wal_compression?
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/10 15:22, Michael Paquier wrote:
On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.I did some measurements of the compressibility of the GIN meta page,
looking at its FPWs with and without wal_compression and you are
right: there is no direct compressibility effect when setting pd_lower
on the meta page. However, it seems to me that there is an argument
still pleading on favor of this patch for wal_consistency_checking.I think that would be true if we did both my point 1 and 2, so that
the wal replay functions could trust pd_lower to be sane in all cases.
But really, if you have to touch all the places that write these
metapages, you might as well mark them REGBUF_STANDARD while at it.The same comment ought to be mentioned for btree.
Yeah, I was wondering if we ought not clean up btree/hash while at it.
At the very least, their existing comments saying that it's inessential
to set pd_lower could use some more detail about why or why not.+1. I think we can even use REGBUF_STANDARD in the hash for metapage
where currently it is not used. I can give a try to write a patch for
hash/btree part if you want.Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hash, would you, Amit L, write the part
for GIN, BRIN and SpGist? This needs a careful lookup as many code
paths need a lookup so it may take time. Please note that I don't mind
double-checking this part if you don't have enough room to do so.
Sorry, I didn't have time today to carefully go through the recent
discussion on this thread (starting with Tom's email wherein he said he
set the status of the patch to Waiting on Author). I will try tomorrow.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/10 15:22, Michael Paquier wrote:
On Sun, Sep 10, 2017 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 9, 2017 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Sep 8, 2017 at 5:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short, this patch needs a significant rewrite, and more analysis than
you've done so far on whether there's actually any benefit to be gained.
It might not be worth messing with.I did some measurements of the compressibility of the GIN meta page,
looking at its FPWs with and without wal_compression and you are
right: there is no direct compressibility effect when setting pd_lower
on the meta page. However, it seems to me that there is an argument
still pleading on favor of this patch for wal_consistency_checking.I think that would be true if we did both my point 1 and 2, so that
the wal replay functions could trust pd_lower to be sane in all cases.
But really, if you have to touch all the places that write these
metapages, you might as well mark them REGBUF_STANDARD while at it.The same comment ought to be mentioned for btree.
Yeah, I was wondering if we ought not clean up btree/hash while at it.
At the very least, their existing comments saying that it's inessential
to set pd_lower could use some more detail about why or why not.+1. I think we can even use REGBUF_STANDARD in the hash for metapage
where currently it is not used. I can give a try to write a patch for
hash/btree part if you want.Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hash, would you, Amit L, write the part
for GIN, BRIN and SpGist? This needs a careful lookup as many code
paths need a lookup so it may take time. Please note that I don't mind
double-checking this part if you don't have enough room to do so.Sorry, I didn't have time today to carefully go through the recent
discussion on this thread (starting with Tom's email wherein he said he
set the status of the patch to Waiting on Author). I will try tomorrow.
Thanks for the update! Once you get to this point, please let me know
if you would like to work on a more complete patch for brin, gin and
spgist. If you don't have enough room, I am fine to produce something.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 11, 2017 at 12:43 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Sep 11, 2017 at 4:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I have prepared separate patches for hash and btree index. I think
for another type of indexes, it is better to first fix the pd_lower
issue.Just wondering (sorry I have not looked at your patch in details)...
Have you tested the compressibility effects of this patch on FPWs with
and without wal_compression?
I have debugged it to see that it is executing the code path to
eliminate the hole for the hash index.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/11 18:13, Michael Paquier wrote:
On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote wrote:
On 2017/09/10 15:22, Michael Paquier wrote:
Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hash, would you, Amit L, write the part
for GIN, BRIN and SpGist? This needs a careful lookup as many code
paths need a lookup so it may take time. Please note that I don't mind
double-checking this part if you don't have enough room to do so.Sorry, I didn't have time today to carefully go through the recent
discussion on this thread (starting with Tom's email wherein he said he
set the status of the patch to Waiting on Author). I will try tomorrow.Thanks for the update! Once you get to this point, please let me know
if you would like to work on a more complete patch for brin, gin and
spgist. If you don't have enough room, I am fine to produce something.
I updated the patches for GIN, BRIN, and SP-GiST to include the following
changes:
1. Pass REGBUF_STNADARD flag when registering the metapage buffer
2. Pass true for page_std argument of log_newpage() or
log_newpage_buffer() when using it to log the metapage
3. Update comment near where pd_lower is set in the respective metapages,
similar to what Amit K did in his patches for btree and hash metapages.
Did I miss something from the discussion?
Thanks,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From bf291247e8f49d4558ab70fa335bd5a7bdda88b4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/gin/gininsert.c | 4 ++--
src/backend/access/gin/ginutil.c | 9 +++++++++
src/backend/access/gin/ginxlog.c | 24 +++++++++---------------
3 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 5378011f50..c9aa4ee147 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -348,7 +348,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
Page page;
XLogBeginInsert();
- XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, RootBuffer, REGBUF_WILL_INIT);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
@@ -447,7 +447,7 @@ ginbuildempty(Relation index)
START_CRIT_SECTION();
GinInitMetabuffer(MetaBuffer);
MarkBufferDirty(MetaBuffer);
- log_newpage_buffer(MetaBuffer, false);
+ log_newpage_buffer(MetaBuffer, true);
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
log_newpage_buffer(RootBuffer, false);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..e926649dd2 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,15 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, as long as the
+ * buffer containing the page is passed to XLogRegisterBuffer() as a
+ * REGBUF_STANDARD page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..f5c11b2d9a 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
gin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
GinPageOpaque opaque;
mask_page_lsn(page);
@@ -776,18 +777,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower != 0)
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From d5357e311be2aa71f842335f8684a7600493044c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/brin/brin.c | 4 ++--
src/backend/access/brin/brin_pageops.c | 9 +++++++++
src/backend/access/brin/brin_xlog.c | 9 +++++++--
3 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..e6909d7aea 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -685,7 +685,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinCreateIdx);
- XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
@@ -742,7 +742,7 @@ brinbuildempty(Relation index)
brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
BRIN_CURRENT_VERSION);
MarkBufferDirty(metabuf);
- log_newpage_buffer(metabuf, false);
+ log_newpage_buffer(metabuf, true);
END_CRIT_SECTION();
UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..117322ed17 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,15 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, as long as the
+ * buffer containing the page is passed to XLogRegisterBuffer() as a
+ * REGBUF_STANDARD page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index dff7198a39..1309d44b04 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -331,14 +331,19 @@ void
brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ /*
+ * Regular brin pages contain unused space which needs to be masked.
+ * Similarly for meta pages, but only if pd_lower has been set.
+ */
+ if (BRIN_IS_REGULAR_PAGE(page) ||
+ (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower != 0))
{
- /* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
}
}
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From e6a5b734d61171dee1df39137439cbd48ed56b44 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/3] Set pd_lower correctly in the SP-GiST index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/spgist/spginsert.c | 4 ++--
src/backend/access/spgist/spgutils.c | 9 +++++++++
src/backend/access/spgist/spgxlog.c | 8 +++-----
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4b2c29b0e..80b82e1602 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -110,7 +110,7 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
* Replay will re-initialize the pages, so don't take full pages
* images. No other data to log.
*/
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, rootbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(2, nullbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -173,7 +173,7 @@ spgbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
(char *) page, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- SPGIST_METAPAGE_BLKNO, page, false);
+ SPGIST_METAPAGE_BLKNO, page, true);
/* Likewise for the root page. */
SpGistInitPage(page, SPGIST_LEAF);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..291a1ae571 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -534,6 +534,15 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, as long as the
+ * buffer containing the page is passed to XLogRegisterBuffer() as a
+ * REGBUF_STANDARD page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index c440d21715..bf209416f2 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,13 @@ void
spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- /*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
- */
- if (!SpGistPageIsMeta(page))
+ /* Mask the unused space, provided the page's pd_lower is set. */
+ if (pagehdr->pd_lower != 0)
mask_unused_space(page);
}
--
2.11.0
On Tue, Sep 12, 2017 at 3:51 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/11 18:13, Michael Paquier wrote:
On Mon, Sep 11, 2017 at 5:40 PM, Amit Langote wrote:
On 2017/09/10 15:22, Michael Paquier wrote:
Coordinating efforts here would be nice. If you, Amit K, are taking
care of a patch for btree and hash, would you, Amit L, write the part
for GIN, BRIN and SpGist? This needs a careful lookup as many code
paths need a lookup so it may take time. Please note that I don't mind
double-checking this part if you don't have enough room to do so.Sorry, I didn't have time today to carefully go through the recent
discussion on this thread (starting with Tom's email wherein he said he
set the status of the patch to Waiting on Author). I will try tomorrow.Thanks for the update! Once you get to this point, please let me know
if you would like to work on a more complete patch for brin, gin and
spgist. If you don't have enough room, I am fine to produce something.I updated the patches for GIN, BRIN, and SP-GiST to include the following
changes:1. Pass REGBUF_STNADARD flag when registering the metapage buffer
I have looked into brin patch and it seems you have not considered all
usages of meta page. The structure BrinRevmap also contains a
reference to meta page buffer and when that is modified (ex. in
revmap_physical_extend), then also I think you need to consider using
REGBUF_STNADARD flag.
Did I miss something from the discussion?
I think one point which might be missed is that the patch needs to
modify pd_lower for all usages of metapage, not only when it is first
time initialized.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the review.
On 2017/09/12 23:27, Amit Kapila wrote:
On Tue, Sep 12, 2017 at 3:51 PM, Amit Langote wrote:
I updated the patches for GIN, BRIN, and SP-GiST to include the following
changes:1. Pass REGBUF_STNADARD flag when registering the metapage buffer
I have looked into brin patch and it seems you have not considered all
usages of meta page. The structure BrinRevmap also contains a
reference to meta page buffer and when that is modified (ex. in
revmap_physical_extend), then also I think you need to consider using
REGBUF_STNADARD flag.
Fixed.
Did I miss something from the discussion?
I think one point which might be missed is that the patch needs to
modify pd_lower for all usages of metapage, not only when it is first
time initialized.
Maybe I'm missing something, but isn't the metadata size fixed and hence
pd_lower won't change once it's initialized? Maybe, it's not true for all
index types?
Thanks,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From c73183871632b368e2662ff5a35bfb6b3eaaade1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/gin/gininsert.c | 4 ++--
src/backend/access/gin/ginutil.c | 9 +++++++++
src/backend/access/gin/ginxlog.c | 24 +++++++++---------------
3 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 5378011f50..c9aa4ee147 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -348,7 +348,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
Page page;
XLogBeginInsert();
- XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, RootBuffer, REGBUF_WILL_INIT);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
@@ -447,7 +447,7 @@ ginbuildempty(Relation index)
START_CRIT_SECTION();
GinInitMetabuffer(MetaBuffer);
MarkBufferDirty(MetaBuffer);
- log_newpage_buffer(MetaBuffer, false);
+ log_newpage_buffer(MetaBuffer, true);
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
log_newpage_buffer(RootBuffer, false);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..e926649dd2 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,15 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, as long as the
+ * buffer containing the page is passed to XLogRegisterBuffer() as a
+ * REGBUF_STANDARD page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..f5c11b2d9a 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
gin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
GinPageOpaque opaque;
mask_page_lsn(page);
@@ -776,18 +777,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower != 0)
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From 9617c1d81f96fa790e067dd4b2aed36cd77063ac Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/brin/brin.c | 4 ++--
src/backend/access/brin/brin_pageops.c | 9 +++++++++
src/backend/access/brin/brin_revmap.c | 2 +-
src/backend/access/brin/brin_xlog.c | 9 +++++++--
4 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..e6909d7aea 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -685,7 +685,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinCreateIdx);
- XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
@@ -742,7 +742,7 @@ brinbuildempty(Relation index)
brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
BRIN_CURRENT_VERSION);
MarkBufferDirty(metabuf);
- log_newpage_buffer(metabuf, false);
+ log_newpage_buffer(metabuf, true);
END_CRIT_SECTION();
UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..117322ed17 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,15 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, as long as the
+ * buffer containing the page is passed to XLogRegisterBuffer() as a
+ * REGBUF_STANDARD page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076887..f85a51c169 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -635,7 +635,7 @@ revmap_physical_extend(BrinRevmap *revmap)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinRevmapExtend);
- XLogRegisterBuffer(0, revmap->rm_metaBuf, 0);
+ XLogRegisterBuffer(0, revmap->rm_metaBuf, REGBUF_STANDARD);
XLogRegisterBuffer(1, buf, REGBUF_WILL_INIT);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index dff7198a39..1309d44b04 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -331,14 +331,19 @@ void
brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ /*
+ * Regular brin pages contain unused space which needs to be masked.
+ * Similarly for meta pages, but only if pd_lower has been set.
+ */
+ if (BRIN_IS_REGULAR_PAGE(page) ||
+ (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower != 0))
{
- /* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
}
}
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From ce3ff8ddbaaaea487f89e1a47b71c2d32047b426 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/3] Set pd_lower correctly in the SP-GiST index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/spgist/spginsert.c | 4 ++--
src/backend/access/spgist/spgutils.c | 9 +++++++++
src/backend/access/spgist/spgxlog.c | 8 +++-----
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4b2c29b0e..80b82e1602 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -110,7 +110,7 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
* Replay will re-initialize the pages, so don't take full pages
* images. No other data to log.
*/
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, rootbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(2, nullbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -173,7 +173,7 @@ spgbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
(char *) page, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- SPGIST_METAPAGE_BLKNO, page, false);
+ SPGIST_METAPAGE_BLKNO, page, true);
/* Likewise for the root page. */
SpGistInitPage(page, SPGIST_LEAF);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..291a1ae571 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -534,6 +534,15 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, as long as the
+ * buffer containing the page is passed to XLogRegisterBuffer() as a
+ * REGBUF_STANDARD page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index c440d21715..bf209416f2 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,13 @@ void
spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- /*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
- */
- if (!SpGistPageIsMeta(page))
+ /* Mask the unused space, provided the page's pd_lower is set. */
+ if (pagehdr->pd_lower != 0)
mask_unused_space(page);
}
--
2.11.0
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/09/12 23:27, Amit Kapila wrote:
I think one point which might be missed is that the patch needs to
modify pd_lower for all usages of metapage, not only when it is first
time initialized.
Maybe I'm missing something, but isn't the metadata size fixed and hence
pd_lower won't change once it's initialized? Maybe, it's not true for all
index types?
No, the point is that you might be dealing with an index recently
pg_upgraded from v10 or before, which does not have the correct
value for pd_lower on that page. This has to be coped with.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/13 13:05, Tom Lane wrote:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/09/12 23:27, Amit Kapila wrote:
I think one point which might be missed is that the patch needs to
modify pd_lower for all usages of metapage, not only when it is first
time initialized.Maybe I'm missing something, but isn't the metadata size fixed and hence
pd_lower won't change once it's initialized? Maybe, it's not true for all
index types?No, the point is that you might be dealing with an index recently
pg_upgraded from v10 or before, which does not have the correct
value for pd_lower on that page. This has to be coped with.
Ah, got it. Thanks for the explanation.
I updated the patches so that the metapage's pd_lower is set to the
correct value just before *every* point where we are about to insert a
full page image of the metapage into WAL. That's in addition to doing the
same in various metapage init routines, which the original patch did
already anyway. I guess this now ensures that wal_consistency_checking
masking of these metapages as standard layout pages always works, even for
pre-v11 indexes that were upgraded.
Also, we now pass the metapage buffer as containing a page of standard
layout to XLogRegisterBuffer(), so that any hole in it is compressed when
actually writing to WAL.
Thanks,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From 607b4ab062652e7ffc0f95338c9265b09be18b56 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/gin/ginfast.c | 22 ++++++++++++++++++++--
src/backend/access/gin/gininsert.c | 4 ++--
src/backend/access/gin/ginutil.c | 19 ++++++++++++++++++-
src/backend/access/gin/ginxlog.c | 24 +++++++++---------------
4 files changed, 49 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 59e435465a..d96529cf72 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -399,6 +399,15 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
/*
* Write metabuffer, make xlog entry
*/
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (needWal)
@@ -407,7 +416,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
memcpy(&data.metadata, metadata, sizeof(GinMetaPageData));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
@@ -572,6 +581,14 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
metadata->nPendingHeapTuples = 0;
}
+ /*
+ * Set pd_lower just past the end of the metadata. This is not
+ * essential but it makes the page look compressible to xlog.c,
+ * because we pass the buffer containing this page to
+ * XLogRegisterBuffer() as page with standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
for (i = 0; i < data.ndeleted; i++)
@@ -586,7 +603,8 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
XLogRecPtr recptr;
XLogBeginInsert();
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer,
+ REGBUF_WILL_INIT | REGBUF_STANDARD);
for (i = 0; i < data.ndeleted; i++)
XLogRegisterBuffer(i + 1, buffers[i], REGBUF_WILL_INIT);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 5378011f50..c9aa4ee147 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -348,7 +348,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
Page page;
XLogBeginInsert();
- XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, RootBuffer, REGBUF_WILL_INIT);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
@@ -447,7 +447,7 @@ ginbuildempty(Relation index)
START_CRIT_SECTION();
GinInitMetabuffer(MetaBuffer);
MarkBufferDirty(MetaBuffer);
- log_newpage_buffer(MetaBuffer, false);
+ log_newpage_buffer(MetaBuffer, true);
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
log_newpage_buffer(RootBuffer, false);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..d680849e9d 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,15 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
@@ -676,6 +685,14 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
metadata->nDataPages = stats->nDataPages;
metadata->nEntries = stats->nEntries;
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as page with
+ * standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (RelationNeedsWAL(index))
@@ -690,7 +707,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
XLogBeginInsert();
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
PageSetLSN(metapage, recptr);
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..f5c11b2d9a 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
gin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
GinPageOpaque opaque;
mask_page_lsn(page);
@@ -776,18 +777,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower != 0)
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From 202ef67cf10060e09865fd2ce2e77f4145d7be64 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/brin/brin.c | 4 ++--
src/backend/access/brin/brin_pageops.c | 9 +++++++++
src/backend/access/brin/brin_revmap.c | 11 ++++++++++-
src/backend/access/brin/brin_xlog.c | 18 ++++++++++++++++--
4 files changed, 37 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..e6909d7aea 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -685,7 +685,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinCreateIdx);
- XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
@@ -742,7 +742,7 @@ brinbuildempty(Relation index)
brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
BRIN_CURRENT_VERSION);
MarkBufferDirty(metabuf);
- log_newpage_buffer(metabuf, false);
+ log_newpage_buffer(metabuf, true);
END_CRIT_SECTION();
UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..92903f38c7 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,15 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076887..4b056c68a2 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -624,6 +624,15 @@ revmap_physical_extend(BrinRevmap *revmap)
MarkBufferDirty(buf);
metadata->lastRevmapPage = mapBlk;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapage;
MarkBufferDirty(revmap->rm_metaBuf);
if (RelationNeedsWAL(revmap->rm_irel))
@@ -635,7 +644,7 @@ revmap_physical_extend(BrinRevmap *revmap)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinRevmapExtend);
- XLogRegisterBuffer(0, revmap->rm_metaBuf, 0);
+ XLogRegisterBuffer(0, revmap->rm_metaBuf, REGBUF_STANDARD);
XLogRegisterBuffer(1, buf, REGBUF_WILL_INIT);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index dff7198a39..e94b0033bf 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -234,6 +234,15 @@ brin_xlog_revmap_extend(XLogReaderState *record)
metadata->lastRevmapPage = xlrec->targetBlk;
PageSetLSN(metapg, lsn);
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not
+ * essential but it makes the page look compressible to xlog.c, because
+ * we pass the buffer containing this page to XLogRegisterBuffer() as a
+ * page with standard layout.
+ */
+ ((PageHeader) metapg)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapg;
MarkBufferDirty(metabuf);
}
@@ -331,14 +340,19 @@ void
brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ /*
+ * Regular brin pages contain unused space which needs to be masked.
+ * Similarly for meta pages, but only if pd_lower has been set.
+ */
+ if (BRIN_IS_REGULAR_PAGE(page) ||
+ (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower != 0))
{
- /* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
}
}
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From 8617179f96ad3d8cba99b365c836dff5a525c843 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/3] Set pd_lower correctly in the SP-GiST index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/spgist/spginsert.c | 4 ++--
src/backend/access/spgist/spgutils.c | 19 +++++++++++++++++++
src/backend/access/spgist/spgxlog.c | 8 +++-----
3 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4b2c29b0e..80b82e1602 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -110,7 +110,7 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
* Replay will re-initialize the pages, so don't take full pages
* images. No other data to log.
*/
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, rootbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(2, nullbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -173,7 +173,7 @@ spgbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
(char *) page, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- SPGIST_METAPAGE_BLKNO, page, false);
+ SPGIST_METAPAGE_BLKNO, page, true);
/* Likewise for the root page. */
SpGistInitPage(page, SPGIST_LEAF);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..9048c08f1c 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -256,15 +256,25 @@ SpGistUpdateMetaPage(Relation index)
if (cache != NULL)
{
Buffer metabuffer;
+ Page metapage;
SpGistMetaPageData *metadata;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metapage = BufferGetPage(metabuffer);
if (ConditionalLockBuffer(metabuffer))
{
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
metadata->lastUsedPages = cache->lastUsedPages;
+ /*
+ * Set pd_lower just past the end of the metadata. This is not
+ * essential but it makes the page look compressible to xlog.c,
+ * because we pass the buffer containing this page to
+ * XLogRegisterBuffer() as page with standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower = ((char *) metadata +
+ sizeof(SpGistMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
UnlockReleaseBuffer(metabuffer);
}
@@ -534,6 +544,15 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as page with
+ * standard layout.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index c440d21715..bf209416f2 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,13 @@ void
spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- /*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
- */
- if (!SpGistPageIsMeta(page))
+ /* Mask the unused space, provided the page's pd_lower is set. */
+ if (pagehdr->pd_lower != 0)
mask_unused_space(page);
}
--
2.11.0
On Wed, Sep 13, 2017 at 2:48 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I updated the patches so that the metapage's pd_lower is set to the
correct value just before *every* point where we are about to insert a
full page image of the metapage into WAL. That's in addition to doing the
same in various metapage init routines, which the original patch did
already anyway. I guess this now ensures that wal_consistency_checking
masking of these metapages as standard layout pages always works, even for
pre-v11 indexes that were upgraded.
Please note that I do have plans to look at all the patches proposed
on this thread for all the indexes next. No report for today though as
those deal with many code paths so it requires some attention. I think
I'll group the review for all index AMs into the same email if you
don't mind, each patch deals with its own thing in its own
src/backend/access/ path.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/13 16:20, Michael Paquier wrote:
On Wed, Sep 13, 2017 at 2:48 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:I updated the patches so that the metapage's pd_lower is set to the
correct value just before *every* point where we are about to insert a
full page image of the metapage into WAL. That's in addition to doing the
same in various metapage init routines, which the original patch did
already anyway. I guess this now ensures that wal_consistency_checking
masking of these metapages as standard layout pages always works, even for
pre-v11 indexes that were upgraded.Please note that I do have plans to look at all the patches proposed
on this thread for all the indexes next. No report for today though as
those deal with many code paths so it requires some attention. I think
I'll group the review for all index AMs into the same email if you
don't mind, each patch deals with its own thing in its own
src/backend/access/ path.
Sure, no problem.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Sure, no problem.
OK, I took a closer look at all patches, but did not run any manual
tests to test the compression except some stuff with
wal_consistency_checking.
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower != 0)
+ mask_unused_space(page);
[...]
+ /* Mask the unused space, provided the page's pd_lower is set. */
+ if (pagehdr->pd_lower != 0)
mask_unused_space(page);
For the masking functions, shouldn't those check use (pd_lower >
SizeOfPageHeaderData) instead of 0? pd_lower gets set to a non-zero
value on HEAD, so you would apply the masking even if the meta page is
upgraded from an instance that did not enforce the value of pd_lower
later on. Those conditions also definitely need comments. That will be
a good reminder so as why it needs to be kept.
+ *
+ * This won't be of any help unless we use option like REGBUF_STANDARD
+ * while registering the buffer for metapage as otherwise, it won't try to
+ * remove the hole while logging the full page image.
*/
This comment is in the btree code. But you actually add
REGBUF_STANDARD. So I think that this could be just removed.
* Set pd_lower just past the end of the metadata. This is not essential
- * but it makes the page look compressible to xlog.c.
+ * but it makes the page look compressible to xlog.c. See
+ * _bt_initmetapage.
This reference could be removed as well as _bt_initmetapage does not
provide any information, the existing comment is wrong without your
patch, and then becomes right with this patch.
After that I have spotted a couple of places for btree, hash and
SpGist where the updates of pd_lower are not happening. Let's keep in
mind that updated meta pages could come from an upgraded version, so
we need to be careful here about all code paths updating meta pages,
and WAL-logging them.
It seems to me that an update of pd_lower is missing in _bt_getroot(),
just before marking the buffer as dirty I think. And there is a second
in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally
one in _bt_newroot().
For SpGist, I think that there are two missing: spgbuild() and spgGetCache().
For hash, hashbulkdelete(), _hash_vacuum_one_page(),
_hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are
missing the shot, no? We could have a meta page of a hash index
upgraded from PG10.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 14, 2017 at 12:30 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Sure, no problem.
+ * + * This won't be of any help unless we use option like REGBUF_STANDARD + * while registering the buffer for metapage as otherwise, it won't try to + * remove the hole while logging the full page image. */ This comment is in the btree code. But you actually add REGBUF_STANDARD. So I think that this could be just removed.* Set pd_lower just past the end of the metadata. This is not essential - * but it makes the page look compressible to xlog.c. + * but it makes the page look compressible to xlog.c. See + * _bt_initmetapage. This reference could be removed as well as _bt_initmetapage does not provide any information, the existing comment is wrong without your patch, and then becomes right with this patch.
I have added this comment just to add some explanation as to why we
are setting pd_lower and what makes it useful. We can change it or
remove it, but I am not sure what is the right thing to do here, may
be we can defer this to the committer.
It seems to me that an update of pd_lower is missing in _bt_getroot(),
just before marking the buffer as dirty I think. And there is a second
in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally
one in _bt_newroot().For hash, hashbulkdelete(), _hash_vacuum_one_page(),
_hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are
missing the shot, no? We could have a meta page of a hash index
upgraded from PG10.
Why do we need to change metapage at every place for btree or hash?
Any index that is upgraded should have pd_lower set, do you have any
case in mind where it won't be set? For hash, if someone upgrades
from a version lower than 9.6, it might not have set, but we already
give warning to reindex the hash indexes upgraded from a version lower
than 10.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 14, 2017 at 6:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Why do we need to change metapage at every place for btree ...
I have been hunting for some time places where meta buffers were
marked as dirtied and logged. So in the effort, I think that my hands
and mind got hotter, forgetting that pd_lower is set there for ages.
Of course feel free to ignore that.
... or hash?
Any index that is upgraded should have pd_lower set, do you have any
case in mind where it won't be set? For hash, if someone upgrades
from a version lower than 9.6, it might not have set, but we already
give warning to reindex the hash indexes upgraded from a version lower
than 10.
Ah yes. You do set pd_lower in 10 as well for hash... So that will be
fine. So remains SpGist as a slacking AM based on the current patches.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/14 16:00, Michael Paquier wrote:
On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Sure, no problem.
OK, I took a closer look at all patches, but did not run any manual
tests to test the compression except some stuff with
wal_consistency_checking.
Thanks for the review.
+ if (opaque->flags & GIN_DELETED) + mask_page_content(page); + else if (pagehdr->pd_lower != 0) + mask_unused_space(page); [...] + /* Mask the unused space, provided the page's pd_lower is set. */ + if (pagehdr->pd_lower != 0) mask_unused_space(page);For the masking functions, shouldn't those check use (pd_lower >
SizeOfPageHeaderData) instead of 0? pd_lower gets set to a non-zero
value on HEAD, so you would apply the masking even if the meta page is
upgraded from an instance that did not enforce the value of pd_lower
later on. Those conditions also definitely need comments. That will be
a good reminder so as why it needs to be kept.
Agreed, done.
+ * + * This won't be of any help unless we use option like REGBUF_STANDARD + * while registering the buffer for metapage as otherwise, it won't try to + * remove the hole while logging the full page image. */ This comment is in the btree code. But you actually add REGBUF_STANDARD. So I think that this could be just removed.* Set pd_lower just past the end of the metadata. This is not essential - * but it makes the page look compressible to xlog.c. + * but it makes the page look compressible to xlog.c. See + * _bt_initmetapage. This reference could be removed as well as _bt_initmetapage does not provide any information, the existing comment is wrong without your patch, and then becomes right with this patch.
Amit K's reply may have addressed these comments.
After that I have spotted a couple of places for btree, hash and
SpGist where the updates of pd_lower are not happening. Let's keep in
mind that updated meta pages could come from an upgraded version, so
we need to be careful here about all code paths updating meta pages,
and WAL-logging them.It seems to me that an update of pd_lower is missing in _bt_getroot(),
just before marking the buffer as dirty I think. And there is a second
in _bt_unlink_halfdead_page(), a third in _bt_insertonpg() and finally
one in _bt_newroot().
Amit K's reply about btree and hash should've resolved any doubts for
those index types. About SP-Gist, see the comment below.
For SpGist, I think that there are two missing: spgbuild() and spgGetCache().
spgbuild() calls SpGistInitMetapage() before marking the metapage buffer
dirty. The latter already sets pd_lower correctly, so we don't need to do
it explicitly in spgbuild() itself.
spgGetCache() doesn't write the metapage, only reads it:
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
if (metadata->magicNumber != SPGIST_MAGIC_NUMBER)
elog(ERROR, "index \"%s\" is not an SP-GiST index",
RelationGetRelationName(index));
cache->lastUsedPages = metadata->lastUsedPages;
UnlockReleaseBuffer(metabuffer);
So, I think it won't be correct to set pd_lower here, no?
For hash, hashbulkdelete(), _hash_vacuum_one_page(),
_hash_addovflpage(), _hash_freeovflpage() and _hash_doinsert() are
missing the shot, no? We could have a meta page of a hash index
upgraded from PG10.
Amit K's reply. :)
Updated patch attached, which implements your suggested changes to the
masking functions.
By the way, as I noted on another unrelated thread, I will not be able to
respond to emails from tomorrow until 9/23.
Thanks,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From 6741334ce5f3261b5e5caffa3914d4e1485fe5d8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/gin/ginfast.c | 22 ++++++++++++++++++++--
src/backend/access/gin/gininsert.c | 4 ++--
src/backend/access/gin/ginutil.c | 19 ++++++++++++++++++-
src/backend/access/gin/ginxlog.c | 25 ++++++++++---------------
4 files changed, 50 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 59e435465a..d96529cf72 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -399,6 +399,15 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
/*
* Write metabuffer, make xlog entry
*/
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (needWal)
@@ -407,7 +416,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
memcpy(&data.metadata, metadata, sizeof(GinMetaPageData));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
@@ -572,6 +581,14 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
metadata->nPendingHeapTuples = 0;
}
+ /*
+ * Set pd_lower just past the end of the metadata. This is not
+ * essential but it makes the page look compressible to xlog.c,
+ * because we pass the buffer containing this page to
+ * XLogRegisterBuffer() as page with standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
for (i = 0; i < data.ndeleted; i++)
@@ -586,7 +603,8 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
XLogRecPtr recptr;
XLogBeginInsert();
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer,
+ REGBUF_WILL_INIT | REGBUF_STANDARD);
for (i = 0; i < data.ndeleted; i++)
XLogRegisterBuffer(i + 1, buffers[i], REGBUF_WILL_INIT);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 5378011f50..c9aa4ee147 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -348,7 +348,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
Page page;
XLogBeginInsert();
- XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, RootBuffer, REGBUF_WILL_INIT);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
@@ -447,7 +447,7 @@ ginbuildempty(Relation index)
START_CRIT_SECTION();
GinInitMetabuffer(MetaBuffer);
MarkBufferDirty(MetaBuffer);
- log_newpage_buffer(MetaBuffer, false);
+ log_newpage_buffer(MetaBuffer, true);
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
log_newpage_buffer(RootBuffer, false);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..d680849e9d 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,15 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
@@ -676,6 +685,14 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
metadata->nDataPages = stats->nDataPages;
metadata->nEntries = stats->nEntries;
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as page with
+ * standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (RelationNeedsWAL(index))
@@ -690,7 +707,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
XLogBeginInsert();
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
PageSetLSN(metapage, recptr);
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..52c14ce42e 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
gin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
GinPageOpaque opaque;
mask_page_lsn(page);
@@ -776,18 +777,12 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content. For other pages, mask the hole if the pd_lower
+ * appears to have been set correctly.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower > SizeOfPageHeaderData)
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From c51791a4e0effe9c5b31af11ed07f8c49cd0014e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/brin/brin.c | 4 ++--
src/backend/access/brin/brin_pageops.c | 9 +++++++++
src/backend/access/brin/brin_revmap.c | 11 ++++++++++-
src/backend/access/brin/brin_xlog.c | 19 +++++++++++++++++--
4 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..e6909d7aea 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -685,7 +685,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinCreateIdx);
- XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
@@ -742,7 +742,7 @@ brinbuildempty(Relation index)
brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
BRIN_CURRENT_VERSION);
MarkBufferDirty(metabuf);
- log_newpage_buffer(metabuf, false);
+ log_newpage_buffer(metabuf, true);
END_CRIT_SECTION();
UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..92903f38c7 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,15 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076887..4b056c68a2 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -624,6 +624,15 @@ revmap_physical_extend(BrinRevmap *revmap)
MarkBufferDirty(buf);
metadata->lastRevmapPage = mapBlk;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as a page with
+ * standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapage;
MarkBufferDirty(revmap->rm_metaBuf);
if (RelationNeedsWAL(revmap->rm_irel))
@@ -635,7 +644,7 @@ revmap_physical_extend(BrinRevmap *revmap)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinRevmapExtend);
- XLogRegisterBuffer(0, revmap->rm_metaBuf, 0);
+ XLogRegisterBuffer(0, revmap->rm_metaBuf, REGBUF_STANDARD);
XLogRegisterBuffer(1, buf, REGBUF_WILL_INIT);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index dff7198a39..a9c7d909f6 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -234,6 +234,15 @@ brin_xlog_revmap_extend(XLogReaderState *record)
metadata->lastRevmapPage = xlrec->targetBlk;
PageSetLSN(metapg, lsn);
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not
+ * essential but it makes the page look compressible to xlog.c, because
+ * we pass the buffer containing this page to XLogRegisterBuffer() as a
+ * page with standard layout.
+ */
+ ((PageHeader) metapg)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapg;
MarkBufferDirty(metabuf);
}
@@ -331,14 +340,20 @@ void
brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ /*
+ * Regular brin pages contain unused space which needs to be masked.
+ * Similarly for meta pages, but mask it only if pd_lower appears to have
+ * been set correctly.
+ */
+ if (BRIN_IS_REGULAR_PAGE(page) ||
+ (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower > SizeOfPageHeaderData))
{
- /* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
}
}
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From 478a9306e64d821d2fafd28906690ad2e00e5cd0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/3] Set pd_lower correctly in the SP-GiST index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/spgist/spginsert.c | 4 ++--
src/backend/access/spgist/spgutils.c | 19 +++++++++++++++++++
src/backend/access/spgist/spgxlog.c | 7 ++++---
3 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4b2c29b0e..80b82e1602 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -110,7 +110,7 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
* Replay will re-initialize the pages, so don't take full pages
* images. No other data to log.
*/
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, rootbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(2, nullbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -173,7 +173,7 @@ spgbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
(char *) page, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- SPGIST_METAPAGE_BLKNO, page, false);
+ SPGIST_METAPAGE_BLKNO, page, true);
/* Likewise for the root page. */
SpGistInitPage(page, SPGIST_LEAF);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..9048c08f1c 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -256,15 +256,25 @@ SpGistUpdateMetaPage(Relation index)
if (cache != NULL)
{
Buffer metabuffer;
+ Page metapage;
SpGistMetaPageData *metadata;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metapage = BufferGetPage(metabuffer);
if (ConditionalLockBuffer(metabuffer))
{
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
metadata->lastUsedPages = cache->lastUsedPages;
+ /*
+ * Set pd_lower just past the end of the metadata. This is not
+ * essential but it makes the page look compressible to xlog.c,
+ * because we pass the buffer containing this page to
+ * XLogRegisterBuffer() as page with standard layout.
+ */
+ ((PageHeader) metapage)->pd_lower = ((char *) metadata +
+ sizeof(SpGistMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
UnlockReleaseBuffer(metabuffer);
}
@@ -534,6 +544,15 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is not essential
+ * but it makes the page look compressible to xlog.c, because we pass the
+ * buffer containing this page to XLogRegisterBuffer() as page with
+ * standard layout.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index c440d21715..84acd5a30f 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,16 @@ void
spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn(page);
mask_page_hint_bits(page);
/*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
+ * Mask the unused space, only if the page's pd_lower appears to have been
+ * set correctly.
*/
- if (!SpGistPageIsMeta(page))
+ if (pagehdr->pd_lower > SizeOfPageHeaderData)
mask_unused_space(page);
}
--
2.11.0
On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/14 16:00, Michael Paquier wrote:
On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Sure, no problem.
OK, I took a closer look at all patches, but did not run any manual
tests to test the compression except some stuff with
wal_consistency_checking.Thanks for the review.
For SpGist, I think that there are two missing: spgbuild() and spgGetCache().
spgbuild() calls SpGistInitMetapage() before marking the metapage buffer
dirty. The latter already sets pd_lower correctly, so we don't need to do
it explicitly in spgbuild() itself.
Check.
spgGetCache() doesn't write the metapage, only reads it:
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
if (metadata->magicNumber != SPGIST_MAGIC_NUMBER)
elog(ERROR, "index \"%s\" is not an SP-GiST index",
RelationGetRelationName(index));cache->lastUsedPages = metadata->lastUsedPages;
UnlockReleaseBuffer(metabuffer);
So, I think it won't be correct to set pd_lower here, no?
Yeah, I am just reading the code again and there is no alarm here.
Updated patch attached, which implements your suggested changes to the
masking functions.By the way, as I noted on another unrelated thread, I will not be able to
respond to emails from tomorrow until 9/23.
No problem. Enjoy your vacations.
I have spent some time looking at the XLOG insert code, and tested the
compressibility of the meta pages. And I have noticed that all pages
registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not
take a FPW of the block registered because the page will be
reinitialized at replay, and in such cases the zero'ed page is filled
with the data from the record. log_newpage is used to initialize init
forks for unlogged relations, which is where this patch allows page
compression to take effect correctly. Still setting REGBUF_STANDARD
with REGBUF_WILL_INIT is actually a no-op, except if
wal_checking_consistency is used when registering a buffer for WAL
insertion. There is one code path though where things are useful all
the time: revmap_physical_extend for BRIN.
The patch is using the correct logic, still such comments are in my
opinion incorrect because of the reason written above:
+ * This won't be of any help unless we use option like REGBUF_STANDARD
+ * while registering the buffer for metapage as otherwise, it won't try to
+ * remove the hole while logging the full page image.
Here REGBUF_STANDARD is actually a no-op for btree.
+ /*
+ * Set pd_lower just past the end of the metadata. This is not
+ * essential but it makes the page look compressible to xlog.c,
+ * because we pass the buffer containing this page to
+ * XLogRegisterBuffer() as page with standard layout.
+ */
And here as well because of REGBUF_WILL_INIT is used.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/09/14 16:00, Michael Paquier wrote:
On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Sure, no problem.
OK, I took a closer look at all patches, but did not run any manual
tests to test the compression except some stuff with
wal_consistency_checking.Thanks for the review.
For SpGist, I think that there are two missing: spgbuild() and spgGetCache().
spgbuild() calls SpGistInitMetapage() before marking the metapage buffer
dirty. The latter already sets pd_lower correctly, so we don't need to do
it explicitly in spgbuild() itself.Check.
spgGetCache() doesn't write the metapage, only reads it:
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
if (metadata->magicNumber != SPGIST_MAGIC_NUMBER)
elog(ERROR, "index \"%s\" is not an SP-GiST index",
RelationGetRelationName(index));cache->lastUsedPages = metadata->lastUsedPages;
UnlockReleaseBuffer(metabuffer);
So, I think it won't be correct to set pd_lower here, no?
Yeah, I am just reading the code again and there is no alarm here.
Updated patch attached, which implements your suggested changes to the
masking functions.By the way, as I noted on another unrelated thread, I will not be able to
respond to emails from tomorrow until 9/23.No problem. Enjoy your vacations.
I have spent some time looking at the XLOG insert code, and tested the
compressibility of the meta pages. And I have noticed that all pages
registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not
take a FPW of the block registered because the page will be
reinitialized at replay, and in such cases the zero'ed page is filled
with the data from the record. log_newpage is used to initialize init
forks for unlogged relations, which is where this patch allows page
compression to take effect correctly. Still setting REGBUF_STANDARD
with REGBUF_WILL_INIT is actually a no-op, except if
wal_checking_consistency is used when registering a buffer for WAL
insertion. There is one code path though where things are useful all
the time: revmap_physical_extend for BRIN.The patch is using the correct logic, still such comments are in my opinion incorrect because of the reason written above: + * This won't be of any help unless we use option like REGBUF_STANDARD + * while registering the buffer for metapage as otherwise, it won't try to + * remove the hole while logging the full page image. Here REGBUF_STANDARD is actually a no-op for btree.
You have already noticed above that it will help when
wal_checking_consistency is used and that was the main motivation to
pass REGBUF_STANDARD apart from maintaining consistency. It is not
clear to me what is bothering you. If your only worry about these
patches is that you want this sentence to be removed from the comment
because you think it is obvious or doesn't make much sense, then I
think we can leave this decision to committer. I have added it based
on Tom's suggestion above thread about explaining why it is
inessential or essential to set pd_lower. I think Amit Langote just
tried to mimic what I have done in hash and btree patches to maintain
consistency. I am also not very sure if we should write some detailed
comment or leave the existing comment as it is. I think it is just a
matter of different perspective.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Fri, Sep 15, 2017 at 4:22 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:On 2017/09/14 16:00, Michael Paquier wrote:
On Wed, Sep 13, 2017 at 4:43 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Sure, no problem.
OK, I took a closer look at all patches, but did not run any manual
tests to test the compression except some stuff with
wal_consistency_checking.Thanks for the review.
For SpGist, I think that there are two missing: spgbuild() and spgGetCache().
spgbuild() calls SpGistInitMetapage() before marking the metapage buffer
dirty. The latter already sets pd_lower correctly, so we don't need to do
it explicitly in spgbuild() itself.Check.
spgGetCache() doesn't write the metapage, only reads it:
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
if (metadata->magicNumber != SPGIST_MAGIC_NUMBER)
elog(ERROR, "index \"%s\" is not an SP-GiST index",
RelationGetRelationName(index));cache->lastUsedPages = metadata->lastUsedPages;
UnlockReleaseBuffer(metabuffer);
So, I think it won't be correct to set pd_lower here, no?
Yeah, I am just reading the code again and there is no alarm here.
Updated patch attached, which implements your suggested changes to the
masking functions.By the way, as I noted on another unrelated thread, I will not be able to
respond to emails from tomorrow until 9/23.No problem. Enjoy your vacations.
I have spent some time looking at the XLOG insert code, and tested the
compressibility of the meta pages. And I have noticed that all pages
registered with REGBUF_WILL_INIT will force XLogRecordAssemble to not
take a FPW of the block registered because the page will be
reinitialized at replay, and in such cases the zero'ed page is filled
with the data from the record. log_newpage is used to initialize init
forks for unlogged relations, which is where this patch allows page
compression to take effect correctly. Still setting REGBUF_STANDARD
with REGBUF_WILL_INIT is actually a no-op, except if
wal_checking_consistency is used when registering a buffer for WAL
insertion. There is one code path though where things are useful all
the time: revmap_physical_extend for BRIN.The patch is using the correct logic, still such comments are in my opinion incorrect because of the reason written above: + * This won't be of any help unless we use option like REGBUF_STANDARD + * while registering the buffer for metapage as otherwise, it won't try to + * remove the hole while logging the full page image. Here REGBUF_STANDARD is actually a no-op for btree.You have already noticed above that it will help when
wal_checking_consistency is used and that was the main motivation to
pass REGBUF_STANDARD apart from maintaining consistency. It is not
clear to me what is bothering you. If your only worry about these
patches is that you want this sentence to be removed from the comment
because you think it is obvious or doesn't make much sense, then I
think we can leave this decision to committer. I have added it based
on Tom's suggestion above thread about explaining why it is
inessential or essential to set pd_lower. I think Amit Langote just
tried to mimic what I have done in hash and btree patches to maintain
consistency. I am also not very sure if we should write some detailed
comment or leave the existing comment as it is. I think it is just a
matter of different perspective.
What is disturbing me a bit is that the existing comments mention
something that could be supported (the compression of pages), but
that's actually not done and this is unlikely to happen because the
number of bytes associated to a meta page is going to be always
cheaper than a FPW, which would cost in CPU to store it for
compression is enabled. So I think that we should switch comments to
mention that pd_lower is set so as this helps with page masking, but
we don't take advantage of XLOG compression in the code. I agree that
this is a minor point, so if the wave of this thread is that I am too
noisy, please feel free to ignore me: the logic of the patches is
still correct, still having those comments feels like cheating a bit
;p
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:You have already noticed above that it will help when
wal_checking_consistency is used and that was the main motivation to
pass REGBUF_STANDARD apart from maintaining consistency. It is not
clear to me what is bothering you. If your only worry about these
patches is that you want this sentence to be removed from the comment
because you think it is obvious or doesn't make much sense, then I
think we can leave this decision to committer. I have added it based
on Tom's suggestion above thread about explaining why it is
inessential or essential to set pd_lower. I think Amit Langote just
tried to mimic what I have done in hash and btree patches to maintain
consistency. I am also not very sure if we should write some detailed
comment or leave the existing comment as it is. I think it is just a
matter of different perspective.What is disturbing me a bit is that the existing comments mention
something that could be supported (the compression of pages), but
that's actually not done and this is unlikely to happen because the
number of bytes associated to a meta page is going to be always
cheaper than a FPW, which would cost in CPU to store it for
compression is enabled. So I think that we should switch comments to
mention that pd_lower is set so as this helps with page masking, but
we don't take advantage of XLOG compression in the code.
I think that is not true because we do need FPW for certain usages of
metapage. Consider a case in _hash_doinsert where register metabuf
with just
REGBUF_STANDARD, it can take advantage of removing the hole if
pd_lower is set to its correct position. There are other similar
usages in hash index. For other indexes like btree, there is no such
usage currently, but it can also take advantage for
wal_consistency_checking. Now, probably there is an argument that we
use different comments for different indexes as the usage varies, but
I think someone looking at code after reading the comments can
differentiate such cases.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 12:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:You have already noticed above that it will help when
wal_checking_consistency is used and that was the main motivation to
pass REGBUF_STANDARD apart from maintaining consistency. It is not
clear to me what is bothering you. If your only worry about these
patches is that you want this sentence to be removed from the comment
because you think it is obvious or doesn't make much sense, then I
think we can leave this decision to committer. I have added it based
on Tom's suggestion above thread about explaining why it is
inessential or essential to set pd_lower. I think Amit Langote just
tried to mimic what I have done in hash and btree patches to maintain
consistency. I am also not very sure if we should write some detailed
comment or leave the existing comment as it is. I think it is just a
matter of different perspective.What is disturbing me a bit is that the existing comments mention
something that could be supported (the compression of pages), but
that's actually not done and this is unlikely to happen because the
number of bytes associated to a meta page is going to be always
cheaper than a FPW, which would cost in CPU to store it for
compression is enabled. So I think that we should switch comments to
mention that pd_lower is set so as this helps with page masking, but
we don't take advantage of XLOG compression in the code.I think that is not true because we do need FPW for certain usages of
metapage. Consider a case in _hash_doinsert where register metabuf
with just
REGBUF_STANDARD, it can take advantage of removing the hole if
pd_lower is set to its correct position.
I am not saying that no index AMs take advantage FPW compressibility
for their meta pages. There are cases like this one, as well as one
code path in BRIN where this is useful, and it is useful as well when
logging FPWs of the init forks for unlogged relations.
There are other similar
usages in hash index. For other indexes like btree, there is no such
usage currently, but it can also take advantage for
wal_consistency_checking. Now, probably there is an argument that we
use different comments for different indexes as the usage varies, but
I think someone looking at code after reading the comments can
differentiate such cases.
I'd think about adjusting the comments the proper way for each AM so
as one can read those comments and catch any limitation immediately.
The fact this has not been pointed out on this thread before any
checks and the many mails exchanged on the matter on this thread make
me think that the current code does not outline the current code
properties appropriately.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 12:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I'd think about adjusting the comments the proper way for each AM so
as one can read those comments and catch any limitation immediately.
The fact this has not been pointed out on this thread before any
checks and the many mails exchanged on the matter on this thread make
me think that the current code does not outline the current code
properties appropriately.
Another thing that we could consider as well is adding an assertion in
XLogRegisterBuffer & friends so as the combination of REGBUF_STANDARD
and REGBUF_NO_IMAGE is forbidden. That's bugging me as well.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Sep 19, 2017 at 12:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Sep 18, 2017 at 4:03 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Mon, Sep 18, 2017 at 11:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Sep 16, 2017 at 7:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:You have already noticed above that it will help when
wal_checking_consistency is used and that was the main motivation to
pass REGBUF_STANDARD apart from maintaining consistency. It is not
clear to me what is bothering you. If your only worry about these
patches is that you want this sentence to be removed from the comment
because you think it is obvious or doesn't make much sense, then I
think we can leave this decision to committer. I have added it based
on Tom's suggestion above thread about explaining why it is
inessential or essential to set pd_lower. I think Amit Langote just
tried to mimic what I have done in hash and btree patches to maintain
consistency. I am also not very sure if we should write some detailed
comment or leave the existing comment as it is. I think it is just a
matter of different perspective.What is disturbing me a bit is that the existing comments mention
something that could be supported (the compression of pages), but
that's actually not done and this is unlikely to happen because the
number of bytes associated to a meta page is going to be always
cheaper than a FPW, which would cost in CPU to store it for
compression is enabled. So I think that we should switch comments to
mention that pd_lower is set so as this helps with page masking, but
we don't take advantage of XLOG compression in the code.I think that is not true because we do need FPW for certain usages of
metapage. Consider a case in _hash_doinsert where register metabuf
with just
REGBUF_STANDARD, it can take advantage of removing the hole if
pd_lower is set to its correct position.I am not saying that no index AMs take advantage FPW compressibility
for their meta pages. There are cases like this one, as well as one
code path in BRIN where this is useful, and it is useful as well when
logging FPWs of the init forks for unlogged relations.
Hmm, why is it useful for logging FPWs of the init forks for unlogged
relations? We don't use REGBUF_STANDARD in those cases.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 19, 2017 at 9:33 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Sep 19, 2017 at 12:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:I'd think about adjusting the comments the proper way for each AM so
as one can read those comments and catch any limitation immediately.
The fact this has not been pointed out on this thread before any
checks and the many mails exchanged on the matter on this thread make
me think that the current code does not outline the current code
properties appropriately.Another thing that we could consider as well is adding an assertion in
XLogRegisterBuffer & friends so as the combination of REGBUF_STANDARD
and REGBUF_NO_IMAGE is forbidden. That's bugging me as well.
Is that related to this patch? If not, then maybe we can discuss it
in a separate thread.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Amit Kapila <amit.kapila16@gmail.com> writes:
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:I am not saying that no index AMs take advantage FPW compressibility
for their meta pages. There are cases like this one, as well as one
code path in BRIN where this is useful, and it is useful as well when
logging FPWs of the init forks for unlogged relations.
Hmm, why is it useful for logging FPWs of the init forks for unlogged
relations? We don't use REGBUF_STANDARD in those cases.
But if we started to do so, that would be a concrete benefit of this
patch ...
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:I am not saying that no index AMs take advantage FPW compressibility
for their meta pages. There are cases like this one, as well as one
code path in BRIN where this is useful, and it is useful as well when
logging FPWs of the init forks for unlogged relations.Hmm, why is it useful for logging FPWs of the init forks for unlogged
relations? We don't use REGBUF_STANDARD in those cases.But if we started to do so, that would be a concrete benefit of this
patch ...
In the proposed set of patches, all the empty() routines part of index
AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the
right thing by updating log_newpage_buffer(). btree also should have
its call to log_newpage updated in btbuildempty(), and your patch is
missing that. Also, _hash_init() would need some extra work to
generate FPWs, but I don't think that it is necessary per its handling
of a per-record meta data either. So REGBUF_STANDARD could be just
removed from there, and there is actually no need to patch
src/backend/access/hash at all.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:I am not saying that no index AMs take advantage FPW compressibility
for their meta pages. There are cases like this one, as well as one
code path in BRIN where this is useful, and it is useful as well when
logging FPWs of the init forks for unlogged relations.Hmm, why is it useful for logging FPWs of the init forks for unlogged
relations? We don't use REGBUF_STANDARD in those cases.But if we started to do so, that would be a concrete benefit of this
patch ...In the proposed set of patches, all the empty() routines part of index
AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the
right thing by updating log_newpage_buffer(). btree also should have
its call to log_newpage updated in btbuildempty(), and your patch is
missing that.
We can add that for btree patch.
Also, _hash_init() would need some extra work to
generate FPWs, but I don't think that it is necessary per its handling
of a per-record meta data either. So REGBUF_STANDARD could be just
removed from there, and there is actually no need to patch
src/backend/access/hash at all.
I think there is no need to remove it. As per discussion above, we
want to keep REGBUF_STANDARD for all metapage initializations for the
matter of consistency and that will be useful for
wal_consistency_checking in which case we anyway need full page image.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 12:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Also, _hash_init() would need some extra work to
generate FPWs, but I don't think that it is necessary per its handling
of a per-record meta data either. So REGBUF_STANDARD could be just
removed from there, and there is actually no need to patch
src/backend/access/hash at all.I think there is no need to remove it. As per discussion above, we
want to keep REGBUF_STANDARD for all metapage initializations for the
matter of consistency and that will be useful for
wal_consistency_checking in which case we anyway need full page image.
Arf, yes. You are indeed right. I misread that you still need that
anyway in XLogRecordAssemble.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 9:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 20, 2017 at 4:25 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Wed, Sep 20, 2017 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Kapila <amit.kapila16@gmail.com> writes:
On Tue, Sep 19, 2017 at 9:27 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:I am not saying that no index AMs take advantage FPW compressibility
for their meta pages. There are cases like this one, as well as one
code path in BRIN where this is useful, and it is useful as well when
logging FPWs of the init forks for unlogged relations.Hmm, why is it useful for logging FPWs of the init forks for unlogged
relations? We don't use REGBUF_STANDARD in those cases.But if we started to do so, that would be a concrete benefit of this
patch ...In the proposed set of patches, all the empty() routines part of index
AMs which use log_newpage_buffer() (brin, gin, spgst) are doing the
right thing by updating log_newpage_buffer(). btree also should have
its call to log_newpage updated in btbuildempty(), and your patch is
missing that.We can add that for btree patch.
Added and updated the comments for both btree and hash index patches.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
change_metapage_usage_btree-v2.patchapplication/octet-stream; name=change_metapage_usage_btree-v2.patchDownload
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index bf963fc..5cbaba1 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -898,7 +898,7 @@ _bt_insertonpg(Relation rel,
xlmeta.fastroot = metad->btm_fastroot;
xlmeta.fastlevel = metad->btm_fastlevel;
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBufData(2, (char *) &xlmeta, sizeof(xl_btree_metadata));
xlinfo = XLOG_BTREE_INSERT_META;
@@ -2032,7 +2032,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
XLogRegisterBuffer(1, lbuf, REGBUF_STANDARD);
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
md.root = rootblknum;
md.level = metad->btm_level;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 10697e9..3dfcc0c 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -65,8 +65,14 @@ _bt_initmetapage(Page page, BlockNumber rootbknum, uint32 level)
metaopaque->btpo_flags = BTP_META;
/*
- * Set pd_lower just past the end of the metadata. This is not essential
- * but it makes the page look compressible to xlog.c.
+ * Set pd_lower just past the end of the metadata. This is required for
+ * correctness as otherwise, the pd_lower would be pointing to the
+ * position as set by _bt_pageinit and that can lead to logging the
+ * incorrect metadata while backing up full page image in xloginsert.c.
+ * Currently, the advantage of setting pd_lower is in limited cases like
+ * during wal_consistency_checking or while logging for unlogged relation
+ * as for all other purposes, we initialize the metapage. Note, it also
+ * helps in page masking by allowing to mask unused space.
*/
((PageHeader) page)->pd_lower =
((char *) metad + sizeof(BTMetaPageData)) - (char *) page;
@@ -241,7 +247,7 @@ _bt_getroot(Relation rel, int access)
XLogBeginInsert();
XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
md.root = rootblkno;
md.level = 0;
@@ -1827,7 +1833,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
if (BufferIsValid(metabuf))
{
- XLogRegisterBuffer(4, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(4, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
xlmeta.root = metad->btm_root;
xlmeta.level = metad->btm_level;
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 3dbafdd..399e6a1 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -298,7 +298,7 @@ btbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
(char *) metapage, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- BTREE_METAPAGE, metapage, false);
+ BTREE_METAPAGE, metapage, true);
/*
* An immediate sync is required even if we xlog'd the page, because the
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 82337f8..3ccb8ad 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -107,8 +107,9 @@ _bt_restore_meta(XLogReaderState *record, uint8 block_id)
pageop->btpo_flags = BTP_META;
/*
- * Set pd_lower just past the end of the metadata. This is not essential
- * but it makes the page look compressible to xlog.c.
+ * Set pd_lower just past the end of the metadata. This is essential and
+ * it makes the page look compressible to xloginsert.c. See
+ * _bt_initmetapage.
*/
((PageHeader) metapg)->pd_lower =
((char *) md + sizeof(BTMetaPageData)) - (char *) metapg;
change_metapage_usage_hash-v2.patchapplication/octet-stream; name=change_metapage_usage_hash-v2.patchDownload
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index f279dce..185b6d7 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -403,7 +403,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHashInitMetaPage);
- XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_HASH_ID, XLOG_HASH_INIT_META_PAGE);
@@ -592,8 +592,11 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
metap->hashm_firstfree = 0;
/*
- * Set pd_lower just past the end of the metadata. This is to log full
- * page image of metapage in xloginsert.c.
+ * Set pd_lower just past the end of the metadata. This is required for
+ * correctness as otherwise, the pd_lower would be pointing to the
+ * position as set by _hash_pageinit and that can lead to logging the
+ * incorrect metadata while backing up full page image in xloginsert.c.
+ * Note, it also helps in page masking by allowing to mask unused space.
*/
((PageHeader) page)->pd_lower =
((char *) metap + sizeof(HashMetaPageData)) - (char *) page;
On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Added and updated the comments for both btree and hash index patches.
I don't have real complaints about this patch, this looks fine to me.
+ * Currently, the advantage of setting pd_lower is in limited cases like
+ * during wal_consistency_checking or while logging for unlogged relation
+ * as for all other purposes, we initialize the metapage. Note, it also
+ * helps in page masking by allowing to mask unused space.
I would have reworked this comment a bit, say like that:
Setting pd_lower is useful for two cases which make use of WAL
compressibility even if the meta page is initialized at replay:
- Logging of init forks for unlogged relations.
- wal_consistency_checking logs extra full-page writes, and this
allows masking of the unused space of the page.
Now I often get complains that I suck at this exercise ;)
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 25, 2017 at 10:13 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Added and updated the comments for both btree and hash index patches.
I don't have real complaints about this patch, this looks fine to me.
+ * Currently, the advantage of setting pd_lower is in limited cases like + * during wal_consistency_checking or while logging for unlogged relation + * as for all other purposes, we initialize the metapage. Note, it also + * helps in page masking by allowing to mask unused space. I would have reworked this comment a bit, say like that: Setting pd_lower is useful for two cases which make use of WAL compressibility even if the meta page is initialized at replay: - Logging of init forks for unlogged relations. - wal_consistency_checking logs extra full-page writes, and this allows masking of the unused space of the page.Now I often get complains that I suck at this exercise ;)
I understand that and I think there are always multiple ways to write
same information. It might be better to pass this patch series to
committer if you don't see any mistake because he might anyway change
some comments before committing.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 25, 2017 at 2:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I understand that and I think there are always multiple ways to write
same information. It might be better to pass this patch series to
committer if you don't see any mistake because he might anyway change
some comments before committing.
Yeah, agreed. I don't mind doing that as well honestly.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi.
Trying to catch up.
On 2017/09/25 13:43, Michael Paquier wrote:
On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Added and updated the comments for both btree and hash index patches.
I don't have real complaints about this patch, this looks fine to me.
+ * Currently, the advantage of setting pd_lower is in limited cases like + * during wal_consistency_checking or while logging for unlogged relation + * as for all other purposes, we initialize the metapage. Note, it also + * helps in page masking by allowing to mask unused space. I would have reworked this comment a bit, say like that: Setting pd_lower is useful for two cases which make use of WAL compressibility even if the meta page is initialized at replay: - Logging of init forks for unlogged relations. - wal_consistency_checking logs extra full-page writes, and this allows masking of the unused space of the page.Now I often get complains that I suck at this exercise ;)
So, I think I understand the discussion so far and the arguments about
what we should write to explain why we're setting pd_lower to the correct
value.
Just to remind, I didn't actually start this thread [1]/messages/by-id/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp to address the
issue that the FPWs of meta pages written to WAL are not getting
compressed. An external backup tool relies on pd_lower to give the
correct starting offset of the hole to compress, provided the page's other
fields suggest it has the standard page layout. Since we discovered that
pd_lower is not set correctly in gin, brin, and spgist meta pages, I
created patches to do the same. You (Michael) pointed out that that
actually helps compress their FPW images in WAL as well, so we began
considering that. Also, you pointed out that WAL checking code masks
pages based on the respective masking functions' assumptions about the
page's layout properties, which the original patches forgot to consider.
So, we updated the patches to change the respective masking functions to
mask meta pages as pages with standard page layout, too.
But as Tom pointed out [2]/messages/by-id/22268.1504815869@sss.pgh.pa.us, WAL compressibility benefit cannot be obtained
unless we change how the meta page is passed to xlog.c to be written to
WAL. So, we found out all the places that write/register the meta page to
WAL and changed the respective XLogRegisterBuffer calls to include the
REGBUG_STANDARD flag. Some of these calls already passed
REGBUF_WILL_INIT, which would result in no FPW image to be written to the
WAL and so there would be no question of compressibility. But, passing
REGBUF_STANDARD would still help the case where WAL checking is enabled,
because FPW image *is* written in that case.
So, ISTM, comments that the patches add should all say that setting the
meta pages' pd_lower to the correct value helps to pass those pages to
xlog.c as compressible standard layout pages, regardless of whether they
are actually passed that way. Even if the patches do take care of the
latter as well.
Did I miss something?
Looking at Amit K's updated patches for btree and hash, it seems that he
updated the comment to say that setting pd_lower to the correct value is
*essential*, because those pages are passed as REGBUF_STANDARD pages and
hence will be compressed. That seems to contradict what I wrote above,
but I think that's not wrong, too. So, I updated the gin, brin, and
spgist patches to say the same, viz. the following:
/*
* Set pd_lower just past the end of the metadata. This is essential,
* because without doing so, metadata will be lost if xlog.c compresses
* the page.
*/
Attached updated patches.
Thanks,
Amit
[1]: /messages/by-id/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp
/messages/by-id/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From 600caa054e98673428fc9595b13df54efe06a6a4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/5] Set pd_lower correctly in the GIN metapage.
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/gin/ginfast.c | 20 ++++++++++++++++++--
src/backend/access/gin/gininsert.c | 4 ++--
src/backend/access/gin/ginutil.c | 17 ++++++++++++++++-
src/backend/access/gin/ginxlog.c | 25 ++++++++++---------------
4 files changed, 46 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 59e435465a..359de85dfc 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -399,6 +399,14 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
/*
* Write metabuffer, make xlog entry
*/
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (needWal)
@@ -407,7 +415,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
memcpy(&data.metadata, metadata, sizeof(GinMetaPageData));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
@@ -572,6 +580,13 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
metadata->nPendingHeapTuples = 0;
}
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
for (i = 0; i < data.ndeleted; i++)
@@ -586,7 +601,8 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
XLogRecPtr recptr;
XLogBeginInsert();
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer,
+ REGBUF_WILL_INIT | REGBUF_STANDARD);
for (i = 0; i < data.ndeleted; i++)
XLogRegisterBuffer(i + 1, buffers[i], REGBUF_WILL_INIT);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 5378011f50..c9aa4ee147 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -348,7 +348,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
Page page;
XLogBeginInsert();
- XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, RootBuffer, REGBUF_WILL_INIT);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
@@ -447,7 +447,7 @@ ginbuildempty(Relation index)
START_CRIT_SECTION();
GinInitMetabuffer(MetaBuffer);
MarkBufferDirty(MetaBuffer);
- log_newpage_buffer(MetaBuffer, false);
+ log_newpage_buffer(MetaBuffer, true);
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
log_newpage_buffer(RootBuffer, false);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..f6e776dbe9 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,14 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
@@ -676,6 +684,13 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
metadata->nDataPages = stats->nDataPages;
metadata->nEntries = stats->nEntries;
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (RelationNeedsWAL(index))
@@ -690,7 +705,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
XLogBeginInsert();
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
PageSetLSN(metapage, recptr);
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 92cafe950b..2261447926 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
gin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
GinPageOpaque opaque;
mask_page_lsn_and_checksum(page);
@@ -776,18 +777,12 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content. For other pages, mask the hole if the pd_lower
+ * appears to have been set correctly.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower > SizeOfPageHeaderData)
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From c5c2b0ebffdb48da530b6b2786e4ca3f58354a83 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/5] Set pd_lower correctly in the BRIN index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/brin/brin.c | 4 ++--
src/backend/access/brin/brin_pageops.c | 8 ++++++++
src/backend/access/brin/brin_revmap.c | 10 +++++++++-
src/backend/access/brin/brin_xlog.c | 18 ++++++++++++++++--
4 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..e6909d7aea 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -685,7 +685,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinCreateIdx);
- XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
@@ -742,7 +742,7 @@ brinbuildempty(Relation index)
brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
BRIN_CURRENT_VERSION);
MarkBufferDirty(metabuf);
- log_newpage_buffer(metabuf, false);
+ log_newpage_buffer(metabuf, true);
END_CRIT_SECTION();
UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..30bc5051ff 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,14 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076887..a276dc79ad 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -624,6 +624,14 @@ revmap_physical_extend(BrinRevmap *revmap)
MarkBufferDirty(buf);
metadata->lastRevmapPage = mapBlk;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapage;
MarkBufferDirty(revmap->rm_metaBuf);
if (RelationNeedsWAL(revmap->rm_irel))
@@ -635,7 +643,7 @@ revmap_physical_extend(BrinRevmap *revmap)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinRevmapExtend);
- XLogRegisterBuffer(0, revmap->rm_metaBuf, 0);
+ XLogRegisterBuffer(0, revmap->rm_metaBuf, REGBUF_STANDARD);
XLogRegisterBuffer(1, buf, REGBUF_WILL_INIT);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 60daa54a95..55d4dea891 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -234,6 +234,14 @@ brin_xlog_revmap_extend(XLogReaderState *record)
metadata->lastRevmapPage = xlrec->targetBlk;
PageSetLSN(metapg, lsn);
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c
+ * compresses the page.
+ */
+ ((PageHeader) metapg)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapg;
MarkBufferDirty(metabuf);
}
@@ -331,14 +339,20 @@ void
brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ /*
+ * Regular brin pages contain unused space which needs to be masked.
+ * Similarly for meta pages, but mask it only if pd_lower appears to have
+ * been set correctly.
+ */
+ if (BRIN_IS_REGULAR_PAGE(page) ||
+ (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower > SizeOfPageHeaderData))
{
- /* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
}
}
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From b0c7b929fdc78db4252af5f5ef5e1814cb441f59 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/5] Set pd_lower correctly in the SP-GiST index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/spgist/spginsert.c | 4 ++--
src/backend/access/spgist/spgutils.c | 17 +++++++++++++++++
src/backend/access/spgist/spgxlog.c | 7 ++++---
3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4b2c29b0e..80b82e1602 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -110,7 +110,7 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
* Replay will re-initialize the pages, so don't take full pages
* images. No other data to log.
*/
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, rootbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(2, nullbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -173,7 +173,7 @@ spgbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
(char *) page, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- SPGIST_METAPAGE_BLKNO, page, false);
+ SPGIST_METAPAGE_BLKNO, page, true);
/* Likewise for the root page. */
SpGistInitPage(page, SPGIST_LEAF);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..9e47883ef6 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -256,15 +256,24 @@ SpGistUpdateMetaPage(Relation index)
if (cache != NULL)
{
Buffer metabuffer;
+ Page metapage;
SpGistMetaPageData *metadata;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metapage = BufferGetPage(metabuffer);
if (ConditionalLockBuffer(metabuffer))
{
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
metadata->lastUsedPages = cache->lastUsedPages;
+ /*
+ * Set pd_lower just past the end of the metadata. This is
+ * essential, because without doing so, metadata will be lost if
+ * xlog.c compresses the page.
+ */
+ ((PageHeader) metapage)->pd_lower = ((char *) metadata +
+ sizeof(SpGistMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
UnlockReleaseBuffer(metabuffer);
}
@@ -534,6 +543,14 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 87def79ee5..539ce8eb8c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,16 @@ void
spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
/*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
+ * Mask the unused space, only if the page's pd_lower appears to have been
+ * set correctly.
*/
- if (!SpGistPageIsMeta(page))
+ if (pagehdr->pd_lower > SizeOfPageHeaderData)
mask_unused_space(page);
}
--
2.11.0
On Mon, Sep 25, 2017 at 12:18 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Hi.
Trying to catch up.
On 2017/09/25 13:43, Michael Paquier wrote:
On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Added and updated the comments for both btree and hash index patches.
I don't have real complaints about this patch, this looks fine to me.
+ * Currently, the advantage of setting pd_lower is in limited cases like + * during wal_consistency_checking or while logging for unlogged relation + * as for all other purposes, we initialize the metapage. Note, it also + * helps in page masking by allowing to mask unused space. I would have reworked this comment a bit, say like that: Setting pd_lower is useful for two cases which make use of WAL compressibility even if the meta page is initialized at replay: - Logging of init forks for unlogged relations. - wal_consistency_checking logs extra full-page writes, and this allows masking of the unused space of the page.Now I often get complains that I suck at this exercise ;)
So, I think I understand the discussion so far and the arguments about
what we should write to explain why we're setting pd_lower to the correct
value.Just to remind, I didn't actually start this thread [1] to address the
issue that the FPWs of meta pages written to WAL are not getting
compressed. An external backup tool relies on pd_lower to give the
correct starting offset of the hole to compress, provided the page's other
fields suggest it has the standard page layout. Since we discovered that
pd_lower is not set correctly in gin, brin, and spgist meta pages, I
created patches to do the same. You (Michael) pointed out that that
actually helps compress their FPW images in WAL as well, so we began
considering that. Also, you pointed out that WAL checking code masks
pages based on the respective masking functions' assumptions about the
page's layout properties, which the original patches forgot to consider.
So, we updated the patches to change the respective masking functions to
mask meta pages as pages with standard page layout, too.But as Tom pointed out [2], WAL compressibility benefit cannot be obtained
unless we change how the meta page is passed to xlog.c to be written to
WAL. So, we found out all the places that write/register the meta page to
WAL and changed the respective XLogRegisterBuffer calls to include the
REGBUG_STANDARD flag. Some of these calls already passed
REGBUF_WILL_INIT, which would result in no FPW image to be written to the
WAL and so there would be no question of compressibility. But, passing
REGBUF_STANDARD would still help the case where WAL checking is enabled,
because FPW image *is* written in that case.So, ISTM, comments that the patches add should all say that setting the
meta pages' pd_lower to the correct value helps to pass those pages to
xlog.c as compressible standard layout pages, regardless of whether they
are actually passed that way. Even if the patches do take care of the
latter as well.Did I miss something?
Looking at Amit K's updated patches for btree and hash, it seems that he
updated the comment to say that setting pd_lower to the correct value is
*essential*, because those pages are passed as REGBUF_STANDARD pages and
hence will be compressed. That seems to contradict what I wrote above,
but I think that's not wrong, too.
I think you are missing that there are many cases where we use only
REGBUF_STANDARD for meta-pages (cf. hash index). For btree, where the
advantage is in fewer cases, I have explicitly stated those cases.
Do you still have confusion?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 25, 2017 at 3:48 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Trying to catch up.
Glad to see you back.
Just to remind, I didn't actually start this thread [1] to address the
issue that the FPWs of meta pages written to WAL are not getting
compressed. An external backup tool relies on pd_lower to give the
correct starting offset of the hole to compress, provided the page's other
fields suggest it has the standard page layout. Since we discovered that
pd_lower is not set correctly in gin, brin, and spgist meta pages, I
created patches to do the same. You (Michael) pointed out that that
actually helps compress their FPW images in WAL as well, so we began
considering that. Also, you pointed out that WAL checking code masks
pages based on the respective masking functions' assumptions about the
page's layout properties, which the original patches forgot to consider.
So, we updated the patches to change the respective masking functions to
mask meta pages as pages with standard page layout, too.
Thanks for summarizing all that happened.
But as Tom pointed out [2], WAL compressibility benefit cannot be obtained
unless we change how the meta page is passed to xlog.c to be written to
WAL. So, we found out all the places that write/register the meta page to
WAL and changed the respective XLogRegisterBuffer calls to include the
REGBUG_STANDARD flag. Some of these calls already passed
REGBUF_WILL_INIT, which would result in no FPW image to be written to the
WAL and so there would be no question of compressibility. But, passing
REGBUF_STANDARD would still help the case where WAL checking is enabled,
because FPW image *is* written in that case.
Yep.
So, ISTM, comments that the patches add should all say that setting the
meta pages' pd_lower to the correct value helps to pass those pages to
xlog.c as compressible standard layout pages, regardless of whether they
are actually passed that way. Even if the patches do take care of the
latter as well.Did I miss something?
Not that I think of.
Buffer metabuffer;
+ Page metapage;
SpGistMetaPageData *metadata;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metapage = BufferGetPage(metabuffer);
No need to define metapage here and to call BufferGetPage() as long as
the lock on the buffer is not taken.
Except that small thing, the patches do their duty.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/26 11:34, Amit Kapila wrote:
On Mon, Sep 25, 2017 at 12:18 PM, Amit Langote wrote:
So, ISTM, comments that the patches add should all say that setting the
meta pages' pd_lower to the correct value helps to pass those pages to
xlog.c as compressible standard layout pages, regardless of whether they
are actually passed that way. Even if the patches do take care of the
latter as well.Did I miss something?
Looking at Amit K's updated patches for btree and hash, it seems that he
updated the comment to say that setting pd_lower to the correct value is
*essential*, because those pages are passed as REGBUF_STANDARD pages and
hence will be compressed. That seems to contradict what I wrote above,
but I think that's not wrong, too.I think you are missing that there are many cases where we use only
REGBUF_STANDARD for meta-pages (cf. hash index). For btree, where the
advantage is in fewer cases, I have explicitly stated those cases.
I do see that there are some places that use only REGBUF_STANDARD. I also
understand that specifying this flag is necessary condition for
XLogRecordAssemble() to perform the hole compression, if it is to be
performed at all. ISTM, the hole is compressed only if we write the FP
image. However, reasons for why FP image needs to be written, AFAICS, are
independent of whether the hole is (should be) compressed or not. Whether
compression should occur or not depends on whether the REGBUF_STANDARD
flag is passed, that is, whether a caller is sure that the page is of
standard layout. The last part is only true if page's pd_lower is always
set correctly.
Perhaps, we should focus on just writing exactly why setting pd_lower to
the correct value is essential. I think that's a good change, so I
adopted it from your patch. The *only* reason why it's essential to set
pd_lower to the correct value, as I see it, is that xloginsert.c *might*
compress the hole in the page and its pd_lower better be correct if we
want compression to preserve the metadata content (in meta pages). OTOH,
it's not clear to me why we should write in that comment *why* the
compression would occur or why the FP image would be written in the first
place. Whether or not FP image is written has nothing to do with whether
we should set pd_lower correctly, does it? Also, since we want to repeat
the same comment in multiple places where we now set pd_lower (gin, brin,
spgist patches have many more new places where meta page's pd_lower is
set), keeping it to the point would be good.
But as you said a few times, we should leave it to the committer to decide
the exact text to write in these comment, which I think is fine.
Do you still have confusion?
This is an area of PG code I don't have much experience with and is also a
bit complex, so sorry if I'm saying things that are not quite right.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/26 12:17, Michael Paquier wrote:
On Mon, Sep 25, 2017 at 3:48 PM, Amit Langote wrote:
So, ISTM, comments that the patches add should all say that setting the
meta pages' pd_lower to the correct value helps to pass those pages to
xlog.c as compressible standard layout pages, regardless of whether they
are actually passed that way. Even if the patches do take care of the
latter as well.Did I miss something?
Not that I think of.
Thanks.
Buffer metabuffer;
+ Page metapage;
SpGistMetaPageData *metadata;metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metapage = BufferGetPage(metabuffer);
No need to define metapage here and to call BufferGetPage() as long as
the lock on the buffer is not taken.
Ah, okay.
Moved those additions inside the if (ConditionalLockBuffer(metabuffer)) block.
Except that small thing, the patches do their duty.
Thanks, revised patches attached.
Regards,
Amit
Attachments:
0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchtext/plain; charset=UTF-8; name=0001-Set-pd_lower-correctly-in-the-GIN-metapage.patchDownload
From e82e787bc9533977e73456b0782ed78354637bda Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/5] Set pd_lower correctly in the GIN metapage.
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/gin/ginfast.c | 20 ++++++++++++++++++--
src/backend/access/gin/gininsert.c | 4 ++--
src/backend/access/gin/ginutil.c | 17 ++++++++++++++++-
src/backend/access/gin/ginxlog.c | 25 ++++++++++---------------
4 files changed, 46 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 59e435465a..359de85dfc 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -399,6 +399,14 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
/*
* Write metabuffer, make xlog entry
*/
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (needWal)
@@ -407,7 +415,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
memcpy(&data.metadata, metadata, sizeof(GinMetaPageData));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
@@ -572,6 +580,13 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
metadata->nPendingHeapTuples = 0;
}
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
for (i = 0; i < data.ndeleted; i++)
@@ -586,7 +601,8 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
XLogRecPtr recptr;
XLogBeginInsert();
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer,
+ REGBUF_WILL_INIT | REGBUF_STANDARD);
for (i = 0; i < data.ndeleted; i++)
XLogRegisterBuffer(i + 1, buffers[i], REGBUF_WILL_INIT);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 5378011f50..c9aa4ee147 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -348,7 +348,7 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
Page page;
XLogBeginInsert();
- XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, MetaBuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, RootBuffer, REGBUF_WILL_INIT);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
@@ -447,7 +447,7 @@ ginbuildempty(Relation index)
START_CRIT_SECTION();
GinInitMetabuffer(MetaBuffer);
MarkBufferDirty(MetaBuffer);
- log_newpage_buffer(MetaBuffer, false);
+ log_newpage_buffer(MetaBuffer, true);
GinInitBuffer(RootBuffer, GIN_LEAF);
MarkBufferDirty(RootBuffer);
log_newpage_buffer(RootBuffer, false);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..f6e776dbe9 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,14 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
/*
@@ -676,6 +684,13 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
metadata->nDataPages = stats->nDataPages;
metadata->nEntries = stats->nEntries;
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(GinMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
if (RelationNeedsWAL(index))
@@ -690,7 +705,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
XLogBeginInsert();
XLogRegisterData((char *) &data, sizeof(ginxlogUpdateMeta));
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_UPDATE_META_PAGE);
PageSetLSN(metapage, recptr);
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 92cafe950b..2261447926 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
- GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+ GinInitMetabuffer(metabuffer);
memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
gin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
GinPageOpaque opaque;
mask_page_lsn_and_checksum(page);
@@ -776,18 +777,12 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
/*
- * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
- * we need to apply masking for those pages.
+ * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+ * the page content. For other pages, mask the hole if the pd_lower
+ * appears to have been set correctly.
*/
- if (opaque->flags != GIN_META)
- {
- /*
- * For GIN_DELETED page, the page is initialized to empty. Hence, mask
- * the page content.
- */
- if (opaque->flags & GIN_DELETED)
- mask_page_content(page);
- else
- mask_unused_space(page);
- }
+ if (opaque->flags & GIN_DELETED)
+ mask_page_content(page);
+ else if (pagehdr->pd_lower > SizeOfPageHeaderData)
+ mask_unused_space(page);
}
--
2.11.0
0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchtext/plain; charset=UTF-8; name=0002-Set-pd_lower-correctly-in-the-BRIN-index-metapage.patchDownload
From 34c43597b35921c6b3e5cd6fb96b47bc19c99c16 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/5] Set pd_lower correctly in the BRIN index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/brin/brin.c | 4 ++--
src/backend/access/brin/brin_pageops.c | 8 ++++++++
src/backend/access/brin/brin_revmap.c | 10 +++++++++-
src/backend/access/brin/brin_xlog.c | 18 ++++++++++++++++--
4 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b3aa6d1ced..e6909d7aea 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -685,7 +685,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinCreateIdx);
- XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, meta, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
@@ -742,7 +742,7 @@ brinbuildempty(Relation index)
brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
BRIN_CURRENT_VERSION);
MarkBufferDirty(metabuf);
- log_newpage_buffer(metabuf, false);
+ log_newpage_buffer(metabuf, true);
END_CRIT_SECTION();
UnlockReleaseBuffer(metabuf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..30bc5051ff 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,14 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
* revmap page to be created when the index is.
*/
metadata->lastRevmapPage = 0;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076887..a276dc79ad 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -624,6 +624,14 @@ revmap_physical_extend(BrinRevmap *revmap)
MarkBufferDirty(buf);
metadata->lastRevmapPage = mapBlk;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) metapage)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapage;
MarkBufferDirty(revmap->rm_metaBuf);
if (RelationNeedsWAL(revmap->rm_irel))
@@ -635,7 +643,7 @@ revmap_physical_extend(BrinRevmap *revmap)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfBrinRevmapExtend);
- XLogRegisterBuffer(0, revmap->rm_metaBuf, 0);
+ XLogRegisterBuffer(0, revmap->rm_metaBuf, REGBUF_STANDARD);
XLogRegisterBuffer(1, buf, REGBUF_WILL_INIT);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 60daa54a95..55d4dea891 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -234,6 +234,14 @@ brin_xlog_revmap_extend(XLogReaderState *record)
metadata->lastRevmapPage = xlrec->targetBlk;
PageSetLSN(metapg, lsn);
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c
+ * compresses the page.
+ */
+ ((PageHeader) metapg)->pd_lower =
+ ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) metapg;
MarkBufferDirty(metabuf);
}
@@ -331,14 +339,20 @@ void
brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
- if (BRIN_IS_REGULAR_PAGE(page))
+ /*
+ * Regular brin pages contain unused space which needs to be masked.
+ * Similarly for meta pages, but mask it only if pd_lower appears to have
+ * been set correctly.
+ */
+ if (BRIN_IS_REGULAR_PAGE(page) ||
+ (BRIN_IS_META_PAGE(page) && pagehdr->pd_lower > SizeOfPageHeaderData))
{
- /* Regular brin pages contain unused space which needs to be masked. */
mask_unused_space(page);
}
}
--
2.11.0
0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchtext/plain; charset=UTF-8; name=0003-Set-pd_lower-correctly-in-the-SP-GiST-index-metapage.patchDownload
From 645384e0d0bcd0fb57d4b3a9b611364cdf1c2109 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 26 Jun 2017 15:23:34 +0900
Subject: [PATCH 3/5] Set pd_lower correctly in the SP-GiST index metapage
Also tell xlog.c to treat the metapage like a standard page, so any
hole in it is compressed.
---
src/backend/access/spgist/spginsert.c | 4 ++--
src/backend/access/spgist/spgutils.c | 18 ++++++++++++++++++
src/backend/access/spgist/spgxlog.c | 7 ++++---
3 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4b2c29b0e..80b82e1602 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -110,7 +110,7 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
* Replay will re-initialize the pages, so don't take full pages
* images. No other data to log.
*/
- XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(1, rootbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBuffer(2, nullbuffer, REGBUF_WILL_INIT | REGBUF_STANDARD);
@@ -173,7 +173,7 @@ spgbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
(char *) page, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- SPGIST_METAPAGE_BLKNO, page, false);
+ SPGIST_METAPAGE_BLKNO, page, true);
/* Likewise for the root page. */
SpGistInitPage(page, SPGIST_LEAF);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 22f64b0103..5cccb4bc9f 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -262,9 +262,19 @@ SpGistUpdateMetaPage(Relation index)
if (ConditionalLockBuffer(metabuffer))
{
+ Page metapage;
+
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
metadata->lastUsedPages = cache->lastUsedPages;
+ /*
+ * Set pd_lower just past the end of the metadata. This is
+ * essential, because without doing so, metadata will be lost if
+ * xlog.c compresses the page.
+ */
+ metapage = BufferGetPage(metabuffer);
+ ((PageHeader) metapage)->pd_lower = ((char *) metadata +
+ sizeof(SpGistMetaPageData)) - (char *) metapage;
MarkBufferDirty(metabuffer);
UnlockReleaseBuffer(metabuffer);
}
@@ -534,6 +544,14 @@ SpGistInitMetapage(Page page)
/* initialize last-used-page cache to empty */
for (i = 0; i < SPGIST_CACHED_PAGES; i++)
metadata->lastUsedPages.cachedPage[i].blkno = InvalidBlockNumber;
+
+ /*
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
+ */
+ ((PageHeader) page)->pd_lower =
+ ((char *) metadata + sizeof(SpGistMetaPageData)) - (char *) page;
}
/*
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 87def79ee5..539ce8eb8c 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1033,15 +1033,16 @@ void
spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
+ PageHeader pagehdr = (PageHeader) page;
mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
/*
- * Any SpGist page other than meta contains unused space which needs to be
- * masked.
+ * Mask the unused space, only if the page's pd_lower appears to have been
+ * set correctly.
*/
- if (!SpGistPageIsMeta(page))
+ if (pagehdr->pd_lower > SizeOfPageHeaderData)
mask_unused_space(page);
}
--
2.11.0
On Tue, Sep 26, 2017 at 4:22 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Except that small thing, the patches do their duty.
Thanks, revised patches attached.
Cool, let's switch it back to a ready for committer status then.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/09/26 16:30, Michael Paquier wrote:
On Tue, Sep 26, 2017 at 4:22 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:Except that small thing, the patches do their duty.
Thanks, revised patches attached.
Cool, let's switch it back to a ready for committer status then.
Sure, thanks.
Regards,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/09/26 16:30, Michael Paquier wrote:
Cool, let's switch it back to a ready for committer status then.
Sure, thanks.
Pushed with some cosmetic adjustments --- mostly, making the comments more
explicit about why we need the apparently-redundant assignments to
pd_lower.
I've marked the CF entry closed. However, I'm not sure if we're quite
done with this thread. Weren't we going to adjust nbtree and hash to
be more aggressive about labeling their metapages as REGBUF_STANDARD?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/09/26 16:30, Michael Paquier wrote:
Cool, let's switch it back to a ready for committer status then.
Sure, thanks.
Pushed with some cosmetic adjustments --- mostly, making the comments more
explicit about why we need the apparently-redundant assignments to
pd_lower.I've marked the CF entry closed. However, I'm not sure if we're quite
done with this thread. Weren't we going to adjust nbtree and hash to
be more aggressive about labeling their metapages as REGBUF_STANDARD?
I have already posted the patches [1]/messages/by-id/CAA4eK1+TKg6ZK=mF14x_wf2KrmOxoMJ6z7YUK3-78acaYLwQ8Q@mail.gmail.com for the same in this thread and
those are reviewed [2]/messages/by-id/CAB7nPqTE4-GCaLtDh=JBcgUKR6B5WkvRLC-NpOqkgybi4FhHPw@mail.gmail.com[3]/messages/by-id/CAB7nPqQBtDW43ABnWEdoHP6A2ToedzDFdpykbGjpO2wuZNiQnw@mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com as well. I have adjusted the comments as per
latest commit. Please find updated patches attached.
[1]: /messages/by-id/CAA4eK1+TKg6ZK=mF14x_wf2KrmOxoMJ6z7YUK3-78acaYLwQ8Q@mail.gmail.com
[2]: /messages/by-id/CAB7nPqTE4-GCaLtDh=JBcgUKR6B5WkvRLC-NpOqkgybi4FhHPw@mail.gmail.com
[3]: /messages/by-id/CAB7nPqQBtDW43ABnWEdoHP6A2ToedzDFdpykbGjpO2wuZNiQnw@mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
change_metapage_usage_btree-v3.patchapplication/octet-stream; name=change_metapage_usage_btree-v3.patchDownload
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index bf963fc..5cbaba1 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -898,7 +898,7 @@ _bt_insertonpg(Relation rel,
xlmeta.fastroot = metad->btm_fastroot;
xlmeta.fastlevel = metad->btm_fastlevel;
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
XLogRegisterBufData(2, (char *) &xlmeta, sizeof(xl_btree_metadata));
xlinfo = XLOG_BTREE_INSERT_META;
@@ -2032,7 +2032,7 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
XLogRegisterBuffer(1, lbuf, REGBUF_STANDARD);
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
md.root = rootblknum;
md.level = metad->btm_level;
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 10697e9..c774349 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -65,8 +65,9 @@ _bt_initmetapage(Page page, BlockNumber rootbknum, uint32 level)
metaopaque->btpo_flags = BTP_META;
/*
- * Set pd_lower just past the end of the metadata. This is not essential
- * but it makes the page look compressible to xlog.c.
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
*/
((PageHeader) page)->pd_lower =
((char *) metad + sizeof(BTMetaPageData)) - (char *) page;
@@ -241,7 +242,7 @@ _bt_getroot(Relation rel, int access)
XLogBeginInsert();
XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
- XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
md.root = rootblkno;
md.level = 0;
@@ -1827,7 +1828,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
if (BufferIsValid(metabuf))
{
- XLogRegisterBuffer(4, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(4, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
xlmeta.root = metad->btm_root;
xlmeta.level = metad->btm_level;
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 3dbafdd..399e6a1 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -298,7 +298,7 @@ btbuildempty(Relation index)
smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
(char *) metapage, true);
log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
- BTREE_METAPAGE, metapage, false);
+ BTREE_METAPAGE, metapage, true);
/*
* An immediate sync is required even if we xlog'd the page, because the
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 82337f8..7250b4f 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -107,8 +107,9 @@ _bt_restore_meta(XLogReaderState *record, uint8 block_id)
pageop->btpo_flags = BTP_META;
/*
- * Set pd_lower just past the end of the metadata. This is not essential
- * but it makes the page look compressible to xlog.c.
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
*/
((PageHeader) metapg)->pd_lower =
((char *) md + sizeof(BTMetaPageData)) - (char *) metapg;
change_metapage_usage_hash-v3.patchapplication/octet-stream; name=change_metapage_usage_hash-v3.patchDownload
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index f279dce..4b14f88 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -403,7 +403,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHashInitMetaPage);
- XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);
recptr = XLogInsert(RM_HASH_ID, XLOG_HASH_INIT_META_PAGE);
@@ -592,8 +592,9 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
metap->hashm_firstfree = 0;
/*
- * Set pd_lower just past the end of the metadata. This is to log full
- * page image of metapage in xloginsert.c.
+ * Set pd_lower just past the end of the metadata. This is essential,
+ * because without doing so, metadata will be lost if xlog.c compresses
+ * the page.
*/
((PageHeader) page)->pd_lower =
((char *) metap + sizeof(HashMetaPageData)) - (char *) page;
On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've marked the CF entry closed. However, I'm not sure if we're quite
done with this thread. Weren't we going to adjust nbtree and hash to
be more aggressive about labeling their metapages as REGBUF_STANDARD?I have already posted the patches [1] for the same in this thread and
those are reviewed [2][3] as well. I have adjusted the comments as per
latest commit. Please find updated patches attached.
Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set,
at least for page masking.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've marked the CF entry closed. However, I'm not sure if we're quite
done with this thread. Weren't we going to adjust nbtree and hash to
be more aggressive about labeling their metapages as REGBUF_STANDARD?
I have already posted the patches [1] for the same in this thread and
those are reviewed [2][3] as well. I have adjusted the comments as per
latest commit. Please find updated patches attached.
Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set,
at least for page masking.
Thanks, I'd forgotten those patches were already posted. Looks good,
so pushed.
Looking around, I noted that contrib/bloom also had the disease of
not telling log_newpage it was writing a standard-format metapage,
so I fixed that too.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/11/03 6:24, Tom Lane wrote:
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
On 2017/09/26 16:30, Michael Paquier wrote:
Cool, let's switch it back to a ready for committer status then.
Sure, thanks.
Pushed with some cosmetic adjustments --- mostly, making the comments more
explicit about why we need the apparently-redundant assignments to
pd_lower.
Thank you.
Regards,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Nov 4, 2017 at 2:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I've marked the CF entry closed. However, I'm not sure if we're quite
done with this thread. Weren't we going to adjust nbtree and hash to
be more aggressive about labeling their metapages as REGBUF_STANDARD?I have already posted the patches [1] for the same in this thread and
those are reviewed [2][3] as well. I have adjusted the comments as per
latest commit. Please find updated patches attached.Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set,
at least for page masking.Thanks, I'd forgotten those patches were already posted. Looks good,
so pushed.Looking around, I noted that contrib/bloom also had the disease of
not telling log_newpage it was writing a standard-format metapage,
so I fixed that too.
Thanks, Michael and Tom for reviewing and committing the work.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers