GIN tries to form a tuple with a partial compressedList during insertion

Started by Arseniy Mukhin9 months ago6 messages
#1Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
1 attachment(s)

Hi,

In the functions addItemPointersToLeafTuple and buildFreshLeafTuple
(in gininsert.c), the result of ginCompressPostingList()
is passed to GinFormTuple without checking whether
ginCompressPostingList() successfully packed all items.
These GinFormTuple calls consistently fail because the resulting
tuples always exceed the maximum size.
While this doesn’t result in data corruption, it might still be worth
addressing.
Additionally, the condition if (compressedList) is unnecessary, since
ginCompressPostingList() never returns NULL.

Please find the attached patch fixing it.

Best regards,
Arseniy Mukhin

Attachments:

gin_insert_invalid_compressed_list_fix.patchtext/x-patch; charset=US-ASCII; name=gin_insert_invalid_compressed_list_fix.patchDownload
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index a7b7b5996e3..5643c423627 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -218,7 +218,8 @@ addItemPointersToLeafTuple(GinState *ginstate,
 	ItemPointerData *newItems,
 			   *oldItems;
 	int			oldNPosting,
-				newNPosting;
+				newNPosting,
+                nwritten;
 	GinPostingList *compressedList;
 
 	Assert(!GinIsPostingTree(old));
@@ -236,17 +237,18 @@ addItemPointersToLeafTuple(GinState *ginstate,
 	/* Compress the posting list, and try to a build tuple with room for it */
 	res = NULL;
 	compressedList = ginCompressPostingList(newItems, newNPosting, GinMaxItemSize,
-											NULL);
+                                            &nwritten);
 	pfree(newItems);
-	if (compressedList)
+	if (nwritten == newNPosting)
 	{
 		res = GinFormTuple(ginstate, attnum, key, category,
 						   (char *) compressedList,
 						   SizeOfGinPostingList(compressedList),
 						   newNPosting,
 						   false);
-		pfree(compressedList);
 	}
+    pfree(compressedList);
+
 	if (!res)
 	{
 		/* posting list would be too big, convert to posting tree */
@@ -293,17 +295,19 @@ buildFreshLeafTuple(GinState *ginstate,
 {
 	IndexTuple	res = NULL;
 	GinPostingList *compressedList;
+    int nwritten;
 
 	/* try to build a posting list tuple with all the items */
-	compressedList = ginCompressPostingList(items, nitem, GinMaxItemSize, NULL);
-	if (compressedList)
+	compressedList = ginCompressPostingList(items, nitem, GinMaxItemSize, &nwritten);
+	if (nwritten == nitem)
 	{
 		res = GinFormTuple(ginstate, attnum, key, category,
 						   (char *) compressedList,
 						   SizeOfGinPostingList(compressedList),
 						   nitem, false);
-		pfree(compressedList);
 	}
+    pfree(compressedList);
+
 	if (!res)
 	{
 		/* posting list would be too big, build posting tree */
#2Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#1)
1 attachment(s)
Re: GIN tries to form a tuple with a partial compressedList during insertion

Hi!

Here is a new version. I added a commit message. I will add it to PG19-2.

Best regards,
Arseniy Mukhin

Attachments:

v2-0001-Add-check-for-compressed-posting-list-items-numbe.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Add-check-for-compressed-posting-list-items-numbe.patchDownload
From 0b5a1fce01dd2bb9ef43febeb497dae4eb274405 Mon Sep 17 00:00:00 2001
From: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Date: Wed, 2 Jul 2025 22:00:31 +0300
Subject: [PATCH v2] Add check for compressed posting list items number in GIN

Current code doesn't check if compressed posting list contains all
new items during index tuple creation which potentially can lead to
lost of data. It doesn't happen now because the size limit of compressed
posting list is the same as gin index tuple size limit, so when compressed
posting list reaches its limit, index tuple size with such data always
too large to be inserted. This commit adds check for compressed items number.

Also, in case of adding items to an existing index tuple we can use more
precise size limit for compressed posting list as we have old index tuple
and can say how many space we have for posting list. It helps to avoid
redundant calls to GinFormTuple.
---
 src/backend/access/gin/gininsert.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index a65acd89104..2d474838a0f 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -218,7 +218,8 @@ addItemPointersToLeafTuple(GinState *ginstate,
 	ItemPointerData *newItems,
 			   *oldItems;
 	int			oldNPosting,
-				newNPosting;
+				newNPosting,
+				nwritten;
 	GinPostingList *compressedList;
 
 	Assert(!GinIsPostingTree(old));
@@ -235,18 +236,19 @@ addItemPointersToLeafTuple(GinState *ginstate,
 
 	/* Compress the posting list, and try to a build tuple with room for it */
 	res = NULL;
-	compressedList = ginCompressPostingList(newItems, newNPosting, GinMaxItemSize,
-											NULL);
+	compressedList = ginCompressPostingList(newItems, newNPosting, GinMaxItemSize - GinGetPostingOffset(old),
+											&nwritten);
 	pfree(newItems);
-	if (compressedList)
+	if (nwritten == newNPosting)
 	{
 		res = GinFormTuple(ginstate, attnum, key, category,
 						   (char *) compressedList,
 						   SizeOfGinPostingList(compressedList),
 						   newNPosting,
 						   false);
-		pfree(compressedList);
 	}
+	pfree(compressedList);
+
 	if (!res)
 	{
 		/* posting list would be too big, convert to posting tree */
@@ -293,17 +295,19 @@ buildFreshLeafTuple(GinState *ginstate,
 {
 	IndexTuple	res = NULL;
 	GinPostingList *compressedList;
+	int			nwritten;
 
 	/* try to build a posting list tuple with all the items */
-	compressedList = ginCompressPostingList(items, nitem, GinMaxItemSize, NULL);
-	if (compressedList)
+	compressedList = ginCompressPostingList(items, nitem, GinMaxItemSize, &nwritten);
+	if (nwritten == nitem)
 	{
 		res = GinFormTuple(ginstate, attnum, key, category,
 						   (char *) compressedList,
 						   SizeOfGinPostingList(compressedList),
 						   nitem, false);
-		pfree(compressedList);
 	}
+	pfree(compressedList);
+
 	if (!res)
 	{
 		/* posting list would be too big, build posting tree */
-- 
2.43.0

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Arseniy Mukhin (#2)
Re: GIN tries to form a tuple with a partial compressedList during insertion

On Wed, Jul 2, 2025 at 12:41 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

Here is a new version. I added a commit message. I will add it to PG19-2.

Thank you for the patch.

I think the proposed change is reasonable; if we fail to compress all
ItemPointers, it doesn't make sense to try to form a tuple from it.

Here are some review comments:

---
-       compressedList = ginCompressPostingList(newItems, newNPosting,
GinMaxItemSize,
-
                 NULL);
+       compressedList = ginCompressPostingList(newItems, newNPosting,
GinMaxItemSize - GinGetPostingOffset(old),
+
                 &nwritten);

Why does it need to subtract GinGetPostingOffset(old) from the maxsize?

---
pfree(newItems);
- if (compressedList)
+ if (nwritten == newNPosting)
{
res = GinFormTuple(ginstate, attnum, key, category,
(char *) compressedList,

SizeOfGinPostingList(compressedList),
newNPosting,
false);
- pfree(compressedList);
}
+ pfree(compressedList);

I think it would be cleaner if we move 'pfree(newItems)' to before
'pfree(compressedList)'.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#4Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Masahiko Sawada (#3)
1 attachment(s)
Re: GIN tries to form a tuple with a partial compressedList during insertion

Hi,

On Fri, Sep 26, 2025 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 2, 2025 at 12:41 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

Here is a new version. I added a commit message. I will add it to PG19-2.

Thank you for the patch.

Thank you for looking into this!

I think the proposed change is reasonable; if we fail to compress all
ItemPointers, it doesn't make sense to try to form a tuple from it.

Here are some review comments:

---
-       compressedList = ginCompressPostingList(newItems, newNPosting,
GinMaxItemSize,
-
NULL);
+       compressedList = ginCompressPostingList(newItems, newNPosting,
GinMaxItemSize - GinGetPostingOffset(old),
+
&nwritten);

Why does it need to subtract GinGetPostingOffset(old) from the maxsize?

While writing the patch I realized that there is a room for small
optimization. The idea here is that as we know the index tuple size
limit and we know how much space we need for everything except the
posting list (we can get it with GinGetPostingOffset(old)), then we
can calculate the size limit for the posting list more precisely. But
now I think it was a bad idea to add this optimization to the fix.
Even if it relates to the same line of the code, it's a different
thing and it's confusing. Sorry about that. I removed it from the
patch.

---
pfree(newItems);
- if (compressedList)
+ if (nwritten == newNPosting)
{
res = GinFormTuple(ginstate, attnum, key, category,
(char *) compressedList,

SizeOfGinPostingList(compressedList),
newNPosting,
false);
- pfree(compressedList);
}
+ pfree(compressedList);

I think it would be cleaner if we move 'pfree(newItems)' to before
'pfree(compressedList)'.

Done.

Please find the attached new version.

Thank you!

Best regards,
Arseniy Mukhin

Attachments:

v3-0001-Add-check-for-compressed-posting-list-items-numbe.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Add-check-for-compressed-posting-list-items-numbe.patchDownload
From d47dc9125cac3c83b6137783d1ec8c503711b65f Mon Sep 17 00:00:00 2001
From: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Date: Wed, 2 Jul 2025 22:00:31 +0300
Subject: [PATCH v3] Add check for compressed posting list items number in GIN

Current code doesn't check if compressed posting list contains all
items during index tuple creation which potentially can lead to
lost of data. It doesn't happen now because the size limit of compressed
posting list is the same as gin index tuple size limit, so when compressed
posting list reaches its limit, index tuple size with such a posting list
always too large. This commit adds check for compressed items number.
---
 src/backend/access/gin/gininsert.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e9d4b27427e..1d3ab22556d 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -218,7 +218,8 @@ addItemPointersToLeafTuple(GinState *ginstate,
 	ItemPointerData *newItems,
 			   *oldItems;
 	int			oldNPosting,
-				newNPosting;
+				newNPosting,
+				nwritten;
 	GinPostingList *compressedList;
 
 	Assert(!GinIsPostingTree(old));
@@ -235,18 +236,19 @@ addItemPointersToLeafTuple(GinState *ginstate,
 
 	/* Compress the posting list, and try to a build tuple with room for it */
 	res = NULL;
-	compressedList = ginCompressPostingList(newItems, newNPosting, GinMaxItemSize,
-											NULL);
-	pfree(newItems);
-	if (compressedList)
+	compressedList = ginCompressPostingList(newItems, newNPosting, GinMaxItemSize, &nwritten);
+	if (nwritten == newNPosting)
 	{
 		res = GinFormTuple(ginstate, attnum, key, category,
 						   (char *) compressedList,
 						   SizeOfGinPostingList(compressedList),
 						   newNPosting,
 						   false);
-		pfree(compressedList);
 	}
+
+	pfree(newItems);
+	pfree(compressedList);
+
 	if (!res)
 	{
 		/* posting list would be too big, convert to posting tree */
@@ -293,17 +295,19 @@ buildFreshLeafTuple(GinState *ginstate,
 {
 	IndexTuple	res = NULL;
 	GinPostingList *compressedList;
+	int			nwritten;
 
 	/* try to build a posting list tuple with all the items */
-	compressedList = ginCompressPostingList(items, nitem, GinMaxItemSize, NULL);
-	if (compressedList)
+	compressedList = ginCompressPostingList(items, nitem, GinMaxItemSize, &nwritten);
+	if (nwritten == nitem)
 	{
 		res = GinFormTuple(ginstate, attnum, key, category,
 						   (char *) compressedList,
 						   SizeOfGinPostingList(compressedList),
 						   nitem, false);
-		pfree(compressedList);
 	}
+	pfree(compressedList);
+
 	if (!res)
 	{
 		/* posting list would be too big, build posting tree */
-- 
2.43.0

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Arseniy Mukhin (#4)
Re: GIN tries to form a tuple with a partial compressedList during insertion

On Fri, Sep 26, 2025 at 1:58 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Fri, Sep 26, 2025 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 2, 2025 at 12:41 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

Here is a new version. I added a commit message. I will add it to PG19-2.

Thank you for the patch.

Thank you for looking into this!

I think the proposed change is reasonable; if we fail to compress all
ItemPointers, it doesn't make sense to try to form a tuple from it.

Here are some review comments:

---
-       compressedList = ginCompressPostingList(newItems, newNPosting,
GinMaxItemSize,
-
NULL);
+       compressedList = ginCompressPostingList(newItems, newNPosting,
GinMaxItemSize - GinGetPostingOffset(old),
+
&nwritten);

Why does it need to subtract GinGetPostingOffset(old) from the maxsize?

While writing the patch I realized that there is a room for small
optimization. The idea here is that as we know the index tuple size
limit and we know how much space we need for everything except the
posting list (we can get it with GinGetPostingOffset(old)), then we
can calculate the size limit for the posting list more precisely. But
now I think it was a bad idea to add this optimization to the fix.
Even if it relates to the same line of the code, it's a different
thing and it's confusing. Sorry about that. I removed it from the
patch.

---
pfree(newItems);
- if (compressedList)
+ if (nwritten == newNPosting)
{
res = GinFormTuple(ginstate, attnum, key, category,
(char *) compressedList,

SizeOfGinPostingList(compressedList),
newNPosting,
false);
- pfree(compressedList);
}
+ pfree(compressedList);

I think it would be cleaner if we move 'pfree(newItems)' to before
'pfree(compressedList)'.

Done.

Please find the attached new version.

Thank you for updating the patch! LGTM, so I've pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#6Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Masahiko Sawada (#5)
Re: GIN tries to form a tuple with a partial compressedList during insertion

On Tue, Oct 7, 2025 at 12:04 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

...
Thank you for updating the patch! LGTM, so I've pushed.

Thank you!

Best regards,
Arseniy Mukhin