COPY FROM crash
Hi all,
I got a crash when copying partitioned tables with a large amount of data in Cloudberry DB[0] (based on Postgres 14.4, Greenplum 7).
I tested on Postgres and it has a similar issue (different place but same function).
However, it's a little hard to reproduce, because it happens when inserting the next tuple after a previous COPY multi-insert buffer has been flushed.
To reproduce it easily, change the macros to:
#define MAX_BUFFERED_TUPLES 1
#define MAX_PARTITION_BUFFERS 0
Configure and make install; during initdb, a core dump occurs:
#0 0x000055de617211b9 in CopyMultiInsertInfoNextFreeSlot (miinfo=0x7ffce496d360, rri=0x55de6368ba88)
at copyfrom.c:592
#1 0x000055de61721ff1 in CopyFrom (cstate=0x55de63592ce8) at copyfrom.c:985
#2 0x000055de6171dd86 in DoCopy (pstate=0x55de63589e00, stmt=0x55de635347d8, stmt_location=0, stmt_len=195,
processed=0x7ffce496d590) at copy.c:306
#3 0x000055de61ad7ce8 in standard_ProcessUtility (pstmt=0x55de635348a8,
queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
qc=0x7ffce496d910) at utility.c:735
#4 0x000055de61ad7614 in ProcessUtility (pstmt=0x55de635348a8,
queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
qc=0x7ffce496d910) at utility.c:523
#5 0x000055de61ad5e8f in PortalRunUtility (portal=0x55de633dd7a0, pstmt=0x55de635348a8, isTopLevel=true,
setHoldSnapshot=false, dest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1158
#6 0x000055de61ad6106 in PortalRunMulti (portal=0x55de633dd7a0, isTopLevel=true, setHoldSnapshot=false,
dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1315
#7 0x000055de61ad5550 in PortalRun (portal=0x55de633dd7a0, count=9223372036854775807, isTopLevel=true,
run_once=true, dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910)
at pquery.c:791
The root cause: CopyMultiInsertInfoFlush() may flush the buffer while COPYing tuples; when inserting the next tuple,
CopyMultiInsertInfoNextFreeSlot() then crashes on the buffer's null pointer.
To fix it: instead of calling CopyMultiInsertInfoSetupBuffer() outside, I moved it into CopyMultiInsertInfoNextFreeSlot() to avoid such issues.
[0]: https://github.com/cloudberrydb/cloudberrydb
Zhang Mingli
www.hashdata.xyz
Attachments:
v0-0001-Fix-COPY-FROM-crash-due-to-buffer-flush.patch (application/octet-stream)
From e5cddef8845715c55592e5f9786b524c668d83d1 Mon Sep 17 00:00:00 2001
From: Zhang Mingli <avamingli@gmail.com>
Date: Tue, 30 Jul 2024 11:10:32 +0800
Subject: [PATCH] Fix COPY FROM crash due to buffer flush.
We may call CopyMultiInsertInfoFlush() to flush the buffer while
COPYing tuples, e.g. when COPYing into a partitioned table with a
large amount of data.
To reproduce it easily, change the macros to:
#define MAX_BUFFERED_TUPLES 1
#define MAX_PARTITION_BUFFERS 0
Configure and make install; initdb will then core dump.
When inserting the next tuple, CopyMultiInsertInfoNextFreeSlot()
crashes on the buffer's null pointer.
Instead of calling CopyMultiInsertInfoSetupBuffer() outside, put it
into CopyMultiInsertInfoNextFreeSlot() to avoid such issues.
Authored-by: Zhang Mingli avamingli@gmail.com
---
src/backend/commands/copyfrom.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707..ef42b4d523 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -586,8 +586,15 @@ static inline TupleTableSlot *
CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
ResultRelInfo *rri)
{
- CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
- int nused = buffer->nused;
+ CopyMultiInsertBuffer *buffer;
+ int nused;
+
+ /* Setup buffer in case it's flushed. */
+ if (rri->ri_CopyMultiInsertBuffer == NULL)
+ CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+
+ buffer = rri->ri_CopyMultiInsertBuffer;
+ nused = buffer->nused;
Assert(buffer != NULL);
Assert(nused < MAX_BUFFERED_TUPLES);
--
2.34.1
On Tue, 30 Jul 2024 at 15:52, Zhang Mingli <zmlpostgres@gmail.com> wrote:
I tested on Postgres and it has a similar issue (different place but same function).
However, it's a little hard to reproduce, because it happens when inserting the next tuple after a previous COPY multi-insert buffer has been flushed.
To reproduce it easily, change the macros to:
#define MAX_BUFFERED_TUPLES 1
#define MAX_PARTITION_BUFFERS 0
I think you're going to need to demonstrate to us there's an actual
PostgreSQL bug here with a test case that causes a crash without
changing the above definitions.
It seems to me that it's not valid to set MAX_PARTITION_BUFFERS to
anything less than 2 due to the code inside
CopyMultiInsertInfoFlush(). If we find the CopyMultiInsertBuffer for
'curr_rri' then that code would misbehave if the list only contained a
single CopyMultiInsertBuffer due to the expectation there's another
item in the list after the list_delete_first(). If you're only able
to get it to misbehave by setting MAX_PARTITION_BUFFERS to less than
2, then my suggested fix would be to add a comment to say that values
less than two are not supported.
David
Hi!
On Tue, 30 Jul 2024 at 08:52, Zhang Mingli <zmlpostgres@gmail.com> wrote:
Hi, all
I got a crash when copying partitioned tables with a large amount of data in Cloudberry DB[0] (based on Postgres 14.4, Greenplum 7).
I tested on Postgres and it has a similar issue (different place but same function).
Just to be clear: are you facing this on HEAD, or on REL_14_STABLE?
However, it's a little hard to reproduce, because it happens when inserting the next tuple after a previous COPY multi-insert buffer has been flushed.
To reproduce it easily, change the macros to:
#define MAX_BUFFERED_TUPLES 1
#define MAX_PARTITION_BUFFERS 0
This way it's harder to believe that the problem persists with the
original settings. Are those values even valid?
Hi,
Zhang Mingli
www.hashdata.xyz
On Jul 30, 2024 at 13:35 +0800, David Rowley <dgrowleyml@gmail.com>, wrote:
If you're only able
to get it to misbehave by setting MAX_PARTITION_BUFFERS to less than
2, then my suggested fix would be to add a comment to say that values
less than two are not supported.
Right.
On Postgres this crash could only happen if it is set to zero.
I’ve updated the comments in patch v1 with an additional Assert.
Attachments:
v1-0001-Add-comments-for-MAX_PARTITION_BUFFERS-to-avoid-misb.patch (application/octet-stream)
From 5d6c8dd8ea75179e382c5b4c9a3ebd5aceb4ea67 Mon Sep 17 00:00:00 2001
From: Zhang Mingli <avamingli@gmail.com>
Date: Tue, 30 Jul 2024 14:59:13 +0800
Subject: [PATCH] Add comments for MAX_PARTITION_BUFFERS to avoid misbehavior.
MAX_PARTITION_BUFFERS should always be > 1 for the copy buffer to
make sense. If it's set to 0, non-partitioned tables would clean up
the buffer and hit a null pointer on the next insertion.
Authored-by: Zhang Mingli avamingli@gmail.com
---
src/backend/commands/copyfrom.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707..b2baf3d29e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -68,7 +68,12 @@
*/
#define MAX_BUFFERED_BYTES 65535
-/* Trim the list of buffers back down to this number after flushing */
+/*
+ * Trim the list of buffers back down to this number after flushing.
+ * This should be > 1 for the multi-insert buffer to make sense,
+ * e.g. to avoid non-partitioned tables calling
+ * CopyMultiInsertBufferCleanup() when flushing the buffer.
+ */
#define MAX_PARTITION_BUFFERS 32
/* Stores multi-insert data related to a single relation in CopyFrom. */
@@ -532,6 +537,8 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri,
miinfo->bufferedTuples = 0;
miinfo->bufferedBytes = 0;
+ Assert(MAX_PARTITION_BUFFERS > 1);
+
/*
* Trim the list of tracked buffers down if it exceeds the limit. Here we
* remove buffers starting with the ones we created first. It seems less
--
2.34.1
Hi,
Zhang Mingli
www.hashdata.xyz
On Jul 30, 2024 at 13:37 +0800, Kirill Reshke <reshkekirill@gmail.com>, wrote:
Just to be clear, you are facing this on HEAD, on on REL_14_STABLE?
Postgres HEAD.
On Tue, 30 Jul 2024 at 19:06, Zhang Mingli <zmlpostgres@gmail.com> wrote:
I’ve updated the comments in patch v1 with an additional Assert.
Thanks. I adjusted it a little, used a StaticAssert instead, and pushed.
StaticAssert seems better, as invalid values will result in a compilation failure.
David