Shmem queue is not flushed if receiver is not yet attached

Started by Pavan Deolaseealmost 4 years ago6 messages
#1Pavan Deolasee
pavan.deolasee@gmail.com
1 attachment(s)

Hello,

While testing on the current PG master, I noticed a problem between
backends communicating over a shared memory queue. I think `shm_mq_sendv()`
fails to flush the queue, even if `force_flush` is set to true, if the
receiver is not yet attached to the queue. This simple fix solves
the problem for me.

On another note, `shm_mq.h` declares `shm_mq_flush()`, but I don't see it
being implemented. Maybe just a leftover from the previous work? Though it
seems useful to implement that API.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB: https://www.enterprisedb..com

Attachments:

0001-Flush-the-queue-even-if-receiver-has-not-attached.patchapplication/octet-stream; name=0001-Flush-the-queue-even-if-receiver-has-not-attached.patchDownload
From 4d6158f7d9829dd5a909ee44748d7670e82b9983 Mon Sep 17 00:00:00 2001
From: Pavan Deolasee <pavan.deolasee@gmail.com>
Date: Thu, 17 Mar 2022 12:37:11 +0530
Subject: [PATCH 1/1] Flush the queue even if receiver has not attached.

The new API introduced in PG15 delays flushing shared memoey queue
unless the caller has requested a `force_flush`. But if the
receiver has not yet attached to the queue, we fail to enforce
`force_flush` and return with success without actually flushing the
queue. This commit fixes the bug.
---
 src/backend/storage/ipc/shm_mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 603cf9b0fa7..b0450f68e66 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -528,8 +528,6 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait,
 		SpinLockAcquire(&mq->mq_mutex);
 		receiver = mq->mq_receiver;
 		SpinLockRelease(&mq->mq_mutex);
-		if (receiver == NULL)
-			return SHM_MQ_SUCCESS;
 		mqh->mqh_counterparty_attached = true;
 	}
 
@@ -541,7 +539,9 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait,
 	if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))
 	{
 		shm_mq_inc_bytes_written(mq, mqh->mqh_send_pending);
-		SetLatch(&receiver->procLatch);
+		/* If the receiver is attached, signal it. */
+		if (receiver != NULL)
+			SetLatch(&receiver->procLatch);
 		mqh->mqh_send_pending = 0;
 	}
 
-- 
2.30.1 (Apple Git-130)

#2Robert Haas
robertmhaas@gmail.com
In reply to: Pavan Deolasee (#1)
1 attachment(s)
Re: Shmem queue is not flushed if receiver is not yet attached

On Thu, Mar 17, 2022 at 3:13 AM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

While testing on the current PG master, I noticed a problem between backends communicating over a shared memory queue. I think `shm_mq_sendv()` fails to flush the queue, even if `force_flush` is set to true, if the receiver is not yet attached to the queue. This simple fix solves the problem for me.

On another note, `shm_mq.h` declares `shm_mq_flush()`, but I don't see it being implemented. Maybe just a leftover from the previous work? Though it seems useful to implement that API.

I think that this patch is basically correct, except that it's not
correct to set mqh_counterparty_attached when receiver is still NULL.
Here's a v2 where I've attempted to correct that while preserving the
essence of your proposed fix.

I'm not sure that we need a shm_mq_flush(), but we definitely don't
have one currently, so I've also adjusted your patch to remove the
dead prototype.

Please let me know your thoughts on the attached.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-shm_mq_sendv-Fix-flushing-bug-when-receiver-not-y.patchapplication/octet-stream; name=v2-0001-shm_mq_sendv-Fix-flushing-bug-when-receiver-not-y.patchDownload
From d74a7e53be060c05990ec9e931242ea8b4eaac16 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 24 May 2022 10:55:01 -0400
Subject: [PATCH v2] shm_mq_sendv: Fix flushing bug when receiver not yet
 attached.

With the old logic, when the reciever had not yet attached, we would
never call shm_mq_inc_bytes_written(), even if force_flush = true
was specified. That could result in a situation where data that the
sender believes it has sent is never received.

Along the way, remove a useless function prototype for a nonexistent
function from shm_mq.h.

Commit 46846433a03dff4f2e08c8a161e54a842da360d6 introduced these
problems.

Pavan Deolasee, with a few changes by me.
---
 src/backend/storage/ipc/shm_mq.c | 11 +++++------
 src/include/storage/shm_mq.h     |  1 -
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 6139c622e0..8ca24de8d6 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -518,8 +518,7 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait,
 
 	/*
 	 * If the counterparty is known to have attached, we can read mq_receiver
-	 * without acquiring the spinlock and assume it isn't NULL.  Otherwise,
-	 * more caution is needed.
+	 * without acquiring the spinlock.  Otherwise, more caution is needed.
 	 */
 	if (mqh->mqh_counterparty_attached)
 		receiver = mq->mq_receiver;
@@ -528,9 +527,8 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait,
 		SpinLockAcquire(&mq->mq_mutex);
 		receiver = mq->mq_receiver;
 		SpinLockRelease(&mq->mq_mutex);
-		if (receiver == NULL)
-			return SHM_MQ_SUCCESS;
-		mqh->mqh_counterparty_attached = true;
+		if (receiver != NULL)
+			mqh->mqh_counterparty_attached = true;
 	}
 
 	/*
@@ -541,7 +539,8 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait,
 	if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))
 	{
 		shm_mq_inc_bytes_written(mq, mqh->mqh_send_pending);
-		SetLatch(&receiver->procLatch);
+		if (receiver != NULL)
+			SetLatch(&receiver->procLatch);
 		mqh->mqh_send_pending = 0;
 	}
 
diff --git a/src/include/storage/shm_mq.h b/src/include/storage/shm_mq.h
index f5220baa78..b6fe68725d 100644
--- a/src/include/storage/shm_mq.h
+++ b/src/include/storage/shm_mq.h
@@ -76,7 +76,6 @@ extern shm_mq_result shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov,
 								  int iovcnt, bool nowait, bool force_flush);
 extern shm_mq_result shm_mq_receive(shm_mq_handle *mqh,
 									Size *nbytesp, void **datap, bool nowait);
-extern void shm_mq_flush(shm_mq_handle *mqh);
 
 /* Wait for our counterparty to attach to the queue. */
 extern shm_mq_result shm_mq_wait_for_attach(shm_mq_handle *mqh);
-- 
2.24.3 (Apple Git-128)

#3Japin Li
japinli@hotmail.com
In reply to: Robert Haas (#2)
Re: Shmem queue is not flushed if receiver is not yet attached

On Tue, 24 May 2022 at 23:05, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 17, 2022 at 3:13 AM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

While testing on the current PG master, I noticed a problem between backends communicating over a shared memory queue. I think `shm_mq_sendv()` fails to flush the queue, even if `force_flush` is set to true, if the receiver is not yet attached to the queue. This simple fix solves the problem for me.

On another note, `shm_mq.h` declares `shm_mq_flush()`, but I don't see it being implemented. Maybe just a leftover from the previous work? Though it seems useful to implement that API.

I think that this patch is basically correct, except that it's not
correct to set mqh_counterparty_attached when receiver is still NULL.
Here's a v2 where I've attempted to correct that while preserving the
essence of your proposed fix.

I'm not sure that we need a shm_mq_flush(), but we definitely don't
have one currently, so I've also adjusted your patch to remove the
dead prototype.

Please let me know your thoughts on the attached.

Thanks,

Hi,

I have a problem that is also related to shmem queue [1]/messages/by-id/MEYP282MB1669C8D88F0997354C2313C1B6CA9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM, however, I cannot
reproduce it. How did you reproduce this problem?

[1]: /messages/by-id/MEYP282MB1669C8D88F0997354C2313C1B6CA9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#4Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Robert Haas (#2)
Re: Shmem queue is not flushed if receiver is not yet attached

Hi Robert,

On Tue, May 24, 2022 at 8:35 PM Robert Haas <robertmhaas@gmail.com> wrote:

I think that this patch is basically correct, except that it's not
correct to set mqh_counterparty_attached when receiver is still NULL.
Here's a v2 where I've attempted to correct that while preserving the
essence of your proposed fix.

This looks good to me,

I'm not sure that we need a shm_mq_flush(), but we definitely don't
have one currently, so I've also adjusted your patch to remove the
dead prototype.

Makes sense to me.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB: https://www.enterprisedb..com

#5Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Japin Li (#3)
Re: Shmem queue is not flushed if receiver is not yet attached

On Wed, May 25, 2022 at 7:01 AM Japin Li <japinli@hotmail.com> wrote:

I have a problem that is also related to shmem queue [1], however, I cannot
reproduce it. How did you reproduce this problem?

I discovered this bug while working on an extension that makes use of the
shared memory queue facility. Not sure how useful that is for your purpose.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB: https://www.enterprisedb..com

#6Robert Haas
robertmhaas@gmail.com
In reply to: Pavan Deolasee (#4)
Re: Shmem queue is not flushed if receiver is not yet attached

On Mon, May 30, 2022 at 3:06 AM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

I think that this patch is basically correct, except that it's not
correct to set mqh_counterparty_attached when receiver is still NULL.
Here's a v2 where I've attempted to correct that while preserving the
essence of your proposed fix.

This looks good to me,

Thanks for checking. Committed.

--
Robert Haas
EDB: http://www.enterprisedb.com