No title

Started by Nicola Contu almost 6 years ago, 6 messages
#1 Nicola Contu
nicola.contu@gmail.com

Hello,
I am running postgres 11.5 and we were having issues with shared segments.
So I increased max_connections as suggested by you guys and reduced my
work_mem to 600M.

Now, instead, this is the second time I have seen this error:

ERROR: could not resize shared memory segment "/PostgreSQL.2137675995" to
33624064 bytes: Interrupted system call

Do you know what it means and how I can solve it?

Thanks a lot,
Nicola

#2 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nicola Contu (#1)
1 attachment(s)
EINTR while resizing dsm segment.

I provided the subject, and added -hackers.

> Hello,
> I am running postgres 11.5 and we were having issues with shared segments.
> So I increased max_connections as suggested by you guys and reduced my
> work_mem to 600M.
>
> Now, instead, this is the second time I have seen this error:
>
> ERROR: could not resize shared memory segment "/PostgreSQL.2137675995" to
> 33624064 bytes: Interrupted system call

The function posix_fallocate is protected against EINTR.

| do
| {
| rc = posix_fallocate(fd, 0, size);
| } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

But ftruncate and write are not. Don't we need to protect them from
EINTR as well, as in the attached patch?

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Protect-dsm_impl-from-EINTR.patch (text/x-patch; charset=us-ascii)
From 590b783f93995bfd1ec05dbcb2805a577372604d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 2 Apr 2020 17:09:35 +0900
Subject: [PATCH] Protect dsm_impl from EINTR

dsm_impl functions should not error-out by EINTR.
---
 src/backend/storage/ipc/dsm_impl.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 1972aecbed..f4e7350a5e 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -360,7 +360,11 @@ dsm_impl_posix_resize(int fd, off_t size)
 	int			rc;
 
 	/* Truncate (or extend) the file to the requested size. */
-	rc = ftruncate(fd, size);
+	do
+	{
+		rc = ftruncate(fd, size);
+	} while (rc < 0 && errno == EINTR &&
+			 !(ProcDiePending || QueryCancelPending));
 
 	/*
 	 * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing with
@@ -874,11 +878,19 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
 		while (success && remaining > 0)
 		{
 			Size		goal = remaining;
+			Size		rc;
 
 			if (goal > ZBUFFER_SIZE)
 				goal = ZBUFFER_SIZE;
 			pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
-			if (write(fd, zbuffer, goal) == goal)
+
+			do
+			{
+				rc = write(fd, zbuffer, goal);
+			} while (rc < 0 && errno == EINTR &&
+					 !(ProcDiePending || QueryCancelPending));
+
+			if (rc == goal)
 				remaining -= goal;
 			else
 				success = false;
-- 
2.18.2
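
The retry pattern in the patch above can be tried outside PostgreSQL. What follows is only a minimal standalone sketch, not the patch itself: `retry_ftruncate`, `retry_write` and `demo_resize` are hypothetical names, the scratch file under /tmp is an assumption, and the `ProcDiePending` / `QueryCancelPending` checks are left out because they only exist inside the backend. Two details are worth noticing. posix_fallocate() returns the error number directly, which is why the existing loop tests `rc == EINTR`, while ftruncate() and write() return -1 and set errno. And write() returns a signed ssize_t; PostgreSQL's `Size` is a typedef for size_t, so a result stored in a `Size` can never compare less than zero.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: repeat ftruncate() for as long as it fails with EINTR. */
static int
retry_ftruncate(int fd, off_t size)
{
    int rc;

    do
    {
        rc = ftruncate(fd, size);
    } while (rc < 0 && errno == EINTR);

    return rc;
}

/*
 * Hypothetical helper: repeat write() for as long as it fails with EINTR.
 * Short writes are returned to the caller unchanged.
 */
static ssize_t
retry_write(int fd, const void *buf, size_t len)
{
    ssize_t rc;             /* signed, so that rc < 0 can actually be true */

    do
    {
        rc = write(fd, buf, len);
    } while (rc < 0 && errno == EINTR);

    return rc;
}

/*
 * Resize a scratch file to "size" bytes and zero-fill its first 512 bytes.
 * Returns the resulting file size, or -1 if any step failed.
 */
static off_t
demo_resize(off_t size)
{
    char path[] = "/tmp/eintr-demo-XXXXXX";    /* assumed scratch location */
    char zeros[512] = {0};
    off_t end = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);           /* the file lives on until fd is closed */

    if (retry_ftruncate(fd, size) == 0 &&
        retry_write(fd, zeros, sizeof(zeros)) == (ssize_t) sizeof(zeros))
        end = lseek(fd, 0, SEEK_END);

    close(fd);
    return end;
}
```

Calling `demo_resize(4096)` should return 4096 on a normal filesystem. It does not reproduce the reported error; it only shows the shape of the loops and the signed return type.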

#3 Thomas Munro
thomas.munro@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: EINTR while resizing dsm segment.

On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased max_connections as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Now, instead, this is the second time I have seen this error:
> >
> > ERROR: could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> | rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But ftruncate and write are not. Don't we need to protect them from
> EINTR as well, as in the attached patch?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones. Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (uninterruptible sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?
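
For readers unfamiliar with the SA_RESTART flag mentioned above, here is a small sketch of what it changes. It is an illustration on a pipe, which is a classic "slow device", and it does not reproduce or explain the ftruncate() case on tmpfs, which remains the open question. The names `on_alarm`, `wake_fd` and `read_errno_with_flags` are hypothetical, and the 100 ms timer is an arbitrary choice. A blocking read() is interrupted by SIGALRM; the handler writes one byte into the pipe so that a restarted read() has something to return.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Write end of the pipe, used by the signal handler. */
static volatile sig_atomic_t wake_fd = -1;

/* SIGALRM handler: put one byte into the pipe (write() is async-signal-safe). */
static void
on_alarm(int signo)
{
    int saved_errno = errno;
    ssize_t ignored = write(wake_fd, "x", 1);

    (void) signo;
    (void) ignored;
    errno = saved_errno;
}

/*
 * Block in read() on an empty pipe until SIGALRM arrives, with the handler
 * installed using the given sa_flags. Returns the errno that read() failed
 * with, 0 if read() succeeded, or -1 if the setup itself failed.
 */
static int
read_errno_with_flags(int sa_flags)
{
    int pipefd[2];
    struct sigaction sa, old;
    struct itimerval timer;
    char c;
    ssize_t n;
    int result;

    if (pipe(pipefd) < 0)
        return -1;
    wake_fd = pipefd[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sa.sa_flags = sa_flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old) < 0)
    {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* Fire SIGALRM once, 100 ms from now, while read() is blocked. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100000;
    setitimer(ITIMER_REAL, &timer, NULL);

    n = read(pipefd[0], &c, 1);
    result = (n < 0) ? errno : 0;

    sigaction(SIGALRM, &old, NULL);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

On Linux I would expect `read_errno_with_flags(0)` to report EINTR, and `read_errno_with_flags(SA_RESTART)` to report 0 because the kernel restarts the read() after the handler returns. Whether a given system call is restartable at all depends on the call and the kernel path it was sleeping in, which is exactly what is unclear for ftruncate() here.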

#4 Nicola Contu
nicola.contu@gmail.com
In reply to: Thomas Munro (#3)
Re: EINTR while resizing dsm segment.

So that seems to be a bug, correct?
Just to confirm, I am not using NFS, it is directly on disk.

Other than that, is there a particular option we can set in
postgresql.conf to mitigate the issue?

Thanks a lot for your help.


#5 Nicola Contu
nicola.contu@gmail.com
In reply to: Nicola Contu (#4)
Re: EINTR while resizing dsm segment.

The only change we made to the disk is encryption at the OS level.
I am not sure whether this is related.


#6 Thomas Munro
thomas.munro@gmail.com
In reply to: Nicola Contu (#4)
Re: EINTR while resizing dsm segment.

On Tue, Apr 7, 2020 at 8:58 PM Nicola Contu <nicola.contu@gmail.com> wrote:

> So that seems to be a bug, correct?
> Just to confirm, I am not using NFS, it is directly on disk.
>
> Other than that, is there a particular option we can set in postgresql.conf to mitigate the issue?

Hi Nicola,

Yeah, I think it's a bug. We're not sure exactly where yet.