ERROR: invalid spinlock number: 0

Started by Fujii Masaoalmost 5 years ago13 messages
#1Fujii Masao
masao.fujii@oss.nttdata.com

Hi,

Commit 3b733fcd04 caused the buildfarm member "rorqual" to report
the following error and fail the regression test. The cause of this issue
is a bug in that commit.

ERROR: invalid spinlock number: 0

But while investigating the issue, I found that this error could happen
even in the current master (without commit 3b733fcd04). The error can
be easily reproduced by reading pg_stat_wal_receiver view before
walreceiver starts up, in the server built with --disable-atomics --disable-spinlocks.
Furthermore if you try to read pg_stat_wal_receiver again,
that gets stuck. This is not good.

ISTM that the commit 2c8dd05d6c caused this issue. The commit changed
pg_stat_get_wal_receiver() so that it reads "writtenUpto" by using
pg_atomic_read_u64(). But since "writtenUpto" is initialized only when
walreceiver starts up, reading "writtenUpto" before the startup of
walreceiver can cause the error.

Also pg_stat_get_wal_receiver() calls pg_atomic_read_u64() while
a spinlock is being held. Maybe this may cause the process to get stuck
in --disable-atomics case, I guess.

Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#2Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#1)
Re: ERROR: invalid spinlock number: 0

Hi Fujii-san,

On Tue, Feb 09, 2021 at 11:17:04PM +0900, Fujii Masao wrote:

ISTM that the commit 2c8dd05d6c caused this issue. The commit changed
pg_stat_get_wal_receiver() so that it reads "writtenUpto" by using
pg_atomic_read_u64(). But since "writtenUpto" is initialized only when
walreceiver starts up, reading "writtenUpto" before the startup of
walreceiver can cause the error.

Indeed, that's a problem. We should at least move that out of the
spin lock area. I'll try to look at that in details, and that's going
to take me a couple of days at least. Sorry for the delay.
--
Michael

#3Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Michael Paquier (#2)
1 attachment(s)
Re: ERROR: invalid spinlock number: 0

On 2021/02/11 21:55, Michael Paquier wrote:

Hi Fujii-san,

On Tue, Feb 09, 2021 at 11:17:04PM +0900, Fujii Masao wrote:

ISTM that the commit 2c8dd05d6c caused this issue. The commit changed
pg_stat_get_wal_receiver() so that it reads "writtenUpto" by using
pg_atomic_read_u64(). But since "writtenUpto" is initialized only when
walreceiver starts up, reading "writtenUpto" before the startup of
walreceiver can cause the error.

Indeed, that's a problem. We should at least move that out of the
spin lock area.

Yes, so what about the attached patch?

We didn't notice this issue long time because no regression test checks
pg_stat_wal_receiver. So I included such test in the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

bugfix_pg_stat_wal_receiver.patchtext/plain; charset=UTF-8; name=bugfix_pg_stat_wal_receiver.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 723f513d8b..9b32941a76 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1325,7 +1325,6 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	state = WalRcv->walRcvState;
 	receive_start_lsn = WalRcv->receiveStart;
 	receive_start_tli = WalRcv->receiveStartTLI;
-	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
 	flushed_lsn = WalRcv->flushedUpto;
 	received_tli = WalRcv->receivedTLI;
 	last_send_time = WalRcv->lastMsgSendTime;
@@ -1345,6 +1344,19 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	if (pid == 0 || !ready_to_display)
 		PG_RETURN_NULL();
 
+	/*
+	 * We reach here if WAL receiver is ready to display (i.e.,
+	 * ready_to_display == true). In this case we can ensure that the atomic
+	 * variable "writtenUpto" has already been initialized by WAL receiver, so
+	 * we can safely read it here.
+	 *
+	 * Read "writtenUpto" without holding a spinlock. So it may not be
+	 * consistent with other WAL receiver's shared variables protected by a
+	 * spinlock. This is OK because that variable is used only for
+	 * informational purpose and should not be used for data integrity checks.
+	 */
+	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
+
 	/* determine result type */
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 		elog(ERROR, "return type must be a row type");
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..6d048e309c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -83,6 +83,13 @@ select count(*) = 1 as ok from pg_stat_wal;
  t
 (1 row)
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b9b875bc6a..dc8c9a3ac2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -40,6 +40,9 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
 -- There must be only one record
 select count(*) = 1 as ok from pg_stat_wal;
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
#4Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#3)
Re: ERROR: invalid spinlock number: 0

On Thu, Feb 11, 2021 at 11:30:13PM +0900, Fujii Masao wrote:

Yes, so what about the attached patch?

I see. So the first error triggering the spinlock error would cause
a transaction failure because the fallback implementation of atomics
uses a spinlock for this variable, and it may not initialized in this
code path.

We didn't notice this issue long time because no regression test checks
pg_stat_wal_receiver. So I included such test in the patch.

Moving that behind ready_to_display is fine by me seeing where the
initialization is done. The test case is a good addition.

+    * Read "writtenUpto" without holding a spinlock. So it may not be
+    * consistent with other WAL receiver's shared variables protected by a
+    * spinlock. This is OK because that variable is used only for
+    * informational purpose and should not be used for data integrity checks.
It seems to me that the first two sentences of this comment should be
combined together.
--
Michael
#5Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#4)
Re: ERROR: invalid spinlock number: 0

On Mon, Feb 15, 2021 at 9:27 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Feb 11, 2021 at 11:30:13PM +0900, Fujii Masao wrote:

Yes, so what about the attached patch?

I see. So the first error triggering the spinlock error would cause
a transaction failure because the fallback implementation of atomics
uses a spinlock for this variable, and it may not initialized in this
code path.

Why not initialise it in WalRcvShmemInit()?

#6Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#5)
Re: ERROR: invalid spinlock number: 0

On Mon, Feb 15, 2021 at 10:47:05PM +1300, Thomas Munro wrote:

Why not initialise it in WalRcvShmemInit()?

I was thinking about doing that as well, but we have no real need to
initialize this stuff in most cases, say standalone deployments. In
particular for the fallback implementation of atomics, we would
prepare a spinlock for nothing.
--
Michael

#7Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Michael Paquier (#6)
Re: ERROR: invalid spinlock number: 0

On 2021/02/15 19:45, Michael Paquier wrote:

On Mon, Feb 15, 2021 at 10:47:05PM +1300, Thomas Munro wrote:

Why not initialise it in WalRcvShmemInit()?

I was thinking about doing that as well, but we have no real need to
initialize this stuff in most cases, say standalone deployments. In
particular for the fallback implementation of atomics, we would
prepare a spinlock for nothing.

But on second thought, if we make WalRceiverMain() call pg_atomic_init_u64(),
the variable is initialized (i,e., SpinLockInit() is called in --disable-atomics)
every time walreceiver is started. That may be problematic? If so, the variable
needs to be initialized in WalRcvShmemInit(), instead.

BTW, the recent commit 46d6e5f567 has the similar issue. The variable
that commit added is initialized in InitProcess(), but maybe should be done
in InitProcGlobal() or elsewhere.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#8Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#6)
Re: ERROR: invalid spinlock number: 0

Hi,

On 2021-02-15 19:45:21 +0900, Michael Paquier wrote:

On Mon, Feb 15, 2021 at 10:47:05PM +1300, Thomas Munro wrote:

Why not initialise it in WalRcvShmemInit()?

I was thinking about doing that as well, but we have no real need to
initialize this stuff in most cases, say standalone deployments. In
particular for the fallback implementation of atomics, we would
prepare a spinlock for nothing.

So what? It's just about free to initialize a spinlock, whether it's
using the fallback implementation or not. Initializing upon walsender
startup adds a lot of complications, because e.g. somebody could already
hold the spinlock because the previous walsender just disconnected, and
they were looking at the stats.

Greetings,

Andres Freund

#9Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Andres Freund (#8)
1 attachment(s)
Re: ERROR: invalid spinlock number: 0

On 2021/02/16 6:28, Andres Freund wrote:

Hi,

On 2021-02-15 19:45:21 +0900, Michael Paquier wrote:

On Mon, Feb 15, 2021 at 10:47:05PM +1300, Thomas Munro wrote:

Why not initialise it in WalRcvShmemInit()?

I was thinking about doing that as well, but we have no real need to
initialize this stuff in most cases, say standalone deployments. In
particular for the fallback implementation of atomics, we would
prepare a spinlock for nothing.

So what? It's just about free to initialize a spinlock, whether it's
using the fallback implementation or not. Initializing upon walsender
startup adds a lot of complications, because e.g. somebody could already
hold the spinlock because the previous walsender just disconnected, and
they were looking at the stats.

Even if we initialize "writtenUpto" in WalRcvShmemInit(), WalReceiverMain()
still needs to initialize (reset to 0) by using pg_atomic_write_u64().

Basically we should not acquire new spinlock while holding another spinlock,
to shorten the spinlock duration. Right? If yes, we need to move
pg_atomic_read_u64() of "writtenUpto" after the release of spinlock in
pg_stat_get_wal_receiver.

Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

bugfix_pg_stat_wal_receiver_v2.patchtext/plain; charset=UTF-8; name=bugfix_pg_stat_wal_receiver_v2.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 723f513d8b..316640c275 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -249,7 +249,7 @@ WalReceiverMain(void)
 
 	SpinLockRelease(&walrcv->mutex);
 
-	pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
+	pg_atomic_write_u64(&WalRcv->writtenUpto, 0);
 
 	/* Arrange to clean up at walreceiver exit */
 	on_shmem_exit(WalRcvDie, 0);
@@ -1325,7 +1325,6 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	state = WalRcv->walRcvState;
 	receive_start_lsn = WalRcv->receiveStart;
 	receive_start_tli = WalRcv->receiveStartTLI;
-	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
 	flushed_lsn = WalRcv->flushedUpto;
 	received_tli = WalRcv->receivedTLI;
 	last_send_time = WalRcv->lastMsgSendTime;
@@ -1338,6 +1337,14 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	strlcpy(conninfo, (char *) WalRcv->conninfo, sizeof(conninfo));
 	SpinLockRelease(&WalRcv->mutex);
 
+	/*
+	 * Read "writtenUpto" without holding a spinlock. So it may not be
+	 * consistent with other WAL receiver's shared variables protected by a
+	 * spinlock. This is OK because that variable is used only for
+	 * informational purpose and should not be used for data integrity checks.
+	 */
+	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
+
 	/*
 	 * No WAL receiver (or not ready yet), just return a tuple with NULL
 	 * values
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 69b91a7dab..63e60478ea 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -63,6 +63,7 @@ WalRcvShmemInit(void)
 		MemSet(WalRcv, 0, WalRcvShmemSize());
 		WalRcv->walRcvState = WALRCV_STOPPED;
 		SpinLockInit(&WalRcv->mutex);
+		pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
 		WalRcv->latch = NULL;
 	}
 }
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..6d048e309c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -83,6 +83,13 @@ select count(*) = 1 as ok from pg_stat_wal;
  t
 (1 row)
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b9b875bc6a..dc8c9a3ac2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -40,6 +40,9 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
 -- There must be only one record
 select count(*) = 1 as ok from pg_stat_wal;
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
#10Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#9)
Re: ERROR: invalid spinlock number: 0

On Tue, Feb 16, 2021 at 12:43:42PM +0900, Fujii Masao wrote:

On 2021/02/16 6:28, Andres Freund wrote:

So what? It's just about free to initialize a spinlock, whether it's
using the fallback implementation or not. Initializing upon walsender
startup adds a lot of complications, because e.g. somebody could already
hold the spinlock because the previous walsender just disconnected, and
they were looking at the stats.

Okay.

Even if we initialize "writtenUpto" in WalRcvShmemInit(), WalReceiverMain()
still needs to initialize (reset to 0) by using pg_atomic_write_u64().

Yes, you have to do that.

Basically we should not acquire new spinlock while holding another spinlock,
to shorten the spinlock duration. Right? If yes, we need to move
pg_atomic_read_u64() of "writtenUpto" after the release of spinlock in
pg_stat_get_wal_receiver.

It would not matter much as a NULL tuple is returned as long as the
WAL receiver information is not ready to be displayed. The only
reason why all the fields are read before checking for
ready_to_display is that we can be sure that everything is consistent
with the PID. So reading writtenUpto before or after does not really
matter logically. I would just move it after the check, as you did
previously.

+   /*
+    * Read "writtenUpto" without holding a spinlock. So it may not be
+    * consistent with other WAL receiver's shared variables protected by a
+    * spinlock. This is OK because that variable is used only for
+    * informational purpose and should not be used for data integrity checks.
+    */
What about the following?
"Read "writtenUpto" without holding a spinlock.  Note that it may not
be consistent with the other shared variables of the WAL receiver
protected by a spinlock, but this should not be used for data
integrity checks."

I agree that what has been done with MyProc->waitStart in 46d6e5f is
not safe, and that initialization should happen once at postmaster
startup, with a write(0) when starting the backend. There are two of
them in proc.c, one in twophase.c. Do you mind if I add an open item
for this one?
--
Michael

#11Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Michael Paquier (#10)
1 attachment(s)
Re: ERROR: invalid spinlock number: 0

On 2021/02/16 15:50, Michael Paquier wrote:

On Tue, Feb 16, 2021 at 12:43:42PM +0900, Fujii Masao wrote:

On 2021/02/16 6:28, Andres Freund wrote:

So what? It's just about free to initialize a spinlock, whether it's
using the fallback implementation or not. Initializing upon walsender
startup adds a lot of complications, because e.g. somebody could already
hold the spinlock because the previous walsender just disconnected, and
they were looking at the stats.

Okay.

Even if we initialize "writtenUpto" in WalRcvShmemInit(), WalReceiverMain()
still needs to initialize (reset to 0) by using pg_atomic_write_u64().

Yes, you have to do that.

Basically we should not acquire new spinlock while holding another spinlock,
to shorten the spinlock duration. Right? If yes, we need to move
pg_atomic_read_u64() of "writtenUpto" after the release of spinlock in
pg_stat_get_wal_receiver.

It would not matter much as a NULL tuple is returned as long as the
WAL receiver information is not ready to be displayed. The only
reason why all the fields are read before checking for
ready_to_display is that we can be sure that everything is consistent
with the PID. So reading writtenUpto before or after does not really
matter logically. I would just move it after the check, as you did
previously.

OK.

+   /*
+    * Read "writtenUpto" without holding a spinlock. So it may not be
+    * consistent with other WAL receiver's shared variables protected by a
+    * spinlock. This is OK because that variable is used only for
+    * informational purpose and should not be used for data integrity checks.
+    */
What about the following?
"Read "writtenUpto" without holding a spinlock.  Note that it may not
be consistent with the other shared variables of the WAL receiver
protected by a spinlock, but this should not be used for data
integrity checks."

Sounds good. Attached is the updated version of the patch.

I agree that what has been done with MyProc->waitStart in 46d6e5f is
not safe, and that initialization should happen once at postmaster
startup, with a write(0) when starting the backend. There are two of
them in proc.c, one in twophase.c. Do you mind if I add an open item
for this one?

Yeah, please feel free to do that! BTW, I already posted the patch
addressing that issue, at [1]/messages/by-id/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com.

[1]: /messages/by-id/1df88660-6f08-cc6e-b7e2-f85296a2bdab@oss.nttdata.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

bugfix_pg_stat_wal_receiver_v3.patchtext/plain; charset=UTF-8; name=bugfix_pg_stat_wal_receiver_v3.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 723f513d8b..9ec71238c4 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -249,7 +249,7 @@ WalReceiverMain(void)
 
 	SpinLockRelease(&walrcv->mutex);
 
-	pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
+	pg_atomic_write_u64(&WalRcv->writtenUpto, 0);
 
 	/* Arrange to clean up at walreceiver exit */
 	on_shmem_exit(WalRcvDie, 0);
@@ -1325,7 +1325,6 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	state = WalRcv->walRcvState;
 	receive_start_lsn = WalRcv->receiveStart;
 	receive_start_tli = WalRcv->receiveStartTLI;
-	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
 	flushed_lsn = WalRcv->flushedUpto;
 	received_tli = WalRcv->receivedTLI;
 	last_send_time = WalRcv->lastMsgSendTime;
@@ -1345,6 +1344,14 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	if (pid == 0 || !ready_to_display)
 		PG_RETURN_NULL();
 
+	/*
+	 * Read "writtenUpto" without holding a spinlock.  Note that it may not be
+	 * consistent with the other shared variables of the WAL receiver
+	 * protected by a spinlock, but this should not be used for data integrity
+	 * checks.
+	 */
+	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
+
 	/* determine result type */
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 		elog(ERROR, "return type must be a row type");
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index 69b91a7dab..63e60478ea 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -63,6 +63,7 @@ WalRcvShmemInit(void)
 		MemSet(WalRcv, 0, WalRcvShmemSize());
 		WalRcv->walRcvState = WALRCV_STOPPED;
 		SpinLockInit(&WalRcv->mutex);
+		pg_atomic_init_u64(&WalRcv->writtenUpto, 0);
 		WalRcv->latch = NULL;
 	}
 }
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..6d048e309c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -83,6 +83,13 @@ select count(*) = 1 as ok from pg_stat_wal;
  t
 (1 row)
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index b9b875bc6a..dc8c9a3ac2 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -40,6 +40,9 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
 -- There must be only one record
 select count(*) = 1 as ok from pg_stat_wal;
 
+-- We expect no walreceiver running in this test
+select count(*) = 0 as ok from pg_stat_wal_receiver;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
#12Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#11)
Re: ERROR: invalid spinlock number: 0

On Tue, Feb 16, 2021 at 11:47:52PM +0900, Fujii Masao wrote:

On 2021/02/16 15:50, Michael Paquier wrote:

+   /*
+    * Read "writtenUpto" without holding a spinlock. So it may not be
+    * consistent with other WAL receiver's shared variables protected by a
+    * spinlock. This is OK because that variable is used only for
+    * informational purpose and should not be used for data integrity checks.
+    */
What about the following?
"Read "writtenUpto" without holding a spinlock.  Note that it may not
be consistent with the other shared variables of the WAL receiver
protected by a spinlock, but this should not be used for data
integrity checks."

Sounds good. Attached is the updated version of the patch.

Thanks, looks good to me.

I agree that what has been done with MyProc->waitStart in 46d6e5f is
not safe, and that initialization should happen once at postmaster
startup, with a write(0) when starting the backend. There are two of
them in proc.c, one in twophase.c. Do you mind if I add an open item
for this one?

Yeah, please feel free to do that! BTW, I already posted the patch
addressing that issue, at [1].

Okay, item added with a link to the original thread.
--
Michael

#13Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Michael Paquier (#12)
Re: ERROR: invalid spinlock number: 0

On 2021/02/17 13:52, Michael Paquier wrote:

On Tue, Feb 16, 2021 at 11:47:52PM +0900, Fujii Masao wrote:

On 2021/02/16 15:50, Michael Paquier wrote:

+   /*
+    * Read "writtenUpto" without holding a spinlock. So it may not be
+    * consistent with other WAL receiver's shared variables protected by a
+    * spinlock. This is OK because that variable is used only for
+    * informational purpose and should not be used for data integrity checks.
+    */
What about the following?
"Read "writtenUpto" without holding a spinlock.  Note that it may not
be consistent with the other shared variables of the WAL receiver
protected by a spinlock, but this should not be used for data
integrity checks."

Sounds good. Attached is the updated version of the patch.

Thanks, looks good to me.

Pushed. Thanks!

I agree that what has been done with MyProc->waitStart in 46d6e5f is
not safe, and that initialization should happen once at postmaster
startup, with a write(0) when starting the backend. There are two of
them in proc.c, one in twophase.c. Do you mind if I add an open item
for this one?

Yeah, please feel free to do that! BTW, I already posted the patch
addressing that issue, at [1].

Okay, item added with a link to the original thread.

Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION