Function to know last log write timestamp

Started by Tatsuo Ishii, over 11 years ago, 22 messages
#1 Tatsuo Ishii
ishii@postgresql.org

We can know the LSN of the last committed WAL record on the primary by using
pg_current_xlog_location(). It seems there's no API to know the time
when that WAL record was created. I would like to know the standby delay by
using pg_last_xact_replay_timestamp() together with such an API.

If there's no such API, it would be useful to invent one, IMO.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Fujii Masao
masao.fujii@gmail.com
In reply to: Tatsuo Ishii (#1)
Re: Function to know last log write timestamp

On Mon, Aug 11, 2014 at 9:23 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

We can know the LSN of the last committed WAL record on the primary by using
pg_current_xlog_location(). It seems there's no API to know the time
when that WAL record was created. I would like to know the standby delay by
using pg_last_xact_replay_timestamp() together with such an API.

If there's no such API, it would be useful to invent one, IMO.

+1

I proposed that function before, but unfortunately it was not applied.
I still think the function is useful for calculating the replication delay.
The past discussion is at
/messages/by-id/CAHGQGwF3ZjfuNEj5ka683KU5rQUBtSWtqFq7g1X0g34o+JXWBw@mail.gmail.com

Regards,

--
Fujii Masao

#3 Tatsuo Ishii
ishii@postgresql.org
In reply to: Fujii Masao (#2)
Re: Function to know last log write timestamp

On Mon, Aug 11, 2014 at 9:23 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

We can know the LSN of the last committed WAL record on the primary by using
pg_current_xlog_location(). It seems there's no API to know the time
when that WAL record was created. I would like to know the standby delay by
using pg_last_xact_replay_timestamp() together with such an API.

If there's no such API, it would be useful to invent one, IMO.

+1

I proposed that function before, but unfortunately it was not applied.
I still think the function is useful for calculating the replication delay.
The past discussion is at
/messages/by-id/CAHGQGwF3ZjfuNEj5ka683KU5rQUBtSWtqFq7g1X0g34o+JXWBw@mail.gmail.com

I looked into the thread briefly and found that Simon and Robert gave -1
for this because of performance concerns. I'm not sure whether there is an
actual performance penalty or not. Maybe we need to measure the penalty?

However, I still think that kind of API is very useful because
replication delay is one of a DBA's big concerns. Why don't we have a
switch to enable the API for DBAs who give priority to monitoring
replication delay over a small performance penalty?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Tatsuo Ishii (#3)
Re: Function to know last log write timestamp

On Mon, Aug 11, 2014 at 10:48 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:

I looked into the thread briefly and found that Simon and Robert gave -1
for this because of performance concerns. I'm not sure whether there is an
actual performance penalty or not. Maybe we need to measure the penalty?

I think that the performance penalty is negligibly small because the patch
I posted before added only three stores to shared memory per commit/abort.
No time-consuming operations like lock, gettimeofday, etc. were added.
Of course, it's worth checking whether the penalty is actually small or not,
though.

Regards,

--
Fujii Masao

#5 Andres Freund
andres@2ndquadrant.com
In reply to: Fujii Masao (#4)
Re: Function to know last log write timestamp

On 2014-08-11 12:42:06 +0900, Fujii Masao wrote:

I think that the performance penalty is negligibly small because the patch
I posted before added only three stores to shared memory per
commit/abort.

Uh. It adds another atomic operation (the spinlock) to the commit
path. That's surely *not* insignificant. At the very least the
concurrency approach needs to be rethought.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#6 Fujii Masao
masao.fujii@gmail.com
In reply to: Andres Freund (#5)
Re: Function to know last log write timestamp

On Mon, Aug 11, 2014 at 3:54 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-11 12:42:06 +0900, Fujii Masao wrote:

I think that the performance penalty is negligibly small because the patch
I posted before added only three stores to shared memory per
commit/abort.

Uh. It adds another atomic operation (the spinlock) to the commit
path. That's surely *not* insignificant. At the very least the
concurrency approach needs to be rethought.

No, the patch doesn't add a spinlock at all. What the commit path
additionally does is:

1. increment the counter in shared memory
2. store the timestamp of the last commit record in shared memory
3. increment the counter in shared memory

There is no extra spinlock.

OTOH, when pg_last_xact_insert_timestamp reads the timestamp from
shared memory, it checks whether the counter values before and after
reading the timestamp are the same. If they are not the same,
it tries to read the timestamp again. This logic is necessary for reading
a consistent timestamp value there.
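
The counter-and-retry scheme described above can be sketched as a standalone C fragment. This is only a minimal sketch under stated assumptions, not the code from the patch: the `LastInsertSlot` struct and the `slot_init`, `slot_set_timestamp` and `slot_get_timestamp` names are hypothetical, C11 fences stand in for PostgreSQL's own barrier primitives, the timestamp is a plain 64-bit integer, and only one process ever writes a given slot. The reader also rejects an odd counter, which marks an update in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared-memory slot: one writer, any number of readers. */
typedef struct
{
    atomic_uint     changecount;    /* odd while an update is in progress */
    _Atomic int64_t timestamp;      /* stand-in for a commit timestamp */
} LastInsertSlot;

void
slot_init(LastInsertSlot *slot)
{
    atomic_init(&slot->changecount, 0);
    atomic_init(&slot->timestamp, 0);
}

/* Writer: bump the counter, store the timestamp, bump the counter again. */
void
slot_set_timestamp(LastInsertSlot *slot, int64_t ts)
{
    unsigned c = atomic_load_explicit(&slot->changecount, memory_order_relaxed);

    assert((c & 1) == 0);       /* single writer: no update may be pending */
    atomic_store_explicit(&slot->changecount, c + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* write barrier analogue */
    atomic_store_explicit(&slot->timestamp, ts, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* write barrier analogue */
    atomic_store_explicit(&slot->changecount, c + 2, memory_order_relaxed);
}

/* Reader: retry until the counter is even and unchanged across the read. */
int64_t
slot_get_timestamp(LastInsertSlot *slot)
{
    for (;;)
    {
        unsigned before = atomic_load_explicit(&slot->changecount,
                                               memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* read barrier analogue */
        int64_t ts = atomic_load_explicit(&slot->timestamp,
                                          memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* read barrier analogue */
        unsigned after = atomic_load_explicit(&slot->changecount,
                                              memory_order_relaxed);

        if (before == after && (before & 1) == 0)
            return ts;
    }
}
```

The writer never waits and takes no lock; all the retry cost falls on the reader, which fits a function that is called far less often than transactions commit.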

Regards,

--
Fujii Masao

#7 Andres Freund
andres@2ndquadrant.com
In reply to: Fujii Masao (#6)
Re: Function to know last log write timestamp

On 2014-08-11 16:20:41 +0900, Fujii Masao wrote:

On Mon, Aug 11, 2014 at 3:54 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-11 12:42:06 +0900, Fujii Masao wrote:

I think that the performance penalty is negligibly small because the patch
I posted before added only three stores to shared memory per
commit/abort.

Uh. It adds another atomic operation (the spinlock) to the commit
path. That's surely *not* insignificant. At the very least the
concurrency approach needs to be rethought.

No, the patch doesn't add a spinlock at all. What the commit path
additionally does is:

1. increment the counter in shared memory
2. store the timestamp of the last commit record in shared memory
3. increment the counter in shared memory

There is no extra spinlock.

Ah, I see. There's another patch somewhere down that thread
(CAHGQGwG4xFZjfyzaBn5v__d3qpyNNsGBpH3nAr6p40eLivkW5w@mail.gmail.com). The
patch in the message you linked to *does* use a spinlock though.

OTOH, when pg_last_xact_insert_timestamp reads the timestamp from
shared memory, it checks whether the counter values before and after
reading the timestamp are the same. If they are not the same,
it tries to read the timestamp again. This logic is necessary for reading
a consistent timestamp value there.

Yea, that approach then just touches a cacheline that should already be
local. I doubt that the implementation is correct on some more lenient
platforms (missing write memory barrier), but that's not "your fault".

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#6)
Re: Function to know last log write timestamp

On Mon, Aug 11, 2014 at 3:20 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

There is no extra spinlock.

The version I reviewed had one; that's what I was objecting to.

Might need to add some pg_read_barrier() and pg_write_barrier() calls,
since we have those now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9 Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#8)
Re: Function to know last log write timestamp

On Fri, Aug 15, 2014 at 1:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Aug 11, 2014 at 3:20 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

There is no extra spinlock.

The version I reviewed had one; that's what I was objecting to.

Sorry for confusing you. I posted the latest patch to another thread.
That version doesn't use any spinlock.

/messages/by-id/CAHGQGwEwuh5CC3ib6wd0fxs9LAWme=kO09S4MOXnYnAfn7N5Bg@mail.gmail.com

Might need to add some pg_read_barrier() and pg_write_barrier() calls,
since we have those now.

Yep, memory barriers might be needed as follows.

* Set the commit timestamp in shared memory.

shmem->counter++;
pg_write_barrier();
shmem->timestamp = my_timestamp;
pg_write_barrier();
shmem->counter++;

* Read the commit timestamp from shared memory.

count_before = shmem->counter;
pg_read_barrier();
my_timestamp = shmem->timestamp;
pg_read_barrier();
count_after = shmem->counter;
/* retry if count_before != count_after */

Is this the right way to use memory barriers?

Regards,

--
Fujii Masao

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#9)
Re: Function to know last log write timestamp

On Thu, Aug 14, 2014 at 1:51 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

Yep, memory barriers might be needed as follows.

* Set the commit timestamp in shared memory.

shmem->counter++;
pg_write_barrier();
shmem->timestamp = my_timestamp;
pg_write_barrier();
shmem->counter++;

* Read the commit timestamp from shared memory.

count_before = shmem->counter;
pg_read_barrier();
my_timestamp = shmem->timestamp;
pg_read_barrier();
count_after = shmem->counter;
/* retry if count_before != count_after */

Is this the right way to use memory barriers?

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.
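
The lost-update hazard described above can be replayed deterministically in plain C. This is an illustrative sketch only: the function names are hypothetical, and instead of real concurrency it spells out by hand one interleaving of the load, add and store steps that two writers sharing a counter could go through. The second function shows the same two increments done as C11 atomic read-modify-write operations, which is one way to avoid the problem when a counter really has several writers.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Replay one interleaving in which two writers each execute "counter++"
 * as three separate steps (load, add, store) on the same plain variable.
 */
unsigned
lost_update_interleaving(void)
{
    unsigned counter = 0;
    unsigned reg_a = counter;   /* writer A loads 0 */
    unsigned reg_b = counter;   /* writer B loads 0 before A has stored */

    counter = reg_a + 1;        /* A stores 1 */
    counter = reg_b + 1;        /* B stores 1, overwriting A's increment */
    return counter;
}

/* The same two increments as indivisible atomic operations. */
unsigned
atomic_increments(void)
{
    atomic_uint counter;

    atomic_init(&counter, 0);
    atomic_fetch_add(&counter, 1);  /* writer A */
    atomic_fetch_add(&counter, 1);  /* writer B */
    return atomic_load(&counter);
}
```

With a single writer per counter this interleaving cannot occur, so a plain increment is enough for correctness of the count itself; ordering against other stores is a separate question that the barriers address.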

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11 Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#10)
Re: Function to know last log write timestamp

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#11)
Re: Function to know last log write timestamp

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13 Andres Freund
andres@2ndquadrant.com
In reply to: Robert Haas (#12)
Re: Function to know last log write timestamp

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14 Fujii Masao
masao.fujii@gmail.com
In reply to: Andres Freund (#13)
1 attachment(s)
Re: Function to know last log write timestamp

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first? It adds macros
to load and store the changecount with memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

Regards,

--
Fujii Masao

Attachments:

add_memory_barrier_to_pgstat_v1.patch (text/x-patch; charset=US-ASCII)
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 2563,2569 **** pgstat_bestart(void)
  	beentry = MyBEEntry;
  	do
  	{
! 		beentry->st_changecount++;
  	} while ((beentry->st_changecount & 1) == 0);
  
  	beentry->st_procpid = MyProcPid;
--- 2563,2569 ----
  	beentry = MyBEEntry;
  	do
  	{
! 		pgstat_increment_changecount_before(beentry);
  	} while ((beentry->st_changecount & 1) == 0);
  
  	beentry->st_procpid = MyProcPid;
***************
*** 2588,2595 **** pgstat_bestart(void)
  	beentry->st_appname[NAMEDATALEN - 1] = '\0';
  	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
  
! 	beentry->st_changecount++;
! 	Assert((beentry->st_changecount & 1) == 0);
  
  	/* Update app name to current GUC setting */
  	if (application_name)
--- 2588,2594 ----
  	beentry->st_appname[NAMEDATALEN - 1] = '\0';
  	beentry->st_activity[pgstat_track_activity_query_size - 1] = '\0';
  
! 	pgstat_increment_changecount_after(beentry);
  
  	/* Update app name to current GUC setting */
  	if (application_name)
***************
*** 2624,2635 **** pgstat_beshutdown_hook(int code, Datum arg)
  	 * before and after.  We use a volatile pointer here to ensure the
  	 * compiler doesn't try to get cute.
  	 */
! 	beentry->st_changecount++;
  
  	beentry->st_procpid = 0;	/* mark invalid */
  
! 	beentry->st_changecount++;
! 	Assert((beentry->st_changecount & 1) == 0);
  }
  
  
--- 2623,2633 ----
  	 * before and after.  We use a volatile pointer here to ensure the
  	 * compiler doesn't try to get cute.
  	 */
! 	pgstat_increment_changecount_before(beentry);
  
  	beentry->st_procpid = 0;	/* mark invalid */
  
! 	pgstat_increment_changecount_after(beentry);
  }
  
  
***************
*** 2666,2672 **** pgstat_report_activity(BackendState state, const char *cmd_str)
  			 * non-disabled state.  As our final update, change the state and
  			 * clear fields we will not be updating anymore.
  			 */
! 			beentry->st_changecount++;
  			beentry->st_state = STATE_DISABLED;
  			beentry->st_state_start_timestamp = 0;
  			beentry->st_activity[0] = '\0';
--- 2664,2670 ----
  			 * non-disabled state.  As our final update, change the state and
  			 * clear fields we will not be updating anymore.
  			 */
! 			pgstat_increment_changecount_before(beentry);
  			beentry->st_state = STATE_DISABLED;
  			beentry->st_state_start_timestamp = 0;
  			beentry->st_activity[0] = '\0';
***************
*** 2674,2681 **** pgstat_report_activity(BackendState state, const char *cmd_str)
  			/* st_xact_start_timestamp and st_waiting are also disabled */
  			beentry->st_xact_start_timestamp = 0;
  			beentry->st_waiting = false;
! 			beentry->st_changecount++;
! 			Assert((beentry->st_changecount & 1) == 0);
  		}
  		return;
  	}
--- 2672,2678 ----
  			/* st_xact_start_timestamp and st_waiting are also disabled */
  			beentry->st_xact_start_timestamp = 0;
  			beentry->st_waiting = false;
! 			pgstat_increment_changecount_after(beentry);
  		}
  		return;
  	}
***************
*** 2695,2701 **** pgstat_report_activity(BackendState state, const char *cmd_str)
  	/*
  	 * Now update the status entry
  	 */
! 	beentry->st_changecount++;
  
  	beentry->st_state = state;
  	beentry->st_state_start_timestamp = current_timestamp;
--- 2692,2698 ----
  	/*
  	 * Now update the status entry
  	 */
! 	pgstat_increment_changecount_before(beentry);
  
  	beentry->st_state = state;
  	beentry->st_state_start_timestamp = current_timestamp;
***************
*** 2707,2714 **** pgstat_report_activity(BackendState state, const char *cmd_str)
  		beentry->st_activity_start_timestamp = start_timestamp;
  	}
  
! 	beentry->st_changecount++;
! 	Assert((beentry->st_changecount & 1) == 0);
  }
  
  /* ----------
--- 2704,2710 ----
  		beentry->st_activity_start_timestamp = start_timestamp;
  	}
  
! 	pgstat_increment_changecount_after(beentry);
  }
  
  /* ----------
***************
*** 2734,2746 **** pgstat_report_appname(const char *appname)
  	 * st_changecount before and after.  We use a volatile pointer here to
  	 * ensure the compiler doesn't try to get cute.
  	 */
! 	beentry->st_changecount++;
  
  	memcpy((char *) beentry->st_appname, appname, len);
  	beentry->st_appname[len] = '\0';
  
! 	beentry->st_changecount++;
! 	Assert((beentry->st_changecount & 1) == 0);
  }
  
  /*
--- 2730,2741 ----
  	 * st_changecount before and after.  We use a volatile pointer here to
  	 * ensure the compiler doesn't try to get cute.
  	 */
! 	pgstat_increment_changecount_before(beentry);
  
  	memcpy((char *) beentry->st_appname, appname, len);
  	beentry->st_appname[len] = '\0';
  
! 	pgstat_increment_changecount_after(beentry);
  }
  
  /*
***************
*** 2760,2769 **** pgstat_report_xact_timestamp(TimestampTz tstamp)
  	 * st_changecount before and after.  We use a volatile pointer here to
  	 * ensure the compiler doesn't try to get cute.
  	 */
! 	beentry->st_changecount++;
  	beentry->st_xact_start_timestamp = tstamp;
! 	beentry->st_changecount++;
! 	Assert((beentry->st_changecount & 1) == 0);
  }
  
  /* ----------
--- 2755,2763 ----
  	 * st_changecount before and after.  We use a volatile pointer here to
  	 * ensure the compiler doesn't try to get cute.
  	 */
! 	pgstat_increment_changecount_before(beentry);
  	beentry->st_xact_start_timestamp = tstamp;
! 	pgstat_increment_changecount_after(beentry);
  }
  
  /* ----------
***************
*** 2839,2845 **** pgstat_read_current_status(void)
  		 */
  		for (;;)
  		{
! 			int			save_changecount = beentry->st_changecount;
  
  			localentry->backendStatus.st_procpid = beentry->st_procpid;
  			if (localentry->backendStatus.st_procpid > 0)
--- 2833,2842 ----
  		 */
  		for (;;)
  		{
! 			int			before_changecount;
! 			int			after_changecount;
! 
! 			pgstat_save_changecount_before(beentry, before_changecount);
  
  			localentry->backendStatus.st_procpid = beentry->st_procpid;
  			if (localentry->backendStatus.st_procpid > 0)
***************
*** 2856,2863 **** pgstat_read_current_status(void)
  				localentry->backendStatus.st_activity = localactivity;
  			}
  
! 			if (save_changecount == beentry->st_changecount &&
! 				(save_changecount & 1) == 0)
  				break;
  
  			/* Make sure we can break out of loop if stuck... */
--- 2853,2861 ----
  				localentry->backendStatus.st_activity = localactivity;
  			}
  
! 			pgstat_save_changecount_after(beentry, after_changecount);
! 			if (before_changecount == after_changecount &&
! 				(before_changecount & 1) == 0)
  				break;
  
  			/* Make sure we can break out of loop if stuck... */
***************
*** 2927,2938 **** pgstat_get_backend_current_activity(int pid, bool checkUser)
  
  		for (;;)
  		{
! 			int			save_changecount = vbeentry->st_changecount;
  
  			found = (vbeentry->st_procpid == pid);
  
! 			if (save_changecount == vbeentry->st_changecount &&
! 				(save_changecount & 1) == 0)
  				break;
  
  			/* Make sure we can break out of loop if stuck... */
--- 2925,2941 ----
  
  		for (;;)
  		{
! 			int			before_changecount;
! 			int			after_changecount;
! 
! 			pgstat_save_changecount_before(vbeentry, before_changecount);
  
  			found = (vbeentry->st_procpid == pid);
  
! 			pgstat_save_changecount_after(vbeentry, after_changecount);
! 
! 			if (before_changecount == after_changecount &&
! 				(before_changecount & 1) == 0)
  				break;
  
  			/* Make sure we can break out of loop if stuck... */
*** a/src/include/pgstat.h
--- b/src/include/pgstat.h
***************
*** 16,21 ****
--- 16,22 ----
  #include "libpq/pqcomm.h"
  #include "portability/instr_time.h"
  #include "postmaster/pgarch.h"
+ #include "storage/barrier.h"
  #include "utils/hsearch.h"
  #include "utils/relcache.h"
  
***************
*** 714,719 **** typedef struct PgBackendStatus
--- 715,726 ----
  	 * st_changecount again.  If the value hasn't changed, and if it's even,
  	 * the copy is valid; otherwise start over.  This makes updates cheap
  	 * while reads are potentially expensive, but that's the tradeoff we want.
+ 	 *
+ 	 * The above protocol needs the memory barriers to ensure that
+ 	 * the apparent order of execution is as it desires. Otherwise,
+ 	 * for example, the CPU might rearrange the code so that st_changecount
+ 	 * is incremented twice before the modification on a machine with
+ 	 * weak memory ordering. This surprising result can lead to bugs.
  	 */
  	int			st_changecount;
  
***************
*** 745,750 **** typedef struct PgBackendStatus
--- 752,794 ----
  	char	   *st_activity;
  } PgBackendStatus;
  
+ /*
+  * Macros to load and store st_changecount with the memory barriers.
+  *
+  * pgstat_increment_changecount_before() and
+  * pgstat_increment_changecount_after() need to be called before and after
+  * PgBackendStatus entries are modified, respectively. This makes sure that
+  * st_changecount is incremented around the modification.
+  *
+  * Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
+  * need to be called before and after PgBackendStatus entries are copied into
+  * private memory, respectively.
+  */
+ #define pgstat_increment_changecount_before(beentry)	\
+ 	do {	\
+ 		beentry->st_changecount++;	\
+ 		pg_write_barrier();	\
+ 	} while (0)
+ 
+ #define pgstat_increment_changecount_after(beentry)	\
+ 	do {	\
+ 		pg_write_barrier();	\
+ 		beentry->st_changecount++;	\
+ 		Assert((beentry->st_changecount & 1) == 0);	\
+ 	} while (0)
+ 
+ #define pgstat_save_changecount_before(beentry, save_changecount)	\
+ 	do {	\
+ 		save_changecount = beentry->st_changecount;	\
+ 		pg_read_barrier();	\
+ 	} while (0)
+ 
+ #define pgstat_save_changecount_after(beentry, save_changecount)	\
+ 	do {	\
+ 		pg_read_barrier();	\
+ 		save_changecount = beentry->st_changecount;	\
+ 	} while (0)
+ 
  /* ----------
   * LocalPgBackendStatus
   *
#15Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#14)
Re: Function to know last log write timestamp

On Fri, Aug 15, 2014 at 7:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

That looks OK to me on a relatively-quick read-through. I was
initially a bit worried about this part:

do
{
! pgstat_increment_changecount_before(beentry);
} while ((beentry->st_changecount & 1) == 0);

pgstat_increment_changecount_before is an increment followed by a
write barrier. This seemed like funny coding to me at first because
while-test isn't protected by any sort of barrier. But now I think
it's correct, because there's only one process that can possibly write
to that data, and that's the one that is making the test, and it had
certainly better see its own modifications in program order no matter
what.
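
To make the protocol under discussion concrete, here is a minimal, self-contained sketch of the changecount pattern. It is not the code from the patch: DemoSlot, demo_slot_set_pid() and demo_slot_read_pid() are hypothetical names, and C11 fences are assumed as stand-ins for pg_write_barrier() and pg_read_barrier(). The writer loop mirrors the do/while quoted above; its test only reads a value the same process just stored, which is why it needs no barrier of its own.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for one per-backend slot; not PgBackendStatus. */
typedef struct DemoSlot
{
	atomic_int	changecount;	/* even: stable, odd: update in progress */
	atomic_int	procpid;		/* payload, written only by the slot owner */
} DemoSlot;

/* Writer side. Assumption: only the owning process ever calls this. */
void
demo_slot_set_pid(DemoSlot *slot, int pid)
{
	/*
	 * Increment until the counter is odd, with a release fence after each
	 * increment (standing in for pg_write_barrier()).  The loop test reads
	 * the writer's own store, so it sees it in program order.
	 */
	do
	{
		atomic_fetch_add_explicit(&slot->changecount, 1, memory_order_relaxed);
		atomic_thread_fence(memory_order_release);
	} while ((atomic_load_explicit(&slot->changecount,
								   memory_order_relaxed) & 1) == 0);

	atomic_store_explicit(&slot->procpid, pid, memory_order_relaxed);

	/* Fence first, then make the counter even again. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&slot->changecount, 1, memory_order_relaxed);
	assert((atomic_load_explicit(&slot->changecount,
								 memory_order_relaxed) & 1) == 0);
}

/* Reader side: retry until a copy was taken with no write in between. */
int
demo_slot_read_pid(DemoSlot *slot)
{
	for (;;)
	{
		int			before;
		int			after;
		int			pid;

		before = atomic_load_explicit(&slot->changecount, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);	/* like pg_read_barrier() */

		pid = atomic_load_explicit(&slot->procpid, memory_order_relaxed);

		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&slot->changecount, memory_order_relaxed);

		/* Valid only if the counter did not change and is even. */
		if (before == after && (before & 1) == 0)
			return pid;
	}
}
```

A caller would zero-initialize a DemoSlot, have the owner call demo_slot_set_pid(), and let any other process call demo_slot_read_pid(). The sketch uses an atomic read-modify-write for the increment, which is stronger than the plain ++ in the patch, and its reader loop has no escape hatch, whereas the loops in pgstat.c make sure they can break out if stuck.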

I wouldn't object to back-patching this to 9.4 if we were earlier in
the beta cycle, but at this point I'm more inclined to just put it in
9.5. If we get an actual bug report about any of this, we can always
back-patch the fix at that time. But so far that seems mostly
hypothetical, so I think the less-risky course of action is to give
this a longer time to bake before it hits an official release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#16Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#15)
Re: Function to know last log write timestamp

On Tue, Aug 19, 2014 at 1:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Aug 15, 2014 at 7:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

That looks OK to me on a relatively-quick read-through. I was
initially a bit worried about this part:

do
{
! pgstat_increment_changecount_before(beentry);
} while ((beentry->st_changecount & 1) == 0);

pgstat_increment_changecount_before is an increment followed by a
write barrier. This seemed like funny coding to me at first because
while-test isn't protected by any sort of barrier. But now I think
it's correct, because there's only one process that can possibly write
to that data, and that's the one that is making the test, and it had
certainly better see its own modifications in program order no matter
what.

I wouldn't object to back-patching this to 9.4 if we were earlier in
the beta cycle, but at this point I'm more inclined to just put it in
9.5. If we get an actual bug report about any of this, we can always
back-patch the fix at that time. But so far that seems mostly
hypothetical, so I think the less-risky course of action is to give
this a longer time to bake before it hits an official release.

Sounds reasonable. So, barring any objection, I will apply the patch
only to the master branch.

Regards,

--
Fujii Masao


#17Jim Nasby
jim@nasby.net
In reply to: Fujii Masao (#16)
Re: Function to know last log write timestamp

On 8/27/14, 7:33 AM, Fujii Masao wrote:

On Tue, Aug 19, 2014 at 1:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Aug 15, 2014 at 7:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

That looks OK to me on a relatively-quick read-through. I was
initially a bit worried about this part:

do
{
! pgstat_increment_changecount_before(beentry);
} while ((beentry->st_changecount & 1) == 0);

pgstat_increment_changecount_before is an increment followed by a
write barrier. This seemed like funny coding to me at first because
while-test isn't protected by any sort of barrier. But now I think
it's correct, because there's only one process that can possibly write
to that data, and that's the one that is making the test, and it had
certainly better see its own modifications in program order no matter
what.

I wouldn't object to back-patching this to 9.4 if we were earlier in
the beta cycle, but at this point I'm more inclined to just put it in
9.5. If we get an actual bug report about any of this, we can always
back-patch the fix at that time. But so far that seems mostly
hypothetical, so I think the less-risky course of action is to give
this a longer time to bake before it hits an official release.

Sounds reasonable. So, barring any objection, I will apply the patch
only to the master branch.

It's probably worth adding a comment explaining why it's safe to do this without a barrier...
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net


#18Fujii Masao
masao.fujii@gmail.com
In reply to: Jim Nasby (#17)
Re: Function to know last log write timestamp

On Thu, Aug 28, 2014 at 2:44 AM, Jim Nasby <jim@nasby.net> wrote:

On 8/27/14, 7:33 AM, Fujii Masao wrote:

On Tue, Aug 19, 2014 at 1:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Aug 15, 2014 at 7:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that
it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

That looks OK to me on a relatively-quick read-through. I was
initially a bit worried about this part:

do
{
! pgstat_increment_changecount_before(beentry);
} while ((beentry->st_changecount & 1) == 0);

pgstat_increment_changecount_before is an increment followed by a
write barrier. This seemed like funny coding to me at first because
while-test isn't protected by any sort of barrier. But now I think
it's correct, because there's only one process that can possibly write
to that data, and that's the one that is making the test, and it had
certainly better see its own modifications in program order no matter
what.

I wouldn't object to back-patching this to 9.4 if we were earlier in
the beta cycle, but at this point I'm more inclined to just put it in
9.5. If we get an actual bug report about any of this, we can always
back-patch the fix at that time. But so far that seems mostly
hypothetical, so I think the less-risky course of action is to give
this a longer time to bake before it hits an official release.

Sounds reasonable. So, barring any objection, I will apply the patch
only to the master branch.

It's probably worth adding a comment explaining why it's safe to do this
without a barrier...

s/without/with ?

Theoretically it's not safe without a barrier on a machine with weak
memory ordering. No?

Regards,

--
Fujii Masao


#19Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#18)
Re: Function to know last log write timestamp

On Thu, Aug 28, 2014 at 3:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Theoretically it's not safe without a barrier on a machine with weak
memory ordering. No?

Why not?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#20Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#14)
Re: Function to know last log write timestamp

On Fri, Aug 15, 2014 at 8:17 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

Hm, what's the status on this patch? The addition of those macros to
control count increment with a memory barrier seems like a good thing
at least. The 2nd patch has not been rebased but still..
--
Michael


#21Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#20)
Re: Function to know last log write timestamp

On Wed, Nov 26, 2014 at 4:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Aug 15, 2014 at 8:17 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

Hm, what's the status on this patch? The addition of those macros to
control count increment with a memory barrier seems like a good thing
at least.

Thanks for reminding me of that! Barring any objection, I will commit it.

The 2nd patch has not been rebased but still..

The feature that this 2nd patch implements is very similar to a part of
what the committs patch does, i.e., tracking the timestamps of committed
transactions. If the committs patch gets committed, I'd basically like to
stop working on the 2nd patch to avoid duplicate work.
OTOH, I'm concerned about the performance impact of the committs patch.
So, for a simple use case like checking replication lag, what the 2nd
patch implements seems to be better, though...

Regards,

--
Fujii Masao


#22Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#21)
Re: Function to know last log write timestamp

On Fri, Nov 28, 2014 at 9:07 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Nov 26, 2014 at 4:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Aug 15, 2014 at 8:17 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:37:22 -0400, Robert Haas wrote:

On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-08-14 14:19:13 -0400, Robert Haas wrote:

That's about the idea. However, what you've got there is actually
unsafe, because shmem->counter++ is not an atomic operation. It reads
the counter (possibly even as two separate 4-byte loads if the counter
is an 8-byte value), increments it inside the CPU, and then writes the
resulting value back to memory. If two backends do this concurrently,
one of the updates might be lost.

All these are only written by one backend, so it should be safe. Note
that that coding pattern, just without memory barriers, is all over
pgstat.c

Ah, OK. If there's a separate slot for each backend, I agree that it's safe.

We should probably add barriers to pgstat.c, too.

Yea, definitely. I think this is rather borked on "weaker"
architectures. It's just that the consequences of an out of date/torn
value are rather low, so it's unlikely to be noticed.

Imo we should encapsulate the changecount modifications/checks somehow
instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barriers, and changes
pgstat.c to use them. Maybe this patch needs to be back-patched to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

Hm, what's the status on this patch? The addition of those macros to
control count increment with a memory barrier seems like a good thing
at least.

Thanks for reminding me of that! Barring any objection, I will commit it.

Applied.

Regards,

--
Fujii Masao
