Session WAL activity
Hi hackers,
One of our customers complains that some sessions generate "too many WAL records".
Certainly WAL activity doesn't indicate a problem by itself: a huge workload causes huge WAL activity.
But they are trying to understand which clients produce so many database changes, and they complain that there is no way to get this information in Postgres. In Oracle, for example, this problem can be solved like this:
http://www.dba-oracle.com/t_find_session_generating_high_redo.htm
Unfortunately, there is currently no simple and accurate way to calculate the amount of WAL produced by a particular session.
It is possible to parse the WAL (for example using pg_waldump) and then, using an XID->pid mapping, accumulate the size of the transactions produced by each backend. But this is a very inconvenient and not DBA-friendly approach.
I have implemented a small patch which collects this statistic.
I have added a walWritten field to PGPROC and increment it in CopyXLogRecordToWAL.
The field can be inspected using the pg_stat_get_wal_activity(pid) function, and I have also added a pg_stat_wal_activity view which simply adds wal_written to the standard pg_stat_activity view:
postgres=# select pid, backend_type, wal_written from pg_stat_wal_activity ;
 pid  |         backend_type         | wal_written
------+------------------------------+-------------
 4405 | autovacuum launcher          |           0
 4407 | logical replication launcher |           0
 4750 | client backend               |       86195
 4403 | background writer            |         204
 4402 | checkpointer                 |         328
 4404 | walwriter                    |           0
(6 rows)
I wonder whether such extra statistics about session WAL activity would be considered useful?
The only problem with this approach, from my point of view, is adding 8 bytes to PGPROC. But there are already so many fields in this structure (sizeof(PGPROC) = 816) that adding yet another 8 bytes should not be noticeable.
Comments are welcome.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: wal_activity.patch (text/x-patch)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5f0ee50..c985b04 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1486,6 +1486,9 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
currpos = GetXLogBuffer(CurrPos);
freespace = INSERT_FREESPACE(CurrPos);
+ /* Accumulate amount of data written to WAL for pg_xact_activity */
+ MyProc->walWritten += write_len;
+
/*
* there should be enough space for at least the first field (xl_tot_len)
* on this page.
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 000cff3..037d313 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -758,6 +758,9 @@ CREATE VIEW pg_stat_activity AS
LEFT JOIN pg_database AS D ON (S.datid = D.oid)
LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
+CREATE VIEW pg_stat_wal_activity AS
+ SELECT a.*,pg_stat_get_wal_activity(a.pid) as wal_written FROM pg_stat_activity a;
+
CREATE VIEW pg_stat_replication AS
SELECT
S.pid,
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fff0628..bf6ab34 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -390,6 +390,8 @@ InitProcess(void)
MyPgXact->xid = InvalidTransactionId;
MyPgXact->xmin = InvalidTransactionId;
MyProc->pid = MyProcPid;
+ MyProc->walWritten = 0;
+
/* backendId, databaseId and roleId will be filled in later */
MyProc->backendId = InvalidBackendId;
MyProc->databaseId = InvalidOid;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..b3acba5 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -546,7 +546,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
Datum
pg_stat_get_activity(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_ACTIVITY_COLS 29
+
+ #define PG_STAT_GET_ACTIVITY_COLS 29
int num_backends = pgstat_fetch_stat_numbackends();
int curr_backend;
int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -919,6 +920,22 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
Datum
+pg_stat_get_wal_activity(PG_FUNCTION_ARGS)
+{
+ int32 pid = PG_GETARG_INT32(0);
+ PGPROC* proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ }
+ if (proc == NULL)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_INT64(proc->walWritten);
+}
+
+
+Datum
pg_backend_pid(PG_FUNCTION_ARGS)
{
PG_RETURN_INT32(MyProcPid);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b759b15..6d31afd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5152,6 +5152,10 @@
proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc}',
prosrc => 'pg_stat_get_activity' },
+{ oid => '2121', descr => 'statistics: WAL activity',
+ proname => 'pg_stat_get_wal_activity', provolatile => 's', proisstrict => 't',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'int4',
+ prosrc => 'pg_stat_get_wal_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
proname => 'pg_stat_get_progress_info', prorows => '100', proretset => 't',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db..16859bf 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -203,6 +203,11 @@ struct PGPROC
PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */
dlist_head lockGroupMembers; /* list of members, if I'm a leader */
dlist_node lockGroupLink; /* my member link, if I'm a member */
+
+ /*
+ * Amount of data written to the WAL by this process
+ */
+ uint64 walWritten;
};
/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */
Hello.
At Tue, 3 Dec 2019 18:01:28 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
It seems useful to me. We might also want statistics of other session I/O. In that case the view name would be "pg_stat_session/process_activity". We are already collecting most kinds of I/O activity, but it loses the session information...
Looking briefly at the patch, I have some comments on it.
As mentioned above, if we intend future expansion of the session-stats view, the name should be changed.
Backend status is more appropriate than PGPROC. See pgstat.c.
Some kind of locking is needed to update the fields in the shared segment (LWLocks for PGPROC, and PGSTAT_BEGIN/END_WRITE_ACTIVITY for PgBackendStatus).
Nitpickings:
The patch contains a trace of an older attempt in pg_stat_get_activity. Proc OIDs should be >= 8000 in patches; src/include/catalog/unused_oids offers some OIDs for you.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 04.12.2019 8:33, Kyotaro Horiguchi wrote:
It seems to be useful to me. We also might want statistics of other
session IOs. In that case the table name would be
"pg_stat_session/process_activity". We are aleady collecting most
kinds of the IO activity but it loses session information...
Well, monitoring disk activity for a particular backend/session can actually be done easily using external tools (just because in Postgres today session = backend = process). So we can monitor the I/O of processes, for example using iotop on Unix or Performance Monitor on Windows.
Certainly it is more convenient to have such statistics inside Postgres, but I am not sure it is really needed.
Concerning WAL activity, the situation is more obscure: records can be added to the WAL by one process but written out by another. This is why it is not possible to use external tools here.
Briefly looking the patch, I have some comments on it.
As mentioned above, if we are intending future exantion of the
session-stats table, the name should be changed.
Backend status is more appropriate than PGPROC. See pgstat.c.
Do you mean pgstat_fetch_stat_beentry?
But why is it better than storing this information directly in PGPROC?
Since this information has to be updated from XLogInsertRecord, which seems to be a very performance-critical function, my intention was to minimize the overhead of maintaining this statistic. It is hard to imagine anything more efficient than just MyProc->walWritten += write_len;
Also, pgstat_fetch_stat_beentry takes a backend id, which is not reported in the pg_stat_activity view; this is why it is more convenient to pass a PID to pg_stat_get_wal_activity. Certainly it is possible to map a PID to a backend id, but... why do we actually need to perform such a mapping if a simpler solution exists?
Some kind of locking is needed to update the fields on shared segment.
(LWLocks for PGPROC and PGSTAT_BEGIN/END_WRITE_ACTIVITY for
PgBackendStatus)
This information is updated only by the backend itself.
Certainly, an update of a 64-bit field is not atomic on 32-bit architectures. But it is just a statistic. I do not think it would be fatal if for a moment we see an incorrect value of written WAL bytes (and on most platforms this update will be atomic).
As I already wrote above, this information is updated in a performance-critical place, and this is why I want to avoid expensive operations (such as locking or atomic updates) as much as possible.
Knitpickings:
The patch contains a trace of older trial in
pg_stat_get_activity. Proc OID should be >= 8000 in
patches. src/include/catalog/unused_oids offers some OID for you.
Will fix it.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi.
At Wed, 4 Dec 2019 16:40:27 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
On 04.12.2019 8:33, Kyotaro Horiguchi wrote:
It seems to be useful to me. We also might want statistics of other
session IOs. In that case the table name would be
"pg_stat_session/process_activity". We are aleady collecting most
kinds of the IO activity but it loses session information...
Well, actually monitoring disk activity for the particular
backend/session can be easily done using some external tools
(just because now in Postgres session=backend=process). So we can
monitor IO of processes, for example using iotop at Unix
or Performance Monitor at Windows.
Operations that complete in shared buffers cannot be monitored that way. The same goes for WAL writing.
Certainly it is more convenient to have such statstic inside
Postgres. But I am not sure if it is really needed.
Concerning WAL activity situation is more obscure: records can be
added to the WAL by one process, but written by another.
This is why it is not possible to use some external tools.
For clarity, I didn't suggest that this patch should include general
session IO statistics. Just the view name looked a bit specific.
Briefly looking the patch, I have some comments on it.
As mentioned above, if we are intending future exantion of the
session-stats table, the name should be changed.
Backend status is more appropriate than PGPROC. See pgstat.c.
Do you mean pgstat_fetch_stat_beentry?
But why it is better than storing this information directly in PGPROC?
No, it cannot be used there, for the performance reasons you mention. I'm not sure it's acceptable, but we could directly access the backend status the same way if we exposed MyBEEntry (and updated it through a macro or an inline function). If we don't need per-record resolution for the value, we could update a process-local variable at WAL-write time and then write it to the backend status at commit time, or at the same time as pgstat reporting.
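A minimal sketch of that local-variable approach could look like the following (purely illustrative: st_wal_written is a hypothetical new PgBackendStatus field, and MyBEEntry would have to be exposed from pgstat.c):

/* Hypothetical sketch, not part of the posted patch. */
#include "postgres.h"
#include "pgstat.h"

static uint64 pendingWalWritten = 0;    /* process-local accumulator */

/* Called from the WAL insertion code instead of touching shared memory. */
static inline void
CountWalWritten(Size nbytes)
{
    pendingWalWritten += nbytes;
}

/* Called at commit time, or together with pgstat_report_stat(). */
static void
ReportWalWritten(void)
{
    volatile PgBackendStatus *beentry = MyBEEntry;  /* assumes MyBEEntry is exposed */

    if (pendingWalWritten == 0 || beentry == NULL)
        return;

    /* Usual protocol for updating one's own backend status entry. */
    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
    beentry->st_wal_written += pendingWalWritten;   /* hypothetical new field */
    PGSTAT_END_WRITE_ACTIVITY(beentry);

    pendingWalWritten = 0;
}

The hot path then stays a single addition to a process-local variable, at the cost of readers seeing slightly stale values.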
According to my faint memory, PGPROC is supposed to be kept as small as possible for CPU-cache reasons; that is the reason PgBackendStatus exists.
As far as this information ha to be updated from XLogInsertRecord and
it seems to be very performance critical function my intention was to
minimize
overhead of maintaining this statistic. It is hard to imagine
something more efficient than just MyProc->walWriten += write_len;
Also pgstat_fetch_stat_beentry is taken backend id, which is not
reported in pg_stat_activity view and this is why it is more
convenient to pass PID to pg_stat_get_wal_activity. Certainly it is
possible to map PID to backendid, but... why actually do we need to
perform such mapping if simpler solution exists?
Some kind of locking is needed to update the fields on shared segment.
(LWLocks for PGPROC and PGSTAT_BEGIN/END_WRITE_ACTIVITY for
PgBackendStatus)
This information is updated locally only by backend itself.
Certainly update of 64 bit field is not atomic at 32-but
architectures.
But it is just statistic. I do not think that it will be fatal if for
a moment
we can see some incorrect value of written WAL bytes (and at most
platforms this
update will be atomic).
At least the reader needs to take the procarray lock to keep the PID/WAL-write association consistent, in order to avoid reading the WAL-write value of the wrong process.
As I already wrote above, this information in updated in performance
critical place and this is why
I want to avoid any expensive operations (such as locking or atomic
updates) as much as possible.
I'm afraid that reason doesn't justify expanding PGPROC...
Knitpickings:
The patch contains a trace of older trial in
pg_stat_get_activity. Proc OID should be >= 8000 in
patches. src/include/catalog/unused_oids offers some OID for you.
Will fix it.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 05.12.2019 5:37, Kyotaro Horiguchi wrote:
It seems to be useful to me. We also might want statistics of other
session IOs. In that case the table name would be
"pg_stat_session/process_activity". We are aleady collecting most
kinds of the IO activity but it loses session information...
Well, actually monitoring disk activity for the particular
backend/session can be easily done using some external tools
(just because now in Postgres session=backend=process). So we can
monitor IO of processes, for example using iotop at Unix
or Performance Monitor at Windows.
Operations that completes on shared buffers cannot be monitored that
way. This is the same with WAL writing.
The question is what we are going to monitor: the amount of read/dirtied buffers, or the amount of disk operations?
Certainly it is more convenient to have such statstic inside
Postgres. But I am not sure if it is really needed.
Concerning WAL activity situation is more obscure: records can be
added to the WAL by one process, but written by another.
This is why it is not possible to use some external tools.
For clarity, I didn't suggest that this patch should include general
session IO statistics. Just the view name looked a bit specific.
I am not sure whether the pg_stat_wal_activity view should be added at all.
We could just add the pg_stat_get_wal_activity function and let the user specify the PID of the backend himself (for example by joining with pg_stat_activity).
I proposed the name pg_stat_wal_activity just for its similarity with pg_stat_activity, but any other proposed name can be used.
Briefly looking the patch, I have some comments on it.
As mentioned above, if we are intending future exantion of the
session-stats table, the name should be changed.
Backend status is more appropriate than PGPROC. See pgstat.c.
Do you mean pgstat_fetch_stat_beentry?
But why it is better than storing this information directly in PGPROC?
No it cannot be used there for performance reasons as you are
saying. I'm not sure it's acceptable, but we can directly access
backend status the same way if we expose MyBEEntry (and update it
through a macro or a inline function). If we don't need per record
resolution for the value, we can update a process local variable at
WAL-write time then write it to backend status at commit time or at
the same timing as pgstat reporting.
According to my faint memory, PGPROC is thought that it must be kept
as small as possible for the reasons of CPU caches, that is the reason
for PgBackendStatus.
Why do you think that adding one addition (without any locks or function calls) to CopyXLogRecordToWAL is not acceptable? It is just one instruction added to an already expensive function. At least I have not noticed any measurable impact on performance.
Concerning keeping the PGPROC size as small as possible, I agree that this is a reasonable argument. But even now it is very large (816 bytes), and adding an extra 8 bytes will increase it by less than 1%.
This information is updated locally only by backend itself.
Certainly update of 64 bit field is not atomic at 32-but
architectures.
But it is just statistic. I do not think that it will be fatal if for
a moment
we can see some incorrect value of written WAL bytes (and at most
platforms this
update will be atomic).
At least reader needs to take procarray lock to keep PID-WALwrite
consistency, in order to prevent reading WALwrite values for a wrong
process.
Sorry, but I still do not understand what can go wrong if the reader sees the WAL activity of the wrong process.
Yes, the corresponding backend may already be terminated and its PGPROC entry reused by some other process. In this case we can wrongly attribute the WAL traffic generated by the terminated backend to the new process, or report zero traffic for the old process. But this information is mostly needed for live (active) backends, so I do not think the race conditions here are critical.
Right now pg_stat_activity also accesses PGPROC to obtain wait event information, and it also does not take any locks. So it can wrongly report backend status, but I have never heard anybody complain about it.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
Concerning keeping PGPROC size as small as possible, I agree that it is
reasonable argument.
But even now it is very large (816 bytes) and adding extra 8 bytes will
increase it on less than 1%.
It does not mean that we should add all kinds of things to PGPROC, as that's a sensitive enough structure already. By the way, why do you assume that 8-byte reads are always safe and atomic in the patch?
Right now pg_stat_activity also accessing PGPROC to obtain wait event
information and also not taking any locks.
So it can wrongly report backend status. But I never heard that somebody
complains about it.
Please see pgstat.h, close to pgstat_report_wait_start().
--
Michael
On 06.12.2019 4:57, Michael Paquier wrote:
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
Concerning keeping PGPROC size as small as possible, I agree that it is
reasonable argument.
But even now it is very large (816 bytes) and adding extra 8 bytes will
increase it on less than 1%.
It does not mean that we should add all kind of things to PGPROC as
that's a structure sensitive enough already. By the way, why do you
assume that 8-byte reads are always safe and atomic in the patch?
I never assumed it - in my previous mail I wrote:
"Certainly, an update of a 64-bit field is not atomic on 32-bit architectures."
Right now pg_stat_activity also accessing PGPROC to obtain wait event
information and also not taking any locks.
So it can wrongly report backend status. But I never heard that somebody
complains about it.
Please see pgstat.h, close to pgstat_report_wait_start().
Sorry, I do not understand what I should look for.
Do you mean this comment:
    /*
     * Since this is a four-byte field which is always read and written as
     * four-bytes, updates are atomic.
     */
Yes, I have already noticed that since walWritten is 64-bit, its update is not atomic on 32-bit platforms, so it is sometimes possible to see incorrect values. So monotonic observation of walWritten can be violated.
From my point of view it is not critical enough to require updating this field under a lock, or accumulating the result in a local variable and writing it to the backend status at commit time, as Kyotaro proposed. Monitoring WAL activity is especially interesting for long-living transactions, and from my point of view it is much more important to be able to see up-to-date but possibly not always correct information than to see no information at all before commit.
Please also take into account the percentage of 32-bit Postgres installations and the probability of observing a non-atomic update of the 64-bit walWritten field (I think you would have no chance of seeing it even if you ran Postgres for years).
But what I meant by "wrongly report backend wait event status" is that pg_stat_activity may report the wait event status for the wrong backend. I.e. if a backend has already terminated and its PGPROC entry is reused by some other backend, then you can see incorrect wait event information: the backend with that PID actually never slept on this event.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
At Fri, 6 Dec 2019 11:22:14 +0300, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote in
On 06.12.2019 4:57, Michael Paquier wrote:
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
Please see pgstat.h, close to pgstat_report_wait_start().
Sorry, I do not understand what should I look for?
Do you mean this comment:
/*
 * Since this is a four-byte field which is always read and written as
 * four-bytes, updates are atomic.
 */
Yes, I already have noticed that as far as walWritten is 64-bit, its
update is not atomic at 32-bit platforms and so it is possible to see
sometimes incorrect values.
So monotone observe of walWritten can be violated. From my point of
view it is not so critical to enforce update of this fields under lock
or accumulating result in local variable with later write it to
backend status at commit time as Kyotaro proposed. Monitoring of WAL
activity is especially interested for long living transactions and
from my point of view it is much more
important to be able to see up-to-date but may be not always correct
information then do not see any information at all before commit.
Please also take in account the percent of 32-bit Postgres
installations and probability of observing non-atomic update of 64-bit
walWritten field (I think that you will have no chances to see it even
if you will run Postgres for a years).
Still, I'm not sure a non-atomic write is acceptable, but I agree on the necessity of updating it during a transaction. Couldn't we update the shared stats every n bytes (XLOG_BLCKSZ or so), or at every command end?
I think we should refrain from inserting an instruction within the WALInsertLock section, but I'm not sure which is better between "var += var" within the section and "if (inserted) var += var;" outside it. If we can ignore the possibility of the case where the xlog switch is omitted, the "if (inserted)" is not needed.
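As a rough illustration of the every-n-bytes idea (again only a sketch; CountWalWritten and ReportWalWritten are the hypothetical helpers from the earlier sketch, not functions in the posted patch):

/* Hypothetical sketch: accumulate locally, publish only every XLOG_BLCKSZ bytes. */
static uint64 pendingWalWritten = 0;

static void ReportWalWritten(void);     /* hypothetical flush into the shared per-backend counter */

static inline void
CountWalWritten(Size nbytes)
{
    pendingWalWritten += nbytes;

    /* Publish to shared memory only occasionally, keeping the hot path to one addition. */
    if (pendingWalWritten >= XLOG_BLCKSZ)
        ReportWalWritten();
}

That bounds the staleness to roughly one WAL block of data without adding shared-memory traffic to every record insertion.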
But what I mean by "wrongly report backend wait event status" is that
pg_stat_activity may report wait event status for wrong backend.
I.e. if backend is already terminated and its PGPROC entry is reused
by some other backend, than you can see incorrect wait event
information:
backend with such PID actually never sleep on this event.
I once saw a case where an entry with a very old xact_start_timestamp suddenly popped up in pg_stat_activity, but I haven't found the cause...
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 11.12.2019 7:26, Kyotaro Horiguchi wrote:
Still I'm not sure non-atomic write is acceptable, but I agree on the
necessity of updating it during a transaction. Couldn't we update
shared stats every n bytes (XLOG_BLCKSZ or such) or every command end?
I think we should refrain from inserting an instruction within the
WALInsertLock section, but I'm not sure which is better between "var
+= var" within the section and "if (inserted) var += var;" outside. If
we can ignore the possitbility of the case where xlogswitch is
omitted, the "if (inserted)" is not needed.
I think that 32-bit Postgres installations are really exotic, but I agree that showing an incorrect result (even with very small probability) is not acceptable behavior in this case. I have attached a new version of the patch which uses pg_atomic_write_u64 for updating the walWritten field.
Since on 64-bit systems pg_atomic_write_u64 and pg_atomic_read_u64 are translated to ordinary memory accesses, they should not have any negative impact on performance.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: wal_activity-2.patch (text/x-patch)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5f0ee50..6a4b209 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1486,6 +1486,9 @@ CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
currpos = GetXLogBuffer(CurrPos);
freespace = INSERT_FREESPACE(CurrPos);
+ /* Accumulate amount of data written to WAL for pg_xact_activity */
+ pg_atomic_write_u64(&MyProc->walWritten, pg_atomic_read_u64(&MyProc->walWritten) + write_len);
+
/*
* there should be enough space for at least the first field (xl_tot_len)
* on this page.
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 000cff3..037d313 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -758,6 +758,9 @@ CREATE VIEW pg_stat_activity AS
LEFT JOIN pg_database AS D ON (S.datid = D.oid)
LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
+CREATE VIEW pg_stat_wal_activity AS
+ SELECT a.*,pg_stat_get_wal_activity(a.pid) as wal_written FROM pg_stat_activity a;
+
CREATE VIEW pg_stat_replication AS
SELECT
S.pid,
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fff0628..c9a63a1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -390,6 +390,8 @@ InitProcess(void)
MyPgXact->xid = InvalidTransactionId;
MyPgXact->xmin = InvalidTransactionId;
MyProc->pid = MyProcPid;
+ pg_atomic_init_u64(&MyProc->walWritten, 0);
+
/* backendId, databaseId and roleId will be filled in later */
MyProc->backendId = InvalidBackendId;
MyProc->databaseId = InvalidOid;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..1745815 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -919,6 +919,22 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
Datum
+pg_stat_get_wal_activity(PG_FUNCTION_ARGS)
+{
+ int32 pid = PG_GETARG_INT32(0);
+ PGPROC* proc = BackendPidGetProc(pid);
+ if (proc == NULL)
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ }
+ if (proc == NULL)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_INT64(pg_atomic_read_u64(&proc->walWritten));
+}
+
+
+Datum
pg_backend_pid(PG_FUNCTION_ARGS)
{
PG_RETURN_INT32(MyProcPid);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b759b15..6d31afd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5152,6 +5152,10 @@
proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc}',
prosrc => 'pg_stat_get_activity' },
+{ oid => '2121', descr => 'statistics: WAL activity',
+ proname => 'pg_stat_get_wal_activity', provolatile => 's', proisstrict => 't',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'int4',
+ prosrc => 'pg_stat_get_wal_activity' },
{ oid => '3318',
descr => 'statistics: information about progress of backends running maintenance command',
proname => 'pg_stat_get_progress_info', prorows => '100', proretset => 't',
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db..3aa2efb 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -203,6 +203,11 @@ struct PGPROC
PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */
dlist_head lockGroupMembers; /* list of members, if I'm a leader */
dlist_node lockGroupLink; /* my member link, if I'm a member */
+
+ /*
+ * Amount of data written to the WAL by this process
+ */
+ pg_atomic_uint64 walWritten;
};
/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */
On Fri, 6 Dec 2019 at 09:57, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
Concerning keeping PGPROC size as small as possible, I agree that it is
reasonable argument.
But even now it is very large (816 bytes) and adding extra 8 bytes will
increase it on less than 1%.
It does not mean that we should add all kind of things to PGPROC as
that's a structure sensitive enough already.
Right. It's not as critical as PGXACT, but PGPROC is still significant for
scalability and connection count limits.
It's a shame we can't really keep most of it in backend-private memory and
copy it to requestors when they want it, say into a temporary DSA or over a
shm_mq. But our single threaded model means we just cannot rely on backends
being responsive in a timely enough manner to supply data on-demand. That
doesn't mean we have to push it to PGPROC though: we could be sending the
parts that don't have to be super-fresh to the stats collector or a new
related process for active session stats and letting it aggregate them.
That's way beyond the scope of this patch though. So personally I'm OK with
the new PGPROC field. Visibility into Pg's activity is woefully limited and
something we need to prioritize more.
--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
On Thu, Dec 12, 2019 at 09:31:22AM +0800, Craig Ringer wrote:
On Fri, 6 Dec 2019 at 09:57, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
Concerning keeping PGPROC size as small as possible, I agree that it is
reasonable argument.
But even now it is very large (816 bytes) and adding extra 8 bytes will
increase it on less than 1%.
It does not mean that we should add all kind of things to PGPROC as
that's a structure sensitive enough already.
Right. It's not as critical as PGXACT, but PGPROC is still significant for
scalability and connection count limits.
It's a shame we can't really keep most of it in backend-private memory and copy
it to requestors when they want it, say into a temporary DSA or over a shm_mq.
But our single threaded model means we just cannot rely on backends being
responsive in a timely enough manner to supply data on-demand. That doesn't
mean we have to push it to PGPROC though: we could be sending the parts that
don't have to be super-fresh to the stats collector or a new related process
for active session stats and letting it aggregate them.
That's way beyond the scope of this patch though. So personally I'm OK with the
new PGPROC field. Visibility into Pg's activity is woefully limited and
something we need to prioritize more.
Uh, how much does the new field get us near the CPU cache line max size
for a single PGPROC entry?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Hi,
On 2019-12-20 16:38:32 -0500, Bruce Momjian wrote:
On Thu, Dec 12, 2019 at 09:31:22AM +0800, Craig Ringer wrote:
On Fri, 6 Dec 2019 at 09:57, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
Concerning keeping PGPROC size as small as possible, I agree that it is
reasonable argument.
But even now it is very large (816 bytes) and adding extra 8 bytes will
increase it on less than 1%.
It does not mean that we should add all kind of things to PGPROC as
that's a structure sensitive enough already.
Well, but we don't keep other stats in PGPROC, even when we have them in
shared memory? It seems like PgBackendStatus or such might be a better
place?
Right. It's not as critical as PGXACT, but PGPROC is still significant for
scalability and connection count limits.
It's a shame we can't really keep most of it in backend-private memory and copy
it to requestors when they want it, say into a temporary DSA or over
a shm_mq.
I don't understand what that would buy? Commonly accessed fields are just going to be in L2 or such, with the cacheline being in modified/exclusive state. The problem isn't that fields / cachelines *can* be accessed by other backends; it's only a problem *if* they're frequently accessed. And even if accessed by multiple backends, it's only really a problem if there are multiple fields *and* they're also modified (otherwise they can just stay in shared state across cpus/sockets).
There *is* an argument for grouping fields in PGPROC by their access
patterns. E.g. something like ->procArrayGroup* is a lot more commonly
accessed by different backends than e.g. this proposed field.
But our single threaded model means we just cannot rely on backends being
responsive in a timely enough manner to supply data on-demand. That doesn't
mean we have to push it to PGPROC though: we could be sending the parts that
don't have to be super-fresh to the stats collector or a new related process
for active session stats and letting it aggregate them.
We should definitely *NOT* do that. Ferrying things through the stats
collector is really expensive, and everyone pays the price for an
increase in size, not just code accessing the field. In fact, no
reasonable quantity that's known at server start should ever go through
a mechanism as expensive as pgstat - the only reason it exists is that
the number of tables obviously can grow over time.
There's a thread somewhere about a patchset to move all of pgstat into
dynamic shared memory, actually. Because the writes / reads needed by
pgstat are really bad on some systems.
That's way beyond the scope of this patch though. So personally I'm OK with the
new PGPROC field. Visibility into Pg's activity is woefully limited and
something we need to prioritize more.
Uh, how much does the new field get us near the CPU cache line max size
for a single PGPROC entry?
It's about 13x the common size of a cache line (64 bytes).
Greetings,
Andres Freund