Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Started by Bertrand Drouvotabout 3 years ago48 messageshackers
Jump to latest
#1Bertrand Drouvot
bertranddrouvot.pg@gmail.com

Hi hackers,

Please find attached a patch proposal for $SUBJECT.

The idea has been raised in [1]/messages/by-id/20230111225907.6el6c5j3hukizqxc@awork3.anarazel.de by Andres: it would allow to simplify even more the work done to
generate pg_stat_get_xact*() functions with Macros.

Indeed, with the reconciliation done in find_tabstat_entry() then all the pg_stat_get_xact*() functions
(making use of find_tabstat_entry()) now "look the same" (should they take into account live subtransactions or not).

Looking forward to your feedback,

Regards

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

[1]: /messages/by-id/20230111225907.6el6c5j3hukizqxc@awork3.anarazel.de

Attachments:

v1-0001-find_tabstat_entry_recon_get_rid_PgStat_BackendFunctionEntry.patchtext/plain; charset=UTF-8; name=v1-0001-find_tabstat_entry_recon_get_rid_PgStat_BackendFunctionEntry.patchDownload+60-44
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bertrand Drouvot (#1)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

At Thu, 9 Feb 2023 11:38:18 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in

Hi hackers,

Please find attached a patch proposal for $SUBJECT.

The idea has been raised in [1] by Andres: it would allow to simplify
even more the work done to
generate pg_stat_get_xact*() functions with Macros.

Indeed, with the reconciliation done in find_tabstat_entry() then all
the pg_stat_get_xact*() functions
(making use of find_tabstat_entry()) now "look the same" (should they
take into account live subtransactions or not).

Looking forward to your feedback,

I like that direction.

Don't we rename PgStat_FunctionCounts to PgStat_FuncStatus, unifying
neighboring functions?

Why does find_tabstat_entry() copies the whole pending data and
performs subxaction summarization? The summarization is needed only by
few callers but now that cost is imposed to the all callers along with
additional palloc()/pfree() calls. That doesn't seem reasonable.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2/10/23 3:32 AM, Kyotaro Horiguchi wrote:

At Thu, 9 Feb 2023 11:38:18 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in

Hi hackers,

Please find attached a patch proposal for $SUBJECT.

The idea has been raised in [1] by Andres: it would allow to simplify
even more the work done to
generate pg_stat_get_xact*() functions with Macros.

Indeed, with the reconciliation done in find_tabstat_entry() then all
the pg_stat_get_xact*() functions
(making use of find_tabstat_entry()) now "look the same" (should they
take into account live subtransactions or not).

Looking forward to your feedback,

I like that direction.

Thanks for looking at it!

Don't we rename PgStat_FunctionCounts to PgStat_FuncStatus, unifying
neighboring functions?

Not sure, I think it's the counter part of PgStat_TableCounts for example.

Why does find_tabstat_entry() copies the whole pending data and
performs subxaction summarization?

It copies the pending data to not increment it's counters while doing the summarization.
The summarization was done here to avoid the pg_stat_get_xact*() functions to do the computation so that all
the pg_stat_get_xact*() functions look the same but....

The summarization is needed only by
few callers but now that cost is imposed to the all callers along with
additional palloc()/pfree() calls. That doesn't seem reasonable.

I agree that's not the best approach.....

Let me come back with another proposal (thinking to increment reconciled
counters in pgstat_count_heap_insert(), pgstat_count_heap_delete() and
pgstat_count_heap_update()).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#3)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

: Andres Freund <andres@anarazel.de>
References: <b9e1f543-ee93-8168-d530-d961708ad9d3@gmail.com>
<20230210.113242.699878230551547182.horikyota.ntt@gmail.com>
<5420b28c-d33f-d25d-9f47-b06b8a2372ba@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5420b28c-d33f-d25d-9f47-b06b8a2372ba@gmail.com>

Hi,

On 2023-02-10 16:50:32 +0100, Drouvot, Bertrand wrote:

On 2/10/23 3:32 AM, Kyotaro Horiguchi wrote:

The summarization is needed only by
few callers but now that cost is imposed to the all callers along with
additional palloc()/pfree() calls. That doesn't seem reasonable.

I agree that's not the best approach.....

I think it's completely fine to do unnecessary reconciliation for the _xact_
functions. They're not that commonly used, and very rarely is there a huge
number of relations with lots of pending data across lots of subtransactions.

Let me come back with another proposal (thinking to increment reconciled
counters in pgstat_count_heap_insert(), pgstat_count_heap_delete() and
pgstat_count_heap_update()).

Those are the performance crucial functions, we shouldn't do any additional
work there if we can avoid it. Shifting cost from the "looking at
transactional stats" side to the collecting stats side is the opposite of what
we should.

Greetings,

Andres Freund

#5Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#1)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2023-02-09 11:38:18 +0100, Drouvot, Bertrand wrote:

Please find attached a patch proposal for $SUBJECT.

The idea has been raised in [1] by Andres: it would allow to simplify even more the work done to
generate pg_stat_get_xact*() functions with Macros.

Thanks!

I think this is useful beyond being able to generate those functions with
macros. The fact that we had to deal with transactional code in pgstatfuncs.c
meant that a lot of the relevant itnernals had to be exposed "outside" pgstat,
which seems architecturally not great.

Indeed, with the reconciliation done in find_tabstat_entry() then all the pg_stat_get_xact*() functions
(making use of find_tabstat_entry()) now "look the same" (should they take into account live subtransactions or not).

I'm not bothered by making all of pg_stat_get_xact* functions more expensive,
they're not a hot code path. But if we need to, we could just add a parameter
to find_tabstat_entry() indicating whether we need to reconcile or not.

/* save stats for this function, later used to compensate for recursion */
-	fcu->save_f_total_time = pending->f_counts.f_total_time;
+	fcu->save_f_total_time = pending->f_total_time;

/* save current backend-wide total time */
fcu->save_total = total_func_time;

The diff is noisy due to all the mechanical changes like the above. Could that
be split into a separate commit?

find_tabstat_entry(Oid rel_id)
{
PgStat_EntryRef *entry_ref;
+ PgStat_TableStatus *tablestatus = NULL;

entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
if (!entry_ref)
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);

if (entry_ref)
-		return entry_ref->pending;
-	return NULL;
+	{
+		PgStat_TableStatus *tabentry = (PgStat_TableStatus *) entry_ref->pending;

I'd add an early return for the !entry_ref case, that way you don't need to
indent the bulk of the function.

+		PgStat_TableXactStatus *trans;
+
+		tablestatus  = palloc(sizeof(PgStat_TableStatus));
+		memcpy(tablestatus, tabentry, sizeof(PgStat_TableStatus));

For things like this I'd use
*tablestatus = *tabentry;

that way the compiler will warn you about mismatching types, and you don't
need the sizeof().

+		/* live subtransactions' counts aren't in t_counts yet */
+		for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
+		{
+			tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
+			tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
+			tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
+		}
+	}

Hm, why do we end uup with t_counts still being used here, but removed other
places?

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6737493402..40a6fbf871 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1366,7 +1366,10 @@ pg_stat_get_xact_numscans(PG_FUNCTION_ARGS)
if ((tabentry = find_tabstat_entry(relid)) == NULL)
result = 0;
else
+	{
result = (int64) (tabentry->t_counts.t_numscans);
+		pfree(tabentry);
+	}

PG_RETURN_INT64(result);
}

I don't think we need to bother with individual pfrees in this path. The
caller will call the function in a dedicated memory context, that'll be reset
very soon after this.

Greetings,

Andres Freund

#6Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#5)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2/10/23 10:46 PM, Andres Freund wrote:

Hi,

On 2023-02-09 11:38:18 +0100, Drouvot, Bertrand wrote:

Please find attached a patch proposal for $SUBJECT.

The idea has been raised in [1] by Andres: it would allow to simplify even more the work done to
generate pg_stat_get_xact*() functions with Macros.

Thanks!

Thanks for looking at it!

I think this is useful beyond being able to generate those functions with
macros. The fact that we had to deal with transactional code in pgstatfuncs.c
meant that a lot of the relevant itnernals had to be exposed "outside" pgstat,
which seems architecturally not great.

Right, good point.

Indeed, with the reconciliation done in find_tabstat_entry() then all the pg_stat_get_xact*() functions
(making use of find_tabstat_entry()) now "look the same" (should they take into account live subtransactions or not).

I'm not bothered by making all of pg_stat_get_xact* functions more expensive,
they're not a hot code path. But if we need to, we could just add a parameter
to find_tabstat_entry() indicating whether we need to reconcile or not.

I think that's a good idea to avoid doing extra work if not needed.
V2 adds such a bool.

/* save stats for this function, later used to compensate for recursion */
-	fcu->save_f_total_time = pending->f_counts.f_total_time;
+	fcu->save_f_total_time = pending->f_total_time;

/* save current backend-wide total time */
fcu->save_total = total_func_time;

The diff is noisy due to all the mechanical changes like the above. Could that
be split into a separate commit?

Fully agree, the PgStat_BackendFunctionEntry stuff will be done in a separate patch.

find_tabstat_entry(Oid rel_id)
{
PgStat_EntryRef *entry_ref;
+ PgStat_TableStatus *tablestatus = NULL;

entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
if (!entry_ref)
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);

if (entry_ref)
-		return entry_ref->pending;
-	return NULL;
+	{
+		PgStat_TableStatus *tabentry = (PgStat_TableStatus *) entry_ref->pending;

I'd add an early return for the !entry_ref case, that way you don't need to
indent the bulk of the function.

Good point, done in V2.

+		PgStat_TableXactStatus *trans;
+
+		tablestatus  = palloc(sizeof(PgStat_TableStatus));
+		memcpy(tablestatus, tabentry, sizeof(PgStat_TableStatus));

For things like this I'd use
*tablestatus = *tabentry;

that way the compiler will warn you about mismatching types, and you don't
need the sizeof().

Good point, done in V2.

+		/* live subtransactions' counts aren't in t_counts yet */
+		for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
+		{
+			tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
+			tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
+			tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
+		}
+	}

Hm, why do we end uup with t_counts still being used here, but removed other
places?

t_counts are not removed, maybe you are confused with the "f_counts" that were removed
in V1 due to the PgStat_BackendFunctionEntry related changes?

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6737493402..40a6fbf871 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1366,7 +1366,10 @@ pg_stat_get_xact_numscans(PG_FUNCTION_ARGS)
if ((tabentry = find_tabstat_entry(relid)) == NULL)
result = 0;
else
+	{
result = (int64) (tabentry->t_counts.t_numscans);
+		pfree(tabentry);
+	}

PG_RETURN_INT64(result);
}

I don't think we need to bother with individual pfrees in this path. The
caller will call the function in a dedicated memory context, that'll be reset
very soon after this.

Oh right, the palloc is done in the ExprContext memory context that is reset soon after.
Removing the pfrees in V2 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-find_tabstat_entry_recon.patchtext/plain; charset=UTF-8; name=v2-0001-find_tabstat_entry_recon.patchDownload+33-35
#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bertrand Drouvot (#6)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

At Mon, 13 Feb 2023 08:09:50 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in

I think this is useful beyond being able to generate those functions
with
macros. The fact that we had to deal with transactional code in
pgstatfuncs.c
meant that a lot of the relevant itnernals had to be exposed "outside"
pgstat,
which seems architecturally not great.

Right, good point.

Agreed.

Removing the pfrees in V2 attached.

Ah, that sound good.

 	if (!entry_ref)
+	{
 		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		return tablestatus;
+	}

We should return something if the call returns a non-null value?

So, since we want to hide the internal from pgstatfuncs, the
additional flag should be gone. If the additional cost doesn't bother
anyone, I don't mind to remove the flag. The patch will get far
simpler by that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#8Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kyotaro Horiguchi (#7)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2/13/23 8:40 AM, Kyotaro Horiguchi wrote:

At Mon, 13 Feb 2023 08:09:50 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in

I think this is useful beyond being able to generate those functions
with
macros. The fact that we had to deal with transactional code in
pgstatfuncs.c
meant that a lot of the relevant itnernals had to be exposed "outside"
pgstat,
which seems architecturally not great.

Right, good point.

Agreed.

Removing the pfrees in V2 attached.

Ah, that sound good.

if (!entry_ref)
+	{
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		return tablestatus;
+	}

We should return something if the call returns a non-null value?

What we do is: if entry_ref is NULL then we return NULL (so that the caller returns 0).

If entry_ref is not NULL then we return a copy of entry_ref->pending (with or without subtrans).

So, since we want to hide the internal from pgstatfuncs, the
additional flag should be gone.

I think there is pros and cons for both but I don't have a strong opinion about that.

So also proposing V3 attached without the flag in case this is the preferred approach.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-find_tabstat_entry_recon.patchtext/plain; charset=UTF-8; name=v3-0001-find_tabstat_entry_recon.patchDownload+27-24
#9Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Thanks for the new version.

At Mon, 13 Feb 2023 09:58:52 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in

Hi,

On 2/13/23 8:40 AM, Kyotaro Horiguchi wrote:

At Mon, 13 Feb 2023 08:09:50 +0100, "Drouvot, Bertrand"
<bertranddrouvot.pg@gmail.com> wrote in

I think this is useful beyond being able to generate those functions
with
macros. The fact that we had to deal with transactional code in
pgstatfuncs.c
meant that a lot of the relevant itnernals had to be exposed "outside"
pgstat,
which seems architecturally not great.

Right, good point.

Agreed.

Removing the pfrees in V2 attached.

Ah, that sound good.
if (!entry_ref)
+ {
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
InvalidOid, rel_id);
+ return tablestatus;
+ }
We should return something if the call returns a non-null value?

What we do is: if entry_ref is NULL then we return NULL (so that the
caller returns 0).

If entry_ref is not NULL then we return a copy of entry_ref->pending
(with or without subtrans).

Isn't it ignoring the second call to pgstat_fetch_pending_entry?

What the code did is: if entry_ref is NULL for MyDatabaseId then we
*retry* fetching an global (not database-wise) entry. If any global
entry is found, return it (correctly entry_ref->pending) to the caller.

The current patch returns NULL when a glboal entry is found.

I thought that we might be able to return entry_ref->pending since the
callers don't call pfree on the returned pointer, but it is not great
that we don't inform the callers if the returned memory can be safely
pfreed or not.

Thus what I have in mind is the following.

if (!entry_ref)
+	{
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
InvalidOid, rel_id);
+		if (!entry_ref)
+         return NULL;
+	}

So, since we want to hide the internal from pgstatfuncs, the
additional flag should be gone.

I think there is pros and cons for both but I don't have a strong
opinion about that.

So also proposing V3 attached without the flag in case this is the
preferred approach.

That part looks good to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#10Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kyotaro Horiguchi (#9)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2/14/23 7:11 AM, Kyotaro Horiguchi wrote:

Isn't it ignoring the second call to pgstat_fetch_pending_entry?

Oh right, my bad (the issue has been introduced in V2).
Fixed in V4.

I thought that we might be able to return entry_ref->pending since the
callers don't call pfree on the returned pointer, but it is not great
that we don't inform the callers if the returned memory can be safely
pfreed or not.

Thus what I have in mind is the following.

if (!entry_ref)
+	{
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
InvalidOid, rel_id);
+		if (!entry_ref)
+         return NULL;
+	}

LGTM, done that way in V4.

So, since we want to hide the internal from pgstatfuncs, the
additional flag should be gone.

I think there is pros and cons for both but I don't have a strong
opinion about that.

So also proposing V3 attached without the flag in case this is the
preferred approach.

That part looks good to me.

Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-find_tabstat_entry_recon.patchtext/plain; charset=UTF-8; name=v4-0001-find_tabstat_entry_recon.patchDownload+28-24
#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bertrand Drouvot (#10)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

At Tue, 14 Feb 2023 15:43:26 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in

Oh right, my bad (the issue has been introduced in V2).
Fixed in V4.

Great!

I thought that we might be able to return entry_ref->pending since the
callers don't call pfree on the returned pointer, but it is not great
that we don't inform the callers if the returned memory can be safely
pfreed or not.
Thus what I have in mind is the following.

if (!entry_ref)
+	{
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
InvalidOid, rel_id);
+		if (!entry_ref)
+         return NULL;
+	}

LGTM, done that way in V4.

That part looks good to me, thanks!

I was going through v4 and it seems to me that the comment for
find_tabstat_entry may not be quite right.

* find any existing PgStat_TableStatus entry for rel
*
* Find any existing PgStat_TableStatus entry for rel_id in the current
* database. If not found, try finding from shared tables.
*
* If no entry found, return NULL, don't create a new one

The comment assumed that the function directly returns an entry from
shared memory, but now it copies the entry's contents into a palloc'ed
memory and stores the sums of some counters for the current
transaction in it. Do you think we should update the comment to
reflect this change?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kyotaro Horiguchi (#11)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2/15/23 1:56 AM, Kyotaro Horiguchi wrote:

At Tue, 14 Feb 2023 15:43:26 +0100, "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> wrote in

The comment assumed that the function directly returns an entry from
shared memory, but now it copies the entry's contents into a palloc'ed
memory and stores the sums of some counters for the current
transaction in it. Do you think we should update the comment to
reflect this change?

Good point, thanks! Yeah, definitively, done in V5 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-find_tabstat_entry_recon.patchtext/plain; charset=UTF-8; name=v5-0001-find_tabstat_entry_recon.patchDownload+32-24
#13Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#12)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2023-02-15 09:21:48 +0100, Drouvot, Bertrand wrote:

diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index f793ac1516..b26e2a5a7a 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
* Find any existing PgStat_TableStatus entry for rel_id in the current
* database. If not found, try finding from shared tables.
*
+ * If an entry is found, copy it and increment the copy's counters with their
+ * subtransactions counterparts. Then return the copy. There is no need for the
+ * caller to pfree the copy as the MemoryContext will be reset soon after.
+ *

The "There is no need" bit seems a bit off. Yes, that's true for the current
callers, but who says that it has to stay that way?

Otherwise this looks ready, on a casual scan.

Greetings,

Andres Freund

#14Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#13)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 2/16/23 10:21 PM, Andres Freund wrote:

Hi,

On 2023-02-15 09:21:48 +0100, Drouvot, Bertrand wrote:

diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index f793ac1516..b26e2a5a7a 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
* Find any existing PgStat_TableStatus entry for rel_id in the current
* database. If not found, try finding from shared tables.
*
+ * If an entry is found, copy it and increment the copy's counters with their
+ * subtransactions counterparts. Then return the copy. There is no need for the
+ * caller to pfree the copy as the MemoryContext will be reset soon after.
+ *

The "There is no need" bit seems a bit off. Yes, that's true for the current
callers, but who says that it has to stay that way?

Good point. Wording has been changed in V6 attached.

Otherwise this looks ready, on a casual scan.

Thanks for having looked at it!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-find_tabstat_entry_recon.patchtext/plain; charset=UTF-8; name=v6-0001-find_tabstat_entry_recon.patchDownload+32-24
#15Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#14)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

On Mon, Mar 06, 2023 at 08:33:15AM +0100, Drouvot, Bertrand wrote:

Thanks for having looked at it!

Looking at that, I have a few comments.

+    tabentry = (PgStat_TableStatus *) entry_ref->pending;
+    tablestatus = palloc(sizeof(PgStat_TableStatus));
+    *tablestatus = *tabentry;
+
[...]
+    for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
+    {
+        tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
+        tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
+        tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
+    }
-    if (entry_ref)
-        return entry_ref->pending;
-    return NULL;
+    return tablestatus;

From what I get with this change, the number of tuples changed by DMLs
have their computations done a bit earlier, meaning that it would make
all the callers of find_tabstat_entry() pay the computation cost.
Still it is not really going to matter, because we will just do the
computation once when looking at any pending changes of
pg_stat_xact_all_tables for each entry. There are 9 callers of
find_tabstat_entry, with 7 being used for pg_stat_xact_all_tables.
How much do we need to care about the remaining two callers
pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?
Could it be a problem if these two also pay the extra computation cost
if a transaction with many subtransactions (aka )needs to look at their
data? These two are used nowhere, they have pg_proc entries and they
are undocumented, so it is hard to say the impact of this change on
them..

Second question: once the data from the subtransactions is copied,
would it be cleaner to set trans to NULL after the data copy is done?

It would feel a bit safer to me to document that find_tabstat_entry()
is currently only used for this xact system view.. The extra
computation could lead to surprises, actually, if this routine is used
outside this context? Perhaps that's OK, but it does not give me a
warm feeling, just to reshape three functions of pgstatfuncs.c with
macros.
--
Michael

#16Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#15)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 3/16/23 7:29 AM, Michael Paquier wrote:

On Mon, Mar 06, 2023 at 08:33:15AM +0100, Drouvot, Bertrand wrote:

Thanks for having looked at it!

Looking at that, I have a few comments.

+    tabentry = (PgStat_TableStatus *) entry_ref->pending;
+    tablestatus = palloc(sizeof(PgStat_TableStatus));
+    *tablestatus = *tabentry;
+
[...]
+    for (trans = tabentry->trans; trans != NULL; trans = trans->upper)
+    {
+        tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted;
+        tablestatus->t_counts.t_tuples_updated += trans->tuples_updated;
+        tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted;
+    }
-    if (entry_ref)
-        return entry_ref->pending;
-    return NULL;
+    return tablestatus;

From what I get with this change, the number of tuples changed by DMLs
have their computations done a bit earlier,

Thanks for looking at it!

Right, but note this is in a dedicated new tablestatus (created within find_tabstat_entry()).

meaning that it would make
all the callers of find_tabstat_entry() pay the computation cost.

Right. Another suggested approach was to add a flag but then we'd not really
hide the internal from pgstatfuncs.

Still it is not really going to matter, because we will just do the
computation once when looking at any pending changes of
pg_stat_xact_all_tables for each entry.

Yes.

There are 9 callers of
find_tabstat_entry, with 7 being used for pg_stat_xact_all_tables.

Right, those are:

pg_stat_get_xact_numscans()

pg_stat_get_xact_tuples_returned()

pg_stat_get_xact_tuples_fetched()

pg_stat_get_xact_tuples_inserted()

pg_stat_get_xact_tuples_updated()

pg_stat_get_xact_tuples_deleted()

pg_stat_get_xact_tuples_hot_updated()

How much do we need to care about the remaining two callers
pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?

Regarding pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()
the callers (if any) are outside of the core PG (as from what I can see they are not used
at all).

I don't think we should pay any particular attention to those 2 ones as anyway nothing
prevent the 7 others to be called outside of the pg_stat_xact_all_tables view.

Could it be a problem if these two also pay the extra computation cost
if a transaction with many subtransactions (aka )needs to look at their
data? These two are used nowhere, they have pg_proc entries and they
are undocumented, so it is hard to say the impact of this change on
them..

Right, and that's the same for the 7 others as nothing prevent them to be called outside
of the pg_stat_xact_all_tables view.

Do you think it would be better to add the extra flag then?

Second question: once the data from the subtransactions is copied,
would it be cleaner to set trans to NULL after the data copy is done?

That would not hurt but I'm not sure it's worth it (well, it's currently
not done in pg_stat_get_xact_tuples_inserted() for example).

It would feel a bit safer to me to document that find_tabstat_entry()
is currently only used for this xact system view.. The extra
computation could lead to surprises, actually, if this routine is used
outside this context? Perhaps that's OK, but it does not give me a
warm feeling, just to reshape three functions of pgstatfuncs.c with
macros.

That's a fair point. On the other hand those 9 functions (which can all be used outside
of the pg_stat_xact_all_tables view) are not documented, so I'm not sure this is that much of
a concern (and if we think it is we still gave the option to add an extra flag to indicate whether
or not the extra computation is needed.)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#17Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#16)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

On Thu, Mar 16, 2023 at 11:32:56AM +0100, Drouvot, Bertrand wrote:

On 3/16/23 7:29 AM, Michael Paquier wrote:

From what I get with this change, the number of tuples changed by DMLs
have their computations done a bit earlier,

Thanks for looking at it!

Right, but note this is in a dedicated new tablestatus (created
within find_tabstat_entry()).

Sure, however it copies the pointer of the PgStat_TableXactStatus from
tabentry, isn't it? This means that it keeps a reference of the chain
of subtransactions. It does not matter for the functions but it could
for out-of-core callers of find_tabstat_entry(), no? Perhaps you are
right and that's not worth worrying, still I don't feel particularly
confident that this is the best approach we can take.

How much do we need to care about the remaining two callers
pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?

Regarding pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()
the callers (if any) are outside of the core PG (as from what I can
see they are not used at all).

I don't think we should pay any particular attention to those 2 ones
as anyway nothing prevent the 7 others to be called outside of the
pg_stat_xact_all_tables view.

I am not quite sure, TBH. Did you look at the difference with a long
chain of subtrans, like savepoints? The ODBC driver "loves" producing
a lot of savepoints, for example.

It would feel a bit safer to me to document that find_tabstat_entry()
is currently only used for this xact system view.. The extra
computation could lead to surprises, actually, if this routine is used
outside this context? Perhaps that's OK, but it does not give me a
warm feeling, just to reshape three functions of pgstatfuncs.c with
macros.

That's a fair point. On the other hand those 9 functions (which can
all be used outside of the pg_stat_xact_all_tables view) are not
documented, so I'm not sure this is that much of a concern (and if
we think it is we still gave the option to add an extra flag to
indicate whether or not the extra computation is needed.)

That's not quite exact, I think. The first 7 functions are used in a
system catalog that is documented. Still we have a problem here. I
can actually see a few projects relying on these two functions while
looking a bit around, so they are used. And the issue comes from
ddfc2d9, that has removed these functions from the documentation
ignoring that they are used in no system catalogs. I think that we
should fix that and re-add the two missing functions with a proper
description in the docs, at least? There is no trace of them.
Perhaps the ones exposted through pg_stat_xact_all_tables are fine if
not listed.
--
Michael

#18Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#17)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

On 3/16/23 12:46 PM, Michael Paquier wrote:

On Thu, Mar 16, 2023 at 11:32:56AM +0100, Drouvot, Bertrand wrote:

On 3/16/23 7:29 AM, Michael Paquier wrote:

From what I get with this change, the number of tuples changed by DMLs
have their computations done a bit earlier,

Thanks for looking at it!

Right, but note this is in a dedicated new tablestatus (created
within find_tabstat_entry()).

Sure, however it copies the pointer of the PgStat_TableXactStatus from
tabentry, isn't it?

Oh I see what you mean, yeah, the pointer is copied.

This means that it keeps a reference of the chain
of subtransactions. It does not matter for the functions but it could
for out-of-core callers of find_tabstat_entry(), no?

Yeah, maybe.

Perhaps you are
right and that's not worth worrying, still I don't feel particularly
confident that this is the best approach we can take.

due to what potential out-of-core callers could do with it?

How much do we need to care about the remaining two callers
pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()?

Regarding pg_stat_get_xact_blocks_fetched() and pg_stat_get_xact_blocks_hit()
the callers (if any) are outside of the core PG (as from what I can
see they are not used at all).

I don't think we should pay any particular attention to those 2 ones
as anyway nothing prevent the 7 others to be called outside of the
pg_stat_xact_all_tables view.

I am not quite sure, TBH. Did you look at the difference with a long
chain of subtrans, like savepoints? The ODBC driver "loves" producing
a lot of savepoints, for example.

No, I did not measure the impact.

It would feel a bit safer to me to document that find_tabstat_entry()
is currently only used for this xact system view.. The extra
computation could lead to surprises, actually, if this routine is used
outside this context? Perhaps that's OK, but it does not give me a
warm feeling, just to reshape three functions of pgstatfuncs.c with
macros.

That's a fair point. On the other hand those 9 functions (which can
all be used outside of the pg_stat_xact_all_tables view) are not
documented, so I'm not sure this is that much of a concern (and if
we think it is we still gave the option to add an extra flag to
indicate whether or not the extra computation is needed.)

That's not quite exact, I think. The first 7 functions are used in a
system catalog that is documented.

Right.

Still we have a problem here. I
can actually see a few projects relying on these two functions while
looking a bit around, so they are used. And the issue comes from
ddfc2d9, that has removed these functions from the documentation
ignoring that they are used in no system catalogs. I think that we
should fix that and re-add the two missing functions with a proper
description in the docs, at least?

As they could be/are used outside of the xact view, yes I think the same.

There is no trace of them.
Perhaps the ones exposted through pg_stat_xact_all_tables are fine if
not listed.

I'd be tempted to add documentation for all of them, I can look at it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#19Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#18)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

On Thu, Mar 16, 2023 at 02:13:45PM +0100, Drouvot, Bertrand wrote:

On 3/16/23 12:46 PM, Michael Paquier wrote:

There is no trace of them.
Perhaps the ones exposted through pg_stat_xact_all_tables are fine if
not listed.

I'd be tempted to add documentation for all of them, I can look at it.

I am not sure that there is any need to completely undo ddfc2d9, later
simplified by 5f2b089, so my opinion would be to just add
documentation for the functions that can be used but are used in none
of the system functions.

Anyway, double-checking, I only see an inconsistency for these two,
confirming my first impression:
- pg_stat_get_xact_blocks_fetched
- pg_stat_get_xact_blocks_hit

There may be a point in having them in some of the system views, but
the non-xact flavors are only used in the statio views, which don't
really need xact versions AFAIK. I am not sure that it makes much
sense to add them in pg_stat_xact_all_tables, either. Another view is
just remove them, though some could rely on them externally. At the
end, documenting both still sounds like the best move to me.
--
Michael

#20Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#19)
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

Hi,

On 3/20/23 12:43 AM, Michael Paquier wrote:

At the
end, documenting both still sounds like the best move to me.

Agree.

Please find attached v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patch doing so.

I did not put the exact same wording as the one being removed in ddfc2d9, as:

- For pg_stat_get_xact_blocks_hit(), I think it's better to be closer to say the
pg_statio_all_tables.heap_blks_hit definition.

- For pg_stat_get_xact_blocks_fetched(), I think that using "buffer" is better (than block) as at the
end it's related to pgstat_count_buffer_read().

At the end there is a choice to be made for both for the wording between block and buffer. Indeed their
counters get incremented in "buffer" macros while retrieved in those "blocks" functions.

"Buffer" sounds more appropriate to me, so the attached has been done that way.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patchtext/plain; charset=UTF-8; name=v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patchDownload+26-0
#21Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#20)
#22Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#22)
#24Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#23)
#25Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#24)
#26Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#25)
#27Melanie Plageman
melanieplageman@gmail.com
In reply to: Bertrand Drouvot (#20)
#28Michael Paquier
michael@paquier.xyz
In reply to: Melanie Plageman (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#18)
#30Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#29)
#31Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#30)
#32Melanie Plageman
melanieplageman@gmail.com
In reply to: Michael Paquier (#28)
#33Michael Paquier
michael@paquier.xyz
In reply to: Melanie Plageman (#32)
#34Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#34)
#36Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#36)
#38Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#37)
#39Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#38)
#40Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#39)
#41Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#40)
#42Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#41)
#43Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#31)
#44Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#43)
#45Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#44)
#46Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#45)
#47Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#45)
#48Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#47)