check for null value before looking up the hash function

Started by Zhihong Yu over 3 years ago, 11 messages
#1 Zhihong Yu
zyu@yugabyte.com
1 attachment(s)

Hi,
I was looking at the code in hash_record()
of src/backend/utils/adt/rowtypes.c

It seems if nulls[i] is true, we don't need to look up the hash function.

Please take a look at the patch.

Thanks

Attachments:

hash-record-check-null-first.patch (application/octet-stream)
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index db843a0fbf..653dde3ee5 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -1843,23 +1843,6 @@ hash_record(PG_FUNCTION_ARGS)
 		if (att->attisdropped)
 			continue;
 
-		/*
-		 * Lookup the hash function if not done already
-		 */
-		typentry = my_extra->columns[i].typentry;
-		if (typentry == NULL ||
-			typentry->type_id != att->atttypid)
-		{
-			typentry = lookup_type_cache(att->atttypid,
-										 TYPECACHE_HASH_PROC_FINFO);
-			if (!OidIsValid(typentry->hash_proc_finfo.fn_oid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_FUNCTION),
-						 errmsg("could not identify a hash function for type %s",
-								format_type_be(typentry->type_id))));
-			my_extra->columns[i].typentry = typentry;
-		}
-
 		/* Compute hash of element */
 		if (nulls[i])
 		{
@@ -1867,6 +1850,23 @@ hash_record(PG_FUNCTION_ARGS)
 		}
 		else
 		{
+			/*
+			 * Lookup the hash function if not done already
+			 */
+			typentry = my_extra->columns[i].typentry;
+			if (typentry == NULL ||
+				typentry->type_id != att->atttypid)
+			{
+				typentry = lookup_type_cache(att->atttypid,
+											 TYPECACHE_HASH_PROC_FINFO);
+				if (!OidIsValid(typentry->hash_proc_finfo.fn_oid))
+					ereport(ERROR,
+							(errcode(ERRCODE_UNDEFINED_FUNCTION),
+							 errmsg("could not identify a hash function for type %s",
+									format_type_be(typentry->type_id))));
+				my_extra->columns[i].typentry = typentry;
+			}
+
 			LOCAL_FCINFO(locfcinfo, 1);
 
 			InitFunctionCallInfoData(*locfcinfo, &typentry->hash_proc_finfo, 1,
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zhihong Yu (#1)
Re: check for null value before looking up the hash function

Zhihong Yu <zyu@yugabyte.com> writes:

I was looking at the code in hash_record()
of src/backend/utils/adt/rowtypes.c
It seems if nulls[i] is true, we don't need to look up the hash function.

I don't think this is worth changing. It complicates the logic,
rendering it unlike quite a few other functions written in the same
style. In cases where the performance actually matters, the hash
function is cached across multiple calls anyway. You might save
something if you have many calls in a query and not one of them
receives a non-null input, but how likely is that?
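
(For context, the cache in question is the per-column TypeCacheEntry pointer that hash_record() keeps in fn_extra; a rough sketch of the structures as they appear in rowtypes.c:)

/* One slot per column, kept in fcinfo->flinfo->fn_extra, so
 * lookup_type_cache() normally runs once per column for the whole
 * query, not once per row. */
typedef struct ColumnCompareData
{
	TypeCacheEntry *typentry;	/* type cache entry, incl. hash proc */
} ColumnCompareData;

typedef struct RecordCompareData
{
	int			ncolumns;		/* allocated length of columns[] */
	Oid			record1_type;	/* remembered record type/typmod */
	int32		record1_typmod;
	Oid			record2_type;
	int32		record2_typmod;
	ColumnCompareData columns[FLEXIBLE_ARRAY_MEMBER];
} RecordCompareData;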

regards, tom lane

#3 Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#2)
1 attachment(s)
Re: check for null value before looking up the hash function

Zhihong Yu <zyu(at)yugabyte(dot)com> writes:

I was looking at the code in hash_record()
of src/backend/utils/adt/rowtypes.c
It seems if nulls[i] is true, we don't need to look up the hash function.

I don't think this is worth changing. It complicates the logic,
rendering it unlike quite a few other functions written in the same
style. In cases where the performance actually matters, the hash
function is cached across multiple calls anyway. You might save
something if you have many calls in a query and not one of them
receives a non-null input, but how likely is that?

I disagree.
I think it is worth changing; the fact that it complicates the logic
is irrelevant.
But maybe the attached v2 would be a little better.
My doubt is about the result calculation when nulls are true.
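
(For reference, a NULL element contributes element_hash = 0, so the mixing step degenerates to a plain multiply; a minimal illustration:)

/* With element_hash == 0 the combine step reduces to result * 31,
 * matching how hash_array() treats NULL elements. */
result = (result << 5) - result + 0;	/* == result * 31 */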

regards,
Ranier Vilela

Attachments:

v2-hash-record-check-null-first.patch (application/octet-stream)
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index db843a0fbf..a0b0b248d2 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -1835,38 +1835,35 @@ hash_record(PG_FUNCTION_ARGS)
 	for (int i = 0; i < ncolumns; i++)
 	{
 		Form_pg_attribute att;
-		TypeCacheEntry *typentry;
-		uint32		element_hash;
 
 		att = TupleDescAttr(tupdesc, i);
 
 		if (att->attisdropped)
 			continue;
 
-		/*
-		 * Lookup the hash function if not done already
-		 */
-		typentry = my_extra->columns[i].typentry;
-		if (typentry == NULL ||
-			typentry->type_id != att->atttypid)
+		/* Compute hash of element */
+		if (!nulls[i])
 		{
-			typentry = lookup_type_cache(att->atttypid,
+			TypeCacheEntry *typentry;
+			uint32		element_hash;
+
+			/*
+			* Lookup the hash function if not done already
+			*/
+			typentry = my_extra->columns[i].typentry;
+			if (typentry == NULL ||
+				typentry->type_id != att->atttypid)
+			{
+				typentry = lookup_type_cache(att->atttypid,
 										 TYPECACHE_HASH_PROC_FINFO);
-			if (!OidIsValid(typentry->hash_proc_finfo.fn_oid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_FUNCTION),
-						 errmsg("could not identify a hash function for type %s",
-								format_type_be(typentry->type_id))));
-			my_extra->columns[i].typentry = typentry;
-		}
+				if (!OidIsValid(typentry->hash_proc_finfo.fn_oid))
+					ereport(ERROR,
+							(errcode(ERRCODE_UNDEFINED_FUNCTION),
+							errmsg("could not identify a hash function for type %s",
+									format_type_be(typentry->type_id))));
+				my_extra->columns[i].typentry = typentry;
+			}
 
-		/* Compute hash of element */
-		if (nulls[i])
-		{
-			element_hash = 0;
-		}
-		else
-		{
 			LOCAL_FCINFO(locfcinfo, 1);
 
 			InitFunctionCallInfoData(*locfcinfo, &typentry->hash_proc_finfo, 1,
@@ -1877,10 +1874,12 @@ hash_record(PG_FUNCTION_ARGS)
 
 			/* We don't expect hash support functions to return null */
 			Assert(!locfcinfo->isnull);
-		}
 
-		/* see hash_array() */
-		result = (result << 5) - result + element_hash;
+			/* see hash_array() */
+			result = (result << 5) - result + element_hash;
+		}
+		else
+			result = (result << 5) - result + 0;
 	}
 
 	pfree(values);
#4 Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#3)
1 attachment(s)
Re: check for null value before looking up the hash function

On Sat, May 21, 2022 at 10:06 AM, Ranier Vilela <ranier.vf@gmail.com> wrote:

Zhihong Yu <zyu(at)yugabyte(dot)com> writes:

I was looking at the code in hash_record()
of src/backend/utils/adt/rowtypes.c
It seems if nulls[i] is true, we don't need to look up the hash function.

I don't think this is worth changing. It complicates the logic,
rendering it unlike quite a few other functions written in the same
style. In cases where the performance actually matters, the hash
function is cached across multiple calls anyway. You might save
something if you have many calls in a query and not one of them
receives a non-null input, but how likely is that?

I disagree.
I think it is worth changing; the fact that it complicates the logic
is irrelevant.
But maybe the attached v2 would be a little better.

Or v3.

regards,
Ranier Vilela

Attachments:

v3-hash-record-check-null-first.patch (application/octet-stream)
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index db843a0fbf..d4828dac62 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -1834,39 +1834,35 @@ hash_record(PG_FUNCTION_ARGS)
 
 	for (int i = 0; i < ncolumns; i++)
 	{
-		Form_pg_attribute att;
-		TypeCacheEntry *typentry;
-		uint32		element_hash;
+		/* Compute hash of element */
+		if (!nulls[i])
+		{
+			Form_pg_attribute att;
+			TypeCacheEntry *typentry;
+			uint32		element_hash;
 
-		att = TupleDescAttr(tupdesc, i);
+			att = TupleDescAttr(tupdesc, i);
 
-		if (att->attisdropped)
-			continue;
+			if (att->attisdropped)
+				continue;
 
-		/*
-		 * Lookup the hash function if not done already
-		 */
-		typentry = my_extra->columns[i].typentry;
-		if (typentry == NULL ||
-			typentry->type_id != att->atttypid)
-		{
-			typentry = lookup_type_cache(att->atttypid,
+			/*
+			* Lookup the hash function if not done already
+			*/
+			typentry = my_extra->columns[i].typentry;
+			if (typentry == NULL ||
+				typentry->type_id != att->atttypid)
+			{
+				typentry = lookup_type_cache(att->atttypid,
 										 TYPECACHE_HASH_PROC_FINFO);
-			if (!OidIsValid(typentry->hash_proc_finfo.fn_oid))
-				ereport(ERROR,
-						(errcode(ERRCODE_UNDEFINED_FUNCTION),
-						 errmsg("could not identify a hash function for type %s",
-								format_type_be(typentry->type_id))));
-			my_extra->columns[i].typentry = typentry;
-		}
+				if (!OidIsValid(typentry->hash_proc_finfo.fn_oid))
+					ereport(ERROR,
+							(errcode(ERRCODE_UNDEFINED_FUNCTION),
+							errmsg("could not identify a hash function for type %s",
+									format_type_be(typentry->type_id))));
+				my_extra->columns[i].typentry = typentry;
+			}
 
-		/* Compute hash of element */
-		if (nulls[i])
-		{
-			element_hash = 0;
-		}
-		else
-		{
 			LOCAL_FCINFO(locfcinfo, 1);
 
 			InitFunctionCallInfoData(*locfcinfo, &typentry->hash_proc_finfo, 1,
@@ -1877,10 +1873,12 @@ hash_record(PG_FUNCTION_ARGS)
 
 			/* We don't expect hash support functions to return null */
 			Assert(!locfcinfo->isnull);
-		}
 
-		/* see hash_array() */
-		result = (result << 5) - result + element_hash;
+			/* see hash_array() */
+			result = (result << 5) - result + element_hash;
+		}
+		else
+			result = (result << 5) - result + 0;
 	}
 
 	pfree(values);
#5 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Ranier Vilela (#3)
Re: check for null value before looking up the hash function

On 5/21/22 15:06, Ranier Vilela wrote:

Zhihong Yu <zyu(at)yugabyte(dot)com> writes:

I was looking at the code in hash_record()
of src/backend/utils/adt/rowtypes.c
It seems if nulls[i] is true, we don't need to look up the hash function.

I don't think this is worth changing. It complicates the logic,
rendering it unlike quite a few other functions written in the same
style. In cases where the performance actually matters, the hash
function is cached across multiple calls anyway. You might save
something if you have many calls in a query and not one of them
receives a non-null input, but how likely is that?

I disagree.
I think it is worth changing; the fact that it complicates the logic
is irrelevant.

That's quite a bold claim, and yet you haven't supported it with any
argument whatsoever. Trade-offs between complexity and efficiency are a
crucial part of development, so complicating the logic clearly does matter.

It might be outweighed by efficiency benefits, but as Tom suggested,
the cases that might benefit from this are extremely unlikely (data with
just NULL values). And even for those cases, no benchmark quantifying the
difference was presented.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6 Ranier Vilela
ranier.vf@gmail.com
In reply to: Tomas Vondra (#5)
Re: check for null value before looking up the hash function

On Sat, May 21, 2022 at 12:05 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 5/21/22 15:06, Ranier Vilela wrote:

Zhihong Yu <zyu(at)yugabyte(dot)com> writes:

I was looking at the code in hash_record()
of src/backend/utils/adt/rowtypes.c
It seems if nulls[i] is true, we don't need to look up the hash function.

I don't think this is worth changing. It complicates the logic,
rendering it unlike quite a few other functions written in the same
style. In cases where the performance actually matters, the hash
function is cached across multiple calls anyway. You might save
something if you have many calls in a query and not one of them
receives a non-null input, but how likely is that?

I disagree.
I think it is worth changing; the fact that it complicates the logic
is irrelevant.

That's quite a bold claim, and yet you haven't supported it with any
argument whatsoever. Trade-offs between complexity and efficiency are a
crucial part of development, so complicating the logic clearly does matter.

What I meant is that complicating the logic in search of efficiency is
worth it, and that's what everyone is looking for in this thread.
Likewise, not complicating the logic and losing a little bit of efficiency,
applied across all the code, leads to a big loss of efficiency.
In other words, I never miss an opportunity to gain efficiency.

regards,
Ranier Vilela

#7 David G. Johnston
david.g.johnston@gmail.com
In reply to: Ranier Vilela (#6)
Re: check for null value before looking up the hash function

On Sat, May 21, 2022 at 8:32 AM Ranier Vilela <ranier.vf@gmail.com> wrote:

On Sat, May 21, 2022 at 12:05 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 5/21/22 15:06, Ranier Vilela wrote:

Zhihong Yu <zyu(at)yugabyte(dot)com> writes:

I was looking at the code in hash_record()
of src/backend/utils/adt/rowtypes.c
It seems if nulls[i] is true, we don't need to look up the hash function.

I don't think this is worth changing. It complicates the logic,
rendering it unlike quite a few other functions written in the same
style. In cases where the performance actually matters, the hash
function is cached across multiple calls anyway. You might save
something if you have many calls in a query and not one of them
receives a non-null input, but how likely is that?

I disagree.
I think it is worth changing; the fact that it complicates the logic
is irrelevant.

That's quite a bold claim, and yet you haven't supported it with any
argument whatsoever. Trade-offs between complexity and efficiency are a
crucial part of development, so complicating the logic clearly does matter.

What I meant is that complicating the logic in search of efficiency is
worth it, and that's what everyone is looking for in this thread.
Likewise, not complicating the logic and losing a little bit of efficiency,
applied across all the code, leads to a big loss of efficiency.
In other words, I never miss an opportunity to gain efficiency.

I disliked the fact that 80% of the patch was adding indentation. Instead,
remove indentation for the normal flow case and move the loop short-circuit
to the top of the loop where it is traditionally found.

This seems like a win on both efficiency and complexity grounds. Having
the "/* see hash_array() */" comment and logic twice is a downside but a
minor one that could be replaced with a function call if desired.
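
A minimal sketch of that idea (the helper name is hypothetical, not part of the patch):

/* Hypothetical helper: fold one element's hash into the running record
 * hash, using the same mixing step as hash_array(). */
static inline uint32
hash_record_combine(uint32 result, uint32 element_hash)
{
	return (result << 5) - result + element_hash;
}

Both call sites would then read result = hash_record_combine(result, element_hash), with the NULL short-circuit passing 0.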

diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index db843a0fbf..0bc28d1742 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -1838,6 +1838,13 @@ hash_record(PG_FUNCTION_ARGS)
 		TypeCacheEntry *typentry;
 		uint32		element_hash;
 
+		if (nulls[i])
+		{
+			/* see hash_array() */
+			result = (result << 5) - result + 0;
+			continue;
+		}
+
 		att = TupleDescAttr(tupdesc, i);
 
 		if (att->attisdropped)
@@ -1860,24 +1867,16 @@ hash_record(PG_FUNCTION_ARGS)
 			my_extra->columns[i].typentry = typentry;
 		}
 
-		/* Compute hash of element */
-		if (nulls[i])
-		{
-			element_hash = 0;
-		}
-		else
-		{
-			LOCAL_FCINFO(locfcinfo, 1);
+		LOCAL_FCINFO(locfcinfo, 1);
 
-			InitFunctionCallInfoData(*locfcinfo, &typentry->hash_proc_finfo, 1,
-									 att->attcollation, NULL, NULL);
-			locfcinfo->args[0].value = values[i];
-			locfcinfo->args[0].isnull = false;
-			element_hash = DatumGetUInt32(FunctionCallInvoke(locfcinfo));
+		InitFunctionCallInfoData(*locfcinfo, &typentry->hash_proc_finfo, 1,
+								 att->attcollation, NULL, NULL);
+		locfcinfo->args[0].value = values[i];
+		locfcinfo->args[0].isnull = false;
+		element_hash = DatumGetUInt32(FunctionCallInvoke(locfcinfo));
 
-			/* We don't expect hash support functions to return null */
-			Assert(!locfcinfo->isnull);
-		}
+		/* We don't expect hash support functions to return null */
+		Assert(!locfcinfo->isnull);
 
 		/* see hash_array() */
 		result = (result << 5) - result + element_hash;

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ranier Vilela (#6)
Re: check for null value before looking up the hash function

Ranier Vilela <ranier.vf@gmail.com> writes:

On Sat, May 21, 2022 at 12:05 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

That's quite a bold claim, and yet you haven't supported it with any
argument whatsoever. Trade-offs between complexity and efficiency are a
crucial part of development, so complicating the logic clearly does matter.

What I meant is that complicating the logic in search of efficiency is
worth it, and that's what everyone is looking for in this thread.
Likewise, not complicating the logic and losing a little bit of efficiency,
applied across all the code, leads to a big loss of efficiency.
In other words, I never miss an opportunity to gain efficiency.

[ shrug... ] You quietly ignored Tomas' main point, which is that no
evidence has been provided that there's actually any efficiency gain.

(1) Sure, in the case where only null values are encountered during a
query, we can save a cache lookup, but will that be even micro-measurable
compared to general query overhead? Seems unlikely, especially if this is
changed in only one place. That ties into my complaint about how this is
just one instance of a fairly widely used coding pattern.

(2) What are the effects when we *do* eventually encounter a non-null
value? The existing coding will perform all the necessary lookups
at first call, but with the proposed change those may be spread across
query execution. It's not implausible that that leads to a net loss
of efficiency, due to locality-of-access effects.

I'm also concerned that this increases the size of the "state space"
of this function, in that there are now more possible states of its
cached information. While that probably doesn't add any new bugs,
it does add complication and make things harder to reason about.
So the bottom line remains that I don't think it's worth changing.

regards, tom lane

#9 Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#8)
Re: check for null value before looking up the hash function

On Sat, May 21, 2022 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ranier Vilela <ranier.vf@gmail.com> writes:

On Sat, May 21, 2022 at 12:05 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

That's quite a bold claim, and yet you haven't supported it with any
argument whatsoever. Trade-offs between complexity and efficiency are a
crucial part of development, so complicating the logic clearly does matter.

What I meant is that complicating the logic in search of efficiency is
worth it, and that's what everyone is looking for in this thread.
Likewise, not complicating the logic and losing a little bit of efficiency,
applied across all the code, leads to a big loss of efficiency.
In other words, I never miss an opportunity to gain efficiency.

[ shrug... ] You quietly ignored Tomas' main point, which is that no
evidence has been provided that there's actually any efficiency gain.

IMHO, the point here is that for non-committers everything needs benchmarks.
But I have seen many commits without benchmarks or any evidence of gains.
And of course, having benchmarks is better, but for micro-optimizations
it doesn't seem to be needed that much.

(1) Sure, in the case where only null values are encountered during a
query, we can save a cache lookup, but will that be even micro-measurable
compared to general query overhead? Seems unlikely, especially if this is
changed in only one place. That ties into my complaint about how this is
just one instance of a fairly widely used coding pattern.

Of course, changing only one place, the gain is tiny; but step by step
the coding pattern changes too, and the new form becomes the "fairly
widely used" one.

(2) What are the effects when we *do* eventually encounter a non-null
value? The existing coding will perform all the necessary lookups
at first call, but with the proposed change those may be spread across
query execution. It's not implausible that that leads to a net loss
of efficiency, due to locality-of-access effects.

Well, the current code tests for nulls first. Most of the time that
branch is not taken, so the current code has poor branch prediction, at
least. What I proposed improves the branch prediction, at least.
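
(As an aside: if branch prediction is the real concern, a compiler hint may be a lighter-weight option than restructuring. A minimal sketch using PostgreSQL's unlikely() macro from c.h, assuming NULLs are the rare case:)

/* Sketch only: mark the NULL path as cold while keeping the existing
 * control flow; unlikely() expands to __builtin_expect() on compilers
 * that support it. */
if (unlikely(nulls[i]))
	element_hash = 0;
else
{
	/* ... existing hash-function lookup and invocation ... */
}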

I'm also concerned that this increases the size of the "state space"
of this function, in that there are now more possible states of its
cached information. While that probably doesn't add any new bugs,
it does add complication and make things harder to reason about.
So the bottom line remains that I don't think it's worth changing.

Of course, your arguments are all valid.
That would all be clarified with benchmarks; maybe the OP is interested in
doing them.

regards,
Ranier Vilela

#10 David G. Johnston
david.g.johnston@gmail.com
In reply to: Ranier Vilela (#9)
Re: check for null value before looking up the hash function

On Sat, May 21, 2022 at 10:04 AM Ranier Vilela <ranier.vf@gmail.com> wrote:

On Sat, May 21, 2022 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ranier Vilela <ranier.vf@gmail.com> writes:

On Sat, May 21, 2022 at 12:05 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

That's quite a bold claim, and yet you haven't supported it with any
argument whatsoever. Trade-offs between complexity and efficiency are a
crucial part of development, so complicating the logic clearly does matter.

What I meant is that complicating the logic in search of efficiency is
worth it, and that's what everyone is looking for in this thread.
Likewise, not complicating the logic and losing a little bit of efficiency,
applied across all the code, leads to a big loss of efficiency.
In other words, I never miss an opportunity to gain efficiency.

[ shrug... ] You quietly ignored Tomas' main point, which is that no
evidence has been provided that there's actually any efficiency gain.

IMHO, the point here is that for non-committers everything needs benchmarks.
But I have seen many commits without benchmarks or any evidence of gains.
And of course, having benchmarks is better, but for micro-optimizations
it doesn't seem to be needed that much.

Mostly because committers don't tend to do this kind of drive-by patching
that changes long-established code, which is fairly categorized as
premature optimization.

(1) Sure, in the case where only null values are encountered during a
query, we can save a cache lookup, but will that be even micro-measurable
compared to general query overhead? Seems unlikely, especially if this is
changed in only one place. That ties into my complaint about how this is
just one instance of a fairly widely used coding pattern.

Of course, changing only one place, the gain is tiny; but step by step
the coding pattern changes too, and the new form becomes the "fairly
widely used" one.

Agreed, but that isn't what was done here: there was no investigation of
the overall coding practice and no suggestion to change all instances to
the improved form.

(2) What are the effects when we *do* eventually encounter a non-null
value? The existing coding will perform all the necessary lookups
at first call, but with the proposed change those may be spread across
query execution. It's not implausible that that leads to a net loss
of efficiency, due to locality-of-access effects.

Well, the current code tests for nulls first. Most of the time that
branch is not taken, so the current code has poor branch prediction, at
least. What I proposed improves the branch prediction, at least.

Per my other reply, the v3 proposal did not, IMHO, do a good job of
branch prediction either.

I find an improvement on code complexity grounds to be warranted, though
the benefit seems unlikely to outweigh the cost of doing it everywhere
(fixing only one place actually increases the cost component).

Even without the plausible locality-of-access argument, the benefit here is
likely to be a micro-optimization that provides only minimal benefit.
Evidence to the contrary is welcome but, yes, the burden is going to be
placed squarely on the patch author(s) to demonstrate that the benefit
accrued from the code churn is worth the cost.

David J.

#11 Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: David G. Johnston (#10)
Re: check for null value before looking up the hash function

On 5/21/22 19:24, David G. Johnston wrote:

On Sat, May 21, 2022 at 10:04 AM Ranier Vilela <ranier.vf@gmail.com> wrote:

On Sat, May 21, 2022 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ranier Vilela <ranier.vf@gmail.com> writes:

On Sat, May 21, 2022 at 12:05 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

That's quite a bold claim, and yet you haven't supported it with any
argument whatsoever. Trade-offs between complexity and efficiency are a
crucial part of development, so complicating the logic clearly does matter.

What I meant is that complicating the logic in search of efficiency is
worth it, and that's what everyone is looking for in this thread.
Likewise, not complicating the logic and losing a little bit of efficiency,
applied across all the code, leads to a big loss of efficiency.
In other words, I never miss an opportunity to gain efficiency.

[ shrug... ]  You quietly ignored Tomas' main point, which is that no
evidence has been provided that there's actually any efficiency gain.

IMHO, the point here is that for non-committers everything needs benchmarks.
But I have seen many commits without benchmarks or any evidence of gains.
And of course, having benchmarks is better, but for micro-optimizations
it doesn't seem to be needed that much.

Mostly because committers don't tend to do this kind of drive-by patching
that changes long-established code, which is fairly categorized as
premature optimization.

FWIW I find the argument that committers are somehow absolved of having
to demonstrate the benefits of a change rather misleading. Perhaps even
offensive, as it hints committers are less demanding/careful when it
comes to their own patches. Which goes directly against my experience
and understanding of what being a committer is.

I'm not going to claim every "optimization" patch submitted by a
committer had a benchmark - some patches certainly are pushed without
it, with just some general reasoning why the change is beneficial.

But I'm sure that when someone suggests the reasoning is wrong, it's
taken seriously - the discussion continues, there's a benchmark, etc. And
I can't recall a committer suggesting it's fine because some other patch
didn't have a benchmark either.

(1) Sure, in the case where only null values are encountered during a
query, we can save a cache lookup, but will that be even micro-measurable
compared to general query overhead?  Seems unlikely, especially if this is
changed in only one place.  That ties into my complaint about how this is
just one instance of a fairly widely used coding pattern.

Of course, changing only one place, the gain is tiny; but step by step
the coding pattern changes too, and the new form becomes the "fairly
widely used" one.

Agreed, but that isn't what was done here: there was no investigation of
the overall coding practice and no suggestion to change all instances to
the improved form.

Right. If we think the coding pattern is an improvement, we should tweak
all the places, not just one (and hope the other places will magically
switch on their own).

More importantly, though, I kinda doubt tweaking more places will
actually make the difference more significant (assuming it actually does
improve things). How likely is it you need to hash the same data type
multiple times, with just NULL values? And even if you do, improvements
like this tend to sum, not multiply - i.e. if the improvement is 1% for
one place, it's still 1% even if the query hits 10 such places.

(2) What are the effects when we *do* eventually encounter a non-null
value?  The existing coding will perform all the necessary lookups
at first call, but with the proposed change those may be spread across
query execution.  It's not implausible that that leads to a net loss
of efficiency, due to locality-of-access effects.

Well, the current code tests for nulls first. Most of the time that
branch is not taken, so the current code has poor branch prediction, at
least. What I proposed improves the branch prediction, at least.

Per my other reply, the v3 proposal did not, IMHO, do a good job of
branch prediction either.

I find an improvement on code complexity grounds to be warranted, though
the benefit seems unlikely to outweigh the cost of doing it everywhere
(fixing only one place actually increases the cost component).

Even without the plausible locality-of-access argument, the benefit here
is likely to be a micro-optimization that provides only minimal benefit.
Evidence to the contrary is welcome but, yes, the burden is going to be
placed squarely on the patch author(s) to demonstrate that the benefit
accrued from the code churn is worth the cost.

IMHO questions like this are exactly why some actual benchmark results
would be the best thing to move this forward. I certainly can't look at
C code and say how good it is for branch prediction.

regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company