clarify equalTupleDescs()

Started by Peter Eisentraut over 2 years ago · 10 messages · hackers

peter_e@gmx.net

over 2 years ago

In a recent patch thread it was discussed[0] which fields should be
compared by equalTupleDescs() and whether it is ok to remove a field
from tuple descriptors and how that should affect their equality
(attstattarget in that case).

After analyzing all the callers, I have noticed that there are two
classes of callers of equalTupleDescs():

The first want to compare what I call row-type equality, which means
they want to check specifically for equal number of attributes, and the
same attribute names, types, and typmods for each attribute. Most
callers actually want that behavior. The remaining callers just want to
compare the tuple descriptors as they are, they don't care why the
fields are in there, they just want to compare all of them.

In the attached patch, I add a new function equalRowTypes() that is
effectively a subset of equalTupleDescs() and switch most callers to that.
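A minimal sketch of the two kinds of comparison, showing only the subset relationship. It uses hypothetical, heavily simplified structures (SketchAttr, SketchDesc) in place of PostgreSQL's real pg_attribute and tuple descriptor types. attstattarget stands in for a field that is not part of the row type, and fields such as tdtypeid are omitted. This is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAXATTS 8

/*
 * Hypothetical, heavily simplified stand-ins for pg_attribute rows and
 * tuple descriptors; these are not PostgreSQL's real definitions.
 */
typedef struct
{
    char        attname[64];
    unsigned int atttypid;
    int         atttypmod;
    int         attstattarget;  /* table metadata, not part of the row type */
} SketchAttr;

typedef struct
{
    int         natts;
    SketchAttr  attrs[SKETCH_MAXATTS];
} SketchDesc;

/* Row-type equality: same attribute count and, for each attribute, the
 * same name, type OID and typmod. */
static bool
sketch_equal_row_types(const SketchDesc *a, const SketchDesc *b)
{
    if (a->natts != b->natts)
        return false;
    assert(a->natts <= SKETCH_MAXATTS);

    for (int i = 0; i < a->natts; i++)
    {
        const SketchAttr *x = &a->attrs[i];
        const SketchAttr *y = &b->attrs[i];

        if (strcmp(x->attname, y->attname) != 0 ||
            x->atttypid != y->atttypid ||
            x->atttypmod != y->atttypmod)
            return false;
    }
    return true;
}

/* Full comparison: the row-type check plus every remaining field,
 * regardless of why the field is stored there. */
static bool
sketch_equal_tuple_descs(const SketchDesc *a, const SketchDesc *b)
{
    if (!sketch_equal_row_types(a, b))
        return false;
    for (int i = 0; i < a->natts; i++)
        if (a->attrs[i].attstattarget != b->attrs[i].attstattarget)
            return false;
    return true;
}
```

With this split, the attstattarget question has a definite answer for each function. Changing or removing that field can affect only the full comparison, never row-type equality.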

The purpose of this patch is to make the semantics less uncertain.
Questions like the one in [0] about attstattarget now have a clear
answer for both functions. I think this would be useful to have, as we
are thinking about more changes in pg_attribute and tuple descriptors.

[0]: /messages/by-id/202401101316.k4s3fomwjx52@alvherre.pgsql

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Peter Eisentraut (#1)

Re: clarify equalTupleDescs()

Peter Eisentraut <peter@eisentraut.org> writes:

> The first want to compare what I call row-type equality, which means
> they want to check specifically for equal number of attributes, and the
> same attribute names, types, and typmods for each attribute. Most
> callers actually want that behavior.

Should compare attcollation too, no?

+1 for the general idea, but it seems like "row type equality"
might still be a slightly fuzzy concept.

regards, tom lane

jian he

jian.universality@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#1)

Re: clarify equalTupleDescs()

On Tue, Feb 6, 2024 at 8:59 PM Peter Eisentraut <peter@eisentraut.org> wrote:

> In a recent patch thread it was discussed[0] which fields should be
> compared by equalTupleDescs() and whether it is ok to remove a field
> from tuple descriptors and how that should affect their equality
> (attstattarget in that case).
>
> After analyzing all the callers, I have noticed that there are two
> classes of callers of equalTupleDescs():
>
> The first want to compare what I call row-type equality, which means
> they want to check specifically for equal number of attributes, and the
> same attribute names, types, and typmods for each attribute. Most
> callers actually want that behavior. The remaining callers just want to
> compare the tuple descriptors as they are, they don't care why the
> fields are in there, they just want to compare all of them.
>
> In the attached patch, I add a new function equalRowTypes() that is
> effectively a subset of equalTupleDescs() and switch most callers to that.
>
> The purpose of this patch is to make the semantics less uncertain.
> Questions like the one in [0] about attstattarget now have a clear
> answer for both functions. I think this would be useful to have, as we
> are thinking about more changes in pg_attribute and tuple descriptors.
>
> [0]: /messages/by-id/202401101316.k4s3fomwjx52@alvherre.pgsql

Does the function name record_type_typmod_hash imply that
hashRowType should also hash the atttypmod field?

also:

bool
equalRowTypes(TupleDesc tupdesc1, TupleDesc tupdesc2)
{
    if (tupdesc1->natts != tupdesc2->natts)
        return false;
    if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
        return false;

    for (int i = 0; i < tupdesc1->natts; i++)
    {
        Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
        Form_pg_attribute attr2 = TupleDescAttr(tupdesc2, i);

        if (strcmp(NameStr(attr1->attname), NameStr(attr2->attname)) != 0)
            return false;
        if (attr1->atttypid != attr2->atttypid)
            return false;
        if (attr1->atttypmod != attr2->atttypmod)
            return false;
    }

    return true;
}

/*
 * hashRowType
 *
 * If two tuple descriptors would be considered equal by equalRowTypes()
 * then their hash value will be equal according to this function.
 */
uint32
hashRowType(TupleDesc desc)
{
    uint32      s;
    int         i;

    s = hash_combine(0, hash_uint32(desc->natts));
    s = hash_combine(s, hash_uint32(desc->tdtypeid));
    for (i = 0; i < desc->natts; ++i)
        s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));

    return s;
}

Judging from the hashRowType comment, should we also hash attname and
atttypmod?

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: jian he (#3)

Re: clarify equalTupleDescs()

On 07.02.24 04:06, jian he wrote:

> /*
>  * hashRowType
>  *
>  * If two tuple descriptors would be considered equal by equalRowTypes()
>  * then their hash value will be equal according to this function.
>  */
> uint32
> hashRowType(TupleDesc desc)
> {
>     uint32      s;
>     int         i;
>
>     s = hash_combine(0, hash_uint32(desc->natts));
>     s = hash_combine(s, hash_uint32(desc->tdtypeid));
>     for (i = 0; i < desc->natts; ++i)
>         s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));
>
>     return s;
> }
>
> Judging from the hashRowType comment, should we also hash attname and
> atttypmod?

In principle, hashRowType() could process all the fields that
equalRowTypes() does. But since it's only a hash function, it doesn't
have to be perfect. (This is also the case for the current
hashTupleDesc().) I'm not sure where the best tradeoff is.
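The one hard requirement on such a hash function is the invariant stated in the hashRowType() comment: it may skip fields, but it must never read a field the equality check ignores. The following is a small sketch of that invariant under stated assumptions. The types (SketchAttr, SketchDesc) and functions are hypothetical simplifications. The mixing step is modeled on the boost-style combine that, to my knowledge, PostgreSQL's hash_combine() also uses, and the per-input hash_uint32() mixing is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXATTS 8

/* Hypothetical simplified types; not PostgreSQL's TupleDesc. */
typedef struct
{
    char        attname[64];
    uint32_t    atttypid;
    int32_t     atttypmod;
} SketchAttr;

typedef struct
{
    int         natts;
    uint32_t    tdtypeid;
    SketchAttr  attrs[SKETCH_MAXATTS];
} SketchDesc;

/* Boost-style mixing step. */
static uint32_t
sketch_combine(uint32_t a, uint32_t b)
{
    a ^= b + 0x9e3779b9u + (a << 6) + (a >> 2);
    return a;
}

/* Equality over natts, tdtypeid and each attribute's name, type, typmod. */
static bool
sketch_equal_row_types(const SketchDesc *a, const SketchDesc *b)
{
    if (a->natts != b->natts || a->tdtypeid != b->tdtypeid)
        return false;
    for (int i = 0; i < a->natts; i++)
        if (strcmp(a->attrs[i].attname, b->attrs[i].attname) != 0 ||
            a->attrs[i].atttypid != b->attrs[i].atttypid ||
            a->attrs[i].atttypmod != b->attrs[i].atttypmod)
            return false;
    return true;
}

/*
 * Hash over a subset of those fields: natts, tdtypeid and each atttypid.
 * Every input is also compared by sketch_equal_row_types(), so two
 * descriptors that compare equal always get the same hash.  attname and
 * atttypmod are skipped; descriptors differing only there will collide,
 * and the equality check then tells them apart.
 */
static uint32_t
sketch_hash_row_type(const SketchDesc *d)
{
    uint32_t    s;

    assert(d->natts >= 0 && d->natts <= SKETCH_MAXATTS);
    s = sketch_combine(0, (uint32_t) d->natts);
    s = sketch_combine(s, d->tdtypeid);
    for (int i = 0; i < d->natts; i++)
        s = sketch_combine(s, d->attrs[i].atttypid);
    return s;
}
```

Skipping attname and atttypmod only costs extra collisions. Hashing a field that equality ignores would break the invariant, because equal descriptors could then land in different buckets.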

Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Tom Lane (#2)

Re: clarify equalTupleDescs()

On 06.02.24 16:14, Tom Lane wrote:

> Peter Eisentraut <peter@eisentraut.org> writes:
>
>> The first want to compare what I call row-type equality, which means
>> they want to check specifically for equal number of attributes, and the
>> same attribute names, types, and typmods for each attribute. Most
>> callers actually want that behavior.
>
> Should compare attcollation too, no?
>
> +1 for the general idea, but it seems like "row type equality"
> might still be a slightly fuzzy concept.

I did another pass across the callers to check what pg_attribute fields
might be relevant.

Collation definitely needs to be added, certainly for plancache.c, maybe
for typcache.c, the other callers don't care.

Record types can have attisdropped fields, so it's probably good to
check those.

I'm suspicious about attndims. Maybe one could create a test case where
record types differ only in that. Support for attndims throughout the
system is weak, but maybe there is something to check there.

On a conceptual level, I figured pg_attribute rows can be divided up
into three categories:

1. "row type" stuff: attname, atttypid, atttypmod, attndims,
attisdropped, attcollation

2. physical layout stuff: attlen, attcacheoff, attbyval, attalign

3. table metadata stuff (everything else)

It's not perfect, and sometimes it's not clear whether these categories
inform the implementation or the other way around, but I think it helps
conceptualize it.
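As a rough illustration of the three categories, here is a hypothetical C struct with the fields grouped as listed above. SketchAttr is not the real FormData_pg_attribute layout. The struct is followed by a comparison that looks only at category 1. In keeping with where the discussion of attndims ends up later in this thread, attndims is deliberately left out of the comparison, and attnotnull stands in for all of category 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute record grouped into the three categories;
 * field names follow pg_attribute, but this is not the real layout. */
typedef struct
{
    /* 1. "row type" stuff */
    char        attname[64];
    uint32_t    atttypid;
    int32_t     atttypmod;
    int16_t     attndims;
    bool        attisdropped;
    uint32_t    attcollation;

    /* 2. physical layout stuff */
    int16_t     attlen;
    int32_t     attcacheoff;
    bool        attbyval;
    char        attalign;

    /* 3. table metadata stuff (one representative field) */
    bool        attnotnull;
} SketchAttr;

/* Compare only category-1 fields, skipping attndims. */
static bool
sketch_attr_row_type_equal(const SketchAttr *x, const SketchAttr *y)
{
    return strcmp(x->attname, y->attname) == 0 &&
        x->atttypid == y->atttypid &&
        x->atttypmod == y->atttypmod &&
        x->attisdropped == y->attisdropped &&
        x->attcollation == y->attcollation;
}
```

In this sketch, two attributes that differ only in collation are distinct row types, which is the case that matters for plancache.c. Differences in layout or table metadata are ignored.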

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Peter Eisentraut (#5)

Re: clarify equalTupleDescs()

Peter Eisentraut <peter@eisentraut.org> writes:

> On 06.02.24 16:14, Tom Lane wrote:
>
>> +1 for the general idea, but it seems like "row type equality"
>> might still be a slightly fuzzy concept.
>
> I did another pass across the callers to check what pg_attribute fields
> might be relevant.
>
> Collation definitely needs to be added, certainly for plancache.c, maybe
> for typcache.c, the other callers don't care.
>
> Record types can have attisdropped fields, so it's probably good to
> check those.

Yeah, good idea. (In most cases the attname comparison would catch
that, but we shouldn't rely on it.) In a perfect world maybe a
dropped column should be invisible to this comparison, but we're
a very long way from being able to treat it that way.

> I'm suspicious about attndims. Maybe one could create a test case where
> record types differ only in that. Support for attndims throughout the
> system is weak, but maybe there is something to check there.

There was a discussion last year[1] about removing attndims
altogether, which still seems to me like possibly a good idea.
So I doubt we want to consider it as a core semantic field.

> On a conceptual level, I figured pg_attribute rows can be divided up
> into three categories:
>
> 1. "row type" stuff: attname, atttypid, atttypmod, attndims,
> attisdropped, attcollation
>
> 2. physical layout stuff: attlen, attcacheoff, attbyval, attalign

I recall some discussion about taking attcacheoff out of this data
structure too ...

> 3. table metadata stuff (everything else)
>
> It's not perfect, and sometimes it's not clear whether these categories
> inform the implementation or the other way around, but I think it helps
> conceptualize it.

Sure.

regards, tom lane

[1]: /messages/by-id/ZD+14YZ4IUue8Rhi@gendo.asyd.net

jian he

jian.universality@gmail.com

over 2 years ago

In reply to: Peter Eisentraut (#4)

Re: clarify equalTupleDescs()

On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut <peter@eisentraut.org> wrote:

> In principle, hashRowType() could process all the fields that
> equalRowTypes() does. But since it's only a hash function, it doesn't
> have to be perfect. (This is also the case for the current
> hashTupleDesc().) I'm not sure where the best tradeoff is.

That's where my confusion comes from.
hashRowType is used in record_type_typmod_hash,
which in turn is used in assign_record_type_typmod.

In assign_record_type_typmod:
------------------------------------------------
if (RecordCacheHash == NULL)
{
    /* First time through: initialize the hash table */
    HASHCTL     ctl;

    ctl.keysize = sizeof(TupleDesc);    /* just the pointer */
    ctl.entrysize = sizeof(RecordCacheEntry);
    ctl.hash = record_type_typmod_hash;
    ctl.match = record_type_typmod_compare;
    RecordCacheHash = hash_create("Record information cache", 64,
                                  &ctl,
                                  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);

    /* Also make sure CacheMemoryContext exists */
    if (!CacheMemoryContext)
        CreateCacheMemoryContext();
}

/*
 * Find a hashtable entry for this tuple descriptor. We don't use
 * HASH_ENTER yet, because if it's missing, we need to make sure that all
 * the allocations succeed before we create the new entry.
 */
recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
                                            &tupDesc,
                                            HASH_FIND, &found);
------------------------------------------------
Based on the comments in hash_create(), the hash_search() call above
would first use record_type_typmod_hash to find candidate entries in
the hash table, then use record_type_typmod_compare to compare the
given tupDesc with those candidates.

Is this how the hash_search in assign_record_type_typmod works?
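The two-step lookup described above can be sketched as a tiny chained hash table with pluggable hash and match callbacks. Everything here is hypothetical and simplified: SketchTable, SketchEntry and SketchKey are not dynahash types. The real HashValueFunc and HashCompareFunc also take a key size, and the real compare function returns 0 on a match. The sketch also compares the stored full hash value before calling match, which, as far as I know, is what dynahash does as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified callback types (hypothetical). */
typedef uint32_t (*sketch_hash_fn) (const void *key);
typedef bool (*sketch_match_fn) (const void *a, const void *b);

typedef struct SketchEntry
{
    const void *key;
    uint32_t    hashvalue;      /* full hash, filled in by sketch_insert() */
    int         value;          /* payload, e.g. an assigned typmod */
    struct SketchEntry *next;   /* next entry in the same bucket */
} SketchEntry;

#define SKETCH_NBUCKETS 64

typedef struct
{
    SketchEntry *buckets[SKETCH_NBUCKETS];
    sketch_hash_fn hash;
    sketch_match_fn match;
} SketchTable;

static void
sketch_insert(SketchTable *t, SketchEntry *e)
{
    assert(e->key != NULL);
    e->hashvalue = t->hash(e->key);
    e->next = t->buckets[e->hashvalue % SKETCH_NBUCKETS];
    t->buckets[e->hashvalue % SKETCH_NBUCKETS] = e;
}

/*
 * Step 1: the hash value picks a bucket; only entries chained there are
 * candidates.  Step 2: for each candidate whose stored hash is equal
 * (a cheap pre-check), the match callback decides real equality.
 * Candidates that merely collided are skipped.
 */
static SketchEntry *
sketch_find(const SketchTable *t, const void *key)
{
    uint32_t    h = t->hash(key);

    for (SketchEntry *e = t->buckets[h % SKETCH_NBUCKETS]; e != NULL; e = e->next)
        if (e->hashvalue == h && t->match(e->key, key))
            return e;
    return NULL;
}

/* Demo key: the hash ignores name, the match does not, loosely mirroring
 * hashRowType() versus equalRowTypes(). */
typedef struct
{
    uint32_t    typid;
    const char *name;
} SketchKey;

static uint32_t
sketch_key_hash(const void *key)
{
    const SketchKey *k = key;

    return k->typid * 2654435761u;  /* Knuth-style multiplicative hash */
}

static bool
sketch_key_match(const void *a, const void *b)
{
    const SketchKey *x = a;
    const SketchKey *y = b;

    return x->typid == y->typid && strcmp(x->name, y->name) == 0;
}
```

A key that differs from a stored one only in name gets the same hash and lands in the same bucket. The match callback is what rejects it, so a hash function that ignores some fields stays correct.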

equalRowTypes processes more fields than hashRowType, and the
hashRowType comment mentions equalRowTypes, so maybe hashRowType should
have a comment explaining why hashing only natts, tdtypeid, and
atttypid is fine.

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 2 years ago

In reply to: Peter Eisentraut (#5)

Re: clarify equalTupleDescs()

Hi,

I looked at this patch today. I went through all the calls switched to
equalRowTypes, and AFAIK all of them are correct - all the places
switched to equalRowTypes() only need the weaker checks.

There are only two places still calling equalTupleDescs() - relcache
certainly needs that, and so does the assert in execReplication().

As for attndims, I agree equalRowTypes() should not check that. We're
not really checking that anywhere, it'd be quite weird to start with it
here. Especially if the plan is to remove it entirely.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 2 years ago

In reply to: jian he (#7)

Re: clarify equalTupleDescs()

On 2/27/24 12:13, jian he wrote:

> On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
>> In principle, hashRowType() could process all the fields that
>> equalRowTypes() does. But since it's only a hash function, it doesn't
>> have to be perfect. (This is also the case for the current
>> hashTupleDesc().) I'm not sure where the best tradeoff is.
>
> That's where my confusion comes from.
> hashRowType is used in record_type_typmod_hash,
> which in turn is used in assign_record_type_typmod.
>
> In assign_record_type_typmod:
>
> ------------------------------------------------
> if (RecordCacheHash == NULL)
> {
>     /* First time through: initialize the hash table */
>     HASHCTL     ctl;
>
>     ctl.keysize = sizeof(TupleDesc);    /* just the pointer */
>     ctl.entrysize = sizeof(RecordCacheEntry);
>     ctl.hash = record_type_typmod_hash;
>     ctl.match = record_type_typmod_compare;
>     RecordCacheHash = hash_create("Record information cache", 64,
>                                   &ctl,
>                                   HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
>
>     /* Also make sure CacheMemoryContext exists */
>     if (!CacheMemoryContext)
>         CreateCacheMemoryContext();
> }
>
> /*
>  * Find a hashtable entry for this tuple descriptor. We don't use
>  * HASH_ENTER yet, because if it's missing, we need to make sure that all
>  * the allocations succeed before we create the new entry.
>  */
> recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
>                                             &tupDesc,
>                                             HASH_FIND, &found);
> ------------------------------------------------
> Based on the comments in hash_create(), the hash_search() call above
> would first use record_type_typmod_hash to find candidate entries in
> the hash table, then use record_type_typmod_compare to compare the
> given tupDesc with those candidates.
>
> Is this how the hash_search in assign_record_type_typmod works?

Yes.

> equalRowTypes processes more fields than hashRowType, and the
> hashRowType comment mentions equalRowTypes, so maybe hashRowType should
> have a comment explaining why hashing only natts, tdtypeid, and
> atttypid is fine.

Not sure I understand what the confusion is - omitting fields with
little entropy is not uncommon, and collisions are inherent to hash
tables, and need to be handled anyway.

regards



Peter Eisentraut

peter_e@gmx.net

over 2 years ago

In reply to: Tomas Vondra (#8)

Re: clarify equalTupleDescs()

On 13.03.24 19:43, Tomas Vondra wrote:

> I looked at this patch today. I went through all the calls switched to
> equalRowTypes, and AFAIK all of them are correct - all the places
> switched to equalRowTypes() only need the weaker checks.
>
> There are only two places still calling equalTupleDescs() - relcache
> certainly needs that, and so does the assert in execReplication().
>
> As for attndims, I agree equalRowTypes() should not check that. We're
> not really checking that anywhere, it'd be quite weird to start with it
> here. Especially if the plan is to remove it entirely.

Thanks for checking this again. I have committed the patch as it was
presented then.
