Hashed SAOP on composite type with non-hashable column errors at runtime

Started by Andrei Lepikhov 14 days ago | 3 messages | bugs
#1 Andrei Lepikhov
lepihov@gmail.com

Hi,

There is an issue when we use a record-based array operation in SQL:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM test
WHERE (a,b) = ANY (ARRAY[
(1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
(4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
(7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
]);
ERROR: could not identify a hash function for type tsvector

See the attachment for the full reproduction script.
This happens because the hashability check for the record and array types misses
the op_hashjoinable() test. With fewer than 9 elements the query executes
successfully.
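
To illustrate why the column types matter, here is a minimal, self-contained C
sketch of the decision. It is a toy model, not PostgreSQL code: ToyType,
toy_type_is_hashable and toy_use_hashed_saop are hypothetical names, and only
the nine-element threshold is taken from the report above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy type descriptor; NOT PostgreSQL's catalog representation. */
typedef struct ToyType
{
    const char *name;
    bool        has_hash_func;  /* meaningful for scalar types only */
    size_t      ncolumns;       /* 0 for scalars, > 0 for record types */
    const struct ToyType *const *columns;
} ToyType;

/* Per the report, hashing is considered from nine array elements on. */
#define TOY_MIN_ELEMS_FOR_HASH 9

/* A record is hashable only if every column type is hashable, recursively. */
static bool
toy_type_is_hashable(const ToyType *type)
{
    assert(type != NULL);

    if (type->ncolumns == 0)
        return type->has_hash_func;

    for (size_t i = 0; i < type->ncolumns; i++)
    {
        if (!toy_type_is_hashable(type->columns[i]))
            return false;
    }
    return true;
}

/* Pick the hashed strategy only for long enough arrays of hashable elements. */
static bool
toy_use_hashed_saop(const ToyType *elem_type, size_t nelems)
{
    return nelems >= TOY_MIN_ELEMS_FOR_HASH &&
           toy_type_is_hashable(elem_type);
}
```

In this model a record of (int4, tsvector) is never hashed, whatever the array
length, while a record of (int4, int4) is hashed once the array reaches nine
elements.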

Patch 0001 (attached) fixes this bug. It is a natural follow-up to 17da9d4c282;
the hashing of record types itself was introduced by 01e658fa74c. The fix
deserves a back-patch down to v14.

More interesting is that EXPLAIN doesn't expose whether the executor used the
hashed or the plain search strategy. That might be acceptable, since we know
hashing is always used from nine elements on. But it forces the user first to
read the source code, and then to inspect the catalog, to find out whether the
clause has a hash function. For a SubPlan we do have this information — so let's
take a look at v0-0002, which introduces a 'hashed' flag.

It would be too prosaic a bug fix if there weren't a nice corner case with the
anonymous record type. Consider the following:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT count(*) FROM (SELECT g x, -g y FROM generate_series(1,300000) g) t
WHERE (x, y) = ANY
(array[(1,-1),(2,-2),(1,-1),(2,-2),(1,-1),(2,-2),(1,-1),(2,-2),(64,-64)]);

/*
-- Before the fix:
Aggregate (actual rows=1.00 loops=1)
Buffers: shared hit=63 read=5, temp read=513 written=513
-> Function Scan on generate_series g (actual rows=3.00 loops=1)
Filter: (ROW(g, (- g)) = ANY
('{"(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(64,-64)"}'::record[]))
Rows Removed by Filter: 299997
Buffers: shared hit=63 read=5, temp read=513 written=513
Planning:
Buffers: shared hit=45 read=16
Planning Time: 2.923 ms
Execution Time: 62.969 ms
(10 rows)

-- After the fix:
Aggregate (actual rows=1.00 loops=1)
Buffers: shared hit=42, temp read=513 written=513
-> Function Scan on generate_series g (actual rows=3.00 loops=1)
Filter: (ROW(g, (- g)) = ANY
('{"(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(64,-64)"}'::record[]))
Rows Removed by Filter: 299997
Buffers: shared hit=42, temp read=513 written=513
Planning:
Buffers: shared hit=88
Planning Time: 0.837 ms
Execution Time: 745.897 ms
(10 rows)
*/

You can see a regression here: a legitimate hashed SAOP is no longer hashed. The
fix for that is not so simple — we have to check every element of the array
before deciding whether the hashing strategy is possible. This is quite an
expensive operation, so I sketched a solution in patch 0003, but I'm not sure it
is worth developing: checking an anonymous type might simply be too expensive.
Should it be done only once, conditionally, with a size limit and result caching?
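
The "size limit and result caching" idea can be sketched in a few lines of
self-contained C. This is a toy model and not the code of patch 0003: the
limit, the cache layout, and all names (ToyCache, toy_all_elements_hashable,
toy_demo_check) are hypothetical, and each array element is reduced to a small
integer identifying its record type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHECKED_ELEMS 1000  /* hypothetical size limit */
#define TOY_CACHE_SLOTS 16          /* hypothetical number of record types */

typedef enum { TOY_UNKNOWN = 0, TOY_HASHABLE, TOY_NOT_HASHABLE } ToyVerdict;

typedef struct ToyCache
{
    ToyVerdict verdict[TOY_CACHE_SLOTS];  /* indexed by record type id */
    int        expensive_checks;          /* how often 'check' really ran */
} ToyCache;

/* Stand-in for the expensive per-type hashability lookup. */
typedef bool (*toy_check_fn) (int rectype_id);

/* Demo lookup: pretend that only record type 3 has a non-hashable column. */
static bool
toy_demo_check(int rectype_id)
{
    return rectype_id != 3;
}

/*
 * Return true if every element's record type is hashable.  Arrays above the
 * size limit are not inspected at all and are reported as not hashable, so
 * the caller falls back to the plain search.
 */
static bool
toy_all_elements_hashable(const int *elem_type_ids, size_t nelems,
                          toy_check_fn check, ToyCache *cache)
{
    if (nelems > TOY_MAX_CHECKED_ELEMS)
        return false;

    for (size_t i = 0; i < nelems; i++)
    {
        int id = elem_type_ids[i];

        assert(id >= 0 && id < TOY_CACHE_SLOTS);

        /* Run the expensive lookup at most once per record type. */
        if (cache->verdict[id] == TOY_UNKNOWN)
        {
            cache->expensive_checks++;
            cache->verdict[id] = check(id) ? TOY_HASHABLE : TOY_NOT_HASHABLE;
        }
        if (cache->verdict[id] == TOY_NOT_HASHABLE)
            return false;
    }
    return true;
}
```

With the cache, an array that repeats two record types costs two lookups
regardless of its length, which is the case the example query above exercises.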

--
regards, Andrei Lepikhov,
pgEdge

Attachments:

v0-0001-Don-t-hash-a-record-array-SAOP-whose-input-type-i.patch (text/plain, +72 -6)
v0-0002-Show-hashed-ScalarArrayOpExpr-decision-in-EXPLAIN.patch (text/plain, +21 -2)
v0-0003-Recover-hashed-SAOP-for-anonymous-records-with-ha.patch (text/plain, +138 -6)
bug-hashed-saop.sql (text/plain)

#2 Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#1)
Re: Hashed SAOP on composite type with non-hashable column errors at runtime

On 05/06/2026 20:12, Tom Lane wrote:

> So I'm unexcited about putting the fix for this into
> convert_saop_to_hashed_saop_walker as you've done here.
> I think it needs to be addressed at the level of the relevant
> lsyscache.c lookup functions, so that there's some chance that
> future code additions will get this right. Draft fix attached.

Thanks for your efforts!
Now, hash_ok_operator and op_hashjoinable handle all four container-type
equality operators. Side way is a C extension that lets you create a custom type
that groups other types marked as HASHES. I started this research because I had
trouble redesigning my ‘statistics’ type [1], but here, using HASHES simply does
not seem to work for my custom type.

Fixes in the lookup_type_cache related to the multirange type are also correct
for me. As well as pg_operator.dat changes.

> I can't get excited about the test case you suggest;
> it's rather expensive and it will do nothing whatever
> to guard against future mistakes of the same kind.

Ok, let me think about that a little more.

> I'm also unexcited about your 0002 and 0003.

I understand about 0003, but what is the problem with 0002? In practice, people
use massive arrays (I’ve seen thousands of elements). You might remember my
complaint about the planner’s memory consumption in array selectivity estimation
a couple of years ago; at that time you proposed a local planning memory
context. So, it’d be nice to see (as with SubPlans) whether the SAOP is hashed
or, for some reason, not.

> I don't really care about optimizing the anonymous-record case; by and large,
> it's coincidental that complicated operations work at all on
> anonymous record types.

Got it. My actual concern here is to give extension developers a way (if
possible) to fix this problem in ORM systems, where they can't change the
complex application but have an access pattern that will regress, just as they
struggle with regressions each time a brand-new query tree rewriting rule is
introduced ;).

A note on the ‘lefthashfunc == righthashfunc’ condition: it is correct, because
RECORDs can be compared only when the types in corresponding positions on the
left and right sides of the comparison operator are identical:

if (att1->atttypid != att2->atttypid)
ereport(ERROR, "cannot compare dissimilar column types %s and %s ...");

So, if the typecache is someday extended to compare, let’s say, (int4, int8) and
(int4, numeric), this code should also be revised, right?
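
The argument can be made concrete with a small, self-contained C sketch. It is
a toy model of the column-by-column type check quoted above, not the actual
record comparison code: ToyOid and toy_first_dissimilar_column are hypothetical
names, and the type OIDs in the usage note are used for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ToyOid;    /* stand-in for a type OID */

/*
 * Compare the column types of two records position by position.
 * Returns 0 when every pair matches (the records are comparable),
 * the 1-based number of the first dissimilar column otherwise,
 * or -1 when the column counts differ.
 */
static int
toy_first_dissimilar_column(const ToyOid *left, size_t nleft,
                            const ToyOid *right, size_t nright)
{
    assert(left != NULL && right != NULL);

    if (nleft != nright)
        return -1;

    for (size_t i = 0; i < nleft; i++)
    {
        if (left[i] != right[i])
            return (int) (i + 1);
    }
    return 0;
}
```

For (int4, int8) against (int4, numeric), for example OIDs {23, 20} against
{23, 1700}, the function reports column 2. Only when it returns 0 do both sides
have the same type in every position, so a single hash function per column
serves the left and the right input alike.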

[1]: https://github.com/danolivo/pg_track_optimizer/blob/main/rstats.h

--
regards, Andrei Lepikhov,
pgEdge

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#2)
Re: Hashed SAOP on composite type with non-hashable column errors at runtime

Andrei Lepikhov <lepihov@gmail.com> writes:

> Now, hash_ok_operator and op_hashjoinable handle all four container-type
> equality operators. Side way is a C extension that lets you create a custom type
> that groups other types marked as HASHES.

Yeah, it's interesting to speculate about what we'd have to do to
allow extensions to invent new kinds of container types. Right now,
the knowledge of what kinds of containers there are is wired into
a bunch of places. This fix isn't adding any new places, just fixing
some places whose knowledge was incomplete. So I'm content with this
for today.

> Fixes in the lookup_type_cache related to the multirange type are also correct
> for me. As well as pg_operator.dat changes.

Thanks for reviewing; I pushed v1-0001 after a bit more
comment-smithing.

>> I'm also unexcited about your 0002 and 0003.

> I understand about 0003, but what is the problem with 0002?

Let me rephrase that: 0002 is a new feature and hence out of scope
at this point in the development cycle. If you want to start a
new thread proposing that for v20, go right ahead.

regards, tom lane