Nested comparison semantics are unpredictable

Started by Richard Wesley8 months ago2 messagesbugs

richard@duckdblabs.com

8 months ago

Hey guys -

I was trying to make our comparison semantics for nested types <https://github.com/duckdb/duckdb/issues/18039> match Postgres, and I found that the semantics of comparisons change when a join is present.

psql (17.5 (Postgres.app))
Type "help" for help.

hawkfish=# with tbl_s_null as (
hawkfish(# select *
hawkfish(# from (
hawkfish(# values (row(1), row(0)), (row(1), row(1)), (row(1), NULL), (row(1), row(NULL::int))
hawkfish(# ) as tbl_s_null(col0, col1)
hawkfish(# )
hawkfish-# SELECT x.col1, y.col1, x.col1 = y.col1, x.col1 != y.col1
hawkfish-# FROM tbl_s_null x CROSS JOIN tbl_s_null y
hawkfish-# ORDER BY x.col1, y.col1 NULLS LAST;
col1 | col1 | ?column? | ?column?
------+------+----------+----------
(0) | (0) | t | f
(0) | (1) | f | t
(0) | () | f | t
(0) | | |
(1) | (0) | f | t
(1) | (1) | t | f
(1) | () | f | t
(1) | | |
() | (0) | f | t
() | (1) | f | t
() | () | t | f
() | | |
| (0) | |
| (1) | |
| () | |
| | |
(16 rows)

Note that in this query, row(0) <> row(NULL). But if I just issue the comparison directly:

hawkfish=# select row(0) = row(NULL);
?column?
----------

(1 row)

I originally suspected that it might the the CSE processing, but as the first query shows, it does it even with an explicit CROSS JOIN.

This was reproduced with 17.5 on a MacBook Pro 2.4 GHz 8-Core Intel Core i9 running MacOS 15.6 (24G84).

Met vriendelijke groet, best regards, mit freundlichen Grüßen,
Richard Wesley
Time Lord
richard@duckdblabs.com <mailto:richard@duckdblabs.com>

Tom Lane

tgl@sss.pgh.pa.us

8 months ago

In reply to: Richard Wesley (#1)

Re: Nested comparison semantics are unpredictable

Richard Wesley <richard@duckdblabs.com> writes:

I was trying to make our comparison semantics for nested types <https://github.com/duckdb/duckdb/issues/18039> match Postgres, and I found that the semantics of comparisons change when a join is present.

It's not about joins. It's about the syntactic form of the
expression. When you write "ROW(...) = ROW(...)", that goes
through make_row_comparison_op(), which indeed behaves
differently from record_eq(), which is where the comparison
will end up when it looks like "var = var". In particular
"ROW(x) = ROW(y)" is optimized into "x = y" which is why
you get a NULL for "row(0) = row(NULL)". record_eq() is not
allowed to produce a null in such cases, though --- else it
would be unsuitable to use as a btree comparator.

There's a lot of historical baggage and spec-text-lawyering
behind all this, but the short answer is that we're unlikely
to change either behavior.

regards, tom lane