Nested comparison semantics are unpredictable
Hey guys -
I was trying to make our comparison semantics for nested types <https://github.com/duckdb/duckdb/issues/18039> match Postgres, and I found that the semantics of comparisons change when a join is present.
psql (17.5 (Postgres.app))
Type "help" for help.
hawkfish=# with tbl_s_null as (
hawkfish(# select *
hawkfish(# from (
hawkfish(# values (row(1), row(0)), (row(1), row(1)), (row(1), NULL), (row(1), row(NULL::int))
hawkfish(# ) as tbl_s_null(col0, col1)
hawkfish(# )
hawkfish-# SELECT x.col1, y.col1, x.col1 = y.col1, x.col1 != y.col1
hawkfish-# FROM tbl_s_null x CROSS JOIN tbl_s_null y
hawkfish-# ORDER BY x.col1, y.col1 NULLS LAST;
col1 | col1 | ?column? | ?column?
------+------+----------+----------
(0) | (0) | t | f
(0) | (1) | f | t
(0) | () | f | t
(0) | | |
(1) | (0) | f | t
(1) | (1) | t | f
(1) | () | f | t
(1) | | |
() | (0) | f | t
() | (1) | f | t
() | () | t | f
() | | |
| (0) | |
| (1) | |
| () | |
| | |
(16 rows)
Note that in this query, row(0) <> row(NULL). But if I just issue the comparison directly:
hawkfish=# select row(0) = row(NULL);
?column?
----------
(1 row)
I originally suspected that it might the the CSE processing, but as the first query shows, it does it even with an explicit CROSS JOIN.
This was reproduced with 17.5 on a MacBook Pro 2.4 GHz 8-Core Intel Core i9 running MacOS 15.6 (24G84).
Met vriendelijke groet, best regards, mit freundlichen Grüßen,
Richard Wesley
Time Lord
richard@duckdblabs.com <mailto:richard@duckdblabs.com>
Richard Wesley <richard@duckdblabs.com> writes:
I was trying to make our comparison semantics for nested types <https://github.com/duckdb/duckdb/issues/18039> match Postgres, and I found that the semantics of comparisons change when a join is present.
It's not about joins. It's about the syntactic form of the
expression. When you write "ROW(...) = ROW(...)", that goes
through make_row_comparison_op(), which indeed behaves
differently from record_eq(), which is where the comparison
will end up when it looks like "var = var". In particular
"ROW(x) = ROW(y)" is optimized into "x = y" which is why
you get a NULL for "row(0) = row(NULL)". record_eq() is not
allowed to produce a null in such cases, though --- else it
would be unsuitable to use as a btree comparator.
There's a lot of historical baggage and spec-text-lawyering
behind all this, but the short answer is that we're unlikely
to change either behavior.
regards, tom lane