Add system column support to the USING clause
Attachments:
v1-0001-Add-system-column-support-to-the-USING-clause.patchtext/x-diff; name=v1-0001-Add-system-column-support-to-the-USING-clause.patchDownload+330-19
On Friday, September 13, 2024, Denis Garsh <d.garsh@arenadata.io> wrote:
The patch adds support for system columns in JOIN USING clause.
Definitely not high on my list of oversights to fix. Resorting to the ON
clause for the rare query that would need to do such a thing isn’t that
costly. But as the patch exists I’ll leave it to others to judge the cost
of actually adding it, or worthwhile-mess of reviewing it.
Link to PR on GitHub: https://github.com/hilltracer/postgres/pull/3
You apparently missed the note on GitHub that says we don’t work with pull
requests. Patches are to be submitted directly to the mailing list.
David J.
On Friday, September 13, 2024, David G. Johnston <david.g.johnston@gmail.com>
wrote:
Link to PR on GitHub: https://github.com/hilltracer/postgres/pull/3
You apparently missed the note on GitHub that says we don’t work with pull
requests. Patches are to be submitted directly to the mailing list.
Sorry, I see now that you did both - that makes sense.
David J.
Denis Garsh <d.garsh@arenadata.io> writes:
The patch adds support for system columns in JOIN USING clause.
I think this is an actively bad idea, and it was likely intentional
that it's not supported today. A few reasons why:
* There are, fundamentally, no use-cases for joining on system
columns. The only one that is stable enough to even consider
using for the purpose is tableoid, and I'm not detecting a reason
why that'd be useful. If there are any edge cases where people
would actually wish to do that, it can be done easily enough with
a standard JOIN ON clause.
* Non-heap table AMs may not provide the same system columns that
heap does, further reducing the scope for plausible use-cases.
(Yeah, I know we don't support that too well today.)
* This breaks ruleutils.c's mechanism for dealing with name
conflicts across multiple USING clauses. That relies on being
able to assign aliases to the USING inputs at the table level
(that is, "FROM realtable AS aliastable(aliascolumn,...)").
There's no way to alias a system column in the FROM syntax.
regards, tom lane
On 13.09.2024 17:56, Tom Lane wrote:
I think this is an actively bad idea, and it was likely intentional
that it's not supported today. A few reasons why:
Thank you, Tom and David, for your feedback.
I admit my mistake. I should have asked if this problem was worth
solving before diving in. However, since I’ve already spent a lot of
time into the patch, so I'll try to fight a little ;-)
It looks like this feature hasn't been added because it's not obvious
how to do it. And it is difficult to assess the consequences of adding a
system column in RTE. Personally, I had to sweat to do it.
* There are, fundamentally, no use-cases for joining on system
columns. The only one that is stable enough to even consider
using for the purpose is tableoid, and I'm not detecting a reason
why that'd be useful. If there are any edge cases where people
would actually wish to do that, it can be done easily enough with
a standard JOIN ON clause.
But after all, it's implemented in `JOIN ON`. Accordingly, it seems like
it should also be supported in `JOIN USING`. And is there any guarantee
that new system columns won't be added in the future that may be more
useful?
* This breaks ruleutils.c's mechanism for dealing with name
conflicts across multiple USING clauses. That relies on being
able to assign aliases to the USING inputs at the table level
(that is, "FROM realtable AS aliastable(aliascolumn,...)").
There's no way to alias a system column in the FROM syntax.
Could you please provide an example of such a query? I've tried creating
multi-join queries with aliases, but I couldn't break it. For example:
```sql
CREATE TABLE t (id1 int);
CREATE TABLE tt (id2 int);
CREATE TABLE ttt (id3 int);
CREATE TABLE tttt (id4 int);
BEGIN;
INSERT INTO t VALUES (1);
INSERT INTO tt VALUES (101);
INSERT INTO ttt VALUES (201);
INSERT INTO tttt VALUES (301);
COMMIT;
BEGIN;
INSERT INTO t VALUES (2);
INSERT INTO tt VALUES (102);
INSERT INTO ttt VALUES (202);
INSERT INTO tttt VALUES (302);
COMMIT;
INSERT INTO t VALUES (3);
INSERT INTO tt VALUES (103);
INSERT INTO ttt VALUES (203);
INSERT INTO tttt VALUES (303);
SELECT *FROM t FULL JOIN tt USING (xmin);
-- xmin | id1 | id2
--------+-----+-----
-- 1057 | 1 | 101
-- 1058 | 2 | 102
-- 1059 | 3 |
-- 1060 | | 103
--(4 rows)
SELECT *FROM ttt FULL JOIN tttt USING (xmin);
-- xmin | id3 | id4
--------+-----+-----
-- 1057 | 201 | 301
-- 1058 | 202 | 302
-- 1061 | 203 |
-- 1062 | | 303
--(4 rows)
SELECT * FROM t FULL JOIN tt USING (xmin) FULL JOIN ttt USING (xmin);
-- xmin | id1 | id2 | id3
--------+-----+-----+-----
-- 1057 | 1 | 101 | 201
-- 1058 | 2 | 102 | 202
-- 1059 | 3 | |
-- 1060 | | 103 |
-- 1061 | | | 203
--(5 rows)
SELECT *FROM
(t FULL JOIN tt USING (xmin)) AS alias1(col1, col21, col31)
JOIN
(ttt FULL JOIN tttt USING (xmin)) AS alias2(col1, col22, col32)
USING (col1);
-- col1 | col21 | col31 | col22 | col32
--------+-------+-------+-------+-------
-- 1057 | 1 | 101 | 201 | 301
-- 1058 | 2 | 102 | 202 | 302
--(2 rows)
```
I noticed that after adding it to the RTE, the negative system column
attributes will be used in `ruleutils.c` (see
[here](https://github.com/postgres/postgres/blob/52c707483ce4d0161127e4958d981d1b5655865e/src/backend/utils/adt/ruleutils.c#L5055)),
and then in the `colinfo` structure. However, I didn't find any issues
with `colinfo`. For example:
```sql
create table tt2 (a int, b int, c int);
create table tt3 (ax int8, b int2, c numeric);
create table tt4 (ay int, b int, q int);
create view v2 as select * from
tt2 join tt3 using (b,c,xmin) join tt4 using (b, xmin);
select pg_get_viewdef('v2', true);
-- SELECT tt2.b, tt2.xmin, tt3.c, tt2.a, tt3.ax, tt4.ay, tt4.q
-- FROM tt2 JOIN tt3 USING (b, c, xmin) JOIN tt4 USING (b, xmin);
alter table tt2 add column d int;
alter table tt2 add column e int;
select pg_get_viewdef('v2', true);
-- SELECT tt2.b, tt2.xmin, tt3.c, tt2.a, tt3.ax, tt4.ay, tt4.q
-- FROM tt2 JOIN tt3 USING (b, c, xmin) JOIN tt4 USING (b, xmin);
-- alter table tt3 rename c to d;
select pg_get_viewdef('v2', true);
-- SELECT tt2.b, tt2.xmin, tt3.c, tt2.a, tt3.ax, tt4.ay, tt4.q
-- FROM tt2 JOIN tt3 tt3(ax, b, c) USING (b, c, xmin) JOIN tt4 USING
(b, xmin);
alter table tt3 add column c int;
alter table tt3 add column e int;
select pg_get_viewdef('v2', true);
-- SELECT tt2.b, tt2.xmin, tt3.c, tt2.a, tt3.ax, tt4.ay, tt4.q
-- FROM tt2 JOIN tt3 tt3(ax, b, c, c_1, e) USING (b, c, xmin)
-- JOIN tt4 USING (b, xmin);
alter table tt2 drop column d;
select pg_get_viewdef('v2', true);
-- SELECT tt2.b, tt2.xmin, tt3.c, tt2.a, tt3.ax, tt4.ay, tt4.q
-- FROM tt2 JOIN tt3 tt3(ax, b, c, c_1, e) USING (b, c, xmin)
-- JOIN tt4 USING (b, xmin);
```
--
Best regards,
Denis Garsh
Hello, I'm still hoping for an answer.
--
Best regards,
Denis Garsh
If we are not going to implement this functionality, maybe it’s worth
adding an explicit description to the
[documentation](https://www.postgresql.org/docs/current/queries-table-expressions.html)?
A note like: "JOIN USING doesn’t support system columns. If you need to
join tables by a system column, use the JOIN ON clause."
--
Best regards,
Denis Garsh,
C Developer,
d.garsh@arenadata.io