Idea: Avoid JOINs by using path expressions to follow FKs

Started by Joel Jacobsonabout 5 years ago50 messageshackers
Jump to latest
#1Joel Jacobson
joel@compiler.org

Hi,

The database Neo4j has a language called "Cypher" where one of the key selling points is they "don’t need join tables".

Here is an example from https://neo4j.com/developer/cypher/guide-sql-to-cypher/

SQL:

SELECT DISTINCT c.company_name
FROM customers AS c
JOIN orders AS o ON c.customer_id = o.customer_id
JOIN order_details AS od ON o.order_id = od.order_id
JOIN products AS p ON od.product_id = p.product_id
WHERE p.product_name = 'Chocolade';

Neo4j's Cypher:

MATCH (p:product {product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)
RETURN distinct c.company_name;

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

Since there is only a single foreign key on the order_details.order_id column,
we would know how to resolve it, i.e. to the orders table,
and from there we would follow the customer_id column to the customers table,
where we would finally get the company_name value.

In the where clause, we would follow the order_details's product_id column
to the products table, to filter on product_name.

If there would be multiple foreign keys on a column we try to follow,
the query planner would throw an error forcing the user to use explicit joins instead.

I think this syntactic sugar could save a lot of unnecessary typing,
and as long as the column names are chosen wisely,
the path expression will be just as readable as the manual JOINs would be.

Thoughts?

/Joel

#2Pantelis Theodosiou
ypercube@gmail.com
In reply to: Joel Jacobson (#1)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Sat, Mar 27, 2021 at 8:28 PM Joel Jacobson <joel@compiler.org> wrote:

Hi,

The database Neo4j has a language called "Cypher" where one of the key
selling points is they "don’t need join tables".

Here is an example from
https://neo4j.com/developer/cypher/guide-sql-to-cypher/

SQL:

SELECT DISTINCT c.company_name
FROM customers AS c
JOIN orders AS o ON c.customer_id = o.customer_id
JOIN order_details AS od ON o.order_id = od.order_id
JOIN products AS p ON od.product_id = p.product_id
WHERE p.product_name = 'Chocolade';

Neo4j's Cypher:

MATCH (p:product
{product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)
RETURN distinct c.company_name;

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

Since there is only a single foreign key on the order_details.order_id
column,
we would know how to resolve it, i.e. to the orders table,
and from there we would follow the customer_id column to the customers
table,
where we would finally get the company_name value.

In the where clause, we would follow the order_details's product_id column
to the products table, to filter on product_name.

If there would be multiple foreign keys on a column we try to follow,
the query planner would throw an error forcing the user to use explicit
joins instead.

I think this syntactic sugar could save a lot of unnecessary typing,
and as long as the column names are chosen wisely,
the path expression will be just as readable as the manual JOINs would be.

Thoughts?

/Joel

Just my 2c. The idea is nice but:

1. It is changing the FROM clause and the (size of the) intermediate result
set. While in your example query there is no difference, you'd get
different results if it was something like

SELECT p.product_name, COUNT(*)
FROM ... (same joins)
GROUP BY p.product_name

2. If you want many columns in the SELECT list, possibly form many tables,
you'll need to repeated the expressions. i.e. how you propose to write
this without repeating the link expressions?

SELECT p.product_name, p.price, p.category, c.company_name, c.address
...

3. SQL already provides methods to remove the join "noise", with JOIN USING
(columns) when joining column have matching names and with NATURAL JOIN
(with extreme care).

Finally, extending the specs in this novel way might put Postgres in a
different path from the SQL specs in the future, especially if they have
plans to add functionality for graph queries.

Best regards
Pantelis Theodosiou

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Joel Jacobson (#1)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On 2021-Mar-27, Joel Jacobson wrote:

If there would be multiple foreign keys on a column we try to follow,
the query planner would throw an error forcing the user to use explicit joins instead.

This seems pretty dangerous -- you just have to create one more FK, and
suddenly a query that worked perfectly fine, now starts throwing errors
because it's now ambiguous. Feels a bit like JOIN NATURAL, which many
people discourage because of this problem.

--
�lvaro Herrera 39�49'30"S 73�17'W
Si no sabes adonde vas, es muy probable que acabes en otra parte.

#4Vik Fearing
vik@postgresfriends.org
In reply to: Joel Jacobson (#1)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On 3/27/21 9:27 PM, Joel Jacobson wrote:

Hi,

The database Neo4j has a language called "Cypher" where one of the key selling points is they "don’t need join tables".

Here is an example from https://neo4j.com/developer/cypher/guide-sql-to-cypher/

SQL:

SELECT DISTINCT c.company_name
FROM customers AS c
JOIN orders AS o ON c.customer_id = o.customer_id
JOIN order_details AS od ON o.order_id = od.order_id
JOIN products AS p ON od.product_id = p.product_id
WHERE p.product_name = 'Chocolade';

Neo4j's Cypher:

MATCH (p:product {product_name:"Chocolade"})<-[:PRODUCT]-(:order)<-[:PURCHASED]-(c:customer)
RETURN distinct c.company_name;

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

This is a terrible idea, but let me explain why.

First of all, neo4j claims they don't have joins because they actually
don't have joins. The nodes point directly to the other nodes. Your
proposal is syntactic sugar over a real join. The equivalent to neo4j
would be somehow storing the foreign ctid in the column or something.

Secondly, and more importantly IMO, graph queries are coming to SQL.
They are mostly based on Cypher but not entirely because they amalgamate
concepts from other graph query languages like Oracle's PGQ. The
"common" syntax is called GQL (https://www.gqlstandards.org/) and it's
in the process of becoming Part 16 of the SQL standard. The timeline on
that website says August 2022 (it started in September 2019).

If that timeline holds and ambitious people work on it (I already know
one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
The earliest your proposal could get in is 15, so it's not that much of
a wait.
--
Vik Fearing

#5Joel Jacobson
joel@compiler.org
In reply to: Vik Fearing (#4)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Sun, Mar 28, 2021, at 12:25, Vik Fearing wrote:

On 3/27/21 9:27 PM, Joel Jacobson wrote:

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

This is a terrible idea, but let me explain why.

First of all, neo4j claims they don't have joins because they actually
don't have joins. The nodes point directly to the other nodes. Your
proposal is syntactic sugar over a real join. The equivalent to neo4j
would be somehow storing the foreign ctid in the column or something.

Secondly, and more importantly IMO, graph queries are coming to SQL.
They are mostly based on Cypher but not entirely because they amalgamate
concepts from other graph query languages like Oracle's PGQ. The
"common" syntax is called GQL (https://www.gqlstandards.org/) and it's
in the process of becoming Part 16 of the SQL standard. The timeline on
that website says August 2022 (it started in September 2019).

If that timeline holds and ambitious people work on it (I already know
one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
The earliest your proposal could get in is 15, so it's not that much of
a wait.

I think you misunderstood my idea entirely.

It has absolutely nothing to do with graph query languages,
except the one similarity I mentioned, not having joins.

What I propose is a way to do implicit joins by following foreign keys,
thus improving the SQL query language, making it more concise for
the simple case when what you want to do is an INNER JOIN on a
single column on which there is a single FOREIGN KEY.

/Joel

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joel Jacobson (#5)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

ne 28. 3. 2021 v 13:27 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Sun, Mar 28, 2021, at 12:25, Vik Fearing wrote:

On 3/27/21 9:27 PM, Joel Jacobson wrote:

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

This is a terrible idea, but let me explain why.

First of all, neo4j claims they don't have joins because they actually
don't have joins. The nodes point directly to the other nodes. Your
proposal is syntactic sugar over a real join. The equivalent to neo4j
would be somehow storing the foreign ctid in the column or something.

Secondly, and more importantly IMO, graph queries are coming to SQL.
They are mostly based on Cypher but not entirely because they amalgamate
concepts from other graph query languages like Oracle's PGQ. The
"common" syntax is called GQL (https://www.gqlstandards.org/) and it's
in the process of becoming Part 16 of the SQL standard. The timeline on
that website says August 2022 (it started in September 2019).

If that timeline holds and ambitious people work on it (I already know
one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
The earliest your proposal could get in is 15, so it's not that much of
a wait.

I think you misunderstood my idea entirely.

It has absolutely nothing to do with graph query languages,
except the one similarity I mentioned, not having joins.

What I propose is a way to do implicit joins by following foreign keys,
thus improving the SQL query language, making it more concise for
the simple case when what you want to do is an INNER JOIN on a
single column on which there is a single FOREIGN KEY.

There were some similar tools already. Personally I like the current
state, where tables should be explicitly defined, and join should be
explicitly defined. The joining of tables is not cheap - and I like the
visibility of this. On the other hand, this is very frustrable for a lot of
people, and I can understand. I don't want to see this feature inside
Postgres, because it can reduce the possibility to detect badly written
query. But it can be a great feature of some outer tool. I worked for
GoodData and this tool knows the model, and it generates necessary joins
implicitly, and people like it (this tool uses Postgres too).

https://www.gooddata.com/

Regards

Pavel

Show quoted text

/Joel

#7Joel Jacobson
joel@compiler.org
In reply to: Alvaro Herrera (#3)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Sat, Mar 27, 2021, at 22:11, Alvaro Herrera wrote:

On 2021-Mar-27, Joel Jacobson wrote:

If there would be multiple foreign keys on a column we try to follow,
the query planner would throw an error forcing the user to use explicit joins instead.

This seems pretty dangerous -- you just have to create one more FK, and
suddenly a query that worked perfectly fine, now starts throwing errors
because it's now ambiguous.

Creating one more FK referencing some other column,
would break queries in the same way USING breaks,
if a column is added which causes ambiguity.

In my experience, it's extremely rare to have multiple different FKs on the same set of columns.
Maybe I'm missing something here, can we think of a realistic use-case?

If such a FK if created, it would break in the same way as USING breaks
if a column is added which causes ambiguity, except this is much less likely to happen than the equivalent use case.

I think this problem is hypothetical compared to the actual problem with USING,
since adding a column with the same name as some existing column actually happens sometimes.

Feels a bit like JOIN NATURAL, which many
people discourage because of this problem.

The problem with NATURAL is due to matching based on column names.
My proposal doesn't match on column names at all.
It merely follows the foreign key for a column.
With NATURAL you can also suddenly get a different join,
whereas my proposal at worst will generate an error due to multiple FKs on the same column,
there can never be any ambiguity.

/Joel

#8Joel Jacobson
joel@compiler.org
In reply to: Pavel Stehule (#6)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Sun, Mar 28, 2021, at 13:51, Pavel Stehule wrote:

There were some similar tools already. Personally I like the current state, where tables should be explicitly defined, and join should be explicitly defined. The joining of tables is not cheap - and I like the visibility of this. On the other hand, this is very frustrable for a lot of people, and I can understand. I don't want to see this feature inside Postgres, because it can reduce the possibility to detect badly written query. But it can be a great feature of some outer tool. I worked for GoodData and this tool knows the model, and it generates necessary joins implicitly, and people like it (this tool uses Postgres too).

https://www.gooddata.com/

Very good points.

As a counter-argument, I could argue that you don't need to use this feature.
But that would be a poor argument, since you might have to work with code
written by other developers.

I'm also fearful of newbies misusing features, not understanding what they are doing, producing inefficient code.
However, this is a general problem, complex queries are hard to reason about,
and I'm not sure making some INNER JOINs implicit would worsen the situation,
you could also make the counter-argument that the remaining explicit JOINs become more visible,
and will stand-out, exposing what is really complex in the query.

Also, this proposed syntax would surely appeal to the NoSQL-crowd,
and should reduce their cravings for MongoDB.

So ask yourself the following question: Ten years from now, would you rather be forced to
work with code using MongoDB or a more concise SQL?

Lastly, let me reiterate I think you made a very good point,
your argument is the heaviest weigh on the negative side of my own scale.

/Joel

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#3)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On 3/27/21 5:11 PM, Alvaro Herrera wrote:

On 2021-Mar-27, Joel Jacobson wrote:

If there would be multiple foreign keys on a column we try to follow,
the query planner would throw an error forcing the user to use explicit joins instead.

This seems pretty dangerous -- you just have to create one more FK, and
suddenly a query that worked perfectly fine, now starts throwing errors
because it's now ambiguous. Feels a bit like JOIN NATURAL, which many
people discourage because of this problem.

Maybe. I don't recall ever having seen a column with more than one FK.
Is that a common thing? In itself it seems like a bad idea.

Not saying I think this suggestion is a good idea, though. We've seen
many frameworks that hide joins, and the results are ... less than
universally good. If your application programmers don't like using
joins, then either a) you should have the DBA create some views for them
that contain the joins, or b) you have the wrong application programmers -:)

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joel Jacobson (#8)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

ne 28. 3. 2021 v 14:39 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Sun, Mar 28, 2021, at 13:51, Pavel Stehule wrote:

There were some similar tools already. Personally I like the current
state, where tables should be explicitly defined, and join should be
explicitly defined. The joining of tables is not cheap - and I like the
visibility of this. On the other hand, this is very frustrable for a lot of
people, and I can understand. I don't want to see this feature inside
Postgres, because it can reduce the possibility to detect badly written
query. But it can be a great feature of some outer tool. I worked for
GoodData and this tool knows the model, and it generates necessary joins
implicitly, and people like it (this tool uses Postgres too).

https://www.gooddata.com/

Very good points.

As a counter-argument, I could argue that you don't need to use this
feature.
But that would be a poor argument, since you might have to work with code
written by other developers.

I'm also fearful of newbies misusing features, not understanding what they
are doing, producing inefficient code.
However, this is a general problem, complex queries are hard to reason
about,
and I'm not sure making some INNER JOINs implicit would worsen the
situation,
you could also make the counter-argument that the remaining explicit JOINs
become more visible,
and will stand-out, exposing what is really complex in the query.

It is not the problem only for newbies - yesterday a very experienced user
(I know him personally) reported an issue related to misunderstanding some
behaviour and just some typo, I like some mandatory redundancy in syntax,
because it allows to detect some typo errors. SQL is not consistent in this
- the query is relatively safe, but if you use subqueries, then are not
safe because you can use an outer identifier without qualification, and
what is worse, the identifiers are prioritized - there is not amobigonuous
column check. So SQL has enough traps already, and I am afraid to introduce
some new implicit features.

Theoretically you can introduce own procedural language

CREATE OR REPLACE FUNCTION foo(a int)
RETURNS TABLE (x int, y int) AS $$
SELECT t1.x, t2.y WHERE t3.a = a;
$$ LANGUAGE mylanguage.

It is well wrapped, and well isolated.

Also, this proposed syntax would surely appeal to the NoSQL-crowd,
and should reduce their cravings for MongoDB.

So ask yourself the following question: Ten years from now, would you
rather be forced to
work with code using MongoDB or a more concise SQL?

I am a man who likes SQL - for me, it is a human readable language with a
good level of verbosity and abstraction - all time, when I learned SQL. But
I see that SQL is not a fully "safe" language. It allows bad joins, or
doesn't detect all human errors. It can be a good reason for a new layer
over SQL - some more abstract language. And it can work - I have really
good experience with GoodData query language. This is a transpiler from
domain language to SQL.

I think so every tool, every layer should have a similar level of
abstraction to be well usable.

Lastly, let me reiterate I think you made a very good point,
your argument is the heaviest weigh on the negative side of my own scale.

:)

Show quoted text

/Joel

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#9)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

Andrew Dunstan <andrew@dunslane.net> writes:

On 3/27/21 5:11 PM, Alvaro Herrera wrote:

This seems pretty dangerous -- you just have to create one more FK, and
suddenly a query that worked perfectly fine, now starts throwing errors
because it's now ambiguous. Feels a bit like JOIN NATURAL, which many
people discourage because of this problem.

Maybe. I don't recall ever having seen a column with more than one FK.
Is that a common thing? In itself it seems like a bad idea.

Yeah, that aspect seems like a complete show-stopper. We have a way
to enforce that you can't *drop* a constraint that some stored view
depends on for semantic validity. We don't have a way to say that
you can't *add* a constraint-with-certain-properties. And I don't
think it'd be very practical to do (consider race conditions, if
nothing more).

However, that stumbling block is just dependent on the assumption
that the foreign key constraint being used is implicit. If the
syntax names it explicitly then you just have a normal constraint
dependency and all's well.

You might be able to have a shorthand notation in which the constraint
isn't named and the system will accept it as long as there's just one
candidate (but then, when dumping a stored view, the constraint name
would always be shown explicitly). However I'm not sure that the
"shorthand" would be any shorter. I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Not saying I think this suggestion is a good idea, though. We've seen
many frameworks that hide joins, and the results are ... less than
universally good.

Yeah, I'm pretty much not sold on this idea either. I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

regards, tom lane

#12Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#11)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net <mailto:andrew%40dunslane.net>> writes:

Maybe. I don't recall ever having seen a column with more than one FK.
Is that a common thing? In itself it seems like a bad idea.

Yeah, that aspect seems like a complete show-stopper. We have a way
to enforce that you can't *drop* a constraint that some stored view
depends on for semantic validity. We don't have a way to say that
you can't *add* a constraint-with-certain-properties. And I don't
think it'd be very practical to do (consider race conditions, if
nothing more).

Thanks for valuable insights, I didn't think about these things.

What if the path expressions are just syntactic sugar for an INNER JOIN on the referencing -> referenced column?
If a VIEW is created using this syntax, it would be stored as INNER JOINs, similar to how SELECT * is expanded.

/Joel

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#11)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On 3/28/21 10:04 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 3/27/21 5:11 PM, Alvaro Herrera wrote:

This seems pretty dangerous -- you just have to create one more FK, and
suddenly a query that worked perfectly fine, now starts throwing errors
because it's now ambiguous. Feels a bit like JOIN NATURAL, which many
people discourage because of this problem.

Maybe. I don't recall ever having seen a column with more than one FK.
Is that a common thing? In itself it seems like a bad idea.

Yeah, that aspect seems like a complete show-stopper. We have a way
to enforce that you can't *drop* a constraint that some stored view
depends on for semantic validity. We don't have a way to say that
you can't *add* a constraint-with-certain-properties. And I don't
think it'd be very practical to do (consider race conditions, if
nothing more).

However, that stumbling block is just dependent on the assumption
that the foreign key constraint being used is implicit. If the
syntax names it explicitly then you just have a normal constraint
dependency and all's well.

You might be able to have a shorthand notation in which the constraint
isn't named and the system will accept it as long as there's just one
candidate (but then, when dumping a stored view, the constraint name
would always be shown explicitly). However I'm not sure that the
"shorthand" would be any shorter. I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

I guess we could have a special operator, which allows the LHS to be
either a column (in which case it must have only one single-valued FK
constraint) or a constraint name in which case it would match the
corresponding columns on both sides.

It gets kinda tricky though, as there are FKs going both ways:

    customers <- orders <- order_details -> products

and in fact this could make composing the query LESS clear. The natural
place to start this query (show me the name of every customer who
ordered chocolate) is with orders ISTM, but the example given starts
with order_details which seems somewhat unnatural.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#14Vik Fearing
vik@postgresfriends.org
In reply to: Joel Jacobson (#5)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On 3/28/21 1:26 PM, Joel Jacobson wrote:

On Sun, Mar 28, 2021, at 12:25, Vik Fearing wrote:

On 3/27/21 9:27 PM, Joel Jacobson wrote:

Imagine if we could simply write the SQL query like this:

SELECT DISTINCT od.order_id.customer_id.company_name
FROM order_details AS od
WHERE od.product_id.product_name = 'Chocolade';

I took the inspiration for this syntax from SQL/JSON path expressions.

This is a terrible idea, but let me explain why.

First of all, neo4j claims they don't have joins because they actually
don't have joins. The nodes point directly to the other nodes. Your
proposal is syntactic sugar over a real join. The equivalent to neo4j
would be somehow storing the foreign ctid in the column or something.

Secondly, and more importantly IMO, graph queries are coming to SQL.
They are mostly based on Cypher but not entirely because they amalgamate
concepts from other graph query languages like Oracle's PGQ. The
"common" syntax is called GQL (https://www.gqlstandards.org/) and it's
in the process of becoming Part 16 of the SQL standard. The timeline on
that website says August 2022 (it started in September 2019).

If that timeline holds and ambitious people work on it (I already know
one who is; not me), I would expect this to be in PostgreSQL 16 or 17.
The earliest your proposal could get in is 15, so it's not that much of
a wait.

I think you misunderstood my idea entirely.

It has absolutely nothing to do with graph query languages,
except the one similarity I mentioned, not having joins.

In that case, I oppose this suggestion.

What I propose is a way to do implicit joins by following foreign keys,
thus improving the SQL query language, making it more concise for
the simple case when what you want to do is an INNER JOIN on a
single column on which there is a single FOREIGN KEY.

SQL is not an implicit language, though.

I wouldn't mind something like

FROM a JOIN b WITH a_b_fk

or something. That would be really nice when the keys are multicolumn.
But I don't want an implicit join like you're suggesting.
--
Vik Fearing

#15Joel Jacobson
joel@compiler.org
In reply to: Tom Lane (#11)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:

I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Thanks for coming up with this genius idea.

At first I didn't see the beauty of it; I wrongly thought the constraint name needed to be
unique per schema, but I realize we could just use the foreign table's name
as the constraint name, which will allow a nice syntax:

SELECT DISTINCT order_details.orders.customers.company_name
FROM order_details
WHERE order_details.products.product_name = 'Chocolade';

Given this data model:

CREATE TABLE customers (
customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
company_name text,
PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
customer_id bigint NOT NULL,
PRIMARY KEY (order_id),
CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers
);

CREATE TABLE products (
product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
product_name text NOT NULL,
PRIMARY KEY (product_id)
);

CREATE TABLE order_details (
order_id bigint NOT NULL,
product_id bigint NOT NULL,
PRIMARY KEY (order_id, product_id),
CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,
CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products
);

Not saying I think this suggestion is a good idea, though. We've seen
many frameworks that hide joins, and the results are ... less than
universally good.

Yeah, I'm pretty much not sold on this idea either. I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

I think this concern is valid for the original syntax,
but I actually think the idea on using foreign key constraint names
effectively solves an entire class of query writing bugs.

Users writing queries using this syntax are guaranteed to be aware
of the existence of the foreign keys, otherwise they couldn't write
the query this way, since they must use the foreign key
constraint names in the path expression.

This ensures it's not possible to produce a nonsensical JOIN
on the wrong columns, a problem for which traditional JOINs
have no means to protect against.

Even with foreign keys, indexes could of course be missing,
causing an inefficient query anyway, but at least the classes
of potential problems is reduced by one.

I think what's neat is how this syntax works excellent in combination
with traditional JOINs, allowing the one which feels most natural for
each part of the query to be used.

Let's also remember foreign keys did first appear in SQL-89,
so they couldn't have been taken into account when SQL-86
was designed. Maybe they would have came up with the idea
of making more use of foreign key constraints,
if they would have been invented from the very beginning.

However, it's not too late to fix this, it seems doable without
breaking any backwards compatibility. I think there is a risk
our personal preferences are biased due to being experienced
SQL users. I think it's likely newcomers to SQL would really
fancy this proposed syntax, and cause them to prefer PostgreSQL
over some other NoSQL product.

If we can provide such newcomers with a built-in solution,
I think that better than telling them they should
use some ORM/tool/macro to simplify their query writing.

/Joel

#16Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joel Jacobson (#15)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

po 29. 3. 2021 v 12:01 odesílatel Joel Jacobson <joel@compiler.org> napsal:

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:

I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Thanks for coming up with this genius idea.

At first I didn't see the beauty of it; I wrongly thought the constraint
name needed to be
unique per schema, but I realize we could just use the foreign table's name
as the constraint name, which will allow a nice syntax:

SELECT DISTINCT order_details.orders.customers.company_name
FROM order_details
WHERE order_details.products.product_name = 'Chocolade';

This syntax is similar to Oracle's object references (this is example from
thread from Czech Postgres list last week)

Select e.last_name employee,
e.department_ref.department_name department,
e.department_ref.manager_ref.last_name dept_manager
From employees_obj e
where e.initials() like 'K_';

I see few limitations: a) there is not support for outer join, b) there is
not support for aliasing - and it probably doesn't too nice, when you want
to returns more (but not all) columns

Regards

Pavel

Show quoted text

Given this data model:

CREATE TABLE customers (
customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
company_name text,
PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
customer_id bigint NOT NULL,
PRIMARY KEY (order_id),
CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers
);

CREATE TABLE products (
product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
product_name text NOT NULL,
PRIMARY KEY (product_id)
);

CREATE TABLE order_details (
order_id bigint NOT NULL,
product_id bigint NOT NULL,
PRIMARY KEY (order_id, product_id),
CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,
CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products
);

Not saying I think this suggestion is a good idea, though. We've seen
many frameworks that hide joins, and the results are ... less than
universally good.

Yeah, I'm pretty much not sold on this idea either. I think it would
lead to the same problems we see with ORMs, namely that people write
queries that are impossible to execute efficiently and then blame
the database for their poor choice of schema.

I think this concern is valid for the original syntax,
but I actually think the idea on using foreign key constraint names
effectively solves an entire class of query writing bugs.

Users writing queries using this syntax are guaranteed to be aware
of the existence of the foreign keys, otherwise they couldn't write
the query this way, since they must use the foreign key
constraint names in the path expression.

This ensures it's not possible to produce a nonsensical JOIN
on the wrong columns, a problem for which traditional JOINs
have no means to protect against.

Even with foreign keys, indexes could of course be missing,
causing an inefficient query anyway, but at least the classes
of potential problems is reduced by one.

I think what's neat is how this syntax works excellent in combination
with traditional JOINs, allowing the one which feels most natural for
each part of the query to be used.

Let's also remember foreign keys did first appear in SQL-89,
so they couldn't have been taken into account when SQL-86
was designed. Maybe they would have came up with the idea
of making more use of foreign key constraints,
if they would have been invented from the very beginning.

However, it's not too late to fix this, it seems doable without
breaking any backwards compatibility. I think there is a risk
our personal preferences are biased due to being experienced
SQL users. I think it's likely newcomers to SQL would really
fancy this proposed syntax, and cause them to prefer PostgreSQL
over some other NoSQL product.

If we can provide such newcomers with a built-in solution,
I think that better than telling them they should
use some ORM/tool/macro to simplify their query writing.

/Joel

#17Vik Fearing
vik@postgresfriends.org
In reply to: Joel Jacobson (#15)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On 3/29/21 11:59 AM, Joel Jacobson wrote:

On Sun, Mar 28, 2021, at 16:04, Tom Lane wrote:

I'm imagining a syntax in which
you give the constraint name instead of the column name. Thought
experiment: how could the original syntax proposal make any use of
a multi-column foreign key?

Thanks for coming up with this genius idea.

At first I didn't see the beauty of it; I wrongly thought the constraint name needed to be
unique per schema, but I realize we could just use the foreign table's name
as the constraint name, which will allow a nice syntax:

SELECT DISTINCT order_details.orders.customers.company_name
FROM order_details
WHERE order_details.products.product_name = 'Chocolade';

Given this data model:

CREATE TABLE customers (
customer_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
company_name text,
PRIMARY KEY (customer_id)
);

CREATE TABLE orders (
order_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
customer_id bigint NOT NULL,
PRIMARY KEY (order_id),
CONSTRAINT customers FOREIGN KEY (customer_id) REFERENCES customers
);

CREATE TABLE products (
product_id bigint NOT NULL GENERATED ALWAYS AS IDENTITY,
product_name text NOT NULL,
PRIMARY KEY (product_id)
);

CREATE TABLE order_details (
order_id bigint NOT NULL,
product_id bigint NOT NULL,
PRIMARY KEY (order_id, product_id),
CONSTRAINT orders FOREIGN KEY (order_id) REFERENCES orders,
CONSTRAINT products FOREIGN KEY (product_id) REFERENCES products
);

If you write your schema like this, then it becomes standards compliant:

CREATE TYPE customers AS (
company_name text
);
CREATE TABLE customers OF customers (
REF IS customer_id SYSTEM GENERATED
);

CREATE TYPE orders AS (
customer REF(customers) NOT NULL
);
CREATE TABLE orders OF orders (
REF IS order_id SYSTEM GENERATED
);

CREATE TYPE products AS (
product_name text
);
CREATE TABLE products OF products (
REF IS product_id SYSTEM GENERATED
);

CREATE TABLE order_details (
"order" REF(orders),
product REF(products),
quantity integer,
PRIMARY KEY ("order", product)
);

And the query would be:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';

Postgres already supports most of that, but not all of it.
--
Vik Fearing

#18Joel Jacobson
joel@compiler.org
In reply to: Vik Fearing (#17)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Mon, Mar 29, 2021, at 16:17, Vik Fearing wrote:

If you write your schema like this, then it becomes standards compliant:
...
CREATE TABLE order_details (
"order" REF(orders),
product REF(products),
quantity integer,
PRIMARY KEY ("order", product)
);

And the query would be:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';

Postgres already supports most of that, but not all of it.

Thanks for making me aware of this.
I can see this is "4.9 Reference types" in ISO/IEC 9075-2:2016(E).

This looks awesome.

/Joel

#19Joel Jacobson
joel@compiler.org
In reply to: Vik Fearing (#17)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On Mon, Mar 29, 2021, at 16:17, Vik Fearing wrote:

CREATE TABLE order_details (
"order" REF(orders),
product REF(products),
quantity integer,
PRIMARY KEY ("order", product)
);

And the query would be:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';

Postgres already supports most of that, but not all of it.

Do you know if REF is meant to be a replacement for foreign keys?

Are they a different thing meant to co-exist with foreign keys,
or are they actually foreign keys "under the hood"
or something else entirely?

/Joel

#20Vik Fearing
vik@postgresfriends.org
In reply to: Joel Jacobson (#19)
Re: Idea: Avoid JOINs by using path expressions to follow FKs

On 3/29/21 7:55 PM, Joel Jacobson wrote:

On Mon, Mar 29, 2021, at 16:17, Vik Fearing wrote:

CREATE TABLE order_details (
"order" REF(orders),
product REF(products),
quantity integer,
PRIMARY KEY ("order", product)
);

And the query would be:

SELECT DISTINCT order_details."order"->customer->company_name
FROM order_details
WHERE order_details.product->product_name = 'Chocolade';

Postgres already supports most of that, but not all of it.

Do you know if REF is meant to be a replacement for foreign keys?

Are they a different thing meant to co-exist with foreign keys,
or are they actually foreign keys "under the hood"
or something else entirely?

They're supposed to be OOP where each row in the typed table is an
instance of the object. Types can also have methods associated with
them, and the instance tables can have subtables similar to our table
inheritance. The dereference operator is replaced by a subquery.

There is a whole slew of things in this area of the standard that
apparently never caught on.
--
Vik Fearing

#21Joel Jacobson
joel@compiler.org
In reply to: Vik Fearing (#20)
#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joel Jacobson (#21)
#23Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#22)
#24Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#23)
#25Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#24)
#26Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#25)
#27Joel Jacobson
joel@compiler.org
In reply to: Pavel Stehule (#22)
#28Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joel Jacobson (#27)
#29Joel Jacobson
joel@compiler.org
In reply to: Pavel Stehule (#28)
#30Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joel Jacobson (#29)
#31Joel Jacobson
joel@compiler.org
In reply to: Pavel Stehule (#28)
#32Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#31)
#33Rod Taylor
rbt@rbt.ca
In reply to: Joel Jacobson (#1)
#34Joel Jacobson
joel@compiler.org
In reply to: Rod Taylor (#33)
#35Joel Jacobson
joel@compiler.org
In reply to: Vik Fearing (#17)
#36Isaac Morland
isaac.morland@gmail.com
In reply to: Joel Jacobson (#35)
#37Joel Jacobson
joel@compiler.org
In reply to: Isaac Morland (#36)
#38Isaac Morland
isaac.morland@gmail.com
In reply to: Joel Jacobson (#37)
#39Joel Jacobson
joel@compiler.org
In reply to: Isaac Morland (#38)
#40Julien Rouhaud
rjuju123@gmail.com
In reply to: Joel Jacobson (#39)
#41Joel Jacobson
joel@compiler.org
In reply to: Julien Rouhaud (#40)
#42Pavel Stehule
pavel.stehule@gmail.com
In reply to: Joel Jacobson (#41)
#43Julien Rouhaud
rjuju123@gmail.com
In reply to: Joel Jacobson (#41)
#44Martin Jonsson
martinerikjonsson@yahoo.fr
In reply to: Pavel Stehule (#16)
#45Joel Jacobson
joel@compiler.org
In reply to: Julien Rouhaud (#43)
#46Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#45)
#47Isaac Morland
isaac.morland@gmail.com
In reply to: Joel Jacobson (#45)
#48Joel Jacobson
joel@compiler.org
In reply to: Isaac Morland (#47)
#49Vik Fearing
vik@postgresfriends.org
In reply to: Pavel Stehule (#42)
#50Joel Jacobson
joel@compiler.org
In reply to: Vik Fearing (#49)