Change JOIN tutorial to focus more on explicit joins
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/12/tutorial-join.html
Description:
The tutorial about joins makes the following statement about the explicit
JOIN operator:
This syntax is not as commonly used as the one above
I think in 2020 this claim is no longer true, and I would love to see the
manual prefer the "modern" explicit JOIN operator rather than sticking to
the ancient implicit joins in the WHERE clause.
On Thu, May 21, 2020 at 1:37 AM PG Doc comments form
<noreply@postgresql.org> wrote:
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/12/tutorial-join.html
Description:The tutorial about joins makes the following statement about the explicit
JOIN operator:This syntax is not as commonly used as the one above
I think in 2020 this claim is no longer true, and I would love to see the
manual prefer the "modern" explicit JOIN operator rather than sticking to
the ancient implicit joins in the WHERE clause.
+1
The "new" syntax is 28 years old, from SQL 92. I don't see too many
SQL 86 joins. Would you like to write a documentation patch?
On 20.05.20 23:56, Thomas Munro wrote:
On Thu, May 21, 2020 at 1:37 AM PG Doc comments form
<noreply@postgresql.org> wrote:The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/12/tutorial-join.html
Description:The tutorial about joins makes the following statement about the explicit
JOIN operator:This syntax is not as commonly used as the one above
I think in 2020 this claim is no longer true, and I would love to see the
manual prefer the "modern" explicit JOIN operator rather than sticking to
the ancient implicit joins in the WHERE clause.+1
The "new" syntax is 28 years old, from SQL 92. I don't see too many
SQL 86 joins. Would you like to write a documentation patch?
The attached patch
- prefers the explicit join-syntax over the implicit one and explains
the keywords of the explicit syntax
- uses a more accurate definition of 'join'
- separates <programlisting> and <screen> tags
- shifts <indexterm> definitions outside of <para> to get a better
rendering in PDF
- adds a note concerning IDs and foreign keys
--
J. Purtz
Attachments:
0001-query.patchtext/x-patch; charset=UTF-8; name=0001-query.patchDownload
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index c0889743c4..6f8ea373b5 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -441,11 +441,11 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
+ table several times. Such queries — they are called
+ <firstterm>join</firstterm> queries — combine
+ rows of one table in some way with rows of the other table
+ and return a single row per combination. An example may be a
+ list of all the weather records together with the location of the
associated city. To do that, we need to compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
@@ -461,10 +461,17 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
+ The keyword <command>JOIN</command> connects the two tables.
+ Behind the keyword <command>ON</command> follows the
+ definition how to compare their rows. In this case, the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -476,8 +483,14 @@ SELECT *
</para>
<para>
- Observe two things about the result set:
+ Observe some things about the result set:
<itemizedlist>
+ <listitem>
+ <para>
+ The resulting rows contain values from both tables.
+ </para>
+ </listitem>
+
<listitem>
<para>
There is no result row for the city of Hayward. This is
@@ -499,8 +512,8 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
</para>
</listitem>
@@ -509,10 +522,9 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<formalpara>
<title>Exercise:</title>
-
<para>
Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
+ <literal>ON</literal> clause is omitted.
</para>
</formalpara>
@@ -526,8 +538,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+FROM weather
+JOIN cities ON (cities.name = weather.city);
</programlisting>
It is widely considered good style to qualify all column names
@@ -537,20 +549,22 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+FROM weather, cities
+WHERE weather.city = cities.name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax is not as commonly used as the one above. It dates back
+ to the very first times of SQL, avoids the <literal>JOIN</literal>
+ keyword and uses the <literal>WHERE</literal> clause instead of the
+ <literal>ON</literal> clause.
</para>
+ <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
-
Now we will figure out how we can get the Hayward records back in.
What we want the query to do is to scan the
<structname>weather</structname> table and for each row to find the
@@ -563,15 +577,19 @@ SELECT *
<programlisting>
SELECT *
- FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
+FROM weather
+LEFT OUTER JOIN cities ON (weather.city = cities.name);
+</programlisting>
+
+<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(3 rows)
-</programlisting>
+</screen>
This query is called a <firstterm>left outer
join</firstterm> because the table mentioned on the left of the
@@ -591,10 +609,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -607,30 +624,42 @@ SELECT *
following query:
<programlisting>
-SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
- W2.city, W2.temp_lo AS low, W2.temp_hi AS high
- FROM weather W1, weather W2
- WHERE W1.temp_lo < W2.temp_lo
- AND W1.temp_hi > W2.temp_hi;
+SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+FROM weather w1
+JOIN weather w2 ON (w1.temp_lo < w2.temp_lo AND
+ w1.temp_hi > w2.temp_hi);
+</programlisting>
+<screen>
city | low | high | city | low | high
---------------+-----+------+---------------+-----+------
San Francisco | 43 | 57 | San Francisco | 46 | 50
Hayward | 37 | 54 | San Francisco | 46 | 50
(2 rows)
-</programlisting>
+</screen>
- Here we have relabeled the weather table as <literal>W1</literal> and
- <literal>W2</literal> to be able to distinguish the left and right side
+ Here we have relabeled the weather table as <literal>w1</literal> and
+ <literal>w2</literal> to be able to distinguish the left and right side
of the join. You can also use these kinds of aliases in other
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+FROM weather w
+JOIN cities c ON (w.city = c.name);
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
+
+ <note>
+ <para>
+ The examples shown here combine rows via city names.
+ This should help to understand the concept. Professional
+ solutions prefer to use numerical IDs and foreign keys
+ to join tables.
+ </para>
+ </note>
+
</sect1>
On Wed, May 27, 2020 at 8:29 PM Jürgen Purtz <juergen@purtz.de> wrote:
The "new" syntax is 28 years old, from SQL 92. I don't see too many
SQL 86 joins. Would you like to write a documentation patch?The attached patch
- prefers the explicit join-syntax over the implicit one and explains
the keywords of the explicit syntax- uses a more accurate definition of 'join'
- separates <programlisting> and <screen> tags
- shifts <indexterm> definitions outside of <para> to get a better
rendering in PDF- adds a note concerning IDs and foreign keys
Hi Jürgen,
Please add to the commitfest app, so we don't lose track of it.
On 2020-05-27 10:29, Jürgen Purtz wrote:
The attached patch
- prefers the explicit join-syntax over the implicit one and explains
the keywords of the explicit syntax- uses a more accurate definition of 'join'
- separates <programlisting> and <screen> tags
- shifts <indexterm> definitions outside of <para> to get a better
rendering in PDF- adds a note concerning IDs and foreign keys
I have committed some parts of this patch:
- separates <programlisting> and <screen> tags
- shifts <indexterm> definitions outside of <para> to get a better
rendering in PDF
as well as the change of W1/W2 to w1/w2. (Note that there is also
src/tutorial/basics.source that should be adjusted in the same way.)
For the remaining patch I have a couple of concerns:
<para> Attempt to determine the semantics of this query when the - <literal>WHERE</literal> clause is omitted. + <literal>ON</literal> clause is omitted. </para> </formalpara>
This no longer works.
In general, I agree that some more emphasis on the JOIN syntax is okay.
But I think the order in which the tutorial has taught it so far is
okay: First you do it the manual way, then you learn the more abstract way.
+ <note> + <para> + The examples shown here combine rows via city names. + This should help to understand the concept. Professional + solutions prefer to use numerical IDs and foreign keys + to join tables. + </para> + </note>
While there are interesting debates to be had about natural vs.
surrogate keys, I don't think we should imply that one of them is
unprofessional and then leave it at that and give no further guidance.
I think we should leave this out.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04.09.20 08:52, Peter Eisentraut wrote:
For the remaining patch I have a couple of concerns:
<para> Attempt to determine the semantics of this query when the - <literal>WHERE</literal> clause is omitted. + <literal>ON</literal> clause is omitted. </para> </formalpara>This no longer works.
Ok, but I don't have any better suggestion than to delete this para.
In general, I agree that some more emphasis on the JOIN syntax is
okay. But I think the order in which the tutorial has taught it so far
is okay: First you do it the manual way, then you learn the more
abstract way.
In this context, I wouldn't use the terms 'manual' and 'abstract', it's
more about 'implicit' and 'explicit' syntax. The 'explicit' syntax does
not only emphasis the aspect of 'joining' tables, it also differentiates
between the usage of following AND/OR/NOT key words as join conditions
or as additional restrictions (the results are identical but not the
semantic). Because the purpose of this patch is the preference of the
explicit syntax, we shall show this syntax first.
+ <note> + <para> + The examples shown here combine rows via city names. + This should help to understand the concept. Professional + solutions prefer to use numerical IDs and foreign keys + to join tables. + </para> + </note>While there are interesting debates to be had about natural vs.
surrogate keys, I don't think we should imply that one of them is
unprofessional and then leave it at that and give no further guidance.
I think we should leave this out.
Ok, deleted.
--
Jürgen Purtz
Attachments:
0002-query.patchtext/x-patch; charset=UTF-8; name=0002-query.patchDownload
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index e73e805ec4..413763691e 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -441,11 +441,11 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
+ table several times. Such queries — they are called
+ <firstterm>join</firstterm> queries — combine
+ rows of one table in some way with rows of the other table
+ and return a single row per combination. An example may be a
+ list of all the weather records together with the location of the
associated city. To do that, we need to compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
@@ -461,10 +461,17 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
+ The keyword <command>JOIN</command> connects the two tables.
+ Behind the keyword <command>ON</command> follows the
+ definition how to compare their rows. In this case, the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -476,8 +483,14 @@ SELECT *
</para>
<para>
- Observe two things about the result set:
+ Observe some things about the result set:
<itemizedlist>
+ <listitem>
+ <para>
+ The resulting rows contain values from both tables.
+ </para>
+ </listitem>
+
<listitem>
<para>
There is no result row for the city of Hayward. This is
@@ -499,23 +512,14 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>
- <formalpara>
- <title>Exercise:</title>
-
- <para>
- Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
- </para>
- </formalpara>
-
<para>
Since the columns all had different names, the parser
automatically found which table they belong to. If there
@@ -526,8 +530,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+FROM weather
+JOIN cities ON (cities.name = weather.city);
</programlisting>
It is widely considered good style to qualify all column names
@@ -537,15 +541,18 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+FROM weather, cities
+WHERE weather.city = cities.name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax is mainly used in legacy applications. It dates back
+ to the first days of SQL, avoids the <literal>JOIN</literal>
+ keyword, and uses the <literal>WHERE</literal> clause instead of the
+ <literal>ON</literal> clause.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -558,7 +565,7 @@ SELECT *
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are inner joins.) The command looks
+ joins we have seen so far are <firstterm>inner joins</firstterm>.) The command looks
like this:
<programlisting>
@@ -593,10 +600,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -610,10 +616,10 @@ SELECT *
<programlisting>
SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
- w2.city, w2.temp_lo AS low, w2.temp_hi AS high
- FROM weather w1, weather w2
- WHERE w1.temp_lo < w2.temp_lo
- AND w1.temp_hi > w2.temp_hi;
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+FROM weather w1
+JOIN weather w2 ON (w1.temp_lo < w2.temp_lo AND
+ w1.temp_hi > w2.temp_hi);
</programlisting>
<screen>
@@ -630,8 +636,8 @@ SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+FROM weather w
+JOIN cities c ON (w.city = c.name);
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de> wrote:
On 04.09.20 08:52, Peter Eisentraut wrote:
For the remaining patch I have a couple of concerns:
This patch should not be changing the formatting choices for these queries,
just the addition of a JOIN clause and modification of the WHERE clause.
Specifically, SELECT is left-aligned while all subsequent clauses indent
under it. Forced alignment by adding extra spaces isn't done here either.
I have not altered those in the attached.
Did some word-smithing on the first paragraph. The part about the
cross-join was hurt by "in some way" and "may be" is not needed.
Pointing out that values from both tables doesn't seem like an improvement
when the second item covers that and it is more specific in noting that the
city name that is joined on appears twice - once from each table.
ON expression is more precise and the reader should be ok with the term.
Removal of the exercise is good. Not the time to discuss cross join
anyway. Given that "ON true" works the cross join form isn't even required.
In the FROM clause form I would not add table prefixes to the column
names. They are not part of the form changing. If discussion about table
prefixing is desired it should be done explicitly and by itself. They are
used later on, I didn't check to see whether that was covered or might be
confusing.
I suggested a wording for why to use join syntax that doesn't involve
legacy and points out its merit compared to sticking a join expression into
the where clause.
The original patch missed having the syntax for the first left outer join
conform to the multi-line query writing standard you introduced. I did not
change.
The "AND" ON clause should just go with (not changed):
ON (w1.temp_lo < w2.temp_lo
AND w1.temp_hi > w2.temp_high);
Attaching my suggestions made on top of the attached original
0002-query.patch
David J.
Attachments:
0002-00-query.patchapplication/octet-stream; name=0002-00-query.patchDownload
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index e73e805ec4..413763691e 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -441,11 +441,11 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
+ table several times. Such queries — they are called
+ <firstterm>join</firstterm> queries — combine
+ rows of one table in some way with rows of the other table
+ and return a single row per combination. An example may be a
+ list of all the weather records together with the location of the
associated city. To do that, we need to compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
@@ -461,10 +461,17 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
+ The keyword <command>JOIN</command> connects the two tables.
+ Behind the keyword <command>ON</command> follows the
+ definition how to compare their rows. In this case, the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -476,8 +483,14 @@ SELECT *
</para>
<para>
- Observe two things about the result set:
+ Observe some things about the result set:
<itemizedlist>
+ <listitem>
+ <para>
+ The resulting rows contain values from both tables.
+ </para>
+ </listitem>
+
<listitem>
<para>
There is no result row for the city of Hayward. This is
@@ -499,23 +512,14 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>
- <formalpara>
- <title>Exercise:</title>
-
- <para>
- Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
- </para>
- </formalpara>
-
<para>
Since the columns all had different names, the parser
automatically found which table they belong to. If there
@@ -526,8 +530,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+FROM weather
+JOIN cities ON (cities.name = weather.city);
</programlisting>
It is widely considered good style to qualify all column names
@@ -537,15 +541,18 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+FROM weather, cities
+WHERE weather.city = cities.name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax is mainly used in legacy applications. It dates back
+ to the first days of SQL, avoids the <literal>JOIN</literal>
+ keyword, and uses the <literal>WHERE</literal> clause instead of the
+ <literal>ON</literal> clause.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -558,7 +565,7 @@ SELECT *
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are inner joins.) The command looks
+ joins we have seen so far are <firstterm>inner joins</firstterm>.) The command looks
like this:
<programlisting>
@@ -593,10 +600,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -610,10 +616,10 @@ SELECT *
<programlisting>
SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
- w2.city, w2.temp_lo AS low, w2.temp_hi AS high
- FROM weather w1, weather w2
- WHERE w1.temp_lo < w2.temp_lo
- AND w1.temp_hi > w2.temp_hi;
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+FROM weather w1
+JOIN weather w2 ON (w1.temp_lo < w2.temp_lo AND
+ w1.temp_hi > w2.temp_hi);
</programlisting>
<screen>
@@ -630,8 +636,8 @@ SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+FROM weather w
+JOIN cities c ON (w.city = c.name);
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
0002-01-DGJ-query.patchapplication/octet-stream; name=0002-01-DGJ-query.patchDownload
commit 4f5f5c6bfc964d9c136b0d3490ddaa6a53a4c25c
Author: David G. Johnston <david.g.johnston@gmail.com>
Date: Wed Oct 21 23:23:33 2020 +0000
Some word-smithing to go along with my email comments
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index 413763691e..bc42381427 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -440,13 +440,12 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
- Queries can access multiple tables at once, or access the same
- table several times. Such queries — they are called
- <firstterm>join</firstterm> queries — combine
- rows of one table in some way with rows of the other table
- and return a single row per combination. An example may be a
- list of all the weather records together with the location of the
- associated city. To do that, we need to compare the <structfield>city</structfield>
+ Queries which access multiple tables (including repeats) at once are called
+ <firstterm>join</firstterm> queries. They internally combine
+ each row from one table with each row of a second table. A expression is
+ specified to then limit which pairs of rows are returned.
+ For example, to return all the weather records together with the location of the
+ associated city, the database compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
table, and select the pairs of rows where these values match.
@@ -466,8 +465,8 @@ JOIN cities ON (city = name);
</programlisting>
The keyword <command>JOIN</command> connects the two tables.
- Behind the keyword <command>ON</command> follows the
- definition how to compare their rows. In this case, the
+ After the keyword <command>ON</command> follows the
+ expression comparing their rows. In this case, the
column <varname>city</varname> of table <varname>weather</varname>
must be equal to the column <varname>name</varname>
of table <varname>cities</varname>.
@@ -483,14 +482,8 @@ JOIN cities ON (city = name);
</para>
<para>
- Observe some things about the result set:
+ Observe two things about the result set:
<itemizedlist>
- <listitem>
- <para>
- The resulting rows contain values from both tables.
- </para>
- </listitem>
-
<listitem>
<para>
There is no result row for the city of Hayward. This is
@@ -546,13 +539,15 @@ JOIN cities ON (cities.name = weather.city);
<programlisting>
SELECT *
FROM weather, cities
-WHERE weather.city = cities.name;
+WHERE city = name;
</programlisting>
- This syntax is mainly used in legacy applications. It dates back
- to the first days of SQL, avoids the <literal>JOIN</literal>
- keyword, and uses the <literal>WHERE</literal> clause instead of the
- <literal>ON</literal> clause.
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal>
+ keywords. The tables are simply listed in the <literal>FROM</literal>,
+ comma-separated, and the comparison expression added to the
+ <literal>WHERE</literal> clause. As join expressions serve a specific
+ purpose in a multi-table query it is preferable to make them stand-out
+ by using join clauses to introduce additional tables into the query.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -565,8 +560,8 @@ WHERE weather.city = cities.name;
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are <firstterm>inner joins</firstterm>.) The command looks
- like this:
+ joins we have seen so far are <firstterm>inner joins</firstterm>.)
+ The command looks like this:
<programlisting>
SELECT *
On 22.10.20 01:40, David G. Johnston wrote:
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de
<mailto:juergen@purtz.de>> wrote:On 04.09.20 08:52, Peter Eisentraut wrote:
For the remaining patch I have a couple of concerns:
This patch should not be changing the formatting choices for these
queries, just the addition of a JOIN clause and modification of the
WHERE clause. Specifically, SELECT is left-aligned while all
subsequent clauses indent under it. Forced alignment by adding extra
spaces isn't done here either. I have not altered those in the attached.Did some word-smithing on the first paragraph. The part about the
cross-join was hurt by "in some way" and "may be" is not needed.Pointing out that values from both tables doesn't seem like an
improvement when the second item covers that and it is more specific
in noting that the city name that is joined on appears twice - once
from each table.ON expression is more precise and the reader should be ok with the term.
Removal of the exercise is good. Not the time to discuss cross join
anyway. Given that "ON true" works the cross join form isn't even
required.In the FROM clause form I would not add table prefixes to the column
names. They are not part of the form changing. If discussion about
table prefixing is desired it should be done explicitly and by
itself. They are used later on, I didn't check to see whether that
was covered or might be confusing.I suggested a wording for why to use join syntax that doesn't involve
legacy and points out its merit compared to sticking a join expression
into the where clause.The original patch missed having the syntax for the first left outer
join conform to the multi-line query writing standard you introduced.
I did not change.The "AND" ON clause should just go with (not changed):
ON (w1.temp_lo < w2.temp_lo
AND w1.temp_hi > w2.temp_high);Attaching my suggestions made on top of the attached original
0002-query.patchDavid J.
(Hopefully) I have integrated all of David's suggestions as well as the
following rules:
- Syntax formatting with the previously used 4 spaces plus newline for JOIN
- Table aliases only when necessary or explicitly discussed
The discussion about the explicit vs. implicit syntax is added to the
"As join expressions serve a specific purpose ... " sentence and creates
a paragraph of its own.
The patch is build on top of master.
--
J. Purtz
Attachments:
0003-query.patchtext/x-patch; charset=UTF-8; name=0003-query.patchDownload
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index e73e805ec4..532427ab4e 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -440,13 +440,12 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
- Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
- associated city. To do that, we need to compare the <structfield>city</structfield>
+ Queries which access multiple tables (including repeats) at once are called
+ <firstterm>join</firstterm> queries. They internally combine
+ each row from one table with each row of a second table. An expression is
+ specified to limit which pairs of rows are returned.
+ For example, to return all the weather records together with the location of the
+ associated city, the database compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
table, and select the pairs of rows where these values match.
@@ -461,10 +460,17 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+ FROM weather
+ JOIN cities ON (city = name);
</programlisting>
+ After the keyword <command>ON</command> follows the
+ expression comparing their rows. In this case, the
+ definition how to compare their rows. In this case, the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -499,23 +505,14 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+ FROM weather
+ JOIN cities ON (city = name);
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>
- <formalpara>
- <title>Exercise:</title>
-
- <para>
- Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
- </para>
- </formalpara>
-
<para>
Since the columns all had different names, the parser
automatically found which table they belong to. If there
@@ -526,8 +523,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+ FROM weather
+ JOIN cities ON (cities.name = weather.city);
</programlisting>
It is widely considered good style to qualify all column names
@@ -537,15 +534,29 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+ FROM weather, cities
+ WHERE city = name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal>
+ keywords. The tables are simply listed in the <literal>FROM</literal>,
+ comma-separated, and the comparison expression added to the
+ <literal>WHERE</literal> clause.
+ </para>
+
+ <para>
+ As join expressions serve a specific
+ purpose in a multi-table query it is preferable to make them stand-out
+ by using join clauses to introduce additional tables into the query.
+ The results from the old, implicit syntax and the new, explicit
+ JOIN/ON syntax are identical. But for a reader of the statement
+ its meaning is now easier to understand: the join condition is
+ introduced by its own key word whereas previously the condition was
+ merged into the WHERE clause together with other conditions.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -558,12 +569,13 @@ SELECT *
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are inner joins.) The command looks
- like this:
+ joins we have seen so far are <firstterm>inner joins</firstterm>.)
+ The command looks like this:
<programlisting>
SELECT *
- FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
+ FROM weather
+ LEFT OUTER JOIN cities ON (city = name);
</programlisting>
<screen>
@@ -593,10 +605,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -610,10 +621,9 @@ SELECT *
<programlisting>
SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
- w2.city, w2.temp_lo AS low, w2.temp_hi AS high
- FROM weather w1, weather w2
- WHERE w1.temp_lo < w2.temp_lo
- AND w1.temp_hi > w2.temp_hi;
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+ FROM weather w1
+ JOIN weather w2 ON (w1.temp_lo < w2.temp_lo AND w1.temp_hi > w2.temp_hi);
</programlisting>
<screen>
@@ -630,8 +640,8 @@ SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+ FROM weather w
+ JOIN cities c ON (w.city = c.name);
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
čt 22. 10. 2020 v 15:32 odesílatel Jürgen Purtz <juergen@purtz.de> napsal:
On 22.10.20 01:40, David G. Johnston wrote:
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <juergen@purtz.de> wrote:
On 04.09.20 08:52, Peter Eisentraut wrote:
For the remaining patch I have a couple of concerns:
This patch should not be changing the formatting choices for these
queries, just the addition of a JOIN clause and modification of the WHERE
clause. Specifically, SELECT is left-aligned while all subsequent clauses
indent under it. Forced alignment by adding extra spaces isn't done here
either. I have not altered those in the attached.Did some word-smithing on the first paragraph. The part about the
cross-join was hurt by "in some way" and "may be" is not needed.Pointing out that values from both tables doesn't seem like an improvement
when the second item covers that and it is more specific in noting that the
city name that is joined on appears twice - once from each table.ON expression is more precise and the reader should be ok with the term.
Removal of the exercise is good. Not the time to discuss cross join
anyway. Given that "ON true" works the cross join form isn't even required.In the FROM clause form I would not add table prefixes to the column
names. They are not part of the form changing. If discussion about table
prefixing is desired it should be done explicitly and by itself. They are
used later on, I didn't check to see whether that was covered or might be
confusing.I suggested a wording for why to use join syntax that doesn't involve
legacy and points out its merit compared to sticking a join expression into
the where clause.The original patch missed having the syntax for the first left outer join
conform to the multi-line query writing standard you introduced. I did not
change.The "AND" ON clause should just go with (not changed):
ON (w1.temp_lo < w2.temp_lo
AND w1.temp_hi > w2.temp_high);Attaching my suggestions made on top of the attached original
0002-query.patchDavid J.
(Hopefully) I have integrated all of David's suggestions as well as the
following rules:- Syntax formatting with the previously used 4 spaces plus newline for JOIN
- Table aliases only when necessary or explicitly discussed
The discussion about the explicit vs. implicit syntax is added to the "As
join expressions serve a specific purpose ... " sentence and creates a
paragraph of its own.The patch is build on top of master.
Why do you use parenthesis for ON clause? It is useless. SQL is not C or
JAVA.
Regards
Pavel
--
Show quoted text
J. Purtz
On Thu, Oct 22, 2020 at 8:14 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:
Why do you use parenthesis for ON clause? It is useless. SQL is not C or
JAVA.
At this point in my career it's just a personal habit. I never programmed
C, done most of my development in Java so maybe that's a subconscious
influence?
I suspect it is partly because I seldom need to use "ON" but instead join
with "USING" which does require the parentheses, so when I need to use ON I
just keep them.
I agree they are unnecessary in the example and should be removed to be
consistent.
David J.
čt 22. 10. 2020 v 18:27 odesílatel David G. Johnston <
david.g.johnston@gmail.com> napsal:
On Thu, Oct 22, 2020 at 8:14 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:Why do you use parenthesis for ON clause? It is useless. SQL is not C or
JAVA.At this point in my career it's just a personal habit. I never programmed
C, done most of my development in Java so maybe that's a subconscious
influence?I suspect it is partly because I seldom need to use "ON" but instead join
with "USING" which does require the parentheses, so when I need to use ON I
just keep them.I agree they are unnecessary in the example and should be removed to be
consistent.
:)
Show quoted text
David J.
On 22.10.20 17:14, Pavel Stehule wrote:
Why do you use parenthesis for ON clause? It is useless. SQL is not C
or JAVA.
Two more general answers:
- Why do people use tabs, spaces, and newlines to format their code even
though it's not necessary? SQL is a language to develop applications.
And what are the main costs of an application? It's not the time which
it takes to develop them. It's the time for their maintenance. During
the course of one or more decades, different persons will have to read
the code, add additional features, and fix bugs. They need some time to
read and understand the existing code. This task can be accelerated if
the code is easy to read. Therefore, it's a good habit of developers to
sometimes spend some extra characters to the code than is required -
not only comments. An example: there are clear precedence rules for
Boolean operators NOT/AND/OR. In an extensive statement it may be
helpful - for the developer himself as well as for anybody else -to use
newlines and parentheses at places where they are not necessary to keep
an overview of the intention of the statement. In such cases,
code-optimization is the duty of the compiler, not of the developer.
- In my professional life as a software developer, I have seen about 15
different languages. But only in rare cases, they have offered new
features or concepts. To overcome this Babylonian linguistic diversity I
tend to use such syntactical constructs which are common to many of them
even, even if they are not necessary for the concrete language.
And the concrete answer: Omitting the parentheses for the join condition
raises the danger that its Boolean operators are mixed with the Boolean
operators of the WHERE condition. The result at runtime is the same, but
a reader will understand the intention of the statement faster if the
parentheses exists.
--
J. Purtz
pá 23. 10. 2020 v 11:14 odesílatel Jürgen Purtz <juergen@purtz.de> napsal:
On 22.10.20 17:14, Pavel Stehule wrote:
Why do you use parenthesis for ON clause? It is useless. SQL is not C
or JAVA.Two more general answers:
- Why do people use tabs, spaces, and newlines to format their code even
though it's not necessary? SQL is a language to develop applications.
And what are the main costs of an application? It's not the time which
it takes to develop them. It's the time for their maintenance. During
the course of one or more decades, different persons will have to read
the code, add additional features, and fix bugs. They need some time to
read and understand the existing code. This task can be accelerated if
the code is easy to read. Therefore, it's a good habit of developers to
sometimes spend some extra characters to the code than is required -
not only comments. An example: there are clear precedence rules for
Boolean operators NOT/AND/OR. In an extensive statement it may be
helpful - for the developer himself as well as for anybody else -to use
newlines and parentheses at places where they are not necessary to keep
an overview of the intention of the statement. In such cases,
code-optimization is the duty of the compiler, not of the developer.
- In my professional life as a software developer, I have seen about 15
different languages. But only in rare cases, they have offered new
features or concepts. To overcome this Babylonian linguistic diversity I
tend to use such syntactical constructs which are common to many of them
even, even if they are not necessary for the concrete language.And the concrete answer: Omitting the parentheses for the join condition
raises the danger that its Boolean operators are mixed with the Boolean
operators of the WHERE condition. The result at runtime is the same, but
a reader will understand the intention of the statement faster if the
parentheses exists.
I strongly disagree.
If there are some boolean predicates, then parenthesis has sense. Without
these predicates the parenthesis decrease readability. This is the sense of
JOIN syntax to separate predicates.
I have a different problem - when I see parentheses where they should not
be, I am searching for a reason, and It is unfriendly where there is not
any reason. I can understand if somebody uses useless parentheses in their
product, but we talk about official documentation, and then we should
respect the character of language.
Regards
Pavel
Show quoted text
--
J. Purtz
Status update for a commitfest entry.
The commitfest is nearing the end and this thread was inactive for a while. As far as I see something got committed and now the discussion is stuck in arguing about parenthesis.
FWIW, I think it is a matter of personal taste. Maybe we can compromise on simply leaving this part unchanged.
If you are planning to continue working on it, please move it to the next CF.
On 30.11.20 20:45, Anastasia Lubennikova wrote:
As far as I see something got committed and now the discussion is stuck in arguing about parenthesis.
FWIW, I think it is a matter of personal taste. Maybe we can compromise on simply leaving this part unchanged.
With or without parenthesis is a little more than a personal taste, but
it's a very tiny detail. I'm happy with either of the two variants.
--
J. Purtz
On Mon, Nov 30, 2020 at 1:15 PM Jürgen Purtz <juergen@purtz.de> wrote:
On 30.11.20 20:45, Anastasia Lubennikova wrote:
As far as I see something got committed and now the discussion is stuck
in arguing about parenthesis.
FWIW, I think it is a matter of personal taste. Maybe we can compromise
on simply leaving this part unchanged.
With or without parenthesis is a little more than a personal taste, but
it's a very tiny detail. I'm happy with either of the two variants.
Sorry, I managed to overlook the most recent patch.
I admitted my use of parentheses was incorrect and I don't see anyone else
defending them. Please remove them.
Minor typos:
"the database compare" -> needs an "s" (compares)
"In this case, the definition how to compare their rows." -> remove,
redundant with the first sentence
"The results from the older implicit syntax, and the newer explicit JOIN/ON
syntax, are identical" -> move the commas around to what is shown here
David J.
On 30.11.20 21:25, David G. Johnston wrote:
Sorry, I managed to overlook the most recent patch.
I admitted my use of parentheses was incorrect and I don't see anyone
else defending them. Please remove them.Minor typos:
"the database compare" -> needs an "s" (compares)
"In this case, the definition how to compare their rows." -> remove,
redundant with the first sentence"The results from the older implicit syntax, and the newer explicit
JOIN/ON syntax, are identical" -> move the commas around to what is
shown hereDavid J.
OK. Patch attached.
--
J. Purtz
Attachments:
0004-query.patchtext/x-patch; charset=UTF-8; name=0004-query.patchDownload
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index e73e805ec4..f343833dae 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -440,13 +440,12 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
- Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
- associated city. To do that, we need to compare the <structfield>city</structfield>
+ Queries which access multiple tables (including repeats) at once are called
+ <firstterm>join</firstterm> queries. They internally combine
+ each row from one table with each row of a second table. An expression is
+ specified to limit which pairs of rows are returned.
+ For example, to return all the weather records together with the location of the
+ associated city, the database compares the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
table, and select the pairs of rows where these values match.
@@ -461,10 +460,16 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+ FROM weather
+ JOIN cities ON city = name;
</programlisting>
+ After the keyword <command>ON</command> follows the
+ expression comparing their rows. In this example the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -499,23 +504,14 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+ FROM weather
+ JOIN cities ON city = name;
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>
- <formalpara>
- <title>Exercise:</title>
-
- <para>
- Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
- </para>
- </formalpara>
-
<para>
Since the columns all had different names, the parser
automatically found which table they belong to. If there
@@ -526,8 +522,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+ FROM weather
+ JOIN cities ON cities.name = weather.city;
</programlisting>
It is widely considered good style to qualify all column names
@@ -537,15 +533,29 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+ FROM weather, cities
+ WHERE city = name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal>
+ keywords. The tables are simply listed in the <literal>FROM</literal>,
+ comma-separated, and the comparison expression added to the
+ <literal>WHERE</literal> clause.
+ </para>
+
+ <para>
+ As join expressions serve a specific
+ purpose in a multi-table query it is preferable to make them stand-out
+ by using join clauses to introduce additional tables into the query.
+ The results from the older implicit syntax and the newer explicit
+ JOIN/ON syntax are identical. But for a reader of the statement
+ its meaning is now easier to understand: the join condition is
+ introduced by its own key word whereas previously the condition was
+ merged into the WHERE clause together with other conditions.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -558,12 +568,13 @@ SELECT *
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are inner joins.) The command looks
- like this:
+ joins we have seen so far are <firstterm>inner joins</firstterm>.)
+ The command looks like this:
<programlisting>
SELECT *
- FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
+ FROM weather
+ LEFT OUTER JOIN cities ON city = name;
</programlisting>
<screen>
@@ -593,10 +604,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -610,10 +620,9 @@ SELECT *
<programlisting>
SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
- w2.city, w2.temp_lo AS low, w2.temp_hi AS high
- FROM weather w1, weather w2
- WHERE w1.temp_lo < w2.temp_lo
- AND w1.temp_hi > w2.temp_hi;
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+ FROM weather w1
+ JOIN weather w2 ON w1.temp_lo < w2.temp_lo AND w1.temp_hi > w2.temp_hi;
</programlisting>
<screen>
@@ -630,8 +639,8 @@ SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+ FROM weather w
+ JOIN cities c ON w.city = c.name;
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
On 12/1/20 3:38 AM, Jürgen Purtz wrote:
On 30.11.20 21:25, David G. Johnston wrote:
Sorry, I managed to overlook the most recent patch.
I admitted my use of parentheses was incorrect and I don't see anyone
else defending them. Please remove them.Minor typos:
"the database compare" -> needs an "s" (compares)
"In this case, the definition how to compare their rows." -> remove,
redundant with the first sentence"The results from the older implicit syntax, and the newer explicit
JOIN/ON syntax, are identical" -> move the commas around to what is
shown hereOK. Patch attached.
Peter, you committed some of this patch originally. Do you think the
rest of the patch is now in shape to be committed?
Regards,
--
-David
david@pgmasters.net
On Thu, Mar 11, 2021 at 2:06 AM David Steele <david@pgmasters.net> wrote:
On 12/1/20 3:38 AM, Jürgen Purtz wrote:
OK. Patch attached.
+ Queries which access multiple tables (including repeats) at once are called
I'd write "Queries that" here (that's is a transatlantic difference in
usage; I try to proofread these things in American mode for
consistency with the rest of the language in this project, which I
probably don't entirely succeed at but this one I've learned...).
Maybe instead of "(including repeats)" it could say "(or multiple
instances of the same table)"?
+ For example, to return all the weather records together with the
location of the
+ associated city, the database compares the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the
<structname>cities</structname>
table, and select the pairs of rows where these values match.
Here "select" should agree with "the database" and take an -s, no?
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal>
+ keywords. The tables are simply listed in the <literal>FROM</literal>,
+ comma-separated, and the comparison expression added to the
+ <literal>WHERE</literal> clause.
Could we mention SQL92 somewhere? Like maybe "This syntax pre-dates
the JOIN and ON keywords, which were introduced by SQL-92". (That's a
"non-restrictive which", I think the clue is the comma?)
po 15. 3. 2021 v 3:48 odesílatel Thomas Munro <thomas.munro@gmail.com>
napsal:
On Thu, Mar 11, 2021 at 2:06 AM David Steele <david@pgmasters.net> wrote:
On 12/1/20 3:38 AM, Jürgen Purtz wrote:
OK. Patch attached.
+ Queries which access multiple tables (including repeats) at once are
calledI'd write "Queries that" here (that's is a transatlantic difference in
usage; I try to proofread these things in American mode for
consistency with the rest of the language in this project, which I
probably don't entirely succeed at but this one I've learned...).Maybe instead of "(including repeats)" it could say "(or multiple
instances of the same table)"?+ For example, to return all the weather records together with the location of the + associated city, the database compares the <structfield>city</structfield> column of each row of the <structname>weather</structname> table with the <structfield>name</structfield> column of all rows in the <structname>cities</structname> table, and select the pairs of rows where these values match.Here "select" should agree with "the database" and take an -s, no?
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal> + keywords. The tables are simply listed in the <literal>FROM</literal>, + comma-separated, and the comparison expression added to the + <literal>WHERE</literal> clause.Could we mention SQL92 somewhere? Like maybe "This syntax pre-dates
the JOIN and ON keywords, which were introduced by SQL-92". (That's a
"non-restrictive which", I think the clue is the comma?)
previous syntax should be mentioned too. An reader can find this syntax
thousands applications
Pavel
Show quoted text
On 15.03.21 03:47, Thomas Munro wrote:
On Thu, Mar 11, 2021 at 2:06 AM David Steele <david@pgmasters.net> wrote:
On 12/1/20 3:38 AM, Jürgen Purtz wrote:
OK. Patch attached.
+ Queries which access multiple tables (including repeats) at once are called
I'd write "Queries that" here (that's is a transatlantic difference in
usage; I try to proofread these things in American mode for
consistency with the rest of the language in this project, which I
probably don't entirely succeed at but this one I've learned...).Maybe instead of "(including repeats)" it could say "(or multiple
instances of the same table)"?+ For example, to return all the weather records together with the location of the + associated city, the database compares the <structfield>city</structfield> column of each row of the <structname>weather</structname> table with the <structfield>name</structfield> column of all rows in the <structname>cities</structname> table, and select the pairs of rows where these values match.Here "select" should agree with "the database" and take an -s, no?
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal> + keywords. The tables are simply listed in the <literal>FROM</literal>, + comma-separated, and the comparison expression added to the + <literal>WHERE</literal> clause.Could we mention SQL92 somewhere? Like maybe "This syntax pre-dates
the JOIN and ON keywords, which were introduced by SQL-92". (That's a
"non-restrictive which", I think the clue is the comma?)
+1. All proposed changes integrated.
--
Kind regards, Jürgen Purtz
Attachments:
0005-query.patchtext/x-patch; charset=UTF-8; name=0005-query.patchDownload
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index e793398bb2..44aa8ef32b 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -438,16 +438,15 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
- Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
- associated city. To do that, we need to compare the <structfield>city</structfield>
+ Queries that access multiple tables (or multiple instances of the same table) at once are called
+ <firstterm>join</firstterm> queries. They internally combine
+ each row from one table with each row of a second table. An expression is
+ specified to limit which pairs of rows are returned.
+ For example, to return all the weather records together with the location of the
+ associated city, the database compares the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
- table, and select the pairs of rows where these values match.
+ table, and selects the pairs of rows where these values match.
<note>
<para>
This is only a conceptual model. The join is usually performed
@@ -459,10 +458,16 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+ FROM weather
+ JOIN cities ON city = name;
</programlisting>
+ After the keyword <command>ON</command> follows the
+ expression comparing their rows. In this example the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -497,23 +502,14 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+ FROM weather
+ JOIN cities ON city = name;
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>
- <formalpara>
- <title>Exercise:</title>
-
- <para>
- Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
- </para>
- </formalpara>
-
<para>
Since the columns all had different names, the parser
automatically found which table they belong to. If there
@@ -524,8 +520,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+ FROM weather
+ JOIN cities ON cities.name = weather.city;
</programlisting>
It is widely considered good style to qualify all column names
@@ -535,15 +531,29 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+ FROM weather, cities
+ WHERE city = name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal>
+ keywords, which were introduced by SQL-92. The tables are simply listed
+ in the <literal>FROM</literal>, comma-separated, and the comparison
+ expression added to the <literal>WHERE</literal> clause.
+ </para>
+
+ <para>
+ As join expressions serve a specific
+ purpose in a multi-table query it is preferable to make them stand-out
+ by using join clauses to introduce additional tables into the query.
+ The results from the older implicit syntax and the newer explicit
+ JOIN/ON syntax are identical. But for a reader of the statement
+ its meaning is now easier to understand: the join condition is
+ introduced by its own key word whereas previously the condition was
+ merged into the WHERE clause together with other conditions.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -556,12 +566,13 @@ SELECT *
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are inner joins.) The command looks
- like this:
+ joins we have seen so far are <firstterm>inner joins</firstterm>.)
+ The command looks like this:
<programlisting>
SELECT *
- FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
+ FROM weather
+ LEFT OUTER JOIN cities ON city = name;
</programlisting>
<screen>
@@ -591,10 +602,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -608,10 +618,9 @@ SELECT *
<programlisting>
SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
- w2.city, w2.temp_lo AS low, w2.temp_hi AS high
- FROM weather w1, weather w2
- WHERE w1.temp_lo < w2.temp_lo
- AND w1.temp_hi > w2.temp_hi;
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+ FROM weather w1
+ JOIN weather w2 ON w1.temp_lo < w2.temp_lo AND w1.temp_hi > w2.temp_hi;
</programlisting>
<screen>
@@ -628,8 +637,8 @@ SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+ FROM weather w
+ JOIN cities c ON w.city = c.name;
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>