Preserving order through an inner join
I have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )
and have found that the INNER JOIN is ignoring the order set for q1.
The final results are not ordered by field1.
This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.
I would really like to support PostgreSQL but this ordering problem is stopping me from doing so.
I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER BY outside q1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that there is a simpler solution.
Any suggestions for getting this to work?
Kevin
Hello
2010/9/26 Kevin Jardine <kevinjardine@yahoo.com>:
I have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )and have found that the INNER JOIN is ignoring the order set for q1.
you can block a hash join that cannot to protect order. But it can be
very contra-productive - this method of JOIN is the most faster.
SQLite or MySQL are too simple and doesn't provide some more
sophisticated constructs. Principally your query is wrong - the real
order is based on final ORDER BY. It's probable, so your queries will
stop working (on SQLite or MySQL) in future, when these databases will
be more mature.
Regards
Pavel Stehule
Show quoted text
The final results are not ordered by field1.
This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.
I would really like to support PostgreSQL but this ordering problem is stopping me from doing so.
I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER BY outside q1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that there is a simpler solution.
Any suggestions for getting this to work?
Kevin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi Pavel,
I'm not really interested in a "my database is better than your database" discussion.
I want to know how to preserve the order. As I said, moving the ORDER BY out of the subquery would be a lot of work.
I am trying to find another solution.
Any suggestions from anyone?
Kevin
--- On Sun, 9/26/10, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Show quoted text
From: Pavel Stehule <pavel.stehule@gmail.com>
Subject: Re: [GENERAL] Preserving order through an inner join
To: "Kevin Jardine" <kevinjardine@yahoo.com>
Cc: pgsql-general@postgresql.org
Date: Sunday, September 26, 2010, 9:23 PM
Hello2010/9/26 Kevin Jardine <kevinjardine@yahoo.com>:
I have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )and have found that the INNER JOIN is ignoring the
order set for q1.
you can block a hash join that cannot to protect order. But
it can be
very contra-productive - this method of JOIN is the most
faster.
SQLite or MySQL are too simple and doesn't provide some
more
sophisticated constructs. Principally your query is wrong -
the real
order is based on final ORDER BY. It's probable, so your
queries will
stop working (on SQLite or MySQL) in future, when these
databases will
be more mature.Regards
Pavel Stehule
The final results are not ordered by field1.
This works for other databases (eg. MySQL and
Sqllite3) but not PostgreSQL.
I would really like to support PostgreSQL but this
ordering problem is stopping me from doing so.
I can make some small changes to the query structure
as long as it works for the other DBs as well. Moving the
ORDER BY outside q1 would be a large amount of work, however
(these queries are generated by a program), so I am hoping
that there is a simpler solution.Any suggestions for getting this to work?
Kevin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Kevin Jardine <kevinjardine@yahoo.com> writes:
I have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )
and have found that the INNER JOIN is ignoring the order set for q1.
The final results are not ordered by field1.
Indeed. Many of the possible join techniques won't preserve that ordering.
This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.
It might sometimes accidentally fail to fail, but I think you'll find
that there are *no* SQL databases where this is guaranteed to work the
way you expect. The SQL standard explicitly disavows any particular
output row order unless there is a top-level ORDER BY. (In fact,
unless things have changed recently an ORDER BY in a sub-select isn't
even legal per spec.)
I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER BY outside q1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that there is a simpler solution.
Nope, that's what you need to do.
regards, tom lane
OK, then.
The sub-select needs to go.
Thanks for helpful advice.
Kevin
--- On Sun, 9/26/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [GENERAL] Preserving order through an inner join
To: "Kevin Jardine" <kevinjardine@yahoo.com>
Cc: pgsql-general@postgresql.org
Date: Sunday, September 26, 2010, 9:37 PM
Kevin Jardine <kevinjardine@yahoo.com>
writes:I have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )and have found that the INNER JOIN is ignoring the
order set for q1.
The final results are not ordered by field1.
Indeed. Many of the possible join techniques won't
preserve that ordering.This works for other databases (eg. MySQL and
Sqllite3) but not PostgreSQL.
It might sometimes accidentally fail to fail, but I think
you'll find
that there are *no* SQL databases where this is guaranteed
to work the
way you expect. The SQL standard explicitly disavows
any particular
output row order unless there is a top-level ORDER
BY. (In fact,
unless things have changed recently an ORDER BY in a
sub-select isn't
even legal per spec.)I can make some small changes to the query structure
as long as it works for the other DBs as well. Moving the
ORDER BY outside q1 would be a large amount of work, however
(these queries are generated by a program), so I am hoping
that there is a simpler solution.Nope, that's what you need to do.
regards, tom lane--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Sun, Sep 26, 2010 at 07:32:01PM +0000, Kevin Jardine wrote:
Hi Pavel,
I'm not really interested in a "my database is better than your database" discussion.
Pavel did not say that his database is better than yours.
What he said was that your query is wrong (with respect to
what you said you expect it to return). Tom attested that it
isn't even "legal" SQL as per the spec.
Please keep things factual.
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
-----Original Message-----
From: Kevin Jardine [mailto:kevinjardine@yahoo.com]
Sent: Sunday, September 26, 2010 3:15 PM
To: pgsql-general@postgresql.org
Subject: Preserving order through an inner joinI have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )and have found that the INNER JOIN is ignoring the order set for q1.
The final results are not ordered by field1.
This works for other databases (eg. MySQL and Sqllite3) but
not PostgreSQL.I would really like to support PostgreSQL but this ordering
problem is stopping me from doing so.I can make some small changes to the query structure as long
as it works for the other DBs as well. Moving the ORDER BY
outside q1 would be a large amount of work, however (these
queries are generated by a program), so I am hoping that
there is a simpler solution.Any suggestions for getting this to work?
Kevin
As others stated, it's a bad idea to order sub-select.
But if you really, really need it (m.b. not in this particular case),
you could do:
SELECT stuff FROM
(SELECT field1, field2, COUNT(*) FROM
table1
GROUP BY field1, field2
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )
Regards,
Igor Neyman
On Sun, Sep 26, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The SQL standard explicitly disavows any particular
output row order unless there is a top-level ORDER BY. (In fact,
unless things have changed recently an ORDER BY in a sub-select isn't
even legal per spec.)
Not sure about the SQL spec allowing it, but an ORDER BY followed by a LIMIT
does have valid use cases in sub-selects.
Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
Following up on other replies, and agreeing that SQL has no inherent row
ordering and it is only by accident that you are seeing such ...
The *only* way to reliably preserve sort order is by explicitly encoding the
order in your data, by maintaining a column or columns through to the outermost
query on which you sort.
A generic solution which you can apply is to use a SQL "RANK"-alike feature to
generate a serial number column in your rowset; you can do this in the inner
query where you want to have all your unique or complex sorting details, and
then your sort order will be preserved (as long as you explicitly propagate said
column either by name or with *) through to the outer query. You can then order
by and exclude said column in the outermost query if you want, but you don't
have to because your application will still get the ordering information in said
extra column.
For example, something like this:
SELECT stuff FROM
(SELECT more stuff, rank() OVER (ORDER BY field1) AS ordcol FROM
table1
) AS q1
INNER JOIN table2 ON ( ... )
ORDER BY ordcol
See
http://www.postgresql.org/docs/9.0/interactive/queries-table-expressions.html#QUERIES-WINDOW
and the 3 urls it links to.
-- Darren Duncan
Kevin Jardine wrote:
Show quoted text
I have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )and have found that the INNER JOIN is ignoring the order set for q1.
The final results are not ordered by field1.
This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.
I would really like to support PostgreSQL but this ordering problem is stopping me from doing so.
I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER BY outside q1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that there is a simpler solution.
Any suggestions for getting this to work?
Gurjeet Singh wrote:
On Sun, Sep 26, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The SQL standard explicitly disavows any particular
output row order unless there is a top-level ORDER BY. (In fact,
unless things have changed recently an ORDER BY in a sub-select isn't
even legal per spec.)Not sure about the SQL spec allowing it, but an ORDER BY followed by a
LIMIT does have valid use cases in sub-selects.
Absolutely it does, but that is just a row *filtering* operation. You still
have to have a separate ORDER BY in the outermost query to get result rows
output in a particular order. -- Darren Duncan
On 09/27/2010 03:37 AM, Tom Lane wrote:
Kevin Jardine<kevinjardine@yahoo.com> writes:
I have a query structured like this:
SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )and have found that the INNER JOIN is ignoring the order set for q1.
The final results are not ordered by field1.
Indeed. Many of the possible join techniques won't preserve that ordering.
Just to elaborate on this for the OP:
When joining two data sets, PostgreSQL has a number of choices about how
to do it. Some of these are:
- Merge join (http://en.wikipedia.org/wiki/Sort-merge_join)
- Hash join (http://en.wikipedia.org/wiki/Hash_join)
- Nested loop + index/sequential table scans
(http://en.wikipedia.org/wiki/Nested_loop_join)
As you will see from the descriptions and how they work, many join
algorithms do not preserve the order of the input relations. Different
join types are optimal for different absolute and relative sizes of
input relations and different join conditions; all of them are useful in
one situation or another. Only using joins that preserved the natural
order of the relations (which is undefined as per the SQL spec and 99%
of the time people don't care about anyway) would be pretty nasty for
performance.
You could potentially force preservation of order in your particular
query by telling PostgreSQL to use a nested loop with your subquery as
the outer loop, but it'd require overriding the query planner's join
method and join order optimization in ugly ways that aren't well supported.
Not only that, but you'll have to do that kind of hacking-around if you
target any other database that supports at least hash joins, and you'll
have to do it using database-specific query hints, configuration
options, session variables, etc.
I hope this helps to explain in a bit more detail why it's worth putting
in the effort to update your code.
--
Craig ringer