How do I bump a row to the front of sort efficiently

Started by Sam Saffronalmost 11 years ago12 messages
#1Sam Saffron
sam.saffron@gmail.com

I have this query:

select * from topics
order by case when id=1 then 0 else 1 end, bumped_at desc
limit 30

It works fine, bumps id 1 to the front of the sort fine but is
terribly inefficient and scans

OTH

"select * from topics where id = 1" is super fast

"select * from topics order by bumped_at desc limit 30" is super fast

Even this is fast, and logically equiv as id is primary key unique

select * from topic
where id = 1000
union all
select * from (
select * from topics
where id <> 1000
order by bumped_at desc
limit 30
) as x
limit 30

However, the contortions on the above query make it very un-ORM
friendly as I would need to define a view for it but would have no
clean way to pass limits and offsets in.

Is there any clean technique to bump up particular rows to the front
of a sort if a certain condition is met without paying a huge
performance hit?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2David G Johnston
david.g.johnston@gmail.com
In reply to: Sam Saffron (#1)
Re: How do I bump a row to the front of sort efficiently

sam.saffron wrote

I have this query:

select * from topics
order by case when id=1 then 0 else 1 end, bumped_at desc
limit 30

It works fine, bumps id 1 to the front of the sort fine but is
terribly inefficient and scans

OTH

"select * from topics where id = 1" is super fast

"select * from topics order by bumped_at desc limit 30" is super fast

Even this is fast, and logically equiv as id is primary key unique

select * from topic
where id = 1000
union all
select * from (
select * from topics
where id <> 1000
order by bumped_at desc
limit 30
) as x
limit 30

However, the contortions on the above query make it very un-ORM
friendly as I would need to define a view for it but would have no
clean way to pass limits and offsets in.

Is there any clean technique to bump up particular rows to the front
of a sort if a certain condition is met without paying a huge
performance hit?

CREATE FUNCTION ...?

Probably with a VARIADIC argument.

David J.

--
View this message in context: http://postgresql.nabble.com/How-do-I-bump-a-row-to-the-front-of-sort-efficiently-tp5836354p5836356.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Sam Saffron (#1)
Re: How do I bump a row to the front of sort efficiently

On Mon, Feb 02, 2015 at 05:16:40PM +1100, Sam Saffron wrote:

Even this is fast, and logically equiv as id is primary key unique
select * from topic
where id = 1000
union all
select * from (
select * from topics
where id <> 1000
order by bumped_at desc
limit 30
) as x
limit 30
Is there any clean technique to bump up particular rows to the front
of a sort if a certain condition is met without paying a huge
performance hit?

Why don't you use the union-all approach? If it's fast, and does what
you need ?

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Tim Clarke
tim.clarke@manifest.co.uk
In reply to: hubert depesz lubaczewski (#3)
Re: How do I bump a row to the front of sort efficiently

On 02/02/15 08:40, hubert depesz lubaczewski wrote:

On Mon, Feb 02, 2015 at 05:16:40PM +1100, Sam Saffron wrote:

Even this is fast, and logically equiv as id is primary key unique
select * from topic
where id = 1000
union all
select * from (
select * from topics
where id <> 1000
order by bumped_at desc
limit 30
) as x
limit 30
Is there any clean technique to bump up particular rows to the front
of a sort if a certain condition is met without paying a huge
performance hit?

Why don't you use the union-all approach? If it's fast, and does what
you need ?

depesz

Or create another index or column that's build at insertion time with
the sort that you want in it. At least then retrieval time will be fast
at the cost of extra space.

--
Tim Clarke

A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5BladeOfLight16
bladeoflight16@gmail.com
In reply to: Sam Saffron (#1)
Re: [GENERAL] How do I bump a row to the front of sort efficiently

On Mon, Feb 2, 2015 at 1:16 AM, Sam Saffron <sam.saffron@gmail.com> wrote:

However, the contortions on the above query make it very un-ORM
friendly as I would need to define a view for it but would have no
clean way to pass limits and offsets in.

This is why ORMs are bad. They make hard problems *much* harder, and the
only benefit is that they maybe make easy problems a little quicker. The
cost/savings is *heavily* skewed toward the cost, since there's no upper
bound on the cost and there is a pretty small lower bound on the savings.
Micro-ORMs tend to do a better job of not shielding you from (or rather,
getting in the way of) the SQL while still providing some good
result-to-object translation. Whether even that is necessary depends on
your language, though. (For example, in Python, psycopg2 has a built in way
of spitting out namedtuples, which means you get result-to-object
translation out of the box. That makes even a micro-ORM pretty unnecessary.
On the other hand, a micro-ORM that does this well without blocking you
from the SQL, such as PetaPOCO, is a boon in .NET.)

If you can, your best bet would probably be to find a way to get your ORM
to execute raw SQL (with good parametrization to prevent injection
attacks!!!!) and be done with it. It took me way too much experience
fighting with an ORM on complicated queries to realize that.

#6BladeOfLight16
bladeoflight16@gmail.com
In reply to: BladeOfLight16 (#5)
Re: How do I bump a row to the front of sort efficiently

On Tue, Feb 3, 2015 at 9:33 PM, BladeOfLight16 <bladeoflight16@gmail.com>
wrote:

This is why ORMs are bad. They make hard problems *much* harder, and the
only benefit is that they maybe make easy problems a little quicker. The
cost/savings is *heavily* skewed toward the cost, since there's no upper
bound on the cost and there is a pretty small lower bound on the savings.
Micro-ORMs tend to do a better job of not shielding you from (or rather,
getting in the way of) the SQL while still providing some good
result-to-object translation. Whether even that is necessary depends on
your language, though. (For example, in Python, psycopg2 has a built in way
of spitting out namedtuples, which means you get result-to-object
translation out of the box. That makes even a micro-ORM pretty unnecessary.
On the other hand, a micro-ORM that does this well without blocking you
from the SQL, such as PetaPOCO, is a boon in .NET.)

If you can, your best bet would probably be to find a way to get your ORM
to execute raw SQL (with good parametrization to prevent injection
attacks!!!!) and be done with it. It took me way too much experience
fighting with an ORM on complicated queries to realize that.

Er, *pretty small upper bound on the savings.

#7Sam Saffron
sam.saffron@gmail.com
In reply to: BladeOfLight16 (#6)
Re: How do I bump a row to the front of sort efficiently

Note: I still consider this a bug/missing feature of sorts since the
planner could do better here, and there is no real clean way of
structuring a query to perform efficiently here, which is why I
erroneously cross posted this to hacker initially:

# create table testing(id serial primary key, data varchar);
# insert into testing(data) select 'test' from pg_tables a,pg_tables
b,pg_tables c,pg_tables d limit 100000

# explain select * from testing order by id limit 30;
QUERY PLAN
------------------------------------------------------------------------------------------
Limit (cost=0.29..1.24 rows=30 width=9)
-> Index Scan using testing_pkey on testing (cost=0.29..3148.29
rows=100000 width=9)
(2 rows)

# explain select * from testing where id = 1000;
QUERY PLAN
----------------------------------------------------------------------------
Index Scan using testing_pkey on testing (cost=0.29..8.31 rows=1 width=9)
Index Cond: (id = 1000)
(2 rows)

# explain select * from testing order by case when id = 1000 then 0
else 1 end, id limit 30;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=4744.45..4744.52 rows=30 width=9)
-> Sort (cost=4744.45..4994.45 rows=100000 width=9)
Sort Key: (CASE WHEN (id = 1000) THEN 0 ELSE 1 END), id
-> Seq Scan on testing (cost=0.00..1791.00 rows=100000 width=9)
(4 rows)

Cost goes through the roof for a query that pg could have have done
better with if it were able to better "understand" the case statement.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8BladeOfLight16
bladeoflight16@gmail.com
In reply to: Sam Saffron (#7)
Re: How do I bump a row to the front of sort efficiently

On Tue, Feb 3, 2015 at 11:28 PM, Sam Saffron <sam.saffron@gmail.com> wrote:

Note: I still consider this a bug/missing feature of sorts since the
planner could do better here, and there is no real clean way of
structuring a query to perform efficiently here, which is why I
erroneously cross posted this to hacker initially:

No, it should not be considered a bug or a deficiency. You're telling the
system to use a *computed value* that uses an *arbitrary logic construct *for
*sorting*, and that's *before* you actually give it a filter to work with.
You're also trying to abuse ORDER BY to make LIMIT do filtering. How do you
expect that to go? I would expect the planner to do exactly what you told
it you wanted: sort the entire table by that computed value, and it would
have to do a sequential scan to compute the value for every row before it
knew which ones came first for the LIMIT.

CASE is well known to cause optimization problems; arbitrary conditional
logic isn't especially conducive to a planner optimizing things. In
general, the planner has *no* idea what the logic really means, so the
planner would have to have some kind of special logic trying to pick up on
this case. Your particular use case is uncommon; why should the planner
code be junked up with a thousand little optimizations for uncommon
situations like this (which would make it unmaintainable) when you already
have a reasonable alternative? PG is great for a reason: the devs have made
a lot of fantastic choices in designing it. Trying to keep the planner
relatively simple is one of them, if I recall correctly.

What you want is well represented by a UNION query: you want it to fetch
one particular row by ID, and then you want to tack on 29 other particular
rows based on a particular sort order and possibly an offset. These are two
completely disparate ways of fetching data; of course the most optimal way
is going to be to essentially write two queries and put the results
together. That's also going to be the *clearest* way of writing the query,
so the next person to work on it knows what you were doing.

And that aside, you don't even need to use CASE. You could've just used (id
= 1), which would give you a boolean result. You can most certainly sort by
a boolean. (I believe PG sorts TRUE first.) You should see if that gets
optimized at all. (If it doesn't, I won't be surprised and everything else
I said still holds.)

By the way, you can do better on your UNION query:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)

I imagine your original would be at risk of LIMITing out the very row you
seek to get at the "top", since you don't have an ORDER BY to tell it which
ones to keep during the outer LIMIT.

#9Sam Saffron
sam.saffron@gmail.com
In reply to: BladeOfLight16 (#8)
Re: How do I bump a row to the front of sort efficiently

The union hack may be able to work, but what I want is slightly more complex

I want this to work efficiently, even without the case it chokes:

explain select * from testing order by id in (100,2,-1) desc, id limit 30;
QUERY PLAN
---------------------------------------------------------------------------
Limit (cost=4869.45..4869.52 rows=30 width=9)
-> Sort (cost=4869.45..5119.45 rows=100000 width=9)
Sort Key: ((id = ANY ('{100,2,-1}'::integer[]))), id
-> Seq Scan on testing (cost=0.00..1916.00 rows=100000 width=9)
(4 rows)

I need to be able to offset and limit the union hack in a view, which
is proving very tricky.

On Wed, Feb 4, 2015 at 9:15 PM, BladeOfLight16 <bladeoflight16@gmail.com> wrote:

On Tue, Feb 3, 2015 at 11:28 PM, Sam Saffron <sam.saffron@gmail.com> wrote:

Note: I still consider this a bug/missing feature of sorts since the
planner could do better here, and there is no real clean way of
structuring a query to perform efficiently here, which is why I
erroneously cross posted this to hacker initially:

No, it should not be considered a bug or a deficiency. You're telling the
system to use a computed value that uses an arbitrary logic construct for
sorting, and that's before you actually give it a filter to work with.
You're also trying to abuse ORDER BY to make LIMIT do filtering. How do you
expect that to go? I would expect the planner to do exactly what you told it
you wanted: sort the entire table by that computed value, and it would have
to do a sequential scan to compute the value for every row before it knew
which ones came first for the LIMIT.

CASE is well known to cause optimization problems; arbitrary conditional
logic isn't especially conducive to a planner optimizing things. In general,
the planner has no idea what the logic really means, so the planner would
have to have some kind of special logic trying to pick up on this case. Your
particular use case is uncommon; why should the planner code be junked up
with a thousand little optimizations for uncommon situations like this
(which would make it unmaintainable) when you already have a reasonable
alternative? PG is great for a reason: the devs have made a lot of fantastic
choices in designing it. Trying to keep the planner relatively simple is one
of them, if I recall correctly.

What you want is well represented by a UNION query: you want it to fetch one
particular row by ID, and then you want to tack on 29 other particular rows
based on a particular sort order and possibly an offset. These are two
completely disparate ways of fetching data; of course the most optimal way
is going to be to essentially write two queries and put the results
together. That's also going to be the clearest way of writing the query, so
the next person to work on it knows what you were doing.

And that aside, you don't even need to use CASE. You could've just used (id
= 1), which would give you a boolean result. You can most certainly sort by
a boolean. (I believe PG sorts TRUE first.) You should see if that gets
optimized at all. (If it doesn't, I won't be surprised and everything else I
said still holds.)

By the way, you can do better on your UNION query:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)

I imagine your original would be at risk of LIMITing out the very row you
seek to get at the "top", since you don't have an ORDER BY to tell it which
ones to keep during the outer LIMIT.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Sam Saffron (#9)
Re: How do I bump a row to the front of sort efficiently

I imagine your original would be at risk of LIMITing out the very row you
seek to get at the "top", since you don't have an ORDER BY to tell it which
ones to keep during the outer LIMIT.

Here is an old thread about combining ORDER BY with UNION:

/messages/by-id/16814.1280268424@sss.pgh.pa.us

So I think this query would work:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)
order by case when id = 1000 then 0 else 1 end, bumped_at desc
;

I need to be able to offset and limit the union hack in a view, which
is proving very tricky.

Since this is sort of a "parameterized view" (which Postgres does not
have) you are probably better off figuring out how to make the UNION
query work with your ORM. What ORM is it? Maybe someone here can help
you with that. Or maybe instead of a view you could write a
set-returning function, e.g. as described here:

http://stackoverflow.com/questions/11401749/pass-in-where-parameters-to-postgresql-view

Paul

--
_________________________________
Pulchritudo splendor veritatis.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Rémi Cura
remi.cura@gmail.com
In reply to: Paul Jungwirth (#10)
Re: How do I bump a row to the front of sort efficiently

Hey,
I'm not a guru, here is what I understood.
You are mixing several problems in the same question :
- 1. why the planner isn't more efficient
- 2. why the workaround is difficult to use with an ORM.

for 1. you can't do much (as said by others, you don't really need a case
here anyway). I think using a CASE is equivalent for the planner to using
your own custom blackbox function. So no way to improve that.
for 2. : if you can't pass limit and offset in your ORM,
a small workaround is to number your row following the order you defined
with the function row_number() over(your order here),
then you can use your ORM to design where conditions equivalent to limit
and offset :

WHERE row_number BETWEEN your_offset AND your_limit

Cheers,
Rémi-C

2015-02-04 21:40 GMT+01:00 Paul Jungwirth <pj@illuminatedcomputing.com>:

Show quoted text

I imagine your original would be at risk of LIMITing out the very row

you

seek to get at the "top", since you don't have an ORDER BY to tell it

which

ones to keep during the outer LIMIT.

Here is an old thread about combining ORDER BY with UNION:

/messages/by-id/16814.1280268424@sss.pgh.pa.us

So I think this query would work:

select * from topic
where id = 1000
union all
(select * from topic
where id <> 1000
order by bumped_at desc
limit 29)
order by case when id = 1000 then 0 else 1 end, bumped_at desc
;

I need to be able to offset and limit the union hack in a view, which
is proving very tricky.

Since this is sort of a "parameterized view" (which Postgres does not
have) you are probably better off figuring out how to make the UNION
query work with your ORM. What ORM is it? Maybe someone here can help
you with that. Or maybe instead of a view you could write a
set-returning function, e.g. as described here:

http://stackoverflow.com/questions/11401749/pass-in-where-parameters-to-postgresql-view

Paul

--
_________________________________
Pulchritudo splendor veritatis.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Paul Jungwirth (#10)
Re: How do I bump a row to the front of sort efficiently

Or maybe instead of a view you could write a
set-returning function, e.g. as described here:

I thought I'd see if I could make this work just for fun. Here is a
simple proof of concept (on 9.3):

-- DROP TABLE IF EXISTS topics;
CREATE TABLE topics (
id INTEGER PRIMARY KEY,
bumped_at INTEGER NOT NULL
);
INSERT INTO topics
SELECT a, a * 2
FROM generate_series(1, 1000) s(a)
;

CREATE OR REPLACE FUNCTION topics_sorted_after_id(INT, INT)
RETURNS TABLE(id int, after_top int, bumped_at int)
AS $$
SELECT id, 0 AS after_top, bumped_at
FROM topics
WHERE id = $1
UNION ALL
(SELECT id, 1 AS after_top, bumped_at
FROM topics
WHERE id IS DISTINCT FROM $1
ORDER BY bumped_at DESC
LIMIT $2 - 1)
ORDER BY after_top, bumped_at DESC
$$
LANGUAGE sql;

SELECT * FROM topics_sorted_after_id(45, 30);

That looks to me like it gives the right results. I'm curious if
RETURNS TABLE is the right approach to use here or if there is
something nicer.

What if the ORM insists on `FROM topics`? Is there any way to rewrite
the query or function to work around that?

Paul

--
_________________________________
Pulchritudo splendor veritatis.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general