Can simplify 'limit 1' with slow function?

Started by gotoschool6gover 11 years ago12 messages
#1gotoschool6g
gotoschool6g@gmail.com

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
There are about 7000 records in 'road'.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: gotoschool6g (#1)
Re: Can simplify 'limit 1' with slow function?

On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Martijn van Oosterhout (#2)
Re: Can simplify 'limit 1' with slow function?

On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:

On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Well, I think the problem is a well understood one: there is no
guarantee that functions-in-select-list are called exactly once per
output row. This is documented -- for example see here:
http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
In short, if you want very precise control of function evaluation use
a subquery, or, if you're really paranoid, a CTE.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4David G Johnston
david.g.johnston@gmail.com
In reply to: Merlin Moncure (#3)
Re: Can simplify 'limit 1' with slow function?

Merlin Moncure-2 wrote

On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
&lt;

kleptog@

&gt; wrote:

On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road
order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT
s from road order by c limit 1) as a;
There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Well, I think the problem is a well understood one: there is no
guarantee that functions-in-select-list are called exactly once per
output row. This is documented -- for example see here:
http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
In short, if you want very precise control of function evaluation use
a subquery, or, if you're really paranoid, a CTE.

merlin

I would have to disagree on the "this is documented" comment - the linked
section on advisory locks does not constitute documentation of the fact that
limit can be applied after expressions in the select-list are evaluated.

http://www.postgresql.org/docs/9.3/static/sql-select.html

In the select command documentation item 5 covers select-list evaluation
while item 9 covers limit thus implying what we are saying - though keep in
mind each select statement gets processed independently and possibly in a
correlated fashion (i.e. potentially multiple times).

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Can-simplify-limit-1-with-slow-function-tp5809997p5810061.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Merlin Moncure
mmoncure@gmail.com
In reply to: David G Johnston (#4)
Re: Can simplify 'limit 1' with slow function?

On Tue, Jul 1, 2014 at 3:06 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:

Merlin Moncure-2 wrote

On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
&lt;

kleptog@

&gt; wrote:

On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road
order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT
s from road order by c limit 1) as a;
There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Well, I think the problem is a well understood one: there is no
guarantee that functions-in-select-list are called exactly once per
output row. This is documented -- for example see here:
http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
In short, if you want very precise control of function evaluation use
a subquery, or, if you're really paranoid, a CTE.

merlin

I would have to disagree on the "this is documented" comment - the linked
section on advisory locks does not constitute documentation of the fact that
limit can be applied after expressions in the select-list are evaluated.

http://www.postgresql.org/docs/9.3/static/sql-select.html

In the select command documentation item 5 covers select-list evaluation
while item 9 covers limit thus implying what we are saying - though keep in
mind each select statement gets processed independently and possibly in a
correlated fashion (i.e. potentially multiple times).

Sure, although I did not claim that..the select documentation *does*
cover this behavior but I find the syntax driven doc pages to be
fairly arcane and unhelpful -- they don't say (for the most part)
"avoid this" or "do that". I pointed out this particular section
because it proved an example that matched the OP's problem case.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Martijn van Oosterhout
kleptog@svana.org
In reply to: Merlin Moncure (#3)
Re: Can simplify 'limit 1' with slow function?

On Tue, Jul 01, 2014 at 02:36:55PM -0500, Merlin Moncure wrote:

On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:

On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Well, I think the problem is a well understood one: there is no
guarantee that functions-in-select-list are called exactly once per
output row. This is documented -- for example see here:
http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
In short, if you want very precise control of function evaluation use
a subquery, or, if you're really paranoid, a CTE.

I'm probably dense, but I'm not sure I understand. Or it is that the
slowfunction() is called prior to the sort? That seems insane.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

#7David G Johnston
david.g.johnston@gmail.com
In reply to: Martijn van Oosterhout (#6)
Re: Can simplify 'limit 1' with slow function?

Martijn van Oosterhout wrote

On Tue, Jul 01, 2014 at 02:36:55PM -0500, Merlin Moncure wrote:

On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
&lt;

kleptog@

&gt; wrote:

On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road

order by c limit 1;

is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from

(SELECT s from road order by c limit 1) as a;

There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Well, I think the problem is a well understood one: there is no
guarantee that functions-in-select-list are called exactly once per
output row. This is documented -- for example see here:
http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
In short, if you want very precise control of function evaluation use
a subquery, or, if you're really paranoid, a CTE.

I'm probably dense, but I'm not sure I understand. Or it is that the
slowfunction() is called prior to the sort? That seems insane.

The basic reality is that limit applies to the final set of rows that could
be output. Since stuff like group by and distinct require knowledge of the
exact values of every output column all expressions must necessarily be
evaluated before limit.

If you want to pick just 10 rows and then process them you need a subquery.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Can-simplify-limit-1-with-slow-function-tp5809997p5810297.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G Johnston (#7)
Re: Can simplify 'limit 1' with slow function?

David G Johnston <david.g.johnston@gmail.com> writes:

Martijn van Oosterhout wrote

I'm probably dense, but I'm not sure I understand. Or it is that the
slowfunction() is called prior to the sort? That seems insane.

The basic reality is that limit applies to the final set of rows that could
be output.

It's not so much the limit as that the sort has to happen before the
limit, and yes, evaluation of the targetlist happens before the sort.

This is fundamental to the SQL conceptual model; remember that SQL92 had
"SELECT slowfunction(), ... ORDER BY 1", which certainly requires the
function to be evaluated before the sort happens. And there's nothing in
the conceptual model suggesting that different targetlist entries should
be evaluated at different times, so just ordering by something other than
the slowfunction() entry doesn't get you out of that.

I'm not sure how much of this there is chapter and verse for in the
SQL standard, but ISTM the stage sequencing we lay out in our SELECT
reference page is pretty much forced by the standard.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Martijn van Oosterhout
kleptog@svana.org
In reply to: Tom Lane (#8)
Re: Can simplify 'limit 1' with slow function?

On Wed, Jul 02, 2014 at 04:17:13PM -0400, Tom Lane wrote:

David G Johnston <david.g.johnston@gmail.com> writes:

Martijn van Oosterhout wrote

I'm probably dense, but I'm not sure I understand. Or it is that the
slowfunction() is called prior to the sort? That seems insane.

The basic reality is that limit applies to the final set of rows that could
be output.

It's not so much the limit as that the sort has to happen before the
limit, and yes, evaluation of the targetlist happens before the sort.

I guess I assumed the column c was indexable, and it that case I
beleive the slowfunction() would indeed only be called once.

This is fundamental to the SQL conceptual model; remember that SQL92 had
"SELECT slowfunction(), ... ORDER BY 1", which certainly requires the
function to be evaluated before the sort happens. And there's nothing in
the conceptual model suggesting that different targetlist entries should
be evaluated at different times, so just ordering by something other than
the slowfunction() entry doesn't get you out of that.

I'm not sure how much of this there is chapter and verse for in the
SQL standard, but ISTM the stage sequencing we lay out in our SELECT
reference page is pretty much forced by the standard.

In the conceptual model the limit must happen after the select. But as
an optimisation moving the evaluation above the limit node (when
possible) should always be a win.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#9)
Re: Can simplify 'limit 1' with slow function?

Martijn van Oosterhout <kleptog@svana.org> writes:

On Wed, Jul 02, 2014 at 04:17:13PM -0400, Tom Lane wrote:

It's not so much the limit as that the sort has to happen before the
limit, and yes, evaluation of the targetlist happens before the sort.

I guess I assumed the column c was indexable, and it that case I
beleive the slowfunction() would indeed only be called once.

There are cases where we can avoid an explicit sort step by relying on
some earlier phase of the processing pipeline to generate the rows in the
right order to begin with. Evidently this wasn't one of them though :-(.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11gotoschool6g
gotoschool6g@gmail.com
In reply to: Martijn van Oosterhout (#6)
Re: Can simplify 'limit 1' with slow function?

slow query(8531 ms):
SELECT ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) FROM road order by id LIMIT 1;

explain output:
"Limit (cost=4653.48..4653.48 rows=1 width=3612)"
" -> Sort (cost=4653.48..4683.06 rows=11832 width=3612)"
" Sort Key: id"
" -> Seq Scan on road (cost=0.00..4594.32 rows=11832 width=3612)"

fast query(16ms):
select ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) from (SELECT shape FROM road order by id LIMIT 1) a

explain output:
"Subquery Scan on a (cost=1695.48..1695.74 rows=1 width=3608)"
" -> Limit (cost=1695.48..1695.48 rows=1 width=3612)"
" -> Sort (cost=1695.48..1725.06 rows=11832 width=3612)"
" Sort Key: road.id"
" -> Seq Scan on road (cost=0.00..1636.32 rows=11832 width=3612)"

CREATE TABLE road
(
shape geometry,
id integer
)
WITH (
OIDS=FALSE
);

There are redundant call when sorting?

On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:

On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:

The simplified scene:
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.

The real scene. Function ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Well, I think the problem is a well understood one: there is no
guarantee that functions-in-select-list are called exactly once per
output row. This is documented -- for example see here:
http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
In short, if you want very precise control of function evaluation use
a subquery, or, if you're really paranoid, a CTE.

I'm probably dense, but I'm not sure I understand. Or it is that the
slowfunction() is called prior to the sort? That seems insane.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Martijn van Oosterhout
kleptog@svana.org
In reply to: gotoschool6g (#11)
Re: Can simplify 'limit 1' with slow function?

Fascinating.

On Fri, Jul 04, 2014 at 10:47:06AM +0800, gotoschool6g wrote:

slow query(8531 ms):
SELECT ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) FROM road order by id LIMIT 1;

explain output:
"Limit (cost=4653.48..4653.48 rows=1 width=3612)"
" -> Sort (cost=4653.48..4683.06 rows=11832 width=3612)"
" Sort Key: id"
" -> Seq Scan on road (cost=0.00..4594.32 rows=11832 width=3612)"

fast query(16ms):
select ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) from (SELECT shape FROM road order by id LIMIT 1) a

explain output:
"Subquery Scan on a (cost=1695.48..1695.74 rows=1 width=3608)"
" -> Limit (cost=1695.48..1695.48 rows=1 width=3612)"
" -> Sort (cost=1695.48..1725.06 rows=11832 width=3612)"
" Sort Key: road.id"
" -> Seq Scan on road (cost=0.00..1636.32 rows=11832 width=3612)"

So Postgres knows perfectly well that it's expensive, it just doesn't
appear to understand it has the option of moving the calculation above
the limit.

In this case though, it seems an index on road(id) would make it
instant in any case.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer