BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

Started by Kouber Saparevabout 14 years ago7 messagesbugs
Jump to latest
#1Kouber Saparev
kouber@saparev.com

The following bug has been logged on the website:

Bug reference: 6483
Logged by: Kouber Saparev
Email address: kouber@saparev.com
PostgreSQL version: 9.1.2
Operating system: Debian
Description:

The rows of a SELECT statement are being evaluated, even when not shown in
the final result, when using an OFFSET > 0. Although I know that LIMIT is
imposed just before flushing the result set to the client, this behaviour
seems quite confusing, especially when using DML statements in the field
list of the SELECT itself.

CREATE TABLE xxx (id INT);

CREATE FUNCTION f(xxx) RETURNS VOID AS $$
BEGIN
-- imagine some DML statements here --

RAISE NOTICE '%', $1.id;
END;
$$ LANGUAGE PLPGSQL;

INSERT INTO xxx VALUES (1), (2), (3), (4), (5);

-- shows a notice for 1 and 2
SELECT x.id, f(x) FROM xxx as x LIMIT 2;

-- shows a notice for 1, 2, 3 and 4
SELECT x.id, f(x) FROM xxx as x LIMIT 2 OFFSET 2;

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Kouber Saparev (#1)
Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

Hello

this is not bug

"OFFSET" doesn't mean go to line n - it means - first n lines don't
send to client.

Regards

Pavel Stehule

2012/2/22 <kouber@saparev.com>:

Show quoted text

The following bug has been logged on the website:

Bug reference:      6483
Logged by:          Kouber Saparev
Email address:      kouber@saparev.com
PostgreSQL version: 9.1.2
Operating system:   Debian
Description:

The rows of a SELECT statement are being evaluated, even when not shown in
the final result, when using an OFFSET > 0. Although I know that LIMIT is
imposed just before flushing the result set to the client, this behaviour
seems quite confusing, especially when using DML statements in the field
list of the SELECT itself.

CREATE TABLE xxx (id INT);

CREATE FUNCTION f(xxx) RETURNS VOID AS $$
BEGIN
 -- imagine some DML statements here --

 RAISE NOTICE '%', $1.id;
END;
$$ LANGUAGE PLPGSQL;

INSERT INTO xxx VALUES (1), (2), (3), (4), (5);

-- shows a notice for 1 and 2
SELECT x.id, f(x) FROM xxx as x LIMIT 2;

-- shows a notice for 1, 2, 3 and 4
SELECT x.id, f(x) FROM xxx as x LIMIT 2 OFFSET 2;

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Marti Raudsepp
marti@juffo.org
In reply to: Kouber Saparev (#1)
Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

On Wed, Feb 22, 2012 at 20:53, <kouber@saparev.com> wrote:

-- shows a notice for 1, 2, 3 and 4
SELECT x.id, f(x) FROM xxx as x LIMIT 2 OFFSET 2;

Currently the way to fix this is to use a subquery that acts as an
optimization barrier in the presence of OFFSET:
SELECT x.id, f(x) FROM (SELECT * FROM xxx as x LIMIT 2 OFFSET 2) as xxx;

The rows of a SELECT statement are being evaluated, even when not shown in
the final result, when using an OFFSET > 0. Although I know that LIMIT is
imposed just before flushing the result set to the client, this behaviour
seems quite confusing, especially when using DML statements in the field
list of the SELECT itself.

Interesting, the model for evaluating queries is documented here:
http://www.postgresql.org/docs/9.1/static/sql-select.html

According to this model, evaluating SELECT clause fields for *all*
found rows is done in step 5, whereas LIMIT/OFFSET are only applied
later at step 9. So we're already bending the rules here (in general
we don't do such optimizations around volatile functions). The worst
thing is that it's inconsistent -- the LIMIT gets applied when
computing the SELECT list, but OFFSET doesn't.

In theory we could bend the model even more -- to push SELECT list
fields below the "Limit" node if they aren't referenced by ORDER and
there are no set operations. However, adapting the model to back to
this behavior seems rather impossible -- ORDER BY must be strictly
evaluated after SELECT list (it can refer to SELECT fields), and LIMIT
must be evaluated after ORDER BY, otherwise it makes no sense.

Or going the other way -- we could make it evaluate all rows if the
SELECT list if it contains volatile functions, and then apply the
LIMIT afterwards. That would go even more against "common sense", but
at least it would be "correct" :)

Regards,
Marti

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marti Raudsepp (#3)
Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

Marti Raudsepp <marti@juffo.org> writes:

According to this model, evaluating SELECT clause fields for *all*
found rows is done in step 5, whereas LIMIT/OFFSET are only applied
later at step 9. So we're already bending the rules here (in general
we don't do such optimizations around volatile functions). The worst
thing is that it's inconsistent -- the LIMIT gets applied when
computing the SELECT list, but OFFSET doesn't.

On what grounds do you say that? LIMIT and OFFSET are practically the
same thing internally, and are certainly applied in the same way.

regards, tom lane

#5Marti Raudsepp
marti@juffo.org
In reply to: Tom Lane (#4)
Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

On Wed, Feb 22, 2012 at 23:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marti Raudsepp <marti@juffo.org> writes:

According to this model, evaluating SELECT clause fields for *all*
found rows is done in step 5, whereas LIMIT/OFFSET are only applied
later at step 9. So we're already bending the rules here (in general
we don't do such optimizations around volatile functions). The worst
thing is that it's inconsistent -- the LIMIT gets applied when
computing the SELECT list, but OFFSET doesn't.

On what grounds do you say that?  LIMIT and OFFSET are practically the
same thing internally, and are certainly applied in the same way.

The difference is that the SELECT fields for the first OFFSET rows are
*evaluated*, but aren't simply returned to the client. But beyond
LIMIT, query evaluation terminates entirely -- the rest of the SELECT
clause rows aren't evaluated.

AFAICT, the model in the documentation suggests that the SELECT fields
are evaluated for all matching rows in indeterminate order, before
ORDER BY is applied and before the result set is sliced by
OFFSET/LIMIT.

Regards,
Marti

#6Kouber Saparev
postgres@saparev.com
In reply to: Marti Raudsepp (#5)
Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

On 02/23/2012 12:05 AM, Marti Raudsepp wrote:

On Wed, Feb 22, 2012 at 23:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marti Raudsepp <marti@juffo.org> writes:

According to this model, evaluating SELECT clause fields for *all*
found rows is done in step 5, whereas LIMIT/OFFSET are only applied
later at step 9. So we're already bending the rules here (in general
we don't do such optimizations around volatile functions). The worst
thing is that it's inconsistent -- the LIMIT gets applied when
computing the SELECT list, but OFFSET doesn't.

On what grounds do you say that? LIMIT and OFFSET are practically the
same thing internally, and are certainly applied in the same way.

The difference is that the SELECT fields for the first OFFSET rows are
*evaluated*, but aren't simply returned to the client. But beyond
LIMIT, query evaluation terminates entirely -- the rest of the SELECT
clause rows aren't evaluated.

AFAICT, the model in the documentation suggests that the SELECT fields
are evaluated for all matching rows in indeterminate order, before
ORDER BY is applied and before the result set is sliced by
OFFSET/LIMIT.

Indeed, that's probably the main issue - it is not behaving
symmetrically, i.e. fetching the first two rows has one effect (and
performance impact), while fetching the last two - completely different.

In my case, I am making something like an "ON SELECT" rule, triggering
some actions once the rows are read (and sent to the client) from a
SELECT statement. The thing is that "read" and "sent to the client"
appear to be two different things in that case. While I will certainly
use a subquery for it, as proposed by Marti (since real cursors are not
an option in my stateless web environment), I do believe that at least
the documentation should be more clear concerning cases like that (if
the behaviour stays that way).

Regards,
--
Kouber Saparev

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marti Raudsepp (#5)
Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

Marti Raudsepp <marti@juffo.org> writes:

AFAICT, the model in the documentation suggests that the SELECT fields
are evaluated for all matching rows in indeterminate order, before
ORDER BY is applied and before the result set is sliced by
OFFSET/LIMIT.

That is in fact the case if you have a query plan that involves a Sort
node followed by Limit. We have some optimizations that avoid the need
for an explicit sort, but it would be pretty hard to write a
specification for exactly when unretrieved rows will not be evaluated.

regards, tom lane