limit-offset different result sets with same query

Started by Emanuel Calvo Francoalmost 17 years ago14 messagesgeneral
Jump to latest
#1Emanuel Calvo Franco
postgres.arg@gmail.com

Hi all.

I'll make this faster.

I hace this table and this function:

CREATE FUNCTION pg_round_random_range(integer, integer) RETURNS integer
LANGUAGE plperl IMMUTABLE STRICT
AS $_X$
my($imin, $imax) = @_;
if ($_[0] == $_[1]){
return $_[0];}
if($imin > $imax){
$imin = $_[1];
$imax = $_[0];}
$_number_ = ( (rand) * ($imax + 1));
while (($_number_ < $imin) && ( $_number_ > $imax)){
$_number_ = ( (rand) * ($imax + 1));}
return sprintf "%d",$_number_;

$_X$;

CREATE TABLE datos (
texto text DEFAULT md5((random())::text),
entero2 smallint DEFAULT (rpad((hashtext((random())::text))::text,
4))::smallint,
entero4 integer DEFAULT (lpad(((hashtext((random())::text))::text
|| replace((hashtext((random())::text))::text, '-'::text, ''::text)),
9))::integer,
entero8 bigint DEFAULT (rpad(((hashtext((random())::text))::text
|| replace((hashtext((random())::text))::text, '-'::text, ''::text)),
19))::bigint,
"float" double precision DEFAULT ((random() * (1000)::double
precision) + random()),
fecha date DEFAULT (now())::date,
tiempo timestamp without time zone DEFAULT now(),
ztiempo timestamp with time zone DEFAULT now(),
ip cidr DEFAULT ((((((((pg_round_random_range(0, 255))::text ||
'.'::text) || (pg_round_random_range(0, 255))::text) || '.'::text) ||
(pg_round_random_range(0, 255))::text) || '.'::text) ||
(pg_round_random_range(0, 255))::text))::cidr
);

I insert several tuples to make a test with the sentence 'insert into
datos default values'.
This table don't have indexes. There are no users connected exept me
(is a local and
test database).
When i have ~160000 regs i start to make some querys.

Executing 'select * from datos limit 1 offset 150000' two times i have different
result sets.
When i execute 'explain analyze verbose <query>' i see that (as
expected) the seq scan
is occurring.

Examples:

parapruebas=# select entero8 from datos limit 1 offset 2;
entero8
--------------------
477808241937806077
(1 row)

parapruebas=# select entero8 from datos limit 1 offset 2;
entero8
--------------------
477808241937806077
(1 row)
QUERY PLAN

----------------------------------------------------------------------------------------------------
Limit (cost=0.05..0.07 rows=1 width=8) (actual time=0.033..0.036
rows=1 loops=1)
-> Seq Scan on datos (cost=0.00..4128.00 rows=172800 width=8)
(actual time=0.014..0.020 rows=3
loops=1)
Total runtime: 0.107 ms
(3 rows)

In this case, on a small offset the result set returns the same. But
in higher offsets:

parapruebas=# select entero8 from datos limit 1 offset 100000;
entero8
--------------------
-82136193203177195
(1 row)

parapruebas=# select entero8 from datos limit 1 offset 100000;
entero8
---------------------
1201794554456297856
(1 row)
QUERY PLAN

----------------------------------------------------------------------------------------------------
Limit (cost=2388.89..2388.91 rows=1 width=8) (actual
time=622.198..622.201 rows=1 loops=1)
-> Seq Scan on datos (cost=0.00..4128.00 rows=172800 width=8)
(actual time=0.014..356.800 rows=
100001 loops=1)
Total runtime: 622.247 ms
(3 rows)

That's correct? Is logical that if the scan is sequential in the
physical table returns differents
data?

I test it on 8.4 beta1 and 8.3.5 and 8.3.7 with the same results.

Thanks in advance.

--
Emanuel Calvo Franco
Sumate al ARPUG !
( www.arpug.com.ar)
ArPUG / AOSUG Member

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Emanuel Calvo Franco (#1)
Re: limit-offset different result sets with same query

Emanuel Calvo Franco <postgres.arg@gmail.com> writes:

Executing 'select * from datos limit 1 offset 150000' two times i have different
result sets.

The "synchronous scan" logic is probably responsible. Turn off
synchronize_seqscans if this behavior bothers you.

regards, tom lane

#3David Fetter
david@fetter.org
In reply to: Emanuel Calvo Franco (#1)
Re: limit-offset different result sets with same query

On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:

Hi all.

I'll make this faster.

I hace this table and this function:

You should only ever assume that your SELECT's output will have a
particular ordering when you include an ORDER BY clause that actually
specifies the order well enough :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#4Emanuel Calvo Franco
postgres.arg@gmail.com
In reply to: Tom Lane (#2)
Re: limit-offset different result sets with same query

2009/5/8 Tom Lane <tgl@sss.pgh.pa.us>:

Emanuel Calvo Franco <postgres.arg@gmail.com> writes:

Executing 'select * from datos limit 1 offset 150000' two times i have different
result sets.

The "synchronous scan" logic is probably responsible.  Turn off
synchronize_seqscans if this behavior bothers you.

                       regards, tom lane

It works Tom,
Thanks!

--
Emanuel Calvo Franco
Sumate al ARPUG !
( www.arpug.com.ar)
ArPUG / AOSUG Member

#5Adam Rich
adam.r@sbcglobal.net
In reply to: Emanuel Calvo Franco (#1)
Re: limit-offset different result sets with same query

Emanuel Calvo Franco wrote:

Executing 'select * from datos limit 1 offset 150000' two times i

have different

result sets.
When i execute 'explain analyze verbose <query>' i see that (as
expected) the seq scan
is occurring.

That's correct? Is logical that if the scan is sequential in the
physical table returns differents
data?

I test it on 8.4 beta1 and 8.3.5 and 8.3.7 with the same results.

Thanks in advance.

Emanuel,
LIMIT and OFFSET are stable only when you have "ORDER BY" on unique
values. Without that, the database is free to return the rows in
whatever order it deems best, which gives unpredictable results when
combined with LIMIT/OFFSET.

Adam

#6Emanuel Calvo Franco
postgres.arg@gmail.com
In reply to: David Fetter (#3)
Re: limit-offset different result sets with same query

2009/5/8 David Fetter <david@fetter.org>:

On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:

Hi all.

I'll make this faster.

I hace this table and this function:

You should only ever assume that your SELECT's output will have a
particular ordering when you include an ORDER BY clause that actually
specifies the order well enough :)

I test it in the first time :)

With the 'order by' it works well, but in 'theory' if you
run sequentially and physically a table, you expect obtain the same
results with a same query.
There is no indexes that can intersect the results or inherits
tables.

But with the option synchronize_seqscans in off, it works like
i expected :)

i will study a bit more this option on monday (like always).

--
Emanuel Calvo Franco
Sumate al ARPUG !
( www.arpug.com.ar)
ArPUG / AOSUG Member

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Fetter (#3)
Re: limit-offset different result sets with same query

David Fetter escribi�:

On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:

Hi all.

I'll make this faster.

I hace this table and this function:

You should only ever assume that your SELECT's output will have a
particular ordering when you include an ORDER BY clause that actually
specifies the order well enough :)

Yeah, we went over this on the spanish list, turned out that I couldn't
remember about syncscan :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#8David Fetter
david@fetter.org
In reply to: Emanuel Calvo Franco (#6)
Re: limit-offset different result sets with same query

On Fri, May 08, 2009 at 06:40:33PM -0300, Emanuel Calvo Franco wrote:

2009/5/8 David Fetter <david@fetter.org>:

On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:

Hi all.

I'll make this faster.

I hace this table and this function:

You should only ever assume that your SELECT's output will have a
particular ordering when you include an ORDER BY clause that
actually specifies the order well enough :)

I test it in the first time :)

With the 'order by' it works well, but in 'theory'

The theory under which you should operate is that the underlying
implementation only gives you the orderings you ask for. This way,
when other beneficial implementation changes happen, they will not
surprise you. :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#9Emanuel Calvo Franco
postgres.arg@gmail.com
In reply to: David Fetter (#8)
Re: limit-offset different result sets with same query

2009/5/8 David Fetter <david@fetter.org>:

On Fri, May 08, 2009 at 06:40:33PM -0300, Emanuel Calvo Franco wrote:

I test it in the first time :)

With the 'order by' it works well, but in 'theory'

The theory under which you should operate is that the underlying
implementation only gives you the orderings you ask for. This way,
when other beneficial implementation changes happen, they will not
surprise you. :)

Yeap. now it have more sense.
Is a question of performance and to take notice of that.

Cool.
Thanks to all!,

--
Emanuel Calvo Franco
Sumate al ARPUG !
( www.arpug.com.ar)
ArPUG / AOSUG Member

#10Merlin Moncure
mmoncure@gmail.com
In reply to: Alvaro Herrera (#7)
Re: limit-offset different result sets with same query

On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

David Fetter escribió:

On Fri, May 08, 2009 at 06:10:18PM -0300, Emanuel Calvo Franco wrote:

Hi all.

I'll make this faster.

I hace this table and this function:

You should only ever assume that your SELECT's output will have a
particular ordering when you include an ORDER BY clause that actually
specifies the order well enough :)

Yeah, we went over this on the spanish list, turned out that I couldn't
remember about syncscan :-)

I like the new behavior. It really encourages proper use of order by,
because the natural ordering results are effectively randomized. A
class of subtle bugs has been made obvious. :)

merlin

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#10)
Re: limit-offset different result sets with same query

Merlin Moncure <mmoncure@gmail.com> writes:

On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yeah, we went over this on the spanish list, turned out that I couldn't
remember about syncscan :-)

I like the new behavior. It really encourages proper use of order by,
because the natural ordering results are effectively randomized. A
class of subtle bugs has been made obvious. :)

Not really, because the syncscan behavior only kicks in when your table
gets large ... you'll never see it during devel testing on toy tables ...

regards, tom lane

#12David Fetter
david@fetter.org
In reply to: Tom Lane (#11)
Re: limit-offset different result sets with same query

On Sat, May 09, 2009 at 01:28:03PM -0400, Tom Lane wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yeah, we went over this on the spanish list, turned out that I
couldn't remember about syncscan :-)

I like the new behavior. It really encourages proper use of order
by, because the natural ordering results are effectively
randomized. A class of subtle bugs has been made obvious. :)

Not really, because the syncscan behavior only kicks in when your
table gets large ... you'll never see it during devel testing on toy
tables ...

Good point. It's important not to test only on toy-sized tables for
lots and lots of good reasons, scale-dependence of sync scans being a
small one.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#13Scott Marlowe
scott.marlowe@gmail.com
In reply to: David Fetter (#12)
Re: limit-offset different result sets with same query

On Sun, May 10, 2009 at 2:03 AM, David Fetter <david@fetter.org> wrote:

On Sat, May 09, 2009 at 01:28:03PM -0400, Tom Lane wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yeah, we went over this on the spanish list, turned out that I
couldn't remember about syncscan :-)

I like the new behavior.  It really encourages proper use of order
by, because the natural ordering results are effectively
randomized.  A class of subtle bugs has been made obvious.  :)

Not really, because the syncscan behavior only kicks in when your
table gets large ... you'll never see it during devel testing on toy
tables ...

Good point.  It's important not to test only on toy-sized tables for
lots and lots of good reasons, scale-dependence of sync scans being a
small one.

Last job I was at I was the lone pgsql guy who worked with three
Oracle DBAs, and quite a few of them were caught off guard by this
type of behaviour (it was with hash_agg and reporting queries with
group by).

#14Emanuel Calvo Franco
postgres.arg@gmail.com
In reply to: Tom Lane (#11)
Re: limit-offset different result sets with same query

2009/5/9 Tom Lane <tgl@sss.pgh.pa.us>:

Merlin Moncure <mmoncure@gmail.com> writes:

On Fri, May 8, 2009 at 5:40 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yeah, we went over this on the spanish list, turned out that I couldn't
remember about syncscan :-)

I like the new behavior.  It really encourages proper use of order by,
because the natural ordering results are effectively randomized.  A
class of subtle bugs has been made obvious.  :)

Not really, because the syncscan behavior only kicks in when your table
gets large ... you'll never see it during devel testing on toy tables ...

                       regards, tom lane

Yeap. If you see one of the test i made, you'll see this switch over the
~100 regs.

--
Emanuel Calvo Franco
Sumate al ARPUG !
( www.arpug.com.ar)
ArPUG / AOSUG Member