need clarification on CTE/join

Started by Marc Millasabout 5 years ago5 messagesgeneral

marc.millas@mokadb.com

about 5 years ago

Hi,

if I do:
postgres=# with numb as(select ceiling(2582*random()) rand,
generate_series(1,5) as monnum) select rand, monnum from numb;
rand | monnum
------+--------
1543 | 1
2299 | 2
205 | 3
523 | 4
677 | 5
(5 lines)

ok, fine. The random numbers are at random...and the generate_series are
ordered...

I have a table firstnames(id serial, firstname text) with 2582 lines
containing firstnames sorted in alphabetical order.

--when I do, with the same CTE:
postgres=# with numb as(select ceiling(2582*random()) rand,
generate_series(1,5) as monnum) select monnum, firstname from
numb, firstnames where numb.rand= firstnames.id ;
monnum | firstname
--------+-----------
2 | Christine
1 | Firas
4 | Firmin
3 | Rawane
5 | Titania
(5 lignes)

which mean that what I get is a set of firstnames ordered according to the
firstnames table, and NOT to the result of the CTE.

--Now if I cast the result of the ceiling function to int:
postgres=# with numb as(select ceiling(2582*random())::int rand,
generate_series(1,5) as monnum) select monnum, firstname from
numb,firstnames where numb.rand=firstnames.id;
monnum | prenom
--------+----------
1 | Dexter
2 | Harrison
3 | Angilbe
4 | Narcisse
5 | Marcel
(5 lignes)

Now its ordered according to the CTE. (and the firstname list is at random)

I did test the same thing after putting the result of the CTE in a table,
with the very same behaviour.

So.. I would like to understand the "why" of this behaviour, ie. the change
of order when I do the cast.
(The original reason was that I was creating a test env with millions of
rows with this kind of CTE and was quite surprised to discover that the
result table was ordered...which was not at all my goal)

thanks,

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

David G. Johnston

david.g.johnston@gmail.com

about 5 years ago

In reply to: Marc Millas (#1)

Re: need clarification on CTE/join

On Tue, Mar 23, 2021 at 6:45 PM Marc Millas <marc.millas@mokadb.com> wrote:

So.. I would like to understand the "why" of this behaviour, ie. the
change of order when I do the cast.

I believe the "why" is immaterial here. Your queries do not contain order
by so your results are unordered - even if there appears to be an apparent
ordering for any particular result.

David J.

Marc Millas

marc.millas@mokadb.com

about 5 years ago

In reply to: David G. Johnston (#2)

Re: need clarification on CTE/join

Hi,

I cannot agree.
I did an explain analyze with and without the cast: its extremely different:

postgres=# explain analyze with numb as(select ceiling(2582*random())::int
rand, generate_series(1,5) as monnum) select monnum, prenom from
numb,prenoms where numb.rand=prenoms.id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.33..37.65 rows=5 width=11) (actual time=0.059..0.147
rows=5 loops=1)
CTE numb
-> ProjectSet (cost=0.00..0.05 rows=5 width=8) (actual
time=0.021..0.022 rows=5 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=1)
-> CTE Scan on numb (cost=0.00..0.10 rows=5 width=8) (actual
time=0.023..0.025 rows=5 loops=1)
-> Index Scan using prenoms_pkey on prenoms (cost=0.28..7.50 rows=1
width=11) (actual time=0.024..0.024 rows=1 loops=5)
Index Cond: (id = numb.rand)
Planning Time: 0.111 ms
Execution Time: 0.201 ms
(9 lignes)

postgres=# explain analyze with numb as(select ceiling(2582*random()) rand,
generate_series(1,5) as monnum) select monnum, prenom from numb,prenoms
where numb.rand=prenoms.id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Hash Join (cost=0.21..54.59 rows=65 width=11) (actual time=0.105..0.291
rows=5 loops=1)
Hash Cond: ((prenoms.id)::double precision = numb.rand)
CTE numb
-> ProjectSet (cost=0.00..0.05 rows=5 width=12) (actual
time=0.002..0.003 rows=5 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=1)
-> Seq Scan on prenoms (cost=0.00..40.82 rows=2582 width=11) (actual
time=0.011..0.131 rows=2582 loops=1)
-> Hash (cost=0.10..0.10 rows=5 width=12) (actual time=0.012..0.012
rows=5 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> CTE Scan on numb (cost=0.00..0.10 rows=5 width=12) (actual
time=0.004..0.005 rows=5 loops=1)
Planning Time: 0.070 ms
Execution Time: 0.313 ms
(11 lignes)

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Wed, Mar 24, 2021 at 3:22 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

Show quoted text

On Tue, Mar 23, 2021 at 6:45 PM Marc Millas <marc.millas@mokadb.com>
wrote:

So.. I would like to understand the "why" of this behaviour, ie. the
change of order when I do the cast.

I believe the "why" is immaterial here. Your queries do not contain order
by so your results are unordered - even if there appears to be an apparent
ordering for any particular result.

David J.

David G. Johnston

david.g.johnston@gmail.com

about 5 years ago

In reply to: Marc Millas (#3)

Re: need clarification on CTE/join

On Tuesday, March 23, 2021, Marc Millas <marc.millas@mokadb.com> wrote:

Hi,

I cannot agree.
I did an explain analyze with and without the cast: its
extremely different:

postgres=# explain analyze with numb as(select ceiling(2582*random())::int
rand, generate_series(1,5) as monnum) select monnum, prenom from
numb,prenoms where numb.rand=prenoms.id;
QUERY PLAN
------------------------------------------------------------
-----------------------------------------------------------------
Nested Loop (cost=0.33..37.65 rows=5 width=11) (actual time=0.059..0.147
rows=5 loops=1)
CTE numb
-> ProjectSet (cost=0.00..0.05 rows=5 width=8) (actual
time=0.021..0.022 rows=5 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=1)
-> CTE Scan on numb (cost=0.00..0.10 rows=5 width=8) (actual
time=0.023..0.025 rows=5 loops=1)
-> Index Scan using prenoms_pkey on prenoms (cost=0.28..7.50 rows=1
width=11) (actual time=0.024..0.024 rows=1 loops=5)
Index Cond: (id = numb.rand)
Planning Time: 0.111 ms
Execution Time: 0.201 ms
(9 lignes)

postgres=# explain analyze with numb as(select ceiling(2582*random())
rand, generate_series(1,5) as monnum) select monnum, prenom from
numb,prenoms where numb.rand=prenoms.id;
QUERY PLAN
------------------------------------------------------------
---------------------------------------------------
Hash Join (cost=0.21..54.59 rows=65 width=11) (actual time=0.105..0.291
rows=5 loops=1)
Hash Cond: ((prenoms.id)::double precision = numb.rand)
CTE numb
-> ProjectSet (cost=0.00..0.05 rows=5 width=12) (actual
time=0.002..0.003 rows=5 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=1)
-> Seq Scan on prenoms (cost=0.00..40.82 rows=2582 width=11) (actual
time=0.011..0.131 rows=2582 loops=1)
-> Hash (cost=0.10..0.10 rows=5 width=12) (actual time=0.012..0.012
rows=5 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> CTE Scan on numb (cost=0.00..0.10 rows=5 width=12) (actual
time=0.004..0.005 rows=5 loops=1)
Planning Time: 0.070 ms
Execution Time: 0.313 ms
(11 lignes)

If I’m reading that correctly since prenoms.id is an integer if you don’t
cast the ceiling(random) away from double you cannot use the index since
its not the same type - the integer has to become double, not the reverse.
So you get a different execution and thus different result ordering since
the executor doesn’t have to care about row order.

David J.

Marc Millas

marc.millas@mokadb.com

about 5 years ago

In reply to: David G. Johnston (#4)

Re: need clarification on CTE/join

Got it :-)
thanks !

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com

On Wed, Mar 24, 2021 at 4:21 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

Show quoted text

On Tuesday, March 23, 2021, Marc Millas <marc.millas@mokadb.com> wrote:

Hi,

I cannot agree.
I did an explain analyze with and without the cast: its
extremely different:

postgres=# explain analyze with numb as(select
ceiling(2582*random())::int rand, generate_series(1,5) as monnum) select
monnum, prenom from numb,prenoms where numb.rand=prenoms.id;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.33..37.65 rows=5 width=11) (actual
time=0.059..0.147 rows=5 loops=1)
CTE numb
-> ProjectSet (cost=0.00..0.05 rows=5 width=8) (actual
time=0.021..0.022 rows=5 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=1)
-> CTE Scan on numb (cost=0.00..0.10 rows=5 width=8) (actual
time=0.023..0.025 rows=5 loops=1)
-> Index Scan using prenoms_pkey on prenoms (cost=0.28..7.50 rows=1
width=11) (actual time=0.024..0.024 rows=1 loops=5)
Index Cond: (id = numb.rand)
Planning Time: 0.111 ms
Execution Time: 0.201 ms
(9 lignes)

postgres=# explain analyze with numb as(select ceiling(2582*random())
rand, generate_series(1,5) as monnum) select monnum, prenom from
numb,prenoms where numb.rand=prenoms.id;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------
Hash Join (cost=0.21..54.59 rows=65 width=11) (actual time=0.105..0.291
rows=5 loops=1)
Hash Cond: ((prenoms.id)::double precision = numb.rand)
CTE numb
-> ProjectSet (cost=0.00..0.05 rows=5 width=12) (actual
time=0.002..0.003 rows=5 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.000..0.000 rows=1 loops=1)
-> Seq Scan on prenoms (cost=0.00..40.82 rows=2582 width=11) (actual
time=0.011..0.131 rows=2582 loops=1)
-> Hash (cost=0.10..0.10 rows=5 width=12) (actual time=0.012..0.012
rows=5 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> CTE Scan on numb (cost=0.00..0.10 rows=5 width=12) (actual
time=0.004..0.005 rows=5 loops=1)
Planning Time: 0.070 ms
Execution Time: 0.313 ms
(11 lignes)

If I’m reading that correctly since prenoms.id is an integer if you don’t
cast the ceiling(random) away from double you cannot use the index since
its not the same type - the integer has to become double, not the reverse.
So you get a different execution and thus different result ordering since
the executor doesn’t have to care about row order.

David J.