Query plan: SELECT vs INSERT from same select

Started by Alexander Voytsekhovskyyover 6 years ago5 messagesgeneral
Jump to latest
#1Alexander Voytsekhovskyy
young.inbox@gmail.com

I have quite complicated query:

SELECT axis_x1, axis_y1, SUM(delivery_price) as v_1 FROM (
SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1,
clients.id_client as axis_y1, delivery_data.amount * production_price.price
* groups.discount as delivery_price

FROM delivery_data
JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
JOIN clients ON (client_tt.id_client = clients.id_client)
JOIN production ON (production.id = delivery_data.id_product)
JOIN groups ON (groups.id = delivery_data.delivery_group_id AND
client_tt.id_group = groups.id AND groups.id = clients.id_group)
LEFT JOIN production_price on (delivery_data.id_product =
production_price.id_production AND groups.price_list_id =
production_price.price_list_id AND delivery_data.delivery_date BETWEEN
production_price.date_from AND production_price.date_to)

WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
AND delivery_data.delivery_group_id IN (...short list of values...)
AND delivery_data.id_product IN ()) AS tmpsource

WHERE TRUE
GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

It runs well, took 1s and returns 4000 rows.

You can see explain analyze verbose here:
https://explain.depesz.com/s/AEWj

The problem is, when i wrap it to

A)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT .... SAME QUERY

OR even

B)
WITH rows AS (
... SAME SELECT QUERY ...
)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT * FROM rows

The query time dramatically drops to 500+ seconds.

You can see explain analyze verbose here
https://explain.depesz.com/s/AEWj

As you can see, 100% of time goes to same SELECT query, there is no issues
with INSERT-part

I have played a lot and it's reproducing all time.

So my question is, why wrapping SELECT query with INSERT FROM SELECT
dramatically change query plan and make it 500x slower?

#2Igor Korot
ikorot01@gmail.com
In reply to: Alexander Voytsekhovskyy (#1)
Re: Query plan: SELECT vs INSERT from same select

Hi,

On Tue, Jul 23, 2019 at 3:29 PM Alexander Voytsekhovskyy
<young.inbox@gmail.com> wrote:

I have quite complicated query:

SELECT axis_x1, axis_y1, SUM(delivery_price) as v_1 FROM (
SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, clients.id_client as axis_y1, delivery_data.amount * production_price.price * groups.discount as delivery_price

FROM delivery_data
JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
JOIN clients ON (client_tt.id_client = clients.id_client)
JOIN production ON (production.id = delivery_data.id_product)
JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group = groups.id AND groups.id = clients.id_group)
LEFT JOIN production_price on (delivery_data.id_product = production_price.id_production AND groups.price_list_id = production_price.price_list_id AND delivery_data.delivery_date BETWEEN production_price.date_from AND production_price.date_to)

WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
AND delivery_data.delivery_group_id IN (...short list of values...)
AND delivery_data.id_product IN ()) AS tmpsource

WHERE TRUE
GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

It runs well, took 1s and returns 4000 rows.

You can see explain analyze verbose here:
https://explain.depesz.com/s/AEWj

The problem is, when i wrap it to

A)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT .... SAME QUERY

OR even

B)
WITH rows AS (
... SAME SELECT QUERY ...
)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT * FROM rows

The query time dramatically drops to 500+ seconds.

You can see explain analyze verbose here
https://explain.depesz.com/s/AEWj

As you can see, 100% of time goes to same SELECT query, there is no issues with INSERT-part

I have played a lot and it's reproducing all time.

So my question is, why wrapping SELECT query with INSERT FROM SELECT dramatically change query plan and make it 500x slower?

Which version of PostgreSQL do you have?
Which OS does it running on?

Thank you.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Voytsekhovskyy (#1)
Re: Query plan: SELECT vs INSERT from same select

Alexander Voytsekhovskyy <young.inbox@gmail.com> writes:

You can see explain analyze verbose here:
https://explain.depesz.com/s/AEWj

The problem is, when i wrap it to

A)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT .... SAME QUERY

OR even

B)
WITH rows AS (
... SAME SELECT QUERY ...
)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT * FROM rows

The query time dramatically drops to 500+ seconds.

You can see explain analyze verbose here
https://explain.depesz.com/s/AEWj

That's the same link.

As you can see, 100% of time goes to same SELECT query, there is no issues
with INSERT-part

We can't see any such thing from what you posted.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Voytsekhovskyy (#1)
Re: Query plan: SELECT vs INSERT from same select

[ please keep the list cc'd ]

Alexander Voytsekhovskyy <young.inbox@gmail.com> writes:

Sorry again
here is both links:
https://explain.depesz.com/s/AEWj
https://explain.depesz.com/s/CHwF

Don't think I believe that those are the same query --- there's a
CTE in the second one that doesn't appear in the first, and it is
eating a lot of time too.

I have a vague recollection that there are cases where optimizations
are possible in plain SELECT but not in data-modifying queries.
So maybe this isn't pilot error but something triggered by that.
We'd need to see a self-contained test case to verify that though.

regards, tom lane

#5Alban Hertroys
haramrae@gmail.com
In reply to: Alexander Voytsekhovskyy (#1)
Re: Query plan: SELECT vs INSERT from same select

On 23 Jul 2019, at 22:29, Alexander Voytsekhovskyy <young.inbox@gmail.com> wrote:

I have quite complicated query:

SELECT axis_x1, axis_y1, SUM(delivery_price) as v_1 FROM (
SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, clients.id_client as axis_y1, delivery_data.amount * production_price.price * groups.discount as delivery_price

FROM delivery_data
JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
JOIN clients ON (client_tt.id_client = clients.id_client)
JOIN production ON (production.id = delivery_data.id_product)
JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group = groups.id AND groups.id = clients.id_group)

Are client_tt.id_group and clients.id_group ever different from each other? It looks like you might have redundant information there, but... If they are guaranteed to be the same then you don’t need the JOIN to clients, which would both remove a JOIN and reduce the complexity of the JOIN condition on groups.

Or (assuming the group id’s are indeed supposed to be equal), you could
JOIN clients ON (client_tt.id_client = clients.id_client AND client_tt.id_group = clients.id_group)
instead of putting that condition within the JOIN condition on groups.

I don’t think either option will make a huge difference (the first probably more than the second, as it reduces an entire join), but it could be enough to help the database figure out a better plan.

LEFT JOIN production_price on (delivery_data.id_product = production_price.id_production AND groups.price_list_id = production_price.price_list_id AND delivery_data.delivery_date BETWEEN production_price.date_from AND production_price.date_to)

WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
AND delivery_data.delivery_group_id IN (...short list of values...)
AND delivery_data.id_product IN ()) AS tmpsource

You don’t have a price if your goods weren’t produced in the delivery window you set? Or do you have goods that get delivered without having a price?

You seem to be using this query for a report on nett sales by month, but I have my doubts whether that LEFT JOIN, and especially the condition on the production date window, is really what you want: Your formula for delivery_price includes the price column from that LEFT JOIN, so you’re going to get 0 values when there is no production_price record in your delivery-window, resulting in a SUM that’s too low if the product was produced before (or after, but that seems unlikely) the delivery window.

WHERE TRUE

This line is unnecessary.

GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

Apparently (I’m new to these statements), CUBE (axis_x1, axis_y1) is a shorthand for the above. They seem to have been introduced at the same time (in 9.6?). See: https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS

It runs well, took 1s and returns 4000 rows.

I won’t go into the performance issue ash this point, other more knowledgeable people already did.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.