How to improve the performance of my SQL query？

Started by gzhover 2 years ago26 messagesgeneral

gzhcoder@126.com

over 2 years ago

Hi everyone,

I'm running into some performance issues with my SQL query.

The following SQL query is taking a long time to execute.

explain analyze

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = '009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = '009'

AND TBL_INF.RY_CD = '000001'

)

----- Execution Plan -----

Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)

-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)

-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)

-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)

Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))

Heap Fetches: 1

-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)

Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))

Rows Removed by Filter: 32000325

Planning Time: 0.162 ms

Execution Time: 124168.838 ms

--------------------------------------------------------------------------------

The index is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

When I take the following sql statement, the index works fine and the query is fast.

select COUNT(ET_CD)

from TBL_SHA

WHERE MS_CD = '009'

AND ETRYS = '000001'

The amount of data in the table is as follows.

TBL_SHA 38700325

TBL_INF 35546

Any suggestions for improving the performance of the query would be greatly appreciated.

Thanks in advance!

ewie@ewie.name

over 2 years ago

In reply to: gzh (#1)

Re: How to improve the performance of my SQL query？

On 20/07/2023 09:09 CEST gzh <gzhcoder@126.com> wrote:

I'm running into some performance issues with my SQL query.
The following SQL query is taking a long time to execute.

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = '009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = '009'
AND TBL_INF.RY_CD = '000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms
--------------------------------------------------------------------------------

The index is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

When I take the following sql statement, the index works fine and the query
is fast.

select COUNT(ET_CD)
from TBL_SHA
WHERE MS_CD = '009'
AND ETRYS = '000001'

The amount of data in the table is as follows.
TBL_SHA 38700325
TBL_INF 35546

Any suggestions for improving the performance of the query would be greatly
appreciated.

You can try EXISTS instead of IN to correlate the subquery and the outer query:

SELECT count(et_cd)
FROM tbl_sha
WHERE tbl_sha.ms_cd = '009'
AND tbl_sha.etrys = '000001'
AND EXISTS (
SELECT
FROM tbl_inf
WHERE tbl_inf.ms_cd = tbl_sha.ms_cd
AND tbl_inf.ry_cd = tbl_sha.etrys
)

--
Erik

Peter J. Holzer

hjp-pgsql@hjp.at

over 2 years ago

In reply to: gzh (#1)

Re: How to improve the performance of my SQL query？

On 2023-07-20 15:09:22 +0800, gzh wrote:

Hi everyone,

I'm running into some performance issues with my SQL query.

The following SQL query is taking a long time to execute.

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = '009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = '009'
AND TBL_INF.RY_CD = '000001'
)

----- Execution Plan -----

[...]

-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18)
(actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 32000325

[...]

--------------------------------------------------------------------------------

The index is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

When I take the following sql statement, the index works fine and the query is
fast.

select COUNT(ET_CD)
from TBL_SHA
WHERE MS_CD = '009'
AND ETRYS = '000001'

What's the plan for that query?

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

gzhcoder@126.com

over 2 years ago

In reply to: Erik Wienhold (#2)

Re:Re: How to improve the performance of my SQL query？

Thank you very much for taking the time to reply to my question.

Sorry, I provided incorrect information.

The index also does not work in the following query statement.

select COUNT(ET_CD)

from TBL_SHA

WHERE MS_CD = '009'

AND ETRYS = '000001'

QUERY PLAN

Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)

-> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1)

-> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3)

-> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3)

Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))

Rows Removed by Filter: 11833442

Planning Time: 0.118 ms

Execution Time: 128668.290 ms

The TBL_SHA table has another index, as shown below.

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)

When I take the following query statement, the result is returned quickly.

Why does index_search_01 always not work?

explain analyze

select TO_CHAR(MAX(TBL_SHA.ET_DAY),'YYYYMMDD') AS ET_DAY

from TBL_SHA

WHERE MS_CD = '008'

AND ET_CD = '000003'

QUERY PLAN

Limit (cost=4.11..4.13 rows=1 width=32) (actual time=0.043..0.044 rows=1 loops=1)

InitPlan 1 (returns $0)

-> Limit (cost=0.56..4.11 rows=1 width=8) (actual time=0.032..0.033 rows=1 loops=1)

-> Index Scan Backward using index_search_01 on TBL_SHA (cost=0.56..10836962.40 rows=3054052 width=8) (actual time=0.032..0.032 rows=1 loops=1)

Index Cond: ((MS_CD = '008'::bpchar) AND (ent_day IS NOT NULL))

Filter: (ET_CD = '000003'::bpchar)

-> Result (cost=4.11..4.13 rows=1 width=32) (actual time=0.042..0.042 rows=1 loops=1)

Planning Time: 0.228 ms

Execution Time: 0.070 ms

At 2023-07-20 16:07:15, "Erik Wienhold" <ewie@ewie.name> wrote:

Show quoted text

On 20/07/2023 09:09 CEST gzh <gzhcoder@126.com> wrote:

I'm running into some performance issues with my SQL query.
The following SQL query is taking a long time to execute.

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = '009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = '009'
AND TBL_INF.RY_CD = '000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms
--------------------------------------------------------------------------------

The index is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

When I take the following sql statement, the index works fine and the query
is fast.

select COUNT(ET_CD)
from TBL_SHA
WHERE MS_CD = '009'
AND ETRYS = '000001'

The amount of data in the table is as follows.
TBL_SHA 38700325
TBL_INF 35546

Any suggestions for improving the performance of the query would be greatly
appreciated.

You can try EXISTS instead of IN to correlate the subquery and the outer query:

SELECT count(et_cd)
FROM tbl_sha
WHERE tbl_sha.ms_cd = '009'
AND tbl_sha.etrys = '000001'
AND EXISTS (
SELECT
FROM tbl_inf
WHERE tbl_inf.ms_cd = tbl_sha.ms_cd
AND tbl_inf.ry_cd = tbl_sha.etrys
)

--
Erik

laurenz.albe@cybertec.at

over 2 years ago

In reply to: gzh (#1)

Re: How to improve the performance of my SQL query？

On Thu, 2023-07-20 at 15:09 +0800, gzh wrote:

I'm running into some performance issues with my SQL query.
The following SQL query is taking a long time to execute.

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = '009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = '009'
AND TBL_INF.RY_CD = '000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms
--------------------------------------------------------------------------------

The index is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

Actual rows = 3200000, rows removed by filter is ten times as much.
It should use an index.

When I take the following sql statement, the index works fine and the query is fast.

select COUNT(ET_CD)
from TBL_SHA
WHERE MS_CD = '009'
AND ETRYS = '000001'

The amount of data in the table is as follows.
TBL_SHA 38700325
TBL_INF 35546

This looks very much like it is a problem with the data types.
I see that you are using "character", which you shouldn't do.

What I cannot see is if the columns are defined as "character" or whether you bind
the parameters as "character". Can you show us the table definition of "TBL_SHA"
and "TBL_INF"?

Yours,
Laurenz Albe

gzhcoder@126.com

over 2 years ago

In reply to: Laurenz Albe (#5)

Re: How to improve the performance of my SQL query？

Thank you for your reply.
The information I provided is incorrect, please see my previous reply.

What I cannot see is if the columns are defined as "character" or whether you bind
the parameters as "character". Can you show us the table definition of "TBL_SHA"

and "TBL_INF"?

For information security reasons, I can't provide the table definition, these columns are defined as "character".

At 2023-07-20 19:58:59, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:

Show quoted text

On Thu, 2023-07-20 at 15:09 +0800, gzh wrote:

I'm running into some performance issues with my SQL query.
The following SQL query is taking a long time to execute.

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = '009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = '009'
AND TBL_INF.RY_CD = '000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = '009'::bpchar) AND (RY_CD = '000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms
--------------------------------------------------------------------------------

The index is defined as follows.

CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);

Actual rows = 3200000, rows removed by filter is ten times as much.
It should use an index.

When I take the following sql statement, the index works fine and the query is fast.

select COUNT(ET_CD)
from TBL_SHA
WHERE MS_CD = '009'
AND ETRYS = '000001'

The amount of data in the table is as follows.
TBL_SHA 38700325
TBL_INF 35546

This looks very much like it is a problem with the data types.
I see that you are using "character", which you shouldn't do.

What I cannot see is if the columns are defined as "character" or whether you bind
the parameters as "character". Can you show us the table definition of "TBL_SHA"
and "TBL_INF"?

Yours,
Laurenz Albe

laurenz.albe@cybertec.at

over 2 years ago

In reply to: gzh (#6)

Re: How to improve the performance of my SQL query？

On Thu, 2023-07-20 at 22:14 +0800, gzh wrote:

The information I provided is incorrect, please see my previous reply.

My question remains: I would like to see the table definitions.

Also, did you ANALYZE the tables?

Yours,
Laurenz Albe

jian.universality@gmail.com

over 2 years ago

In reply to: gzh (#4)

Re: Re: How to improve the performance of my SQL query？

On Thu, Jul 20, 2023 at 7:36 PM gzh <gzhcoder@126.com> wrote:

Thank you very much for taking the time to reply to my question.

Sorry, I provided incorrect information.

The index also does not work in the following query statement.

select COUNT(ET_CD)

from TBL_SHA

WHERE MS_CD = '009'

AND ETRYS = '000001'

QUERY PLAN

Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)

-> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1)

-> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3)

-> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3)

Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))

Rows Removed by Filter: 11833442

Planning Time: 0.118 ms

Execution Time: 128668.290 ms

The TBL_SHA table has another index, as shown below.

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)

Rows Removed by Filter: 11833442

select (38700325 - 11833442) /38700325.0;
is 0.69 approx.
So I think it says around 69% of rows satisfy the query condition.

but I am not sure in the following 2 cases, whether the actual rows
are noisy or not. I can not find the doc explaining it.

Show quoted text

Partial Aggregate (actual time=128655.256..128655.258 rows=1 loops=3)
Finalize Aggregate (actual time=128667.437..128668.246 rows=1 loops=1)

gzhcoder@126.com

over 2 years ago

In reply to: Laurenz Albe (#7)

Re: How to improve the performance of my SQL query？

The definitions of the columns used in SQL are as follows.

TBL_SHA

ms_cd character(6) NOT NULL -- PRIMARY KEY

et_cd character(8)

etrys character(8)

TBL_INF

ms_cd character(6) NOT NULL -- PRIMARY KEY

ry_cd character(8) NOT NULL -- PRIMARY KEY

I made some modifications to the data, and I realized that I should not change the length of the data.

The actual data and its corresponding execution plan are shown below.

explain analyze

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)

-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)

-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)

-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)

Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))

Heap Fetches: 1

-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)

Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))

Rows Removed by Filter: 32000325

Planning Time: 0.162 ms

Execution Time: 124168.838 ms

At 2023-07-20 22:48:19, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:

Show quoted text

On Thu, 2023-07-20 at 22:14 +0800, gzh wrote:

The information I provided is incorrect, please see my previous reply.

My question remains: I would like to see the table definitions.

Also, did you ANALYZE the tables?

Yours,
Laurenz Albe

gzhcoder@126.com

over 2 years ago

In reply to: jian he (#8)

Re:How to improve the performance of my SQL query？

select (38700325 - 11833442) /38700325.0;
is 0.69 approx.
So I think it says around 69% of rows satisfy the query condition.Thank you for your reply. I have learned a lot from it.

At 2023-07-20 23:20:16, "jian he" <jian.universality@gmail.com> wrote:

Show quoted text

On Thu, Jul 20, 2023 at 7:36 PM gzh <gzhcoder@126.com> wrote:

Thank you very much for taking the time to reply to my question.

Sorry, I provided incorrect information.

The index also does not work in the following query statement.

select COUNT(ET_CD)

from TBL_SHA

WHERE MS_CD = '009'

AND ETRYS = '000001'

QUERY PLAN

Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)

-> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1)

-> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3)

-> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3)

Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))

Rows Removed by Filter: 11833442

Planning Time: 0.118 ms

Execution Time: 128668.290 ms

The TBL_SHA table has another index, as shown below.

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)

Rows Removed by Filter: 11833442

select (38700325 - 11833442) /38700325.0;
is 0.69 approx.
So I think it says around 69% of rows satisfy the query condition.

but I am not sure in the following 2 cases, whether the actual rows
are noisy or not. I can not find the doc explaining it.

Partial Aggregate (actual time=128655.256..128655.258 rows=1 loops=3)
Finalize Aggregate (actual time=128667.437..128668.246 rows=1 loops=1)

laurenz.albe@cybertec.at

over 2 years ago

In reply to: gzh (#9)

Re: How to improve the performance of my SQL query？

On Fri, 2023-07-21 at 09:43 +0800, gzh wrote:

The definitions of the columns used in SQL are as follows.

TBL_SHA

ms_cd character(6) NOT NULL -- PRIMARY KEY
et_cd character(8)
etrys character(8)

TBL_INF

ms_cd character(6) NOT NULL -- PRIMARY KEY
ry_cd character(8) NOT NULL -- PRIMARY KEY

I made some modifications to the data, and I realized that I should not change the length of the data.
The actual data and its corresponding execution plan are shown below.

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms

Thanks. That should definitely use a b-tree index defined on (ms_cd, etrsys).

Did you change any parameters that have an impact on query planning?
You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

Yours,
Laurenz Albe

gzhcoder@126.com

over 2 years ago

In reply to: Laurenz Albe (#11)

Re: How to improve the performance of my SQL query？

Did you change any parameters that have an impact on query planning?

You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

I added some parameters and re-executed the Execution Plan.

Except for the index not taking effect, I still don't know the reason why the index is not working.

Is it because there is too much data that meets the conditions?

EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Limit (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1)

Output: (count(tbl_sha.et_cd))

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1)

Output: count(tbl_sha.et_cd)

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1)

Output: tbl_sha.et_cd

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)

Output: tbl_inf.ms_cd, tbl_inf.ry_cd

Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=4

-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500 loops=1)

Output: tbl_sha.et_cd, tbl_sha.etrys

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

-> Parallel Seq Scan on mtpdb.tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 rows=871167 loops=3)

Output: tbl_sha.et_cd, tbl_sha.etrys

Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar))

Rows Removed by Filter: 14062278

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

Worker 0: actual time=1432.292..127762.181 rows=988036 loops=1

Buffers: shared hit=17875 read=706862

I/O Timings: read=119193.744

Worker 1: actual time=1425.878..127786.777 rows=992381 loops=1

Buffers: shared hit=19813 read=706359

I/O Timings: read=119386.899

Planning:

Buffers: shared hit=42

Planning Time: 1.024 ms

Execution Time: 128717.731 ms

At 2023-07-24 13:43:46, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:

Show quoted text

On Fri, 2023-07-21 at 09:43 +0800, gzh wrote:

The definitions of the columns used in SQL are as follows.

TBL_SHA

ms_cd character(6) NOT NULL -- PRIMARY KEY
et_cd character(8)
etrys character(8)

TBL_INF

ms_cd character(6) NOT NULL -- PRIMARY KEY
ry_cd character(8) NOT NULL -- PRIMARY KEY

I made some modifications to the data, and I realized that I should not change the length of the data.
The actual data and its corresponding execution plan are shown below.

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms

Thanks. That should definitely use a b-tree index defined on (ms_cd, etrsys).

Did you change any parameters that have an impact on query planning?
You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

Yours,
Laurenz Albe

jian.universality@gmail.com

over 2 years ago

In reply to: gzh (#12)

Re: How to improve the performance of my SQL query？

On Mon, Jul 24, 2023 at 5:54 PM gzh <gzhcoder@126.com> wrote:

Did you change any parameters that have an impact on query planning?

You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

I added some parameters and re-executed the Execution Plan.

Except for the index not taking effect, I still don't know the reason why the index is not working.

Is it because there is too much data that meets the conditions?

EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Limit (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1)

Output: (count(tbl_sha.et_cd))

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1)

Output: count(tbl_sha.et_cd)

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1)

Output: tbl_sha.et_cd

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)

Output: tbl_inf.ms_cd, tbl_inf.ry_cd

Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=4

-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500 loops=1)

Output: tbl_sha.et_cd, tbl_sha.etrys

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

-> Parallel Seq Scan on mtpdb.tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 rows=871167 loops=3)

Output: tbl_sha.et_cd, tbl_sha.etrys

Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar))

Rows Removed by Filter: 14062278

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

Worker 0: actual time=1432.292..127762.181 rows=988036 loops=1

Buffers: shared hit=17875 read=706862

I/O Timings: read=119193.744

Worker 1: actual time=1425.878..127786.777 rows=992381 loops=1

Buffers: shared hit=19813 read=706359

I/O Timings: read=119386.899

Planning:

Buffers: shared hit=42

Planning Time: 1.024 ms

Execution Time: 128717.731 ms

I think the whole query can just:
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001';

and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)

if subquery after IN clause part, no rows returned then the whole
query would return zero row.
if many duplicates rows returned, then there is no point of evaluate
something like {1 in (1,1,1,1,1,1,)}

laurenz.albe@cybertec.at

over 2 years ago

In reply to: gzh (#12)

Re: How to improve the performance of my SQL query？

On Mon, 2023-07-24 at 17:54 +0800, gzh wrote:

EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)

I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
Or do you have a version that is too old for SETTINGS?

One other idea: check if the index is INVALID (this will
be visible if you run "\d tablenane" in "psql").
Invalid indexes won't be used.

Yours,
Laurenz Albe

gzhcoder@126.com

over 2 years ago

In reply to: Laurenz Albe (#14)

Re: How to improve the performance of my SQL query？

I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...

Or do you have a version that is too old for SETTINGS?

Sorry. Please refer to the following execution plan.

EXPLAIN (ANALYZE, BUFFERS, SETTINGS)

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Buffers: shared hit=13 read=2171693

I/O Timings: read=365863.877

-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=131595.624..131599.529 rows=1 loops=1)

Buffers: shared hit=13 read=2171693

I/O Timings: read=365863.877

-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.341..131256.445 rows=2513500 loops=1)

Buffers: shared hit=13 read=2171693

I/O Timings: read=365863.877

-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=1.471..1.476 rows=1 loops=1)

Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=2 read=2

I/O Timings: read=1.412

-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=0.866..130696.440 rows=2513500 loops=1)

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=11 read=2171691

I/O Timings: read=365862.464

-> Parallel Seq Scan on tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=0.215..130476.981 rows=837833 loops=3)

Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))

Rows Removed by Filter: 13728945

Buffers: shared hit=11 read=2171691

I/O Timings: read=365862.464

Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = '"$user", mdb'

Planning:

Buffers: shared hit=167 read=7

I/O Timings: read=2.735

Planning Time: 3.733 ms

Execution Time: 131599.594 ms

At 2023-07-24 23:58:50, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:

Show quoted text

On Mon, 2023-07-24 at 17:54 +0800, gzh wrote:

EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)

I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
Or do you have a version that is too old for SETTINGS?

One other idea: check if the index is INVALID (this will
be visible if you run "\d tablenane" in "psql").
Invalid indexes won't be used.

Yours,
Laurenz Albe

gzhcoder@126.com

over 2 years ago

In reply to: jian he (#13)

Re: How to improve the performance of my SQL query？

Thank you for your reply.

I think the whole query can just:
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001';

This is related to the business logic.

if many duplicates rows returned, then there is no point of evaluate
something like {1 in (1,1,1,1,1,1,)}

Because the primary key of the 'tbl_inf' table only consists of 'ms_cd' and 'ry_cd' columns, the subquery will not return duplicate rows.

At 2023-07-24 22:42:01, "jian he" <jian.universality@gmail.com> wrote:

Show quoted text

On Mon, Jul 24, 2023 at 5:54 PM gzh <gzhcoder@126.com> wrote:

Did you change any parameters that have an impact on query planning?

You can see that in the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS).

I added some parameters and re-executed the Execution Plan.

Except for the index not taking effect, I still don't know the reason why the index is not working.

Is it because there is too much data that meets the conditions?

EXPLAIN (ANALYZE ON, VERBOSE ON, COSTS ON, BUFFERS ON)

select COUNT(ET_CD)

from TBL_SHA

WHERE TBL_SHA.MS_CD = 'MLD009'

and TBL_SHA.ETRYS in

(select TBL_INF.RY_CD

from TBL_INF

WHERE TBL_INF.MS_CD = 'MLD009'

AND TBL_INF.RY_CD = '00000001'

)

----- Execution Plan -----

Limit (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.521..128717.677 rows=1 loops=1)

Output: (count(tbl_sha.et_cd))

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Aggregate (cost=2728633.22..2728633.23 rows=1 width=8) (actual time=128691.519..128717.674 rows=1 loops=1)

Output: count(tbl_sha.et_cd)

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Nested Loop (cost=1000.29..2722556.76 rows=2430587 width=9) (actual time=2.364..128350.279 rows=2613500 loops=1)

Output: tbl_sha.et_cd

Buffers: shared hit=58948 read=2112758

I/O Timings: read=357249.120

-> Index Only Scan using tbl_inf_pkc on mtpdb.tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.046..0.051 rows=1 loops=1)

Output: tbl_inf.ms_cd, tbl_inf.ry_cd

Index Cond: ((tbl_inf.ms_cd = 'MLD009'::bpchar) AND (tbl_inf.ry_cd = '00000001'::bpchar))

Heap Fetches: 1

Buffers: shared hit=4

-> Gather (cost=1000.00..2698242.58 rows=2430587 width=18) (actual time=2.315..127773.087 rows=2613500 loops=1)

Output: tbl_sha.et_cd, tbl_sha.etrys

Workers Planned: 2

Workers Launched: 2

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

-> Parallel Seq Scan on mtpdb.tbl_sha (cost=0.00..2454183.88 rows=1012745 width=18) (actual time=952.728..127583.089 rows=871167 loops=3)

Output: tbl_sha.et_cd, tbl_sha.etrys

Filter: ((tbl_sha.ms_cd = 'MLD009'::bpchar) AND (tbl_sha.etrys = '00000001'::bpchar))

Rows Removed by Filter: 14062278

Buffers: shared hit=58944 read=2112758

I/O Timings: read=357249.120

Worker 0: actual time=1432.292..127762.181 rows=988036 loops=1

Buffers: shared hit=17875 read=706862

I/O Timings: read=119193.744

Worker 1: actual time=1425.878..127786.777 rows=992381 loops=1

Buffers: shared hit=19813 read=706359

I/O Timings: read=119386.899

Planning:

Buffers: shared hit=42

Planning Time: 1.024 ms

Execution Time: 128717.731 ms

I think the whole query can just:
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009' AND TBL_SHA.ETRYS = '00000001';

and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)

if subquery after IN clause part, no rows returned then the whole
query would return zero row.
if many duplicates rows returned, then there is no point of evaluate
something like {1 in (1,1,1,1,1,1,)}

laurenz.albe@cybertec.at

over 2 years ago

In reply to: gzh (#15)

Re: How to improve the performance of my SQL query？

On Tue, 2023-07-25 at 11:11 +0800, gzh wrote:

I asked for EXPLAIN (ANALYZE, BUFFERS, SETTINGS) ...
Or do you have a version that is too old for SETTINGS?

Sorry. Please refer to the following execution plan.

[...]
Settings: effective_cache_size = '1886088kB', jit = 'off', search_path = '"$user", mdb'
[...]

No unusual settings. So that theory is dead.

One other idea: check if the index is INVALID (this will
be visible if you run "\d tablenane" in "psql").
Invalid indexes won't be used.

Did you check that?

After that, I am out of ideas.

Yours,
Laurenz Albe

dgrowleyml@gmail.com

over 2 years ago

In reply to: gzh (#4)

Re: Re: How to improve the performance of my SQL query？

On Thu, 20 Jul 2023 at 23:36, gzh <gzhcoder@126.com> wrote:

Thank you very much for taking the time to reply to my question.

Sorry, I provided incorrect information.

The index also does not work in the following query statement.

select COUNT(ET_CD)
from TBL_SHA
WHERE MS_CD = '009'
AND ETRYS = '000001'

QUERY PLAN
Limit (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.439..128668.250 rows=1 loops=1)
-> Finalize Aggregate (cost=2419643.47..2419643.48 rows=1 width=8) (actual time=128667.437..128668.246 rows=1 loops=1)
-> Gather (cost=2419643.25..2419643.46 rows=2 width=8) (actual time=128664.108..128668.233 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=2418643.25..2418643.26 rows=1 width=8) (actual time=128655.256..128655.258 rows=1 loops=3)
-> Parallel Seq Scan on TBL_SHA (cost=0.00..2415548.85 rows=1237762 width=9) (actual time=75357.455..128531.615 rows=1066667 loops=3)
Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = '000001'::bpchar))
Rows Removed by Filter: 11833442
Planning Time: 0.118 ms
Execution Time: 128668.290 ms

The TBL_SHA table has another index, as shown below.

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)

When I take the following query statement, the result is returned quickly.
Why does index_search_01 always not work?

The method to access the table is decided by the query planner based
on costs. The costs are driven off the row estimates which are driven
from table statistics. If the table statistics, for example say that
99% of rows have MS_CD = '009', then scanning an index on MS_CD is
unlikely to be a good idea as that would likely require random access
to the heap. It's likely better to perform a table scan and then just
filter out the 1% of rows that don't match.

Try executing the query after having done:

SET enable_seqscan TO off;

What plan does it use now?

Is that plan faster or slower than the seq scan plan?

David

dgrowleyml@gmail.com

over 2 years ago

In reply to: gzh (#9)

Re: How to improve the performance of my SQL query？

On Fri, 21 Jul 2023 at 13:44, gzh <gzhcoder@126.com> wrote:

The definitions of the columns used in SQL are as follows.

TBL_SHA
ms_cd character(6) NOT NULL -- PRIMARY KEY
et_cd character(8)
etrys character(8)

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms

You might want to check your description of the table definitions.
Going by the above EXPLAIN ANALYZE output, it very much does not look
like ms_cd is the primary key of TBL_SHA. If it is then it's very
weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some
data corruption if that's the case. I suspect you've just not
accurately described the table definition, however.

David

gzhcoder@126.com

over 2 years ago

In reply to: David Rowley (#19)

Re: How to improve the performance of my SQL query？

Thank you very much for taking the time to reply to my question.

You might want to check your description of the table definitions.
Going by the above EXPLAIN ANALYZE output, it very much does not look
like ms_cd is the primary key of TBL_SHA. If it is then it's very
weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some
data corruption if that's the case. I suspect you've just not

accurately described the table definition, however.

The primary key of the SHA table has six fields, and ms_cd is just one of them.
I'm sorry, I didn't make that clear.

Try executing the query after having done:

SET enable_seqscan TO off;

What plan does it use now?

Is that plan faster or slower than the seq scan plan?

There's improvement, but it's still quite slow.

QUERY PLAN (enable_seqscan=on)
Limit (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.822..167183.058 rows=1 loops=1)
-> Aggregate (cost=2693516.87..2693516.88 rows=1 width=8) (actual time=167089.820..167183.056 rows=1 loops=1)
-> Nested Loop (cost=1000.29..2688558.85 rows=1983209 width=9) (actual time=43544.753..166906.304 rows=2413500 loops=1)
-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=1.034..1.038 rows=1 loops=1)
Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))
Heap Fetches: 1
-> Gather (cost=1000.00..2668718.45 rows=1983209 width=18) (actual time=43543.714..166447.333 rows=2413500 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_sha (cost=0.00..2469397.55 rows=826337 width=18) (actual time=43537.056..166225.162 rows=804500 loops=3)
Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))
Rows Removed by Filter: 15362328
Planning Time: 2.942 ms
Execution Time: 167183.133 ms

SET enable_seqscan TO off;

QUERY PLAN (enable_seqscan=off)
Limit (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.419..22320.102 rows=1 loops=1)
-> Aggregate (cost=2880973.06..2880973.07 rows=1 width=8) (actual time=22295.418..22320.100 rows=1 loops=1)
-> Nested Loop (cost=93112.74..2876169.16 rows=1921561 width=9) (actual time=265.880..22000.432 rows=2413500 loops=1)
-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual time=0.013..0.020 rows=1 loops=1)
Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))
Heap Fetches: 1
-> Gather (cost=93112.45..2856945.24 rows=1921561 width=18) (actual time=265.864..21535.325 rows=2413500 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on tbl_sha (cost=92112.45..2663789.14 rows=800650 width=18) (actual time=260.540..21442.169 rows=804500 loops=3)
Recheck Cond: (ms_cd = 'MLD009'::bpchar)
Rows Removed by Index Recheck: 49
Filter: (etrys = '00000001'::bpchar)
Rows Removed by Filter: 295500
Heap Blocks: exact=13788 lossy=10565
-> Bitmap Index Scan on index_search_04_mscd_cdate (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 rows=3300000 loops=1)
Index Cond: (ms_cd = 'MLD009'::bpchar)
Planning Time: 0.670 ms
Execution Time: 22320.153 ms

At 2023-07-25 21:04:16, "David Rowley" <dgrowleyml@gmail.com> wrote:

Show quoted text

On Fri, 21 Jul 2023 at 13:44, gzh <gzhcoder@126.com> wrote:

The definitions of the columns used in SQL are as follows.

TBL_SHA
ms_cd character(6) NOT NULL -- PRIMARY KEY
et_cd character(8)
etrys character(8)

explain analyze
select COUNT(ET_CD)
from TBL_SHA
WHERE TBL_SHA.MS_CD = 'MLD009'
and TBL_SHA.ETRYS in
(select TBL_INF.RY_CD
from TBL_INF
WHERE TBL_INF.MS_CD = 'MLD009'
AND TBL_INF.RY_CD = '00000001'
)
----- Execution Plan -----
Limit (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.769..124168.771 rows=1 loops=1)
-> Aggregate (cost=2738709.57..2738709.58 rows=1 width=8) (actual time=124168.767..124168.769 rows=1 loops=1)
-> Nested Loop (cost=0.29..2730702.63 rows=3202774 width=9) (actual time=97264.166..123920.769 rows=3200000 loops=1)
-> Index Only Scan using TBL_INF_pkc on TBL_INF (cost=0.29..8.31 rows=1 width=9) (actual time=0.025..0.030 rows=1 loops=1)
Index Cond: ((MS_CD = 'MLD009'::bpchar) AND (RY_CD = '00000001'::bpchar))
Heap Fetches: 1
-> Seq Scan on TBL_SHA (cost=0.00..2698666.58 rows=3202774 width=18) (actual time=97264.138..123554.792 rows=3200000 loops=1)
Filter: ((MS_CD = 'MLD009'::bpchar) AND (ETRYS = '00000001'::bpchar))
Rows Removed by Filter: 32000325
Planning Time: 0.162 ms
Execution Time: 124168.838 ms

You might want to check your description of the table definitions.
Going by the above EXPLAIN ANALYZE output, it very much does not look
like ms_cd is the primary key of TBL_SHA. If it is then it's very
weird that you have 3200000 rows for MS_CD = 'MLD009'. You have some
data corruption if that's the case. I suspect you've just not
accurately described the table definition, however.

David

dgrowleyml@gmail.com

over 2 years ago

In reply to: gzh (#20)

gzhcoder@126.com

over 2 years ago

In reply to: David Rowley (#21)

carlbsb@gmail.com

over 2 years ago

In reply to: gzh (#22)

gzhcoder@126.com

over 2 years ago

In reply to: Charly (#23)

Peter J. Holzer

hjp-pgsql@hjp.at

over 2 years ago

In reply to: gzh (#20)

haramrae@gmail.com

over 2 years ago

In reply to: Peter J. Holzer (#25)