Question on execution plan and suitable index

Started by yudhi s4 months ago10 messagesgeneral

learnerdatabase99@gmail.com

4 months ago

Hi,
It's postgres version 17. We are having a critical UI query which runs for
~7 seconds+. The requirement is to bring down the response time within ~1
sec. Now in this plan , If i read this correctly, the below section is
consuming a significant amount of resources and should be addressed. i.e.
"Full scan of table "orders" and Nested loop with event_audit_log table".

*Below is the query and its complete plan:- *
https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

I am a bit new to the indexing strategy in postgres. My question is, what
suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on column
(request_id,created_at,event_comment_text) or should we create the covering
index i.e. just on two column (request_id,created_at) with "include" clause
for "event_comment_text". How and when the covering index indexes should be
used here in postgres. Want to understand from experts?
2)Similarly for table orders:- Should we create a covering index on column
(entity_id,due_date,order_type) with include clause (firm_dspt_case_id). Or
just a composite index (entity_id,due_date,order_type).
3)Whether the column used as range operator (here created_at or due_date)
should be used as leading column in the composite index or is it fine to
keep it as non leading?

-> Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual
time=280.735..7065.313 rows=57943 loops=3)
Buffers: shared hit=10014901
-> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual
time=196.407..3805.755 rows=278131 loops=3)
Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
-> Parallel Seq Scan on orders ord (cost=0.00..1022872.54 rows=3672860
width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
Filter: ((due_date >= '2024-01-01'::date) AND (due_date <=
'2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
Rows Removed by Filter: 6572678
Buffers: shared hit=755208

Regards
Yudhi

Adrian Klaver

adrian.klaver@aklaver.com

4 months ago

In reply to: yudhi s (#1)

Re: Question on execution plan and suitable index

On 2/15/26 11:04, yudhi s wrote:

Hi,
It's postgres version 17. We are having a critical UI query which runs
for ~7 seconds+. The requirement is to bring down the response time
within ~1 sec. Now in this plan , If i read this correctly, the below
section is consuming a significant amount of resources and should be
addressed. i.e. "Full scan of table "orders" and Nested loop with
event_audit_log table".

For a start:

1) Supply the complete schema for the tables involved.

2) Also what is the minor version you are using e.g the x in 17.x?

I also recommend reading:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

*Below is the query and its complete plan:- *
https://gist.github.com/databasetech0073/
f564ac23ee35d1f0413980fe4d00efa9 <https://gist.github.com/
databasetech0073/f564ac23ee35d1f0413980fe4d00efa9>

I am a bit new to the indexing strategy in postgres. My question is,
what suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on column
(request_id,created_at,event_comment_text) or should we create the
covering index i.e. just on two column (request_id,created_at) with
"include" clause for "event_comment_text". How and when the covering
index indexes should be used here in postgres. Want to understand from
experts?
2)Similarly for table orders:- Should we create a covering index on
column (entity_id,due_date,order_type) with include clause
(firm_dspt_case_id). Or just a composite index
(entity_id,due_date,order_type).
3)Whether the column used as range operator (here created_at or
due_date) should be used as leading column in the composite index or is
it fine to keep it as non leading?

-> Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual
time=280.735..7065.313 rows=57943 loops=3)
Buffers: shared hit=10014901
-> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual
time=196.407..3805.755 rows=278131 loops=3)
Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
-> Parallel Seq Scan on orders ord (cost=0.00..1022872.54 rows=3672860
width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
Filter: ((due_date >= '2024-01-01'::date) AND (due_date <=
'2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
Rows Removed by Filter: 6572678
Buffers: shared hit=755208

Regards
Yudhi

--
Adrian Klaver
adrian.klaver@aklaver.com

Nisarg Patel

er.nisarg@gmail.com

4 months ago

In reply to: Adrian Klaver (#2)

Re: Question on execution plan and suitable index

I would definitely recommend to focus on this section:
Filter: ((due_date >= '2024-01-01'::date) AND (due_date <=
'2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
Rows Removed by Filter: 6572678

You can certainly try composite index or partial index for order_type.

Thank you,
Nisarg

On Sun, Feb 15, 2026, 3:51 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 2/15/26 11:04, yudhi s wrote:

Hi,
It's postgres version 17. We are having a critical UI query which runs
for ~7 seconds+. The requirement is to bring down the response time
within ~1 sec. Now in this plan , If i read this correctly, the below
section is consuming a significant amount of resources and should be
addressed. i.e. "Full scan of table "orders" and Nested loop with
event_audit_log table".

For a start:

1) Supply the complete schema for the tables involved.

2) Also what is the minor version you are using e.g the x in 17.x?

I also recommend reading:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

*Below is the query and its complete plan:- *
https://gist.github.com/databasetech0073/
f564ac23ee35d1f0413980fe4d00efa9 <https://gist.github.com/
databasetech0073/f564ac23ee35d1f0413980fe4d00efa9>

I am a bit new to the indexing strategy in postgres. My question is,
what suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on column
(request_id,created_at,event_comment_text) or should we create the
covering index i.e. just on two column (request_id,created_at) with
"include" clause for "event_comment_text". How and when the covering
index indexes should be used here in postgres. Want to understand from
experts?
2)Similarly for table orders:- Should we create a covering index on
column (entity_id,due_date,order_type) with include clause
(firm_dspt_case_id). Or just a composite index
(entity_id,due_date,order_type).
3)Whether the column used as range operator (here created_at or
due_date) should be used as leading column in the composite index or is
it fine to keep it as non leading?

-> Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual
time=280.735..7065.313 rows=57943 loops=3)
Buffers: shared hit=10014901
-> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual
time=196.407..3805.755 rows=278131 loops=3)
Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
-> Parallel Seq Scan on orders ord (cost=0.00..1022872.54 rows=3672860
width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
Filter: ((due_date >= '2024-01-01'::date) AND (due_date <=
'2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
Rows Removed by Filter: 6572678
Buffers: shared hit=755208

Regards
Yudhi

--
Adrian Klaver
adrian.klaver@aklaver.com

Laurenz Albe

laurenz.albe@cybertec.at

4 months ago

In reply to: yudhi s (#1)

Re: Question on execution plan and suitable index

On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:

It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to bring down the response time within ~1 sec. Now in this plan , If i read this correctly, the below section is consuming a significant amount of resources and should be addressed. i.e. "Full scan of table "orders" and Nested loop with event_audit_log table".

Below is the query and its complete plan:-
https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on column (request_id,created_at,event_comment_text) or should we create the covering index i.e. just on two column (request_id,created_at) with "include" clause for "event_comment_text". How and when the covering index indexes should be used here in postgres. Want to understand from experts?
2)Similarly for table orders:- Should we create a covering index on column (entity_id,due_date,order_type) with include clause (firm_dspt_case_id). Or just a composite index (entity_id,due_date,order_type).
3)Whether the column used as range operator (here created_at or due_date) should be used as leading column in the composite index or is it fine to keep it as non leading?

-> Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
  Buffers: shared hit=10014901
  -> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
-> Parallel Seq Scan on orders ord (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
  Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
  Rows Removed by Filter: 6572678
  Buffers: shared hit=755208

You are selecting a lot of rows, so the query will never be really cheap.
But I agree that an index scan should be a win.

If the condition on "order_type" is always the same, a partial index is ideal:

CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');

Otherwise, I'd create two indexes: one on "order_type" and one on "due_date".

Yours,
Laurenz Albe

yudhi s

learnerdatabase99@gmail.com

4 months ago

In reply to: Laurenz Albe (#4)

Re: Question on execution plan and suitable index

On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:

It's postgres version 17. We are having a critical UI query which runs

for ~7 seconds+. The requirement is to bring down the response time within
~1 sec. Now in this plan , If i read this correctly, the below section is
consuming a significant amount of resources and should be addressed. i.e.
"Full scan of table "orders" and Nested loop with event_audit_log table".

Below is the query and its complete plan:-

https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

I am a bit new to the indexing strategy in postgres. My question is,

what suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on column

(request_id,created_at,event_comment_text) or should we create the covering
index i.e. just on two column (request_id,created_at) with "include" clause
for "event_comment_text". How and when the covering index indexes should be
used here in postgres. Want to understand from experts?

2)Similarly for table orders:- Should we create a covering index on

column (entity_id,due_date,order_type) with include clause
(firm_dspt_case_id). Or just a composite index
(entity_id,due_date,order_type).

3)Whether the column used as range operator (here created_at or

due_date) should be used as leading column in the composite index or is it
fine to keep it as non leading?

->

*Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual
time=280.735..7065.313 rows=57943 loops=3) > Buffers: shared hit=10014901*

-> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual

time=196.407..3805.755 rows=278131 loops=3)

Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
-> Parallel Seq Scan on orders ord (cost=0.00..1022872.54

rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)

Filter: ((due_date >= '2024-01-01'::date) AND (due_date <=

'2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))

Rows Removed by Filter: 6572678
Buffers: shared hit=755208

You are selecting a lot of rows, so the query will never be really cheap.
But I agree that an index scan should be a win.

If the condition on "order_type" is always the same, a partial index is
ideal:

CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A',
'TYPE_B');

Otherwise, I'd create two indexes: one on "order_type" and one on
"due_date".

Version is 17.7. Below is the table definitions as i pulled from Dbeaver
tool:-

https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e

The Order_type will be TYPE_A and TYPE_B in most of the cases. And below is
the distribution. So , it looks like the index on this column will not help
much. Correct me if I'm wrong. I am wondering why the already existing
index on column "due_date" of table "order" is not getting used by the
optimizer? Should we also add the column "entity_id" to the index too?

TYPE_A 25 Million
TYPE_B 2 Million
TYPE_C 700K
TYPE_D 200K
TYPE_E 6k

And, Yes there are differences in data types of the "entity_id" for columns
of table "order" and "entity". We need to fix that after analyzing the data.

Also the highlighted Nested loop above shows ~10M shared hits (which will
be ~70GB+ if we consider one hit as an 8K block). So does that mean , apart
from the Full scan on the "order" table , the main resource consuming
factor here is the scanning of "event_audit_log". And what is the best way
to improve this? Currently this table is getting scanned through an unique
index on column "request_id".

Regards
Yudhi

Laurenz Albe

laurenz.albe@cybertec.at

4 months ago

In reply to: yudhi s (#5)

Re: Question on execution plan and suitable index

On Mon, 2026-02-16 at 14:43 +0530, yudhi s wrote:

On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:

It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to bring down the response time within ~1 sec. Now in this plan , If i read this correctly, the below section is consuming a significant amount of resources and should be addressed. i.e. "Full scan of table "orders" and Nested loop with event_audit_log table".

Below is the query and its complete plan:-
https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on column (request_id,created_at,event_comment_text) or should we create the covering index i.e. just on two column (request_id,created_at) with "include" clause for "event_comment_text". How and when the covering index indexes should be used here in postgres. Want to understand from experts?
2)Similarly for table orders:- Should we create a covering index on column (entity_id,due_date,order_type) with include clause (firm_dspt_case_id). Or just a composite index (entity_id,due_date,order_type).
3)Whether the column used as range operator (here created_at or due_date) should be used as leading column in the composite index or is it fine to keep it as non leading?

-> Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
  Buffers: shared hit=10014901
  -> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
-> Parallel Seq Scan on orders ord (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
  Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
  Rows Removed by Filter: 6572678
  Buffers: shared hit=755208

You are selecting a lot of rows, so the query will never be really cheap.
But I agree that an index scan should be a win.

If the condition on "order_type" is always the same, a partial index is ideal:

CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');

Otherwise, I'd create two indexes: one on "order_type" and one on "due_date".

Version is 17.7. Below is the table definitions as i pulled from Dbeaver tool:-

https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e

The Order_type will be TYPE_A and TYPE_B in most of the cases. And below is the distribution.
So , it looks like the index on this column will not help much. Correct me if I'm wrong.

TYPE_A 25 Million
TYPE_B 2 Million
TYPE_C 700K
TYPE_D 200K
TYPE_E 6k

No, you are right about that.

I am wondering why the already existing index on column "due_date" of table "order" is not
getting used by the optimizer? Should we also add the column "entity_id" to the index too?

Seeing that your execution plan is incomplete, it is hard to say anything about that.
The scans of "entities" are missing, as is the UNION.

And, Yes there are differences in data types of the "entity_id" for columns of table "order"
and "entity". We need to fix that after analyzing the data.

Also the highlighted Nested loop above shows ~10M shared hits (which will be ~70GB+ if we
consider one hit as an 8K block). So does that mean , apart from the Full scan on the "order"
table , the main resource consuming factor here is the scanning of "event_audit_log".

Correct.

Yours,
Laurenz Albe

yudhi s

learnerdatabase99@gmail.com

4 months ago

In reply to: Laurenz Albe (#6)

Re: Question on execution plan and suitable index

On Mon, Feb 16, 2026 at 3:24 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On Mon, 2026-02-16 at 14:43 +0530, yudhi s wrote:

On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at>

wrote:

On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:

It's postgres version 17. We are having a critical UI query which

runs for ~7 seconds+. The requirement is to bring down the response time
within ~1 sec. Now in this plan , If i read this correctly, the below
section is consuming a significant amount of resources and should be
addressed. i.e. "Full scan of table "orders" and Nested loop with
event_audit_log table".

Below is the query and its complete plan:-

https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

I am a bit new to the indexing strategy in postgres. My question is,

what suitable index should we create to cater these above?

1)For table event_audit_log:- Should we create composite Index on

column (request_id,created_at,event_comment_text) or should we create the
covering index i.e. just on two column (request_id,created_at) with
"include" clause for "event_comment_text". How and when the covering index
indexes should be used here in postgres. Want to understand from experts?

2)Similarly for table orders:- Should we create a covering index on

column (entity_id,due_date,order_type) with include clause
(firm_dspt_case_id). Or just a composite index
(entity_id,due_date,order_type).

3)Whether the column used as range operator (here created_at or

due_date) should be used as leading column in the composite index or is it
fine to keep it as non leading?

-> Nested Loop (cost=50.06..2791551.71 rows=3148 width=19) (actual

time=280.735..7065.313 rows=57943 loops=3)

Buffers: shared hit=10014901
-> Hash Join (cost=49.49..1033247.35 rows=36729 width=8) (actual

time=196.407..3805.755 rows=278131 loops=3)

Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
Buffers: shared hit=755352
-> Parallel Seq Scan on orders ord (cost=0.00..1022872.54

rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)

Filter: ((due_date >= '2024-01-01'::date) AND (due_date <=

'2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))

Rows Removed by Filter: 6572678
Buffers: shared hit=755208

You are selecting a lot of rows, so the query will never be really

cheap.

But I agree that an index scan should be a win.

If the condition on "order_type" is always the same, a partial index

is ideal:

CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A',

'TYPE_B');

Otherwise, I'd create two indexes: one on "order_type" and one on

"due_date".

Version is 17.7. Below is the table definitions as i pulled from Dbeaver

tool:-

https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e

The Order_type will be TYPE_A and TYPE_B in most of the cases. And below

is the distribution.

So , it looks like the index on this column will not help much. Correct

me if I'm wrong.

TYPE_A 25 Million
TYPE_B 2 Million
TYPE_C 700K
TYPE_D 200K
TYPE_E 6k

No, you are right about that.

I am wondering why the already existing index on column "due_date" of

table "order" is not

getting used by the optimizer? Should we also add the column "entity_id"

to the index too?

Seeing that your execution plan is incomplete, it is hard to say anything
about that.
The scans of "entities" are missing, as is the UNION.

And, Yes there are differences in data types of the "entity_id" for

columns of table "order"

and "entity". We need to fix that after analyzing the data.

Also the highlighted Nested loop above shows ~10M shared hits (which

will be ~70GB+ if we

consider one hit as an 8K block). So does that mean , apart from the

Full scan on the "order"

table , the main resource consuming factor here is the scanning of

"event_audit_log".

Correct.

Yours,
Laurenz Albe

Hi,
I have updated the plan below. While trying to replace actual binds and the
objects with sample names some lines got missed initially it seems.

https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

Laurenz Albe

laurenz.albe@cybertec.at

4 months ago

In reply to: yudhi s (#7)

Re: Question on execution plan and suitable index

On Mon, 2026-02-16 at 16:09 +0530, yudhi s wrote:

I have updated the plan below. While trying to replace actual binds and
the objects with sample names some lines got missed initially it seems.

https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

Thanks.

Does the filter on "due_date" eliminate many rows in "orders"? If yes,
and an index on that column would actually perform better (which you
could test with enable_seqscan = off), perhaps your "random_page_cost"
parameter is set too high.

Where you can certainly make a difference is the repeated scan on
"event_audit_log". An index on (request_id, event_comment_text, created_at)
should speed up that part.

Yours,
Laurenz Albe

yudhi s

learnerdatabase99@gmail.com

4 months ago

In reply to: Laurenz Albe (#8)

Re: Question on execution plan and suitable index

On Mon, Feb 16, 2026 at 5:22 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

On Mon, 2026-02-16 at 16:09 +0530, yudhi s wrote:

I have updated the plan below. While trying to replace actual binds and
the objects with sample names some lines got missed initially it seems.

https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9

Thanks.

Does the filter on "due_date" eliminate many rows in "orders"? If yes,
and an index on that column would actually perform better (which you
could test with enable_seqscan = off), perhaps your "random_page_cost"
parameter is set too high.

Where you can certainly make a difference is the repeated scan on
"event_audit_log". An index on (request_id, event_comment_text,
created_at)
should speed up that part.

Yours,
Laurenz Albe

Thank you so much. Will try this one.

Regarding the composite index on (request_id, event_comment_text,
created_at) for table event_audit_log, is there any advice, which we should
follow for keeping "date column"(like column "Created_at" here) in the
indexing order (apart from the frequency of usage in the query)?

And to help the table scan of the ORDER table, should we also have
"entity_id" added to the index along with "due_date" i.e. a composite index
on (entity_id,due_date)?

Regards
Yudhi

#10

Laurenz Albe

laurenz.albe@cybertec.at

4 months ago

In reply to: yudhi s (#9)

Re: Question on execution plan and suitable index

On Mon, 2026-02-16 at 17:52 +0530, yudhi s wrote:

Regarding the composite index on (request_id, event_comment_text, created_at) for
table event_audit_log, is there any advice, which we should follow for keeping
"date column"(like column "Created_at" here) in the indexing order (apart from
the frequency of usage in the query)?

The rule is to first have all columns that are compared with equality, then the
others, starting with the most selective one. Based on the condition, I guessed
that that wouldn't be your "date" column, but if it is, put it second.

And to help the table scan of the ORDER table, should we also have "entity_id"
added to the index along with "due_date" i.e. a composite index on (entity_id,due_date)?

I don't know; you'd have to test it on your test system.
The current execution plan has no use for such an index, but adding additional
columns could

1. lead to an efficient index-only scan

2. make PostgreSQL pick an altogether different, better plan

Yours,
Laurenz Albe