BUG #18600: Getting wait_type_event as IPC:BTreePage for count queries

Started by PG Bug reporting formover 1 year ago5 messagesbugs

noreply@postgresql.org

over 1 year ago

The following bug has been logged on the website:

Bug reference: 18600
Logged by: Aditya Singh
Email address: aditya.singh@lji.io
PostgreSQL version: 13.11
Operating system: macOS
Description:

Table1:
Structure:
CREATE TABLE "femsa"."activities_bit" (
"id" int4 NOT NULL DEFAULT
nextval('femsa.activities_bit_copy_id_seq'::regclass),
"created_ts" timestamptz,
"member_id" varchar(40),
"bit_reference" varchar(32),
"header" jsonb,
"payment_details" jsonb,
PRIMARY_KEY("id")
}

Table 2:
Structure:
CREATE TABLE "femsa"."activities_bit" (
"id" int8 NOT NULL DEFAULT
nextval('femsa.activities_bit_copy_id_seq'::regclass),
"created_ts" timestamptz,
"member_id" varchar(40),
"bit_reference" varchar(32),
"header" jsonb,
"payment_details" jsonb,
PRIMARY_KEY("id")
}

The only difference between the two is the data type of the id sequence
(int, bigint).

SELECT COUNT(*) FROM {table_name} WHERE id > 0;
When I try to run the above query, I get the result for Table 1, but the
query on Table 2 goes into wait_type_event IPC:BtreePage.

Why this IPC:BtreePage is coming for Table2 when both tables have similar
structure, data and Indexes?

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 1 year ago

In reply to: PG Bug reporting form (#1)

Re: BUG #18600: Getting wait_type_event as IPC:BTreePage for count queries

On 9/4/24 11:21, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18600
Logged by: Aditya Singh
Email address: aditya.singh@lji.io
PostgreSQL version: 13.11
Operating system: macOS
Description:

Table1:
Structure:
CREATE TABLE "femsa"."activities_bit" (
"id" int4 NOT NULL DEFAULT
nextval('femsa.activities_bit_copy_id_seq'::regclass),
"created_ts" timestamptz,
"member_id" varchar(40),
"bit_reference" varchar(32),
"header" jsonb,
"payment_details" jsonb,
PRIMARY_KEY("id")
}

Table 2:
Structure:
CREATE TABLE "femsa"."activities_bit" (
"id" int8 NOT NULL DEFAULT
nextval('femsa.activities_bit_copy_id_seq'::regclass),
"created_ts" timestamptz,
"member_id" varchar(40),
"bit_reference" varchar(32),
"header" jsonb,
"payment_details" jsonb,
PRIMARY_KEY("id")
}

The only difference between the two is the data type of the id sequence
(int, bigint).

SELECT COUNT(*) FROM {table_name} WHERE id > 0;
When I try to run the above query, I get the result for Table 1, but the
query on Table 2 goes into wait_type_event IPC:BtreePage.

Why this IPC:BtreePage is coming for Table2 when both tables have similar
structure, data and Indexes?

Not sure I understood the question correctly, but chances are both
queries hit that wait event, but it's intermittent and you only noticed
that for one of them. Or are you saying it gets "stuck" in that wait
even for the table with int8 column?

Another option is that the first query does not use the index at all, in
which case it of course won't see btree wait events. We don't even know
how large the tables are, not to mention which access methods they use.

You'd have to show us explain. I'd bet the first query uses seqscan,
while the second one uses index-only scan.

regards

--
Tomas Vondra

Aditya Singh

aditya.singh@lji.io

over 1 year ago

In reply to: Tomas Vondra (#2)

Re: BUG #18600: Getting wait_type_event as IPC:BTreePage for count queries

The query on the first table didn't hit the wait event; it hit on the
second table. I can show you the same on RDS Database Insights. Both tables
have the same *Indexes*, *Data*, and *Columns*. There is only a difference
in the ID datatype. Is there any possibility of scheduling a call to
troubleshoot this issue since it is happening in the Production
environment?

Regards
Aditya Narayan

On Wed, Sep 4, 2024 at 9:43 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/4/24 11:21, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18600
Logged by: Aditya Singh
Email address: aditya.singh@lji.io
PostgreSQL version: 13.11
Operating system: macOS
Description:

Table1:
Structure:
CREATE TABLE "femsa"."activities_bit" (
"id" int4 NOT NULL DEFAULT
nextval('femsa.activities_bit_copy_id_seq'::regclass),
"created_ts" timestamptz,
"member_id" varchar(40),
"bit_reference" varchar(32),
"header" jsonb,
"payment_details" jsonb,
PRIMARY_KEY("id")
}

Table 2:
Structure:
CREATE TABLE "femsa"."activities_bit" (
"id" int8 NOT NULL DEFAULT
nextval('femsa.activities_bit_copy_id_seq'::regclass),
"created_ts" timestamptz,
"member_id" varchar(40),
"bit_reference" varchar(32),
"header" jsonb,
"payment_details" jsonb,
PRIMARY_KEY("id")
}

The only difference between the two is the data type of the id sequence
(int, bigint).

SELECT COUNT(*) FROM {table_name} WHERE id > 0;
When I try to run the above query, I get the result for Table 1, but the
query on Table 2 goes into wait_type_event IPC:BtreePage.

Why this IPC:BtreePage is coming for Table2 when both tables have similar
structure, data and Indexes?

Not sure I understood the question correctly, but chances are both
queries hit that wait event, but it's intermittent and you only noticed
that for one of them. Or are you saying it gets "stuck" in that wait
even for the table with int8 column?

Another option is that the first query does not use the index at all, in
which case it of course won't see btree wait events. We don't even know
how large the tables are, not to mention which access methods they use.

You'd have to show us explain. I'd bet the first query uses seqscan,
while the second one uses index-only scan.

regards

--
Tomas Vondra

*Confidentiality Warning:*
This message and any attachments are intended
only for the use of the intended recipient(s), are confidential, and may be
privileged. If you are not the intended recipient, you are hereby notified
that any disclosure, copying, distribution, or other use of this message
and any attachments is strictly prohibited. If received in error, please
notify the sender immediately and permanently delete it.

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 1 year ago

In reply to: Aditya Singh (#3)

Re: BUG #18600: Getting wait_type_event as IPC:BTreePage for count queries

On 9/5/24 10:11, Aditya Singh wrote:

The query on the first table didn't hit the wait event; it hit on the
second table. I can show you the same on RDS Database Insights. Both
tables have the same *Indexes*, *Data*, and *Columns*. There is only a
difference in the ID datatype. Is there any possibility of scheduling a
call to troubleshoot this issue since it is happening in the Production
environment?

As this is a RDS database, you should probably talk to AWS support?

Other than that, share the explain plans for the two queries (as I asked
in my previous response).

BTW please don't top post. It's customary to reply inline on this list.

regards

--
Tomas Vondra

Aditya Singh

aditya.singh@lji.io

over 1 year ago

In reply to: Tomas Vondra (#4)

Re: BUG #18600: Getting wait_type_event as IPC:BTreePage for count queries

*QUERY 1: *
*explain select count(*) from activities_bit_old where id > 0; (id column
data-type is int4)*Finalize Aggregate (cost=75709053.22..75709053.23
rows=1 width=8)
-> Gather (cost=75709053.00..75709053.21 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=75708053.00..75708053.01 rows=1
width=8)
-> Parallel Index Only Scan using activities_bit_pkey on
activities_bit_old (cost=0.58..73513797.33 rows=877702270 width=0)
Index Cond: (id > 0)

*QUERY 2*: *explain select count(*) from activities_bit where id > 0; (id
column data-type is int8)*
Finalize Aggregate (cost=117811065.69..117811065.70 rows=1 width=8)
-> Gather (cost=117811065.48..117811065.69 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=117810065.48..117810065.49 rows=1
width=8)
-> Parallel Index Only Scan using activities_bit_copy_pkey
on activities_bit (cost=0.58..115695086.65 rows=845991530 width=0)
Index Cond: (id > 0)

Above mentioned is the query plan for Table 1 and Table 2;

On Thu, Sep 5, 2024 at 7:11 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/5/24 10:11, Aditya Singh wrote:

The query on the first table didn't hit the wait event; it hit on the
second table. I can show you the same on RDS Database Insights. Both
tables have the same *Indexes*, *Data*, and *Columns*. There is only a
difference in the ID datatype. Is there any possibility of scheduling a
call to troubleshoot this issue since it is happening in the Production
environment?

As this is a RDS database, you should probably talk to AWS support?

Other than that, share the explain plans for the two queries (as I asked
in my previous response).

BTW please don't top post. It's customary to reply inline on this list.

regards

--
Tomas Vondra