BUG #18690: A count function returns wrong value when using FDW

Started by PG Bug reporting formover 1 year ago6 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18690
Logged by: Aidar Uldanov
Email address: aidar@donorbox.org
PostgreSQL version: 16.4
Operating system: Ubuntu
Description:

My following query returns a single average_donation_usd value

```
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id = 1
),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd WHERE
org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND t.amount_refunded_cents
= 0
```

But when I wrap it to `select count(*) from (my_query)` i get 288 as a
result, but I expect it to be 1.

```
select count(*)
FROM (
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id = 1
),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd WHERE
org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND t.amount_refunded_cents
= 0
);
```

I tried to create materialized views to see it this bug relates to using FWD
tables.

```
CREATE MATERIALIZED VIEW transactions AS
SELECT * FROM transactions_fwd WHERE org_id = 1;

CREATE MATERIALIZED VIEW recurring_payments AS
SELECT * FROM recurring_payments_fwd WHERE org_id = 1;
```

And it worked well returning me 1 as a result.

```
select count(*)
FROM (
WITH

transactions AS ( SELECT * FROM transactions WHERE org_id = 1 ),
recurring_payments AS ( SELECT * FROM recurring_payments WHERE
org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND t.amount_refunded_cents
= 0
);

```

My OS and Postgres version (hosted on heroku)

```
User => select version();
version

-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
(1 row)
```

#2David Rowley
dgrowleyml@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18690: A count function returns wrong value when using FDW

On Wed, 6 Nov 2024 at 22:58, PG Bug reporting form
<noreply@postgresql.org> wrote:

```
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id = 1
),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd WHERE
org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND t.amount_refunded_cents
= 0
```

But when I wrap it to `select count(*) from (my_query)` i get 288 as a
result, but I expect it to be 1.

```
select count(*)
FROM (
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id = 1
),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd WHERE
org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND t.amount_refunded_cents
= 0
);
```

Can you show the EXPLAIN ANALYZE of both of these queries?

Also, can you work on getting a self-contained reproducer that we can
run to recreate the issue.

David

#3Aidar Uldanov
aidar@donorbox.org
In reply to: David Rowley (#2)
Re: BUG #18690: A count function returns wrong value when using FDW

Thanks David

I am trying to reproduce it on a new db but I couldn't so far, though I was
able to simplify the queries

```select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.org_id = 1 AND r.org_id = 1 and t.status = 'paid'

);
```
count
-------
22429
(1 row)

```
select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid'

);
```
count
-------
1
(1 row)

Those queries are the same and both should return 1 because the inner query
returns just one AVG value. The difference between those two queries is
`t.org_id = 1 AND r.org_id = 1`

Here are SQL explains

explain verbose
select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.org_id = 1 AND r.org_id = 1 and t.status = 'paid'
);

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------
Aggregate (cost=213.22..213.22 rows=1 width=8)
Output: count(*)
-> Foreign Scan (cost=102.30..213.21 rows=1 width=32)
Output: NULL::numeric
Relations: Aggregate on ((((transactions t) INNER JOIN (orgs o))
INNER JOIN (recurring_payments rp)) INNER JOIN (orgs o_1))
Remote SQL: SELECT NULL::numeric FROM (((public.transactions r4
INNER JOIN public.orgs r5 ON (((r4.org_id = r5.id)) AND ((r5.provider_id =
1)) AND ((r4.status = 'paid')))) INNER JOIN public.recurring_payments r7 ON
(((r4.recurrin
g_payment_id = r7.id)))) INNER JOIN public.orgs r8 ON (((r7.org_id = r8.id))
AND ((r8.provider_id = 1))))
(7 rows)

explain verbose
select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid'
);

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------
Aggregate (cost=15478.69..15478.69 rows=1 width=8)
Output: count(*)
-> Aggregate (cost=15478.69..15478.69 rows=1 width=32)
Output: NULL::numeric
-> Foreign Scan (cost=396.66..15478.69 rows=820775 width=32)
Relations: (((transactions t) INNER JOIN (orgs o)) INNER
JOIN (recurring_payments rp)) INNER JOIN (orgs o_1)
Remote SQL: SELECT NULL FROM (((public.transactions r4 INNER
JOIN public.orgs r5 ON (((r4.org_id = r5.id)) AND ((r4.status = 'paid'))))
INNER JOIN public.recurring_payments r7 ON (((r4.recurring_payment_id =
r7.id)))) INNER
JOIN public.orgs r8 ON (((r7.org_id = r8.id))))
(8 rows)

Thanks and regards
Aidar

On Wed, Nov 6, 2024 at 12:03 PM David Rowley <dgrowleyml@gmail.com> wrote:

Show quoted text

On Wed, 6 Nov 2024 at 22:58, PG Bug reporting form
<noreply@postgresql.org> wrote:

```
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id = 1
),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd

WHERE

org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND

t.amount_refunded_cents

= 0
```

But when I wrap it to `select count(*) from (my_query)` i get 288 as a
result, but I expect it to be 1.

```
select count(*)
FROM (
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id = 1
),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd

WHERE

org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND

t.amount_refunded_cents

= 0
);
```

Can you show the EXPLAIN ANALYZE of both of these queries?

Also, can you work on getting a self-contained reproducer that we can
run to recreate the issue.

David

#4Aidar Uldanov
aidar@donorbox.org
In reply to: Aidar Uldanov (#3)
Re: BUG #18690: A count function returns wrong value when using FDW

David, after some troubleshooting I found out that the issue with FWD and
bigint values, so when column type int then it works well but after
changing to a bigint type a count() function returns the wrong value.

Here is how to reproduce that

-- ===> remote db schema and data
create table orgs(id int, name varchar, provider_id int);

create table recurring_payments(
id int primary key,
org_id int
);

create table transactions(
id serial primary key,
amount int,
org_id int,
recurring_payment_id int REFERENCES recurring_payments(id)
);

-- populate remote db
insert into orgs(id, name, provider_id) values (1, 'one', 11), (2, 'two',
22);
insert into recurring_payments(id, org_id) values (1, 1), (2, 1), (3, 2);
insert into transactions(amount, org_id, recurring_payment_id) values (1,
1, 1), (2, 1, null), (3, 1, null), (4, 1, 1);

-- <=== remote db

-- ===> host db

-- connect back to base database and create server and fdw
CREATE EXTENSION postgres_fdw;

CREATE SERVER foreign_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '5432', dbname 'test3');

CREATE USER MAPPING FOR postgres
SERVER foreign_server
OPTIONS (user 'postgres', password '12345');

CREATE FOREIGN TABLE orgs_f (
id int, name varchar, provider_id int
)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'orgs');

CREATE FOREIGN TABLE recurring_payments_f (
id int, org_id int
)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'recurring_payments');

CREATE FOREIGN TABLE transactions_f (
id serial,
amount bigint,
org_id int,
recurring_payment_id int
)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'transactions');

create view transactions_v as
select t.*, o.provider_id as real_org_id
from transactions_f t
join orgs_f o on o.id = t.org_id;

create view recurring_payments_v as
select rp.*, o.provider_id as real_org_id
from recurring_payments_f rp
join orgs_f o on o.id = rp.org_id;

-- This one returns (1 + 4) / 2 = 2.5 which is correct
select avg(t.amount)
from transactions_v t
join recurring_payments_v rp on rp.id = t.recurring_payment_id
where t.real_org_id = 11 and rp.real_org_id = 11;

-- wrapping this to count gives 2 which is not correct
select count(*) from (
select avg(t.amount)
from transactions_v t
join recurring_payments_v rp on rp.id = t.recurring_payment_id
where t.real_org_id = 11 and rp.real_org_id = 11
);

On Wed, Nov 6, 2024 at 3:47 PM Aidar Uldanov <aidar@donorbox.org> wrote:

Show quoted text

Thanks David

I am trying to reproduce it on a new db but I couldn't so far, though I
was able to simplify the queries

```select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.org_id = 1 AND r.org_id = 1 and t.status = 'paid'

);
```
count
-------
22429
(1 row)

```
select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid'

);
```
count
-------
1
(1 row)

Those queries are the same and both should return 1 because the inner
query returns just one AVG value. The difference between those two queries
is `t.org_id = 1 AND r.org_id = 1`

Here are SQL explains

explain verbose
select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.org_id = 1 AND r.org_id = 1 and t.status = 'paid'
);

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------
Aggregate (cost=213.22..213.22 rows=1 width=8)
Output: count(*)
-> Foreign Scan (cost=102.30..213.21 rows=1 width=32)
Output: NULL::numeric
Relations: Aggregate on ((((transactions t) INNER JOIN (orgs o))
INNER JOIN (recurring_payments rp)) INNER JOIN (orgs o_1))
Remote SQL: SELECT NULL::numeric FROM (((public.transactions r4
INNER JOIN public.orgs r5 ON (((r4.org_id = r5.id)) AND ((r5.provider_id
= 1)) AND ((r4.status = 'paid')))) INNER JOIN public.recurring_payments r7
ON (((r4.recurrin
g_payment_id = r7.id)))) INNER JOIN public.orgs r8 ON (((r7.org_id = r8.id))
AND ((r8.provider_id = 1))))
(7 rows)

explain verbose
select count(*)
FROM (
SELECT AVG(t.usd_amount_cents / 100) AS average
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid'

);

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------
Aggregate (cost=15478.69..15478.69 rows=1 width=8)
Output: count(*)
-> Aggregate (cost=15478.69..15478.69 rows=1 width=32)
Output: NULL::numeric
-> Foreign Scan (cost=396.66..15478.69 rows=820775 width=32)
Relations: (((transactions t) INNER JOIN (orgs o)) INNER
JOIN (recurring_payments rp)) INNER JOIN (orgs o_1)
Remote SQL: SELECT NULL FROM (((public.transactions r4
INNER JOIN public.orgs r5 ON (((r4.org_id = r5.id)) AND ((r4.status =
'paid')))) INNER JOIN public.recurring_payments r7 ON
(((r4.recurring_payment_id = r7.id)))) INNER
JOIN public.orgs r8 ON (((r7.org_id = r8.id))))
(8 rows)

Thanks and regards
Aidar

On Wed, Nov 6, 2024 at 12:03 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 6 Nov 2024 at 22:58, PG Bug reporting form
<noreply@postgresql.org> wrote:

```
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id =

1

),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd

WHERE

org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND

t.amount_refunded_cents

= 0
```

But when I wrap it to `select count(*) from (my_query)` i get 288 as a
result, but I expect it to be 1.

```
select count(*)
FROM (
WITH

transactions AS ( SELECT * FROM transactions_fwd WHERE org_id =

1

),
recurring_payments AS ( SELECT * FROM recurring_payments_fwd

WHERE

org_id = 1 )
SELECT AVG(t.usd_amount_cents / 100) AS average_donation_usd
FROM transactions t
JOIN recurring_payments r ON r.id = t.recurring_payment_id
WHERE t.status = 'paid' AND t.deleted_at IS NULL AND

t.amount_refunded_cents

= 0
);
```

Can you show the EXPLAIN ANALYZE of both of these queries?

Also, can you work on getting a self-contained reproducer that we can
run to recreate the issue.

David

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aidar Uldanov (#4)
Re: BUG #18690: A count function returns wrong value when using FDW

Aidar Uldanov <aidar@donorbox.org> writes:

David, after some troubleshooting I found out that the issue with FWD and
bigint values, so when column type int then it works well but after
changing to a bigint type a count() function returns the wrong value.

create table transactions(
id serial primary key,
amount int,
org_id int,
recurring_payment_id int REFERENCES recurring_payments(id)
);
...
CREATE FOREIGN TABLE transactions_f (
id serial,
amount bigint,
org_id int,
recurring_payment_id int
)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'transactions');

While it's sort of interesting that that misbehaves, I do not believe
that postgres_fdw promises to work correctly with foreign-table
definitions that don't match the remote table's actual column
datatypes.

regards, tom lane

#6Aidar Uldanov
aidar@donorbox.org
In reply to: Tom Lane (#5)
Re: BUG #18690: A count function returns wrong value when using FDW

oh sorry, I might forget to update int to bigint there. Actually both
should be bigint.
First I tried with int type and it works well and after changing to bigint
I got the wrong count value.

Thanks and regards
Aidar

On Wed, Nov 6, 2024 at 4:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Aidar Uldanov <aidar@donorbox.org> writes:

David, after some troubleshooting I found out that the issue with FWD and
bigint values, so when column type int then it works well but after
changing to a bigint type a count() function returns the wrong value.

create table transactions(
id serial primary key,
amount int,
org_id int,
recurring_payment_id int REFERENCES recurring_payments(id)
);
...
CREATE FOREIGN TABLE transactions_f (
id serial,
amount bigint,
org_id int,
recurring_payment_id int
)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 'transactions');

While it's sort of interesting that that misbehaves, I do not believe
that postgres_fdw promises to work correctly with foreign-table
definitions that don't match the remote table's actual column
datatypes.

regards, tom lane