exceptional result of postres_fdw external table joining local table
Hi,
I found a problem when using postres_fdw external table.
The PGDB version is 11.9。 I created a postgres_fdw external table to use a table from another DB, and I added use_remote_estimate true option to optimize remote sql。
gap_new=> \des+
List of foreign servers
Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW options | Descri
ption
----------------+-------+----------------------+-------------------+------+---------+------------------------------------------------------------------------------------+---
server28wx_fdw | gap | postgres_fdw | qu=U/qu | | | (host '192.168.1.28', port '5432', dbname 'db1', use_remote_estimate 'true') |
(1 row)
But when I run some full table scans and found a problems. the results of the following sql were different.
Method 1: directly join external table and local table
select count(distinct user_id),count(distinct member_code)
from a_fdw t1
join b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50)
where coalesce(user_id,'') <> '';
-- result: 50739 50845
Method 2: create a local temp table from external table and then join temp table and local table.
drop table if exists temp_a;
select *
into temp temp_a
from a_fdw;
select count(distinct user_id),count(distinct member_code)
from a_fdw t1
join b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50)
where coalesce(user_id,'') <> '';
--result 1641737 1645368
The two methods produces different results. And apparently, the result of Method 1 lost some data.
Is there some problem with use_remote_estimate option configuration ?
CAUTION: This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is forbidden. Please consider the environment before printing this e-mail!.
"Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes:
But when I run some full table scans and found a problems. the results of the following sql were different.
Method 1: directly join external table and local table
select count(distinct user_id),count(distinct member_code)
from a_fdw t1
join b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50)
where coalesce(user_id,'') <> '';
-- result: 50739 50845
What sort of plan is that query using? If it's a merge join, I'd
bet that the problem is related to local and remote servers not
having the same default collation.
regards, tom lane
Yes, the default collation is different, The collation of remote DB is en_US.UTF-8, and local is C.
Remote DB
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7981 kB | pg_default | default administrative connection database
LOCAL DB
postgres | postgres | UTF8 | C | C | | 8061 kB | pg_default | default administrative connection database
and It's merge join。
Aggregate (cost=7058404551.15..7058404551.16 rows=1 width=16) (actual time=260773.462..260773.464 rows=1 loops=1)
Output: count(DISTINCT t1.user_id), count(DISTINCT t6.member_code)
-> Merge Join (cost=38935669.40..5266292528.36 rows=358422404557 width=131) (actual time=243519.160..260728.196 rows=65041 loops=1)
Output: t1.user_id, t6.member_code
Merge Cond: ((t1.user_id)::text = ((t6.tags ->> '508'::text)))
-> Foreign Scan on public.a_fdw t1 (cost=215387.41..493312.00 rows=1986961 width=28) (actual time=21277.241..27822.350 rows=2392619 loops=1)
Output: t1.id, t1.department_name, t1.leader_name, t1.leader_code, t1.store_code, t1.short_name, t1.staff_id, t1.staff_code, t1.staff_name, t1.staff_status, t1.ex
ternal_id, t1.user_id, t1.external_name, t1.external_status, t1.bind_time, t1.unbind_time, t1.create_time, t1.update_time, t1.remark
Filter: ((COALESCE(t1.user_id, ''::character varying))::text <> ''::text)
Rows Removed by Filter: 3342
Remote SQL: SELECT user_id FROM public.a ORDER BY user_id ASC NULLS LAST
-> Materialize (cost=38720281.99..38870604.70 rows=30064542 width=1101) (actual time=221529.199..231550.625 rows=10264430 loops=1)
Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text))
-> Sort (cost=38720281.99..38795443.34 rows=30064542 width=1101) (actual time=221529.195..230155.492 rows=10250260 loops=1)
Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text))
Sort Key: ((t6.tags ->> '508'::text))
Sort Method: external merge Disk: 33044192kB
-> Seq Scan on public.b t6 (cost=0.00..6007802.42 rows=30064542 width=1101) (actual time=0.042..150962.505 rows=30012748 loops=1)
Output: t6.member_code, t6.tags, (t6.tags ->> '508'::text)
Should I recreate foreign table using COLLATION en_US.UTF-8
-----邮件原件-----
发件人: Tom Lane <tgl@sss.pgh.pa.us>
发送时间: 2023年2月6日 23:34
收件人: Qu, Mischa, Majorel China <mischa.qu@majorel.cn>
抄送: pgsql-bugs@postgresql.org
主题: Re: exceptional result of postres_fdw external table joining local table
"Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes:
But when I run some full table scans and found a problems. the results of the following sql were different.
Method 1: directly join external table and local table select
count(distinct user_id),count(distinct member_code) from a_fdw t1 join
b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50) where
coalesce(user_id,'') <> '';
-- result: 50739 50845
What sort of plan is that query using? If it's a merge join, I'd bet that the problem is related to local and remote servers not having the same default collation.
regards, tom lane
CAUTION: This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is forbidden. Please consider the environment before printing this e-mail!.
Do only merge joins have this problem?
-----邮件原件-----
发件人: Qu, Mischa, Majorel China
发送时间: 2023年2月7日 11:23
收件人: Tom Lane <tgl@sss.pgh.pa.us>
抄送: pgsql-bugs@postgresql.org
主题: 答复: exceptional result of postres_fdw external table joining local table
Yes, the default collation is different, The collation of remote DB is en_US.UTF-8, and local is C.
Remote DB
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7981 kB | pg_default | default administrative connection database
LOCAL DB
postgres | postgres | UTF8 | C | C | | 8061 kB | pg_default | default administrative connection database
and It's merge join。
Aggregate (cost=7058404551.15..7058404551.16 rows=1 width=16) (actual time=260773.462..260773.464 rows=1 loops=1)
Output: count(DISTINCT t1.user_id), count(DISTINCT t6.member_code)
-> Merge Join (cost=38935669.40..5266292528.36 rows=358422404557 width=131) (actual time=243519.160..260728.196 rows=65041 loops=1)
Output: t1.user_id, t6.member_code
Merge Cond: ((t1.user_id)::text = ((t6.tags ->> '508'::text)))
-> Foreign Scan on public.a_fdw t1 (cost=215387.41..493312.00 rows=1986961 width=28) (actual time=21277.241..27822.350 rows=2392619 loops=1)
Output: t1.id, t1.department_name, t1.leader_name, t1.leader_code, t1.store_code, t1.short_name, t1.staff_id, t1.staff_code, t1.staff_name, t1.staff_status, t1.ex ternal_id, t1.user_id, t1.external_name, t1.external_status, t1.bind_time, t1.unbind_time, t1.create_time, t1.update_time, t1.remark
Filter: ((COALESCE(t1.user_id, ''::character varying))::text <> ''::text)
Rows Removed by Filter: 3342
Remote SQL: SELECT user_id FROM public.a ORDER BY user_id ASC NULLS LAST
-> Materialize (cost=38720281.99..38870604.70 rows=30064542 width=1101) (actual time=221529.199..231550.625 rows=10264430 loops=1)
Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text))
-> Sort (cost=38720281.99..38795443.34 rows=30064542 width=1101) (actual time=221529.195..230155.492 rows=10250260 loops=1)
Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text))
Sort Key: ((t6.tags ->> '508'::text))
Sort Method: external merge Disk: 33044192kB
-> Seq Scan on public.b t6 (cost=0.00..6007802.42 rows=30064542 width=1101) (actual time=0.042..150962.505 rows=30012748 loops=1)
Output: t6.member_code, t6.tags, (t6.tags ->> '508'::text)
Should I recreate foreign table using COLLATION en_US.UTF-8
-----邮件原件-----
发件人: Tom Lane <tgl@sss.pgh.pa.us>
发送时间: 2023年2月6日 23:34
收件人: Qu, Mischa, Majorel China <mischa.qu@majorel.cn>
抄送: pgsql-bugs@postgresql.org
主题: Re: exceptional result of postres_fdw external table joining local table
"Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes:
But when I run some full table scans and found a problems. the results of the following sql were different.
Method 1: directly join external table and local table select
count(distinct user_id),count(distinct member_code) from a_fdw t1 join
b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50) where
coalesce(user_id,'') <> '';
-- result: 50739 50845
What sort of plan is that query using? If it's a merge join, I'd bet that the problem is related to local and remote servers not having the same default collation.
regards, tom lane
CAUTION: This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is forbidden. Please consider the environment before printing this e-mail!.
After I added colllation "en_US.utf8", the execution of another sql slowed down, and the query plan shows the remote sql is a full table scan.
Remote SQL: SELECT unionid, follow_user FROM public.wxwork_external_info
And using default collation the querying plan is
Remote SQL: SELECT unionid, follow_user FROM public.wxwork_external_info WHERE (($1::character varying(50) = unionid))
-----邮件原件-----
发件人: Qu, Mischa, Majorel China
发送时间: 2023年2月7日 11:45
收件人: Tom Lane <tgl@sss.pgh.pa.us>
抄送: pgsql-bugs@postgresql.org
主题: 答复: exceptional result of postres_fdw external table joining local table
Do only merge joins have this problem?
-----邮件原件-----
发件人: Qu, Mischa, Majorel China
发送时间: 2023年2月7日 11:23
收件人: Tom Lane <tgl@sss.pgh.pa.us>
抄送: pgsql-bugs@postgresql.org
主题: 答复: exceptional result of postres_fdw external table joining local table
Yes, the default collation is different, The collation of remote DB is en_US.UTF-8, and local is C.
Remote DB
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7981 kB | pg_default | default administrative connection database
LOCAL DB
postgres | postgres | UTF8 | C | C | | 8061 kB | pg_default | default administrative connection database
and It's merge join。
Aggregate (cost=7058404551.15..7058404551.16 rows=1 width=16) (actual time=260773.462..260773.464 rows=1 loops=1)
Output: count(DISTINCT t1.user_id), count(DISTINCT t6.member_code)
-> Merge Join (cost=38935669.40..5266292528.36 rows=358422404557 width=131) (actual time=243519.160..260728.196 rows=65041 loops=1)
Output: t1.user_id, t6.member_code
Merge Cond: ((t1.user_id)::text = ((t6.tags ->> '508'::text)))
-> Foreign Scan on public.a_fdw t1 (cost=215387.41..493312.00 rows=1986961 width=28) (actual time=21277.241..27822.350 rows=2392619 loops=1)
Output: t1.id, t1.department_name, t1.leader_name, t1.leader_code, t1.store_code, t1.short_name, t1.staff_id, t1.staff_code, t1.staff_name, t1.staff_status, t1.ex ternal_id, t1.user_id, t1.external_name, t1.external_status, t1.bind_time, t1.unbind_time, t1.create_time, t1.update_time, t1.remark
Filter: ((COALESCE(t1.user_id, ''::character varying))::text <> ''::text)
Rows Removed by Filter: 3342
Remote SQL: SELECT user_id FROM public.a ORDER BY user_id ASC NULLS LAST
-> Materialize (cost=38720281.99..38870604.70 rows=30064542 width=1101) (actual time=221529.199..231550.625 rows=10264430 loops=1)
Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text))
-> Sort (cost=38720281.99..38795443.34 rows=30064542 width=1101) (actual time=221529.195..230155.492 rows=10250260 loops=1)
Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text))
Sort Key: ((t6.tags ->> '508'::text))
Sort Method: external merge Disk: 33044192kB
-> Seq Scan on public.b t6 (cost=0.00..6007802.42 rows=30064542 width=1101) (actual time=0.042..150962.505 rows=30012748 loops=1)
Output: t6.member_code, t6.tags, (t6.tags ->> '508'::text)
Should I recreate foreign table using COLLATION en_US.UTF-8
-----邮件原件-----
发件人: Tom Lane <tgl@sss.pgh.pa.us>
发送时间: 2023年2月6日 23:34
收件人: Qu, Mischa, Majorel China <mischa.qu@majorel.cn>
抄送: pgsql-bugs@postgresql.org
主题: Re: exceptional result of postres_fdw external table joining local table
"Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes:
But when I run some full table scans and found a problems. the results of the following sql were different.
Method 1: directly join external table and local table select
count(distinct user_id),count(distinct member_code) from a_fdw t1 join
b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50) where
coalesce(user_id,'') <> '';
-- result: 50739 50845
What sort of plan is that query using? If it's a merge join, I'd bet that the problem is related to local and remote servers not having the same default collation.
regards, tom lane
CAUTION: This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is forbidden. Please consider the environment before printing this e-mail!.