TEXT vs VARCHAR join qual push down diffrence, bug or expected?

Started by Jeevan Chalkealmost 11 years ago10 messageshackers

jeevan.chalke@enterprisedb.com

almost 11 years ago

Hi,

It is observed that, when we have one remote (huge) table and one local
(small) table and a join between them, then
1. If the column type is text, then we push the join qual to the remote
server, so that we will have less rows to fetch, and thus execution time
is very less.
2. If the column type is varchar, then we do not push the join qual to the
remote server, resulting into large number of data fetch and thus
execution time is very high.

Here is the EXPLAIN plan for such queries:

When VARCHAR column:

QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=100.15..4594935.73 rows=230 width=120) (actual
time=0.490..291.339 rows=1 loops=1)
Output: a.ename, d.dname
Join Filter: ((a.deptno)::text = (d.deptno)::text)
Rows Removed by Join Filter: 100099
-> Index Scan using emp2_pk on public.emp2 a (cost=0.15..8.17 rows=1
width=76) (actual time=0.009..0.013 rows=1 loops=1)
Output: a.empno, a.ename, a.deptno
Index Cond: (a.empno = '7369'::numeric)
-> Foreign Scan on public.fdw_dept2 d (cost=100.00..4594353.50
rows=45925 width=120) (actual time=0.466..274.990 rows=100100 loops=1)
Output: d.deptno, d.dname
Remote SQL: SELECT deptno, dname FROM public.dept2
Planning time: 0.697 ms
Execution time: 291.467 ms
(12 rows)

When TEXT column:

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=100.57..216.63 rows=238 width=120) (actual
time=0.375..0.378 rows=1 loops=1)
Output: a.ename, d.dname
-> Index Scan using emp3_pk on public.emp3 a (cost=0.15..8.17 rows=1
width=70) (actual time=0.010..0.011 rows=1 loops=1)
Output: a.empno, a.ename, a.deptno
Index Cond: (a.empno = '7369'::numeric)
-> Foreign Scan on public.fdw_dept3 d (cost=100.42..208.45 rows=1
width=114) (actual time=0.362..0.362 rows=1 loops=1)
Output: d.deptno, d.dname
Remote SQL: SELECT deptno, dname FROM public.dept3 WHERE
(($1::text = deptno))
Planning time: 1.220 ms
Execution time: 0.498 ms
(10 rows)

Attached test script to reproduce this theory.

I have observed that, since we do not have an equality operator for VARCHAR
type, we convert VARCHAR to TEXT using RelabelType and use texteq operator
function.
However in foreign_expr_walker(), for T_RelabelType case, we have these
conditions which do not allow us push the qual to remote.

/*
* RelabelType must not introduce a collation not derived
from
* an input foreign Var.
*/
collation = r->resultcollid;
if (collation == InvalidOid)
state = FDW_COLLATE_NONE;
else if (inner_cxt.state == FDW_COLLATE_SAFE &&
collation == inner_cxt.collation)
state = FDW_COLLATE_SAFE;
else
state = FDW_COLLATE_UNSAFE;

I guess, since we do push qual to remote in case of TEXT, we should do the
same for VARCHAR too.

Also given that RelabelType are just dummy wrapper for binary compatible
types, can we simply set collation and state from its inner context instead
on above check block. Like

/*
* Since RelabelType represents a "dummy" type coercion
between
* two binary-compatible datatypes, set collation and state
got
* from the inner_cxt.
*/
collation = inner_cxt.collation;
state = inner_cxt.state;

Inputs/Thought?

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company