Joins between foreign tables
Dear Postgres,
Consider a query like:
SELECT table_on_server1.email
FROM table_on_server1 JOIN table_on_server2
ON (table_on_server1.id = table_on_server2.user_id)
WHERE table_on_server2.created_at >= date_trunc('day', now())
One could imagine the plan being something like:
1.
Find all the user_ids that result from SELECT user_id FROM
table_on_server2 WHERE table_on_server2.created_at >= date_trunc('day',
now())
2.
Pass these IDs to a query executed on server1, pushing them down in a
WHERE clause.
However, the query actually doesn’t perform at all like that. Which is to
say, if one executes the query in (1) from the console and copy pastes the
resulting IDs into an IN clause in a second query against table_on_server1,
the query returns quickly (milliseconds); whereas if one runs the naive
query at the beginning of this email, it does not return for seconds.
What are things we can do to get good performance for queries like this?
We’ve tied moving the search for IDs into a temp table and a CTE; it
doesn’t seem to make any difference. (Which suggests that joins between one
local and one remote table won’t perform well in general, either.)
Best Regards,
Jason Dusek
On Mon, Jun 8, 2015 at 10:29 AM, Jason Dusek <jason.dusek@gmail.com> wrote:
The databases involved are all Postgres 9.4 or 9.3. The FDW is the Postgres
FDW.The join node (the one from which queries are issued) is Postgres 9.4
installed yesterday from the Postgres Apt repository. It's using the version
of the wrapper that is shipped with that package.The databases being queried -- the ones with the actual tables -- are both
Postgres 9.3.5, on Amazon RDS.
Sorry, I forgot to include pgsql-general on the address list for my reply.
It seems that postgres_fdw in Postgres 9.3 and 9.4 supports where
clause push-down according to the documentation:
http://www.postgresql.org/docs/9.3/static/postgres-fdw.html
"postgres_fdw attempts to optimize remote queries to reduce the amount
of data transferred from foreign servers. This is done by sending
query WHERE clauses to the remote server for execution, and by not
retrieving table columns that are not needed for the current query. To
reduce the risk of misexecution of queries, WHERE clauses are not sent
to the remote server unless they use only built-in data types,
operators, and functions. Operators and functions in the clauses must
be IMMUTABLE as well."
I'm guessing that postgres_fdw sees that one of your function calls is
IMMUTABLE, so it thinks it is unsafe to push-down. It is probably your
call to NOW(). You might want to try replacing that with a literal
somehow, if you can.
Geoff
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Import Notes
Reply to msg id not found: CAO3NbwOijVj3iqdFhNXRu5bgmGFuRjWUoV2T6OCKR6vJGzCBdw@mail.gmail.com
Hi Geoff,
Thanks for your swift reply. Breaking the query up like this would seem to
address the problem:
CREATE UNLOGGED TABLE tmp ASSELECT * FROM table_on_server2
WHERE created_at >= date_trunc('day', now());
SELECT email FROM table_on_server1 WHERE id IN (SELECT user_id FROM tmp);
It would stand to reason that the IN list could be pushed down to the
foreign server; but this query’s performance is no better than the direct
join between table_on_server1 and table_on_server2.
Best Regards,
Jason Dusek