Custom FDW - the results of a nested query/join not being passed as qual to the outer query
Hi!
(First post. If this is not the appropriate list, please feel free to move
or let me know. )
I am developing an FDW which allows various data sources to act as virtual
tables, allowing various different APIs to be queried using a consistent
SQL interface - a similar concept to Osquery but using Postgres instead of
SQLite. It is working pretty well, but we have hit a bit of a roadblock (or
bump in the road at least).
We often have virtual tables where a list operation is not viable/possible
without providing quals. For example we have implemented a 'whois' table,
which will retrieve whois information for specified domains. It is clearly
not practical to do an unqualified 'list' of *all* domains.
The problem we have is that the results of nested subqueries/joins are not
being passed as quals to the outer query.
So for example
* select * from whois_domain where domain in ('google.com
<http://google.com>', 'yahoo.co.uk <http://yahoo.co.uk>')*
works fine, and a qual is passed to the fdw with a value of ['google.com', '
yahoo.co.uk']
However the following (assuming a 'domains table containing required
domains) does not work:
*select * from whois_domain where domain in (select domain from domains)*
In this case, no quals are passed to the fdw, so the *select * from
whois_domain* query therefore fails. What we would like is to ensure the
subquery runs first, and for the results to be available to the outer query.
---
Using SQLite, this could be accomplished using cross-joins (
https://sqlite.org/optoverview.html#crossjoin). Is there an equivalent (or
similar) mechanism in Postgres to ensure query ordering?
Within the FDW, I have tried using the GetForeignPaths function to return a
path which returns a single row when the 'key' column is used. This does
provide a qual, however it is of type T_Var - I believe I need a constant
qual.
Any suggestion welcome as to either a different way to structure the query
or whether the FDW can request/enforce the ordering by returning specific
planning results.
Many thanks,
Kai Daguerre
FDW source: https://github.com/turbot/steampipe-postgres-fdw
Product page: https://steampipe.io
Kai Daguerre <kai@turbot.com> writes:
We often have virtual tables where a list operation is not viable/possible
without providing quals. For example we have implemented a 'whois' table,
which will retrieve whois information for specified domains. It is clearly
not practical to do an unqualified 'list' of *all* domains.
In that case you're going to have to resign yourself to some queries
failing. This is unavoidable, consider "select * from whois". But
just because the query has a WHERE condition doesn't mean that a useful
restriction clause can be extracted for any particular table.
I think the best you can do is (1) fail at runtime if there's no qual
and (2) at plan time, return an extremely high cost estimate for a
qual-less scan, in hopes of discouraging the planner from choosing
that. (Instead of (2), you could perhaps not generate a scan path
at all, but that's likely to lead to an unintelligible error message.)
Perhaps you should rethink whether you really want a foreign table
rather than a set-returning function. With the SRF approach it's
automatic that the user must supply the restricting argument(s) you need.
regards, tom lane
Many thanks for the fast response.
Using an SRF is an interesting idea, I'll have a play and see if we can
make that work.
Cheers,
Kai
On Wed, Jan 27, 2021 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Kai Daguerre <kai@turbot.com> writes:
We often have virtual tables where a list operation is not
viable/possible
without providing quals. For example we have implemented a 'whois' table,
which will retrieve whois information for specified domains. It isclearly
not practical to do an unqualified 'list' of *all* domains.
In that case you're going to have to resign yourself to some queries
failing. This is unavoidable, consider "select * from whois". But
just because the query has a WHERE condition doesn't mean that a useful
restriction clause can be extracted for any particular table.I think the best you can do is (1) fail at runtime if there's no qual
and (2) at plan time, return an extremely high cost estimate for a
qual-less scan, in hopes of discouraging the planner from choosing
that. (Instead of (2), you could perhaps not generate a scan path
at all, but that's likely to lead to an unintelligible error message.)Perhaps you should rethink whether you really want a foreign table
rather than a set-returning function. With the SRF approach it's
automatic that the user must supply the restricting argument(s) you need.regards, tom lane