FDW for PostgreSQL
Hi all,
I'd like to propose an FDW for PostgreSQL as a contrib module again.
The attached patch is an updated version of the one proposed during the
9.2 development cycle.
For ease of review, I have summarized below what the patch tries to achieve.
Abstract
========
This patch provides an FDW for PostgreSQL which allows users to access
data stored in a remote PostgreSQL server via foreign tables. The remote
instance can, of course, be on the other side of a network. I also think
this FDW could serve as an example for other RDBMS-based FDWs, and it
would be useful as a proof of concept for FDW-related features.
Note that the name has been changed from "pgsql_fdw", which was used in
the last proposal, since I received a comment saying that most existing
FDWs are named "${PRODUCT_NAME}_fdw", so "postgresql_fdw" or
"postgres_fdw" would be better. For this issue, I posted another patch
which moves the existing postgresql_fdw_validator into contrib/dblink,
renaming it in order to free up the name "postgresql_fdw" for this FDW.
Please note that the attached patch requires dblink_fdw_validator.patch
to be applied first.
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00454.php
Query deparser
==============
postgresql_fdw now has its own SQL query deparser, so it no longer
depends on the backend's ruleutils module.
The deparser maps object names when the generic options below are set:
nspname of foreign table: used as the namespace (schema) of the relation
relname of foreign table: used as the relation name
colname of foreign column: used as the column name
This mapping allows flexible schema design.
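For example, a foreign table can point at a remote table whose schema,
relation and column names differ from the local ones. The names below
are only illustrative, and a server named "loopback" is assumed to
already exist:

  CREATE FOREIGN TABLE local_accounts (
      id      int OPTIONS (colname 'aid'),
      balance int OPTIONS (colname 'abalance')
  ) SERVER loopback
    OPTIONS (nspname 'public', relname 'pgbench_accounts');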
SELECT optimization
===================
postgresql_fdw always retrieves the same number of columns as the foreign
table has, in order to avoid the overhead of column mapping. However,
some of them (or sometimes all of them) are often unused on the local
side, so postgresql_fdw puts a NULL literal in place of such unused
columns in the SELECT clause of the remote query. For example, consider
one of the pgbench workloads:
SELECT abalance FROM pgbench_accounts WHERE aid = 1;
This query generates the remote query below. In addition to bid and
filler, aid is replaced with NULL because it has already been evaluated
on the remote side.
SELECT NULL, NULL, abalance, NULL FROM pgbench_accounts
WHERE (aid OPERATOR(pg_catalog.=) 1);
This trick can improve performance notably by reducing the amount of data
to be transferred.
One more example: counting rows.
SELECT count(*) FROM pgbench_accounts;
This query requires only the existence of rows, so no actual column
reference appears in the SELECT clause.
SELECT NULL, NULL, NULL, NULL FROM pgbench_accounts;
WHERE push down
===============
postgresql_fdw pushes down some restrictions (in other words, top-level
elements of the WHERE clause connected with AND) which can be evaluated
safely on the remote side. Currently an expression is considered "safe"
if it contains only:
- column references
- constants of built-in types (scalar and array)
- external parameters of an EXECUTE statement
- built-in operators whose underlying functions are built-in and immutable
  (operators on collatable operands are pushed down only for "=" and "<>")
- built-in immutable functions
Some other elements might also be safe to push down, but the criteria
above seem sufficient for basic use cases.
Although it might seem odd, operators are deparsed using OPERATOR
notation to avoid search_path problems.
E.g.
local query : WHERE col = 1
remote query: WHERE (col OPERATOR(pg_catalog.=) 1)
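As a rough illustration of the criteria above (assuming a foreign table
defined on pgbench_accounts), the first qual below would be expected to
be pushed down, while the second would be evaluated locally because
now() is not immutable (and "<" on a collatable type is not "=" or "<>"):

  -- pushed down: built-in immutable operator, column reference, constant
  SELECT abalance FROM pgbench_accounts WHERE aid = 1;
  -- kept local: now() is stable, not immutable
  SELECT abalance FROM pgbench_accounts WHERE filler < now()::text;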
Connection management
=====================
postgresql_fdw has its own connection manager. A connection is
established when the first foreign scan on a server is planned, and it is
pooled in the backend. If another foreign scan on the same server is
invoked, the same connection is used. The connection pool is per-backend,
so different backends never share a connection.
The postgresql_fdw_connections view shows active connections, and
postgresql_fdw_disconnect() allows users to discard a particular
connection at any time.
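Usage would look roughly like this, assuming the view exposes the srvid
and usesysid columns returned by the underlying get_connections function:

  -- list connections cached by this backend
  SELECT * FROM postgresql_fdw_connections;
  -- discard them (add a WHERE clause to pick a particular one)
  SELECT postgresql_fdw_disconnect(srvid, usesysid)
    FROM postgresql_fdw_connections;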
Transaction management
======================
If multiple foreign tables on the same foreign server are used in a local
query, postgresql_fdw uses the same connection and retrieves the results
within one remote transaction, to keep the results consistent. Currently
the remote transaction is closed at the end of the local query, so a
following local query might see an inconsistent result.
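For example, with the foreign table ft1 from the attached regression
test, both references to ft1 in the first query below share one remote
transaction, while the second query opens a new one and may therefore
see rows committed on the remote side in between:

  SELECT count(*) FROM ft1 a JOIN ft1 b ON (a.c1 = b.c1);
  SELECT count(*) FROM ft1;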
Cost estimation
===============
To estimate the costs and result rows of a foreign scan, postgresql_fdw
executes an EXPLAIN statement on the remote side and retrieves the cost
and row values from the result. The costs of connection establishment and
data transfer are then added to those base costs. Currently these two
factors are hard-coded, but making them configurable would not be
difficult.
Executing EXPLAIN is not cheap, but the remote query itself is usually
much more expensive, so the additional cost should be acceptable.
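Roughly speaking, the FDW issues something like the following on the
remote side and reads the cost and rows estimates from the top plan node
(the plan and numbers below are only illustrative):

  EXPLAIN SELECT NULL, NULL, abalance, NULL FROM pgbench_accounts
          WHERE (aid OPERATOR(pg_catalog.=) 1);
  -- Index Scan using pgbench_accounts_pkey on pgbench_accounts
  --   (cost=0.00..8.27 rows=1 width=11)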
ANALYZE support
===============
postgresql_fdw supports ANALYZE to improve selectivity estimation of
filtering done on the local side (WHERE clauses which could not be pushed
down). The sampler function retrieves all rows from the remote table and
skips some of them so that the result fits the requested sample size. As
with file_fdw, postgresql_fdw does not care about the order of the
result, because order only matters for correlation, and correlation only
matters for index scans, which are not supported by this FDW.
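Usage is the same as for ordinary tables; for example, with the foreign
table ft1 from the attached regression test:

  ANALYZE ft1;
  SELECT attname, n_distinct
    FROM pg_stats
   WHERE tablename = 'ft1' AND attname = 'c1';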
Fetching Data
=============
postgresql_fdw uses libpq's single-row mode so that memory usage stays
low even if the result is huge.
To cope with encoding differences, postgresql_fdw automatically sets
client_encoding on the remote connection to the server encoding of the
local database.
Future improvement
==================
I have some ideas for improvement:
- Provide sorted result path (requires index information?)
- Provide parameterized path
- Transaction mapping between local and remotes (2PC)
- Binary transfer (only against servers with the same PG major version?)
- JOIN push-down (requires support by core)
Any comments and questions are welcome.
--
Shigeru HANADA
Attachments:
postgresql_fdw_v1.patch
diff --git a/contrib/Makefile b/contrib/Makefile
index d230451..ce6d461 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
pgcrypto \
pgrowlocks \
pgstattuple \
+ postgresql_fdw \
seg \
spi \
tablefunc \
diff --git a/contrib/postgresql_fdw/.gitignore b/contrib/postgresql_fdw/.gitignore
new file mode 100644
index 0000000..0854728
--- /dev/null
+++ b/contrib/postgresql_fdw/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/results/
+*.o
+*.so
diff --git a/contrib/postgresql_fdw/Makefile b/contrib/postgresql_fdw/Makefile
new file mode 100644
index 0000000..898036f
--- /dev/null
+++ b/contrib/postgresql_fdw/Makefile
@@ -0,0 +1,22 @@
+# contrib/postgresql_fdw/Makefile
+
+MODULE_big = postgresql_fdw
+OBJS = postgresql_fdw.o option.o deparse.o connection.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+SHLIB_LINK = $(libpq)
+
+EXTENSION = postgresql_fdw
+DATA = postgresql_fdw--1.0.sql
+
+REGRESS = postgresql_fdw
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/postgresql_fdw
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/postgresql_fdw/connection.c b/contrib/postgresql_fdw/connection.c
new file mode 100644
index 0000000..145127f
--- /dev/null
+++ b/contrib/postgresql_fdw/connection.c
@@ -0,0 +1,604 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.c
+ * Connection management for postgresql_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/connection.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq-fe.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
+
+#include "postgresql_fdw.h"
+#include "connection.h"
+
+/* ============================================================================
+ * Connection management functions
+ * ==========================================================================*/
+
+/*
+ * Connection cache entry managed with hash table.
+ */
+typedef struct ConnCacheEntry
+{
+ /* hash key must be first */
+ Oid serverid; /* oid of foreign server */
+ Oid userid; /* oid of local user */
+
+ bool use_tx; /* true when using remote transaction */
+ int refs; /* reference counter */
+ PGconn *conn; /* foreign server connection */
+} ConnCacheEntry;
+
+/*
+ * Hash table used to cache connections to PostgreSQL servers; it is
+ * initialized before the backend's first attempt to connect to a server.
+ */
+static HTAB *ConnectionHash;
+
+/* ----------------------------------------------------------------------------
+ * prototype of private functions
+ * --------------------------------------------------------------------------*/
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg);
+static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
+static void begin_remote_tx(PGconn *conn);
+static void abort_remote_tx(PGconn *conn);
+
+/*
+ * Get a PGconn which can be used to execute foreign query on the remote
+ * PostgreSQL server with the user's authorization. If this was the first
+ * request for the server, a new connection is established.
+ *
+ * When use_tx is true, remote transaction is started if caller is the only
+ * user of the connection. Isolation level of the remote transaction is same
+ * as local transaction, and remote transaction will be aborted when last
+ * user release.
+ *
+ * TODO: Note that caching connections requires a mechanism to detect changes
+ * of FDW objects in order to invalidate already-established connections.
+ */
+PGconn *
+GetConnection(ForeignServer *server, UserMapping *user, bool use_tx)
+{
+ bool found;
+ ConnCacheEntry *entry;
+ ConnCacheEntry key;
+
+ /* initialize connection cache if it isn't */
+ if (ConnectionHash == NULL)
+ {
+ HASHCTL ctl;
+
+ /* hash key is a pair of oids: serverid and userid */
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid) + sizeof(Oid);
+ ctl.entrysize = sizeof(ConnCacheEntry);
+ ctl.hash = tag_hash;
+ ctl.match = memcmp;
+ ctl.keycopy = memcpy;
+ /* allocate ConnectionHash in the cache context */
+ ctl.hcxt = CacheMemoryContext;
+ ConnectionHash = hash_create("postgresql_fdw connections", 32,
+ &ctl,
+ HASH_ELEM | HASH_CONTEXT |
+ HASH_FUNCTION | HASH_COMPARE |
+ HASH_KEYCOPY);
+
+ /*
+ * Register postgresql_fdw's own cleanup function for connection
+ * cleanup. This should be done just once for each backend.
+ */
+ RegisterResourceReleaseCallback(cleanup_connection, ConnectionHash);
+ }
+
+ /* Create key value for the entry. */
+ MemSet(&key, 0, sizeof(key));
+ key.serverid = server->serverid;
+ key.userid = GetOuterUserId();
+
+ /*
+ * Find the cached entry for the requested connection. If there is none,
+ * a new entry is created here; the ResourceOwner callback registered
+ * above will clean the connection up on error, including user interrupt.
+ */
+ entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+ if (!found)
+ {
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+
+ /*
+ * We don't check the health of the cached connection here, because it would
+ * add some overhead. A broken connection and its cache entry will be
+ * cleaned up when the connection is actually used.
+ */
+
+ /*
+ * If cache entry doesn't have connection, we have to establish new
+ * connection.
+ */
+ if (entry->conn == NULL)
+ {
+ PGconn *volatile conn = NULL;
+
+ /*
+ * Use PG_TRY block to ensure closing connection on error.
+ */
+ PG_TRY();
+ {
+ /*
+ * Connect to the foreign PostgreSQL server; the connection is stored
+ * in the cache entry below so that it can be reused.
+ * Note: key items of the entry have already been initialized by
+ * hash_search(HASH_ENTER).
+ */
+ conn = connect_pg_server(server, user);
+ }
+ PG_CATCH();
+ {
+ /* Clear connection cache entry on error case. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ entry->conn = conn;
+ elog(DEBUG3, "new postgresql_fdw connection %p for server %s",
+ entry->conn, server->servername);
+ }
+
+ /* Increase connection reference counter. */
+ entry->refs++;
+
+ /*
+ * If the requester is the only referrer of this connection, start a remote
+ * transaction with the same isolation level as the local transaction we are
+ * in. We need to remember whether this connection uses a remote transaction
+ * so that we can abort it when the connection is released completely.
+ */
+ if (use_tx && entry->refs == 1)
+ begin_remote_tx(entry->conn);
+ entry->use_tx = use_tx;
+
+
+ return entry->conn;
+}
+
+/*
+ * For non-superusers, insist that the connstr specify a password. This
+ * prevents a password from being picked up from .pgpass, a service file,
+ * the environment, etc. We don't want the postgres user's passwords
+ * to be accessible to non-superusers.
+ */
+static void
+check_conn_params(const char **keywords, const char **values)
+{
+ int i;
+
+ /* no check required if superuser */
+ if (superuser())
+ return;
+
+ /* ok if params contain a non-empty password */
+ for (i = 0; keywords[i] != NULL; i++)
+ {
+ if (strcmp(keywords[i], "password") == 0 && values[i][0] != '\0')
+ return;
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superusers must provide a password in the connection string.")));
+}
+
+static PGconn *
+connect_pg_server(ForeignServer *server, UserMapping *user)
+{
+ const char *conname = server->servername;
+ PGconn *conn;
+ const char **all_keywords;
+ const char **all_values;
+ const char **keywords;
+ const char **values;
+ int n;
+ int i, j;
+
+ /*
+ * Construct connection params from generic options of ForeignServer and
+ * UserMapping. Those two objects hold only libpq options.
+ * Extra 3 items are for:
+ * *) fallback_application_name
+ * *) client_encoding
+ * *) NULL termination (end marker)
+ *
+ * Note: We don't omit any parameters even if the target database might be
+ * older than the local one, because unexpected parameters are just ignored.
+ */
+ n = list_length(server->options) + list_length(user->options) + 3;
+ all_keywords = (const char **) palloc(sizeof(char *) * n);
+ all_values = (const char **) palloc(sizeof(char *) * n);
+ keywords = (const char **) palloc(sizeof(char *) * n);
+ values = (const char **) palloc(sizeof(char *) * n);
+ n = 0;
+ n += ExtractConnectionOptions(server->options,
+ all_keywords + n, all_values + n);
+ n += ExtractConnectionOptions(user->options,
+ all_keywords + n, all_values + n);
+ all_keywords[n] = all_values[n] = NULL;
+
+ for (i = 0, j = 0; all_keywords[i]; i++)
+ {
+ keywords[j] = all_keywords[i];
+ values[j] = all_values[i];
+ j++;
+ }
+
+ /* Use "postgresql_fdw" as fallback_application_name. */
+ keywords[j] = "fallback_application_name";
+ values[j++] = "postgresql_fdw";
+
+ /* Set client_encoding so that libpq can convert encoding properly. */
+ keywords[j] = "client_encoding";
+ values[j++] = GetDatabaseEncodingName();
+
+ keywords[j] = values[j] = NULL;
+ pfree(all_keywords);
+ pfree(all_values);
+
+ /* verify connection parameters and do connect */
+ check_conn_params(keywords, values);
+ conn = PQconnectdbParams(keywords, values, 0);
+ if (!conn || PQstatus(conn) != CONNECTION_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
+ errmsg("could not connect to server \"%s\"", conname),
+ errdetail("%s", PQerrorMessage(conn))));
+ pfree(keywords);
+ pfree(values);
+
+ /*
+ * Check that non-superuser has used password to establish connection.
+ * This check logic is based on dblink_security_check() in contrib/dblink.
+ *
+ * XXX Should we check this even if we don't provide unsafe version like
+ * dblink_connect_u()?
+ */
+ if (!superuser() && !PQconnectionUsedPassword(conn))
+ {
+ PQfinish(conn);
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superuser cannot connect if the server does not request a password."),
+ errhint("Target server's authentication method must be changed.")));
+ }
+
+ return conn;
+}
+
+/*
+ * Start remote transaction with proper isolation level.
+ */
+static void
+begin_remote_tx(PGconn *conn)
+{
+ const char *sql = NULL; /* keep compiler quiet. */
+ PGresult *res;
+
+ switch (XactIsoLevel)
+ {
+ case XACT_READ_UNCOMMITTED:
+ case XACT_READ_COMMITTED:
+ case XACT_REPEATABLE_READ:
+ sql = "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
+ break;
+ case XACT_SERIALIZABLE:
+ sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
+ break;
+ default:
+ elog(ERROR, "unexpected isolation level: %d", XactIsoLevel);
+ break;
+ }
+
+ elog(DEBUG3, "starting remote transaction with \"%s\"", sql);
+
+ res = PQexec(conn, sql);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not start transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+static void
+abort_remote_tx(PGconn *conn)
+{
+ PGresult *res;
+
+ elog(DEBUG3, "aborting remote transaction");
+
+ res = PQexec(conn, "ABORT TRANSACTION");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not abort transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+/*
+ * Mark the connection as "unused", and close it if the caller was the last
+ * user of the connection.
+ */
+void
+ReleaseConnection(PGconn *conn)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+
+ if (conn == NULL)
+ return;
+
+ /*
+ * We need to scan sequentially since we use the address to find
+ * appropriate PGconn from the hash table.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ if (entry->conn == conn)
+ {
+ hash_seq_term(&scan);
+ break;
+ }
+ }
+
+ /*
+ * If the given connection is an orphan, it must be a dangling pointer to an
+ * already-released connection. Discarding a connection due to a remote
+ * query error would produce such a situation (see comments below).
+ */
+ if (entry == NULL)
+ return;
+
+ /*
+ * If the connection being released is broken or its transaction has failed,
+ * discard the connection to recover from the error. PQfinish may leave
+ * dangling pointers to the shared PGconn object, but they won't be
+ * double-freed because their pointer values no longer match any cached
+ * entry and are ignored by the check above.
+ *
+ * Subsequent connection request via GetConnection will create new
+ * connection.
+ */
+ if (PQstatus(conn) != CONNECTION_OK ||
+ (PQtransactionStatus(conn) != PQTRANS_IDLE &&
+ PQtransactionStatus(conn) != PQTRANS_INTRANS))
+ {
+ elog(DEBUG3, "discarding connection: %s %s",
+ PQstatus(conn) == CONNECTION_OK ? "OK" : "NG",
+ PQtransactionStatus(conn) == PQTRANS_IDLE ? "IDLE" :
+ PQtransactionStatus(conn) == PQTRANS_ACTIVE ? "ACTIVE" :
+ PQtransactionStatus(conn) == PQTRANS_INTRANS ? "INTRANS" :
+ PQtransactionStatus(conn) == PQTRANS_INERROR ? "INERROR" :
+ "UNKNOWN");
+ PQfinish(conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ return;
+ }
+
+ /*
+ * If this connection uses remote transaction and caller is the last user,
+ * abort remote transaction and forget about it.
+ */
+ if (entry->use_tx && entry->refs == 1)
+ {
+ abort_remote_tx(conn);
+ entry->use_tx = false;
+ }
+
+ /*
+ * Decrease reference counter of this connection. Even if the caller was
+ * the last referrer, we don't unregister it from cache.
+ */
+ entry->refs--;
+ if (entry->refs < 0)
+ entry->refs = 0; /* just in case */
+}
+
+/*
+ * Clean the connection up via ResourceOwner.
+ */
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry = (ConnCacheEntry *) arg;
+
+ /* If the transaction was committed, don't close connections. */
+ if (isCommit)
+ return;
+
+ /*
+ * We clean the connection up on post-lock because foreign connections are
+ * backend-internal resource.
+ */
+ if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+ return;
+
+ /*
+ * We ignore cleanup for ResourceOwners other than transaction. At this
+ * point, such a ResourceOwner is only Portal.
+ */
+ if (CurrentResourceOwner != CurTransactionResourceOwner)
+ return;
+
+ /*
+ * We don't need to clean up at end of subtransactions, because they might
+ * be recovered to consistent state with savepoints.
+ */
+ if (!isTopLevel)
+ return;
+
+ /*
+ * Here, it must be after abort of top level transaction. Disconnect all
+ * cached connections to clear error status out and reset their reference
+ * counters.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ elog(DEBUG3, "discard postgresql_fdw connection %p due to resowner cleanup",
+ entry->conn);
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+}
+
+/*
+ * Get list of connections currently active.
+ */
+Datum postgresql_fdw_get_connections(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_get_connections);
+Datum
+postgresql_fdw_get_connections(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ Tuplestorestate *tuplestore;
+ TupleDesc tupdesc;
+
+ /* We return the list of connections by storing them in a Tuplestore. */
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = NULL;
+ rsinfo->setDesc = NULL;
+
+ /* Create tuplestore and copy of TupleDesc in per-query context. */
+ MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+
+ tupdesc = CreateTemplateTupleDesc(2, false);
+ TupleDescInitEntry(tupdesc, 1, "srvid", OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, 2, "usesysid", OIDOID, -1, 0);
+ rsinfo->setDesc = tupdesc;
+
+ tuplestore = tuplestore_begin_heap(false, false, work_mem);
+ rsinfo->setResult = tuplestore;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * We need to scan sequentially since we use the address to find
+ * appropriate PGconn from the hash table.
+ */
+ if (ConnectionHash != NULL)
+ {
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ Datum values[2];
+ bool nulls[2];
+ HeapTuple tuple;
+
+ /* Ignore inactive connections */
+ if (PQstatus(entry->conn) != CONNECTION_OK)
+ continue;
+
+ /*
+ * Ignore other users' connections if current user isn't a
+ * superuser.
+ */
+ if (!superuser() && entry->userid != GetUserId())
+ continue;
+
+ values[0] = ObjectIdGetDatum(entry->serverid);
+ values[1] = ObjectIdGetDatum(entry->userid);
+ nulls[0] = false;
+ nulls[1] = false;
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ tuplestore_puttuple(tuplestore, tuple);
+ }
+ }
+ tuplestore_donestoring(tuplestore);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Discard persistent connection designated by given connection name.
+ */
+Datum postgresql_fdw_disconnect(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_disconnect);
+Datum
+postgresql_fdw_disconnect(PG_FUNCTION_ARGS)
+{
+ Oid serverid = PG_GETARG_OID(0);
+ Oid userid = PG_GETARG_OID(1);
+ ConnCacheEntry key;
+ ConnCacheEntry *entry = NULL;
+ bool found;
+
+ /* Non-superuser can't discard other users' connection. */
+ if (!superuser() && userid != GetOuterUserId())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+ errmsg("only superuser can discard other user's connection")));
+
+ /*
+ * If no connection has been established, or there is no such connection,
+ * just return "NG" to indicate nothing has been done.
+ */
+ if (ConnectionHash == NULL)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ key.serverid = serverid;
+ key.userid = userid;
+ entry = hash_search(ConnectionHash, &key, HASH_FIND, &found);
+ if (!found)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ /* Discard cached connection, and clear reference counter. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+
+ PG_RETURN_TEXT_P(cstring_to_text("OK"));
+}
diff --git a/contrib/postgresql_fdw/connection.h b/contrib/postgresql_fdw/connection.h
new file mode 100644
index 0000000..17355df
--- /dev/null
+++ b/contrib/postgresql_fdw/connection.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.h
+ * Connection management for postgresql_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/connection.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECTION_H
+#define CONNECTION_H
+
+#include "foreign/foreign.h"
+#include "libpq-fe.h"
+
+/*
+ * Connection management
+ */
+PGconn *GetConnection(ForeignServer *server, UserMapping *user, bool use_tx);
+void ReleaseConnection(PGconn *conn);
+
+#endif /* CONNECTION_H */
diff --git a/contrib/postgresql_fdw/deparse.c b/contrib/postgresql_fdw/deparse.c
new file mode 100644
index 0000000..9f64714
--- /dev/null
+++ b/contrib/postgresql_fdw/deparse.c
@@ -0,0 +1,1195 @@
+/*-------------------------------------------------------------------------
+ *
+ * deparse.c
+ * query deparser for PostgreSQL
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/deparse.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/transam.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/var.h"
+#include "parser/parser.h"
+#include "parser/parsetree.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+#include "postgresql_fdw.h"
+
+/*
+ * Context for walking through the expression tree.
+ */
+typedef struct foreign_executable_cxt
+{
+ PlannerInfo *root;
+ RelOptInfo *foreignrel;
+ bool has_param;
+} foreign_executable_cxt;
+
+/*
+ * Get string representation which can be used in SQL statement from a node.
+ */
+static void deparseExpr(StringInfo buf, Expr *expr, PlannerInfo *root);
+static void deparseRelation(StringInfo buf, RangeTblEntry *rte,
+ bool need_prefix);
+static void deparseVar(StringInfo buf, Var *node, PlannerInfo *root,
+ bool need_prefix);
+static void deparseConst(StringInfo buf, Const *node, PlannerInfo *root);
+static void deparseBoolExpr(StringInfo buf, BoolExpr *node, PlannerInfo *root);
+static void deparseNullTest(StringInfo buf, NullTest *node, PlannerInfo *root);
+static void deparseDistinctExpr(StringInfo buf, DistinctExpr *node,
+ PlannerInfo *root);
+static void deparseRelabelType(StringInfo buf, RelabelType *node,
+ PlannerInfo *root);
+static void deparseFuncExpr(StringInfo buf, FuncExpr *node, PlannerInfo *root);
+static void deparseParam(StringInfo buf, Param *node, PlannerInfo *root);
+static void deparseScalarArrayOpExpr(StringInfo buf, ScalarArrayOpExpr *node,
+ PlannerInfo *root);
+static void deparseOpExpr(StringInfo buf, OpExpr *node, PlannerInfo *root);
+static void deparseArrayRef(StringInfo buf, ArrayRef *node, PlannerInfo *root);
+static void deparseArrayExpr(StringInfo buf, ArrayExpr *node, PlannerInfo *root);
+
+/*
+ * Determine whether an expression can be evaluated on remote side safely.
+ */
+static bool is_foreign_expr(PlannerInfo *root, RelOptInfo *baserel, Expr *expr,
+ bool *has_param);
+static bool foreign_expr_walker(Node *node, foreign_executable_cxt *context);
+static bool is_builtin(Oid procid);
+
+/*
+ * Deparse the query representation into an SQL statement suitable for the
+ * remote PostgreSQL server. This function basically creates a simple query
+ * string which consists of only SELECT and FROM clauses.
+ *
+ * Remote SELECT clause contains only columns which are used in targetlist or
+ * local_conds (conditions which can't be pushed down and will be checked on
+ * local side).
+ */
+void
+deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds)
+{
+ RangeTblEntry *rte;
+ ListCell *lc;
+ StringInfoData foreign_relname;
+ bool first;
+ AttrNumber attr;
+ List *attr_used = NIL; /* List of AttNumber used in the query */
+
+ initStringInfo(buf);
+ initStringInfo(&foreign_relname);
+
+ /*
+ * First of all, determine which column should be retrieved for this scan.
+ *
+ * We do this before deparsing the SELECT clause because attributes which are
+ * used in neither reltargetlist nor baserel->baserestrictinfo (quals
+ * evaluated locally) can be replaced with the literal "NULL" in the SELECT
+ * clause to reduce the overhead of tuple handling and data transfer.
+ */
+ foreach (lc, local_conds)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+ List *attrs;
+
+ /*
+ * We need to know which attributes are used in qual evaluated
+ * on the local server, because they should be listed in the
+ * SELECT clause of remote query. We can ignore attributes
+ * which are referenced only in the ORDER BY/GROUP BY clause because
+ * such attributes have already been kept in reltargetlist.
+ */
+ attrs = pull_var_clause((Node *) ri->clause,
+ PVC_RECURSE_AGGREGATES,
+ PVC_RECURSE_PLACEHOLDERS);
+ attr_used = list_union(attr_used, attrs);
+ }
+
+ /*
+ * deparse SELECT clause
+ *
+ * List attributes which are in either target list or local restriction.
+ * Unused attributes are replaced with a literal "NULL" for optimization.
+ *
+ * Note that nothing is added for dropped columns, though tuple constructor
+ * function requires entries for dropped columns. Such entries must be
+ * initialized with NULL before calling tuple constructor.
+ */
+ appendStringInfo(buf, "SELECT ");
+ rte = root->simple_rte_array[baserel->relid];
+ attr_used = list_union(attr_used, baserel->reltargetlist);
+ first = true;
+ for (attr = 1; attr <= baserel->max_attr; attr++)
+ {
+ Var *var = NULL;
+ ListCell *lc;
+
+ /* Ignore dropped attributes. */
+ if (get_rte_attribute_is_dropped(rte, attr))
+ continue;
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ first = false;
+
+ /*
+ * We use a linear search here, but it shouldn't be a problem since
+ * attr_used is not expected to become very large.
+ */
+ foreach (lc, attr_used)
+ {
+ var = lfirst(lc);
+ if (var->varattno == attr)
+ break;
+ var = NULL;
+ }
+ if (var != NULL)
+ deparseVar(buf, var, root, false);
+ else
+ appendStringInfo(buf, "NULL");
+ }
+ appendStringInfoChar(buf, ' ');
+
+ /*
+ * deparse FROM clause, including alias if any
+ */
+ appendStringInfo(buf, "FROM ");
+ deparseRelation(buf, root->simple_rte_array[baserel->relid], true);
+
+ elog(DEBUG3, "Remote SQL: %s", buf->data);
+}
+
+/*
+ * Examine each element in baserel's baserestrictinfo list, and classify
+ * them into three groups:
+ *
+ * - remote_conds are push-down safe and don't contain any Param node
+ * - param_conds are push-down safe but contain some Param node
+ * - local_conds are not push-down safe
+ *
+ * Only remote_conds can be used in remote EXPLAIN, and remote_conds and
+ * param_conds can be used in final remote query.
+ */
+void
+classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds)
+{
+ ListCell *lc;
+ bool has_param;
+
+ Assert(remote_conds);
+ Assert(param_conds);
+ Assert(local_conds);
+
+ foreach(lc, baserel->baserestrictinfo)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ if (is_foreign_expr(root, baserel, ri->clause, &has_param))
+ {
+ if (has_param)
+ *param_conds = lappend(*param_conds, ri);
+ else
+ *remote_conds = lappend(*remote_conds, ri);
+ }
+ else
+ *local_conds = lappend(*local_conds, ri);
+ }
+}
+
+/*
+ * Deparse SELECT statement to acquire sample rows of given relation into buf.
+ */
+void
+deparseAnalyzeSql(StringInfo buf, Relation rel)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+ char *nspname;
+ char *relname;
+ ForeignTable *table;
+
+ /* Deparse SELECT clause, use attribute name or colname option. */
+ appendStringInfo(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ colname = NameStr(tupdesc->attrs[i]->attname);
+ options = GetForeignColumnOptions(relid, tupdesc->attrs[i]->attnum);
+
+ foreach(lc, options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ appendStringInfo(buf, "%s", quote_identifier(colname));
+ first = false;
+ }
+
+ /*
+ * Deparse FROM clause, using the nspname and relname options if set, or
+ * the actual namespace and relation name otherwise.
+ */
+ nspname = get_namespace_name(get_rel_namespace(relid));
+ relname = get_rel_name(relid);
+ table = GetForeignTable(relid);
+ foreach(lc, table->options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ appendStringInfo(buf, " FROM %s.%s", quote_identifier(nspname),
+ quote_identifier(relname));
+}
+
+/*
+ * Deparse given expression into buf. Actual string operation is delegated to
+ * node-type-specific functions.
+ *
+ * Note that the switch statement of this function MUST match the one in
+ * foreign_expr_walker to avoid an "unsupported expression" error.
+ */
+static void
+deparseExpr(StringInfo buf, Expr *node, PlannerInfo *root)
+{
+ /*
+ * This part must match foreign_expr_walker.
+ */
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ deparseConst(buf, (Const *) node, root);
+ break;
+ case T_BoolExpr:
+ deparseBoolExpr(buf, (BoolExpr *) node, root);
+ break;
+ case T_NullTest:
+ deparseNullTest(buf, (NullTest *) node, root);
+ break;
+ case T_DistinctExpr:
+ deparseDistinctExpr(buf, (DistinctExpr *) node, root);
+ break;
+ case T_RelabelType:
+ deparseRelabelType(buf, (RelabelType *) node, root);
+ break;
+ case T_FuncExpr:
+ deparseFuncExpr(buf, (FuncExpr *) node, root);
+ break;
+ case T_Param:
+ deparseParam(buf, (Param *) node, root);
+ break;
+ case T_ScalarArrayOpExpr:
+ deparseScalarArrayOpExpr(buf, (ScalarArrayOpExpr *) node, root);
+ break;
+ case T_OpExpr:
+ deparseOpExpr(buf, (OpExpr *) node, root);
+ break;
+ case T_Var:
+ deparseVar(buf, (Var *) node, root, false);
+ break;
+ case T_ArrayRef:
+ deparseArrayRef(buf, (ArrayRef *) node, root);
+ break;
+ case T_ArrayExpr:
+ deparseArrayExpr(buf, (ArrayExpr *) node, root);
+ break;
+ default:
+ {
+ ereport(ERROR,
+ (errmsg("unsupported expression for deparse"),
+ errdetail("%s", nodeToString(node))));
+ }
+ break;
+ }
+}
+
+/*
+ * Deparse node into buf, with relation qualifier if need_prefix was true. If
+ * node is a column of a foreign table, use value of colname FDW option (if any)
+ * instead of attribute name.
+ */
+static void
+deparseVar(StringInfo buf,
+ Var *node,
+ PlannerInfo *root,
+ bool need_prefix)
+{
+ RangeTblEntry *rte;
+ char *colname = NULL;
+ const char *q_colname = NULL;
+ List *options;
+ ListCell *lc;
+
+ /* node must not be any of OUTER_VAR,INNER_VAR and INDEX_VAR. */
+ Assert(node->varno >= 1 && node->varno <= root->simple_rel_array_size);
+
+ /* Get RangeTblEntry from array in PlannerInfo. */
+ rte = root->simple_rte_array[node->varno];
+
+ /*
+ * If the node is a column of a foreign table, and it has colname FDW
+ * option, use its value.
+ */
+ options = GetForeignColumnOptions(rte->relid, node->varattno);
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ /*
+ * If the node refers to a column of a regular table or it doesn't have a
+ * colname FDW option, use the attribute name.
+ */
+ if (colname == NULL)
+ colname = get_attname(rte->relid, node->varattno);
+
+ if (need_prefix)
+ {
+ char *aliasname;
+ const char *q_aliasname;
+
+ if (rte->eref != NULL && rte->eref->aliasname != NULL)
+ aliasname = rte->eref->aliasname;
+ else if (rte->alias != NULL && rte->alias->aliasname != NULL)
+ aliasname = rte->alias->aliasname;
+
+ q_aliasname = quote_identifier(aliasname);
+ appendStringInfo(buf, "%s.", q_aliasname);
+ }
+
+ q_colname = quote_identifier(colname);
+ appendStringInfo(buf, "%s", q_colname);
+}
+
+/*
+ * Deparse table which has relid as oid into buf, with schema qualifier if
+ * need_prefix was true. If relid points a foreign table, use value of relname
+ * FDW option (if any) instead of relation's name. Similarly, nspname FDW
+ * option overrides schema name.
+ */
+static void
+deparseRelation(StringInfo buf,
+ RangeTblEntry *rte,
+ bool need_prefix)
+{
+ ForeignTable *table;
+ ListCell *lc;
+ const char *nspname = NULL; /* plain namespace name */
+ const char *relname = NULL; /* plain relation name */
+ const char *q_nspname; /* quoted namespace name */
+ const char *q_relname; /* quoted relation name */
+
+ /* obtain additional catalog information. */
+ table = GetForeignTable(rte->relid);
+
+ /*
+ * Use value of FDW options if any, instead of the name of object
+ * itself.
+ */
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (need_prefix && strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ /* Quote each identifier, if necessary. */
+ if (need_prefix)
+ {
+ if (nspname == NULL)
+ nspname = get_namespace_name(get_rel_namespace(rte->relid));
+ q_nspname = quote_identifier(nspname);
+ }
+
+ if (relname == NULL)
+ relname = get_rel_name(rte->relid);
+ q_relname = quote_identifier(relname);
+
+ /* Construct relation reference into the buffer. */
+ if (need_prefix)
+ appendStringInfo(buf, "%s.", q_nspname);
+ appendStringInfo(buf, "%s", q_relname);
+}
+
+/*
+ * Deparse given constant value into buf. This function has to be kept in
+ * sync with get_const_expr.
+ */
+static void
+deparseConst(StringInfo buf,
+ Const *node,
+ PlannerInfo *root)
+{
+ Oid typoutput;
+ bool typIsVarlena;
+ char *extval;
+ bool isfloat = false;
+ bool needlabel;
+
+ if (node->constisnull)
+ {
+ appendStringInfo(buf, "NULL");
+ return;
+ }
+
+ getTypeOutputInfo(node->consttype,
+ &typoutput, &typIsVarlena);
+ extval = OidOutputFunctionCall(typoutput, node->constvalue);
+
+ switch (node->consttype)
+ {
+ case ANYARRAYOID:
+ case ANYNONARRAYOID:
+ elog(ERROR, "anyarray and anyenum are not supported");
+ break;
+ case INT2OID:
+ case INT4OID:
+ case INT8OID:
+ case OIDOID:
+ case FLOAT4OID:
+ case FLOAT8OID:
+ case NUMERICOID:
+ {
+ /*
+ * No need to quote unless they contain special values such as
+ * 'Nan'.
+ */
+ if (strspn(extval, "0123456789+-eE.") == strlen(extval))
+ {
+ if (extval[0] == '+' || extval[0] == '-')
+ appendStringInfo(buf, "(%s)", extval);
+ else
+ appendStringInfoString(buf, extval);
+ if (strcspn(extval, "eE.") != strlen(extval))
+ isfloat = true; /* it looks like a float */
+ }
+ else
+ appendStringInfo(buf, "'%s'", extval);
+ }
+ break;
+ case BITOID:
+ case VARBITOID:
+ appendStringInfo(buf, "B'%s'", extval);
+ break;
+ case BOOLOID:
+ if (strcmp(extval, "t") == 0)
+ appendStringInfoString(buf, "true");
+ else
+ appendStringInfoString(buf, "false");
+ break;
+
+ default:
+ {
+ const char *valptr;
+
+ appendStringInfoChar(buf, '\'');
+ for (valptr = extval; *valptr; valptr++)
+ {
+ char ch = *valptr;
+
+ /*
+ * standard_conforming_strings of remote session should be
+ * set to similar value as local session.
+ */
+ if (SQL_STR_DOUBLE(ch, !standard_conforming_strings))
+ appendStringInfoChar(buf, ch);
+ appendStringInfoChar(buf, ch);
+ }
+ appendStringInfoChar(buf, '\'');
+ }
+ break;
+ }
+
+ /*
+ * Append ::typename unless the constant will be implicitly typed as the
+ * right type when it is read in.
+ *
+ * XXX this code has to be kept in sync with the behavior of the parser,
+ * especially make_const.
+ */
+ switch (node->consttype)
+ {
+ case BOOLOID:
+ case INT4OID:
+ case UNKNOWNOID:
+ needlabel = false;
+ break;
+ case NUMERICOID:
+ needlabel = !isfloat || (node->consttypmod >= 0);
+ break;
+ default:
+ needlabel = true;
+ break;
+ }
+ if (needlabel)
+ {
+ appendStringInfo(buf, "::%s",
+ format_type_with_typemod(node->consttype,
+ node->consttypmod));
+ }
+}
+
+static void
+deparseBoolExpr(StringInfo buf,
+ BoolExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ char *op = NULL; /* keep compiler quiet */
+ bool first;
+
+ switch (node->boolop)
+ {
+ case AND_EXPR:
+ op = "AND";
+ break;
+ case OR_EXPR:
+ op = "OR";
+ break;
+ case NOT_EXPR:
+ appendStringInfo(buf, "(NOT ");
+ deparseExpr(buf, list_nth(node->args, 0), root);
+ appendStringInfo(buf, ")");
+ return;
+ }
+
+ first = true;
+ appendStringInfo(buf, "(");
+ foreach(lc, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, " %s ", op);
+ deparseExpr(buf, (Expr *) lfirst(lc), root);
+ first = false;
+ }
+ appendStringInfo(buf, ")");
+}
+
+/*
+ * Deparse given IS [NOT] NULL test expression into buf.
+ */
+static void
+deparseNullTest(StringInfo buf,
+ NullTest *node,
+ PlannerInfo *root)
+{
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ if (node->nulltesttype == IS_NULL)
+ appendStringInfo(buf, " IS NULL)");
+ else
+ appendStringInfo(buf, " IS NOT NULL)");
+}
+
+static void
+deparseDistinctExpr(StringInfo buf,
+ DistinctExpr *node,
+ PlannerInfo *root)
+{
+ Assert(list_length(node->args) == 2);
+
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, " IS DISTINCT FROM ");
+ deparseExpr(buf, lsecond(node->args), root);
+}
+
+static void
+deparseRelabelType(StringInfo buf,
+ RelabelType *node,
+ PlannerInfo *root)
+{
+ char *typname;
+
+ Assert(node->arg);
+
+ /* We don't need to deparse cast when argument has same type as result. */
+ if (IsA(node->arg, Const) &&
+ ((Const *) node->arg)->consttype == node->resulttype &&
+ ((Const *) node->arg)->consttypmod == -1)
+ {
+ deparseExpr(buf, node->arg, root);
+ return;
+ }
+
+ typname = format_type_with_typemod(node->resulttype, node->resulttypmod);
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ appendStringInfo(buf, ")::%s", typname);
+}
+
+/*
+ * Deparse given node which represents a function call into buf. We treat only
+ * explicit function call and explicit cast (coerce), because others are
+ * processed on remote side if necessary.
+ *
+ * Function name (and type name) is always qualified by schema name to avoid
+ * problems caused by different setting of search_path on remote side.
+ */
+static void
+deparseFuncExpr(StringInfo buf,
+ FuncExpr *node,
+ PlannerInfo *root)
+{
+ Oid pronamespace;
+ const char *schemaname;
+ const char *funcname;
+ ListCell *arg;
+ bool first;
+
+ pronamespace = get_func_namespace(node->funcid);
+ schemaname = quote_identifier(get_namespace_name(pronamespace));
+ funcname = quote_identifier(get_func_name(node->funcid));
+
+ if (node->funcformat == COERCE_EXPLICIT_CALL)
+ {
+ /* Function call, deparse all arguments recursively. */
+ appendStringInfo(buf, "%s.%s(", schemaname, funcname);
+ first = true;
+ foreach(arg, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(arg), root);
+ first = false;
+ }
+ appendStringInfoChar(buf, ')');
+ }
+ else if (node->funcformat == COERCE_EXPLICIT_CAST)
+ {
+ /* Explicit cast, deparse only first argument. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, ")::%s", funcname);
+ }
+ else
+ {
+ /* Implicit cast, deparse only first argument. */
+ deparseExpr(buf, linitial(node->args), root);
+ }
+}
+
+/*
+ * Deparse given Param node into buf.
+ *
+ * We don't renumber parameter ids, because skipping $1 causes no problem
+ * as long as we pass through all arguments.
+ */
+static void
+deparseParam(StringInfo buf,
+ Param *node,
+ PlannerInfo *root)
+{
+ Assert(node->paramkind == PARAM_EXTERN);
+
+ appendStringInfo(buf, "$%d", node->paramid);
+}
+
+/*
+ * Deparse given ScalarArrayOpExpr expression into buf. To avoid problems
+ * around priority of operations, we always parenthesize the arguments. Also we
+ * use OPERATOR(schema.operator) notation to determine remote operator exactly.
+ */
+static void
+deparseScalarArrayOpExpr(StringInfo buf,
+ ScalarArrayOpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ Expr *arg1;
+ Expr *arg2;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert(list_length(node->args) == 2);
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Extract operands. */
+ arg1 = linitial(node->args);
+ arg2 = lsecond(node->args);
+
+ /* Deparse fully qualified operator name. */
+ deparseExpr(buf, arg1, root);
+ appendStringInfo(buf, " OPERATOR(%s.%s) %s (",
+ opnspname, opname, node->useOr ? "ANY" : "ALL");
+ deparseExpr(buf, arg2, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse given operator expression into buf. To avoid problems around
+ * priority of operations, we always parenthesize the arguments. Also we use
+ * OPERATOR(schema.operator) notation to determine remote operator exactly.
+ */
+static void
+deparseOpExpr(StringInfo buf,
+ OpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ char oprkind;
+ ListCell *arg;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ oprkind = form->oprkind;
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert((oprkind == 'r' && list_length(node->args) == 1) ||
+ (oprkind == 'l' && list_length(node->args) == 1) ||
+ (oprkind == 'b' && list_length(node->args) == 2));
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse first operand. */
+ arg = list_head(node->args);
+ if (oprkind == 'r' || oprkind == 'b')
+ {
+ deparseExpr(buf, lfirst(arg), root);
+ appendStringInfoChar(buf, ' ');
+ }
+
+ /* Deparse fully qualified operator name. */
+ appendStringInfo(buf, "OPERATOR(%s.%s)", opnspname, opname);
+
+ /* Deparse last operand. */
+ arg = list_tail(node->args);
+ if (oprkind == 'l' || oprkind == 'b')
+ {
+ appendStringInfoChar(buf, ' ');
+ deparseExpr(buf, lfirst(arg), root);
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+static void
+deparseArrayRef(StringInfo buf,
+ ArrayRef *node,
+ PlannerInfo *root)
+{
+ ListCell *lowlist_item;
+ ListCell *uplist_item;
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse referenced array expression first. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->refexpr, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Deparse subscripts expression. */
+ lowlist_item = list_head(node->reflowerindexpr); /* could be NULL */
+ foreach(uplist_item, node->refupperindexpr)
+ {
+ appendStringInfoChar(buf, '[');
+ if (lowlist_item)
+ {
+ deparseExpr(buf, lfirst(lowlist_item), root);
+ appendStringInfoChar(buf, ':');
+ lowlist_item = lnext(lowlist_item);
+ }
+ deparseExpr(buf, lfirst(uplist_item), root);
+ appendStringInfoChar(buf, ']');
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+
+/*
+ * Deparse given array of something into buf.
+ */
+static void
+deparseArrayExpr(StringInfo buf,
+ ArrayExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfo(buf, "ARRAY[");
+ foreach(lc, node->elements)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(lc), root);
+
+ first = false;
+ }
+ appendStringInfoChar(buf, ']');
+
+ /* If the array is empty, we need explicit cast to the array type. */
+ if (node->elements == NIL)
+ {
+ char *typname;
+
+ typname = format_type_with_typemod(node->array_typeid, -1);
+ appendStringInfo(buf, "::%s", typname);
+ }
+}
+
+/*
+ * Returns true if given expr is safe to evaluate on the foreign server. If
+ * result is true, extra information has_param tells whether given expression
+ * contains any Param node. This is useful to determine whether the expression
+ * can be used in remote EXPLAIN.
+ */
+static bool
+is_foreign_expr(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Expr *expr,
+ bool *has_param)
+{
+ foreign_executable_cxt context;
+ context.root = root;
+ context.foreignrel = baserel;
+ context.has_param = false;
+
+ /*
+ * An expression which includes any mutable function can't be pushed down
+ * because its result is not stable. For example, pushing now() down to
+ * the remote side would cause confusion because of the clock offset.
+ * If we get routine mapping infrastructure in a future release, we will be
+ * able to choose functions to push down at a finer granularity.
+ */
+ if (contain_mutable_functions((Node *) expr))
+ return false;
+
+ /*
+ * Check that the expression consists of nodes which are known as safe to
+ * be pushed down.
+ */
+ if (foreign_expr_walker((Node *) expr, &context))
+ return false;
+
+ /*
+ * Tell caller whether the given expression contains any Param node, which
+ * can't be used in EXPLAIN statement before executor starts.
+ */
+ *has_param = context.has_param;
+
+ return true;
+}
+
+/*
+ * Return true if node includes any node which is not known as safe to be
+ * pushed down.
+ */
+static bool
+foreign_expr_walker(Node *node, foreign_executable_cxt *context)
+{
+ if (node == NULL)
+ return false;
+
+ /*
+ * Special case handling for List; expression_tree_walker handles List as
+ * well as other Expr nodes. For instance, List is used in RestrictInfo
+ * for args of FuncExpr node.
+ *
+ * Although the comments of expression_tree_walker mention that
+ * RangeTblRef, FromExpr, JoinExpr, and SetOperationStmt are handled as
+ * well, we don't care about them because they are not used in RestrictInfo.
+ * If one of them is passed in, the default label catches it and we give up
+ * traversing.
+ */
+ if (IsA(node, List))
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *) node)
+ {
+ if (foreign_expr_walker(lfirst(lc), context))
+ return true;
+ }
+ return false;
+ }
+
+ /*
+ * If the return type of the given expression is not built-in, it can't be
+ * pushed down because it might have incompatible semantics on the remote side.
+ */
+ if (!is_builtin(exprType(node)))
+ return true;
+
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ /*
+ * Using anyarray and/or anyenum in remote query is not supported.
+ */
+ if (((Const *) node)->consttype == ANYARRAYOID ||
+ ((Const *) node)->consttype == ANYNONARRAYOID)
+ return true;
+ break;
+ case T_BoolExpr:
+ case T_NullTest:
+ case T_DistinctExpr:
+ case T_RelabelType:
+ /*
+ * These type of nodes are known as safe to be pushed down.
+ * Of course the subtree of the node, if any, should be checked
+ * continuously at the tail of this function.
+ */
+ break;
+ /*
+ * If the function used by the expression is not built-in, it can't be
+ * pushed down because it might have incompatible semantics on the remote
+ * side.
+ */
+ case T_FuncExpr:
+ {
+ FuncExpr *fe = (FuncExpr *) node;
+ if (!is_builtin(fe->funcid))
+ return true;
+ }
+ break;
+ case T_Param:
+ /*
+ * Only external parameters can be pushed down.
+ */
+ {
+ if (((Param *) node)->paramkind != PARAM_EXTERN)
+ return true;
+
+ /* Mark that this expression contains Param node. */
+ context->has_param = true;
+ }
+ break;
+ case T_ScalarArrayOpExpr:
+ /*
+ * Only built-in operators can be pushed down. In addition,
+ * underlying function must be built-in and immutable, but we don't
+ * check volatility here; such check must be done already with
+ * contain_mutable_functions.
+ */
+ {
+ ScalarArrayOpExpr *oe = (ScalarArrayOpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ return true;
+
+ /*
+ * If the operator takes collatable type as operands, we push
+ * down only "=" and "<>" which are not affected by collation.
+ * Other operators might be safe about collation, but these two
+ * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ return true;
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_OpExpr:
+ /*
+ * Only built-in operators can be pushed down. In addition,
+ * underlying function must be built-in and immutable, but we don't
+ * check volatility here; such check must be done already with
+ * contain_mutable_functions.
+ */
+ {
+ OpExpr *oe = (OpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ return true;
+
+ /*
+ * If the operator takes collatable type as operands, we push
+ * down only "=" and "<>" which are not affected by collation.
+ * Other operators might be safe about collation, but these two
+ * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ return true;
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_Var:
+ /*
+ * Var can be pushed down if it is in the foreign table.
+ * XXX Var of other relation can be here?
+ */
+ {
+ Var *var = (Var *) node;
+ foreign_executable_cxt *f_context;
+
+ f_context = (foreign_executable_cxt *) context;
+ if (var->varno != f_context->foreignrel->relid ||
+ var->varlevelsup != 0)
+ return true;
+ }
+ break;
+ case T_ArrayRef:
+ /*
+ * ArrayRef which holds non-built-in typed elements can't be pushed
+ * down.
+ */
+ {
+ ArrayRef *ar = (ArrayRef *) node;
+
+ if (!is_builtin(ar->refelemtype))
+ return true;
+
+ /* Assignment should not be in restrictions. */
+ if (ar->refassgnexpr != NULL)
+ return true;
+ }
+ break;
+ case T_ArrayExpr:
+ /*
+ * ArrayExpr which holds non-built-in typed elements can't be pushed
+ * down.
+ */
+ {
+ if (!is_builtin(((ArrayExpr *) node)->element_typeid))
+ return true;
+ }
+ break;
+ default:
+ {
+ ereport(DEBUG3,
+ (errmsg("expression is too complex"),
+ errdetail("%s", nodeToString(node))));
+ return true;
+ }
+ break;
+ }
+
+ return expression_tree_walker(node, foreign_expr_walker, context);
+}
+
+/*
+ * Return true if given object is one of built-in objects.
+ */
+static bool
+is_builtin(Oid oid)
+{
+ return (oid < FirstNormalObjectId);
+}
+
+/*
+ * Deparse WHERE clause from given list of RestrictInfo and append them to buf.
+ * We assume that buf already holds a SQL statement which ends with valid WHERE
+ * clause.
+ *
+ * Only when calling the first time for a statement, is_first should be true.
+ */
+void
+appendWhereClause(StringInfo buf,
+ bool is_first,
+ List *exprs,
+ PlannerInfo *root)
+{
+ bool first = true;
+ ListCell *lc;
+
+ foreach(lc, exprs)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ /* Connect expressions with "AND" and parenthesize whole condition. */
+ if (is_first && first)
+ appendStringInfo(buf, " WHERE ");
+ else
+ appendStringInfo(buf, " AND ");
+
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, ri->clause, root);
+ appendStringInfoChar(buf, ')');
+
+ first = false;
+ }
+}
diff --git a/contrib/postgresql_fdw/expected/postgresql_fdw.out b/contrib/postgresql_fdw/expected/postgresql_fdw.out
new file mode 100644
index 0000000..c01e3f9
--- /dev/null
+++ b/contrib/postgresql_fdw/expected/postgresql_fdw.out
@@ -0,0 +1,693 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+-- Clean up in case a prior regression run failed
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+DROP ROLE IF EXISTS postgresql_fdw_user;
+RESET client_min_messages;
+CREATE ROLE postgresql_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgresql_fdw_user';
+CREATE EXTENSION postgresql_fdw;
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgresql_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgresql_fdw
+ OPTIONS (dbname 'contrib_regression');
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgresql_fdw_user SERVER loopback2;
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10)
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10)
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+-- ===================================================================
+-- tests for postgresql_fdw_validator
+-- ===================================================================
+ALTER FOREIGN DATA WRAPPER postgresql_fdw OPTIONS (host 'value'); -- ERROR
+ERROR: invalid option "host"
+HINT: Valid options in this context are:
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure option
+ALTER SERVER loopback1 OPTIONS (
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER SERVER loopback1 OPTIONS (user 'value'); -- ERROR
+ERROR: invalid option "user"
+HINT: Valid options in this context are: authtype, service, connect_timeout, dbname, host, hostaddr, port, tty, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, requiressl, sslcompression, sslmode, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, krbsrvname, gsslib
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (host 'value'); -- ERROR
+ERROR: invalid option "host"
+HINT: Valid options in this context are: user, password
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (invalid 'value'); -- ERROR
+ERROR: invalid option "invalid"
+HINT: Valid options in this context are: nspname, relname
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (invalid 'value'); -- ERROR
+ERROR: invalid option "invalid"
+HINT: Valid options in this context are: colname
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+ List of foreign-data wrappers
+ Name | Owner | Handler | Validator | Access privileges | FDW Options | Description
+----------------+---------------------+------------------------+--------------------------+-------------------+-------------+-------------
+ postgresql_fdw | postgresql_fdw_user | postgresql_fdw_handler | postgresql_fdw_validator | | |
+(1 row)
+
+\des+
+ List of foreign servers
+ Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW Options | Description
+-----------+---------------------+----------------------+-------------------+------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
+ loopback1 | postgresql_fdw_user | postgresql_fdw | | | | (authtype 'value', service 'value', connect_timeout 'value', dbname 'value', host 'value', hostaddr 'value', port 'value', tty 'value', options 'value', application_name 'value', keepalives 'value', keepalives_idle 'value', keepalives_interval 'value', sslcompression 'value', sslmode 'value', sslcert 'value', sslkey 'value', sslrootcert 'value', sslcrl 'value') |
+ loopback2 | postgresql_fdw_user | postgresql_fdw | | | | (dbname 'contrib_regression') |
+(2 rows)
+
+\deu+
+ List of user mappings
+ Server | User name | FDW Options
+-----------+---------------------+-------------
+ loopback1 | public |
+ loopback2 | postgresql_fdw_user |
+(2 rows)
+
+\det+
+ List of foreign tables
+ Schema | Table | Server | FDW Options | Description
+--------+-------+-----------+--------------------------------+-------------
+ public | ft1 | loopback2 | (nspname 'S 1', relname 'T 1') |
+ public | ft2 | loopback2 | (nspname 'S 1', relname 'T 1') |
+(2 rows)
+
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+-----+----+-------+------------------------------+--------------------------+----+------------
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0
+(10 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+-----+----+-------+------------------------------+--------------------------+----+------------
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0
+(10 rows)
+
+-- empty result
+SELECT * FROM ft1 WHERE false;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+----+----+----+----+----
+(0 rows)
+
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c7 >= '1'::bpchar)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 101)) AND (((c6)::text OPERATOR(pg_catalog.=) '1'::text))
+(3 rows)
+
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+-----+----+-------+------------------------------+--------------------------+----+------------
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1
+(1 row)
+
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+ count
+-------
+ 1000
+(1 row)
+
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1
+-----
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0
+(10 rows)
+
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+------+----+-------+------------------------------+--------------------------+----+------------
+ 1000 | 0 | 01000 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0
+(1 row)
+
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+ c1 | c2 | c3 | c4
+----+----+-------+------------------------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST
+(10 rows)
+
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+ ?column? | ?column?
+----------+----------
+ fixed |
+(1 row)
+
+-- user-defined operator/function
+CREATE FUNCTION postgresql_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgresql_fdw_abs(t1.c2);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 = postgresql_fdw_abs(c2))
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 === c2)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) pg_catalog.abs(c2)))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) c2))
+(2 rows)
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 100)) AND ((c2 OPERATOR(pg_catalog.=) 0))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE ((pg_catalog.round(pg_catalog.abs("C 1"), 0) OPERATOR(pg_catalog.=) 1::numeric))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) (OPERATOR(pg_catalog.-) "C 1")))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE ((1::numeric OPERATOR(pg_catalog.=) ("C 1" OPERATOR(pg_catalog.!))))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ANY (ARRAY[c2, 1, ("C 1" OPERATOR(pg_catalog.+) 0)])))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 ft WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 ft
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ((ARRAY["C 1", c2, 3])[1])))
+(2 rows)
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 2))
+(5 rows)
+
+EXECUTE st1(1, 1);
+ c3 | c3
+-------+-------
+ 00001 | 00001
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Hash Join
+ Hash Cond: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Hash
+ -> HashAggregate
+ -> Foreign Scan on ft2 t2
+ Filter: (date_part('dow'::text, c4) = 6::double precision)
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10))
+(11 rows)
+
+EXECUTE st2(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10)) AND ((pg_catalog.date_part('dow'::text, c5) OPERATOR(pg_catalog.=) 6::double precision))
+(9 rows)
+
+EXECUTE st3(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6
+(1 row)
+
+EXECUTE st3(20, 30);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 23 | 3 | 00023 | Sat Jan 24 00:00:00 1970 PST | Sat Jan 24 00:00:00 1970 | 3 | 3
+(1 row)
+
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) $1))
+(2 rows)
+
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+ f_test
+--------
+ 100
+(1 row)
+
+DROP FUNCTION f_test(int);
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgresql_fdw_connections;
+ srvname | usename
+-----------+---------------------
+ loopback2 | postgresql_fdw_user
+(1 row)
+
+SELECT postgresql_fdw_disconnect(srvid, usesysid) FROM postgresql_fdw_get_connections();
+ postgresql_fdw_disconnect
+---------------------------
+ OK
+(1 row)
+
+SELECT srvname, usename FROM postgresql_fdw_connections;
+ srvname | usename
+---------+---------
+(0 rows)
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ERROR: invalid input syntax for integer: "1970-01-02 00:00:00"
+CONTEXT: column c5 of foreign table ft1
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1
+(1 row)
+
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2
+(1 row)
+
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ERROR: could not execute remote query
+DETAIL: ERROR: division by zero
+
+HINT: SELECT "C 1", c2, c3, c4, c5, c6, c7 FROM "S 1"."T 1" WHERE (((1 OPERATOR(pg_catalog./) ("C 1" OPERATOR(pg_catalog.-) 1)) OPERATOR(pg_catalog.>) 0))
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3
+(1 row)
+
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7
+----+----+-------+------------------------------+--------------------------+----+------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1
+(1 row)
+
+COMMIT;
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgresql_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table "S 1"."T 1"
+drop cascades to table "S 1"."T 2"
+DROP EXTENSION postgresql_fdw CASCADE;
+NOTICE: drop cascades to 6 other objects
+DETAIL: drop cascades to server loopback1
+drop cascades to user mapping for public
+drop cascades to server loopback2
+drop cascades to user mapping for postgresql_fdw_user
+drop cascades to foreign table ft1
+drop cascades to foreign table ft2
+\c
+DROP ROLE postgresql_fdw_user;
diff --git a/contrib/postgresql_fdw/option.c b/contrib/postgresql_fdw/option.c
new file mode 100644
index 0000000..9e1f0e2
--- /dev/null
+++ b/contrib/postgresql_fdw/option.c
@@ -0,0 +1,222 @@
+/*-------------------------------------------------------------------------
+ *
+ * option.c
+ * FDW option handling
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/option.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "fmgr.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+
+#include "postgresql_fdw.h"
+
+/*
+ * SQL functions
+ */
+extern Datum postgresql_fdw_validator(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_validator);
+
+/*
+ * Describes the valid options for objects that use this wrapper.
+ */
+typedef struct PgsqlFdwOption
+{
+ const char *optname;
+ Oid optcontext; /* Oid of catalog in which options may appear */
+ bool is_libpq_opt; /* true if it's used in libpq */
+} PgsqlFdwOption;
+
+/*
+ * Valid options for postgresql_fdw.
+ */
+static PgsqlFdwOption valid_options[] = {
+
+ /*
+ * Options for libpq connection.
+ * Note: keep this list in sync with PQconninfoOptions in
+ * interfaces/libpq/fe-connect.c; the order is kept the same.
+ *
+ * Some libpq connection options are not accepted because they are not useful
+ * for postgresql_fdw:
+ * client_encoding: set to the local database encoding automatically
+ * fallback_application_name: fixed to "postgresql_fdw"
+ * replication: postgresql_fdw is never a replication client
+ */
+ {"authtype", ForeignServerRelationId, true},
+ {"service", ForeignServerRelationId, true},
+ {"user", UserMappingRelationId, true},
+ {"password", UserMappingRelationId, true},
+ {"connect_timeout", ForeignServerRelationId, true},
+ {"dbname", ForeignServerRelationId, true},
+ {"host", ForeignServerRelationId, true},
+ {"hostaddr", ForeignServerRelationId, true},
+ {"port", ForeignServerRelationId, true},
+#ifdef NOT_USED
+ {"client_encoding", ForeignServerRelationId, true},
+#endif
+ {"tty", ForeignServerRelationId, true},
+ {"options", ForeignServerRelationId, true},
+ {"application_name", ForeignServerRelationId, true},
+#ifdef NOT_USED
+ {"fallback_application_name", ForeignServerRelationId, true},
+#endif
+ {"keepalives", ForeignServerRelationId, true},
+ {"keepalives_idle", ForeignServerRelationId, true},
+ {"keepalives_interval", ForeignServerRelationId, true},
+ {"keepalives_count", ForeignServerRelationId, true},
+ {"requiressl", ForeignServerRelationId, true},
+ {"sslcompression", ForeignServerRelationId, true},
+ {"sslmode", ForeignServerRelationId, true},
+ {"sslcert", ForeignServerRelationId, true},
+ {"sslkey", ForeignServerRelationId, true},
+ {"sslrootcert", ForeignServerRelationId, true},
+ {"sslcrl", ForeignServerRelationId, true},
+ {"requirepeer", ForeignServerRelationId, true},
+ {"krbsrvname", ForeignServerRelationId, true},
+ {"gsslib", ForeignServerRelationId, true},
+#ifdef NOT_USED
+ {"replication", ForeignServerRelationId, true},
+#endif
+
+ /*
+ * Options for translation of object names.
+ */
+ {"nspname", ForeignTableRelationId, false},
+ {"relname", ForeignTableRelationId, false},
+ {"colname", AttributeRelationId, false},
+
+ /* Terminating entry --- MUST BE LAST */
+ {NULL, InvalidOid, false}
+};
+
+/*
+ * Helper functions
+ */
+static bool is_valid_option(const char *optname, Oid context);
+
+/*
+ * Validate the generic options given to a FOREIGN DATA WRAPPER, SERVER,
+ * USER MAPPING or FOREIGN TABLE that uses postgresql_fdw.
+ *
+ * Raise an ERROR if the option or its value is considered invalid.
+ */
+Datum
+postgresql_fdw_validator(PG_FUNCTION_ARGS)
+{
+ List *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
+ Oid catalog = PG_GETARG_OID(1);
+ ListCell *cell;
+
+ /*
+ * Check that only options supported by postgresql_fdw, and allowed for the
+ * current object type, are given.
+ */
+ foreach(cell, options_list)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (!is_valid_option(def->defname, catalog))
+ {
+ PgsqlFdwOption *opt;
+ StringInfoData buf;
+
+ /*
+ * Unknown option specified, complain about it. Provide a hint
+ * with a list of valid options for the object.
+ */
+ initStringInfo(&buf);
+ for (opt = valid_options; opt->optname; opt++)
+ {
+ if (catalog == opt->optcontext)
+ appendStringInfo(&buf, "%s%s", (buf.len > 0) ? ", " : "",
+ opt->optname);
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
+ errmsg("invalid option \"%s\"", def->defname),
+ errhint("Valid options in this context are: %s",
+ buf.data)));
+ }
+ }
+
+ /*
+ * We don't check option-specific restrictions here; they will be validated
+ * at execution time.
+ */
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Check whether the given option is one of the valid postgresql_fdw options.
+ * context is the Oid of the catalog holding the object the option is for.
+ */
+static bool
+is_valid_option(const char *optname, Oid context)
+{
+ PgsqlFdwOption *opt;
+
+ for (opt = valid_options; opt->optname; opt++)
+ {
+ if (context == opt->optcontext && strcmp(opt->optname, optname) == 0)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Check whether the given option is one of the libpq connection options
+ * accepted by postgresql_fdw.
+ */
+static bool
+is_libpq_option(const char *optname)
+{
+ PgsqlFdwOption *opt;
+
+ for (opt = valid_options; opt->optname; opt++)
+ {
+ if (strcmp(opt->optname, optname) == 0 && opt->is_libpq_opt)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Extract only the libpq connection options from the given list (which may
+ * contain any kind of options) into the key/value arrays, and return the
+ * number of options extracted.
+ */
+int
+ExtractConnectionOptions(List *defelems, const char **keywords, const char **values)
+{
+ ListCell *lc;
+ int i;
+
+ i = 0;
+ foreach(lc, defelems)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (is_libpq_option(d->defname))
+ {
+ keywords[i] = d->defname;
+ values[i] = defGetString(d);
+ i++;
+ }
+ }
+ return i;
+}
+
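As a rough sketch of how these arrays are meant to be consumed (the helper name and flow below are illustrative assumptions, not the actual connection.c code), ExtractConnectionOptions feeds libpq's PQconnectdbParams, which expects NULL-terminated keyword/value arrays:

/* Illustrative only; requires foreign/foreign.h and libpq-fe.h. */
static PGconn *
connect_with_options(ForeignServer *server, UserMapping *user)
{
	const char **keywords;
	const char **values;
	int			n;

	/* room for server options, user mapping options, and a terminator */
	n = list_length(server->options) + list_length(user->options) + 1;
	keywords = palloc0(sizeof(char *) * n);
	values = palloc0(sizeof(char *) * n);

	/* keep only options libpq understands; others are FDW-level options */
	n = ExtractConnectionOptions(server->options, keywords, values);
	n += ExtractConnectionOptions(user->options, keywords + n, values + n);
	keywords[n] = values[n] = NULL;

	return PQconnectdbParams(keywords, values, false);
}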
diff --git a/contrib/postgresql_fdw/postgresql_fdw--1.0.sql b/contrib/postgresql_fdw/postgresql_fdw--1.0.sql
new file mode 100644
index 0000000..965cb85
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw--1.0.sql
@@ -0,0 +1,39 @@
+/* contrib/postgresql_fdw/postgresql_fdw--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION postgresql_fdw" to load this file. \quit
+
+CREATE FUNCTION postgresql_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgresql_fdw_validator(text[], oid)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER postgresql_fdw
+ HANDLER postgresql_fdw_handler
+ VALIDATOR postgresql_fdw_validator;
+
+/* connection management functions and view */
+CREATE FUNCTION postgresql_fdw_get_connections(out srvid oid, out usesysid oid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgresql_fdw_disconnect(oid, oid)
+RETURNS text
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE VIEW postgresql_fdw_connections AS
+SELECT c.srvid srvid,
+ s.srvname srvname,
+ c.usesysid usesysid,
+ pg_get_userbyid(c.usesysid) usename
+ FROM postgresql_fdw_get_connections() c
+ JOIN pg_catalog.pg_foreign_server s ON (s.oid = c.srvid);
+GRANT SELECT ON postgresql_fdw_connections TO public;
+
diff --git a/contrib/postgresql_fdw/postgresql_fdw.c b/contrib/postgresql_fdw/postgresql_fdw.c
new file mode 100644
index 0000000..a60785d
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw.c
@@ -0,0 +1,1370 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgresql_fdw.c
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/postgresql_fdw.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "commands/explain.h"
+#include "commands/vacuum.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+#include "postgresql_fdw.h"
+#include "connection.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * Cost to establish a connection.
+ * XXX: should be configurable per server?
+ */
+#define CONNECTION_COSTS 100.0
+
+/*
+ * Cost to transfer 1 byte from remote server.
+ * XXX: should be configurable per server?
+ */
+#define TRANSFER_COSTS_PER_BYTE 0.001
+
+/*
+ * FDW-specific information for RelOptInfo.fdw_private. This is used to pass
+ * information from pgsqlGetForeignRelSize to pgsqlGetForeignPaths.
+ */
+typedef struct PgsqlFdwPlanState {
+ /*
+ * These are generated in GetForeignRelSize, and also used in subsequent
+ * GetForeignPaths.
+ */
+ StringInfoData sql;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds;
+ List *param_conds;
+ List *local_conds;
+
+ /* Cached catalog information. */
+ ForeignTable *table;
+ ForeignServer *server;
+} PgsqlFdwPlanState;
+
+/*
+ * Index of FDW-private information stored in fdw_private list.
+ *
+ * We store various pieces of information in ForeignScan.fdw_private to pass
+ * them across the boundary between planner and executor. Currently the
+ * fdw_private list holds the items below:
+ *
+ * 1) plain SELECT statement
+ *
+ * These items are indexed with the enum FdwPrivateIndex, so an item can be
+ * accessed directly via list_nth(). For example, the SELECT statement is
+ * obtained with:
+ * sql = list_nth(fdw_private, FdwPrivateSelectSql)
+ */
+enum FdwPrivateIndex {
+ /* SQL statements */
+ FdwPrivateSelectSql,
+
+ /* # of elements stored in the list fdw_private */
+ FdwPrivateNum,
+};
+
+/*
+ * Describe the attribute where data conversion fails.
+ */
+typedef struct ErrorPos {
+ Oid relid; /* oid of the foreign table */
+ AttrNumber cur_attno; /* attribute number under process */
+} ErrorPos;
+
+/*
+ * Describes an execution state of a foreign scan against a foreign table
+ * using postgresql_fdw.
+ */
+typedef struct PgsqlFdwExecutionState
+{
+ List *fdw_private; /* FDW-private information */
+
+ /* for remote query execution */
+ PGconn *conn; /* connection for the scan */
+ Oid *param_types; /* type array of external parameter */
+ const char **param_values; /* value array of external parameter */
+
+ /* for tuple generation. */
+ AttrNumber attnum; /* # of non-dropped attribute */
+ Datum *values; /* column value buffer */
+ bool *nulls; /* column null indicator buffer */
+ AttInMetadata *attinmeta; /* attribute metadata */
+
+ /* for storing result tuples */
+ MemoryContext scan_cxt; /* context for per-scan lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+ Tuplestorestate *tuples; /* result of the scan */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PgsqlFdwExecutionState;
+
+/*
+ * Describes a state of analyze request for a foreign table.
+ */
+typedef struct PgsqlAnalyzeState
+{
+ /* for tuple generation. */
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+ Datum *values;
+ bool *nulls;
+
+ /* for random sampling */
+ HeapTuple *rows; /* result buffer */
+ int targrows; /* target # of sample rows */
+ int numrows; /* # of samples collected */
+ double samplerows; /* # of rows fetched */
+ double rowstoskip; /* # of rows skipped before next sample */
+ double rstate; /* random state */
+
+ /* for storing result tuples */
+ MemoryContext anl_cxt; /* context for per-analyze lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PgsqlAnalyzeState;
+
+/*
+ * SQL functions
+ */
+extern Datum postgresql_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_handler);
+
+/*
+ * FDW callback routines
+ */
+static void pgsqlGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static void pgsqlGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static ForeignScan *pgsqlGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses);
+static void pgsqlExplainForeignScan(ForeignScanState *node, ExplainState *es);
+static void pgsqlBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *pgsqlIterateForeignScan(ForeignScanState *node);
+static void pgsqlReScanForeignScan(ForeignScanState *node);
+static void pgsqlEndForeignScan(ForeignScanState *node);
+static bool pgsqlAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages);
+
+/*
+ * Helper functions
+ */
+static void get_remote_estimate(const char *sql,
+ PGconn *conn,
+ double *rows,
+ int *width,
+ Cost *startup_cost,
+ Cost *total_cost);
+static void adjust_costs(double rows, int width,
+ Cost *startup_cost, Cost *total_cost);
+static void execute_query(ForeignScanState *node);
+static void query_row_processor(PGresult *res, ForeignScanState *node,
+ bool first);
+static void analyze_row_processor(PGresult *res, PgsqlAnalyzeState *astate,
+ bool first);
+static void postgresql_fdw_error_callback(void *arg);
+static int pgsqlAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows);
+
+/* Exported functions, but not declared in postgresql_fdw.h. */
+void _PG_init(void);
+void _PG_fini(void);
+
+/*
+ * Module-specific initialization.
+ */
+void
+_PG_init(void)
+{
+}
+
+/*
+ * Module-specific clean up.
+ */
+void
+_PG_fini(void)
+{
+}
+
+/*
+ * Foreign-data wrapper handler function: return a struct with pointers
+ * to my callback routines.
+ */
+Datum
+postgresql_fdw_handler(PG_FUNCTION_ARGS)
+{
+ FdwRoutine *routine = makeNode(FdwRoutine);
+
+ /* Required handler functions. */
+ routine->GetForeignRelSize = pgsqlGetForeignRelSize;
+ routine->GetForeignPaths = pgsqlGetForeignPaths;
+ routine->GetForeignPlan = pgsqlGetForeignPlan;
+ routine->ExplainForeignScan = pgsqlExplainForeignScan;
+ routine->BeginForeignScan = pgsqlBeginForeignScan;
+ routine->IterateForeignScan = pgsqlIterateForeignScan;
+ routine->ReScanForeignScan = pgsqlReScanForeignScan;
+ routine->EndForeignScan = pgsqlEndForeignScan;
+
+ /* Optional handler functions. */
+ routine->AnalyzeForeignTable = pgsqlAnalyzeForeignTable;
+
+ PG_RETURN_POINTER(routine);
+}
+
+/*
+ * pgsqlGetForeignRelSize
+ * Estimate # of rows and width of the result of the scan
+ *
+ * Here we estimate the number of rows returned by the scan in two steps.
+ * First, we execute a remote EXPLAIN command to obtain the number of rows
+ * the remote side would return. Second, we calculate the selectivity of the
+ * filtering done on the local side and adjust the first estimate.
+ *
+ * We have to look up some catalog objects and generate the remote query
+ * string here, so we store this expensive-to-obtain information in the
+ * FDW-private area of RelOptInfo and pass it to subsequent functions for
+ * reuse.
+ */
+static void
+pgsqlGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ PgsqlFdwPlanState *fpstate;
+ StringInfo sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ PGconn *conn;
+ double rows;
+ int width;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds = NIL;
+ List *param_conds = NIL;
+ List *local_conds = NIL;
+ Selectivity sel;
+
+ /*
+ * We use PgsqlFdwPlanState to pass various information to subsequent
+ * functions.
+ */
+ fpstate = palloc0(sizeof(PgsqlFdwPlanState));
+ initStringInfo(&fpstate->sql);
+ sql = &fpstate->sql;
+
+ /* Retrieve catalog objects which are necessary to estimate rows. */
+ table = GetForeignTable(foreigntableid);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+
+ /*
+ * Construct the remote query, which consists of SELECT, FROM, and WHERE
+ * clauses. Conditions containing any Param node are excluded, because
+ * placeholders can't be used in an EXPLAIN statement; such conditions are
+ * appended later.
+ */
+ classifyConditions(root, baserel, &remote_conds, &param_conds,
+ &local_conds);
+ deparseSimpleSql(sql, root, baserel, local_conds);
+ if (list_length(remote_conds) > 0)
+ appendWhereClause(sql, true, remote_conds, root);
+ conn = GetConnection(server, user, false);
+ get_remote_estimate(sql->data, conn, &rows, &width,
+ &startup_cost, &total_cost);
+ ReleaseConnection(conn);
+ if (list_length(param_conds) > 0)
+ appendWhereClause(sql, !(list_length(remote_conds) > 0), param_conds,
+ root);
+
+ /*
+ * Estimate the selectivity of conditions which were not used in the remote
+ * EXPLAIN by calling clauselist_selectivity(). The best we can do for
+ * parameterized conditions is to estimate their selectivity on the basis of
+ * local statistics. When we actually obtain result rows, such conditions are
+ * deparsed into the remote query and reduce the number of rows transferred.
+ */
+ sel = 1.0;
+ sel *= clauselist_selectivity(root, param_conds,
+ baserel->relid, JOIN_INNER, NULL);
+ sel *= clauselist_selectivity(root, local_conds,
+ baserel->relid, JOIN_INNER, NULL);
+ baserel->rows = rows * sel;
+ baserel->width = width;
+
+ /*
+ * Pack the obtained information into an object and store it in the
+ * FDW-private area of RelOptInfo to pass it to subsequent functions.
+ */
+ fpstate->startup_cost = startup_cost;
+ fpstate->total_cost = total_cost;
+ fpstate->remote_conds = remote_conds;
+ fpstate->param_conds = param_conds;
+ fpstate->local_conds = local_conds;
+ fpstate->table = table;
+ fpstate->server = server;
+ baserel->fdw_private = (void *) fpstate;
+}
+
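A worked example of the two-step estimate above (numbers are purely illustrative): if the remote EXPLAIN reports 1000 rows and clauselist_selectivity() estimates 0.1 for param_conds and 0.5 for local_conds, the local estimate becomes

baserel->rows = 1000 * 0.1 * 0.5 = 50

while baserel->width is taken directly from the remote estimate.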
+/*
+ * pgsqlGetForeignPaths
+ * Create possible scan paths for a scan on the foreign table
+ */
+static void
+pgsqlGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ PgsqlFdwPlanState *fpstate = (PgsqlFdwPlanState *) baserel->fdw_private;
+ ForeignPath *path;
+ Cost startup_cost;
+ Cost total_cost;
+ List *fdw_private;
+
+ /*
+ * We have cost values estimated on the remote side, so use them to derive
+ * better estimates which account for the remaining work needed to complete
+ * the scan, such as sending the query, transferring the result, and local
+ * filtering.
+ *
+ * XXX We assume that remote cost factors are the same as local ones, but it
+ * might be worth making them configurable.
+ */
+ startup_cost = fpstate->startup_cost;
+ total_cost = fpstate->total_cost;
+ adjust_costs(baserel->rows, baserel->width, &startup_cost, &total_cost);
+
+ /* Construct a list holding the SQL statement and attach it to the path. */
+ fdw_private = lappend(NIL, makeString(fpstate->sql.data));
+
+ /*
+ * Create simplest ForeignScan path node and add it to baserel. This path
+ * corresponds to SeqScan path of regular tables.
+ */
+ path = create_foreignscan_path(root, baserel,
+ baserel->rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no outer rel either */
+ fdw_private);
+ add_path(baserel, (Path *) path);
+
+ /*
+ * XXX We could consider sorted or parameterized paths here if we knew that
+ * the foreign table is indexed on the remote end. For this purpose, we
+ * might have to support FOREIGN INDEX to represent possible sets of sort
+ * keys and/or filtering.
+ */
+}
+
+/*
+ * pgsqlGetForeignPlan
+ * Create ForeignScan plan node which implements selected best path
+ */
+static ForeignScan *
+pgsqlGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses)
+{
+ PgsqlFdwPlanState *fpstate = (PgsqlFdwPlanState *) baserel->fdw_private;
+ Index scan_relid = baserel->relid;
+ List *fdw_private = NIL;
+ List *fdw_exprs = NIL;
+ List *local_exprs = NIL;
+ ListCell *lc;
+
+ /*
+ * We need lists of Expr rather than lists of RestrictInfo. We can merge
+ * remote_conds and param_conds into fdw_exprs here, because both are
+ * evaluated on the remote side in the actual remote query.
+ */
+ foreach(lc, fpstate->remote_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->param_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->local_conds)
+ local_exprs = lappend(local_exprs,
+ ((RestrictInfo *) lfirst(lc))->clause);
+
+ /*
+ * Make a list containing the SELECT statement and pass it to the executor
+ * along with the plan node for later use.
+ */
+ fdw_private = lappend(fdw_private, makeString(fpstate->sql.data));
+
+ /*
+ * Create the ForeignScan node from target list, local filtering
+ * expressions, remote filtering expressions, and FDW private information.
+ *
+ * We remove expressions which are evaluated on the remote side from the
+ * qual of the scan node to avoid redundant filtering. This reduction can
+ * be done only here, after the best path has been chosen, because
+ * baserestrictinfo in RelOptInfo is shared by all possible paths until the
+ * best path is chosen.
+ */
+ return make_foreignscan(tlist,
+ local_exprs,
+ scan_relid,
+ fdw_exprs,
+ fdw_private);
+}
+
+/*
+ * pgsqlExplainForeignScan
+ * Produce extra output for EXPLAIN
+ */
+static void
+pgsqlExplainForeignScan(ForeignScanState *node, ExplainState *es)
+{
+ List *fdw_private;
+ char *sql;
+
+ fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+ sql = strVal(list_nth(fdw_private, FdwPrivateSelectSql));
+ ExplainPropertyText("Remote SQL", sql, es);
+}
+
+/*
+ * pgsqlBeginForeignScan
+ * Initiate access to a foreign PostgreSQL table.
+ */
+static void
+pgsqlBeginForeignScan(ForeignScanState *node, int eflags)
+{
+ PgsqlFdwExecutionState *festate;
+ PGconn *conn;
+ Oid relid;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+ /*
+ * Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
+ */
+ if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+ return;
+
+ /*
+ * Save state in node->fdw_state.
+ */
+ festate = (PgsqlFdwExecutionState *) palloc(sizeof(PgsqlFdwExecutionState));
+ festate->fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+
+ /*
+ * Create contexts for per-scan tuplestore under per-query context.
+ */
+ festate->scan_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgresql_fdw per-scan data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ festate->temp_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgresql_fdw temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ /*
+ * Get a connection to the foreign server. The connection manager will
+ * establish a new connection if necessary.
+ */
+ relid = RelationGetRelid(node->ss.ss_currentRelation);
+ table = GetForeignTable(relid);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+ festate->conn = conn;
+
+ /* Result will be filled in first Iterate call. */
+ festate->tuples = NULL;
+
+ /* Allocate buffers for column values. */
+ {
+ TupleDesc tupdesc = slot->tts_tupleDescriptor;
+ festate->values = palloc(sizeof(Datum) * tupdesc->natts);
+ festate->nulls = palloc(sizeof(bool) * tupdesc->natts);
+ festate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ }
+
+ /*
+ * Allocate buffers for query parameters.
+ *
+ * ParamListInfo might include entries for pseudo-parameters such as
+ * PL/pgSQL's FOUND variable, but we don't worry about that here because
+ * the wasted space is not large.
+ */
+ {
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+
+ if (numParams > 0)
+ {
+ festate->param_types = palloc0(sizeof(Oid) * numParams);
+ festate->param_values = palloc0(sizeof(char *) * numParams);
+ }
+ else
+ {
+ festate->param_types = NULL;
+ festate->param_values = NULL;
+ }
+ }
+
+ /* Remember which foreign table we are scanning. */
+ festate->errpos.relid = relid;
+
+ /* Store FDW-specific state into ForeignScanState */
+ node->fdw_state = (void *) festate;
+
+ return;
+}
+
+/*
+ * pgsqlIterateForeignScan
+ * Retrieve next row from the result set, or clear tuple slot to indicate
+ * EOF.
+ *
+ * Note that we use the per-scan context when retrieving tuples from the
+ * tuplestore, so that a returned tuple survives until the next iteration;
+ * it is released implicitly via ExecClearTuple. If a tuple were retrieved
+ * from the tuplestore in CurrentMemoryContext (a per-tuple context),
+ * ExecClearTuple would end up freeing a dangling pointer.
+ */
+static TupleTableSlot *
+pgsqlIterateForeignScan(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+
+ /*
+ * If this is the first call after Begin or ReScan, we need to execute
+ * remote query and get result set.
+ */
+ if (festate->tuples == NULL)
+ execute_query(node);
+
+ /*
+ * If tuples are still left in tuplestore, just return next tuple from it.
+ *
+ * It is necessary to switch to the per-scan context to keep the returned
+ * tuple valid until the next IterateForeignScan call, when it will be
+ * released with ExecClearTuple. Otherwise the fetched tuple would be
+ * allocated in a per-tuple context, and that tuple might be freed twice.
+ *
+ * If we don't have any result in tuplestore, clear result slot to tell
+ * executor that this scan is over.
+ */
+ MemoryContextSwitchTo(festate->scan_cxt);
+ tuplestore_gettupleslot(festate->tuples, true, false, slot);
+ MemoryContextSwitchTo(oldcontext);
+
+ return slot;
+}
+
+/*
+ * pgsqlReScanForeignScan
+ * - Restart this scan by rewinding the current result set.
+ */
+static void
+pgsqlReScanForeignScan(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+
+ /* If we don't have a valid result yet, there is nothing to do. */
+ if (festate->tuples == NULL)
+ return;
+
+ /*
+ * Rewinding the current result set is enough.
+ */
+ tuplestore_rescan(festate->tuples);
+}
+
+/*
+ * pgsqlEndForeignScan
+ * Finish scanning foreign table and dispose objects used for this scan
+ */
+static void
+pgsqlEndForeignScan(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+
+ /* if festate is NULL, we are in EXPLAIN; nothing to do */
+ if (festate == NULL)
+ return;
+
+ /*
+ * The connection used for this scan must be kept valid until the end of
+ * the scan so that the remote transaction has the same lifespan as the
+ * local query.
+ */
+ ReleaseConnection(festate->conn);
+ festate->conn = NULL;
+
+ /* Discard fetch results */
+ if (festate->tuples != NULL)
+ {
+ tuplestore_end(festate->tuples);
+ festate->tuples = NULL;
+ }
+
+ /* MemoryContext will be deleted automatically. */
+}
+
+/*
+ * Estimate costs of executing given SQL statement.
+ */
+static void
+get_remote_estimate(const char *sql, PGconn *conn,
+ double *rows, int *width,
+ Cost *startup_cost, Cost *total_cost)
+{
+ PGresult *volatile res = NULL;
+ StringInfoData buf;
+ char *plan;
+ char *p;
+ int n;
+
+ /*
+ * Construct EXPLAIN statement with given SQL statement.
+ */
+ initStringInfo(&buf);
+ appendStringInfo(&buf, "EXPLAIN %s", sql);
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ res = PQexec(conn, buf.data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
+ ereport(ERROR,
+ (errmsg("could not execute EXPLAIN for cost estimation"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /*
+ * Find the estimation portion of the top plan node.  We search for the
+ * opening parenthesis from the end of the line to avoid matching an
+ * unexpected parenthesis.
+ */
+ plan = PQgetvalue(res, 0, 0);
+ p = strrchr(plan, '(');
+ if (p == NULL)
+ elog(ERROR, "wrong EXPLAIN output: %s", plan);
+ n = sscanf(p,
+ "(cost=%lf..%lf rows=%lf width=%d)",
+ startup_cost, total_cost, rows, width);
+ if (n != 4)
+ elog(ERROR, "could not get estimation from EXPLAIN output");
+
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Adjust costs estimated on remote end with some overheads such as connection
+ * and data transfer.
+ */
+static void
+adjust_costs(double rows, int width, Cost *startup_cost, Cost *total_cost)
+{
+ /*
+ * TODO: Selectivity of quals which are NOT pushed down should also be
+ * considered.
+ */
+
+ /* add cost to establish connection. */
+ *startup_cost += CONNECTION_COSTS;
+ *total_cost += CONNECTION_COSTS;
+
+ /* add cost to transfer result. */
+ *total_cost += TRANSFER_COSTS_PER_BYTE * width * rows;
+ *total_cost += cpu_tuple_cost * rows;
+}
+
+/*
+ * Execute remote query with current parameters.
+ */
+static void
+execute_query(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+ Oid *types = NULL;
+ const char **values = NULL;
+ char *sql;
+ PGconn *conn;
+ PGresult *volatile res = NULL;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+ types = festate->param_types;
+ values = festate->param_values;
+
+ /*
+ * Construct the parameter array in text format.  We don't release memory
+ * for the arrays explicitly, because the memory usage would not be very
+ * large, and anyway it will be released in context cleanup.
+ *
+ * If this query is invoked from a PL/pgSQL function, ParamListInfo has an
+ * extra entry for the dummy variable FOUND, so we need to check the type
+ * oid to exclude it from the remote parameters.
+ */
+ if (numParams > 0)
+ {
+ int i;
+
+ for (i = 0; i < numParams; i++)
+ {
+ ParamExternData *prm = &params->params[i];
+
+ /* give hook a chance in case parameter is dynamic */
+ if (!OidIsValid(prm->ptype) && params->paramFetch != NULL)
+ params->paramFetch(params, i + 1);
+
+ /*
+ * Get string representation of each parameter value by invoking
+ * type-specific output function unless the value is null or it's
+ * not used in the query.
+ */
+ types[i] = prm->ptype;
+ if (!prm->isnull && OidIsValid(types[i]))
+ {
+ Oid out_func_oid;
+ bool isvarlena;
+ FmgrInfo func;
+
+ getTypeOutputInfo(types[i], &out_func_oid, &isvarlena);
+ fmgr_info(out_func_oid, &func);
+ values[i] = OutputFunctionCall(&func, prm->value);
+ }
+ else
+ values[i] = NULL;
+
+ /*
+ * We use type "text" (groundless but seems most flexible) for
+ * unused (and type-unknown) parameters. We can't remove entry for
+ * unused parameter from the arrays, because parameter references
+ * in remote query ($n) have been indexed based on full length
+ * parameter list.
+ */
+ if (!OidIsValid(types[i]))
+ types[i] = TEXTOID;
+ }
+ }
+
+ conn = festate->conn;
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /*
+ * Execute remote query with parameters, and retrieve results with
+ * single-row-mode which returns results row by row.
+ */
+ sql = strVal(list_nth(festate->fdw_private, FdwPrivateSelectSql));
+ if (!PQsendQueryParams(conn, sql, numParams, types, values, NULL, NULL,
+ 0))
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /* Retrieve result rows one by one, and store them into tuplestore. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Store the result row into tuplestore */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+ * A PGresult with PGRES_TUPLES_OK means EOF, so we need to
+ * initialize the tuplestore if we have not retrieved any tuple.
+ */
+ if (first)
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+ /* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ }
+ }
+
+ /*
+ * We can't know in the custom row processor whether the scan is over,
+ * so mark the result as valid here.
+ */
+ tuplestore_donestoring(festate->tuples);
+
+ /* Discard result of SELECT statement. */
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ /* propagate error */
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Create tuples from PGresult and store them into tuplestore.
+ *
+ * The caller must use a PG_TRY block to catch exceptions and make sure
+ * the PGresult is released.
+ */
+static void
+query_row_processor(PGresult *res, ForeignScanState *node, bool first)
+{
+ int i;
+ int j;
+ int attnum; /* number of non-dropped columns */
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ TupleDesc tupdesc = slot->tts_tupleDescriptor;
+ Form_pg_attribute *attrs = tupdesc->attrs;
+ PgsqlFdwExecutionState *festate = (PgsqlFdwExecutionState *) node->fdw_state;
+ AttInMetadata *attinmeta = festate->attinmeta;
+ HeapTuple tuple;
+ ErrorContextCallback errcontext;
+ MemoryContext oldcontext;
+
+ if (first)
+ {
+ int nfields = PQnfields(res);
+
+ /* count non-dropped columns */
+ for (attnum = 0, i = 0; i < tupdesc->natts; i++)
+ if (!attrs[i]->attisdropped)
+ attnum++;
+
+ /* check result and tuple descriptor have the same number of columns */
+ if (attnum > 0 && attnum != nfields)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("remote query result rowtype does not match "
+ "the specified FROM clause rowtype"),
+ errdetail("expected %d, actual %d", attnum, nfields)));
+
+ /* First, ensure that the tuplestore is empty. */
+ if (festate->tuples == NULL)
+ {
+
+ /*
+ * Create tuplestore to store result of the query in per-query
+ * context. Note that we use this memory context to avoid memory
+ * leak in error cases.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->scan_cxt);
+ festate->tuples = tuplestore_begin_heap(false, false, work_mem);
+ MemoryContextSwitchTo(oldcontext);
+ }
+ else
+ {
+ /* Clear old result just in case. */
+ tuplestore_clear(festate->tuples);
+ }
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->temp_cxt);
+
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ /* skip dropped columns. */
+ if (attrs[i]->attisdropped)
+ {
+ festate->nulls[i] = true;
+ continue;
+ }
+
+ /*
+ * Set NULL indicator, and convert text representation to internal
+ * representation if any.
+ */
+ if (PQgetisnull(res, 0, j))
+ festate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ festate->nulls[i] = false;
+
+ MemoryContextSwitchTo(festate->scan_cxt);
+ MemoryContextSwitchTo(festate->temp_cxt);
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ festate->errpos.cur_attno = i + 1;
+ errcontext.callback = postgresql_fdw_error_callback;
+ errcontext.arg = (void *) &festate->errpos;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ festate->values[i] = value;
+
+ /* Uninstall error context callback. */
+ error_context_stack = errcontext.previous;
+ }
+ j++;
+ }
+
+ /*
+ * Build the tuple and store it into the tuplestore.  We don't have to
+ * free the tuple explicitly because it has been allocated in the
+ * per-tuple context.
+ */
+ tuple = heap_form_tuple(tupdesc, festate->values, festate->nulls);
+ tuplestore_puttuple(festate->tuples, tuple);
+
+ /* Clean up */
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(festate->temp_cxt);
+
+ return;
+}
+
+/*
+ * Callback function which is called when an error occurs during column
+ * value conversion.  Prints the names of the column and the relation.
+ */
+static void
+postgresql_fdw_error_callback(void *arg)
+{
+ ErrorPos *errpos = (ErrorPos *) arg;
+ const char *relname;
+ const char *colname;
+
+ relname = get_rel_name(errpos->relid);
+ colname = get_attname(errpos->relid, errpos->cur_attno);
+ errcontext("column %s of foreign table %s",
+ quote_identifier(colname), quote_identifier(relname));
+}
+
+/*
+ * pgsqlAnalyzeForeignTable
+ * Test whether analyzing this foreign table is supported
+ */
+static bool
+pgsqlAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages)
+{
+ *totalpages = 0;
+ *func = pgsqlAcquireSampleRowsFunc;
+
+ return true;
+}
+
+/*
+ * Acquire a random sample of rows from foreign table managed by postgresql_fdw.
+ *
+ * postgresql_fdw doesn't provide direct access to remote buffers, so we
+ * execute a simple SELECT statement which retrieves all rows from the remote
+ * side, and pick some samples from them.
+ */
+static int
+pgsqlAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows)
+{
+ PgsqlAnalyzeState astate;
+ StringInfoData sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ PGconn *conn = NULL;
+ PGresult *volatile res = NULL;
+
+ /*
+ * Only a little information is necessary as input to the row processor.
+ * The rest of the initialization is done at the first row processor call.
+ */
+ astate.anl_cxt = CurrentMemoryContext;
+ astate.temp_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "postgresql_fdw analyze temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ astate.rows = rows;
+ astate.targrows = targrows;
+ astate.tupdesc = relation->rd_att;
+ astate.errpos.relid = relation->rd_id;
+
+ /*
+ * Construct a SELECT statement which retrieves all rows from the remote
+ * side.  We can't avoid running a sequential scan on the remote side to get
+ * practical statistics, so this seems a reasonable compromise.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeSql(&sql, relation);
+
+ table = GetForeignTable(relation->rd_id);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+
+ /*
+ * Acquire sample rows from the result set.
+ */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /* Execute remote query and retrieve results row by row. */
+ if (!PQsendQuery(conn, sql.data))
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+
+ /* Retrieve result rows one by one, and pick samples from them. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Feed the result row to the sampling row processor */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ analyze_row_processor(res, &astate, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+ * A PGresult with PGRES_TUPLES_OK means EOF, so we need to
+ * initialize the sampling state if we have not retrieved any tuple.
+ */
+ if (first)
+ analyze_row_processor(res, &astate, first);
+
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+ /* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ /* We assume that we have no dead tuples. */
+ *totaldeadrows = 0.0;
+
+ /* We've retrieved all live tuples from the foreign server. */
+ *totalrows = astate.samplerows;
+
+ /*
+ * We don't update pg_class.relpages because we don't use it for
+ * planning at all.
+ */
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned with \"%s\", "
+ "containing %.0f live rows and %.0f dead rows; "
+ "%d rows in sample, %.0f estimated total rows",
+ RelationGetRelationName(relation), sql.data,
+ astate.samplerows, 0.0,
+ astate.numrows, astate.samplerows)));
+
+ return astate.numrows;
+}
+
+/*
+ * Custom row processor for acquire_sample_rows.
+ *
+ * Collect sample rows from the query result.
+ * - Use every tuple as a sample until targrows samples are collected.
+ * - Once the target has been reached, skip some tuples and replace
+ *   already-sampled tuples at random.
+ */
+static void
+analyze_row_processor(PGresult *res, PgsqlAnalyzeState *astate, bool first)
+{
+ int targrows = astate->targrows;
+ TupleDesc tupdesc = astate->tupdesc;
+ int i;
+ int j;
+ int pos; /* position where next sample should be stored. */
+ HeapTuple tuple;
+ ErrorContextCallback errcontext;
+ MemoryContext callercontext;
+
+ if (first)
+ {
+ /* Prepare for sampling rows */
+ astate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ astate->values = (Datum *) palloc(sizeof(Datum) * tupdesc->natts);
+ astate->nulls = (bool *) palloc(sizeof(bool) * tupdesc->natts);
+ astate->numrows = 0;
+ astate->samplerows = 0;
+ astate->rowstoskip = -1;
+ astate->numrows = 0;
+ astate->rstate = anl_init_selection_state(astate->targrows);
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ callercontext = MemoryContextSwitchTo(astate->temp_cxt);
+
+ /*
+ * The first targrows rows are always sampled.  If we have more source
+ * rows, pick up some of them by skipping, and replace already-sampled
+ * tuples at random.
+ *
+ * Here we just determine the slot where the next sample should be stored.
+ * pos is set to a negative value to indicate the row should be skipped.
+ */
+ if (astate->numrows < targrows)
+ pos = astate->numrows++;
+ else
+ {
+ /*
+ * The first targrows sample rows are simply copied into
+ * the reservoir. Then we start replacing tuples in the
+ * sample until we reach the end of the relation. This
+ * algorithm is from Jeff Vitter's paper, similarly to
+ * acquire_sample_rows in analyze.c.
+ *
+ * We don't have block-wise access, so every row in
+ * the PGresult can become a sample.
+ */
+ if (astate->rowstoskip < 0)
+ astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
+ &astate->rstate);
+
+ if (astate->rowstoskip <= 0)
+ {
+ int k = (int) (targrows * anl_random_fract());
+
+ Assert(k >= 0 && k < targrows);
+
+ /*
+ * Create sample tuple from the result, and replace at
+ * random.
+ */
+ heap_freetuple(astate->rows[k]);
+ pos = k;
+ }
+ else
+ pos = -1;
+
+ astate->rowstoskip -= 1;
+ }
+
+ /* Always increment sample row counter. */
+ astate->samplerows += 1;
+
+ if (pos >= 0)
+ {
+ AttInMetadata *attinmeta = astate->attinmeta;
+
+ /*
+ * Create a sample tuple from the current result row, and store it into
+ * the position determined above.  Note that i and j point to entries in
+ * the catalog and in the result columns, respectively.
+ */
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ if (PQgetisnull(res, 0, j))
+ astate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ astate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ astate->errpos.cur_attno = i + 1;
+ errcontext.callback = postgresql_fdw_error_callback;
+ errcontext.arg = (void *) &astate->errpos;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ astate->values[i] = value;
+
+ /* Uninstall error callback function. */
+ error_context_stack = errcontext.previous;
+ }
+ j++;
+ }
+
+ /*
+ * Generate a tuple from the result row data, and store it into the given
+ * buffer.  Note that we need to allocate the tuple in the analyze
+ * context to keep it valid even after the temporary per-tuple context has
+ * been reset.
+ */
+ MemoryContextSwitchTo(astate->anl_cxt);
+ tuple = heap_form_tuple(tupdesc, astate->values, astate->nulls);
+ MemoryContextSwitchTo(astate->temp_cxt);
+ astate->rows[pos] = tuple;
+ }
+
+ /* Clean up */
+ MemoryContextSwitchTo(callercontext);
+ MemoryContextReset(astate->temp_cxt);
+
+ return;
+}
diff --git a/contrib/postgresql_fdw/postgresql_fdw.control b/contrib/postgresql_fdw/postgresql_fdw.control
new file mode 100644
index 0000000..a87dc80
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw.control
@@ -0,0 +1,5 @@
+# postgresql_fdw extension
+comment = 'foreign-data wrapper for remote PostgreSQL servers'
+default_version = '1.0'
+module_pathname = '$libdir/postgresql_fdw'
+relocatable = true
diff --git a/contrib/postgresql_fdw/postgresql_fdw.h b/contrib/postgresql_fdw/postgresql_fdw.h
new file mode 100644
index 0000000..c84ca44
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgresql_fdw.h
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/postgresql_fdw.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PGSQL_FDW_H
+#define PGSQL_FDW_H
+
+#include "postgres.h"
+#include "foreign/foreign.h"
+#include "nodes/relation.h"
+#include "utils/relcache.h"
+
+/* in option.c */
+int ExtractConnectionOptions(List *defelems,
+ const char **keywords,
+ const char **values);
+int GetFetchCountOption(ForeignTable *table, ForeignServer *server);
+
+/* in deparse.c */
+void deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds);
+void appendWhereClause(StringInfo buf,
+ bool has_where,
+ List *exprs,
+ PlannerInfo *root);
+void classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds);
+void deparseAnalyzeSql(StringInfo buf, Relation rel);
+
+#endif /* PGSQL_FDW_H */
diff --git a/contrib/postgresql_fdw/sql/postgresql_fdw.sql b/contrib/postgresql_fdw/sql/postgresql_fdw.sql
new file mode 100644
index 0000000..b1ad12b
--- /dev/null
+++ b/contrib/postgresql_fdw/sql/postgresql_fdw.sql
@@ -0,0 +1,290 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+
+-- Clean up in case a prior regression run failed
+
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+
+DROP ROLE IF EXISTS postgresql_fdw_user;
+
+RESET client_min_messages;
+
+CREATE ROLE postgresql_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgresql_fdw_user';
+
+CREATE EXTENSION postgresql_fdw;
+
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgresql_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgresql_fdw
+ OPTIONS (dbname 'contrib_regression');
+
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgresql_fdw_user SERVER loopback2;
+
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10)
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10)
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+
+-- ===================================================================
+-- tests for postgresql_fdw_validator
+-- ===================================================================
+ALTER FOREIGN DATA WRAPPER postgresql_fdw OPTIONS (host 'value'); -- ERROR
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
-- configure options
+ALTER SERVER loopback1 OPTIONS (
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER SERVER loopback1 OPTIONS (user 'value'); -- ERROR
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (host 'value'); -- ERROR
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (invalid 'value'); -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (invalid 'value'); -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+\des+
+\deu+
+\det+
+
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- empty result
+SELECT * FROM ft1 WHERE false;
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- user-defined operator/function
+CREATE FUNCTION postgresql_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgresql_fdw_abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 ft WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+EXECUTE st1(1, 1);
+EXECUTE st1(101, 101);
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+EXECUTE st2(10, 20);
+EXECUTE st1(101, 101);
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+EXECUTE st3(10, 20);
+EXECUTE st3(20, 30);
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+DROP FUNCTION f_test(int);
+
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgresql_fdw_connections;
+SELECT postgresql_fdw_disconnect(srvid, usesysid) FROM postgresql_fdw_get_connections();
+SELECT srvname, usename FROM postgresql_fdw_connections;
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
+
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+FETCH c;
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+FETCH c;
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+COMMIT;
+SELECT srvname FROM postgresql_fdw_connections;
+ERROR OUT; -- ERROR
+SELECT srvname FROM postgresql_fdw_connections;
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgresql_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+DROP EXTENSION postgresql_fdw CASCADE;
+\c
+DROP ROLE postgresql_fdw_user;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 6b13a0a..4ffa2fa 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -132,6 +132,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
&pgstatstatements;
&pgstattuple;
&pgtrgm;
+ &postgresql-fdw;
&seg;
&sepgsql;
&contrib-spi;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db4cc3a..373582a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
<!ENTITY pgtrgm SYSTEM "pgtrgm.sgml">
<!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
+<!ENTITY postgresql-fdw SYSTEM "postgresql-fdw.sgml">
<!ENTITY seg SYSTEM "seg.sgml">
<!ENTITY contrib-spi SYSTEM "contrib-spi.sgml">
<!ENTITY sepgsql SYSTEM "sepgsql.sgml">
diff --git a/doc/src/sgml/postgresql-fdw.sgml b/doc/src/sgml/postgresql-fdw.sgml
new file mode 100644
index 0000000..b1c4e36
--- /dev/null
+++ b/doc/src/sgml/postgresql-fdw.sgml
@@ -0,0 +1,235 @@
+<!-- doc/src/sgml/postgresql-fdw.sgml -->
+
+<sect1 id="postgresql-fdw" xreflabel="postgresql_fdw">
+ <title>postgresql_fdw</title>
+
+ <indexterm zone="postgresql-fdw">
+ <primary>postgresql_fdw</primary>
+ </indexterm>
+
+ <para>
+ The <filename>postgresql_fdw</filename> module provides a foreign-data
+ wrapper for external <productname>PostgreSQL</productname> servers.
+ With this module, users can access data stored in external
+ <productname>PostgreSQL</productname> via plain SQL statements.
+ </para>
+
+ <para>
+ Note that the default wrapper <literal>postgresql_fdw</literal> is created
+ automatically by the <command>CREATE EXTENSION</command> command for
+ <application>postgresql_fdw</application>.
+ </para>
+
+ <sect2>
+ <title>FDW Options of postgresql_fdw</title>
+
+ <sect3>
+ <title>Connection Options</title>
+ <para>
+ A foreign server and user mapping created using this wrapper can have
+ <application>libpq</> connection options, except the following:
+
+ <itemizedlist>
+ <listitem><para>client_encoding</para></listitem>
+ <listitem><para>fallback_application_name</para></listitem>
+ <listitem><para>replication</para></listitem>
+ </itemizedlist>
+
+ For details of <application>libpq</> connection options, see
+ <xref linkend="libpq-connect">.
+ </para>
+
+ <para>
+ <literal>user</literal> and <literal>password</literal> can be specified on
+ user mappings, and the other options can be specified on foreign servers.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Object Name Options</title>
+ <para>
+ Foreign tables created using this wrapper, and their columns, can have
+ object name options.  These options specify the names used in SQL
+ statements sent to the remote <productname>PostgreSQL</productname>
+ server, and are useful when a remote object has a different name from the
+ corresponding local one.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>nspname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+ namespace (schema) reference in the SQL statement.  If this option is
+ omitted, the schema name of the foreign table (<literal>pg_namespace.nspname</literal>) is
+ used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>relname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+ relation (table) reference in the SQL statement.  If this option is
+ omitted, <literal>pg_class.relname</literal> of the foreign table is
+ used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>colname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a column of a foreign table, is
+ used as a column (attribute) reference in the SQL statement. If this
+ option is omitted, <literal>pg_attribute.attname</literal> of the column
+ of the foreign table is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </sect3>
+
+ </sect2>
+
+ <sect2>
+ <title>Connection Management</title>
+
+ <para>
+ The <application>postgresql_fdw</application> establishes a connection to a
+ foreign server at the beginning of the first query which uses a foreign
+ table associated with the foreign server, and reuses the connection in
+ following queries, and even in following foreign scans in the same query.
+
+ You can see the list of active connections via the
+ <structname>postgresql_fdw_connections</structname> view.  It shows the oid
+ and name of the server and the local role for each active connection
+ established by <application>postgresql_fdw</application>.  For security
+ reasons, only superusers can see other roles' connections.
+ </para>
+
+ <para>
+ Established connections are kept alive until the local role changes, the
+ current transaction aborts, or the user explicitly requests disconnection.
+ </para>
+
+ <para>
+ If the role has been changed, active connections established under the old
+ local role are kept alive but are never reused until the local role has
+ been restored to the original one.  This kind of situation happens with
+ <command>SET ROLE</command> and <command>SET SESSION AUTHORIZATION</command>.
+ </para>
+
+ <para>
+ If the current transaction aborts due to an error or a user request, all
+ active connections are disconnected automatically.  This behavior avoids
+ possible connection leaks on error.
+ </para>
+
+ <para>
+ You can discard a persistent connection at any time with
+ <function>postgresql_fdw_disconnect()</function>.  It takes the server oid
+ and user oid as arguments.  This function can handle only connections
+ established in the current session; connections established by other
+ backends are not reachable.
+ </para>
+
+ <para>
+ You can discard all active and visible connections in the current session
+ by using <structname>postgresql_fdw_connections</structname> and
+ <function>postgresql_fdw_disconnect()</function> together:
+<synopsis>
+postgres=# SELECT postgresql_fdw_disconnect(srvid, usesysid) FROM postgresql_fdw_connections;
+ postgresql_fdw_disconnect
+----------------------
+ OK
+ OK
+(2 rows)
+</synopsis>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Transaction Management</title>
+ <para>
+ The <application>postgresql_fdw</application> begins a remote transaction
+ at the beginning of a local query, and terminates it with
+ <command>ABORT</command> at the end of the local query.  This means that
+ all foreign scans on a foreign server in a local query are executed in one
+ transaction.
+ If the isolation level of the local transaction is
+ <literal>SERIALIZABLE</literal>, <literal>SERIALIZABLE</literal> is used
+ for the remote transaction.  Otherwise, if the isolation level of the
+ local transaction is one of <literal>READ UNCOMMITTED</literal>,
+ <literal>READ COMMITTED</literal> or <literal>REPEATABLE READ</literal>,
+ then <literal>REPEATABLE READ</literal> is used for the remote transaction.
+ <literal>READ UNCOMMITTED</literal> and <literal>READ COMMITTED</literal>
+ are never used for the remote transaction, because even a
+ <literal>READ COMMITTED</literal> transaction might produce inconsistent
+ results if remote data has been updated between two remote queries.
+ </para>
+ <para>
+ Note that even if the isolation level of the local transaction is
+ <literal>SERIALIZABLE</literal> or <literal>REPEATABLE READ</literal>,
+ repeated execution of the same query might produce different results,
+ because foreign scans in different local queries are executed in
+ different remote transactions.  For instance, when a client starts a
+ local transaction explicitly with isolation level
+ <literal>SERIALIZABLE</literal> and executes the same local query twice
+ against a foreign table whose remote data is updated frequently, the
+ latter result may differ from the former.
+ </para>
+ <para>
+ This restriction might be relaxed in future release.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Estimation of Costs and Rows</title>
+ <para>
+ The <application>postgresql_fdw</application> estimates the costs of a
+ foreign scan by adding up some basic costs: connection costs, remote query
+ costs and data transfer costs.
+ To get remote query costs, <application>postgresql_fdw</application> executes
+ an <command>EXPLAIN</command> command on the remote server for each foreign scan.
+ </para>
+ <para>
+ On the other hand, the estimated row count returned by
+ <command>EXPLAIN</command> is used for local estimation as-is.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>EXPLAIN Output</title>
+ <para>
+ For a foreign table using <literal>postgresql_fdw</>, <command>EXPLAIN</>
+ shows the remote SQL statement which is sent to the remote
+ <productname>PostgreSQL</productname> server for each ForeignScan plan node.
+ For example:
+ </para>
+<synopsis>
+postgres=# EXPLAIN SELECT aid FROM pgbench_accounts WHERE abalance < 0;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on pgbench_accounts (cost=100.00..8105.13 rows=302613 width=8)
+ Filter: (abalance < 0)
+ Remote SQL: SELECT aid, NULL, abalance, NULL FROM public.pgbench_accounts
+(3 rows)
+</synopsis>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Shigeru Hanada <email>shigeru.hanada@gmail.com</email>
+ </para>
+ </sect2>
+
+</sect1>
Hanada-san,
I tried to check this patch. Because we also had some discussion
on this extension through the last two commit fests, I have no
fundamental design arguments.
So, let me dive into the implementation details of this patch.
At the postgresql_fdw/deparse.c,
* Even though deparseVar() is never invoked with need_prefix=true,
I wonder why a Var reference would ever need to be qualified with a
relation alias. It seems to me the relation alias is never used in the
remote query, so isn't it a possible bug?
* deparseFuncExpr() has case handling depending on the funcformat
of the FuncExpr. I think all the cases can be deparsed using an explicit
function call, which would avoid trouble when the remote host has an
inconsistent cast configuration.
At the postgresql_fdw/connection.c,
* I'm worried about the condition for the invocation of begin_remote_tx().
+ if (use_tx && entry->refs == 1)
+ begin_remote_tx(entry->conn);
+ entry->use_tx = use_tx;
My preference is: if (use_tx && !entry->use_tx), instead.
Even though no current code path makes the problem obvious,
it may cause a difficult-to-find bug, in case a caller tries
to get a connection that is already acquired by someone else
who did not need a transaction.
At the postgresql_fdw/postgresql_fdw.c,
* When pgsqlGetForeignPaths() adds the SQL statement into
fdw_private, it is implemented as:
+ /* Construct list of SQL statements and bind it with the path. */
+ fdw_private = lappend(NIL, makeString(fpstate->sql.data));
Could you use list_make1() instead?
* At the bottom half of query_row_processor(), I found these
two mysterious lines.
MemoryContextSwitchTo(festate->scan_cxt);
MemoryContextSwitchTo(festate->temp_cxt);
Why not switch to temp_cxt directly?
At the sgml/postgresql-fdw.sgml,
* Please add a note that this version does not support sub-transaction
handling. In particular, all we can do is abort the top-level transaction
when an error occurs at the remote side within a sub-transaction.
I hope to hand this patch over to a committer soon.
Thanks,
2012/9/14 Shigeru HANADA <shigeru.hanada@gmail.com>:
Hi all,
I'd like to propose FDW for PostgreSQL as a contrib module again.
Attached patch is updated version of the patch proposed in 9.2
development cycle.For ease of review, I summarized what the patch tries to achieve.
Abstract
========
This patch provides FDW for PostgreSQL which allows users to access
external data stored in remote PostgreSQL via foreign tables. Of course
external instance can be beyond network. And I think that this FDW
could be an example of other RDBMS-based FDW, and it would be useful for
proof-of-concept of FDW-related features.Note that the name has been changed from "pgsql_fdw" which was used in
last proposal, since I got a comment which says that most of existing
FDWs have name "${PRODUCT_NAME}_fdw" so "postgresql_fdw" or
"postgres_fdw" would be better. For this issue, I posted another patch
which moves existing postgresql_fdw_validator into contrib/dblink with
renaming in order to reserve the name "postgresql_fdw" for this FDW.
Please note that the attached patch requires dblink_fdw_validator.patch
to be applied first.
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00454.phpQuery deparser
==============
Now postgresql_fdw has its own SQL query deparser inside, so it's free
from backend's ruleutils module.This deparser maps object names when generic options below were set.
nspname of foreign table: used as namespace (schema) of relation
relname of foreign table: used as relation name
colname of foreign column: used as column nameThis mapping allows flexible schema design.
SELECT optimization
===================
postgresql_fdw always retrieves as much columns as foreign table from
remote to avoid overhead of column mapping. However, often some of them
(or sometimes all of them) are not used on local side, so postgresql_fdw
uses NULL literal as such unused columns in SELECT clause of remote
query. For example, let's assume one of pgbench workloads:SELECT abalance FROM pgbench_accounts WHERE aid = 1;
This query generates a remote query below. In addition to bid and
filler, aid is replaced with NULL because it's already evaluated on
remote side.SELECT NULL, NULL, abalance, NULL FROM pgbench_accounts
WHERE (aid OPERATOR(pg_catalog.=) 1);This trick would improve performance notably by reducing amount of data
to be transferred.One more example. Let's assume counting rows.
SELCT count(*) FROM pgbench_accounts;
This query requires only existence of row, so no actual column reference
is in SELECT clause.SELECT NULL, NULL, NULL, NULL FROM pgbench_accounts;
WHERE push down
===============
postgresql_fdw pushes down some of restrictions (IOW, top level elements
in WHERE clause which are connected with AND) which can be evaluated on
remote side safely. Currently the criteria "safe" is declared as
whether an expression contains only:
- column reference
- constant of bult-in type (scalar and array)
- external parameter of EXECUTE statement
- built-in operator which uses built-in immutable function
(operator cannot be collative unless it's "=" or "<>")
- built-in immutable functionSome other elements might be also safe to be pushed down, but criteria
above seems enough for basic use cases.Although it might seem odd, but operators are deparsed into OPERATOR
notation to avoid search_path problem.
E.g.
local query : WHERE col = 1
remote query: WHERE (col OPERATOR(pg_catalog.=) 1)Connection management
=====================
postgresql_fdw has its own connection manager. Connection is
established when first foreign scan on a server is planned, and it's
pooled in the backend. If another foreign scan on same server is
invoked, same connection will be used. Connection pool is per-backend.
This means that different backends never share connection.postgresql_fdw_connections view shows active connections, and
postgresql_fdw_disconnect() allows users to discard particular
connection at arbitrary timing.Transaction management
======================
If multiple foreign tables on same foreign server is used in a local
query, postgresql_fdw uses same connection to retrieve results in a
transaction to make results consistent. Currently remote transaction is
closed at the end of local query, so following local query might produce
inconsistent result.Costs estimation
================
To estimate costs and result rows of a foreign scan, postgresql_fdw
executes EXPLAIN statement on remote side, and retrieves costs and rows
values from the result. For cost estimation, cost of connection
establishment and data transfer are added to the base costs. Currently
these two factors is hard-coded, but making them configurable is not so
difficult.Executing EXPLAIN is not cheap, but remote query itself is usually very
expensive, so such additional cost would be acceptable.ANALYZE support
===============
postgresql_fdw supports ANALYZE to improve selectivity estimation of
filtering done on local side (WHERE clauses which could not been pushed
down. The sampler function retrieves all rows from remote table and
skip some of them so that result fits requested size. As same as
file_fdw, postgresql_fdw doesn't care order of result, because it's
important for only correlation, and correlation is important for only
index scans, which is not supported for this FDW.Fetching Data
=============
postgresql_fdw uses single-row mode of libpq so that memory usage is
kept in low level even if the result is huge.To cope with difference of encoding, postgresql_fdw automatically sets
client_encoding to server encoding of local database.Future improvement
==================
I have some ideas for improvement:
- Provide sorted result path (requires index information?)
- Provide parameterized path
- Transaction mapping between local and remotes (2PC)
- binary transfer (only against servers with same PG major version?)
- JOIN push-down (requires support by core)Any comments and questions are welcome.
--
Shigeru HANADA--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Fri, Sep 14, 2012 at 9:25 AM, Shigeru HANADA
<shigeru.hanada@gmail.com> wrote:
Hi all,
I'd like to propose FDW for PostgreSQL as a contrib module again.
Attached patch is updated version of the patch proposed in 9.2
development cycle.
very nice.
- binary transfer (only against servers with same PG major version?)
Unfortunately this is not enough -- at least not without some
additional work. The main problem is user-defined types, especially
composites. The binary wire format sends internal member type OIDs
which the receiving server will have to interpret.
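As a hypothetical illustration of the problem (my own example, not from the
patch; if I recall correctly, record_send() writes each column's type OID
into the binary representation): a composite containing a user-defined
member type embeds an OID that is assigned independently on every server,
so the receiving side cannot interpret it.

  CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
  CREATE TYPE profile AS (name text, current_mood mood);
  -- 'mood' almost certainly has a different OID on the remote server
  SELECT 'profile'::regtype::oid AS profile_oid, 'mood'::regtype::oid AS mood_oid;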
merlin
Kaigai-san,
Thanks for the review.
On Thu, Oct 4, 2012 at 6:10 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
At the postgresql_fdw/deparse.c,
* Even though deparseVar() is never invoked with need_prefix=true,
I wonder why a Var reference would ever need to be qualified with a
relation alias. It seems to me the relation alias is never used in the
remote query, so isn't it a possible bug?
This must be a leftover of my effort toward supporting JOIN push-down
(it is one of the future improvements). At the moment it is not clear what
should be used as the column prefix, so I removed the need_prefix parameter
to avoid possible confusion. I removed need_prefix from deparseRelation as
well.
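To illustrate why no alias should be needed (a hand-written sketch based on
the regression-test objects, not actual output from the patch): the remote
query built for a foreign scan has exactly one FROM item, so unqualified
column references are unambiguous.

  local query : SELECT c2 FROM ft1 WHERE c1 = 1;
  remote query: SELECT NULL, c2, NULL, NULL, NULL, NULL, NULL
                  FROM "S 1"."T 1"
                 WHERE ("C 1" OPERATOR(pg_catalog.=) 1);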
* deparseFuncExpr() has case handling depending on the funcformat
of the FuncExpr. I think all the cases can be deparsed using an explicit
function call, which would avoid trouble when the remote host has an
inconsistent cast configuration.
Hm, your point is that specifying the underlying function, e.g.
"cast_func(value)", is better than the simple cast notations, e.g.
"value::type" and "CAST(value AS type)", because such an explicit form
prevents possible problems caused by differences in cast configuration,
right? If so, I agree about explicit casts. I'm not sure about
implicit casts because it seems difficult to avoid unexpected implicit
casts entirely.
As background, I just followed the logic implemented in ruleutils.c for
FuncExpr, which deparses an explicit cast in the form 'value::type'. If
it is certain that a FuncExpr coming from a cast never takes more than
one argument, we can go your way. I'll check it.
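To make the two notations concrete (a hand-written sketch, not output from
the patch; pg_catalog."numeric"(integer) is just the built-in cast function
picked for illustration):

  -- local query
  SELECT * FROM ft1 WHERE c1::numeric = 1.5;
  -- remote qual, cast notation (ruleutils.c style):
  --   ((c1)::numeric OPERATOR(pg_catalog.=) 1.5)
  -- remote qual, explicit function call as suggested:
  --   (pg_catalog."numeric"(c1) OPERATOR(pg_catalog.=) 1.5)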
At the postgresql_fdw/connection.c,
* I'm worried about the condition for the invocation of begin_remote_tx().
+ if (use_tx && entry->refs == 1)
+ begin_remote_tx(entry->conn);
+ entry->use_tx = use_tx;
My preference is: if (use_tx && !entry->use_tx), instead.
Even though no current code path makes the problem obvious,
it may cause a difficult-to-find bug, in case a caller tries
to get a connection that is already acquired by someone else
who did not need a transaction.
I got it. In addition, I fixed ReleaseConnection to call
abort_remote_tx after decrementing the reference counter, as GetConnection
does for begin_remote_tx.
At the postgresql_fdw/postgresql_fdw.c,
* When pgsqlGetForeignPaths() adds the SQL statement into
fdw_private, it is implemented as:
+ /* Construct list of SQL statements and bind it with the path. */
+ fdw_private = lappend(NIL, makeString(fpstate->sql.data));
Could you use list_make1() instead?
Fixed.
* At the bottom half of query_row_processor(), I found these
two mysterious lines.
MemoryContextSwitchTo(festate->scan_cxt);
MemoryContextSwitchTo(festate->temp_cxt);
Why not switch to temp_cxt directly?
It must be a copy-and-paste mistake. Removed both.
At the sgml/postgresql-fdw.sgml,
* Please add a note that this version does not support sub-transaction
handling. In particular, all we can do is abort the top-level transaction
when an error occurs at the remote side within a sub-transaction.
I don't think that aborting the local top-level transaction is necessary
in such a case, because connection_cleanup() now closes the remote
connection whenever a remote error occurs in a sub-transaction. For
instance, we can recover from a remote syntax error (which could easily
happen from a wrong relname setting) by ROLLBACK. Am I missing something?
$ ALTER FOREIGN TABLE foo OPTIONS (SET relname 'invalid');
$ BEGIN; -- explicit transaction
$ SAVEPOINT a; -- begin sub-transaction
$ SELECT * FROM foo; -- this causes remote error, then remote
-- connection is closed automatically
$ ROLLBACK TO a; -- clears local error state
$ SELECT 1; -- this should be successfully executed
I hope to hand this patch over to a committer soon.
I hope so too :)
Please examine the attached v2 patch (note that it should be applied onto
the latest dblink_fdw_validator patch).
Regards,
--
Shigeru HANADA
Attachments:
postgresql_fdw.v2.patch (text/plain)
diff --git a/contrib/Makefile b/contrib/Makefile
index d230451..ce6d461 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
pgcrypto \
pgrowlocks \
pgstattuple \
+ postgresql_fdw \
seg \
spi \
tablefunc \
diff --git a/contrib/postgresql_fdw/.gitignore b/contrib/postgresql_fdw/.gitignore
new file mode 100644
index 0000000..0854728
--- /dev/null
+++ b/contrib/postgresql_fdw/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/results/
+*.o
+*.so
diff --git a/contrib/postgresql_fdw/Makefile b/contrib/postgresql_fdw/Makefile
new file mode 100644
index 0000000..898036f
--- /dev/null
+++ b/contrib/postgresql_fdw/Makefile
@@ -0,0 +1,22 @@
+# contrib/postgresql_fdw/Makefile
+
+MODULE_big = postgresql_fdw
+OBJS = postgresql_fdw.o option.o deparse.o connection.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+SHLIB_LINK = $(libpq)
+
+EXTENSION = postgresql_fdw
+DATA = postgresql_fdw--1.0.sql
+
+REGRESS = postgresql_fdw
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/postgresql_fdw
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/postgresql_fdw/connection.c b/contrib/postgresql_fdw/connection.c
new file mode 100644
index 0000000..b7574c4
--- /dev/null
+++ b/contrib/postgresql_fdw/connection.c
@@ -0,0 +1,605 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.c
+ * Connection management for postgresql_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/connection.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq-fe.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
+
+#include "postgresql_fdw.h"
+#include "connection.h"
+
+/* ============================================================================
+ * Connection management functions
+ * ==========================================================================*/
+
+/*
+ * Connection cache entry managed with hash table.
+ */
+typedef struct ConnCacheEntry
+{
+ /* hash key must be first */
+ Oid serverid; /* oid of foreign server */
+ Oid userid; /* oid of local user */
+
+ bool use_tx; /* true when using remote transaction */
+ int refs; /* reference counter */
+ PGconn *conn; /* foreign server connection */
+} ConnCacheEntry;
+
+/*
+ * Hash table used to cache connections to PostgreSQL servers; it is
+ * initialized before the backend's first attempt to connect to one.
+ */
+static HTAB *ConnectionHash;
+
+/* ----------------------------------------------------------------------------
+ * prototype of private functions
+ * --------------------------------------------------------------------------*/
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg);
+static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
+static void begin_remote_tx(PGconn *conn);
+static void abort_remote_tx(PGconn *conn);
+
+/*
+ * Get a PGconn which can be used to execute foreign queries on the remote
+ * PostgreSQL server with the user's authorization.  If this is the first
+ * request for the server, a new connection is established.
+ *
+ * When use_tx is true, a remote transaction is started unless one is already
+ * in progress on the connection.  The isolation level of the remote
+ * transaction follows the local transaction, and the remote transaction
+ * will be aborted when the last user releases the connection.
+ *
+ * TODO: Note that caching connections requires a mechanism to detect changes
+ * of FDW objects in order to invalidate already-established connections.
+ */
+PGconn *
+GetConnection(ForeignServer *server, UserMapping *user, bool use_tx)
+{
+ bool found;
+ ConnCacheEntry *entry;
+ ConnCacheEntry key;
+
+ /* initialize connection cache if it isn't */
+ if (ConnectionHash == NULL)
+ {
+ HASHCTL ctl;
+
+ /* hash key is a pair of oids: serverid and userid */
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid) + sizeof(Oid);
+ ctl.entrysize = sizeof(ConnCacheEntry);
+ ctl.hash = tag_hash;
+ ctl.match = memcmp;
+ ctl.keycopy = memcpy;
+ /* allocate ConnectionHash in the cache context */
+ ctl.hcxt = CacheMemoryContext;
+ ConnectionHash = hash_create("postgresql_fdw connections", 32,
+ &ctl,
+ HASH_ELEM | HASH_CONTEXT |
+ HASH_FUNCTION | HASH_COMPARE |
+ HASH_KEYCOPY);
+
+ /*
+ * Register postgresql_fdw's own cleanup function for connection
+ * cleanup. This should be done just once for each backend.
+ */
+ RegisterResourceReleaseCallback(cleanup_connection, ConnectionHash);
+ }
+
+ /* Create key value for the entry. */
+ MemSet(&key, 0, sizeof(key));
+ key.serverid = server->serverid;
+ key.userid = GetOuterUserId();
+
+	/*
+	 * Find the cached entry for the requested connection; create a new entry
+	 * if none exists yet.  The ResourceOwner callback registered above cleans
+	 * the connection up on error, including user interrupt.
+	 */
+ entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+ if (!found)
+ {
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+
+ /*
+	 * We don't check the health of the cached connection here, because it
+	 * would add some overhead.  A broken connection and its cache entry will
+	 * be cleaned up when the connection is actually used.
+ */
+
+ /*
+	 * If the cache entry doesn't have a connection, we have to establish a
+	 * new one.
+ */
+ if (entry->conn == NULL)
+ {
+ PGconn *volatile conn = NULL;
+
+ /*
+ * Use PG_TRY block to ensure closing connection on error.
+ */
+ PG_TRY();
+ {
+			/*
+			 * Connect to the foreign PostgreSQL server; the new connection is
+			 * stored into the cache entry after the PG_TRY block.
+			 * Note: the key fields of the entry have already been initialized
+			 * by hash_search(HASH_ENTER).
+			 */
+ conn = connect_pg_server(server, user);
+ }
+ PG_CATCH();
+ {
+ /* Clear connection cache entry on error case. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ entry->conn = conn;
+ elog(DEBUG3, "new postgresql_fdw connection %p for server %s",
+ entry->conn, server->servername);
+ }
+
+ /* Increase connection reference counter. */
+ entry->refs++;
+
+ /*
+ * If remote transaction is requested but it has not started, start remote
+ * transaction with the same isolation level as the local transaction we
+ * are in. We need to remember whether this connection uses remote
+ * transaction to abort it when this connection is released completely.
+ */
+ if (use_tx && !entry->use_tx)
+ {
+ begin_remote_tx(entry->conn);
+ entry->use_tx = use_tx;
+ }
+
+ return entry->conn;
+}
+
+/*
+ * For non-superusers, insist that the connstr specify a password. This
+ * prevents a password from being picked up from .pgpass, a service file,
+ * the environment, etc. We don't want the postgres user's passwords
+ * to be accessible to non-superusers.
+ */
+static void
+check_conn_params(const char **keywords, const char **values)
+{
+ int i;
+
+ /* no check required if superuser */
+ if (superuser())
+ return;
+
+ /* ok if params contain a non-empty password */
+ for (i = 0; keywords[i] != NULL; i++)
+ {
+ if (strcmp(keywords[i], "password") == 0 && values[i][0] != '\0')
+ return;
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superusers must provide a password in the connection string.")));
+}
+
+static PGconn *
+connect_pg_server(ForeignServer *server, UserMapping *user)
+{
+ const char *conname = server->servername;
+ PGconn *conn;
+ const char **all_keywords;
+ const char **all_values;
+ const char **keywords;
+ const char **values;
+ int n;
+ int i, j;
+
+ /*
+ * Construct connection params from generic options of ForeignServer and
+	 * UserMapping.  Those two objects hold only libpq options.
+ * Extra 3 items are for:
+ * *) fallback_application_name
+ * *) client_encoding
+ * *) NULL termination (end marker)
+ *
+	 * Note: we don't omit any parameters even if the target server might be
+	 * older than the local one, because unexpected parameters are just ignored.
+ */
+ n = list_length(server->options) + list_length(user->options) + 3;
+ all_keywords = (const char **) palloc(sizeof(char *) * n);
+ all_values = (const char **) palloc(sizeof(char *) * n);
+ keywords = (const char **) palloc(sizeof(char *) * n);
+ values = (const char **) palloc(sizeof(char *) * n);
+ n = 0;
+ n += ExtractConnectionOptions(server->options,
+ all_keywords + n, all_values + n);
+ n += ExtractConnectionOptions(user->options,
+ all_keywords + n, all_values + n);
+ all_keywords[n] = all_values[n] = NULL;
+
+ for (i = 0, j = 0; all_keywords[i]; i++)
+ {
+ keywords[j] = all_keywords[i];
+ values[j] = all_values[i];
+ j++;
+ }
+
+ /* Use "postgresql_fdw" as fallback_application_name. */
+ keywords[j] = "fallback_application_name";
+ values[j++] = "postgresql_fdw";
+
+ /* Set client_encoding so that libpq can convert encoding properly. */
+ keywords[j] = "client_encoding";
+ values[j++] = GetDatabaseEncodingName();
+
+ keywords[j] = values[j] = NULL;
+ pfree(all_keywords);
+ pfree(all_values);
+
+ /* verify connection parameters and do connect */
+ check_conn_params(keywords, values);
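+	/*
+	 * The third argument (expand_dbname = 0) makes libpq take the dbname
+	 * value, if any, literally rather than expanding it as a connection
+	 * string.
+	 */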
+ conn = PQconnectdbParams(keywords, values, 0);
+ if (!conn || PQstatus(conn) != CONNECTION_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
+ errmsg("could not connect to server \"%s\"", conname),
+ errdetail("%s", PQerrorMessage(conn))));
+ pfree(keywords);
+ pfree(values);
+
+ /*
+	 * Check that a non-superuser has used a password to establish the
+	 * connection.  This check logic is based on dblink_security_check() in
+	 * contrib/dblink.
+	 *
+	 * XXX Should we check this even though we don't provide an unsafe version
+	 * like dblink_connect_u()?
+ */
+ if (!superuser() && !PQconnectionUsedPassword(conn))
+ {
+ PQfinish(conn);
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superuser cannot connect if the server does not request a password."),
+ errhint("Target server's authentication method must be changed.")));
+ }
+
+ return conn;
+}
+
+/*
+ * Start remote transaction with proper isolation level.
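+ *
+ * Note that local READ UNCOMMITTED and READ COMMITTED are mapped to remote
+ * REPEATABLE READ, which should give every query sent over this connection
+ * within one local transaction a single consistent snapshot of the remote
+ * data.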
+ */
+static void
+begin_remote_tx(PGconn *conn)
+{
+ const char *sql = NULL; /* keep compiler quiet. */
+ PGresult *res;
+
+ switch (XactIsoLevel)
+ {
+ case XACT_READ_UNCOMMITTED:
+ case XACT_READ_COMMITTED:
+ case XACT_REPEATABLE_READ:
+ sql = "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
+ break;
+ case XACT_SERIALIZABLE:
+ sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
+ break;
+ default:
+ elog(ERROR, "unexpected isolation level: %d", XactIsoLevel);
+ break;
+ }
+
+ elog(DEBUG3, "starting remote transaction with \"%s\"", sql);
+
+ res = PQexec(conn, sql);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not start transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+static void
+abort_remote_tx(PGconn *conn)
+{
+ PGresult *res;
+
+ elog(DEBUG3, "aborting remote transaction");
+
+ res = PQexec(conn, "ABORT TRANSACTION");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not abort transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+/*
+ * Release the connection.  If the connection is broken or its transaction has
+ * failed, it is discarded; otherwise, if the caller was the last user, any
+ * remote transaction on it is aborted and the connection is kept for reuse.
+ */
+void
+ReleaseConnection(PGconn *conn)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+
+ if (conn == NULL)
+ return;
+
+ /*
+	 * We have to scan the hash table sequentially, since we look the entry up
+	 * by the connection's address rather than by the hash key.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ if (entry->conn == conn)
+ {
+ hash_seq_term(&scan);
+ break;
+ }
+ }
+
+ /*
+	 * If the given connection is not found in the cache, it must be a dangling
+	 * pointer to an already-released connection.  Discarding a connection
+	 * because of a remote query error produces such a situation (see the
+	 * comments below).
+ */
+ if (entry == NULL)
+ return;
+
+ /*
+	 * If the connection being released is broken or its transaction has
+	 * failed, discard the connection to recover from the error.  PQfinish
+	 * leaves dangling pointers to the freed PGconn object, but they won't be
+	 * double-freed because their pointer values no longer match any cached
+	 * entry and are therefore ignored by the check above.
+	 *
+	 * A subsequent connection request via GetConnection will create a new
+	 * connection.
+ */
+ if (PQstatus(conn) != CONNECTION_OK ||
+ (PQtransactionStatus(conn) != PQTRANS_IDLE &&
+ PQtransactionStatus(conn) != PQTRANS_INTRANS))
+ {
+ elog(DEBUG3, "discarding connection: %s %s",
+ PQstatus(conn) == CONNECTION_OK ? "OK" : "NG",
+ PQtransactionStatus(conn) == PQTRANS_IDLE ? "IDLE" :
+ PQtransactionStatus(conn) == PQTRANS_ACTIVE ? "ACTIVE" :
+ PQtransactionStatus(conn) == PQTRANS_INTRANS ? "INTRANS" :
+ PQtransactionStatus(conn) == PQTRANS_INERROR ? "INERROR" :
+ "UNKNOWN");
+ PQfinish(conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ return;
+ }
+
+ /*
+	 * Decrease the reference counter of this connection.  Even if the caller
+	 * was the last referrer, we don't remove the entry from the cache.
+ */
+ entry->refs--;
+ if (entry->refs < 0)
+ entry->refs = 0; /* just in case */
+
+ /*
+ * If this connection uses remote transaction and there is no user other
+ * than the caller, abort the remote transaction and forget about it.
+ */
+ if (entry->use_tx && entry->refs == 0)
+ {
+ abort_remote_tx(conn);
+ entry->use_tx = false;
+ }
+}
+
+/*
+ * Clean the connection up via ResourceOwner.
+ */
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry = (ConnCacheEntry *) arg;
+
+ /* If the transaction was committed, don't close connections. */
+ if (isCommit)
+ return;
+
+ /*
+	 * We clean connections up in the post-lock phase because foreign
+	 * connections are a backend-internal resource.
+ */
+ if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+ return;
+
+ /*
+	 * We ignore cleanup for ResourceOwners other than the transaction's.  At
+	 * this point, the only such ResourceOwner is a Portal's.
+ */
+ if (CurrentResourceOwner != CurTransactionResourceOwner)
+ return;
+
+ /*
+	 * We don't need to clean up at the end of subtransactions, because the
+	 * transaction might still be recovered to a consistent state with
+	 * savepoints.
+ */
+ if (!isTopLevel)
+ return;
+
+ /*
+	 * At this point we must be aborting a top-level transaction.  Disconnect
+	 * all cached connections to clear out any error status and reset their
+	 * reference counters.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ elog(DEBUG3, "discard postgresql_fdw connection %p due to resowner cleanup",
+ entry->conn);
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+}
+
+/*
+ * Get list of connections currently active.
+ */
+Datum postgresql_fdw_get_connections(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_get_connections);
+Datum
+postgresql_fdw_get_connections(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ Tuplestorestate *tuplestore;
+ TupleDesc tupdesc;
+
+	/* We return the list of connections by storing them in a Tuplestore. */
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = NULL;
+ rsinfo->setDesc = NULL;
+
+ /* Create tuplestore and copy of TupleDesc in per-query context. */
+ MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+
+ tupdesc = CreateTemplateTupleDesc(2, false);
+ TupleDescInitEntry(tupdesc, 1, "srvid", OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, 2, "usesysid", OIDOID, -1, 0);
+ rsinfo->setDesc = tupdesc;
+
+ tuplestore = tuplestore_begin_heap(false, false, work_mem);
+ rsinfo->setResult = tuplestore;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+	 * Scan all cached connections and report the active ones.
+ */
+ if (ConnectionHash != NULL)
+ {
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ Datum values[2];
+ bool nulls[2];
+ HeapTuple tuple;
+
+ /* Ignore inactive connections */
+ if (PQstatus(entry->conn) != CONNECTION_OK)
+ continue;
+
+ /*
+ * Ignore other users' connections if current user isn't a
+ * superuser.
+ */
+ if (!superuser() && entry->userid != GetUserId())
+ continue;
+
+ values[0] = ObjectIdGetDatum(entry->serverid);
+ values[1] = ObjectIdGetDatum(entry->userid);
+ nulls[0] = false;
+ nulls[1] = false;
+
+			tuple = heap_form_tuple(tupdesc, values, nulls);
+ tuplestore_puttuple(tuplestore, tuple);
+ }
+ }
+ tuplestore_donestoring(tuplestore);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Discard the cached connection designated by the given server and user OIDs.
+ */
+Datum postgresql_fdw_disconnect(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_disconnect);
+Datum
+postgresql_fdw_disconnect(PG_FUNCTION_ARGS)
+{
+ Oid serverid = PG_GETARG_OID(0);
+ Oid userid = PG_GETARG_OID(1);
+ ConnCacheEntry key;
+ ConnCacheEntry *entry = NULL;
+ bool found;
+
+ /* Non-superuser can't discard other users' connection. */
+ if (!superuser() && userid != GetOuterUserId())
+ ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("only superuser can discard other user's connection")));
+
+ /*
+	 * If no connection has been established yet, or there is no such
+	 * connection, just return "NG" to indicate that nothing has been done.
+ */
+ if (ConnectionHash == NULL)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ key.serverid = serverid;
+ key.userid = userid;
+ entry = hash_search(ConnectionHash, &key, HASH_FIND, &found);
+ if (!found)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ /* Discard cached connection, and clear reference counter. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+
+ PG_RETURN_TEXT_P(cstring_to_text("OK"));
+}
diff --git a/contrib/postgresql_fdw/connection.h b/contrib/postgresql_fdw/connection.h
new file mode 100644
index 0000000..17355df
--- /dev/null
+++ b/contrib/postgresql_fdw/connection.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.h
+ * Connection management for postgresql_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/connection.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECTION_H
+#define CONNECTION_H
+
+#include "foreign/foreign.h"
+#include "libpq-fe.h"
+
+/*
+ * Connection management
+ */
+PGconn *GetConnection(ForeignServer *server, UserMapping *user, bool use_tx);
+void ReleaseConnection(PGconn *conn);
+
+#endif /* CONNECTION_H */
diff --git a/contrib/postgresql_fdw/deparse.c b/contrib/postgresql_fdw/deparse.c
new file mode 100644
index 0000000..698dbb8
--- /dev/null
+++ b/contrib/postgresql_fdw/deparse.c
@@ -0,0 +1,1203 @@
+/*-------------------------------------------------------------------------
+ *
+ * deparse.c
+ * query deparser for PostgreSQL
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/deparse.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/transam.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/var.h"
+#include "parser/parser.h"
+#include "parser/parsetree.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+#include "postgresql_fdw.h"
+
+/*
+ * Context for walking through the expression tree.
+ */
+typedef struct foreign_executable_cxt
+{
+ PlannerInfo *root;
+ RelOptInfo *foreignrel;
+ bool has_param;
+} foreign_executable_cxt;
+
+/*
+ * Get a string representation of a node which can be used in an SQL statement.
+ */
+static void deparseExpr(StringInfo buf, Expr *expr, PlannerInfo *root);
+static void deparseRelation(StringInfo buf, RangeTblEntry *rte);
+static void deparseVar(StringInfo buf, Var *node, PlannerInfo *root);
+static void deparseConst(StringInfo buf, Const *node, PlannerInfo *root);
+static void deparseBoolExpr(StringInfo buf, BoolExpr *node, PlannerInfo *root);
+static void deparseNullTest(StringInfo buf, NullTest *node, PlannerInfo *root);
+static void deparseDistinctExpr(StringInfo buf, DistinctExpr *node,
+ PlannerInfo *root);
+static void deparseRelabelType(StringInfo buf, RelabelType *node,
+ PlannerInfo *root);
+static void deparseFuncExpr(StringInfo buf, FuncExpr *node, PlannerInfo *root);
+static void deparseParam(StringInfo buf, Param *node, PlannerInfo *root);
+static void deparseScalarArrayOpExpr(StringInfo buf, ScalarArrayOpExpr *node,
+ PlannerInfo *root);
+static void deparseOpExpr(StringInfo buf, OpExpr *node, PlannerInfo *root);
+static void deparseArrayRef(StringInfo buf, ArrayRef *node, PlannerInfo *root);
+static void deparseArrayExpr(StringInfo buf, ArrayExpr *node, PlannerInfo *root);
+
+/*
+ * Determine whether an expression can be evaluated on remote side safely.
+ */
+static bool is_foreign_expr(PlannerInfo *root, RelOptInfo *baserel, Expr *expr,
+ bool *has_param);
+static bool foreign_expr_walker(Node *node, foreign_executable_cxt *context);
+static bool is_builtin(Oid procid);
+
+/*
+ * Deparse the query representation into an SQL statement suitable for the
+ * remote PostgreSQL server.  This function basically creates a simple query
+ * string consisting of only SELECT and FROM clauses.
+ *
+ * The remote SELECT clause contains only columns which are used in the target
+ * list or in local_conds (conditions which can't be pushed down and will be
+ * checked on the local side).
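+ *
+ * For example (a sketch), for a foreign table with columns (a, b, c) of which
+ * only b is referenced on the local side, the generated statement would look
+ * like:
+ *     SELECT NULL, b, NULL FROM some_schema.some_table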
+ */
+void
+deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds)
+{
+ RangeTblEntry *rte;
+ ListCell *lc;
+ StringInfoData foreign_relname;
+ bool first;
+ AttrNumber attr;
+ List *attr_used = NIL; /* List of AttNumber used in the query */
+
+ initStringInfo(buf);
+ initStringInfo(&foreign_relname);
+
+ /*
+	 * First of all, determine which columns should be retrieved for this scan.
+	 *
+	 * We do this before deparsing the SELECT clause because attributes used in
+	 * neither reltargetlist nor baserel->baserestrictinfo (quals evaluated
+	 * locally) can be replaced with the literal "NULL" in the SELECT clause,
+	 * reducing the overhead of tuple handling and data transfer.
+ */
+ foreach (lc, local_conds)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+ List *attrs;
+
+ /*
+		 * We need to know which attributes are used in quals evaluated
+		 * on the local server, because they must be listed in the
+		 * SELECT clause of the remote query.  We can ignore attributes
+		 * which are referenced only in ORDER BY/GROUP BY clauses because
+		 * such attributes have already been added to reltargetlist.
+ */
+ attrs = pull_var_clause((Node *) ri->clause,
+ PVC_RECURSE_AGGREGATES,
+ PVC_RECURSE_PLACEHOLDERS);
+ attr_used = list_union(attr_used, attrs);
+ }
+
+ /*
+ * deparse SELECT clause
+ *
+ * List attributes which are in either target list or local restriction.
+ * Unused attributes are replaced with a literal "NULL" for optimization.
+ *
+	 * Note that nothing is added for dropped columns, though the tuple
+	 * constructor function requires entries for them.  Such entries must be
+	 * initialized with NULL before calling the tuple constructor.
+ */
+ appendStringInfo(buf, "SELECT ");
+ rte = root->simple_rte_array[baserel->relid];
+ attr_used = list_union(attr_used, baserel->reltargetlist);
+ first = true;
+ for (attr = 1; attr <= baserel->max_attr; attr++)
+ {
+ Var *var = NULL;
+ ListCell *lc;
+
+ /* Ignore dropped attributes. */
+ if (get_rte_attribute_is_dropped(rte, attr))
+ continue;
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ first = false;
+
+ /*
+		 * We use a linear search here, but that shouldn't be a problem since
+		 * attr_used is not expected to become very large.
+ */
+ foreach (lc, attr_used)
+ {
+ var = lfirst(lc);
+ if (var->varattno == attr)
+ break;
+ var = NULL;
+ }
+ if (var != NULL)
+ deparseVar(buf, var, root);
+ else
+ appendStringInfo(buf, "NULL");
+ }
+ appendStringInfoChar(buf, ' ');
+
+ /*
+ * deparse FROM clause, including alias if any
+ */
+ appendStringInfo(buf, "FROM ");
+ deparseRelation(buf, root->simple_rte_array[baserel->relid]);
+}
+
+/*
+ * Examine each element in baserel's baserestrictinfo list, and classify them
+ * into three groups:
+ * - remote_conds: push-down safe, and do not contain any Param node
+ * - param_conds: push-down safe, but contain some Param node
+ * - local_conds: not push-down safe
+ *
+ * Only remote_conds can be used in remote EXPLAIN, while both remote_conds and
+ * param_conds can be used in the final remote query.
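+ *
+ * For example, a qual like "c1 = 1" typically ends up in remote_conds,
+ * "c1 = $1" (an EXECUTE parameter) in param_conds, and a qual calling a
+ * user-defined function in local_conds.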
+ */
+void
+classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds)
+{
+ ListCell *lc;
+ bool has_param;
+
+ Assert(remote_conds);
+ Assert(param_conds);
+ Assert(local_conds);
+
+ foreach(lc, baserel->baserestrictinfo)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ if (is_foreign_expr(root, baserel, ri->clause, &has_param))
+ {
+ if (has_param)
+ *param_conds = lappend(*param_conds, ri);
+ else
+ *remote_conds = lappend(*remote_conds, ri);
+ }
+ else
+ *local_conds = lappend(*local_conds, ri);
+ }
+}
+
+/*
+ * Deparse into buf a SELECT statement used to acquire sample rows of the
+ * given relation.
+ */
+void
+deparseAnalyzeSql(StringInfo buf, Relation rel)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+ char *nspname;
+ char *relname;
+ ForeignTable *table;
+
+ /* Deparse SELECT clause, use attribute name or colname option. */
+ appendStringInfo(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ colname = NameStr(tupdesc->attrs[i]->attname);
+ options = GetForeignColumnOptions(relid, tupdesc->attrs[i]->attnum);
+
+ foreach(lc, options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ appendStringInfo(buf, "%s", quote_identifier(colname));
+ first = false;
+ }
+
+ /*
+	 * Deparse the FROM clause, using the namespace and relation name, or the
+	 * nspname and relname options respectively if they are set.
+ */
+ nspname = get_namespace_name(get_rel_namespace(relid));
+ relname = get_rel_name(relid);
+ table = GetForeignTable(relid);
+ foreach(lc, table->options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ appendStringInfo(buf, " FROM %s.%s", quote_identifier(nspname),
+ quote_identifier(relname));
+}
+
+/*
+ * Deparse the given expression into buf.  The actual string construction is
+ * delegated to node-type-specific functions.
+ *
+ * Note that the switch statement of this function MUST match the one in
+ * foreign_expr_walker to avoid "unsupported expression" errors.
+ */
+static void
+deparseExpr(StringInfo buf, Expr *node, PlannerInfo *root)
+{
+ /*
+	 * This switch must match foreign_expr_walker.
+ */
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ deparseConst(buf, (Const *) node, root);
+ break;
+ case T_BoolExpr:
+ deparseBoolExpr(buf, (BoolExpr *) node, root);
+ break;
+ case T_NullTest:
+ deparseNullTest(buf, (NullTest *) node, root);
+ break;
+ case T_DistinctExpr:
+ deparseDistinctExpr(buf, (DistinctExpr *) node, root);
+ break;
+ case T_RelabelType:
+ deparseRelabelType(buf, (RelabelType *) node, root);
+ break;
+ case T_FuncExpr:
+ deparseFuncExpr(buf, (FuncExpr *) node, root);
+ break;
+ case T_Param:
+ deparseParam(buf, (Param *) node, root);
+ break;
+ case T_ScalarArrayOpExpr:
+ deparseScalarArrayOpExpr(buf, (ScalarArrayOpExpr *) node, root);
+ break;
+ case T_OpExpr:
+ deparseOpExpr(buf, (OpExpr *) node, root);
+ break;
+ case T_Var:
+ deparseVar(buf, (Var *) node, root);
+ break;
+ case T_ArrayRef:
+ deparseArrayRef(buf, (ArrayRef *) node, root);
+ break;
+ case T_ArrayExpr:
+ deparseArrayExpr(buf, (ArrayExpr *) node, root);
+ break;
+ default:
+ {
+ ereport(ERROR,
+ (errmsg("unsupported expression for deparse"),
+ errdetail("%s", nodeToString(node))));
+ }
+ break;
+ }
+}
+
+/*
+ * Deparse given Var node into buf. If the column has colname FDW option, use
+ * its value instead of attribute name.
+ */
+static void
+deparseVar(StringInfo buf, Var *node, PlannerInfo *root)
+{
+ RangeTblEntry *rte;
+ char *colname = NULL;
+ const char *q_colname = NULL;
+ List *options;
+ ListCell *lc;
+
+	/* node must not be any of OUTER_VAR, INNER_VAR and INDEX_VAR. */
+ Assert(node->varno >= 1 && node->varno <= root->simple_rel_array_size);
+
+ /* Get RangeTblEntry from array in PlannerInfo. */
+ rte = root->simple_rte_array[node->varno];
+
+ /*
+ * If the node is a column of a foreign table, and it has colname FDW
+ * option, use its value.
+ */
+ options = GetForeignColumnOptions(rte->relid, node->varattno);
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ /*
+	 * If the node refers to a column of a regular table, or it doesn't have
+	 * the colname FDW option, use the attribute name.
+ */
+ if (colname == NULL)
+ colname = get_attname(rte->relid, node->varattno);
+
+ q_colname = quote_identifier(colname);
+ appendStringInfo(buf, "%s", q_colname);
+}
+
+/*
+ * Deparse a RangeTblEntry node into buf. If rte represents a foreign table,
+ * use value of relname FDW option (if any) instead of relation's name.
+ * Similarly, nspname FDW option overrides schema name.
+ */
+static void
+deparseRelation(StringInfo buf, RangeTblEntry *rte)
+{
+ ForeignTable *table;
+ ListCell *lc;
+ const char *nspname = NULL; /* plain namespace name */
+ const char *relname = NULL; /* plain relation name */
+ const char *q_nspname; /* quoted namespace name */
+ const char *q_relname; /* quoted relation name */
+
+ /* obtain additional catalog information. */
+ table = GetForeignTable(rte->relid);
+
+ /*
+	 * Use the values of FDW options, if any, instead of the names of the
+	 * objects themselves.
+ */
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ /* Quote each identifier, if necessary. */
+ if (nspname == NULL)
+ nspname = get_namespace_name(get_rel_namespace(rte->relid));
+ q_nspname = quote_identifier(nspname);
+
+ if (relname == NULL)
+ relname = get_rel_name(rte->relid);
+ q_relname = quote_identifier(relname);
+
+ /* Construct relation reference into the buffer. */
+ appendStringInfo(buf, "%s.%s", q_nspname, q_relname);
+}
+
+/*
+ * Deparse the given constant value into buf.  This function has to be kept in
+ * sync with get_const_expr.
+ */
+static void
+deparseConst(StringInfo buf,
+ Const *node,
+ PlannerInfo *root)
+{
+ Oid typoutput;
+ bool typIsVarlena;
+ char *extval;
+ bool isfloat = false;
+ bool needlabel;
+
+ if (node->constisnull)
+ {
+ appendStringInfo(buf, "NULL");
+ return;
+ }
+
+ getTypeOutputInfo(node->consttype,
+ &typoutput, &typIsVarlena);
+ extval = OidOutputFunctionCall(typoutput, node->constvalue);
+
+ switch (node->consttype)
+ {
+ case ANYARRAYOID:
+ case ANYNONARRAYOID:
+ elog(ERROR, "anyarray and anyenum are not supported");
+ break;
+ case INT2OID:
+ case INT4OID:
+ case INT8OID:
+ case OIDOID:
+ case FLOAT4OID:
+ case FLOAT8OID:
+ case NUMERICOID:
+ {
+ /*
+ * No need to quote unless they contain special values such as
+				 * 'NaN'.
+ */
+ if (strspn(extval, "0123456789+-eE.") == strlen(extval))
+ {
+ if (extval[0] == '+' || extval[0] == '-')
+ appendStringInfo(buf, "(%s)", extval);
+ else
+ appendStringInfoString(buf, extval);
+ if (strcspn(extval, "eE.") != strlen(extval))
+ isfloat = true; /* it looks like a float */
+ }
+ else
+ appendStringInfo(buf, "'%s'", extval);
+ }
+ break;
+ case BITOID:
+ case VARBITOID:
+ appendStringInfo(buf, "B'%s'", extval);
+ break;
+ case BOOLOID:
+ if (strcmp(extval, "t") == 0)
+ appendStringInfoString(buf, "true");
+ else
+ appendStringInfoString(buf, "false");
+ break;
+
+ default:
+ {
+ const char *valptr;
+
+ appendStringInfoChar(buf, '\'');
+ for (valptr = extval; *valptr; valptr++)
+ {
+ char ch = *valptr;
+
+ /*
+					 * standard_conforming_strings of the remote session should
+					 * be set to the same value as in the local session.
+ */
+ if (SQL_STR_DOUBLE(ch, !standard_conforming_strings))
+ appendStringInfoChar(buf, ch);
+ appendStringInfoChar(buf, ch);
+ }
+ appendStringInfoChar(buf, '\'');
+ }
+ break;
+ }
+
+ /*
+ * Append ::typename unless the constant will be implicitly typed as the
+ * right type when it is read in.
+ *
+ * XXX this code has to be kept in sync with the behavior of the parser,
+ * especially make_const.
+ */
+ switch (node->consttype)
+ {
+ case BOOLOID:
+ case INT4OID:
+ case UNKNOWNOID:
+ needlabel = false;
+ break;
+ case NUMERICOID:
+ needlabel = !isfloat || (node->consttypmod >= 0);
+ break;
+ default:
+ needlabel = true;
+ break;
+ }
+ if (needlabel)
+ {
+ appendStringInfo(buf, "::%s",
+ format_type_with_typemod(node->consttype,
+ node->consttypmod));
+ }
+}
+
+static void
+deparseBoolExpr(StringInfo buf,
+ BoolExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ char *op = NULL; /* keep compiler quiet */
+ bool first;
+
+ switch (node->boolop)
+ {
+ case AND_EXPR:
+ op = "AND";
+ break;
+ case OR_EXPR:
+ op = "OR";
+ break;
+ case NOT_EXPR:
+ appendStringInfo(buf, "(NOT ");
+ deparseExpr(buf, list_nth(node->args, 0), root);
+ appendStringInfo(buf, ")");
+ return;
+ }
+
+ first = true;
+ appendStringInfo(buf, "(");
+ foreach(lc, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, " %s ", op);
+ deparseExpr(buf, (Expr *) lfirst(lc), root);
+ first = false;
+ }
+ appendStringInfo(buf, ")");
+}
+
+/*
+ * Deparse given IS [NOT] NULL test expression into buf.
+ */
+static void
+deparseNullTest(StringInfo buf,
+ NullTest *node,
+ PlannerInfo *root)
+{
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ if (node->nulltesttype == IS_NULL)
+ appendStringInfo(buf, " IS NULL)");
+ else
+ appendStringInfo(buf, " IS NOT NULL)");
+}
+
+static void
+deparseDistinctExpr(StringInfo buf,
+ DistinctExpr *node,
+ PlannerInfo *root)
+{
+ Assert(list_length(node->args) == 2);
+
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, " IS DISTINCT FROM ");
+ deparseExpr(buf, lsecond(node->args), root);
+}
+
+static void
+deparseRelabelType(StringInfo buf,
+ RelabelType *node,
+ PlannerInfo *root)
+{
+ char *typname;
+
+ Assert(node->arg);
+
+ /* We don't need to deparse cast when argument has same type as result. */
+ if (IsA(node->arg, Const) &&
+ ((Const *) node->arg)->consttype == node->resulttype &&
+ ((Const *) node->arg)->consttypmod == -1)
+ {
+ deparseExpr(buf, node->arg, root);
+ return;
+ }
+
+ typname = format_type_with_typemod(node->resulttype, node->resulttypmod);
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ appendStringInfo(buf, ")::%s", typname);
+}
+
+/*
+ * Deparse the given node, which represents a function call, into buf.  We
+ * handle only explicit function calls and explicit casts (coercions), because
+ * other cases are processed on the remote side if necessary.
+ *
+ * The function name (and type name) is always qualified by schema name to
+ * avoid problems caused by a different search_path setting on the remote side.
+ */
+static void
+deparseFuncExpr(StringInfo buf,
+ FuncExpr *node,
+ PlannerInfo *root)
+{
+ Oid pronamespace;
+ const char *schemaname;
+ const char *funcname;
+ ListCell *arg;
+ bool first;
+
+ pronamespace = get_func_namespace(node->funcid);
+ schemaname = quote_identifier(get_namespace_name(pronamespace));
+ funcname = quote_identifier(get_func_name(node->funcid));
+
+ if (node->funcformat == COERCE_EXPLICIT_CALL)
+ {
+ /* Function call, deparse all arguments recursively. */
+ appendStringInfo(buf, "%s.%s(", schemaname, funcname);
+ first = true;
+ foreach(arg, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(arg), root);
+ first = false;
+ }
+ appendStringInfoChar(buf, ')');
+ }
+ else if (node->funcformat == COERCE_EXPLICIT_CAST)
+ {
+ /* Explicit cast, deparse only first argument. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, ")::%s", funcname);
+ }
+ else
+ {
+ /* Implicit cast, deparse only first argument. */
+ deparseExpr(buf, linitial(node->args), root);
+ }
+}
+
+/*
+ * Deparse given Param node into buf.
+ *
+ * We don't renumber parameter id, because skipping $1 is not cause problem
+ * as far as we pass through all arguments.
+ */
+static void
+deparseParam(StringInfo buf,
+ Param *node,
+ PlannerInfo *root)
+{
+ Assert(node->paramkind == PARAM_EXTERN);
+
+ appendStringInfo(buf, "$%d", node->paramid);
+}
+
+/*
+ * Deparse the given ScalarArrayOpExpr expression into buf.  To avoid problems
+ * with operator precedence, we always parenthesize the whole expression.  Also
+ * we use OPERATOR(schema.operator) notation to identify the remote operator
+ * exactly.
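+ *
+ * For example (a sketch), a pushed-down qual of the form c1 = ANY (...) is
+ * deparsed as
+ *     (c1 OPERATOR(pg_catalog.=) ANY (...))
+ * with the array operand deparsed recursively.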
+ */
+static void
+deparseScalarArrayOpExpr(StringInfo buf,
+ ScalarArrayOpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ Expr *arg1;
+ Expr *arg2;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert(list_length(node->args) == 2);
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Extract operands. */
+ arg1 = linitial(node->args);
+ arg2 = lsecond(node->args);
+
+ /* Deparse fully qualified operator name. */
+ deparseExpr(buf, arg1, root);
+ appendStringInfo(buf, " OPERATOR(%s.%s) %s (",
+ opnspname, opname, node->useOr ? "ANY" : "ALL");
+ deparseExpr(buf, arg2, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse the given operator expression into buf.  To avoid problems with
+ * operator precedence, we always parenthesize the whole expression.  Also we
+ * use OPERATOR(schema.operator) notation to identify the remote operator
+ * exactly.
+ */
+static void
+deparseOpExpr(StringInfo buf,
+ OpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ char oprkind;
+ ListCell *arg;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ oprkind = form->oprkind;
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert((oprkind == 'r' && list_length(node->args) == 1) ||
+ (oprkind == 'l' && list_length(node->args) == 1) ||
+ (oprkind == 'b' && list_length(node->args) == 2));
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse first operand. */
+ arg = list_head(node->args);
+ if (oprkind == 'r' || oprkind == 'b')
+ {
+ deparseExpr(buf, lfirst(arg), root);
+ appendStringInfoChar(buf, ' ');
+ }
+
+ /* Deparse fully qualified operator name. */
+ appendStringInfo(buf, "OPERATOR(%s.%s)", opnspname, opname);
+
+ /* Deparse last operand. */
+ arg = list_tail(node->args);
+ if (oprkind == 'l' || oprkind == 'b')
+ {
+ appendStringInfoChar(buf, ' ');
+ deparseExpr(buf, lfirst(arg), root);
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+static void
+deparseArrayRef(StringInfo buf,
+ ArrayRef *node,
+ PlannerInfo *root)
+{
+ ListCell *lowlist_item;
+ ListCell *uplist_item;
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse referenced array expression first. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->refexpr, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Deparse subscripts expression. */
+ lowlist_item = list_head(node->reflowerindexpr); /* could be NULL */
+ foreach(uplist_item, node->refupperindexpr)
+ {
+ appendStringInfoChar(buf, '[');
+ if (lowlist_item)
+ {
+ deparseExpr(buf, lfirst(lowlist_item), root);
+ appendStringInfoChar(buf, ':');
+ lowlist_item = lnext(lowlist_item);
+ }
+ deparseExpr(buf, lfirst(uplist_item), root);
+ appendStringInfoChar(buf, ']');
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+
+/*
+ * Deparse the given ArrayExpr node into buf.
+ */
+static void
+deparseArrayExpr(StringInfo buf,
+ ArrayExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfo(buf, "ARRAY[");
+ foreach(lc, node->elements)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(lc), root);
+
+ first = false;
+ }
+ appendStringInfoChar(buf, ']');
+
+ /* If the array is empty, we need explicit cast to the array type. */
+ if (node->elements == NIL)
+ {
+ char *typname;
+
+ typname = format_type_with_typemod(node->array_typeid, -1);
+ appendStringInfo(buf, "::%s", typname);
+ }
+}
+
+/*
+ * Returns true if the given expr is safe to evaluate on the foreign server.
+ * If the result is true, the extra output has_param tells whether the
+ * expression contains any Param node.  This is useful to determine whether the
+ * expression can be used in remote EXPLAIN.
+ */
+static bool
+is_foreign_expr(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Expr *expr,
+ bool *has_param)
+{
+ foreign_executable_cxt context;
+ context.root = root;
+ context.foreignrel = baserel;
+ context.has_param = false;
+
+ /*
+	 * An expression which includes any mutable function can't be pushed down
+	 * because its result is not stable.  For example, pushing now() down to
+	 * the remote side could cause confusion due to clock offset.
+	 * If we get routine mapping infrastructure in a future release, we will be
+	 * able to choose functions to push down at a finer granularity.
+ */
+ if (contain_mutable_functions((Node *) expr))
+ {
+ elog(DEBUG3, "expr has mutable function");
+ return false;
+ }
+
+ /*
+	 * Check that the expression consists only of nodes which are known to be
+	 * safe to push down.
+ */
+ if (foreign_expr_walker((Node *) expr, &context))
+ return false;
+
+ /*
+	 * Tell the caller whether the given expression contains any Param node,
+	 * which can't be used in an EXPLAIN statement issued before the executor
+	 * starts.
+ */
+ *has_param = context.has_param;
+
+ return true;
+}
+
+/*
+ * Return true if the node tree includes any node which is not known to be
+ * safe to push down.
+ */
+static bool
+foreign_expr_walker(Node *node, foreign_executable_cxt *context)
+{
+ if (node == NULL)
+ return false;
+
+ /*
+	 * Special-case handling for List; expression_tree_walker handles List as
+	 * well as other Expr nodes.  For instance, a List is used in RestrictInfo
+	 * for the args of a FuncExpr node.
+	 *
+	 * Although the comments of expression_tree_walker mention that
+	 * RangeTblRef, FromExpr, JoinExpr, and SetOperationStmt are handled as
+	 * well, we don't care about them because they are not used in
+	 * RestrictInfo.  If one of them is passed in, the default label catches it
+	 * and gives up traversing.
+ */
+ if (IsA(node, List))
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *) node)
+ {
+ if (foreign_expr_walker(lfirst(lc), context))
+ return true;
+ }
+ return false;
+ }
+
+ /*
+	 * If the return type of the given expression is not built-in, it can't be
+	 * pushed down because it might have incompatible semantics on the remote
+	 * side.
+ */
+ if (!is_builtin(exprType(node)))
+ {
+ elog(DEBUG3, "expr has user-defined type");
+ return true;
+ }
+
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ /*
+ * Using anyarray and/or anyenum in remote query is not supported.
+ */
+ if (((Const *) node)->consttype == ANYARRAYOID ||
+ ((Const *) node)->consttype == ANYNONARRAYOID)
+ {
+ elog(DEBUG3, "expr has anyarray or anyenum");
+ return true;
+ }
+ break;
+ case T_BoolExpr:
+ case T_NullTest:
+ case T_DistinctExpr:
+ case T_RelabelType:
+ /*
+			 * These types of nodes are known to be safe to push down.
+			 * The subtree of the node, if any, is checked by the recursion at
+			 * the end of this function.
+ */
+ break;
+ /*
+		 * If the function used by the expression is not built-in, it can't be
+		 * pushed down because it might have incompatible semantics on the
+		 * remote side.
+ */
+ case T_FuncExpr:
+ {
+ FuncExpr *fe = (FuncExpr *) node;
+ if (!is_builtin(fe->funcid))
+ {
+ elog(DEBUG3, "expr has user-defined function");
+ return true;
+ }
+ }
+ break;
+ case T_Param:
+ /*
+			 * Only external parameters can be pushed down.
+ */
+ {
+ if (((Param *) node)->paramkind != PARAM_EXTERN)
+ {
+ elog(DEBUG3, "expr has non-external parameter");
+ return true;
+ }
+
+ /* Mark that this expression contains Param node. */
+ context->has_param = true;
+ }
+ break;
+ case T_ScalarArrayOpExpr:
+ /*
+		 * Only built-in operators can be pushed down.  In addition, the
+		 * underlying function must be built-in and immutable, but we don't
+		 * check volatility here; that check has already been done by
+		 * contain_mutable_functions.
+ */
+ {
+ ScalarArrayOpExpr *oe = (ScalarArrayOpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined scalar-array operator");
+ return true;
+ }
+
+ /*
+			 * If the operator takes a collatable type as operand, we push
+			 * down only "=" and "<>", which are not affected by collation.
+			 * Other operators might be collation-safe as well, but these two
+			 * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has scalar-array operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_OpExpr:
+ /*
+		 * Only built-in operators can be pushed down.  In addition, the
+		 * underlying function must be built-in and immutable, but we don't
+		 * check volatility here; that check has already been done by
+		 * contain_mutable_functions.
+ */
+ {
+ OpExpr *oe = (OpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined operator");
+ return true;
+ }
+
+ /*
+			 * If the operator takes a collatable type as operand, we push
+			 * down only "=" and "<>", which are not affected by collation.
+			 * Other operators might be collation-safe as well, but these two
+			 * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_Var:
+ /*
+		 * A Var can be pushed down if it belongs to the foreign table.
+		 * XXX Can a Var of another relation appear here?
+ */
+ {
+ Var *var = (Var *) node;
+ foreign_executable_cxt *f_context;
+
+ f_context = (foreign_executable_cxt *) context;
+ if (var->varno != f_context->foreignrel->relid ||
+ var->varlevelsup != 0)
+ {
+ elog(DEBUG3, "expr has var of other relation");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayRef:
+ /*
+ * ArrayRef which holds non-built-in typed elements can't be pushed
+ * down.
+ */
+ {
+				ArrayRef *ar = (ArrayRef *) node;
+
+ if (!is_builtin(ar->refelemtype))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+
+ /* Assignment should not be in restrictions. */
+ if (ar->refassgnexpr != NULL)
+ {
+ elog(DEBUG3, "expr has assignment");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayExpr:
+ /*
+ * ArrayExpr which holds non-built-in typed elements can't be pushed
+ * down.
+ */
+ {
+ if (!is_builtin(((ArrayExpr *) node)->element_typeid))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+ }
+ break;
+ default:
+ {
+ elog(DEBUG3, "expression is too complex: %s",
+ nodeToString(node));
+ return true;
+ }
+ break;
+ }
+
+ return expression_tree_walker(node, foreign_expr_walker, context);
+}
+
+/*
+ * Return true if given object is one of built-in objects.
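+ * Objects whose OIDs precede FirstNormalObjectId are created during initdb,
+ * so they can reasonably be assumed to exist, with the same semantics, on the
+ * remote PostgreSQL server as well.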
+ */
+static bool
+is_builtin(Oid oid)
+{
+ return (oid < FirstNormalObjectId);
+}
+
+/*
+ * Deparse a WHERE clause from the given list of RestrictInfos and append it to
+ * buf.  We assume that buf already holds an SQL statement to which a WHERE
+ * clause, or additional AND-ed conditions, can be appended.
+ *
+ * is_first should be true only for the first call for a given statement.
+ */
+void
+appendWhereClause(StringInfo buf,
+ bool is_first,
+ List *exprs,
+ PlannerInfo *root)
+{
+ bool first = true;
+ ListCell *lc;
+
+ foreach(lc, exprs)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ /* Connect expressions with "AND" and parenthesize whole condition. */
+ if (is_first && first)
+ appendStringInfo(buf, " WHERE ");
+ else
+ appendStringInfo(buf, " AND ");
+
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, ri->clause, root);
+ appendStringInfoChar(buf, ')');
+
+ first = false;
+ }
+}
diff --git a/contrib/postgresql_fdw/expected/postgresql_fdw.out b/contrib/postgresql_fdw/expected/postgresql_fdw.out
new file mode 100644
index 0000000..58e3530
--- /dev/null
+++ b/contrib/postgresql_fdw/expected/postgresql_fdw.out
@@ -0,0 +1,715 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+-- Clean up in case a prior regression run failed
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+DROP ROLE IF EXISTS postgresql_fdw_user;
+RESET client_min_messages;
+CREATE ROLE postgresql_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgresql_fdw_user';
+CREATE EXTENSION postgresql_fdw;
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgresql_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgresql_fdw
+ OPTIONS (dbname 'contrib_regression');
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgresql_fdw_user SERVER loopback2;
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+-- ===================================================================
+-- tests for postgresql_fdw_validator
+-- ===================================================================
+ALTER FOREIGN DATA WRAPPER postgresql_fdw OPTIONS (host 'value'); -- ERROR
+ERROR: invalid option "host"
+HINT: Valid options in this context are:
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure option
+ALTER SERVER loopback1 OPTIONS (
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER SERVER loopback1 OPTIONS (user 'value'); -- ERROR
+ERROR: invalid option "user"
+HINT: Valid options in this context are: authtype, service, connect_timeout, dbname, host, hostaddr, port, tty, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, requiressl, sslcompression, sslmode, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, krbsrvname, gsslib
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (host 'value'); -- ERROR
+ERROR: invalid option "host"
+HINT: Valid options in this context are: user, password
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (invalid 'value'); -- ERROR
+ERROR: invalid option "invalid"
+HINT: Valid options in this context are: nspname, relname
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (invalid 'value'); -- ERROR
+ERROR: invalid option "invalid"
+HINT: Valid options in this context are: colname
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+ List of foreign-data wrappers
+ Name | Owner | Handler | Validator | Access privileges | FDW Options | Description
+----------------+---------------------+------------------------+--------------------------+-------------------+-------------+-------------
+ postgresql_fdw | postgresql_fdw_user | postgresql_fdw_handler | postgresql_fdw_validator | | |
+(1 row)
+
+\des+
+ List of foreign servers
+ Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW Options | Description
+-----------+---------------------+----------------------+-------------------+------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
+ loopback1 | postgresql_fdw_user | postgresql_fdw | | | | (authtype 'value', service 'value', connect_timeout 'value', dbname 'value', host 'value', hostaddr 'value', port 'value', tty 'value', options 'value', application_name 'value', keepalives 'value', keepalives_idle 'value', keepalives_interval 'value', sslcompression 'value', sslmode 'value', sslcert 'value', sslkey 'value', sslrootcert 'value', sslcrl 'value') |
+ loopback2 | postgresql_fdw_user | postgresql_fdw | | | | (dbname 'contrib_regression') |
+(2 rows)
+
+\deu+
+ List of user mappings
+ Server | User name | FDW Options
+-----------+---------------------+-------------
+ loopback1 | public |
+ loopback2 | postgresql_fdw_user |
+(2 rows)
+
+\det+
+ List of foreign tables
+ Schema | Table | Server | FDW Options | Description
+--------+-------+-----------+--------------------------------+-------------
+ public | ft1 | loopback2 | (nspname 'S 1', relname 'T 1') |
+ public | ft2 | loopback2 | (nspname 'S 1', relname 'T 1') |
+(2 rows)
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- empty result
+SELECT * FROM ft1 WHERE false;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+----+----+----+----+----+----
+(0 rows)
+
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c7 >= '1'::bpchar)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 101)) AND (((c6)::text OPERATOR(pg_catalog.=) '1'::text))
+(3 rows)
+
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+ count
+-------
+ 1000
+(1 row)
+
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1
+-----
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+------+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1000 | 0 | 01000 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo
+(1 row)
+
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+ c1 | c2 | c3 | c4
+----+----+-------+------------------------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST
+(10 rows)
+
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+ ?column? | ?column?
+----------+----------
+ fixed |
+(1 row)
+
+-- user-defined operator/function
+CREATE FUNCTION postgresql_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgresql_fdw_abs(t1.c2);
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 = postgresql_fdw_abs(c2))
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 === c2)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) pg_catalog.abs(c2)))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) c2))
+(2 rows)
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 100)) AND ((c2 OPERATOR(pg_catalog.=) 0))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((pg_catalog.round(pg_catalog.abs("C 1"), 0) OPERATOR(pg_catalog.=) 1::numeric))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) (OPERATOR(pg_catalog.-) "C 1")))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((1::numeric OPERATOR(pg_catalog.=) ("C 1" OPERATOR(pg_catalog.!))))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ANY (ARRAY[c2, 1, ("C 1" OPERATOR(pg_catalog.+) 0)])))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ((ARRAY["C 1", c2, 3])[1])))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c8 = 'foo'::user_enum)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 2))
+(5 rows)
+
+EXECUTE st1(1, 1);
+ c3 | c3
+-------+-------
+ 00001 | 00001
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ -> Foreign Scan on ft2 t2
+ Filter: (date_part('dow'::text, c4) = 6::double precision)
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10))
+(10 rows)
+
+EXECUTE st2(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10)) AND ((pg_catalog.date_part('dow'::text, c5) OPERATOR(pg_catalog.=) 6::double precision))
+(9 rows)
+
+EXECUTE st3(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st3(20, 30);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 23 | 3 | 00023 | Sat Jan 24 00:00:00 1970 PST | Sat Jan 24 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) $1))
+(2 rows)
+
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+ f_test
+--------
+ 100
+(1 row)
+
+DROP FUNCTION f_test(int);
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgresql_fdw_connections;
+ srvname | usename
+-----------+---------------------
+ loopback2 | postgresql_fdw_user
+(1 row)
+
+SELECT postgresql_fdw_disconnect(srvid, usesysid) FROM postgresql_fdw_get_connections();
+ postgresql_fdw_disconnect
+---------------------------
+ OK
+(1 row)
+
+SELECT srvname, usename FROM postgresql_fdw_connections;
+ srvname | usename
+---------+---------
+(0 rows)
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ERROR: invalid input syntax for integer: "1970-01-02 00:00:00"
+CONTEXT: column c5 of foreign table ft1
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+(1 row)
+
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ERROR: could not execute remote query
+DETAIL: ERROR: division by zero
+
+HINT: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((1 OPERATOR(pg_catalog./) ("C 1" OPERATOR(pg_catalog.-) 1)) OPERATOR(pg_catalog.>) 0))
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+COMMIT;
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+SELECT srvname FROM postgresql_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgresql_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table "S 1"."T 1"
+drop cascades to table "S 1"."T 2"
+DROP TYPE user_enum CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to foreign table ft1 column c8
+drop cascades to foreign table ft2 column c8
+DROP EXTENSION postgresql_fdw CASCADE;
+NOTICE: drop cascades to 6 other objects
+DETAIL: drop cascades to server loopback1
+drop cascades to user mapping for public
+drop cascades to server loopback2
+drop cascades to user mapping for postgresql_fdw_user
+drop cascades to foreign table ft1
+drop cascades to foreign table ft2
+\c
+DROP ROLE postgresql_fdw_user;
diff --git a/contrib/postgresql_fdw/option.c b/contrib/postgresql_fdw/option.c
new file mode 100644
index 0000000..9e1f0e2
--- /dev/null
+++ b/contrib/postgresql_fdw/option.c
@@ -0,0 +1,222 @@
+/*-------------------------------------------------------------------------
+ *
+ * option.c
+ * FDW option handling
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/option.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "fmgr.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+
+#include "postgresql_fdw.h"
+
+/*
+ * SQL functions
+ */
+extern Datum postgresql_fdw_validator(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_validator);
+
+/*
+ * Describes the valid options for objects that use this wrapper.
+ */
+typedef struct PgsqlFdwOption
+{
+ const char *optname;
+ Oid optcontext; /* Oid of catalog in which options may appear */
+ bool is_libpq_opt; /* true if it's used in libpq */
+} PgsqlFdwOption;
+
+/*
+ * Valid options for postgresql_fdw.
+ */
+static PgsqlFdwOption valid_options[] = {
+
+ /*
+ * Options for libpq connection.
+	 * Note: This list should be kept in sync with PQconninfoOptions in
+	 * interfaces/libpq/fe-connect.c; the order here follows that list.
+	 *
+	 * Some libpq connection options are not useful here and are not accepted
+	 * by postgresql_fdw:
+	 *   client_encoding: set to the local database encoding automatically
+	 *   fallback_application_name: fixed to "postgresql_fdw"
+	 *   replication: postgresql_fdw is never a replication client
+ */
+ {"authtype", ForeignServerRelationId, true},
+ {"service", ForeignServerRelationId, true},
+ {"user", UserMappingRelationId, true},
+ {"password", UserMappingRelationId, true},
+ {"connect_timeout", ForeignServerRelationId, true},
+ {"dbname", ForeignServerRelationId, true},
+ {"host", ForeignServerRelationId, true},
+ {"hostaddr", ForeignServerRelationId, true},
+ {"port", ForeignServerRelationId, true},
+#ifdef NOT_USED
+ {"client_encoding", ForeignServerRelationId, true},
+#endif
+ {"tty", ForeignServerRelationId, true},
+ {"options", ForeignServerRelationId, true},
+ {"application_name", ForeignServerRelationId, true},
+#ifdef NOT_USED
+ {"fallback_application_name", ForeignServerRelationId, true},
+#endif
+ {"keepalives", ForeignServerRelationId, true},
+ {"keepalives_idle", ForeignServerRelationId, true},
+ {"keepalives_interval", ForeignServerRelationId, true},
+ {"keepalives_count", ForeignServerRelationId, true},
+ {"requiressl", ForeignServerRelationId, true},
+ {"sslcompression", ForeignServerRelationId, true},
+ {"sslmode", ForeignServerRelationId, true},
+ {"sslcert", ForeignServerRelationId, true},
+ {"sslkey", ForeignServerRelationId, true},
+ {"sslrootcert", ForeignServerRelationId, true},
+ {"sslcrl", ForeignServerRelationId, true},
+ {"requirepeer", ForeignServerRelationId, true},
+ {"krbsrvname", ForeignServerRelationId, true},
+ {"gsslib", ForeignServerRelationId, true},
+#ifdef NOT_USED
+ {"replication", ForeignServerRelationId, true},
+#endif
+
+ /*
+ * Options for translation of object names.
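+	 *
+	 * For example (hypothetical objects), a foreign table created with
+	 *   CREATE FOREIGN TABLE ft (id int OPTIONS (colname 'user_id'))
+	 *     SERVER loopback OPTIONS (nspname 'public', relname 'users');
+	 * would be deparsed with "public"."users" as the remote relation and
+	 * user_id as the remote column name.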
+ */
+ {"nspname", ForeignTableRelationId, false},
+ {"relname", ForeignTableRelationId, false},
+ {"colname", AttributeRelationId, false},
+
+ /* Terminating entry --- MUST BE LAST */
+ {NULL, InvalidOid, false}
+};
+
+/*
+ * Helper functions
+ */
+static bool is_valid_option(const char *optname, Oid context);
+
+/*
+ * Validate the generic options given to a FOREIGN DATA WRAPPER, SERVER,
+ * USER MAPPING or FOREIGN TABLE that uses postgresql_fdw.
+ *
+ * Raise an ERROR if the option or its value is considered invalid.
+ */
+Datum
+postgresql_fdw_validator(PG_FUNCTION_ARGS)
+{
+ List *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
+ Oid catalog = PG_GETARG_OID(1);
+ ListCell *cell;
+
+ /*
+ * Check that only options supported by postgresql_fdw, and allowed for the
+ * current object type, are given.
+ */
+ foreach(cell, options_list)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (!is_valid_option(def->defname, catalog))
+ {
+ PgsqlFdwOption *opt;
+ StringInfoData buf;
+
+ /*
+			 * Unknown option specified; complain about it.  Provide a hint
+			 * with a list of the valid options for the object.
+ */
+ initStringInfo(&buf);
+ for (opt = valid_options; opt->optname; opt++)
+ {
+ if (catalog == opt->optcontext)
+ appendStringInfo(&buf, "%s%s", (buf.len > 0) ? ", " : "",
+ opt->optname);
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
+ errmsg("invalid option \"%s\"", def->defname),
+ errhint("Valid options in this context are: %s",
+ buf.data)));
+ }
+ }
+
+ /*
+	 * We don't check option-specific limitations here; they will be validated
+	 * at execution time.
+ */
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Check whether the given option is one of the valid postgresql_fdw options.
+ * context is the Oid of the catalog holding the object the option is for.
+ */
+static bool
+is_valid_option(const char *optname, Oid context)
+{
+ PgsqlFdwOption *opt;
+
+ for (opt = valid_options; opt->optname; opt++)
+ {
+ if (context == opt->optcontext && strcmp(opt->optname, optname) == 0)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Check whether the given option is one of the libpq connection options
+ * accepted by postgresql_fdw.
+ */
+static bool
+is_libpq_option(const char *optname)
+{
+ PgsqlFdwOption *opt;
+
+ for (opt = valid_options; opt->optname; opt++)
+ {
+ if (strcmp(opt->optname, optname) == 0 && opt->is_libpq_opt)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Generate key-value arrays containing only the libpq options from a list
+ * that may contain any kind of options.  Returns the number of options found.
+ */
+int
+ExtractConnectionOptions(List *defelems, const char **keywords, const char **values)
+{
+ ListCell *lc;
+ int i;
+
+ i = 0;
+ foreach(lc, defelems)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (is_libpq_option(d->defname))
+ {
+ keywords[i] = d->defname;
+ values[i] = defGetString(d);
+ i++;
+ }
+ }
+ return i;
+}
+
diff --git a/contrib/postgresql_fdw/postgresql_fdw--1.0.sql b/contrib/postgresql_fdw/postgresql_fdw--1.0.sql
new file mode 100644
index 0000000..965cb85
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw--1.0.sql
@@ -0,0 +1,39 @@
+/* contrib/postgresql_fdw/postgresql_fdw--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION postgresql_fdw" to load this file. \quit
+
+CREATE FUNCTION postgresql_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgresql_fdw_validator(text[], oid)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER postgresql_fdw
+ HANDLER postgresql_fdw_handler
+ VALIDATOR postgresql_fdw_validator;
+
+/* connection management functions and view */
+CREATE FUNCTION postgresql_fdw_get_connections(out srvid oid, out usesysid oid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgresql_fdw_disconnect(oid, oid)
+RETURNS text
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE VIEW postgresql_fdw_connections AS
+SELECT c.srvid srvid,
+ s.srvname srvname,
+ c.usesysid usesysid,
+ pg_get_userbyid(c.usesysid) usename
+ FROM postgresql_fdw_get_connections() c
+ JOIN pg_catalog.pg_foreign_server s ON (s.oid = c.srvid);
+GRANT SELECT ON postgresql_fdw_connections TO public;
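+
+-- Example usage (these calls mirror the regression test expected output):
+--   SELECT srvname, usename FROM postgresql_fdw_connections;
+--   SELECT postgresql_fdw_disconnect(srvid, usesysid)
+--     FROM postgresql_fdw_get_connections();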
+
diff --git a/contrib/postgresql_fdw/postgresql_fdw.c b/contrib/postgresql_fdw/postgresql_fdw.c
new file mode 100644
index 0000000..a023cb8
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw.c
@@ -0,0 +1,1370 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgresql_fdw.c
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/postgresql_fdw.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "commands/explain.h"
+#include "commands/vacuum.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+#include "postgresql_fdw.h"
+#include "connection.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * Cost to establish a connection.
+ * XXX: should be configurable per server?
+ */
+#define CONNECTION_COSTS 100.0
+
+/*
+ * Cost to transfer 1 byte from remote server.
+ * XXX: should be configurable per server?
+ */
+#define TRANSFER_COSTS_PER_BYTE 0.001
+
+/*
+ * FDW-specific information for RelOptInfo.fdw_private. This is used to pass
+ * information from pgsqlGetForeignRelSize to pgsqlGetForeignPaths.
+ */
+typedef struct PgsqlFdwPlanState {
+ /*
+ * These are generated in GetForeignRelSize, and also used in subsequent
+ * GetForeignPaths.
+ */
+ StringInfoData sql;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds;
+ List *param_conds;
+ List *local_conds;
+
+ /* Cached catalog information. */
+ ForeignTable *table;
+ ForeignServer *server;
+} PgsqlFdwPlanState;
+
+/*
+ * Index of FDW-private information stored in fdw_private list.
+ *
+ * We store various information in ForeignScan.fdw_private to pass it across
+ * the boundary between planner and executor.  Currently the fdw_private list
+ * holds the items below:
+ *
+ * 1) plain SELECT statement
+ *
+ * These items are indexed with the enum FdwPrivateIndex, so an item can be
+ * accessed directly via list_nth().  For example, the SELECT statement is
+ * fetched with:
+ *   sql = list_nth(fdw_private, FdwPrivateSelectSql)
+ */
+enum FdwPrivateIndex {
+ /* SQL statements */
+ FdwPrivateSelectSql,
+
+ /* # of elements stored in the list fdw_private */
+ FdwPrivateNum,
+};
+
+/*
+ * Describes the attribute for which data conversion fails.
+ */
+typedef struct ErrorPos {
+ Oid relid; /* oid of the foreign table */
+ AttrNumber cur_attno; /* attribute number under process */
+} ErrorPos;
+
+/*
+ * Describes an execution state of a foreign scan against a foreign table
+ * using postgresql_fdw.
+ */
+typedef struct PgsqlFdwExecutionState
+{
+ List *fdw_private; /* FDW-private information */
+
+ /* for remote query execution */
+ PGconn *conn; /* connection for the scan */
+ Oid *param_types; /* type array of external parameter */
+ const char **param_values; /* value array of external parameter */
+
+ /* for tuple generation. */
+ AttrNumber attnum; /* # of non-dropped attribute */
+ Datum *values; /* column value buffer */
+ bool *nulls; /* column null indicator buffer */
+ AttInMetadata *attinmeta; /* attribute metadata */
+
+ /* for storing result tuples */
+ MemoryContext scan_cxt; /* context for per-scan lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+ Tuplestorestate *tuples; /* result of the scan */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PgsqlFdwExecutionState;
+
+/*
+ * Describes a state of analyze request for a foreign table.
+ */
+typedef struct PgsqlAnalyzeState
+{
+ /* for tuple generation. */
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+ Datum *values;
+ bool *nulls;
+
+ /* for random sampling */
+ HeapTuple *rows; /* result buffer */
+ int targrows; /* target # of sample rows */
+ int numrows; /* # of samples collected */
+ double samplerows; /* # of rows fetched */
+ double rowstoskip; /* # of rows skipped before next sample */
+ double rstate; /* random state */
+
+ /* for storing result tuples */
+ MemoryContext anl_cxt; /* context for per-analyze lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PgsqlAnalyzeState;
+
+/*
+ * SQL functions
+ */
+extern Datum postgresql_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgresql_fdw_handler);
+
+/*
+ * FDW callback routines
+ */
+static void pgsqlGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static void pgsqlGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static ForeignScan *pgsqlGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses);
+static void pgsqlExplainForeignScan(ForeignScanState *node, ExplainState *es);
+static void pgsqlBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *pgsqlIterateForeignScan(ForeignScanState *node);
+static void pgsqlReScanForeignScan(ForeignScanState *node);
+static void pgsqlEndForeignScan(ForeignScanState *node);
+static bool pgsqlAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages);
+
+/*
+ * Helper functions
+ */
+static void get_remote_estimate(const char *sql,
+ PGconn *conn,
+ double *rows,
+ int *width,
+ Cost *startup_cost,
+ Cost *total_cost);
+static void adjust_costs(double rows, int width,
+ Cost *startup_cost, Cost *total_cost);
+static void execute_query(ForeignScanState *node);
+static void query_row_processor(PGresult *res, ForeignScanState *node,
+ bool first);
+static void analyze_row_processor(PGresult *res, PgsqlAnalyzeState *astate,
+ bool first);
+static void postgresql_fdw_error_callback(void *arg);
+static int pgsqlAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows);
+
+/* Exported functions not declared in postgresql_fdw.h. */
+void _PG_init(void);
+void _PG_fini(void);
+
+/*
+ * Module-specific initialization.
+ */
+void
+_PG_init(void)
+{
+}
+
+/*
+ * Module-specific clean up.
+ */
+void
+_PG_fini(void)
+{
+}
+
+/*
+ * Foreign-data wrapper handler function: return a struct with pointers
+ * to my callback routines.
+ */
+Datum
+postgresql_fdw_handler(PG_FUNCTION_ARGS)
+{
+ FdwRoutine *routine = makeNode(FdwRoutine);
+
+ /* Required handler functions. */
+ routine->GetForeignRelSize = pgsqlGetForeignRelSize;
+ routine->GetForeignPaths = pgsqlGetForeignPaths;
+ routine->GetForeignPlan = pgsqlGetForeignPlan;
+ routine->ExplainForeignScan = pgsqlExplainForeignScan;
+ routine->BeginForeignScan = pgsqlBeginForeignScan;
+ routine->IterateForeignScan = pgsqlIterateForeignScan;
+ routine->ReScanForeignScan = pgsqlReScanForeignScan;
+ routine->EndForeignScan = pgsqlEndForeignScan;
+
+ /* Optional handler functions. */
+ routine->AnalyzeForeignTable = pgsqlAnalyzeForeignTable;
+
+ PG_RETURN_POINTER(routine);
+}
+
+/*
+ * pgsqlGetForeignRelSize
+ * Estimate # of rows and width of the result of the scan
+ *
+ * Here we estimate the number of rows returned by the scan in two steps.  In
+ * the first step, we execute a remote EXPLAIN command to obtain the number of
+ * rows returned from the remote side.  In the second step, we calculate the
+ * selectivity of the filtering done on the local side and adjust the first
+ * estimate accordingly.
+ *
+ * We have to fetch some catalog objects and generate the remote query string
+ * here, so we store such expensive information in the FDW-private area of
+ * RelOptInfo and pass it to subsequent functions for reuse.
+ */
+static void
+pgsqlGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ PgsqlFdwPlanState *fpstate;
+ StringInfo sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ PGconn *conn;
+ double rows;
+ int width;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds = NIL;
+ List *param_conds = NIL;
+ List *local_conds = NIL;
+ Selectivity sel;
+
+ /*
+ * We use PgsqlFdwPlanState to pass various information to subsequent
+ * functions.
+ */
+ fpstate = palloc0(sizeof(PgsqlFdwPlanState));
+ initStringInfo(&fpstate->sql);
+ sql = &fpstate->sql;
+
+ /* Retrieve catalog objects which are necessary to estimate rows. */
+ table = GetForeignTable(foreigntableid);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+
+ /*
+	 * Construct a remote query consisting of SELECT, FROM, and WHERE clauses.
+	 * Conditions containing any Param node are excluded here because
+	 * placeholders can't be used in an EXPLAIN statement; such conditions are
+	 * appended later.
+ */
+ classifyConditions(root, baserel, &remote_conds, ¶m_conds,
+ &local_conds);
+ deparseSimpleSql(sql, root, baserel, local_conds);
+ if (list_length(remote_conds) > 0)
+ appendWhereClause(sql, true, remote_conds, root);
+ elog(DEBUG3, "Query SQL: %s", sql->data);
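+
+	/*
+	 * At this point sql typically holds something like (taken from the
+	 * regression test expected output above):
+	 *   SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+	 *     WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+	 */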
+
+ conn = GetConnection(server, user, false);
+ get_remote_estimate(sql->data, conn, &rows, &width,
+ &startup_cost, &total_cost);
+ ReleaseConnection(conn);
+ if (list_length(param_conds) > 0)
+ appendWhereClause(sql, !(list_length(remote_conds) > 0), param_conds,
+ root);
+
+ /*
+	 * Estimate the selectivity of conditions which are not used in the remote
+	 * EXPLAIN by calling clauselist_selectivity().  The best we can do for
+	 * parameterized conditions is to estimate selectivity on the basis of
+	 * local statistics.  When we actually obtain result rows, such conditions
+	 * are deparsed into the remote query and reduce the rows transferred.
+ */
+ sel = 1.0;
+ sel *= clauselist_selectivity(root, param_conds,
+ baserel->relid, JOIN_INNER, NULL);
+ sel *= clauselist_selectivity(root, local_conds,
+ baserel->relid, JOIN_INNER, NULL);
+ baserel->rows = rows * sel;
+ baserel->width = width;
+
+ /*
+	 * Pack the obtained information into an object and store it in the
+	 * FDW-private area of RelOptInfo to pass it to subsequent functions.
+ */
+ fpstate->startup_cost = startup_cost;
+ fpstate->total_cost = total_cost;
+ fpstate->remote_conds = remote_conds;
+ fpstate->param_conds = param_conds;
+ fpstate->local_conds = local_conds;
+ fpstate->table = table;
+ fpstate->server = server;
+ baserel->fdw_private = (void *) fpstate;
+}
+
+/*
+ * pgsqlGetForeignPaths
+ * Create possible scan paths for a scan on the foreign table
+ */
+static void
+pgsqlGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ PgsqlFdwPlanState *fpstate = (PgsqlFdwPlanState *) baserel->fdw_private;
+ ForeignPath *path;
+ Cost startup_cost;
+ Cost total_cost;
+ List *fdw_private;
+
+ /*
+	 * We have cost values estimated on the remote side, so adjust them to
+	 * account for the additional work needed to complete the scan, such as
+	 * sending the query, transferring the result, and local filtering.
+	 *
+	 * XXX We assume that remote cost factors are the same as local ones, but
+	 * it might be worth making them configurable.
+ */
+ startup_cost = fpstate->startup_cost;
+ total_cost = fpstate->total_cost;
+ adjust_costs(baserel->rows, baserel->width, &startup_cost, &total_cost);
+
+ /* Pass SQL statement from planner to executor through FDW private area. */
+ fdw_private = list_make1(makeString(fpstate->sql.data));
+
+ /*
+ * Create simplest ForeignScan path node and add it to baserel. This path
+ * corresponds to SeqScan path of regular tables.
+ */
+ path = create_foreignscan_path(root, baserel,
+ baserel->rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no outer rel either */
+ fdw_private);
+ add_path(baserel, (Path *) path);
+
+ /*
+ * XXX We can consider sorted path or parameterized path here if we know
+ * that foreign table is indexed on remote end. For this purpose, we
+ * might have to support FOREIGN INDEX to represent possible sets of sort
+ * keys and/or filtering.
+ */
+}
+
+/*
+ * pgsqlGetForeignPlan
+ * Create ForeignScan plan node which implements selected best path
+ */
+static ForeignScan *
+pgsqlGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses)
+{
+ PgsqlFdwPlanState *fpstate = (PgsqlFdwPlanState *) baserel->fdw_private;
+ Index scan_relid = baserel->relid;
+ List *fdw_private = NIL;
+ List *fdw_exprs = NIL;
+ List *local_exprs = NIL;
+ ListCell *lc;
+
+ /*
+	 * We need lists of Expr nodes rather than lists of RestrictInfo.  At this
+	 * point we can merge remote_conds and param_conds into fdw_exprs, because
+	 * both are evaluated on the remote side by the actual remote query.
+ */
+ foreach(lc, fpstate->remote_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->param_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->local_conds)
+ local_exprs = lappend(local_exprs,
+ ((RestrictInfo *) lfirst(lc))->clause);
+
+ /*
+	 * Make a list containing the SELECT statement, to be passed to the
+	 * executor with the plan node for later use.
+ */
+ fdw_private = lappend(fdw_private, makeString(fpstate->sql.data));
+
+ /*
+ * Create the ForeignScan node from target list, local filtering
+ * expressions, remote filtering expressions, and FDW private information.
+ *
+	 * We remove expressions which are evaluated on the remote side from the
+	 * qual of the scan node to avoid redundant filtering.  Such filter
+	 * reduction can be done only here, after the best path has been chosen,
+	 * because baserestrictinfo in RelOptInfo is shared by all possible paths
+	 * until the best path is chosen.
+ */
+ return make_foreignscan(tlist,
+ local_exprs,
+ scan_relid,
+ fdw_exprs,
+ fdw_private);
+}
+
+/*
+ * pgsqlExplainForeignScan
+ * Produce extra output for EXPLAIN
+ */
+static void
+pgsqlExplainForeignScan(ForeignScanState *node, ExplainState *es)
+{
+ List *fdw_private;
+ char *sql;
+
+ fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+ sql = strVal(list_nth(fdw_private, FdwPrivateSelectSql));
+ ExplainPropertyText("Remote SQL", sql, es);
+}
+
+/*
+ * pgsqlBeginForeignScan
+ * Initiate access to a foreign PostgreSQL table.
+ */
+static void
+pgsqlBeginForeignScan(ForeignScanState *node, int eflags)
+{
+ PgsqlFdwExecutionState *festate;
+ PGconn *conn;
+ Oid relid;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+ /*
+ * Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
+ */
+ if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+ return;
+
+ /*
+ * Save state in node->fdw_state.
+ */
+ festate = (PgsqlFdwExecutionState *) palloc(sizeof(PgsqlFdwExecutionState));
+ festate->fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+
+ /*
+ * Create contexts for per-scan tuplestore under per-query context.
+ */
+ festate->scan_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgresql_fdw per-scan data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ festate->temp_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgresql_fdw temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ /*
+	 * Get a connection to the foreign server.  The connection manager will
+	 * establish a new connection if necessary.
+ */
+ relid = RelationGetRelid(node->ss.ss_currentRelation);
+ table = GetForeignTable(relid);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+ festate->conn = conn;
+
+ /* Result will be filled in first Iterate call. */
+ festate->tuples = NULL;
+
+ /* Allocate buffers for column values. */
+ {
+ TupleDesc tupdesc = slot->tts_tupleDescriptor;
+ festate->values = palloc(sizeof(Datum) * tupdesc->natts);
+ festate->nulls = palloc(sizeof(bool) * tupdesc->natts);
+ festate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ }
+
+ /*
+ * Allocate buffers for query parameters.
+ *
+	 * ParamListInfo might include entries for pseudo-parameters such as
+	 * PL/pgSQL's FOUND variable, but we don't worry about that here because
+	 * the wasted space is small.
+ */
+ {
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+
+ if (numParams > 0)
+ {
+ festate->param_types = palloc0(sizeof(Oid) * numParams);
+ festate->param_values = palloc0(sizeof(char *) * numParams);
+ }
+ else
+ {
+ festate->param_types = NULL;
+ festate->param_values = NULL;
+ }
+ }
+
+ /* Remember which foreign table we are scanning. */
+ festate->errpos.relid = relid;
+
+ /* Store FDW-specific state into ForeignScanState */
+ node->fdw_state = (void *) festate;
+
+ return;
+}
+
+/*
+ * pgsqlIterateForeignScan
+ * Retrieve next row from the result set, or clear tuple slot to indicate
+ * EOF.
+ *
+ * Note that we switch to the per-scan context when retrieving tuples from
+ * the tuplestore so that the returned tuple survives until the next
+ * iteration, because the previous tuple is released implicitly via
+ * ExecClearTuple.  If the tuple were retrieved in CurrentMemoryContext (a
+ * per-tuple context), ExecClearTuple would free a dangling pointer.
+ */
+static TupleTableSlot *
+pgsqlIterateForeignScan(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+
+ /*
+ * If this is the first call after Begin or ReScan, we need to execute
+ * remote query and get result set.
+ */
+ if (festate->tuples == NULL)
+ execute_query(node);
+
+ /*
+ * If tuples are still left in tuplestore, just return next tuple from it.
+ *
+	 * It is necessary to switch to the per-scan context to keep the returned
+	 * tuple valid until the next IterateForeignScan call, because it will be
+	 * released with ExecClearTuple then.  Otherwise the tuple would be
+	 * allocated in the per-tuple context and might be freed twice.
+ *
+ * If we don't have any result in tuplestore, clear result slot to tell
+ * executor that this scan is over.
+ */
+ MemoryContextSwitchTo(festate->scan_cxt);
+ tuplestore_gettupleslot(festate->tuples, true, false, slot);
+ MemoryContextSwitchTo(oldcontext);
+
+ return slot;
+}
+
+/*
+ * pgsqlReScanForeignScan
+ * - Restart this scan by clearing old results and set re-execute flag.
+ */
+static void
+pgsqlReScanForeignScan(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+
+	/* If we don't have a valid result yet, there is nothing to do. */
+ if (festate->tuples == NULL)
+ return;
+
+ /*
+	 * Rewinding the current result set is enough.
+ */
+ tuplestore_rescan(festate->tuples);
+}
+
+/*
+ * pgsqlEndForeignScan
+ * Finish scanning foreign table and dispose objects used for this scan
+ */
+static void
+pgsqlEndForeignScan(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+
+ /* if festate is NULL, we are in EXPLAIN; nothing to do */
+ if (festate == NULL)
+ return;
+
+ /*
+	 * The connection used for this scan must stay valid until the end of the
+	 * scan so that the lifespan of the remote transaction matches that of the
+	 * local query.
+ */
+ ReleaseConnection(festate->conn);
+ festate->conn = NULL;
+
+ /* Discard fetch results */
+ if (festate->tuples != NULL)
+ {
+ tuplestore_end(festate->tuples);
+ festate->tuples = NULL;
+ }
+
+ /* MemoryContext will be deleted automatically. */
+}
+
+/*
+ * Estimate costs of executing given SQL statement.
+ */
+static void
+get_remote_estimate(const char *sql, PGconn *conn,
+ double *rows, int *width,
+ Cost *startup_cost, Cost *total_cost)
+{
+ PGresult *volatile res = NULL;
+ StringInfoData buf;
+ char *plan;
+ char *p;
+ int n;
+
+ /*
+ * Construct EXPLAIN statement with given SQL statement.
+ */
+ initStringInfo(&buf);
+ appendStringInfo(&buf, "EXPLAIN %s", sql);
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ res = PQexec(conn, buf.data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
+ ereport(ERROR,
+ (errmsg("could not execute EXPLAIN for cost estimation"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /*
+		 * Find the estimate portion of the top plan node.  We search for the
+		 * opening parenthesis from the end of the line to avoid matching
+		 * unrelated parentheses.
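+		 * A typical top line looks like, e.g. (figures are illustrative):
+		 *   Seq Scan on "T 1"  (cost=0.00..22.32 rows=1032 width=47)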
+ */
+ plan = PQgetvalue(res, 0, 0);
+ p = strrchr(plan, '(');
+ if (p == NULL)
+ elog(ERROR, "wrong EXPLAIN output: %s", plan);
+ n = sscanf(p,
+ "(cost=%lf..%lf rows=%lf width=%d)",
+ startup_cost, total_cost, rows, width);
+ if (n != 4)
+ elog(ERROR, "could not get estimation from EXPLAIN output");
+
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Adjust costs estimated on the remote end by adding local overheads such as
+ * connection establishment and data transfer.
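+ *
+ * For example (illustrative figures): a result of 1000 rows of width 100
+ * bytes adds CONNECTION_COSTS (100.0) to both startup and total cost, plus
+ * TRANSFER_COSTS_PER_BYTE * 100 * 1000 + cpu_tuple_cost * 1000
+ * = 100.0 + cpu_tuple_cost * 1000 to the total cost only.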
+ */
+static void
+adjust_costs(double rows, int width, Cost *startup_cost, Cost *total_cost)
+{
+ /*
+	 * TODO: The selectivity of quals which are NOT pushed down should also be
+	 * considered.
+ */
+
+ /* add cost to establish connection. */
+ *startup_cost += CONNECTION_COSTS;
+ *total_cost += CONNECTION_COSTS;
+
+ /* add cost to transfer result. */
+ *total_cost += TRANSFER_COSTS_PER_BYTE * width * rows;
+ *total_cost += cpu_tuple_cost * rows;
+}
+
+/*
+ * Execute remote query with current parameters.
+ */
+static void
+execute_query(ForeignScanState *node)
+{
+ PgsqlFdwExecutionState *festate;
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+ Oid *types = NULL;
+ const char **values = NULL;
+ char *sql;
+ PGconn *conn;
+ PGresult *volatile res = NULL;
+
+ festate = (PgsqlFdwExecutionState *) node->fdw_state;
+ types = festate->param_types;
+ values = festate->param_values;
+
+ /*
+	 * Construct the parameter arrays in text format.  We don't release the
+	 * arrays' memory explicitly, because the usage is not very large and they
+	 * will be released by context cleanup anyway.
+	 *
+	 * If this query is invoked from a PL/pgSQL function, ParamListInfo has an
+	 * extra entry for the dummy variable FOUND, so we need to check the type
+	 * OID to exclude it from the remote parameters.
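+	 *
+	 * For example, a generic plan's remote query may contain a placeholder
+	 * such as (("C 1" OPERATOR(pg_catalog.=) $1)); the corresponding text
+	 * value for $1 is then sent through param_values[0].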
+ */
+ if (numParams > 0)
+ {
+ int i;
+
+ for (i = 0; i < numParams; i++)
+ {
+ ParamExternData *prm = ¶ms->params[i];
+
+ /* give hook a chance in case parameter is dynamic */
+ if (!OidIsValid(prm->ptype) && params->paramFetch != NULL)
+ params->paramFetch(params, i + 1);
+
+ /*
+ * Get string representation of each parameter value by invoking
+ * type-specific output function unless the value is null or it's
+ * not used in the query.
+ */
+ types[i] = prm->ptype;
+ if (!prm->isnull && OidIsValid(types[i]))
+ {
+ Oid out_func_oid;
+ bool isvarlena;
+ FmgrInfo func;
+
+ getTypeOutputInfo(types[i], &out_func_oid, &isvarlena);
+ fmgr_info(out_func_oid, &func);
+ values[i] = OutputFunctionCall(&func, prm->value);
+ }
+ else
+ values[i] = NULL;
+
+ /*
+			 * We use type "text" (an arbitrary but flexible choice) for
+			 * unused (and type-unknown) parameters.  We can't remove entries
+			 * for unused parameters from the arrays, because parameter
+			 * references in the remote query ($n) are indexed based on the
+			 * full-length parameter list.
+ */
+ if (!OidIsValid(types[i]))
+ types[i] = TEXTOID;
+ }
+ }
+
+ conn = festate->conn;
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /*
+ * Execute remote query with parameters, and retrieve results with
+ * single-row-mode which returns results row by row.
+ */
+ sql = strVal(list_nth(festate->fdw_private, FdwPrivateSelectSql));
+ if (!PQsendQueryParams(conn, sql, numParams, types, values, NULL, NULL,
+ 0))
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /* Retrieve result rows one by one, and store them into tuplestore. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Store the result row into tuplestore */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+				 * A PGresult with PGRES_TUPLES_OK means EOF, so we need to
+				 * initialize the tuplestore if we have not retrieved any tuple.
+ */
+ if (first)
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ }
+ }
+
+ /*
+		 * We can't know whether the scan is over in the row processor, so
+		 * mark the result as valid here.
+ */
+ tuplestore_donestoring(festate->tuples);
+
+ /* Discard result of SELECT statement. */
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ /* propagate error */
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Create tuples from PGresult and store them into tuplestore.
+ *
+ * The caller must use a PG_TRY block to catch exceptions and make sure the
+ * PGresult is released.
+ */
+static void
+query_row_processor(PGresult *res, ForeignScanState *node, bool first)
+{
+ int i;
+ int j;
+ int attnum; /* number of non-dropped columns */
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ TupleDesc tupdesc = slot->tts_tupleDescriptor;
+ Form_pg_attribute *attrs = tupdesc->attrs;
+ PgsqlFdwExecutionState *festate = (PgsqlFdwExecutionState *) node->fdw_state;
+ AttInMetadata *attinmeta = festate->attinmeta;
+ HeapTuple tuple;
+ ErrorContextCallback errcontext;
+ MemoryContext oldcontext;
+
+ if (first)
+ {
+ int nfields = PQnfields(res);
+
+ /* count non-dropped columns */
+ for (attnum = 0, i = 0; i < tupdesc->natts; i++)
+ if (!attrs[i]->attisdropped)
+ attnum++;
+
+ /* check result and tuple descriptor have the same number of columns */
+ if (attnum > 0 && attnum != nfields)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("remote query result rowtype does not match "
+ "the specified FROM clause rowtype"),
+ errdetail("expected %d, actual %d", attnum, nfields)));
+
+ /* First, ensure that the tuplestore is empty. */
+ if (festate->tuples == NULL)
+ {
+
+ /*
+			 * Create a tuplestore to store the result of the query in the
+			 * per-scan context.  Note that we use this memory context to
+			 * avoid memory leaks in error cases.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->scan_cxt);
+ festate->tuples = tuplestore_begin_heap(false, false, work_mem);
+ MemoryContextSwitchTo(oldcontext);
+ }
+ else
+ {
+ /* Clear old result just in case. */
+ tuplestore_clear(festate->tuples);
+ }
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->temp_cxt);
+
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ /* skip dropped columns. */
+ if (attrs[i]->attisdropped)
+ {
+ festate->nulls[i] = true;
+ continue;
+ }
+
+ /*
+		 * Set the NULL indicator, and convert the text representation to the
+		 * internal representation if the value is not null.
+ */
+ if (PQgetisnull(res, 0, j))
+ festate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ festate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ festate->errpos.cur_attno = i + 1;
+ errcontext.callback = postgresql_fdw_error_callback;
+ errcontext.arg = (void *) &festate->errpos;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ festate->values[i] = value;
+
+ /* Uninstall error context callback. */
+ error_context_stack = errcontext.previous;
+ }
+ j++;
+ }
+
+ /*
+	 * Build the tuple and put it into the tuplestore.
+ * We don't have to free the tuple explicitly because it's been
+ * allocated in the per-tuple context.
+ */
+ tuple = heap_form_tuple(tupdesc, festate->values, festate->nulls);
+ tuplestore_puttuple(festate->tuples, tuple);
+
+ /* Clean up */
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(festate->temp_cxt);
+
+ return;
+}
+
+/*
+ * Callback function which is called when error occurs during column value
+ * conversion. Print names of column and relation.
+ */
+static void
+postgresql_fdw_error_callback(void *arg)
+{
+ ErrorPos *errpos = (ErrorPos *) arg;
+ const char *relname;
+ const char *colname;
+
+ relname = get_rel_name(errpos->relid);
+ colname = get_attname(errpos->relid, errpos->cur_attno);
+ errcontext("column %s of foreign table %s",
+ quote_identifier(colname), quote_identifier(relname));
+}
+
+/*
+ * pgsqlAnalyzeForeignTable
+ * Test whether analyzing this foreign table is supported
+ */
+static bool
+pgsqlAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages)
+{
+ *totalpages = 0;
+ *func = pgsqlAcquireSampleRowsFunc;
+
+ return true;
+}
+
+/*
+ * Acquire a random sample of rows from a foreign table managed by postgresql_fdw.
+ *
+ * postgresql_fdw doesn't provide direct access to remote buffers, so we execute
+ * a simple SELECT statement which retrieves all rows from the remote side, and
+ * pick some samples from them.
+ */
+static int
+pgsqlAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows)
+{
+ PgsqlAnalyzeState astate;
+ StringInfoData sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ PGconn *conn = NULL;
+ PGresult *volatile res = NULL;
+
+ /*
+	 * Only a little information is necessary as input to the row processor.
+	 * The rest of the initialization is done at the first row processor call.
+ */
+ astate.anl_cxt = CurrentMemoryContext;
+ astate.temp_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "postgresql_fdw analyze temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ astate.rows = rows;
+ astate.targrows = targrows;
+ astate.tupdesc = relation->rd_att;
+ astate.errpos.relid = relation->rd_id;
+
+ /*
+	 * Construct a SELECT statement which retrieves all rows from the remote
+	 * table. We can't avoid running a sequential scan on the remote side to
+	 * get practical statistics, so this seems a reasonable compromise.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeSql(&sql, relation);
+ elog(DEBUG3, "Analyze SQL: %s", sql.data);
+
+ table = GetForeignTable(relation->rd_id);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+
+ /*
+ * Acquire sample rows from the result set.
+ */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /* Execute remote query and retrieve results row by row. */
+ if (!PQsendQuery(conn, sql.data))
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+
+ /* Retrieve result rows one by one, and store them into tuplestore. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Store the result row into tuplestore */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ analyze_row_processor(res, &astate, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+				 * PGresult with PGRES_TUPLES_OK means EOF, so we need to let
+				 * the row processor initialize its state if we have not
+				 * retrieved any tuple.
+				 */
+				if (first)
+ analyze_row_processor(res, &astate, first);
+
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+	/* We assume that there are no dead tuples. */
+	*totaldeadrows = 0.0;
+
+	/* We've retrieved all live tuples from the foreign server. */
+ *totalrows = astate.samplerows;
+
+ /*
+	 * We don't update pg_class.relpages because we don't use it in planning
+	 * at all.
+ */
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned with \"%s\", "
+ "containing %.0f live rows and %.0f dead rows; "
+ "%d rows in sample, %.0f estimated total rows",
+ RelationGetRelationName(relation), sql.data,
+ astate.samplerows, 0.0,
+ astate.numrows, astate.samplerows)));
+
+ return astate.numrows;
+}
+
+/*
+ * Custom row processor for acquire_sample_rows.
+ *
+ * Collect sample rows from the result of the query.
+ * - Use every tuple as a sample until targrows samples have been collected.
+ * - Once the target has been reached, skip some tuples and randomly replace
+ *   tuples that are already in the sample.
+ */
+static void
+analyze_row_processor(PGresult *res, PgsqlAnalyzeState *astate, bool first)
+{
+ int targrows = astate->targrows;
+ TupleDesc tupdesc = astate->tupdesc;
+ int i;
+ int j;
+ int pos; /* position where next sample should be stored. */
+ HeapTuple tuple;
+ ErrorContextCallback errcontext;
+ MemoryContext callercontext;
+
+ if (first)
+ {
+ /* Prepare for sampling rows */
+ astate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ astate->values = (Datum *) palloc(sizeof(Datum) * tupdesc->natts);
+ astate->nulls = (bool *) palloc(sizeof(bool) * tupdesc->natts);
+ astate->numrows = 0;
+ astate->samplerows = 0;
+ astate->rowstoskip = -1;
+ astate->rstate = anl_init_selection_state(astate->targrows);
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ callercontext = MemoryContextSwitchTo(astate->temp_cxt);
+
+ /*
+	 * The first targrows rows are always sampled. If there are more source
+	 * rows, pick up some of them by skipping, and randomly replace tuples
+	 * that are already in the sample.
+	 *
+	 * Here we just determine the slot where the next sample should be stored.
+	 * pos is set to a negative value to indicate that the row should be skipped.
+ */
+ if (astate->numrows < targrows)
+ pos = astate->numrows++;
+ else
+ {
+ /*
+ * The first targrows sample rows are simply copied into
+ * the reservoir. Then we start replacing tuples in the
+ * sample until we reach the end of the relation. This
+ * algorithm is from Jeff Vitter's paper, similarly to
+ * acquire_sample_rows in analyze.c.
+ *
+		 * We don't have block-wise access, so every row in
+		 * the PGresult is a candidate to be sampled.
+ */
+ if (astate->rowstoskip < 0)
+ astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
+ &astate->rstate);
+
+ if (astate->rowstoskip <= 0)
+ {
+ int k = (int) (targrows * anl_random_fract());
+
+ Assert(k >= 0 && k < targrows);
+
+ /*
+ * Create sample tuple from the result, and replace at
+ * random.
+ */
+ heap_freetuple(astate->rows[k]);
+ pos = k;
+ }
+ else
+ pos = -1;
+
+ astate->rowstoskip -= 1;
+ }
+
+ /* Always increment sample row counter. */
+ astate->samplerows += 1;
+
+ if (pos >= 0)
+ {
+ AttInMetadata *attinmeta = astate->attinmeta;
+
+ /*
+ * Create sample tuple from current result row, and store it into the
+		 * position determined above. Note that i and j point to entries in
+		 * the catalog and in the columns array, respectively.
+ */
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ if (PQgetisnull(res, 0, j))
+ astate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ astate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ astate->errpos.cur_attno = i + 1;
+ errcontext.callback = postgresql_fdw_error_callback;
+ errcontext.arg = (void *) &astate->errpos;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ astate->values[i] = value;
+
+ /* Uninstall error callback function. */
+ error_context_stack = errcontext.previous;
+ }
+ j++;
+ }
+
+ /*
+		 * Generate a tuple from the result row data, and store it into the
+		 * given buffer. Note that we need to allocate the tuple in the analyze
+		 * context to keep it valid even after the temporary per-tuple context
+		 * has been reset.
+ */
+ MemoryContextSwitchTo(astate->anl_cxt);
+ tuple = heap_form_tuple(tupdesc, astate->values, astate->nulls);
+ MemoryContextSwitchTo(astate->temp_cxt);
+ astate->rows[pos] = tuple;
+ }
+
+ /* Clean up */
+ MemoryContextSwitchTo(callercontext);
+ MemoryContextReset(astate->temp_cxt);
+
+ return;
+}
diff --git a/contrib/postgresql_fdw/postgresql_fdw.control b/contrib/postgresql_fdw/postgresql_fdw.control
new file mode 100644
index 0000000..a87dc80
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw.control
@@ -0,0 +1,5 @@
+# postgresql_fdw extension
+comment = 'foreign-data wrapper for remote PostgreSQL servers'
+default_version = '1.0'
+module_pathname = '$libdir/postgresql_fdw'
+relocatable = true
diff --git a/contrib/postgresql_fdw/postgresql_fdw.h b/contrib/postgresql_fdw/postgresql_fdw.h
new file mode 100644
index 0000000..691e0ff
--- /dev/null
+++ b/contrib/postgresql_fdw/postgresql_fdw.h
@@ -0,0 +1,44 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgresql_fdw.h
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgresql_fdw/postgresql_fdw.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef POSTGRESQL_FDW_H
+#define POSTGRESQL_FDW_H
+
+#include "postgres.h"
+#include "foreign/foreign.h"
+#include "nodes/relation.h"
+#include "utils/relcache.h"
+
+/* in option.c */
+int ExtractConnectionOptions(List *defelems,
+ const char **keywords,
+ const char **values);
+int GetFetchCountOption(ForeignTable *table, ForeignServer *server);
+
+/* in deparse.c */
+void deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds);
+void appendWhereClause(StringInfo buf,
+ bool has_where,
+ List *exprs,
+ PlannerInfo *root);
+void classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds);
+void deparseAnalyzeSql(StringInfo buf, Relation rel);
+
+#endif /* POSTGRESQL_FDW_H */
diff --git a/contrib/postgresql_fdw/sql/postgresql_fdw.sql b/contrib/postgresql_fdw/sql/postgresql_fdw.sql
new file mode 100644
index 0000000..9d971c5
--- /dev/null
+++ b/contrib/postgresql_fdw/sql/postgresql_fdw.sql
@@ -0,0 +1,304 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+
+-- Clean up in case a prior regression run failed
+
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+
+DROP ROLE IF EXISTS postgresql_fdw_user;
+
+RESET client_min_messages;
+
+CREATE ROLE postgresql_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgresql_fdw_user';
+
+CREATE EXTENSION postgresql_fdw;
+
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgresql_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgresql_fdw
+ OPTIONS (dbname 'contrib_regression');
+
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgresql_fdw_user SERVER loopback2;
+
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+
+-- ===================================================================
+-- tests for postgresql_fdw_validator
+-- ===================================================================
+ALTER FOREIGN DATA WRAPPER postgresql_fdw OPTIONS (host 'value'); -- ERROR
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure option
+ALTER SERVER loopback1 OPTIONS (
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER SERVER loopback1 OPTIONS (user 'value'); -- ERROR
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (host 'value'); -- ERROR
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (invalid 'value'); -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (invalid 'value'); -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+\des+
+\deu+
+\det+
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- empty result
+SELECT * FROM ft1 WHERE false;
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- user-defined operator/function
+CREATE FUNCTION postgresql_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgresql_fdw_abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+EXECUTE st1(1, 1);
+EXECUTE st1(101, 101);
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+EXECUTE st2(10, 20);
+EXECUTE st1(101, 101);
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+EXECUTE st3(10, 20);
+EXECUTE st3(20, 30);
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+DROP FUNCTION f_test(int);
+
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgresql_fdw_connections;
+SELECT postgresql_fdw_disconnect(srvid, usesysid) FROM postgresql_fdw_get_connections();
+SELECT srvname, usename FROM postgresql_fdw_connections;
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
+
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+FETCH c;
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgresql_fdw_connections;
+FETCH c;
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+COMMIT;
+SELECT srvname FROM postgresql_fdw_connections;
+ERROR OUT; -- ERROR
+SELECT srvname FROM postgresql_fdw_connections;
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgresql_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+DROP TYPE user_enum CASCADE;
+DROP EXTENSION postgresql_fdw CASCADE;
+\c
+DROP ROLE postgresql_fdw_user;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 6b13a0a..4ffa2fa 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -132,6 +132,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
&pgstatstatements;
&pgstattuple;
&pgtrgm;
+ &postgresql-fdw;
&seg;
&sepgsql;
&contrib-spi;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db4cc3a..373582a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
<!ENTITY pgtrgm SYSTEM "pgtrgm.sgml">
<!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
+<!ENTITY postgresql-fdw SYSTEM "postgresql-fdw.sgml">
<!ENTITY seg SYSTEM "seg.sgml">
<!ENTITY contrib-spi SYSTEM "contrib-spi.sgml">
<!ENTITY sepgsql SYSTEM "sepgsql.sgml">
diff --git a/doc/src/sgml/postgresql-fdw.sgml b/doc/src/sgml/postgresql-fdw.sgml
new file mode 100644
index 0000000..b1c4e36
--- /dev/null
+++ b/doc/src/sgml/postgresql-fdw.sgml
@@ -0,0 +1,235 @@
+<!-- doc/src/sgml/postgresql-fdw.sgml -->
+
+<sect1 id="postgresql-fdw" xreflabel="postgresql_fdw">
+ <title>postgresql_fdw</title>
+
+ <indexterm zone="postgresql-fdw">
+ <primary>postgresql_fdw</primary>
+ </indexterm>
+
+ <para>
+ The <filename>postgresql_fdw</filename> module provides a foreign-data
+ wrapper for external <productname>PostgreSQL</productname> servers.
+ With this module, users can access data stored in external
+ <productname>PostgreSQL</productname> via plain SQL statements.
+ </para>
+
+ <para>
+   Note that a default foreign-data wrapper named
+   <literal>postgresql_fdw</literal> is created automatically when
+   <command>CREATE EXTENSION</command> is run for
+   <application>postgresql_fdw</application>.
+ </para>
+
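+  <para>
+   A minimal setup might look like this (the server name, connection options,
+   user mapping and table definition below are only illustrative):
+<synopsis>
+CREATE EXTENSION postgresql_fdw;
+
+-- illustrative server and mapping definitions
+CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgresql_fdw
+  OPTIONS (host 'remote.example.com', dbname 'postgres');
+
+CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
+  OPTIONS (user 'remote_user', password 'secret');
+
+CREATE FOREIGN TABLE remote_accounts (
+  aid integer,
+  abalance integer
+) SERVER remote_pg;
+</synopsis>
+  </para>
+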
+ <sect2>
+ <title>FDW Options of postgresql_fdw</title>
+
+ <sect3>
+ <title>Connection Options</title>
+ <para>
+     Foreign servers and user mappings created with this wrapper accept
+     <application>libpq</> connection options, except for the following:
+
+ <itemizedlist>
+ <listitem><para>client_encoding</para></listitem>
+ <listitem><para>fallback_application_name</para></listitem>
+ <listitem><para>replication</para></listitem>
+ </itemizedlist>
+
+ For details of <application>libpq</> connection options, see
+ <xref linkend="libpq-connect">.
+ </para>
+
+ <para>
+     <literal>user</literal> and <literal>password</literal> can be
+     specified only on user mappings; the other options can be specified on
+     foreign servers.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Object Name Options</title>
+ <para>
+     Foreign tables created using this wrapper, and their columns, can
+     have object name options. These options specify the names
+     used in SQL statements sent to the remote <productname>PostgreSQL</productname>
+     server, and are useful when a remote object has a different name
+     from the corresponding local one.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>nspname</literal></term>
+ <listitem>
+ <para>
+       This option, which can be specified on a foreign table, is used as a
+       namespace (schema) reference in the SQL statement. If this option is
+       omitted, the name of the schema the foreign table belongs to
+       (<literal>pg_namespace.nspname</literal>) is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>relname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+       relation (table) reference in the SQL statement. If this option is
+ omitted, <literal>pg_class.relname</literal> of the foreign table is
+ used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>colname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a column of a foreign table, is
+ used as a column (attribute) reference in the SQL statement. If this
+ option is omitted, <literal>pg_attribute.attname</literal> of the column
+ of the foreign table is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
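+  <para>
+   For example, the names used in remote queries for a foreign table
+   <literal>ft1</literal> and its column <literal>c1</literal> could be
+   overridden like this (the option values are illustrative):
+<synopsis>
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+</synopsis>
+  </para>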
+ </sect3>
+
+ </sect2>
+
+ <sect2>
+ <title>Connection Management</title>
+
+ <para>
+   The <application>postgresql_fdw</application> establishes a connection to a
+   foreign server at the beginning of the first query which uses a foreign
+   table associated with that server, and reuses the connection in following
+   queries and even in following foreign scans in the same query.
+
+   You can see the list of active connections via the
+   <structname>postgresql_fdw_connections</structname> view. It shows the OID
+   and name of the server and of the local role for each active connection
+   established by <application>postgresql_fdw</application>. For security
+   reasons, only superusers can see other roles' connections.
+ </para>
+
+ <para>
+   Established connections are kept alive until the local role changes, the
+   current transaction aborts, or the user explicitly requests disconnection.
+ </para>
+
+ <para>
+   If the role has been changed, active connections established under the old
+   local role are kept alive but are never reused until the local role has
+   been restored to the original one. This kind of situation happens with
+   <command>SET ROLE</command> and <command>SET SESSION AUTHORIZATION</command>.
+ </para>
+
+ <para>
+   If the current transaction is aborted by an error or by user request, all
+   active connections are closed automatically. This behavior avoids possible
+   connection leaks on error.
+ </para>
+
+ <para>
+   You can discard a persistent connection at any time with
+   <function>postgresql_fdw_disconnect()</function>. It takes the server OID
+   and user OID as arguments. This function can handle only connections
+   established in the current session; connections established by other
+   backends are not reachable.
+ </para>
+
+ <para>
+   You can discard all active and visible connections in the current session
+   by using <structname>postgresql_fdw_connections</structname> and
+   <function>postgresql_fdw_disconnect()</function> together:
+<synopsis>
+postgres=# SELECT postgresql_fdw_disconnect(srvid, usesysid) FROM postgresql_fdw_connections;
+ postgresql_fdw_disconnect
+---------------------------
+ OK
+ OK
+(2 rows)
+</synopsis>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Transaction Management</title>
+ <para>
+    The <application>postgresql_fdw</application> begins a remote transaction
+    at the start of a local query, and terminates it with
+    <command>ABORT</command> at the end of the local query. This means that
+    all foreign scans on a foreign server in a local query are executed in one
+    remote transaction.
+    If the isolation level of the local transaction is
+    <literal>SERIALIZABLE</literal>, <literal>SERIALIZABLE</literal> is used
+    for the remote transaction; otherwise (<literal>READ UNCOMMITTED</literal>,
+    <literal>READ COMMITTED</literal> or <literal>REPEATABLE READ</literal>),
+    <literal>REPEATABLE READ</literal> is used.
+    <literal>READ UNCOMMITTED</literal> and <literal>READ COMMITTED</literal>
+    are never used for the remote transaction, because even a
+    <literal>READ COMMITTED</literal> transaction might produce inconsistent
+    results if remote data has been updated between two remote queries.
+ </para>
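+   <para>
+    For example, a local query running at the default
+    <literal>READ COMMITTED</literal> level is executed on the remote side
+    roughly as follows (illustrative; the statements are generated
+    internally):
+<synopsis>
+START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+SELECT ... ;    -- remote query generated for the foreign scan
+ABORT TRANSACTION;
+</synopsis>
+   </para>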
+ <para>
+    Note that even if the isolation level of the local transaction is
+    <literal>SERIALIZABLE</literal> or <literal>REPEATABLE READ</literal>,
+    running the same query repeatedly might produce different results, because
+    foreign scans in different local queries are executed in different remote
+    transactions. For instance, if a client starts a local transaction
+    explicitly with isolation level <literal>SERIALIZABLE</literal> and
+    executes the same local query twice against a foreign table whose remote
+    data is updated frequently, the later result might differ from the earlier
+    one.
+ </para>
+ <para>
+    This restriction might be relaxed in a future release.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Estimation of Costs and Rows</title>
+ <para>
+ The <application>postgresql_fdw</application> estimates the costs of a
+ foreign scan by adding up some basic costs: connection costs, remote query
+ costs and data transfer costs.
+ To get remote query costs, <application>postgresql_fdw</application> executes
+    an <command>EXPLAIN</command> command on the remote server for each foreign scan.
+ </para>
+ <para>
+    On the other hand, the row estimate returned by
+    <command>EXPLAIN</command> is used for local estimation as-is.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>EXPLAIN Output</title>
+ <para>
+ For a foreign table using <literal>postgresql_fdw</>, <command>EXPLAIN</>
+   shows the remote SQL statement which is sent to the remote
+   <productname>PostgreSQL</productname> server for each ForeignScan plan node.
+ For example:
+ </para>
+<synopsis>
+postgres=# EXPLAIN SELECT aid FROM pgbench_accounts WHERE abalance < 0;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on pgbench_accounts (cost=100.00..8105.13 rows=302613 width=8)
+ Filter: (abalance < 0)
+ Remote SQL: SELECT aid, NULL, abalance, NULL FROM public.pgbench_accounts
+(3 rows)
+</synopsis>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Shigeru Hanada <email>shigeru.hanada@gmail.com</email>
+ </para>
+ </sect2>
+
+</sect1>
Hi Hanada-san,
Please examine attached v2 patch (note that it should be applied onto
latest dblink_fdw_validator patch).
I've reviewed your patch quickly. I noticed that the patch has been created in
a slightly different way from the guidelines:
http://www.postgresql.org/docs/devel/static/fdw-planning.html The guidelines
say the following, but the patch identifies the clauses in
baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths. Also, it
has been implemented so that most sub_expressions are evaluated at the remote
end, not the local end, though I may be missing something. For postgresql_fdw to be
a good reference for FDW developers, ISTM it would be better for it to be
consistent with the guidelines. I think it would be necessary either to update
the following documentation or to redesign the function to be consistent with
it.
As an example, the FDW might identify some restriction clauses of the form
foreign_variable= sub_expression, which it determines can be executed on the
remote server given the locally-evaluated value of the sub_expression. The
actual identification of such a clause should happen during GetForeignPaths,
since it would affect the cost estimate for the path.
Thanks,
Best regards,
Etsuro Fujita
2012/10/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
Hi Hanada-san,
Please examine attached v2 patch (note that it should be applied onto
latest dblink_fdw_validator patch).
I've reviewed your patch quickly. I noticed that the patch has been created in
a slightly different way from the guidelines:
http://www.postgresql.org/docs/devel/static/fdw-planning.html The guidelines
say the following, but the patch identifies the clauses in
baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths. Also, it
has been implemented so that most sub_expressions are evaluated at the remote
end, not the local end, though I'm missing something. For postgresql_fdw to be
a good reference for FDW developers, ISTM it would be better that it be
consistent with the guidelines. I think it would be needed to update the
following document or redesign the function to be consistent with the following
document.
Hmm. It seems to me Fujita-san's comment is right.
Even though the latest implementation gets an estimated number of rows
using EXPLAIN with a qualified SELECT statement on the remote side and then
adjusts it with the selectivity of local qualifiers, we should be able to obtain
the same cost estimation, because postgresql_fdw assumes all the pushed-down
qualifiers are built-in only.
So, is it possible to move classifyConditions() to pgsqlGetForeignPlan(),
and then separate remote qualifiers from local ones there?
If we call get_remote_estimate() without a WHERE clause, the remote side
will give just the total number of rows of the remote table. Then, it can be
adjusted with the selectivity of "all" the RestrictInfos (including both remote
and local).
Sorry, I should have suggested this at the beginning.
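Just to illustrate, a minimal sketch of that adjustment might look like the
following (illustrative only; remote_rows stands for the row count obtained
from a remote EXPLAIN without any WHERE clause, and the function name is made
up):

#include "postgres.h"
#include "nodes/relation.h"
#include "optimizer/cost.h"

/*
 * Illustrative sketch: scale the bare remote row count by the selectivity
 * of all baserestrictinfo clauses, both pushable and local ones.
 */
static double
estimate_rows_locally(PlannerInfo *root, RelOptInfo *baserel,
                      double remote_rows)
{
    Selectivity sel;

    sel = clauselist_selectivity(root,
                                 baserel->baserestrictinfo,
                                 0,
                                 JOIN_INNER,
                                 NULL);

    return clamp_row_est(remote_rows * sel);
}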
This change may affect the optimization that determines which remote columns
are in use on the local side. Let me suggest an expression walker that sets
a member of a Bitmapset for each column in use on the local side or in the
target list. Then, we can list them in the target list of the remote query;
a rough sketch follows.
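A rough sketch of such a walker, assuming local_conds holds the RestrictInfos
that stay on the local side (pull_varattnos does the actual walking here, and
the function name is made up):

#include "postgres.h"
#include "nodes/relation.h"
#include "optimizer/var.h"

/*
 * Illustrative sketch: collect the attribute numbers referenced by the
 * relation's target list and by locally evaluated restrictions.  The
 * numbers in the result are offset by FirstLowInvalidHeapAttributeNumber,
 * as usual for pull_varattnos().
 */
static Bitmapset *
columns_needed_locally(RelOptInfo *baserel, List *local_conds)
{
    Bitmapset  *attrs_used = NULL;
    ListCell   *lc;

    /* Columns used in the relation's target list. */
    pull_varattnos((Node *) baserel->reltargetlist, baserel->relid,
                   &attrs_used);

    /* Columns used in restrictions that must be evaluated locally. */
    foreach(lc, local_conds)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

        pull_varattnos((Node *) rinfo->clause, baserel->relid,
                       &attrs_used);
    }

    return attrs_used;
}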
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Shigeru HANADA wrote:
Please examine attached v2 patch (note that it should be applied onto
latest dblink_fdw_validator patch).
Tom committed parts of the dblink_fdw_validator patch, but not the
removal, so it seems this patch needs to be rebased on top of that
somehow. I am not able to say what's the best resolution for that
conflict, however. But it seems to me that maybe you will need to
choose a different name for the validator after all, to support binary
upgrades.
There are some other comments downthread that a follow-up patch probably
needs to address as well. I am marking this patch Returned with
Feedback. Please submit an updated version to CF3.
Thanks.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Sorry for delayed response.
On Sun, Oct 21, 2012 at 3:16 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
2012/10/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
I've reviewed your patch quickly. I noticed that the patch has been created in
a slightly different way from the guidelines:
http://www.postgresql.org/docs/devel/static/fdw-planning.html The guidelines
say the following, but the patch identifies the clauses in
baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths. Also, it
has been implemented so that most sub_expressions are evaluated at the remote
end, not the local end, though I'm missing something. For postgresql_fdw to be
a good reference for FDW developers, ISTM it would be better that it be
consistent with the guidelines. I think it would be needed to update the
following document or redesign the function to be consistent with the following
document.
Hmm. It seems to me Fujita-san's comment is right.
Indeed postgresql_fdw touches baserestrictinfo in GetForeignRelSize, but
it does so for optimization, to get a better width estimate. The doc
Fujita-san pointed to says:
The actual identification of such a clause should happen during
GetForeignPaths, since it would affect the cost estimate for the
path.
I understood this description to say that "you need to touch baserestrictinfo
*before* GetForeignPlan to estimate the costs of an optimized path". I
don't feel that this description prohibits an FDW from touching baserestrictinfo
in GetForeignRelSize, but mentioning it clearly might be better.
Regards,
--
Shigeru HANADA
2012/11/6 Shigeru HANADA <shigeru.hanada@gmail.com>:
Sorry for delayed response.
On Sun, Oct 21, 2012 at 3:16 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
2012/10/11 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
I've reviewed your patch quickly. I noticed that the patch has been
created in
a slightly different way from the guidelines:
http://www.postgresql.org/docs/devel/static/fdw-planning.html The
guidelines
say the following, but the patch identifies the clauses in
baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths.
Also, it
has been implemented so that most sub_expressions are evaluated at the
remote
end, not the local end, though I'm missing something. For postgresql_fdw
to be
a good reference for FDW developers, ISTM it would be better that it be
consistent with the guidelines. I think it would be needed to update the
following document or redesign the function to be consistent with the
following
document.
Hmm. It seems to me Fujita-san's comment is right.
Indeed postgresql_fdw touches baserestrictinfo in GetForeignRelSize, but
it's because of optimization for better width estimate. The doc Fujita-san
pointed says that:
The actual identification of such a clause should happen during
GetForeignPaths, since it would affect the cost estimate for the
path.
I understood this description says that "you need to touch baserestrict info
*before* GetForeignPlan to estimate costs of optimized path". I don't feel
that this description prohibits FDW to touch baserestrictinfo in
GetForeignRelSize, but mentioning it clearly might be better.
Hanada-san,
Isn't it possible to pick up only the columns used in the target list or
in local qualifiers, without modifying baserestrictinfo?
Unless we put a WHERE clause on the EXPLAIN statement for cost estimation in
GetForeignRelSize, all we need to know is the list of columns to be fetched by
the underlying queries. Once we construct a SELECT statement without a WHERE
clause at the GetForeignRelSize stage, it is not difficult to append one later
according to the same criteria implemented in classifyConditions.
I'd like to see a committer's opinion here.
Please give Hanada-san some direction.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
2012/11/6 Shigeru HANADA <shigeru.hanada@gmail.com>:
Indeed postgresql_fdw touches baserestrictinfo in GetForeignRelSize, but
it's because of optimization for better width estimate. The doc Fujita-san
pointed says that:
The actual identification of such a clause should happen during
GetForeignPaths, since it would affect the cost estimate for the
path.
I understood this description says that "you need to touch baserestrict info
*before* GetForeignPlan to estimate costs of optimized path". I don't feel
that this description prohibits FDW to touch baserestrictinfo in
GetForeignRelSize, but mentioning it clearly might be better.
Isn't it possible to pick-up only columns to be used in targetlist or
local qualifiers, without modification of baserestrictinfo?
What the doc means to suggest is that you can look through the
baserestrictinfo list and then record information elsewhere about
interesting clauses you find. If the FDW is actually *modifying* that
list, I agree that seems like a bad idea. I don't recall anything in
the core system that does that, so it seems fragile. The closest
parallel I can think of in the core system is indexscans pulling out
restriction clauses to use as index quals. That code doesn't modify
the baserestrictinfo list; it only makes new lists with some of the same
entries.
regards, tom lane
On 2012/11/07, at 1:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Isn't it possible to pick-up only columns to be used in targetlist or
local qualifiers, without modification of baserestrictinfo?
What the doc means to suggest is
baserestrictinfo list and then record information elsewhere about
interesting clauses you find. If the FDW is actually *modifying* that
list, I agree that seems like a bad idea.
Kaigai-san might have gotten the impression that postgresql_fdw changes
baserestrictinfo because it did so in an old implementation.
classifyConditions creates new lists, local_conds and remote_conds,
whose cells point to RestrictInfo(s) in baserestrictinfo.
It doesn't copy the RestrictInfo(s) into the new lists, but I think that's OK
because the baserestrictinfo list itself and the RestrictInfo(s) it points to
are never modified by postgresql_fdw.
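For reference, a minimal sketch of that pattern (is_remote_safe() is a
hypothetical stand-in for the real safety test, and the actual
classifyConditions also separates out parameterized conditions):

#include "postgres.h"
#include "nodes/relation.h"

/*
 * Illustrative sketch: link existing RestrictInfo nodes into new lists
 * without copying or modifying them.  is_remote_safe() is hypothetical.
 */
static void
classify_conditions_sketch(PlannerInfo *root, RelOptInfo *baserel,
                           List **remote_conds, List **local_conds)
{
    ListCell   *lc;

    *remote_conds = NIL;
    *local_conds = NIL;

    foreach(lc, baserel->baserestrictinfo)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

        if (is_remote_safe(root, baserel, rinfo->clause))
            *remote_conds = lappend(*remote_conds, rinfo);
        else
            *local_conds = lappend(*local_conds, rinfo);
    }
}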
I don't recall anything in
the core system that does that, so it seems fragile. The closest
parallel I can think of in the core system is indexscans pulling out
restriction clauses to use as index quals. That code doesn't modify
the baserestrictinfo list, only make new lists with some of the same
entries.
Thanks for the advice. I found that relation_excluded_by_constraints,
which is called by set_rel_size(), creates a new RestrictInfo list from
baserestrictinfo, and this seems similar to what postgresql_fdw does in
GetForeignRelSize, from the perspective of relation size estimation.
Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
ClassifyConditions creates new lists, local_conds and remote_conds,
which have cells which point RestrictInfo(s) in baserestrictinfo.
It doesn't copy RestrictInfo for new lists, but I think it's ok
because baserestrictinfo list itself and RestrictInfo(s) pointed by
it are never modified by postgresql_fdw.
That's good. I think there are actually some assumptions that
RestrictInfo nodes are not copied once created. You can link them into
new lists all you want, but don't copy them.
regards, tom lane
Hi Kaigai-san,
Sorry for the delayed response. I updated the patch, although I didn't change
anything about the timing issue you and Fujita-san are concerned about.
1) add some FDW options for cost estimation; default behavior is not
changed
2) get rid of the array of libpq option names, similarly to the recent change
in dblink
3) enhance the documentation, especially about remote query optimization
4) rename to postgres_fdw, to avoid a naming conflict with the validator
which exists in core
5) cope with the changes to error context handling
On Tue, Nov 6, 2012 at 7:36 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
Isn't it possible to pick-up only columns to be used in targetlist or
local qualifiers,
without modification of baserestrictinfo?
IMO, it's possible. postgres_fdw doesn't modify baserestrictinfo at all;
it just creates two new lists which exclusively point to RestrictInfo elements
in baserestrictinfo. Pulling vars up from conditions which can't be pushed
down would give us the list of necessary columns. Am I missing something?
--
Shigeru HANADA
Attachments:
postgres_fdw.v3.patch (application/octet-stream)
diff --git a/contrib/Makefile b/contrib/Makefile
index d230451..7c6009d 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
pgcrypto \
pgrowlocks \
pgstattuple \
+ postgres_fdw \
seg \
spi \
tablefunc \
diff --git a/contrib/postgres_fdw/.gitignore b/contrib/postgres_fdw/.gitignore
new file mode 100644
index 0000000..0854728
--- /dev/null
+++ b/contrib/postgres_fdw/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/results/
+*.o
+*.so
diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
new file mode 100644
index 0000000..8dac777
--- /dev/null
+++ b/contrib/postgres_fdw/Makefile
@@ -0,0 +1,22 @@
+# contrib/postgres_fdw/Makefile
+
+MODULE_big = postgres_fdw
+OBJS = postgres_fdw.o option.o deparse.o connection.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+SHLIB_LINK = $(libpq)
+
+EXTENSION = postgres_fdw
+DATA = postgres_fdw--1.0.sql
+
+REGRESS = postgres_fdw
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/postgres_fdw
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
new file mode 100644
index 0000000..eab8b87
--- /dev/null
+++ b/contrib/postgres_fdw/connection.c
@@ -0,0 +1,605 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.c
+ * Connection management for postgres_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/connection.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq-fe.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
+
+#include "postgres_fdw.h"
+#include "connection.h"
+
+/* ============================================================================
+ * Connection management functions
+ * ==========================================================================*/
+
+/*
+ * Connection cache entry managed with hash table.
+ */
+typedef struct ConnCacheEntry
+{
+ /* hash key must be first */
+ Oid serverid; /* oid of foreign server */
+ Oid userid; /* oid of local user */
+
+ bool use_tx; /* true when using remote transaction */
+ int refs; /* reference counter */
+ PGconn *conn; /* foreign server connection */
+} ConnCacheEntry;
+
+/*
+ * Hash table used to cache connections to PostgreSQL servers; it is
+ * initialized before the backend's first attempt to connect to a PostgreSQL
+ * server.
+ */
+static HTAB *ConnectionHash;
+
+/* ----------------------------------------------------------------------------
+ * prototype of private functions
+ * --------------------------------------------------------------------------*/
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg);
+static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
+static void begin_remote_tx(PGconn *conn);
+static void abort_remote_tx(PGconn *conn);
+
+/*
+ * Get a PGconn which can be used to execute a foreign query on the remote
+ * PostgreSQL server with the user's authorization. If this is the first
+ * request for the server, a new connection is established.
+ *
+ * When use_tx is true, a remote transaction is started if the caller is the
+ * only user of the connection. The isolation level of the remote transaction
+ * is the same as that of the local transaction, and the remote transaction
+ * will be aborted when the last user releases the connection.
+ *
+ * TODO: Note that caching connections requires a mechanism to detect change of
+ * FDW object to invalidate already established connections.
+ */
+PGconn *
+GetConnection(ForeignServer *server, UserMapping *user, bool use_tx)
+{
+ bool found;
+ ConnCacheEntry *entry;
+ ConnCacheEntry key;
+
+ /* initialize connection cache if it isn't */
+ if (ConnectionHash == NULL)
+ {
+ HASHCTL ctl;
+
+ /* hash key is a pair of oids: serverid and userid */
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid) + sizeof(Oid);
+ ctl.entrysize = sizeof(ConnCacheEntry);
+ ctl.hash = tag_hash;
+ ctl.match = memcmp;
+ ctl.keycopy = memcpy;
+ /* allocate ConnectionHash in the cache context */
+ ctl.hcxt = CacheMemoryContext;
+ ConnectionHash = hash_create("postgres_fdw connections", 32,
+ &ctl,
+ HASH_ELEM | HASH_CONTEXT |
+ HASH_FUNCTION | HASH_COMPARE |
+ HASH_KEYCOPY);
+
+ /*
+ * Register postgres_fdw's own cleanup function for connection
+ * cleanup. This should be done just once for each backend.
+ */
+ RegisterResourceReleaseCallback(cleanup_connection, ConnectionHash);
+ }
+
+ /* Create key value for the entry. */
+ MemSet(&key, 0, sizeof(key));
+ key.serverid = server->serverid;
+ key.userid = GetOuterUserId();
+
+ /*
+	 * Find the cached entry for the requested connection. If we couldn't find
+	 * one, a callback function of the ResourceOwner should be registered to
+	 * clean the connection up on error, including user interrupt.
+ */
+ entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+ if (!found)
+ {
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+
+ /*
+ * We don't check the health of cached connection here, because it would
+ * require some overhead. Broken connection and its cache entry will be
+ * cleaned up when the connection is actually used.
+ */
+
+ /*
+ * If cache entry doesn't have connection, we have to establish new
+ * connection.
+ */
+ if (entry->conn == NULL)
+ {
+ PGconn *volatile conn = NULL;
+
+ /*
+ * Use PG_TRY block to ensure closing connection on error.
+ */
+ PG_TRY();
+ {
+ /*
+			 * Connect to the foreign PostgreSQL server, and store the
+			 * connection in the cache entry.
+			 * Note: the key items of the entry have already been initialized
+			 * by hash_search(HASH_ENTER).
+ */
+ conn = connect_pg_server(server, user);
+ }
+ PG_CATCH();
+ {
+ /* Clear connection cache entry on error case. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ entry->conn = conn;
+ elog(DEBUG3, "new postgres_fdw connection %p for server %s",
+ entry->conn, server->servername);
+ }
+
+ /* Increase connection reference counter. */
+ entry->refs++;
+
+ /*
+	 * If a remote transaction is requested but has not been started, start a
+	 * remote transaction with the same isolation level as the local
+	 * transaction we are in. We need to remember whether this connection uses
+	 * a remote transaction so that we can abort it when the connection is
+	 * released completely.
+ */
+ if (use_tx && !entry->use_tx)
+ {
+ begin_remote_tx(entry->conn);
+ entry->use_tx = use_tx;
+ }
+
+ return entry->conn;
+}
+
+/*
+ * For non-superusers, insist that the connstr specify a password. This
+ * prevents a password from being picked up from .pgpass, a service file,
+ * the environment, etc. We don't want the postgres user's passwords
+ * to be accessible to non-superusers.
+ */
+static void
+check_conn_params(const char **keywords, const char **values)
+{
+ int i;
+
+ /* no check required if superuser */
+ if (superuser())
+ return;
+
+ /* ok if params contain a non-empty password */
+ for (i = 0; keywords[i] != NULL; i++)
+ {
+ if (strcmp(keywords[i], "password") == 0 && values[i][0] != '\0')
+ return;
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superusers must provide a password in the connection string.")));
+}
+
+static PGconn *
+connect_pg_server(ForeignServer *server, UserMapping *user)
+{
+ const char *conname = server->servername;
+ PGconn *conn;
+ const char **all_keywords;
+ const char **all_values;
+ const char **keywords;
+ const char **values;
+ int n;
+ int i, j;
+
+ /*
+ * Construct connection params from generic options of ForeignServer and
+	 * UserMapping. Those two objects hold only libpq options.
+ * Extra 3 items are for:
+ * *) fallback_application_name
+ * *) client_encoding
+ * *) NULL termination (end marker)
+ *
+	 * Note: We don't omit any parameters even if the target database might be
+	 * older than the local one, because unexpected parameters are just ignored.
+ */
+ n = list_length(server->options) + list_length(user->options) + 3;
+ all_keywords = (const char **) palloc(sizeof(char *) * n);
+ all_values = (const char **) palloc(sizeof(char *) * n);
+ keywords = (const char **) palloc(sizeof(char *) * n);
+ values = (const char **) palloc(sizeof(char *) * n);
+ n = 0;
+ n += ExtractConnectionOptions(server->options,
+ all_keywords + n, all_values + n);
+ n += ExtractConnectionOptions(user->options,
+ all_keywords + n, all_values + n);
+ all_keywords[n] = all_values[n] = NULL;
+
+ for (i = 0, j = 0; all_keywords[i]; i++)
+ {
+ keywords[j] = all_keywords[i];
+ values[j] = all_values[i];
+ j++;
+ }
+
+ /* Use "postgres_fdw" as fallback_application_name. */
+ keywords[j] = "fallback_application_name";
+ values[j++] = "postgres_fdw";
+
+ /* Set client_encoding so that libpq can convert encoding properly. */
+ keywords[j] = "client_encoding";
+ values[j++] = GetDatabaseEncodingName();
+
+ keywords[j] = values[j] = NULL;
+ pfree(all_keywords);
+ pfree(all_values);
+
+ /* verify connection parameters and do connect */
+ check_conn_params(keywords, values);
+ conn = PQconnectdbParams(keywords, values, 0);
+ if (!conn || PQstatus(conn) != CONNECTION_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
+ errmsg("could not connect to server \"%s\"", conname),
+ errdetail("%s", PQerrorMessage(conn))));
+ pfree(keywords);
+ pfree(values);
+
+ /*
+ * Check that non-superuser has used password to establish connection.
+ * This check logic is based on dblink_security_check() in contrib/dblink.
+ *
+ * XXX Should we check this even if we don't provide unsafe version like
+ * dblink_connect_u()?
+ */
+ if (!superuser() && !PQconnectionUsedPassword(conn))
+ {
+ PQfinish(conn);
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superuser cannot connect if the server does not request a password."),
+ errhint("Target server's authentication method must be changed.")));
+ }
+
+ return conn;
+}
+
+/*
+ * Start remote transaction with proper isolation level.
+ */
+static void
+begin_remote_tx(PGconn *conn)
+{
+ const char *sql = NULL; /* keep compiler quiet. */
+ PGresult *res;
+
+ switch (XactIsoLevel)
+ {
+ case XACT_READ_UNCOMMITTED:
+ case XACT_READ_COMMITTED:
+ case XACT_REPEATABLE_READ:
+ sql = "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
+ break;
+ case XACT_SERIALIZABLE:
+ sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
+ break;
+ default:
+ elog(ERROR, "unexpected isolation level: %d", XactIsoLevel);
+ break;
+ }
+
+ elog(DEBUG3, "starting remote transaction with \"%s\"", sql);
+
+ res = PQexec(conn, sql);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not start transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+static void
+abort_remote_tx(PGconn *conn)
+{
+ PGresult *res;
+
+ elog(DEBUG3, "aborting remote transaction");
+
+ res = PQexec(conn, "ABORT TRANSACTION");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not abort transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+/*
+ * Mark the connection as "unused", and close it if the caller was the last
+ * user of the connection.
+ */
+void
+ReleaseConnection(PGconn *conn)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+
+ if (conn == NULL)
+ return;
+
+ /*
+ * We need to scan sequentially since we use the address to find
+ * appropriate PGconn from the hash table.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ if (entry->conn == conn)
+ {
+ hash_seq_term(&scan);
+ break;
+ }
+ }
+
+ /*
+	 * If the given connection is an orphan, it must be a dangling pointer to
+	 * an already released connection. Discarding a connection due to a remote
+	 * query error can produce such a situation (see comments below).
+ */
+ if (entry == NULL)
+ return;
+
+ /*
+	 * If the connection being released is broken or its transaction has
+	 * failed, discard the connection to recover from the error. PQfinish
+	 * leaves dangling pointers to the shared PGconn object, but they won't be
+	 * double-freed because their pointer values no longer match any cached
+	 * entry and are ignored by the check above.
+	 *
+	 * A subsequent connection request via GetConnection will create a new
+	 * connection.
+ */
+ if (PQstatus(conn) != CONNECTION_OK ||
+ (PQtransactionStatus(conn) != PQTRANS_IDLE &&
+ PQtransactionStatus(conn) != PQTRANS_INTRANS))
+ {
+ elog(DEBUG3, "discarding connection: %s %s",
+ PQstatus(conn) == CONNECTION_OK ? "OK" : "NG",
+ PQtransactionStatus(conn) == PQTRANS_IDLE ? "IDLE" :
+ PQtransactionStatus(conn) == PQTRANS_ACTIVE ? "ACTIVE" :
+ PQtransactionStatus(conn) == PQTRANS_INTRANS ? "INTRANS" :
+ PQtransactionStatus(conn) == PQTRANS_INERROR ? "INERROR" :
+ "UNKNOWN");
+ PQfinish(conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ return;
+ }
+
+ /*
+ * Decrease the reference count of this connection. Even if the caller was
+ * the last referrer, we keep the entry registered in the cache.
+ */
+ entry->refs--;
+ if (entry->refs < 0)
+ entry->refs = 0; /* just in case */
+
+ /*
+ * If this connection has an active remote transaction and no user other
+ * than the caller remains, abort the remote transaction and forget about it.
+ */
+ if (entry->use_tx && entry->refs == 0)
+ {
+ abort_remote_tx(conn);
+ entry->use_tx = false;
+ }
+}
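+
+/*
+ * Rough lifecycle sketch, for reference (partly an assumption about
+ * GetConnection, which lives elsewhere in this file): GetConnection()
+ * increments the entry's reference count, opening the connection and the
+ * remote transaction on first use; each ReleaseConnection() decrements it.
+ * When the count drops to zero the remote transaction is aborted but the
+ * connection stays cached for later scans; broken connections are discarded.
+ */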
+
+/*
+ * Clean connections up via the ResourceOwner mechanism at transaction abort.
+ */
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry = (ConnCacheEntry *) arg;
+
+ /* If the transaction was committed, don't close connections. */
+ if (isCommit)
+ return;
+
+ /*
+ * We clean connections up in the post-lock phase because foreign
+ * connections are a backend-internal resource.
+ */
+ if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+ return;
+
+ /*
+ * We ignore cleanup requests for ResourceOwners other than the
+ * transaction's own; at this point, the only other ResourceOwner is a
+ * Portal's.
+ */
+ if (CurrentResourceOwner != CurTransactionResourceOwner)
+ return;
+
+ /*
+ * We don't need to clean up at the end of a subtransaction, because the
+ * transaction might still be recovered to a consistent state via
+ * savepoints.
+ */
+ if (!isTopLevel)
+ return;
+
+ /*
+ * At this point we must be handling the abort of a top-level transaction.
+ * Disconnect all cached connections to clear out any error status and
+ * reset their reference counters.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ elog(DEBUG3, "discard postgres_fdw connection %p due to resowner cleanup",
+ entry->conn);
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+}
+
+/*
+ * Get list of connections currently active.
+ */
+Datum postgres_fdw_get_connections(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
+Datum
+postgres_fdw_get_connections(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ Tuplestorestate *tuplestore;
+ TupleDesc tupdesc;
+
+ /* We return the list of connections by storing them in a Tuplestore. */
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = NULL;
+ rsinfo->setDesc = NULL;
+
+ /* Create tuplestore and copy of TupleDesc in per-query context. */
+ MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+
+ tupdesc = CreateTemplateTupleDesc(2, false);
+ TupleDescInitEntry(tupdesc, 1, "srvid", OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, 2, "usesysid", OIDOID, -1, 0);
+ rsinfo->setDesc = tupdesc;
+
+ tuplestore = tuplestore_begin_heap(false, false, work_mem);
+ rsinfo->setResult = tuplestore;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Scan all cached entries sequentially and report the active connections.
+ */
+ if (ConnectionHash != NULL)
+ {
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ Datum values[2];
+ bool nulls[2];
+ HeapTuple tuple;
+
+ /* Ignore inactive connections */
+ if (PQstatus(entry->conn) != CONNECTION_OK)
+ continue;
+
+ /*
+ * Ignore other users' connections if current user isn't a
+ * superuser.
+ */
+ if (!superuser() && entry->userid != GetUserId())
+ continue;
+
+ values[0] = ObjectIdGetDatum(entry->serverid);
+ values[1] = ObjectIdGetDatum(entry->userid);
+ nulls[0] = false;
+ nulls[1] = false;
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ tuplestore_puttuple(tuplestore, tuple);
+ }
+ }
+ tuplestore_donestoring(tuplestore);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Discard persistent connection designated by given connection name.
+ */
+Datum postgres_fdw_disconnect(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
+Datum
+postgres_fdw_disconnect(PG_FUNCTION_ARGS)
+{
+ Oid serverid = PG_GETARG_OID(0);
+ Oid userid = PG_GETARG_OID(1);
+ ConnCacheEntry key;
+ ConnCacheEntry *entry = NULL;
+ bool found;
+
+ /* Non-superusers can't discard other users' connections. */
+ if (!superuser() && userid != GetOuterUserId())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("only superuser can discard other users' connections")));
+
+ /*
+ * If no connection has been established yet, or no matching connection
+ * exists, just return "NG" to indicate that nothing has been done.
+ */
+ if (ConnectionHash == NULL)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ key.serverid = serverid;
+ key.userid = userid;
+ entry = hash_search(ConnectionHash, &key, HASH_FIND, &found);
+ if (!found)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ /* Discard cached connection, and clear reference counter. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+
+ PG_RETURN_TEXT_P(cstring_to_text("OK"));
+}
diff --git a/contrib/postgres_fdw/connection.h b/contrib/postgres_fdw/connection.h
new file mode 100644
index 0000000..4c9d850
--- /dev/null
+++ b/contrib/postgres_fdw/connection.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.h
+ * Connection management for postgres_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/connection.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECTION_H
+#define CONNECTION_H
+
+#include "foreign/foreign.h"
+#include "libpq-fe.h"
+
+/*
+ * Connection management
+ */
+PGconn *GetConnection(ForeignServer *server, UserMapping *user, bool use_tx);
+void ReleaseConnection(PGconn *conn);
+
+#endif /* CONNECTION_H */
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
new file mode 100644
index 0000000..bfcf93a
--- /dev/null
+++ b/contrib/postgres_fdw/deparse.c
@@ -0,0 +1,1203 @@
+/*-------------------------------------------------------------------------
+ *
+ * deparse.c
+ * query deparser for PostgreSQL
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/deparse.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/transam.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/var.h"
+#include "parser/parser.h"
+#include "parser/parsetree.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+#include "postgres_fdw.h"
+
+/*
+ * Context for walking the expression tree.
+ */
+typedef struct foreign_executable_cxt
+{
+ PlannerInfo *root;
+ RelOptInfo *foreignrel;
+ bool has_param;
+} foreign_executable_cxt;
+
+/*
+ * Get string representation which can be used in SQL statement from a node.
+ */
+static void deparseExpr(StringInfo buf, Expr *expr, PlannerInfo *root);
+static void deparseRelation(StringInfo buf, RangeTblEntry *rte);
+static void deparseVar(StringInfo buf, Var *node, PlannerInfo *root);
+static void deparseConst(StringInfo buf, Const *node, PlannerInfo *root);
+static void deparseBoolExpr(StringInfo buf, BoolExpr *node, PlannerInfo *root);
+static void deparseNullTest(StringInfo buf, NullTest *node, PlannerInfo *root);
+static void deparseDistinctExpr(StringInfo buf, DistinctExpr *node,
+ PlannerInfo *root);
+static void deparseRelabelType(StringInfo buf, RelabelType *node,
+ PlannerInfo *root);
+static void deparseFuncExpr(StringInfo buf, FuncExpr *node, PlannerInfo *root);
+static void deparseParam(StringInfo buf, Param *node, PlannerInfo *root);
+static void deparseScalarArrayOpExpr(StringInfo buf, ScalarArrayOpExpr *node,
+ PlannerInfo *root);
+static void deparseOpExpr(StringInfo buf, OpExpr *node, PlannerInfo *root);
+static void deparseArrayRef(StringInfo buf, ArrayRef *node, PlannerInfo *root);
+static void deparseArrayExpr(StringInfo buf, ArrayExpr *node, PlannerInfo *root);
+
+/*
+ * Determine whether an expression can be evaluated on remote side safely.
+ */
+static bool is_foreign_expr(PlannerInfo *root, RelOptInfo *baserel, Expr *expr,
+ bool *has_param);
+static bool foreign_expr_walker(Node *node, foreign_executable_cxt *context);
+static bool is_builtin(Oid procid);
+
+/*
+ * Deparse the query representation into an SQL statement suitable for the
+ * remote PostgreSQL server. This function creates a simple query string
+ * consisting of only SELECT and FROM clauses.
+ *
+ * The remote SELECT clause contains only columns which are used in the target
+ * list or in local_conds (conditions which can't be pushed down and will be
+ * checked on the local side).
+ */
+void
+deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds)
+{
+ RangeTblEntry *rte;
+ ListCell *lc;
+ StringInfoData foreign_relname;
+ bool first;
+ AttrNumber attr;
+ List *attr_used = NIL; /* List of AttNumber used in the query */
+
+ initStringInfo(buf);
+ initStringInfo(&foreign_relname);
+
+ /*
+ * First of all, determine which columns need to be retrieved for this scan.
+ *
+ * We do this before deparsing the SELECT clause because attributes which
+ * are used in neither reltargetlist nor baserel->baserestrictinfo (quals
+ * evaluated locally) can be replaced with the literal "NULL" in the SELECT
+ * clause to reduce the overhead of tuple handling and data transfer.
+ */
+ foreach (lc, local_conds)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+ List *attrs;
+
+ /*
+ * We need to know which attributes are used in quals evaluated on the
+ * local server, because they must be listed in the SELECT clause of the
+ * remote query. We can ignore attributes which are referenced only in
+ * ORDER BY/GROUP BY clauses because such attributes are already present
+ * in reltargetlist.
+ */
+ attrs = pull_var_clause((Node *) ri->clause,
+ PVC_RECURSE_AGGREGATES,
+ PVC_RECURSE_PLACEHOLDERS);
+ attr_used = list_union(attr_used, attrs);
+ }
+
+ /*
+ * deparse SELECT clause
+ *
+ * List attributes which are in either the target list or a local
+ * restriction; unused attributes are replaced with the literal "NULL" as
+ * an optimization.
+ *
+ * Note that nothing is emitted for dropped columns, though the tuple
+ * constructor function requires entries for them. Such entries must be
+ * initialized to NULL before calling the tuple constructor.
+ */
+ appendStringInfo(buf, "SELECT ");
+ rte = root->simple_rte_array[baserel->relid];
+ attr_used = list_union(attr_used, baserel->reltargetlist);
+ first = true;
+ for (attr = 1; attr <= baserel->max_attr; attr++)
+ {
+ Var *var = NULL;
+ ListCell *lc;
+
+ /* Ignore dropped attributes. */
+ if (get_rte_attribute_is_dropped(rte, attr))
+ continue;
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ first = false;
+
+ /*
+ * We use a linear search here, but that shouldn't be a problem since
+ * attr_used is not expected to grow very large.
+ */
+ foreach (lc, attr_used)
+ {
+ var = lfirst(lc);
+ if (var->varattno == attr)
+ break;
+ var = NULL;
+ }
+ if (var != NULL)
+ deparseVar(buf, var, root);
+ else
+ appendStringInfo(buf, "NULL");
+ }
+ appendStringInfoChar(buf, ' ');
+
+ /*
+ * deparse FROM clause
+ */
+ appendStringInfo(buf, "FROM ");
+ deparseRelation(buf, root->simple_rte_array[baserel->relid]);
+}
+
+/*
+ * Examine each element of baserel's baserestrictinfo list and classify it
+ * into one of three groups:
+ *
+ * - remote_conds are push-down safe and don't contain any Param node
+ * - param_conds are push-down safe but contain some Param node
+ * - local_conds are not push-down safe
+ *
+ * Only remote_conds can be used in remote EXPLAIN, while both remote_conds
+ * and param_conds can be used in the final remote query.
+ */
+void
+classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds)
+{
+ ListCell *lc;
+ bool has_param;
+
+ Assert(remote_conds);
+ Assert(param_conds);
+ Assert(local_conds);
+
+ foreach(lc, baserel->baserestrictinfo)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ if (is_foreign_expr(root, baserel, ri->clause, &has_param))
+ {
+ if (has_param)
+ *param_conds = lappend(*param_conds, ri);
+ else
+ *remote_conds = lappend(*remote_conds, ri);
+ }
+ else
+ *local_conds = lappend(*local_conds, ri);
+ }
+}
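+
+/*
+ * Illustrative example (column names and my_udf are hypothetical): given a
+ * baserestrictinfo list equivalent to
+ *     WHERE c1 = 1 AND c2 = $1 AND c3 = my_udf(c3)
+ * "c1 = 1" would go into remote_conds, "c2 = $1" into param_conds (push-down
+ * safe but referencing an external parameter), and "c3 = my_udf(c3)" into
+ * local_conds because the user-defined function is not push-down safe.
+ */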
+
+/*
+ * Deparse into buf a SELECT statement which acquires sample rows of the given
+ * relation.
+ */
+void
+deparseAnalyzeSql(StringInfo buf, Relation rel)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+ char *nspname;
+ char *relname;
+ ForeignTable *table;
+
+ /* Deparse SELECT clause, using the attribute name or the colname option. */
+ appendStringInfo(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ colname = NameStr(tupdesc->attrs[i]->attname);
+ options = GetForeignColumnOptions(relid, tupdesc->attrs[i]->attnum);
+
+ foreach(lc, options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ appendStringInfo(buf, "%s", quote_identifier(colname));
+ first = false;
+ }
+
+ /*
+ * Deparse FROM clause, using the namespace and relation name, or the
+ * values of the nspname and relname options respectively.
+ */
+ nspname = get_namespace_name(get_rel_namespace(relid));
+ relname = get_rel_name(relid);
+ table = GetForeignTable(relid);
+ foreach(lc, table->options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ appendStringInfo(buf, " FROM %s.%s", quote_identifier(nspname),
+ quote_identifier(relname));
+}
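+
+/*
+ * Illustrative example (object and column names are hypothetical): for a
+ * foreign table with options (nspname 'remote_ns', relname 'remote_tbl') and
+ * non-dropped columns a and b, the generated statement would look like
+ *     SELECT a, b FROM remote_ns.remote_tbl
+ * with identifiers quoted as needed by quote_identifier().
+ */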
+
+/*
+ * Deparse the given expression into buf. The actual string construction is
+ * delegated to node-type-specific functions.
+ *
+ * Note that the switch statement in this function MUST match the one in
+ * foreign_expr_walker to avoid "unsupported expression" errors.
+ */
+static void
+deparseExpr(StringInfo buf, Expr *node, PlannerInfo *root)
+{
+ /*
+ * This switch must be kept in sync with foreign_expr_walker.
+ */
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ deparseConst(buf, (Const *) node, root);
+ break;
+ case T_BoolExpr:
+ deparseBoolExpr(buf, (BoolExpr *) node, root);
+ break;
+ case T_NullTest:
+ deparseNullTest(buf, (NullTest *) node, root);
+ break;
+ case T_DistinctExpr:
+ deparseDistinctExpr(buf, (DistinctExpr *) node, root);
+ break;
+ case T_RelabelType:
+ deparseRelabelType(buf, (RelabelType *) node, root);
+ break;
+ case T_FuncExpr:
+ deparseFuncExpr(buf, (FuncExpr *) node, root);
+ break;
+ case T_Param:
+ deparseParam(buf, (Param *) node, root);
+ break;
+ case T_ScalarArrayOpExpr:
+ deparseScalarArrayOpExpr(buf, (ScalarArrayOpExpr *) node, root);
+ break;
+ case T_OpExpr:
+ deparseOpExpr(buf, (OpExpr *) node, root);
+ break;
+ case T_Var:
+ deparseVar(buf, (Var *) node, root);
+ break;
+ case T_ArrayRef:
+ deparseArrayRef(buf, (ArrayRef *) node, root);
+ break;
+ case T_ArrayExpr:
+ deparseArrayExpr(buf, (ArrayExpr *) node, root);
+ break;
+ default:
+ {
+ ereport(ERROR,
+ (errmsg("unsupported expression for deparse"),
+ errdetail("%s", nodeToString(node))));
+ }
+ break;
+ }
+}
+
+/*
+ * Deparse the given Var node into buf. If the column has a colname FDW
+ * option, use its value instead of the attribute name.
+ */
+static void
+deparseVar(StringInfo buf, Var *node, PlannerInfo *root)
+{
+ RangeTblEntry *rte;
+ char *colname = NULL;
+ const char *q_colname = NULL;
+ List *options;
+ ListCell *lc;
+
+ /* node must not be any of OUTER_VAR,INNER_VAR and INDEX_VAR. */
+ Assert(node->varno >= 1 && node->varno <= root->simple_rel_array_size);
+
+ /* Get RangeTblEntry from array in PlannerInfo. */
+ rte = root->simple_rte_array[node->varno];
+
+ /*
+ * If the node is a column of a foreign table, and it has colname FDW
+ * option, use its value.
+ */
+ options = GetForeignColumnOptions(rte->relid, node->varattno);
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ /*
+ * If the node refers to a column of a regular table, or the column has no
+ * colname FDW option, use the attribute name.
+ */
+ if (colname == NULL)
+ colname = get_attname(rte->relid, node->varattno);
+
+ q_colname = quote_identifier(colname);
+ appendStringInfo(buf, "%s", q_colname);
+}
+
+/*
+ * Deparse a RangeTblEntry node into buf. If rte represents a foreign table,
+ * use the value of the relname FDW option (if any) instead of the relation's
+ * name. Similarly, the nspname FDW option overrides the schema name.
+ */
+static void
+deparseRelation(StringInfo buf, RangeTblEntry *rte)
+{
+ ForeignTable *table;
+ ListCell *lc;
+ const char *nspname = NULL; /* plain namespace name */
+ const char *relname = NULL; /* plain relation name */
+ const char *q_nspname; /* quoted namespace name */
+ const char *q_relname; /* quoted relation name */
+
+ /* obtain additional catalog information. */
+ table = GetForeignTable(rte->relid);
+
+ /*
+ * Use value of FDW options if any, instead of the name of object
+ * itself.
+ */
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ /* Quote each identifier, if necessary. */
+ if (nspname == NULL)
+ nspname = get_namespace_name(get_rel_namespace(rte->relid));
+ q_nspname = quote_identifier(nspname);
+
+ if (relname == NULL)
+ relname = get_rel_name(rte->relid);
+ q_relname = quote_identifier(relname);
+
+ /* Construct relation reference into the buffer. */
+ appendStringInfo(buf, "%s.%s", q_nspname, q_relname);
+}
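+
+/*
+ * For example, the regression tests below map foreign table ft1, which has
+ * options (nspname 'S 1', relname 'T 1'), to the remote reference
+ * "S 1"."T 1"; without those options the local schema and relation names are
+ * used.
+ */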
+
+/*
+ * Deparse the given constant value into buf. This function has to be kept in
+ * sync with get_const_expr.
+ */
+static void
+deparseConst(StringInfo buf,
+ Const *node,
+ PlannerInfo *root)
+{
+ Oid typoutput;
+ bool typIsVarlena;
+ char *extval;
+ bool isfloat = false;
+ bool needlabel;
+
+ if (node->constisnull)
+ {
+ appendStringInfo(buf, "NULL");
+ return;
+ }
+
+ getTypeOutputInfo(node->consttype,
+ &typoutput, &typIsVarlena);
+ extval = OidOutputFunctionCall(typoutput, node->constvalue);
+
+ switch (node->consttype)
+ {
+ case ANYARRAYOID:
+ case ANYNONARRAYOID:
+ elog(ERROR, "anyarray and anyenum are not supported");
+ break;
+ case INT2OID:
+ case INT4OID:
+ case INT8OID:
+ case OIDOID:
+ case FLOAT4OID:
+ case FLOAT8OID:
+ case NUMERICOID:
+ {
+ /*
+ * No need to quote unless the value contains special strings such as
+ * 'NaN'.
+ */
+ if (strspn(extval, "0123456789+-eE.") == strlen(extval))
+ {
+ if (extval[0] == '+' || extval[0] == '-')
+ appendStringInfo(buf, "(%s)", extval);
+ else
+ appendStringInfoString(buf, extval);
+ if (strcspn(extval, "eE.") != strlen(extval))
+ isfloat = true; /* it looks like a float */
+ }
+ else
+ appendStringInfo(buf, "'%s'", extval);
+ }
+ break;
+ case BITOID:
+ case VARBITOID:
+ appendStringInfo(buf, "B'%s'", extval);
+ break;
+ case BOOLOID:
+ if (strcmp(extval, "t") == 0)
+ appendStringInfoString(buf, "true");
+ else
+ appendStringInfoString(buf, "false");
+ break;
+
+ default:
+ {
+ const char *valptr;
+
+ appendStringInfoChar(buf, '\'');
+ for (valptr = extval; *valptr; valptr++)
+ {
+ char ch = *valptr;
+
+ /*
+ * standard_conforming_strings of the remote session should be set to
+ * the same value as in the local session.
+ */
+ if (SQL_STR_DOUBLE(ch, !standard_conforming_strings))
+ appendStringInfoChar(buf, ch);
+ appendStringInfoChar(buf, ch);
+ }
+ appendStringInfoChar(buf, '\'');
+ }
+ break;
+ }
+
+ /*
+ * Append ::typename unless the constant will be implicitly typed as the
+ * right type when it is read in.
+ *
+ * XXX this code has to be kept in sync with the behavior of the parser,
+ * especially make_const.
+ */
+ switch (node->consttype)
+ {
+ case BOOLOID:
+ case INT4OID:
+ case UNKNOWNOID:
+ needlabel = false;
+ break;
+ case NUMERICOID:
+ needlabel = !isfloat || (node->consttypmod >= 0);
+ break;
+ default:
+ needlabel = true;
+ break;
+ }
+ if (needlabel)
+ {
+ appendStringInfo(buf, "::%s",
+ format_type_with_typemod(node->consttype,
+ node->consttypmod));
+ }
+}
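+
+/*
+ * Illustrative examples of the output (values are hypothetical): the integer
+ * constant 1 is emitted as just "1", -1 is wrapped as "(-1)" to keep it from
+ * being absorbed by a preceding operator, and the text constant 'foo' becomes
+ * "'foo'::text", i.e. a type label is appended whenever the bare literal
+ * would not otherwise resolve to the right type.
+ */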
+
+static void
+deparseBoolExpr(StringInfo buf,
+ BoolExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ char *op = NULL; /* keep compiler quiet */
+ bool first;
+
+ switch (node->boolop)
+ {
+ case AND_EXPR:
+ op = "AND";
+ break;
+ case OR_EXPR:
+ op = "OR";
+ break;
+ case NOT_EXPR:
+ appendStringInfo(buf, "(NOT ");
+ deparseExpr(buf, list_nth(node->args, 0), root);
+ appendStringInfo(buf, ")");
+ return;
+ }
+
+ first = true;
+ appendStringInfo(buf, "(");
+ foreach(lc, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, " %s ", op);
+ deparseExpr(buf, (Expr *) lfirst(lc), root);
+ first = false;
+ }
+ appendStringInfo(buf, ")");
+}
+
+/*
+ * Deparse given IS [NOT] NULL test expression into buf.
+ */
+static void
+deparseNullTest(StringInfo buf,
+ NullTest *node,
+ PlannerInfo *root)
+{
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ if (node->nulltesttype == IS_NULL)
+ appendStringInfo(buf, " IS NULL)");
+ else
+ appendStringInfo(buf, " IS NOT NULL)");
+}
+
+static void
+deparseDistinctExpr(StringInfo buf,
+ DistinctExpr *node,
+ PlannerInfo *root)
+{
+ Assert(list_length(node->args) == 2);
+
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, " IS DISTINCT FROM ");
+ deparseExpr(buf, lsecond(node->args), root);
+}
+
+static void
+deparseRelabelType(StringInfo buf,
+ RelabelType *node,
+ PlannerInfo *root)
+{
+ char *typname;
+
+ Assert(node->arg);
+
+ /* No need to deparse a cast when the argument has the same type as the result. */
+ if (IsA(node->arg, Const) &&
+ ((Const *) node->arg)->consttype == node->resulttype &&
+ ((Const *) node->arg)->consttypmod == -1)
+ {
+ deparseExpr(buf, node->arg, root);
+ return;
+ }
+
+ typname = format_type_with_typemod(node->resulttype, node->resulttypmod);
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ appendStringInfo(buf, ")::%s", typname);
+}
+
+/*
+ * Deparse into buf the given node representing a function call. Only explicit
+ * function calls and explicit casts (coercions) are deparsed as such; for an
+ * implicit cast we just deparse the argument, since the coercion is applied
+ * again on the remote side if necessary.
+ *
+ * The function name (and type name) is always schema-qualified to avoid
+ * problems caused by a different search_path setting on the remote side.
+ */
+static void
+deparseFuncExpr(StringInfo buf,
+ FuncExpr *node,
+ PlannerInfo *root)
+{
+ Oid pronamespace;
+ const char *schemaname;
+ const char *funcname;
+ ListCell *arg;
+ bool first;
+
+ pronamespace = get_func_namespace(node->funcid);
+ schemaname = quote_identifier(get_namespace_name(pronamespace));
+ funcname = quote_identifier(get_func_name(node->funcid));
+
+ if (node->funcformat == COERCE_EXPLICIT_CALL)
+ {
+ /* Function call, deparse all arguments recursively. */
+ appendStringInfo(buf, "%s.%s(", schemaname, funcname);
+ first = true;
+ foreach(arg, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(arg), root);
+ first = false;
+ }
+ appendStringInfoChar(buf, ')');
+ }
+ else if (node->funcformat == COERCE_EXPLICIT_CAST)
+ {
+ /* Explicit cast, deparse only first argument. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, ")::%s", funcname);
+ }
+ else
+ {
+ /* Implicit cast, deparse only first argument. */
+ deparseExpr(buf, linitial(node->args), root);
+ }
+}
+
+/*
+ * Deparse the given Param node into buf.
+ *
+ * We don't renumber parameter IDs, because skipping, say, $1 causes no
+ * problem as long as we pass through all arguments.
+ */
+static void
+deparseParam(StringInfo buf,
+ Param *node,
+ PlannerInfo *root)
+{
+ Assert(node->paramkind == PARAM_EXTERN);
+
+ appendStringInfo(buf, "$%d", node->paramid);
+}
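+
+/*
+ * For example, a local qual "c1 = $2" is deparsed with the same parameter
+ * number, "$2", even if $1 is never referenced in the remote query; the
+ * unused slot is harmless because all EXECUTE arguments are passed through
+ * to the remote side.
+ */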
+
+/*
+ * Deparse the given ScalarArrayOpExpr expression into buf. To avoid problems
+ * with operator precedence, we always parenthesize the expression. We also
+ * use OPERATOR(schema.operator) notation to identify the remote operator
+ * exactly.
+ */
+static void
+deparseScalarArrayOpExpr(StringInfo buf,
+ ScalarArrayOpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ Expr *arg1;
+ Expr *arg2;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert(list_length(node->args) == 2);
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Extract operands. */
+ arg1 = linitial(node->args);
+ arg2 = lsecond(node->args);
+
+ /* Deparse fully qualified operator name. */
+ deparseExpr(buf, arg1, root);
+ appendStringInfo(buf, " OPERATOR(%s.%s) %s (",
+ opnspname, opname, node->useOr ? "ANY" : "ALL");
+ deparseExpr(buf, arg2, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse the given operator expression into buf. To avoid problems with
+ * operator precedence, we always parenthesize the expression. We also use
+ * OPERATOR(schema.operator) notation to identify the remote operator exactly.
+ */
+static void
+deparseOpExpr(StringInfo buf,
+ OpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ char oprkind;
+ ListCell *arg;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ oprkind = form->oprkind;
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert((oprkind == 'r' && list_length(node->args) == 1) ||
+ (oprkind == 'l' && list_length(node->args) == 1) ||
+ (oprkind == 'b' && list_length(node->args) == 2));
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse first operand. */
+ arg = list_head(node->args);
+ if (oprkind == 'r' || oprkind == 'b')
+ {
+ deparseExpr(buf, lfirst(arg), root);
+ appendStringInfoChar(buf, ' ');
+ }
+
+ /* Deparse fully qualified operator name. */
+ appendStringInfo(buf, "OPERATOR(%s.%s)", opnspname, opname);
+
+ /* Deparse last operand. */
+ arg = list_tail(node->args);
+ if (oprkind == 'l' || oprkind == 'b')
+ {
+ appendStringInfoChar(buf, ' ');
+ deparseExpr(buf, lfirst(arg), root);
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+static void
+deparseArrayRef(StringInfo buf,
+ ArrayRef *node,
+ PlannerInfo *root)
+{
+ ListCell *lowlist_item;
+ ListCell *uplist_item;
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse referenced array expression first. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->refexpr, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Deparse subscripts expression. */
+ lowlist_item = list_head(node->reflowerindexpr); /* could be NULL */
+ foreach(uplist_item, node->refupperindexpr)
+ {
+ appendStringInfoChar(buf, '[');
+ if (lowlist_item)
+ {
+ deparseExpr(buf, lfirst(lowlist_item), root);
+ appendStringInfoChar(buf, ':');
+ lowlist_item = lnext(lowlist_item);
+ }
+ deparseExpr(buf, lfirst(uplist_item), root);
+ appendStringInfoChar(buf, ']');
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+
+/*
+ * Deparse the given ArrayExpr node into buf.
+ */
+static void
+deparseArrayExpr(StringInfo buf,
+ ArrayExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfo(buf, "ARRAY[");
+ foreach(lc, node->elements)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(lc), root);
+
+ first = false;
+ }
+ appendStringInfoChar(buf, ']');
+
+ /* If the array is empty, we need explicit cast to the array type. */
+ if (node->elements == NIL)
+ {
+ char *typname;
+
+ typname = format_type_with_typemod(node->array_typeid, -1);
+ appendStringInfo(buf, "::%s", typname);
+ }
+}
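+
+/*
+ * Illustrative examples: ARRAY[c2, 1] is deparsed element by element as
+ * "ARRAY[c2, 1]", while an empty array of integers becomes
+ * "ARRAY[]::integer[]" so that the remote parser can infer its type.
+ */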
+
+/*
+ * Returns true if the given expr is safe to evaluate on the foreign server.
+ * If the result is true, *has_param additionally tells whether the expression
+ * contains any Param node; this is useful for determining whether the
+ * expression can be used in a remote EXPLAIN.
+ */
+static bool
+is_foreign_expr(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Expr *expr,
+ bool *has_param)
+{
+ foreign_executable_cxt context;
+ context.root = root;
+ context.foreignrel = baserel;
+ context.has_param = false;
+
+ /*
+ * An expression which includes any mutable function can't be pushed down
+ * because its result is not stable. For example, pushing now() down to the
+ * remote side could give confusing results because of the clock offset
+ * between servers. If we get routine mapping infrastructure in a future
+ * release, we will be able to choose functions to push down at a finer
+ * granularity.
+ */
+ if (contain_mutable_functions((Node *) expr))
+ {
+ elog(DEBUG3, "expr has mutable function");
+ return false;
+ }
+
+ /*
+ * Check that the expression consists only of nodes which are known to be
+ * safe to push down.
+ */
+ if (foreign_expr_walker((Node *) expr, &context))
+ return false;
+
+ /*
+ * Tell the caller whether the given expression contains any Param node;
+ * such an expression can't be used in an EXPLAIN statement issued before
+ * the executor starts.
+ */
+ *has_param = context.has_param;
+
+ return true;
+}
+
+/*
+ * Return true if the given node tree includes any node which is not known to
+ * be safe to push down.
+ */
+static bool
+foreign_expr_walker(Node *node, foreign_executable_cxt *context)
+{
+ if (node == NULL)
+ return false;
+
+ /*
+ * Special-case handling for List; expression_tree_walker handles List as
+ * well as other Expr nodes. For instance, a List is used in a RestrictInfo
+ * for the args of a FuncExpr node.
+ *
+ * Although the comments of expression_tree_walker mention that RangeTblRef,
+ * FromExpr, JoinExpr, and SetOperationStmt are handled as well, we don't
+ * care about them here because they don't appear in a RestrictInfo. If one
+ * of them were passed in, the default label would catch it and give up
+ * traversing.
+ */
+ if (IsA(node, List))
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *) node)
+ {
+ if (foreign_expr_walker(lfirst(lc), context))
+ return true;
+ }
+ return false;
+ }
+
+ /*
+ * If the return type of the given expression is not built-in, it can't be
+ * pushed down because it might have incompatible semantics on the remote
+ * side.
+ */
+ if (!is_builtin(exprType(node)))
+ {
+ elog(DEBUG3, "expr has user-defined type");
+ return true;
+ }
+
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ /*
+ * Using anyarray and/or anyenum in remote query is not supported.
+ */
+ if (((Const *) node)->consttype == ANYARRAYOID ||
+ ((Const *) node)->consttype == ANYNONARRAYOID)
+ {
+ elog(DEBUG3, "expr has anyarray or anyenum");
+ return true;
+ }
+ break;
+ case T_BoolExpr:
+ case T_NullTest:
+ case T_DistinctExpr:
+ case T_RelabelType:
+ /*
+ * These types of nodes are known to be safe to push down. Of course
+ * the node's subtree, if any, is still checked recursively at the
+ * tail of this function.
+ */
+ break;
+ /*
+ * If the function used by the expression is not built-in, it can't be
+ * pushed down because it might have incompatible semantics on the
+ * remote side.
+ */
+ case T_FuncExpr:
+ {
+ FuncExpr *fe = (FuncExpr *) node;
+ if (!is_builtin(fe->funcid))
+ {
+ elog(DEBUG3, "expr has user-defined function");
+ return true;
+ }
+ }
+ break;
+ case T_Param:
+ /*
+ * Only external parameters can be pushed down.
+ */
+ {
+ if (((Param *) node)->paramkind != PARAM_EXTERN)
+ {
+ elog(DEBUG3, "expr has non-external parameter");
+ return true;
+ }
+
+ /* Mark that this expression contains Param node. */
+ context->has_param = true;
+ }
+ break;
+ case T_ScalarArrayOpExpr:
+ /*
+ * Only built-in operators can be pushed down. In addition, the
+ * underlying function must be built-in and immutable, but we don't
+ * check volatility here; that check has already been done by
+ * contain_mutable_functions.
+ */
+ {
+ ScalarArrayOpExpr *oe = (ScalarArrayOpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined scalar-array operator");
+ return true;
+ }
+
+ /*
+ * If the operator takes collatable type as operands, we push
+ * down only "=" and "<>" which are not affected by collation.
+ * Other operators might be safe about collation, but these two
+ * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has scalar-array operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_OpExpr:
+ /*
+ * Only built-in operators can be pushed down. In addition, the
+ * underlying function must be built-in and immutable, but we don't
+ * check volatility here; that check has already been done by
+ * contain_mutable_functions.
+ */
+ {
+ OpExpr *oe = (OpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined operator");
+ return true;
+ }
+
+ /*
+ * If the operator takes collatable type as operands, we push
+ * down only "=" and "<>" which are not affected by collation.
+ * Other operators might be safe about collation, but these two
+ * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_Var:
+ /*
+ * A Var can be pushed down only if it belongs to the foreign table.
+ * XXX Can a Var of another relation appear here?
+ */
+ {
+ Var *var = (Var *) node;
+ foreign_executable_cxt *f_context;
+
+ f_context = (foreign_executable_cxt *) context;
+ if (var->varno != f_context->foreignrel->relid ||
+ var->varlevelsup != 0)
+ {
+ elog(DEBUG3, "expr has var of other relation");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayRef:
+ /*
+ * An ArrayRef whose elements are of a non-built-in type can't be
+ * pushed down.
+ */
+ {
+ ArrayRef *ar = (ArrayRef *) node;
+
+ if (!is_builtin(ar->refelemtype))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+
+ /* Assignment should not be in restrictions. */
+ if (ar->refassgnexpr != NULL)
+ {
+ elog(DEBUG3, "expr has assignment");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayExpr:
+ /*
+ * An ArrayExpr whose elements are of a non-built-in type can't be
+ * pushed down.
+ */
+ {
+ if (!is_builtin(((ArrayExpr *) node)->element_typeid))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+ }
+ break;
+ default:
+ {
+ elog(DEBUG3, "expression is too complex: %s",
+ nodeToString(node));
+ return true;
+ }
+ break;
+ }
+
+ return expression_tree_walker(node, foreign_expr_walker, context);
+}
+
+/*
+ * Return true if the given OID identifies a built-in object.
+ */
+static bool
+is_builtin(Oid oid)
+{
+ return (oid < FirstNormalObjectId);
+}
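+
+/*
+ * Objects with OID below FirstNormalObjectId were created during initdb, so
+ * we assume they exist with the same semantics on any remote PostgreSQL
+ * server; everything created later (user objects and objects created by
+ * extensions) is treated as not push-down safe.
+ */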
+
+/*
+ * Deparse conditions from the given list of RestrictInfos and append them to
+ * buf as a WHERE clause.
+ *
+ * is_first should be true only on the first call for a statement; on
+ * subsequent calls buf is assumed to already end with a valid WHERE clause,
+ * and further conditions are appended with AND.
+ */
+void
+appendWhereClause(StringInfo buf,
+ bool is_first,
+ List *exprs,
+ PlannerInfo *root)
+{
+ bool first = true;
+ ListCell *lc;
+
+ foreach(lc, exprs)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ /* Connect expressions with "AND" and parenthesize whole condition. */
+ if (is_first && first)
+ appendStringInfo(buf, " WHERE ");
+ else
+ appendStringInfo(buf, " AND ");
+
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, ri->clause, root);
+ appendStringInfoChar(buf, ')');
+
+ first = false;
+ }
+}
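+
+/*
+ * Illustrative example (column names are hypothetical): with is_first = true
+ * and two push-down-safe conditions c1 = 1 and c2 > 0, the appended text is
+ *     WHERE ((c1 OPERATOR(pg_catalog.=) 1)) AND ((c2 OPERATOR(pg_catalog.>) 0))
+ * and a later call with is_first = false would continue the same clause with
+ * AND.
+ */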
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
new file mode 100644
index 0000000..3d4e7df
--- /dev/null
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -0,0 +1,721 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+-- Clean up in case a prior regression run failed
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+DROP ROLE IF EXISTS postgres_fdw_user;
+RESET client_min_messages;
+CREATE ROLE postgres_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgres_fdw_user';
+CREATE EXTENSION postgres_fdw;
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgres_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+ OPTIONS (dbname 'contrib_regression');
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgres_fdw_user SERVER loopback2;
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+-- ===================================================================
+-- tests for postgres_fdw_validator
+-- ===================================================================
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure option
+ALTER SERVER loopback1 OPTIONS (
+ use_remote_explain 'false',
+ fdw_startup_cost '123.456',
+ fdw_tuple_cost '0.123',
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+ List of foreign-data wrappers
+ Name | Owner | Handler | Validator | Access privileges | FDW Options | Description
+--------------+-------------------+----------------------+------------------------+-------------------+-------------+-------------
+ postgres_fdw | postgres_fdw_user | postgres_fdw_handler | postgres_fdw_validator | | |
+(1 row)
+
+\des+
+ List of foreign servers
+ Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW Options | Description
+-----------+-------------------+----------------------+-------------------+------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
+ loopback1 | postgres_fdw_user | postgres_fdw | | | | (use_remote_explain 'false', fdw_startup_cost '123.456', fdw_tuple_cost '0.123', authtype 'value', service 'value', connect_timeout 'value', dbname 'value', host 'value', hostaddr 'value', port 'value', tty 'value', options 'value', application_name 'value', keepalives 'value', keepalives_idle 'value', keepalives_interval 'value', sslcompression 'value', sslmode 'value', sslcert 'value', sslkey 'value', sslrootcert 'value', sslcrl 'value') |
+ loopback2 | postgres_fdw_user | postgres_fdw | | | | (dbname 'contrib_regression') |
+(2 rows)
+
+\deu+
+ List of user mappings
+ Server | User name | FDW Options
+-----------+-------------------+-------------
+ loopback1 | public |
+ loopback2 | postgres_fdw_user |
+(2 rows)
+
+\det+
+ List of foreign tables
+ Schema | Table | Server | FDW Options | Description
+--------+-------+-----------+--------------------------------+-------------
+ public | ft1 | loopback2 | (nspname 'S 1', relname 'T 1') |
+ public | ft2 | loopback2 | (nspname 'S 1', relname 'T 1') |
+(2 rows)
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- empty result
+SELECT * FROM ft1 WHERE false;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+----+----+----+----+----+----
+(0 rows)
+
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c7 >= '1'::bpchar)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 101)) AND (((c6)::text OPERATOR(pg_catalog.=) '1'::text))
+(3 rows)
+
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+ count
+-------
+ 1000
+(1 row)
+
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1
+-----
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+------+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1000 | 0 | 01000 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo
+(1 row)
+
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+ c1 | c2 | c3 | c4
+----+----+-------+------------------------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST
+(10 rows)
+
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+ ?column? | ?column?
+----------+----------
+ fixed |
+(1 row)
+
+-- user-defined operator/function
+CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_abs(t1.c2);
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 = postgres_fdw_abs(c2))
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 === c2)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) pg_catalog.abs(c2)))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) c2))
+(2 rows)
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 100)) AND ((c2 OPERATOR(pg_catalog.=) 0))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((pg_catalog.round(pg_catalog.abs("C 1"), 0) OPERATOR(pg_catalog.=) 1::numeric))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) (OPERATOR(pg_catalog.-) "C 1")))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((1::numeric OPERATOR(pg_catalog.=) ("C 1" OPERATOR(pg_catalog.!))))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ANY (ARRAY[c2, 1, ("C 1" OPERATOR(pg_catalog.+) 0)])))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ((ARRAY["C 1", c2, 3])[1])))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c8 = 'foo'::user_enum)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 2))
+(5 rows)
+
+EXECUTE st1(1, 1);
+ c3 | c3
+-------+-------
+ 00001 | 00001
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ -> Foreign Scan on ft2 t2
+ Filter: (date_part('dow'::text, c4) = 6::double precision)
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10))
+(10 rows)
+
+EXECUTE st2(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10)) AND ((pg_catalog.date_part('dow'::text, c5) OPERATOR(pg_catalog.=) 6::double precision))
+(9 rows)
+
+EXECUTE st3(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st3(20, 30);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 23 | 3 | 00023 | Sat Jan 24 00:00:00 1970 PST | Sat Jan 24 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) $1))
+(2 rows)
+
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+ f_test
+--------
+ 100
+(1 row)
+
+DROP FUNCTION f_test(int);
+-- ===================================================================
+-- cost estimation options
+-- ===================================================================
+ALTER SERVER loopback1 OPTIONS (SET use_remote_explain 'true');
+ALTER SERVER loopback1 OPTIONS (SET fdw_startup_cost '0');
+ALTER SERVER loopback1 OPTIONS (SET fdw_tuple_cost '0');
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+ALTER SERVER loopback1 OPTIONS (DROP use_remote_explain);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_startup_cost);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_tuple_cost);
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgres_fdw_connections;
+ srvname | usename
+-----------+-------------------
+ loopback2 | postgres_fdw_user
+(1 row)
+
+SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_get_connections();
+ postgres_fdw_disconnect
+-------------------------
+ OK
+(1 row)
+
+SELECT srvname, usename FROM postgres_fdw_connections;
+ srvname | usename
+---------+---------
+(0 rows)
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ERROR: invalid input syntax for integer: "1970-01-02 00:00:00"
+CONTEXT: column c5 of foreign table ft1
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+(1 row)
+
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ERROR: could not execute remote query
+DETAIL: ERROR: division by zero
+
+HINT: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((1 OPERATOR(pg_catalog./) ("C 1" OPERATOR(pg_catalog.-) 1)) OPERATOR(pg_catalog.>) 0))
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+COMMIT;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgres_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table "S 1"."T 1"
+drop cascades to table "S 1"."T 2"
+DROP TYPE user_enum CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to foreign table ft1 column c8
+drop cascades to foreign table ft2 column c8
+DROP EXTENSION postgres_fdw CASCADE;
+NOTICE: drop cascades to 6 other objects
+DETAIL: drop cascades to server loopback1
+drop cascades to user mapping for public
+drop cascades to server loopback2
+drop cascades to user mapping for postgres_fdw_user
+drop cascades to foreign table ft1
+drop cascades to foreign table ft2
+\c
+DROP ROLE postgres_fdw_user;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
new file mode 100644
index 0000000..c173830
--- /dev/null
+++ b/contrib/postgres_fdw/option.c
@@ -0,0 +1,294 @@
+/*-------------------------------------------------------------------------
+ *
+ * option.c
+ * FDW option handling
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/option.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "libpq-fe.h"
+
+#include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "fmgr.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+
+#include "postgres_fdw.h"
+
+/*
+ * SQL functions
+ */
+extern Datum postgres_fdw_validator(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_validator);
+
+/*
+ * Describes the valid options for objects that this wrapper uses.
+ */
+typedef struct PostgresFdwOption
+{
+ const char *keyword;
+ Oid optcontext; /* Oid of catalog in which options may appear */
+ bool is_libpq_opt; /* true if it's used in libpq */
+} PostgresFdwOption;
+
+/*
+ * Valid options for postgres_fdw.
+ * Allocated and filled in InitPostgresFdwOptions.
+ */
+static PostgresFdwOption *postgres_fdw_options;
+
+/*
+ * Valid options of libpq.
+ * Allocated and filled in InitPostgresFdwOptions.
+ */
+static PQconninfoOption *libpq_options;
+
+/*
+ * Helper functions
+ */
+static bool is_valid_option(const char *keyword, Oid context);
+
+/*
+ * Validate the generic options given to a FOREIGN DATA WRAPPER, SERVER,
+ * USER MAPPING or FOREIGN TABLE that uses postgres_fdw.
+ *
+ * Raise an ERROR if the option or its value is considered invalid.
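+ *
+ * Options arrive here from DDL; a sketch based on the regression tests above:
+ *   ALTER SERVER loopback1 OPTIONS (SET use_remote_explain 'true');
+ *   ALTER SERVER loopback1 OPTIONS (SET fdw_startup_cost '0');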
+ */
+Datum
+postgres_fdw_validator(PG_FUNCTION_ARGS)
+{
+ List *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
+ Oid catalog = PG_GETARG_OID(1);
+ ListCell *cell;
+
+ /*
+ * Check that only options supported by postgres_fdw, and allowed for the
+ * current object type, are given.
+ */
+ foreach(cell, options_list)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (!is_valid_option(def->defname, catalog))
+ {
+ PostgresFdwOption *opt;
+ StringInfoData buf;
+
+ /*
+			 * An unknown option was specified; complain about it. Provide a
+			 * hint with a list of valid options for the object.
+ */
+ initStringInfo(&buf);
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (catalog == opt->optcontext)
+ appendStringInfo(&buf, "%s%s", (buf.len > 0) ? ", " : "",
+ opt->keyword);
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
+ errmsg("invalid option \"%s\"", def->defname),
+ errhint("Valid options in this context are: %s",
+ buf.data)));
+ }
+
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ /* use_remote_explain accepts only boolean values */
+ (void) defGetBoolean(def);
+ }
+ else if (strcmp(def->defname, "fdw_startup_cost") == 0)
+ {
+ double val;
+ char *endp;
+ val = strtod(defGetString(def), &endp);
+ if (*endp || val < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("fdw_startup_cost requires a non-negative numeric value")));
+ }
+ else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+ {
+ double val;
+ char *endp;
+ val = strtod(defGetString(def), &endp);
+ if (*endp || val < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("fdw_tuple_cost requires a non-negative numeric value")));
+ }
+ }
+
+ /*
+	 * We don't check option-specific limitations here; they will be validated
+	 * at execution time.
+ */
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Initialize the option checking mechanism. This must be called before any
+ * other function in option.c is used, so _PG_init is the proper place.
+ */
+void
+InitPostgresFdwOptions(void)
+{
+ int libpq_opt_num;
+ PQconninfoOption *lopt;
+ PostgresFdwOption *popt;
+ /* non-libpq FDW-specific FDW options */
+ static const PostgresFdwOption non_libpq_options[] = {
+ { "nspname", ForeignTableRelationId, false} ,
+ { "relname", ForeignTableRelationId, false} ,
+ { "colname", AttributeRelationId, false} ,
+ /* use_remote_explain is available on both server and table */
+ { "use_remote_explain", ForeignServerRelationId, false} ,
+ { "use_remote_explain", ForeignTableRelationId, false} ,
+ /* cost factors */
+ { "fdw_startup_cost", ForeignServerRelationId, false} ,
+ { "fdw_tuple_cost", ForeignServerRelationId, false} ,
+ { NULL, InvalidOid, false },
+ };
+
+ /* Prevent redundant initialization. */
+ if (postgres_fdw_options)
+ return;
+
+ /*
+ * Get list of valid libpq options.
+ *
+ * To avoid unnecessary work, we get the list once and use it throughout
+ * the lifetime of this backend process. We don't need to care about
+ * memory context issues, because PQconndefaults allocates with malloc.
+ */
+ libpq_options = PQconndefaults();
+ if (!libpq_options) /* assume reason for failure is OOM */
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_OUT_OF_MEMORY),
+ errmsg("out of memory"),
+ errdetail("could not get libpq's default connection options")));
+
+	/* Count how many libpq options are available. */
+ libpq_opt_num = 0;
+ for (lopt = libpq_options; lopt->keyword; lopt++)
+ libpq_opt_num++;
+
+ /*
+ * Construct an array which consists of all valid options for postgres_fdw,
+ * by appending FDW-specific options to libpq options.
+ *
+	 * We could use plain malloc here to allocate postgres_fdw_options, because
+	 * it lives as long as the backend process does, but we allocate it in
+	 * CacheMemoryContext instead to keep the code simple. We allocate extra
+	 * entries for the FDW-specific options, including one for the sentinel.
+	 *
+	 * We keep libpq_options in memory until the backend process dies, to
+	 * avoid copying the keyword strings.
+ */
+ postgres_fdw_options = (PostgresFdwOption *)
+ MemoryContextAllocZero(CacheMemoryContext,
+ sizeof(PostgresFdwOption) * libpq_opt_num +
+ sizeof(non_libpq_options));
+ popt = postgres_fdw_options;
+ for (lopt = libpq_options; lopt->keyword; lopt++)
+ {
+		/* Skip options that we don't want users to be able to set. */
+ if (strcmp(lopt->keyword, "replication") == 0 ||
+ strcmp(lopt->keyword, "fallback_application_name") == 0 ||
+ strcmp(lopt->keyword, "client_encoding") == 0)
+ continue;
+
+ /* We don't have to copy keyword string, as described above. */
+ popt->keyword = lopt->keyword;
+
+		/* "user" and any secret options are allowed only in user mappings. */
+ if (strcmp(lopt->keyword, "user") == 0 || strchr(lopt->dispchar, '*'))
+ popt->optcontext = UserMappingRelationId;
+ else
+ popt->optcontext = ForeignServerRelationId;
+ popt->is_libpq_opt = true;
+
+ /* Advance the position where next option will be placed. */
+ popt++;
+ }
+
+ /* Append FDW-specific options. */
+ memcpy(popt, non_libpq_options, sizeof(non_libpq_options));
+}
+
+/*
+ * Check whether the given option is one of the valid postgres_fdw options.
+ * context is the Oid of the catalog holding the object the option is for.
+ */
+static bool
+is_valid_option(const char *keyword, Oid context)
+{
+ PostgresFdwOption *opt;
+
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (context == opt->optcontext && strcmp(opt->keyword, keyword) == 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Check whether the given option is one of the valid libpq options.
+ */
+static bool
+is_libpq_option(const char *keyword)
+{
+ PostgresFdwOption *opt;
+
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (opt->is_libpq_opt && strcmp(opt->keyword, keyword) == 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Extract only the libpq options from the given list of generic options, and
+ * fill the keywords and values arrays with them. Returns the number of
+ * options extracted.
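+ *
+ * The resulting arrays are in the parallel keyword/value form accepted by
+ * libpq's PQconnectdbParams; the caller (assumed to be the connection
+ * manager in connection.c) is expected to pass them there.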
+ */
+int
+ExtractConnectionOptions(List *defelems, const char **keywords,
+ const char **values)
+{
+ ListCell *lc;
+ int i;
+
+ i = 0;
+ foreach(lc, defelems)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (is_libpq_option(d->defname))
+ {
+ keywords[i] = d->defname;
+ values[i] = defGetString(d);
+ i++;
+ }
+ }
+ return i;
+}
+
diff --git a/contrib/postgres_fdw/postgres_fdw--1.0.sql b/contrib/postgres_fdw/postgres_fdw--1.0.sql
new file mode 100644
index 0000000..56b39b9
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw--1.0.sql
@@ -0,0 +1,39 @@
+/* contrib/postgres_fdw/postgres_fdw--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION postgres_fdw" to load this file. \quit
+
+CREATE FUNCTION postgres_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgres_fdw_validator(text[], oid)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER postgres_fdw
+ HANDLER postgres_fdw_handler
+ VALIDATOR postgres_fdw_validator;
+
+/* connection management functions and view */
+CREATE FUNCTION postgres_fdw_get_connections(out srvid oid, out usesysid oid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgres_fdw_disconnect(oid, oid)
+RETURNS text
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE VIEW postgres_fdw_connections AS
+SELECT c.srvid srvid,
+ s.srvname srvname,
+ c.usesysid usesysid,
+ pg_get_userbyid(c.usesysid) usename
+ FROM postgres_fdw_get_connections() c
+ JOIN pg_catalog.pg_foreign_server s ON (s.oid = c.srvid);
+GRANT SELECT ON postgres_fdw_connections TO public;
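+
+/*
+ * Usage sketch for the connection management objects (mirrors the
+ * regression tests above):
+ *
+ *   SELECT srvname, usename FROM postgres_fdw_connections;
+ *   SELECT postgres_fdw_disconnect(srvid, usesysid)
+ *     FROM postgres_fdw_get_connections();
+ */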
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
new file mode 100644
index 0000000..dc57dd4
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -0,0 +1,1428 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgres_fdw.c
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/postgres_fdw.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "commands/explain.h"
+#include "commands/vacuum.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+#include "postgres_fdw.h"
+#include "connection.h"
+
+PG_MODULE_MAGIC;
+
+/* Default cost to establish a connection. */
+#define DEFAULT_FDW_STARTUP_COST 100.0
+
+/* Default cost to process one row, including data transfer. */
+#define DEFAULT_FDW_TUPLE_COST 0.001
+
+/*
+ * FDW-specific information for RelOptInfo.fdw_private. This is used to pass
+ * information from postgresGetForeignRelSize to postgresGetForeignPaths.
+ */
+typedef struct PostgresFdwPlanState {
+ /*
+ * These are generated in GetForeignRelSize, and also used in subsequent
+ * GetForeignPaths.
+ */
+ StringInfoData sql;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds;
+ List *param_conds;
+ List *local_conds;
+ int width; /* obtained by remote EXPLAIN */
+
+ /* Cached catalog information. */
+ ForeignTable *table;
+ ForeignServer *server;
+} PostgresFdwPlanState;
+
+/*
+ * Index of FDW-private information stored in fdw_private list.
+ *
+ * We store various information in ForeignScan.fdw_private to pass it across
+ * the boundary between planner and executor. Currently the list holds the
+ * items below:
+ *
+ * 1) plain SELECT statement
+ *
+ * These items are indexed with the enum FdwPrivateIndex, so an item can be
+ * accessed directly via list_nth(); for example, the SELECT statement:
+ * sql = list_nth(fdw_private, FdwPrivateSelectSql)
+ */
+enum FdwPrivateIndex {
+ /* SQL statements */
+ FdwPrivateSelectSql,
+
+ /* # of elements stored in the list fdw_private */
+ FdwPrivateNum,
+};
+
+/*
+ * Describe the attribute where data conversion fails.
+ */
+typedef struct ErrorPos {
+ Oid relid; /* oid of the foreign table */
+ AttrNumber cur_attno; /* attribute number under process */
+} ErrorPos;
+
+/*
+ * Describes an execution state of a foreign scan against a foreign table
+ * using postgres_fdw.
+ */
+typedef struct PostgresFdwExecutionState
+{
+ List *fdw_private; /* FDW-private information */
+
+ /* for remote query execution */
+ PGconn *conn; /* connection for the scan */
+ Oid *param_types; /* type array of external parameter */
+ const char **param_values; /* value array of external parameter */
+
+ /* for tuple generation. */
+ AttrNumber attnum; /* # of non-dropped attribute */
+ Datum *values; /* column value buffer */
+ bool *nulls; /* column null indicator buffer */
+ AttInMetadata *attinmeta; /* attribute metadata */
+
+ /* for storing result tuples */
+ MemoryContext scan_cxt; /* context for per-scan lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+ Tuplestorestate *tuples; /* result of the scan */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PostgresFdwExecutionState;
+
+/*
+ * Describes a state of analyze request for a foreign table.
+ */
+typedef struct PostgresAnalyzeState
+{
+ /* for tuple generation. */
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+ Datum *values;
+ bool *nulls;
+
+ /* for random sampling */
+ HeapTuple *rows; /* result buffer */
+ int targrows; /* target # of sample rows */
+ int numrows; /* # of samples collected */
+ double samplerows; /* # of rows fetched */
+ double rowstoskip; /* # of rows skipped before next sample */
+ double rstate; /* random state */
+
+ /* for storing result tuples */
+ MemoryContext anl_cxt; /* context for per-analyze lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PostgresAnalyzeState;
+
+/*
+ * SQL functions
+ */
+extern Datum postgres_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_handler);
+
+/*
+ * FDW callback routines
+ */
+static void postgresGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static void postgresGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static ForeignScan *postgresGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses);
+static void postgresExplainForeignScan(ForeignScanState *node,
+ ExplainState *es);
+static void postgresBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *postgresIterateForeignScan(ForeignScanState *node);
+static void postgresReScanForeignScan(ForeignScanState *node);
+static void postgresEndForeignScan(ForeignScanState *node);
+static bool postgresAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages);
+
+/*
+ * Helper functions
+ */
+static void get_remote_estimate(const char *sql,
+ PGconn *conn,
+ double *rows,
+ int *width,
+ Cost *startup_cost,
+ Cost *total_cost);
+static void execute_query(ForeignScanState *node);
+static void query_row_processor(PGresult *res, ForeignScanState *node,
+ bool first);
+static void analyze_row_processor(PGresult *res, PostgresAnalyzeState *astate,
+ bool first);
+static void postgres_fdw_error_callback(void *arg);
+static int postgresAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows);
+
+/* Exported functions, but not declared in postgres_fdw.h. */
+void _PG_init(void);
+void _PG_fini(void);
+
+/*
+ * Module-specific initialization.
+ */
+void
+_PG_init(void)
+{
+ InitPostgresFdwOptions();
+}
+
+/*
+ * Module-specific clean up.
+ */
+void
+_PG_fini(void)
+{
+}
+
+/*
+ * Foreign-data wrapper handler function: return a struct with pointers
+ * to my callback routines.
+ */
+Datum
+postgres_fdw_handler(PG_FUNCTION_ARGS)
+{
+ FdwRoutine *routine = makeNode(FdwRoutine);
+
+ /* Required handler functions. */
+ routine->GetForeignRelSize = postgresGetForeignRelSize;
+ routine->GetForeignPaths = postgresGetForeignPaths;
+ routine->GetForeignPlan = postgresGetForeignPlan;
+ routine->ExplainForeignScan = postgresExplainForeignScan;
+ routine->BeginForeignScan = postgresBeginForeignScan;
+ routine->IterateForeignScan = postgresIterateForeignScan;
+ routine->ReScanForeignScan = postgresReScanForeignScan;
+ routine->EndForeignScan = postgresEndForeignScan;
+
+ /* Optional handler functions. */
+ routine->AnalyzeForeignTable = postgresAnalyzeForeignTable;
+
+ PG_RETURN_POINTER(routine);
+}
+
+/*
+ * postgresGetForeignRelSize
+ * Estimate # of rows and width of the result of the scan
+ *
+ * Here we estimate the number of rows returned by the scan in two steps. In
+ * the first step, we execute a remote EXPLAIN command to obtain the number of
+ * rows returned from the remote side. In the second step, we calculate the
+ * selectivity of the filtering done on the local side, and adjust the first
+ * estimate accordingly.
+ *
+ * We have to fetch some catalog objects and generate the remote query string
+ * here, so we store such expensive information in the FDW-private area of
+ * RelOptInfo and pass it to subsequent functions for reuse.
+ */
+static void
+postgresGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ bool use_remote_explain = false;
+ ListCell *lc;
+ PostgresFdwPlanState *fpstate;
+ StringInfo sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ Selectivity sel;
+ double rows;
+ int width;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds = NIL;
+ List *param_conds = NIL;
+ List *local_conds = NIL;
+
+ /*
+ * We use PostgresFdwPlanState to pass various information to subsequent
+ * functions.
+ */
+ fpstate = palloc0(sizeof(PostgresFdwPlanState));
+ initStringInfo(&fpstate->sql);
+ sql = &fpstate->sql;
+
+ /*
+ * Determine whether we use remote estimate or not. Note that per-table
+ * setting overrides per-server setting.
+ */
+ table = GetForeignTable(foreigntableid);
+ server = GetForeignServer(table->serverid);
+ foreach (lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ use_remote_explain = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach (lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ use_remote_explain = defGetBoolean(def);
+ break;
+ }
+ }
+
+ /*
+	 * Construct the remote query, which consists of SELECT, FROM, and WHERE
+	 * clauses. Conditions which contain any Param node are excluded here,
+	 * because placeholders can't be used in an EXPLAIN statement; such
+	 * conditions are appended later.
+ */
+ classifyConditions(root, baserel, &remote_conds, ¶m_conds,
+ &local_conds);
+ deparseSimpleSql(sql, root, baserel, local_conds);
+ if (list_length(remote_conds) > 0)
+ appendWhereClause(sql, true, remote_conds, root);
+ elog(DEBUG3, "Query SQL: %s", sql->data);
+
+ /*
+	 * If the table or the server is configured to use remote EXPLAIN, connect
+	 * to the foreign server and execute EXPLAIN with the conditions which
+	 * don't contain any parameter reference. Otherwise, estimate rows in a
+	 * way similar to ordinary tables.
+ */
+ if (use_remote_explain)
+ {
+ UserMapping *user;
+ PGconn *conn;
+
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, false);
+ get_remote_estimate(sql->data, conn, &rows, &width,
+ &startup_cost, &total_cost);
+ ReleaseConnection(conn);
+
+ /*
+ * Estimate selectivity of conditions which are not used in remote
+ * EXPLAIN by calling clauselist_selectivity(). The best we can do for
+		 * parameterized conditions is to estimate selectivity on the basis of
+		 * local statistics. When we actually obtain result rows, such
+		 * conditions are deparsed into the remote query and reduce the number
+		 * of rows transferred.
+ */
+ sel = 1;
+ sel *= clauselist_selectivity(root, param_conds,
+ baserel->relid, JOIN_INNER, NULL);
+ sel *= clauselist_selectivity(root, local_conds,
+ baserel->relid, JOIN_INNER, NULL);
+
+ /* Report estimated numbers to planner. */
+ baserel->rows = rows * sel;
+ }
+ else
+ {
+ /*
+ * Estimate rows from the result of the last ANALYZE, and all
+		 * conditions specified in the original query.
+ */
+ set_baserel_size_estimates(root, baserel);
+
+		/* Save the estimated width to pass it to subsequent functions */
+ width = baserel->width;
+ }
+
+ /*
+	 * Finish deparsing the remote query by adding the conditions which could
+	 * not be used in remote EXPLAIN because they contain parameter references.
+ */
+ if (list_length(param_conds) > 0)
+ appendWhereClause(sql, !(list_length(remote_conds) > 0), param_conds,
+ root);
+
+ /*
+	 * Pack the obtained information into an object and store it in the
+	 * FDW-private area of RelOptInfo to pass it to subsequent functions.
+ */
+ fpstate->startup_cost = startup_cost;
+ fpstate->total_cost = total_cost;
+ fpstate->remote_conds = remote_conds;
+ fpstate->param_conds = param_conds;
+ fpstate->local_conds = local_conds;
+ fpstate->width = width;
+ fpstate->table = table;
+ fpstate->server = server;
+ baserel->fdw_private = (void *) fpstate;
+}
+
+/*
+ * postgresGetForeignPaths
+ * Create possible scan paths for a scan on the foreign table
+ */
+static void
+postgresGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ PostgresFdwPlanState *fpstate;
+ ForeignPath *path;
+ ListCell *lc;
+ double fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
+ double fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
+ Cost startup_cost;
+ Cost total_cost;
+ List *fdw_private;
+
+ /* Cache frequently accessed value */
+ fpstate = (PostgresFdwPlanState *) baserel->fdw_private;
+
+ /*
+	 * We have cost values which were estimated on the remote side, so adjust
+	 * them to account for the additional work needed to complete the scan,
+	 * such as sending the query, transferring the result, and local filtering.
+ */
+ startup_cost = fpstate->startup_cost;
+ total_cost = fpstate->total_cost;
+
+ /*
+	 * Adjust costs with the cost factors of the corresponding foreign server:
+	 *   - add the cost of establishing a connection to both startup and total
+	 *   - add the per-row cost of remote processing and data transfer to total
+	 *   - add the cost of handling tuples on the local side to total
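+	 *
+	 * For example, with the default factors (fdw_startup_cost = 100,
+	 * fdw_tuple_cost = 0.001), an illustrative remote estimate of
+	 * cost=0.00..35.50 for 2550 rows becomes startup 100.00 and total
+	 * 35.50 + 100 + 0.001 * 2550 + cpu_tuple_cost * 2550.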
+ */
+ foreach(lc, fpstate->server->options)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (strcmp(d->defname, "fdw_startup_cost") == 0)
+ fdw_startup_cost = strtod(defGetString(d), NULL);
+ else if (strcmp(d->defname, "fdw_tuple_cost") == 0)
+ fdw_tuple_cost = strtod(defGetString(d), NULL);
+ }
+ startup_cost += fdw_startup_cost;
+ total_cost += fdw_startup_cost;
+ total_cost += fdw_tuple_cost * baserel->rows;
+ total_cost += cpu_tuple_cost * baserel->rows;
+
+ /* Pass SQL statement from planner to executor through FDW private area. */
+ fdw_private = list_make1(makeString(fpstate->sql.data));
+
+ /*
+	 * Create the simplest ForeignScan path node and add it to baserel. This
+	 * path corresponds to the SeqScan path of regular tables.
+ */
+ path = create_foreignscan_path(root, baserel,
+ baserel->rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no outer rel either */
+ fdw_private);
+ add_path(baserel, (Path *) path);
+
+ /*
+	 * XXX We could consider a sorted path or a parameterized path here if we
+	 * knew that the foreign table is indexed on the remote end. For this
+	 * purpose, we might have to support FOREIGN INDEX to represent possible
+	 * sets of sort keys and/or filtering.
+ */
+}
+
+/*
+ * postgresGetForeignPlan
+ * Create ForeignScan plan node which implements selected best path
+ */
+static ForeignScan *
+postgresGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses)
+{
+ PostgresFdwPlanState *fpstate;
+ Index scan_relid = baserel->relid;
+ List *fdw_private = NIL;
+ List *fdw_exprs = NIL;
+ List *local_exprs = NIL;
+ ListCell *lc;
+
+ /* Cache frequently accessed value */
+ fpstate = (PostgresFdwPlanState *) baserel->fdw_private;
+
+ /*
+	 * We need lists of Expr nodes rather than lists of RestrictInfo. We can
+	 * merge remote_conds and param_conds into fdw_exprs, because both are
+	 * evaluated on the remote side by the actual remote query.
+ */
+ foreach(lc, fpstate->remote_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->param_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->local_conds)
+ local_exprs = lappend(local_exprs,
+ ((RestrictInfo *) lfirst(lc))->clause);
+
+ /*
+	 * Make a list containing the SELECT statement, to pass it to the executor
+	 * along with the plan node for later use.
+ */
+ fdw_private = lappend(fdw_private, makeString(fpstate->sql.data));
+
+ /*
+ * Create the ForeignScan node from target list, local filtering
+ * expressions, remote filtering expressions, and FDW private information.
+ *
+	 * We remove expressions which are evaluated on the remote side from the
+	 * qual of the scan node to avoid redundant filtering. Such filter
+	 * reduction can be done only here, after the best path has been chosen,
+	 * because baserestrictinfo in RelOptInfo is shared by all possible paths
+	 * until the best path is chosen.
+ */
+ return make_foreignscan(tlist,
+ local_exprs,
+ scan_relid,
+ fdw_exprs,
+ fdw_private);
+}
+
+/*
+ * postgresExplainForeignScan
+ * Produce extra output for EXPLAIN
+ */
+static void
+postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
+{
+ List *fdw_private;
+ char *sql;
+
+ fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+ sql = strVal(list_nth(fdw_private, FdwPrivateSelectSql));
+ ExplainPropertyText("Remote SQL", sql, es);
+}
+
+/*
+ * postgresBeginForeignScan
+ * Initiate access to a foreign PostgreSQL table.
+ */
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+ PostgresFdwExecutionState *festate;
+ PGconn *conn;
+ Oid relid;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+ /*
+ * Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
+ */
+ if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+ return;
+
+ /*
+ * Save state in node->fdw_state.
+ */
+ festate = (PostgresFdwExecutionState *)
+ palloc(sizeof(PostgresFdwExecutionState));
+ festate->fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+
+ /*
+	 * Create memory contexts for per-scan and per-tuple data under the
+	 * per-query context.
+ */
+ festate->scan_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgres_fdw per-scan data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ festate->temp_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgres_fdw temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ /*
+	 * Get a connection to the foreign server. The connection manager will
+	 * establish a new connection if necessary.
+ */
+ relid = RelationGetRelid(node->ss.ss_currentRelation);
+ table = GetForeignTable(relid);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+ festate->conn = conn;
+
+ /* Result will be filled in first Iterate call. */
+ festate->tuples = NULL;
+
+ /* Allocate buffers for column values. */
+ {
+ TupleDesc tupdesc = slot->tts_tupleDescriptor;
+ festate->values = palloc(sizeof(Datum) * tupdesc->natts);
+ festate->nulls = palloc(sizeof(bool) * tupdesc->natts);
+ festate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ }
+
+ /*
+ * Allocate buffers for query parameters.
+ *
+	 * ParamListInfo might include entries for pseudo-parameters such as
+	 * PL/pgSQL's FOUND variable, but we don't worry about that here, because
+	 * the wasted space is small.
+ */
+ {
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+
+ if (numParams > 0)
+ {
+ festate->param_types = palloc0(sizeof(Oid) * numParams);
+ festate->param_values = palloc0(sizeof(char *) * numParams);
+ }
+ else
+ {
+ festate->param_types = NULL;
+ festate->param_values = NULL;
+ }
+ }
+
+ /* Remember which foreign table we are scanning. */
+ festate->errpos.relid = relid;
+
+ /* Store FDW-specific state into ForeignScanState */
+ node->fdw_state = (void *) festate;
+
+ return;
+}
+
+/*
+ * postgresIterateForeignScan
+ * Retrieve next row from the result set, or clear tuple slot to indicate
+ * EOF.
+ *
+ * Note that we use the per-scan context when retrieving tuples from the
+ * tuplestore so that the returned tuple survives until the next iteration,
+ * since the tuple is released implicitly via ExecClearTuple. If we retrieved
+ * a tuple from the tuplestore in CurrentMemoryContext (a per-tuple context),
+ * ExecClearTuple would end up freeing a dangling pointer.
+ */
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+ /*
+ * If this is the first call after Begin or ReScan, we need to execute
+ * remote query and get result set.
+ */
+ if (festate->tuples == NULL)
+ execute_query(node);
+
+ /*
+ * If tuples are still left in tuplestore, just return next tuple from it.
+ *
+	 * It is necessary to switch to the per-scan context to keep the returned
+	 * tuple valid until the next IterateForeignScan call, because it will be
+	 * released with ExecClearTuple then. Otherwise, the picked tuple would be
+	 * allocated in the per-tuple context, and a double-free of that tuple
+	 * might happen.
+ *
+ * If we don't have any result in tuplestore, clear result slot to tell
+ * executor that this scan is over.
+ */
+ MemoryContextSwitchTo(festate->scan_cxt);
+ tuplestore_gettupleslot(festate->tuples, true, false, slot);
+ MemoryContextSwitchTo(oldcontext);
+
+ return slot;
+}
+
+/*
+ * postgresReScanForeignScan
+ *		Restart the scan by rewinding the stored result set.
+ */
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+	/* If we don't have a valid result yet, there is nothing to do. */
+ if (festate->tuples == NULL)
+ return;
+
+ /*
+	 * Just rewinding the current result set is enough.
+ */
+ tuplestore_rescan(festate->tuples);
+}
+
+/*
+ * postgresEndForeignScan
+ * Finish scanning foreign table and dispose objects used for this scan
+ */
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+ /* if festate is NULL, we are in EXPLAIN; nothing to do */
+ if (festate == NULL)
+ return;
+
+ /*
+ * The connection which was used for this scan should be valid until the
+	 * end of the scan, to make the lifespan of the remote transaction the
+	 * same as that of the local query.
+ */
+ ReleaseConnection(festate->conn);
+ festate->conn = NULL;
+
+ /* Discard fetch results */
+ if (festate->tuples != NULL)
+ {
+ tuplestore_end(festate->tuples);
+ festate->tuples = NULL;
+ }
+
+ /* MemoryContext will be deleted automatically. */
+}
+
+/*
+ * Estimate costs of executing given SQL statement.
+ */
+static void
+get_remote_estimate(const char *sql, PGconn *conn,
+ double *rows, int *width,
+ Cost *startup_cost, Cost *total_cost)
+{
+ PGresult *volatile res = NULL;
+ StringInfoData buf;
+ char *plan;
+ char *p;
+ int n;
+
+ /*
+ * Construct EXPLAIN statement with given SQL statement.
+ */
+ initStringInfo(&buf);
+ appendStringInfo(&buf, "EXPLAIN %s", sql);
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ res = PQexec(conn, buf.data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
+ ereport(ERROR,
+ (errmsg("could not execute EXPLAIN for cost estimation"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /*
+		 * Extract the estimates from the top plan node. We search for the
+		 * opening parenthesis from the end of the line to avoid matching
+		 * unexpected parentheses.
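+		 *
+		 * The first line of EXPLAIN output typically looks like:
+		 *   Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)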
+ */
+ plan = PQgetvalue(res, 0, 0);
+ p = strrchr(plan, '(');
+ if (p == NULL)
+ elog(ERROR, "wrong EXPLAIN output: %s", plan);
+ n = sscanf(p,
+ "(cost=%lf..%lf rows=%lf width=%d)",
+ startup_cost, total_cost, rows, width);
+ if (n != 4)
+ elog(ERROR, "could not get estimation from EXPLAIN output");
+
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Execute remote query with current parameters.
+ */
+static void
+execute_query(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+ Oid *types = NULL;
+ const char **values = NULL;
+ char *sql;
+ PGconn *conn;
+ PGresult *volatile res = NULL;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+ types = festate->param_types;
+ values = festate->param_values;
+
+ /*
+ * Construct parameter array in text format. We don't release memory for
+ * the arrays explicitly, because the memory usage would not be very large,
+ * and anyway they will be released in context cleanup.
+ *
+	 * If this query is invoked from a PL/pgSQL function, we may have an extra
+	 * entry for the dummy variable FOUND in ParamListInfo, so we need to check
+	 * the type OID to exclude it from the remote parameters.
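+	 *
+	 * For example, for "EXECUTE st4(1)" on the prepared statement shown in
+	 * the regression tests ("PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE
+	 * t1.c1 = $1"), a single integer parameter arrives here; it is converted
+	 * with its type output function and sent as a text-format parameter of
+	 * the remote query.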
+ */
+ if (numParams > 0)
+ {
+ int i;
+
+ for (i = 0; i < numParams; i++)
+ {
+ ParamExternData *prm = ¶ms->params[i];
+
+ /* give hook a chance in case parameter is dynamic */
+ if (!OidIsValid(prm->ptype) && params->paramFetch != NULL)
+ params->paramFetch(params, i + 1);
+
+ /*
+ * Get string representation of each parameter value by invoking
+ * type-specific output function unless the value is null or it's
+ * not used in the query.
+ */
+ types[i] = prm->ptype;
+ if (!prm->isnull && OidIsValid(types[i]))
+ {
+ Oid out_func_oid;
+ bool isvarlena;
+ FmgrInfo func;
+
+ getTypeOutputInfo(types[i], &out_func_oid, &isvarlena);
+ fmgr_info(out_func_oid, &func);
+ values[i] = OutputFunctionCall(&func, prm->value);
+ }
+ else
+ values[i] = NULL;
+
+ /*
+			 * We use type "text" (an arbitrary but flexible choice) for
+			 * unused (type-unknown) parameters. We can't remove entries for
+			 * unused parameters from the arrays, because parameter references
+			 * in the remote query ($n) are numbered against the full-length
+			 * parameter list.
+ */
+ if (!OidIsValid(types[i]))
+ types[i] = TEXTOID;
+ }
+ }
+
+ conn = festate->conn;
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /*
+		 * Execute the remote query with parameters, and retrieve the results
+		 * in single-row mode, which returns results row by row.
+ */
+ sql = strVal(list_nth(festate->fdw_private, FdwPrivateSelectSql));
+ if (!PQsendQueryParams(conn, sql, numParams, types, values, NULL, NULL,
+ 0))
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /* Retrieve result rows one by one, and store them into tuplestore. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Store the result row into tuplestore */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+				 * A PGresult with PGRES_TUPLES_OK means EOF, so we still need
+				 * to initialize the tuplestore if we have not retrieved any
+				 * tuple.
+ */
+ if (first)
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ }
+ }
+
+ /*
+		 * We can't know in the row processor whether the scan is over, so
+		 * mark the result as complete here.
+ */
+ tuplestore_donestoring(festate->tuples);
+
+ /* Discard result of SELECT statement. */
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ /* propagate error */
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Create tuples from PGresult and store them into tuplestore.
+ *
+ * The caller must use a PG_TRY block to catch exceptions and make sure the
+ * PGresult is released.
+ */
+static void
+query_row_processor(PGresult *res, ForeignScanState *node, bool first)
+{
+ int i;
+ int j;
+ int attnum; /* number of non-dropped columns */
+ TupleTableSlot *slot;
+ TupleDesc tupdesc;
+ Form_pg_attribute *attrs;
+ PostgresFdwExecutionState *festate;
+ AttInMetadata *attinmeta;
+ HeapTuple tuple;
+ ErrorContextCallback errcallback;
+ MemoryContext oldcontext;
+
+ /* Cache frequently used values */
+ slot = node->ss.ss_ScanTupleSlot;
+ tupdesc = slot->tts_tupleDescriptor;
+ attrs = tupdesc->attrs;
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+ attinmeta = festate->attinmeta;
+
+ if (first)
+ {
+ int nfields = PQnfields(res);
+
+ /* count non-dropped columns */
+ for (attnum = 0, i = 0; i < tupdesc->natts; i++)
+ if (!attrs[i]->attisdropped)
+ attnum++;
+
+		/* check that result and tuple descriptor have the same number of columns */
+ if (attnum > 0 && attnum != nfields)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("remote query result rowtype does not match "
+ "the specified FROM clause rowtype"),
+ errdetail("expected %d, actual %d", attnum, nfields)));
+
+ /* First, ensure that the tuplestore is empty. */
+ if (festate->tuples == NULL)
+ {
+
+ /*
+			 * Create a tuplestore to store the result of the query in the
+			 * per-scan context. Note that we use this memory context to
+			 * avoid memory leaks in error cases.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->scan_cxt);
+ festate->tuples = tuplestore_begin_heap(false, false, work_mem);
+ MemoryContextSwitchTo(oldcontext);
+ }
+ else
+ {
+ /* Clear old result just in case. */
+ tuplestore_clear(festate->tuples);
+ }
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->temp_cxt);
+
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ /* skip dropped columns. */
+ if (attrs[i]->attisdropped)
+ {
+ festate->nulls[i] = true;
+ continue;
+ }
+
+ /*
+ * Set NULL indicator, and convert text representation to internal
+ * representation if any.
+ */
+ if (PQgetisnull(res, 0, j))
+ festate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ festate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ festate->errpos.cur_attno = i + 1;
+ errcallback.callback = postgres_fdw_error_callback;
+ errcallback.arg = (void *) &festate->errpos;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ festate->values[i] = value;
+
+ /* Uninstall error context callback. */
+ error_context_stack = errcallback.previous;
+ }
+ j++;
+ }
+
+ /*
+	 * Build the tuple and store it in the tuplestore.
+ * We don't have to free the tuple explicitly because it's been
+ * allocated in the per-tuple context.
+ */
+ tuple = heap_form_tuple(tupdesc, festate->values, festate->nulls);
+ tuplestore_puttuple(festate->tuples, tuple);
+
+ /* Clean up */
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(festate->temp_cxt);
+
+ return;
+}
+
+/*
+ * Callback function which is called when error occurs during column value
+ * conversion. Print names of column and relation.
+ */
+static void
+postgres_fdw_error_callback(void *arg)
+{
+ ErrorPos *errpos = (ErrorPos *) arg;
+ const char *relname;
+ const char *colname;
+
+ relname = get_rel_name(errpos->relid);
+ colname = get_attname(errpos->relid, errpos->cur_attno);
+ errcontext("column %s of foreign table %s",
+ quote_identifier(colname), quote_identifier(relname));
+}
+
+/*
+ * postgresAnalyzeForeignTable
+ * Test whether analyzing this foreign table is supported
+ */
+static bool
+postgresAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages)
+{
+ *totalpages = 0;
+ *func = postgresAcquireSampleRowsFunc;
+
+ return true;
+}
+
+/*
+ * Acquire a random sample of rows from foreign table managed by postgres_fdw.
+ *
+ * postgres_fdw doesn't have direct access to the remote buffers, so we execute
+ * a simple SELECT statement which retrieves all rows from the remote side, and
+ * pick samples from them.
+ */
+static int
+postgresAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows)
+{
+ PostgresAnalyzeState astate;
+ StringInfoData sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ PGconn *conn = NULL;
+ PGresult *volatile res = NULL;
+
+ /*
+	 * Only a little information is needed as input to the row processor; the
+	 * rest of the initialization is done at the first row processor call.
+ */
+ astate.anl_cxt = CurrentMemoryContext;
+ astate.temp_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "postgres_fdw analyze temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ astate.rows = rows;
+ astate.targrows = targrows;
+ astate.tupdesc = relation->rd_att;
+ astate.errpos.relid = relation->rd_id;
+
+ /*
+	 * Construct a SELECT statement which retrieves all rows from the remote
+	 * table. We can't avoid a sequential scan on the remote side if we want
+	 * useful statistics, so this seems a reasonable compromise.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeSql(&sql, relation);
+ elog(DEBUG3, "Analyze SQL: %s", sql.data);
+
+ table = GetForeignTable(relation->rd_id);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+
+ /*
+ * Acquire sample rows from the result set.
+ */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /* Execute remote query and retrieve results row by row. */
+ if (!PQsendQuery(conn, sql.data))
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+
+		/* Retrieve result rows one by one, and feed them to the row processor. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+			/* Feed the result row to the sample row processor */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ analyze_row_processor(res, &astate, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+				 * A PGresult with PGRES_TUPLES_OK means EOF, so we still need
+				 * to initialize the sampling state if we have not retrieved
+				 * any tuple.
+ */
+				if (first)
+ analyze_row_processor(res, &astate, first);
+
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ /* We assume that we have no dead tuple. */
+ *totaldeadrows = 0.0;
+
+ /* We've retrieved all living tuples from foreign server. */
+ *totalrows = astate.samplerows;
+
+ /*
+	 * We don't update pg_class.relpages because we don't use it for
+	 * planning at all.
+ */
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned with \"%s\", "
+ "containing %.0f live rows and %.0f dead rows; "
+ "%d rows in sample, %.0f estimated total rows",
+ RelationGetRelationName(relation), sql.data,
+ astate.samplerows, 0.0,
+ astate.numrows, astate.samplerows)));
+
+ return astate.numrows;
+}
+
+/*
+ * Custom row processor for acquire_sample_rows.
+ *
+ * Collect sample rows from the result of the query.
+ *  - Use every tuple as a sample until targrows samples have been collected.
+ *  - Once the target is reached, skip some tuples and randomly replace
+ *    already-sampled tuples.
+ */
+static void
+analyze_row_processor(PGresult *res, PostgresAnalyzeState *astate, bool first)
+{
+ int targrows = astate->targrows;
+ TupleDesc tupdesc = astate->tupdesc;
+ int i;
+ int j;
+ int pos; /* position where next sample should be stored. */
+ HeapTuple tuple;
+ ErrorContextCallback errcallback;
+ MemoryContext callercontext;
+
+ if (first)
+ {
+ /* Prepare for sampling rows */
+ astate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ astate->values = (Datum *) palloc(sizeof(Datum) * tupdesc->natts);
+ astate->nulls = (bool *) palloc(sizeof(bool) * tupdesc->natts);
+		astate->numrows = 0;
+		astate->samplerows = 0;
+		astate->rowstoskip = -1;
+		astate->rstate = anl_init_selection_state(astate->targrows);
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ callercontext = MemoryContextSwitchTo(astate->temp_cxt);
+
+	/*
+	 * The first targrows rows are always sampled.  If there are more source
+	 * rows, pick up some of them by skipping, and replace already-sampled
+	 * tuples at random.
+	 *
+	 * Here we just determine the slot where the next sample should be stored.
+	 * Set pos to a negative value to indicate that the row should be skipped.
+	 */
+ if (astate->numrows < targrows)
+ pos = astate->numrows++;
+ else
+ {
+ /*
+ * The first targrows sample rows are simply copied into
+ * the reservoir. Then we start replacing tuples in the
+ * sample until we reach the end of the relation. This
+ * algorithm is from Jeff Vitter's paper, similarly to
+ * acquire_sample_rows in analyze.c.
+ *
+		 * We don't have block-wise access here, so every row in the PGresult
+		 * is a candidate for the sample.
+ */
+ if (astate->rowstoskip < 0)
+ astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
+ &astate->rstate);
+
+ if (astate->rowstoskip <= 0)
+ {
+ int k = (int) (targrows * anl_random_fract());
+
+ Assert(k >= 0 && k < targrows);
+
+ /*
+ * Create sample tuple from the result, and replace at
+ * random.
+ */
+ heap_freetuple(astate->rows[k]);
+ pos = k;
+ }
+ else
+ pos = -1;
+
+ astate->rowstoskip -= 1;
+ }
+
+ /* Always increment sample row counter. */
+ astate->samplerows += 1;
+
+ if (pos >= 0)
+ {
+ AttInMetadata *attinmeta = astate->attinmeta;
+
+		/*
+		 * Create a sample tuple from the current result row, and store it
+		 * into the position determined above.  Note that i and j index the
+		 * catalog attributes and the result columns, respectively.
+		 */
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ if (PQgetisnull(res, 0, j))
+ astate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ astate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ astate->errpos.cur_attno = i + 1;
+ errcallback.callback = postgres_fdw_error_callback;
+ errcallback.arg = (void *) &astate->errpos;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ astate->values[i] = value;
+
+ /* Uninstall error callback function. */
+ error_context_stack = errcallback.previous;
+ }
+ j++;
+ }
+
+		/*
+		 * Generate a tuple from the result row data, and store it into the
+		 * given buffer.  Note that we need to allocate the tuple in the
+		 * analyze context to keep it valid even after the temporary per-tuple
+		 * context has been reset.
+		 */
+ MemoryContextSwitchTo(astate->anl_cxt);
+ tuple = heap_form_tuple(tupdesc, astate->values, astate->nulls);
+ MemoryContextSwitchTo(astate->temp_cxt);
+ astate->rows[pos] = tuple;
+ }
+
+ /* Clean up */
+ MemoryContextSwitchTo(callercontext);
+ MemoryContextReset(astate->temp_cxt);
+
+ return;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.control b/contrib/postgres_fdw/postgres_fdw.control
new file mode 100644
index 0000000..f9ed490
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.control
@@ -0,0 +1,5 @@
+# postgres_fdw extension
+comment = 'foreign-data wrapper for remote PostgreSQL servers'
+default_version = '1.0'
+module_pathname = '$libdir/postgres_fdw'
+relocatable = true
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
new file mode 100644
index 0000000..b5cefb8
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -0,0 +1,45 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgres_fdw.h
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/postgres_fdw.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef POSTGRESQL_FDW_H
+#define POSTGRESQL_FDW_H
+
+#include "postgres.h"
+#include "foreign/foreign.h"
+#include "nodes/relation.h"
+#include "utils/relcache.h"
+
+/* in option.c */
+void InitPostgresFdwOptions(void);
+int ExtractConnectionOptions(List *defelems,
+ const char **keywords,
+ const char **values);
+int GetFetchCountOption(ForeignTable *table, ForeignServer *server);
+
+/* in deparse.c */
+void deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds);
+void appendWhereClause(StringInfo buf,
+ bool has_where,
+ List *exprs,
+ PlannerInfo *root);
+void classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds);
+void deparseAnalyzeSql(StringInfo buf, Relation rel);
+
+#endif /* POSTGRESQL_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
new file mode 100644
index 0000000..234107d
--- /dev/null
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -0,0 +1,312 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+
+-- Clean up in case a prior regression run failed
+
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+
+DROP ROLE IF EXISTS postgres_fdw_user;
+
+RESET client_min_messages;
+
+CREATE ROLE postgres_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgres_fdw_user';
+
+CREATE EXTENSION postgres_fdw;
+
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgres_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+ OPTIONS (dbname 'contrib_regression');
+
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgres_fdw_user SERVER loopback2;
+
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+
+-- ===================================================================
+-- tests for postgres_fdw_validator
+-- ===================================================================
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure option
+ALTER SERVER loopback1 OPTIONS (
+ use_remote_explain 'false',
+ fdw_startup_cost '123.456',
+ fdw_tuple_cost '0.123',
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+\des+
+\deu+
+\det+
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- empty result
+SELECT * FROM ft1 WHERE false;
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- user-defined operator/function
+CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+EXECUTE st1(1, 1);
+EXECUTE st1(101, 101);
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+EXECUTE st2(10, 20);
+EXECUTE st1(101, 101);
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+EXECUTE st3(10, 20);
+EXECUTE st3(20, 30);
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+DROP FUNCTION f_test(int);
+
+-- ===================================================================
+-- cost estimation options
+-- ===================================================================
+ALTER SERVER loopback1 OPTIONS (SET use_remote_explain 'true');
+ALTER SERVER loopback1 OPTIONS (SET fdw_startup_cost '0');
+ALTER SERVER loopback1 OPTIONS (SET fdw_tuple_cost '0');
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ALTER SERVER loopback1 OPTIONS (DROP use_remote_explain);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_startup_cost);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_tuple_cost);
+
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgres_fdw_connections;
+SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_get_connections();
+SELECT srvname, usename FROM postgres_fdw_connections;
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
+
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+FETCH c;
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+FETCH c;
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+COMMIT;
+SELECT srvname FROM postgres_fdw_connections;
+ERROR OUT; -- ERROR
+SELECT srvname FROM postgres_fdw_connections;
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgres_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+DROP TYPE user_enum CASCADE;
+DROP EXTENSION postgres_fdw CASCADE;
+\c
+DROP ROLE postgres_fdw_user;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 6b13a0a..39e9827 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -132,6 +132,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
&pgstatstatements;
&pgstattuple;
&pgtrgm;
+ &postgres-fdw;
&seg;
&sepgsql;
&contrib-spi;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db4cc3a..354111a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
<!ENTITY pgtrgm SYSTEM "pgtrgm.sgml">
<!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
+<!ENTITY postgres-fdw SYSTEM "postgres-fdw.sgml">
<!ENTITY seg SYSTEM "seg.sgml">
<!ENTITY contrib-spi SYSTEM "contrib-spi.sgml">
<!ENTITY sepgsql SYSTEM "sepgsql.sgml">
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
new file mode 100644
index 0000000..1f00665
--- /dev/null
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -0,0 +1,434 @@
+<!-- doc/src/sgml/postgres-fdw.sgml -->
+
+<sect1 id="postgres-fdw" xreflabel="postgres_fdw">
+ <title>postgres_fdw</title>
+
+ <indexterm zone="postgres-fdw">
+ <primary>postgres_fdw</primary>
+ </indexterm>
+
+ <para>
+ The <filename>postgres_fdw</filename> module provides a foreign-data
+ wrapper for external <productname>PostgreSQL</productname> servers.
+   With this module, users can access data stored on external
+   <productname>PostgreSQL</productname> servers via plain SQL statements.
+ </para>
+
+ <para>
+   The default wrapper <literal>postgres_fdw</literal> is created automatically
+   by the <command>CREATE EXTENSION</command> command for
+   <application>postgres_fdw</application>, so all you need to do before
+   executing queries is the following (an example follows the list):
+ <orderedlist spacing="compact">
+ <listitem>
+ <para>
+     Create a foreign server with the <command>CREATE SERVER</command> command
+     for each remote database you want to connect to.  Specify the connection
+     information, except <literal>user</literal> and <literal>password</literal>,
+     on it.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+     Create a user mapping with the <command>CREATE USER MAPPING</command>
+     command for each user you want to allow to access each foreign server.
+     Specify <literal>user</literal> and <literal>password</literal> on it.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+     Create a foreign table with the <command>CREATE FOREIGN TABLE</command>
+     command for each relation you want to access.  If you want to use a name
+     different from the remote one, you need to specify the object name
+     options (see below).
+    </para>
+    <para>
+     It is recommended to use the same data types as those of the remote
+     columns, though the libpq text protocol allows flexible conversions
+     between similar data types.
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
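+  <para>
+   As a rough illustration only (the server name, connection options, and
+   table definition below are examples, not required values), these three
+   steps might look like:
+<programlisting>
+CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw
+    OPTIONS (host '192.168.0.10', dbname 'remote_db');
+
+CREATE USER MAPPING FOR local_user SERVER foreign_server
+    OPTIONS (user 'remote_user', password 'secret');
+
+CREATE FOREIGN TABLE remote_accounts (
+    id      integer NOT NULL,
+    balance integer
+) SERVER foreign_server
+  OPTIONS (nspname 'public', relname 'accounts');
+</programlisting>
+  </para>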
+
+ <sect2>
+ <title>FDW Options of postgres_fdw</title>
+
+ <sect3>
+ <title>Connection Options</title>
+ <para>
+     A foreign server and a user mapping created using this wrapper can have
+     <application>libpq</> connection options, except the following:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ client_encoding (automatically determined from the local server encoding)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ fallback_application_name (fixed to <literal>postgres_fdw</literal>)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ replication (never used for foreign-data wrapper connection)
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ For details of <application>libpq</> connection options, see
+ <xref linkend="libpq-connect">.
+ </para>
+ <para>
+ <literal>user</literal> and <literal>password</literal> can be
+ specified on user mappings, and others can be specified on foreign servers.
+ </para>
+ <para>
+     Note that only superusers may connect to foreign servers without password
+     authentication, so specify the <literal>password</literal> FDW option on
+     the corresponding user mappings for non-superusers.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Object Name Options</title>
+ <para>
+     Foreign tables created using this wrapper, and their columns, can have
+     object name options.  These options specify the names used in the SQL
+     statements sent to the remote <productname>PostgreSQL</productname>
+     server, and are useful when remote objects have names different from the
+     corresponding local ones.  An example follows the option list below.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>nspname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+       namespace (schema) reference in the SQL statement.  If this option is
+       omitted, the name of the schema which the foreign table belongs to is
+       used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>relname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+       relation (table) reference in the SQL statement.  If this option is
+ omitted, <literal>pg_class.relname</literal> of the foreign table is
+ used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>colname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a column of a foreign table, is
+ used as a column (attribute) reference in the SQL statement. If this
+ option is omitted, <literal>pg_attribute.attname</literal> of the column
+ of the foreign table is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
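+    <para>
+     For instance (all object names here are illustrative), a foreign table
+     <literal>ft1</literal> and its column <literal>c1</literal> could be
+     mapped to the remote table <literal>"S 1"."T 1"</literal> and its column
+     <literal>"C 1"</literal> like this:
+<programlisting>
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+</programlisting>
+    </para>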
+ </sect3>
+
+ <sect3>
+ <title>Cost Estimation Options</title>
+ <para>
+     <application>postgres_fdw</> retrieves foreign data by executing queries
+     against foreign servers, so foreign scans usually cost more than scans of
+     local tables.  To reflect the varying circumstances of foreign servers,
+     <application>postgres_fdw</> provides the options below (see the example
+     at the end of this section):
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>use_remote_estimate</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table or a foreign
+       server, controls <application>postgres_fdw</>'s estimation of rows and
+       width.  If it is set to <literal>true</literal>, a remote
+       <command>EXPLAIN</command> is executed in an early step of planning.
+       This gives better estimates of rows and width, but it also introduces
+       some overhead.  This option defaults to <literal>false</literal>.
+ </para>
+ <para>
+       <application>postgres_fdw</> supports gathering statistics about
+       foreign data from foreign servers and storing them on the local side
+       via <command>ANALYZE</command>, so reasonable estimates of the rows
+       and width of a query's result can be derived from them.  However, if
+       the target foreign table is updated frequently, the local statistics
+       will soon become obsolete.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>fdw_startup_cost</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign server, is used as
+       additional startup cost per scan.  If the planner overestimates or
+       underestimates the startup cost of a foreign scan, change this to
+       reflect the actual overhead.
+      </para>
+      <para>
+       Defaults to <literal>100</literal>.  The default value is arbitrary,
+       but it should be enough to make most foreign scans cost more than
+       local scans, even when the foreign scan returns nothing.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>fdw_tuple_cost</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign server, is used as
+       additional cost per tuple; it reflects the overhead of tuple
+       manipulation and transfer between servers.  If a foreign server is
+       particularly far away or close by on the network, or has different
+       performance characteristics, use this option to tell the planner.
+ </para>
+ <para>
+ Defaults to <literal>0.01</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
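+    <para>
+     As an illustration only (the server name and values are examples, not
+     recommendations), the cost options can be adjusted with
+     <command>ALTER SERVER</command>:
+<programlisting>
+ALTER SERVER foreign_server
+    OPTIONS (ADD fdw_startup_cost '25', ADD fdw_tuple_cost '0.05');
+</programlisting>
+    </para>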
+ </sect3>
+
+ </sect2>
+
+ <sect2>
+ <title>Connection Management</title>
+
+ <para>
+   <application>postgres_fdw</application> establishes a connection to a
+   foreign server at the beginning of the first query which uses a foreign
+   table associated with that server, and reuses the connection for
+   following queries and even for further foreign scans in the same query.
+
+   You can see the list of active connections via the
+   <structname>postgres_fdw_connections</structname> view.  It shows the OID
+   and name of the server and of the local role for each active connection
+   established by <application>postgres_fdw</application>.  For security
+   reasons, only superusers can see other roles' connections.
+ </para>
+
+ <para>
+   Established connections are kept alive until the local role changes, the
+   current transaction aborts, or the user requests disconnection.
+ </para>
+
+ <para>
+   If the role has been changed, active connections established as the old
+   local role are kept alive but are never reused until the local role is
+   restored to the original one.  This kind of situation happens with
+   <command>SET ROLE</command> and <command>SET SESSION AUTHORIZATION</command>.
+ </para>
+
+ <para>
+   If the current transaction aborts due to an error or a user request, all
+   active connections are closed automatically.  This behavior avoids
+   possible connection leaks on error.
+ </para>
+
+ <para>
+   You can discard a persistent connection at any time with
+   <function>postgres_fdw_disconnect()</function>.  It takes a server OID and
+   a user OID as arguments.  This function can handle only connections
+   established in the current session; connections established by other
+   backends are not reachable.
+ </para>
+
+ <para>
+   You can discard all active and visible connections in the current session
+   by using <structname>postgres_fdw_connections</structname> and
+ <function>postgres_fdw_disconnect()</function> together:
+<synopsis>
+postgres=# SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_connections;
+ postgres_fdw_disconnect
+-------------------------
+ OK
+ OK
+(2 rows)
+</synopsis>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Transaction Management</title>
+ <para>
+   <application>postgres_fdw</application> begins a remote transaction at
+   the beginning of a local query, and terminates it with
+   <command>ABORT</command> at the end of the local query.  This means that
+   all foreign scans on a foreign server within a local query are executed
+   in one remote transaction.
+ </para>
+ <para>
+   The isolation level of the remote transaction is determined from the
+   local transaction's isolation level.
+ <table id="postgres-fdw-isolation-level">
+ <title>Isolation Level Mapping</title>
+
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Local Isolation Level</entry>
+ <entry>Remote Isolation Level</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>READ UNCOMMITTED</entry>
+ <entry morerows="2">REPEATABLE READ</entry>
+ </row>
+ <row>
+ <entry>READ COMMITTED</entry>
+ </row>
+ <row>
+ <entry>REPEATABLE READ</entry>
+ </row>
+ <row>
+ <entry>SERIALIZABLE</entry>
+ <entry>SERIALIZABLE</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <para>
+ <literal>READ UNCOMMITTED</literal> and <literal>READ COMMITTED</literal>
+ are never used for remote transactions, because even
+ <literal>READ COMMITTED</literal> transactions might produce inconsistent
+   results if remote data has been updated between two remote queries (which
+   can happen within a single local query).
+ </para>
+ <para>
+   Note that even if the isolation level of the local transaction is
+   <literal>SERIALIZABLE</literal> or <literal>REPEATABLE READ</literal>,
+   executing the same query repeatedly might produce different results,
+   because foreign scans in different local queries are executed in
+   different remote transactions.  For instance, if external data is updated
+   between two identical queries in a <literal>SERIALIZABLE</literal> local
+   transaction, the client receives different results, as sketched below.
+ </para>
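+  <para>
+   A rough sketch of this behavior, assuming a foreign table
+   <literal>ft</literal> has already been set up:
+<programlisting>
+BEGIN ISOLATION LEVEL REPEATABLE READ;
+SELECT count(*) FROM ft;   -- runs in its own remote transaction
+-- remote data is modified by someone else here
+SELECT count(*) FROM ft;   -- new remote transaction; may see the change
+COMMIT;
+</programlisting>
+  </para>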
+ <para>
+   This restriction might be relaxed in a future release.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Estimate Costs, Rows and Width</title>
+ <para>
+   <application>postgres_fdw</application> estimates the costs of a foreign
+   scan in one of two ways.  In either case, the selectivity of restrictions
+   is taken into account to give a proper estimate.
+ </para>
+ <para>
+   If <literal>use_remote_estimate</literal> is set to
+   <literal>false</literal> (the default behavior), <application>postgres_fdw</>
+   assumes that the external data has not changed very much, and uses local
+   statistics as-is.  It is recommended to execute <command>ANALYZE</command>
+   so that the local statistics keep reflecting the characteristics of the
+   external data.  Otherwise, <application>postgres_fdw</> executes a remote
+   <command>EXPLAIN</command> at planning time for each foreign scan to get a
+   remote estimate of the remote query.  This provides better estimates but
+   requires some overhead.
+ </para>
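+  <para>
+   For instance, the local statistics of a foreign table (the table name is
+   illustrative) can be refreshed with:
+<programlisting>
+ANALYZE ft1;
+</programlisting>
+  </para>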
+ </sect2>
+
+ <sect2>
+ <title>Remote Query Optimization</title>
+ <para>
+   <application>postgres_fdw</> optimizes remote queries to reduce the
+   amount of data transferred from foreign servers.
+ <itemizedlist>
+ <listitem>
+ <para>
+      Restrictions which have the same semantics on the remote side are
+      pushed down (see the example at the end of this section).  For example,
+      restrictions which contain the elements below might have different
+      semantics on the remote side:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ User defined objects, such as functions, operators, and types.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Functions defined as <literal>STABLE</literal> or
+ <literal>VOLATILE</literal>, and operators which use such functions.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+         Collatable types, such as text or varchar, with some exceptions (see
+         below).
+        </para>
+        <para>
+         Basically we assume that collatable expressions have different
+         semantics, because the remote server might have a different collation
+         setting, but this assumption would prevent simple and common
+         expressions, such as <literal>text_col = 'string'</literal>, from
+         being pushed down.  So <application>postgres_fdw</application> treats
+         the operators <literal>=</literal> and <literal><></literal> as
+         safe to push down even if they take collatable types as arguments.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+      Unnecessary columns in the <literal>SELECT</literal> list of remote
+      queries are replaced with <literal>NULL</literal> literals.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
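+  <para>
+   For example (the table, column, and function names are illustrative only),
+   a comparison using only built-in immutable operators can be pushed down,
+   while a condition calling a user-defined function is evaluated locally:
+<programlisting>
+EXPLAIN (COSTS false) SELECT * FROM ft1 WHERE c1 = 1;            -- pushed down
+EXPLAIN (COSTS false) SELECT * FROM ft1 WHERE c1 = my_func(c2);  -- kept local
+</programlisting>
+  </para>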
+ </sect2>
+
+ <sect2>
+ <title>EXPLAIN Output</title>
+ <para>
+ For each foreign table using <literal>postgres_fdw</>, <command>EXPLAIN</>
+   shows the remote SQL statement which is sent to the remote
+   <productname>PostgreSQL</productname> server for each ForeignScan plan node.
+ For example:
+ </para>
+<synopsis>
+postgres=# EXPLAIN SELECT aid FROM pgbench_accounts WHERE abalance < 0;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on pgbench_accounts (cost=100.00..100.11 rows=1 width=97)
+ Remote SQL: SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE ((abalance OPERATOR(pg_catalog.<) 0))
+(2 rows)
+</synopsis>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Shigeru Hanada <email>shigeru.hanada@gmail.com</email>
+ </para>
+ </sect2>
+
+</sect1>
2012/11/15 Shigeru Hanada <shigeru.hanada@gmail.com>:
Hi Kaigai-san,
Sorry for the delayed response. I updated the patch, although I didn't change
anything about the timing issue you and Fujita-san are concerned about.
1) add some FDW options for cost estimation. Default behavior is not changed.
2) get rid of the array of libpq option names, similarly to the recent change
of dblink
3) enhance document, especially remote query optimization
4) rename to postgres_fdw, to avoid naming conflict with the validator which
exists in core
5) cope with changes about error context handling
On Tue, Nov 6, 2012 at 7:36 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
Isn't it possible to pick up only the columns to be used in the targetlist or
local qualifiers, without modification of baserestrictinfo?
IMO, it's possible. postgres_fdw doesn't modify baserestrictinfo at all; it
just creates two new lists which exclusively point to RestrictInfo elements in
baserestrictinfo. Pulling vars up from the conditions which can't be pushed
down would give us the list of necessary columns. Am I missing something?
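A rough, untested sketch of that idea (assuming the 9.2-era pull_var_clause()
API and baserel->reltargetlist; the variable names are illustrative, not from
the patch) might look like:

    List       *needed_vars;
    ListCell   *lc;

    /* Columns referenced by the local target list. */
    needed_vars = pull_var_clause((Node *) baserel->reltargetlist,
                                  PVC_RECURSE_AGGREGATES,
                                  PVC_RECURSE_PLACEHOLDERS);

    /* Add columns referenced by conditions which can't be pushed down. */
    foreach(lc, local_conds)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

        needed_vars = list_concat(needed_vars,
                                  pull_var_clause((Node *) rinfo->clause,
                                                  PVC_RECURSE_AGGREGATES,
                                                  PVC_RECURSE_PLACEHOLDERS));
    }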
Hanada-san,
Let me comment on several points randomly.
I also think the new "use_remote_explain" option is good. It works fine
when we use this FDW over a network with more or less latency.
It seems to me its default is "false", thus GetForeignRelSize will return
the estimated cost according to ANALYZE, instead of remote EXPLAIN.
Even though you mention the default behavior was not changed, is that
expected? My preference is the current one, as is.
The deparseFuncExpr() still has case handling for whether it is an explicit
cast, an implicit cast, or a regular function. If my previous proposition has
no flaw, could you fix it up to use the regular function invocation manner?
In case the remote node has an incompatible implicit-cast definition, this
logic can cause a problem.
At InitPostgresFdwOptions(), the source comment says we don't use
malloc() here for simplification of code. Hmm. I'm not sure why it is
simpler. It seems to me we have no reason to avoid malloc here, even
though libpq options are acquired using malloc().
Regarding the regression test.
[kaigai@iwashi sepgsql]$ cat contrib/postgres_fdw/regression.diffs
*** /home/kaigai/repo/sepgsql/contrib/postgres_fdw/expected/postgres_fdw.out
Sat Nov 17 21:31:19 2012
--- /home/kaigai/repo/sepgsql/contrib/postgres_fdw/results/postgres_fdw.out
Tue Nov 20 13:53:32 2012
***************
*** 621,627 ****
-- ===================================================================
ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
! ERROR: invalid input syntax for integer: "1970-01-02 00:00:00"
CONTEXT: column c5 of foreign table ft1
ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
-- ===================================================================
--- 621,627 ----
-- ===================================================================
ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
! ERROR: invalid input syntax for integer: "Fri Jan 02 00:00:00 1970"
CONTEXT: column c5 of foreign table ft1
ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
-- ===================================================================
======================================================================
I guess this test tries to check the case when a remote column has a data type
incompatible with the local side. Please check timestamp_out(). Its output
format follows the "datestyle" GUC setting, which is affected by the OS
configuration at initdb time. (Please grep for "datestyle" in initdb.c!)
I'd like to recommend using another data type for this regression test to
avoid false-positive detection.
Elsewhere, I could not find problems right now.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Thanks for the comment!
On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
I also think the new "use_remote_explain" option is good. It works fine
when we use this FDW over a network with more or less latency.
It seems to me its default is "false", thus GetForeignRelSize will return
the estimated cost according to ANALYZE, instead of remote EXPLAIN.
Even though you mention the default behavior was not changed, is that
expected? My preference is the current one, as is.
Oops, I must have focused only on the cost factors.
I too think that using local statistics is better as the default behavior,
because foreign tables would typically be used for relatively stable tables.
If target tables are updated often, it would cause consistency problems,
unless we provide full-fledged transaction mapping.
The deparseFuncExpr() still has case handling for whether it is an explicit
cast, an implicit cast, or a regular function. If my previous proposition has
no flaw, could you fix it up to use the regular function invocation manner?
In case the remote node has an incompatible implicit-cast definition, this
logic can cause a problem.
Sorry, I overlooked this issue. Fixed to use function call notation
for all of explicit function calls, explicit casts, and implicit casts.
At InitPostgresFdwOptions(), the source comment says we don't use
malloc() here for simplification of code. Hmm. I'm not sure why it is
simpler. It seems to me we have no reason to avoid malloc here, even
though libpq options are acquired using malloc().
I used "simple" because using palloc avoids null-check and error handling.
However, many backend code use malloc to allocate memory which lives
as long as backend process itself, so I fixed.
Regarding the regression test.
[snip]
I guess this test tries to check the case when a remote column has a data type
incompatible with the local side. Please check timestamp_out(). Its output
format follows the "datestyle" GUC setting, which is affected by the OS
configuration at initdb time. (Please grep for "datestyle" in initdb.c!)
I'd like to recommend using another data type for this regression test to
avoid false-positive detection.
Good catch. :)
I fixed the test to use another data type, a user-defined enum.
Regards,
--
Shigeru HANADA
Attachments:
postgres_fdw.v4.patch (application/octet-stream)
diff --git a/contrib/Makefile b/contrib/Makefile
index d230451..7c6009d 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
pgcrypto \
pgrowlocks \
pgstattuple \
+ postgres_fdw \
seg \
spi \
tablefunc \
diff --git a/contrib/postgres_fdw/.gitignore b/contrib/postgres_fdw/.gitignore
new file mode 100644
index 0000000..0854728
--- /dev/null
+++ b/contrib/postgres_fdw/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/results/
+*.o
+*.so
diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
new file mode 100644
index 0000000..8dac777
--- /dev/null
+++ b/contrib/postgres_fdw/Makefile
@@ -0,0 +1,22 @@
+# contrib/postgres_fdw/Makefile
+
+MODULE_big = postgres_fdw
+OBJS = postgres_fdw.o option.o deparse.o connection.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+SHLIB_LINK = $(libpq)
+
+EXTENSION = postgres_fdw
+DATA = postgres_fdw--1.0.sql
+
+REGRESS = postgres_fdw
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/postgres_fdw
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
new file mode 100644
index 0000000..eab8b87
--- /dev/null
+++ b/contrib/postgres_fdw/connection.c
@@ -0,0 +1,605 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.c
+ * Connection management for postgres_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/connection.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq-fe.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
+
+#include "postgres_fdw.h"
+#include "connection.h"
+
+/* ============================================================================
+ * Connection management functions
+ * ==========================================================================*/
+
+/*
+ * Connection cache entry managed with hash table.
+ */
+typedef struct ConnCacheEntry
+{
+ /* hash key must be first */
+ Oid serverid; /* oid of foreign server */
+ Oid userid; /* oid of local user */
+
+ bool use_tx; /* true when using remote transaction */
+ int refs; /* reference counter */
+ PGconn *conn; /* foreign server connection */
+} ConnCacheEntry;
+
+/*
+ * Hash table used to cache connections to PostgreSQL servers.  It is
+ * initialized before the backend's first attempt to connect to a PostgreSQL
+ * server.
+ */
+static HTAB *ConnectionHash;
+
+/* ----------------------------------------------------------------------------
+ * prototype of private functions
+ * --------------------------------------------------------------------------*/
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg);
+static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
+static void begin_remote_tx(PGconn *conn);
+static void abort_remote_tx(PGconn *conn);
+
+/*
+ * Get a PGconn which can be used to execute foreign queries on the remote
+ * PostgreSQL server with the user's authorization.  If this is the first
+ * request for the server, a new connection is established.
+ *
+ * When use_tx is true, a remote transaction is started if the caller is the
+ * only user of the connection.  The isolation level of the remote transaction
+ * follows the local transaction's, and the remote transaction will be aborted
+ * when the last user releases the connection.
+ *
+ * TODO: Note that caching connections requires a mechanism to detect changes
+ * of FDW objects so that already established connections can be invalidated.
+ */
+PGconn *
+GetConnection(ForeignServer *server, UserMapping *user, bool use_tx)
+{
+ bool found;
+ ConnCacheEntry *entry;
+ ConnCacheEntry key;
+
+ /* initialize connection cache if it isn't */
+ if (ConnectionHash == NULL)
+ {
+ HASHCTL ctl;
+
+ /* hash key is a pair of oids: serverid and userid */
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid) + sizeof(Oid);
+ ctl.entrysize = sizeof(ConnCacheEntry);
+ ctl.hash = tag_hash;
+ ctl.match = memcmp;
+ ctl.keycopy = memcpy;
+ /* allocate ConnectionHash in the cache context */
+ ctl.hcxt = CacheMemoryContext;
+ ConnectionHash = hash_create("postgres_fdw connections", 32,
+ &ctl,
+ HASH_ELEM | HASH_CONTEXT |
+ HASH_FUNCTION | HASH_COMPARE |
+ HASH_KEYCOPY);
+
+ /*
+ * Register postgres_fdw's own cleanup function for connection
+ * cleanup. This should be done just once for each backend.
+ */
+ RegisterResourceReleaseCallback(cleanup_connection, ConnectionHash);
+ }
+
+ /* Create key value for the entry. */
+ MemSet(&key, 0, sizeof(key));
+ key.serverid = server->serverid;
+ key.userid = GetOuterUserId();
+
+ /*
+ * Find cached entry for requested connection. If we couldn't find,
+ * callback function of ResourceOwner should be registered to clean the
+ * connection up on error including user interrupt.
+ */
+ entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+ if (!found)
+ {
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+
+ /*
+ * We don't check the health of cached connection here, because it would
+ * require some overhead. Broken connection and its cache entry will be
+ * cleaned up when the connection is actually used.
+ */
+
+ /*
+ * If cache entry doesn't have connection, we have to establish new
+ * connection.
+ */
+ if (entry->conn == NULL)
+ {
+ PGconn *volatile conn = NULL;
+
+ /*
+ * Use PG_TRY block to ensure closing connection on error.
+ */
+ PG_TRY();
+ {
+ /*
+ * Connect to the foreign PostgreSQL server, and store it in cache
+ * entry to keep new connection.
+ * Note: key items of entry has already been initialized in
+ * hash_search(HASH_ENTER).
+ */
+ conn = connect_pg_server(server, user);
+ }
+ PG_CATCH();
+ {
+ /* Clear connection cache entry on error case. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ entry->conn = conn;
+ elog(DEBUG3, "new postgres_fdw connection %p for server %s",
+ entry->conn, server->servername);
+ }
+
+ /* Increase connection reference counter. */
+ entry->refs++;
+
+ /*
+ * If remote transaction is requested but it has not started, start remote
+ * transaction with the same isolation level as the local transaction we
+ * are in. We need to remember whether this connection uses remote
+ * transaction to abort it when this connection is released completely.
+ */
+ if (use_tx && !entry->use_tx)
+ {
+ begin_remote_tx(entry->conn);
+ entry->use_tx = use_tx;
+ }
+
+ return entry->conn;
+}
+
+/*
+ * For non-superusers, insist that the connstr specify a password. This
+ * prevents a password from being picked up from .pgpass, a service file,
+ * the environment, etc. We don't want the postgres user's passwords
+ * to be accessible to non-superusers.
+ */
+static void
+check_conn_params(const char **keywords, const char **values)
+{
+ int i;
+
+ /* no check required if superuser */
+ if (superuser())
+ return;
+
+ /* ok if params contain a non-empty password */
+ for (i = 0; keywords[i] != NULL; i++)
+ {
+ if (strcmp(keywords[i], "password") == 0 && values[i][0] != '\0')
+ return;
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superusers must provide a password in the connection string.")));
+}
+
+static PGconn *
+connect_pg_server(ForeignServer *server, UserMapping *user)
+{
+ const char *conname = server->servername;
+ PGconn *conn;
+ const char **all_keywords;
+ const char **all_values;
+ const char **keywords;
+ const char **values;
+ int n;
+ int i, j;
+
+ /*
+ * Construct connection params from generic options of ForeignServer and
+	 * UserMapping.  Those two objects hold only libpq options.
+ * Extra 3 items are for:
+ * *) fallback_application_name
+ * *) client_encoding
+ * *) NULL termination (end marker)
+ *
+	 * Note: We don't omit any parameters even if the target database might be
+	 * older than the local one, because unexpected parameters are just ignored.
+ */
+ n = list_length(server->options) + list_length(user->options) + 3;
+ all_keywords = (const char **) palloc(sizeof(char *) * n);
+ all_values = (const char **) palloc(sizeof(char *) * n);
+ keywords = (const char **) palloc(sizeof(char *) * n);
+ values = (const char **) palloc(sizeof(char *) * n);
+ n = 0;
+ n += ExtractConnectionOptions(server->options,
+ all_keywords + n, all_values + n);
+ n += ExtractConnectionOptions(user->options,
+ all_keywords + n, all_values + n);
+ all_keywords[n] = all_values[n] = NULL;
+
+ for (i = 0, j = 0; all_keywords[i]; i++)
+ {
+ keywords[j] = all_keywords[i];
+ values[j] = all_values[i];
+ j++;
+ }
+
+ /* Use "postgres_fdw" as fallback_application_name. */
+ keywords[j] = "fallback_application_name";
+ values[j++] = "postgres_fdw";
+
+ /* Set client_encoding so that libpq can convert encoding properly. */
+ keywords[j] = "client_encoding";
+ values[j++] = GetDatabaseEncodingName();
+
+ keywords[j] = values[j] = NULL;
+ pfree(all_keywords);
+ pfree(all_values);
+
+ /* verify connection parameters and do connect */
+ check_conn_params(keywords, values);
+ conn = PQconnectdbParams(keywords, values, 0);
+ if (!conn || PQstatus(conn) != CONNECTION_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
+ errmsg("could not connect to server \"%s\"", conname),
+ errdetail("%s", PQerrorMessage(conn))));
+ pfree(keywords);
+ pfree(values);
+
+ /*
+ * Check that non-superuser has used password to establish connection.
+ * This check logic is based on dblink_security_check() in contrib/dblink.
+ *
+ * XXX Should we check this even if we don't provide unsafe version like
+ * dblink_connect_u()?
+ */
+ if (!superuser() && !PQconnectionUsedPassword(conn))
+ {
+ PQfinish(conn);
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superuser cannot connect if the server does not request a password."),
+ errhint("Target server's authentication method must be changed.")));
+ }
+
+ return conn;
+}
+
+/*
+ * Start remote transaction with proper isolation level.
+ */
+static void
+begin_remote_tx(PGconn *conn)
+{
+ const char *sql = NULL; /* keep compiler quiet. */
+ PGresult *res;
+
+ switch (XactIsoLevel)
+ {
+ case XACT_READ_UNCOMMITTED:
+ case XACT_READ_COMMITTED:
+ case XACT_REPEATABLE_READ:
+ sql = "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
+ break;
+ case XACT_SERIALIZABLE:
+ sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
+ break;
+ default:
+ elog(ERROR, "unexpected isolation level: %d", XactIsoLevel);
+ break;
+ }
+
+ elog(DEBUG3, "starting remote transaction with \"%s\"", sql);
+
+ res = PQexec(conn, sql);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not start transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+static void
+abort_remote_tx(PGconn *conn)
+{
+ PGresult *res;
+
+ elog(DEBUG3, "aborting remote transaction");
+
+ res = PQexec(conn, "ABORT TRANSACTION");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not abort transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+/*
+ * Release the connection.  If the connection is broken or its transaction has
+ * failed, it is discarded; otherwise, if the caller was the last user and a
+ * remote transaction is open, that transaction is aborted.
+ */
+void
+ReleaseConnection(PGconn *conn)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+
+ if (conn == NULL)
+ return;
+
+ /*
+ * We need to scan sequentially since we use the address to find
+ * appropriate PGconn from the hash table.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ if (entry->conn == conn)
+ {
+ hash_seq_term(&scan);
+ break;
+ }
+ }
+
+	/*
+	 * If the given connection is an orphan, it must be a dangling pointer to
+	 * an already discarded connection.  Discarding a connection due to a
+	 * remote query error would produce such a situation (see the comments
+	 * below).
+	 */
+ if (entry == NULL)
+ return;
+
+	/*
+	 * If the connection being released is broken or its transaction has
+	 * failed, discard the connection to recover from the error.  PQfinish
+	 * could leave dangling pointers to the shared PGconn object, but they
+	 * won't be double-free'd because their pointer values no longer match any
+	 * cached entry and are ignored by the check above.
+	 *
+	 * A subsequent connection request via GetConnection will create a new
+	 * connection.
+	 */
+ if (PQstatus(conn) != CONNECTION_OK ||
+ (PQtransactionStatus(conn) != PQTRANS_IDLE &&
+ PQtransactionStatus(conn) != PQTRANS_INTRANS))
+ {
+ elog(DEBUG3, "discarding connection: %s %s",
+ PQstatus(conn) == CONNECTION_OK ? "OK" : "NG",
+ PQtransactionStatus(conn) == PQTRANS_IDLE ? "IDLE" :
+ PQtransactionStatus(conn) == PQTRANS_ACTIVE ? "ACTIVE" :
+ PQtransactionStatus(conn) == PQTRANS_INTRANS ? "INTRANS" :
+ PQtransactionStatus(conn) == PQTRANS_INERROR ? "INERROR" :
+ "UNKNOWN");
+ PQfinish(conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ return;
+ }
+
+ /*
+ * Decrease reference counter of this connection. Even if the caller was
+ * the last referrer, we don't unregister it from cache.
+ */
+ entry->refs--;
+ if (entry->refs < 0)
+ entry->refs = 0; /* just in case */
+
+ /*
+ * If this connection uses remote transaction and there is no user other
+ * than the caller, abort the remote transaction and forget about it.
+ */
+ if (entry->use_tx && entry->refs == 0)
+ {
+ abort_remote_tx(conn);
+ entry->use_tx = false;
+ }
+}
+
+/*
+ * Clean the connection up via ResourceOwner.
+ */
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg)
+{
+ HASH_SEQ_STATUS scan;
+	ConnCacheEntry *entry;
+
+ /* If the transaction was committed, don't close connections. */
+ if (isCommit)
+ return;
+
+ /*
+ * We clean the connection up on post-lock because foreign connections are
+ * backend-internal resource.
+ */
+ if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+ return;
+
+ /*
+ * We ignore cleanup for ResourceOwners other than transaction. At this
+ * point, such a ResourceOwner is only Portal.
+ */
+ if (CurrentResourceOwner != CurTransactionResourceOwner)
+ return;
+
+ /*
+ * We don't need to clean up at end of subtransactions, because they might
+ * be recovered to consistent state with savepoints.
+ */
+ if (!isTopLevel)
+ return;
+
+ /*
+	 * At this point we must be past the abort of a top-level transaction.
+	 * Disconnect all cached connections to clear out any error status and
+	 * reset their reference counters.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ elog(DEBUG3, "discard postgres_fdw connection %p due to resowner cleanup",
+ entry->conn);
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+}
+
+/*
+ * Get list of connections currently active.
+ */
+Datum postgres_fdw_get_connections(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
+Datum
+postgres_fdw_get_connections(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ Tuplestorestate *tuplestore;
+ TupleDesc tupdesc;
+
+	/* We return the list of connections by storing them in a Tuplestore. */
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = NULL;
+ rsinfo->setDesc = NULL;
+
+ /* Create tuplestore and copy of TupleDesc in per-query context. */
+ MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+
+ tupdesc = CreateTemplateTupleDesc(2, false);
+ TupleDescInitEntry(tupdesc, 1, "srvid", OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, 2, "usesysid", OIDOID, -1, 0);
+ rsinfo->setDesc = tupdesc;
+
+ tuplestore = tuplestore_begin_heap(false, false, work_mem);
+ rsinfo->setResult = tuplestore;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+	 * Scan all cached connections and return one row for each active entry
+	 * visible to the current user.
+ */
+ if (ConnectionHash != NULL)
+ {
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ Datum values[2];
+ bool nulls[2];
+ HeapTuple tuple;
+
+ /* Ignore inactive connections */
+ if (PQstatus(entry->conn) != CONNECTION_OK)
+ continue;
+
+ /*
+ * Ignore other users' connections if current user isn't a
+ * superuser.
+ */
+ if (!superuser() && entry->userid != GetUserId())
+ continue;
+
+ values[0] = ObjectIdGetDatum(entry->serverid);
+ values[1] = ObjectIdGetDatum(entry->userid);
+ nulls[0] = false;
+ nulls[1] = false;
+
+			tuple = heap_form_tuple(tupdesc, values, nulls);
+ tuplestore_puttuple(tuplestore, tuple);
+ }
+ }
+ tuplestore_donestoring(tuplestore);
+
+ PG_RETURN_VOID();
+}
+
+ * Discard the persistent connection designated by the given server OID and
+ * user OID.
+ * Discard persistent connection designated by given connection name.
+ */
+Datum postgres_fdw_disconnect(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
+Datum
+postgres_fdw_disconnect(PG_FUNCTION_ARGS)
+{
+ Oid serverid = PG_GETARG_OID(0);
+ Oid userid = PG_GETARG_OID(1);
+ ConnCacheEntry key;
+ ConnCacheEntry *entry = NULL;
+ bool found;
+
+	/* Non-superusers can't discard other users' connections. */
+	if (!superuser() && userid != GetOuterUserId())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("only superusers can discard other users' connections")));
+
+ /*
+	 * If no connection has been established, or there is no such connection,
+	 * just return "NG" to indicate that nothing has been done.
+ */
+ if (ConnectionHash == NULL)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ key.serverid = serverid;
+ key.userid = userid;
+ entry = hash_search(ConnectionHash, &key, HASH_FIND, &found);
+ if (!found)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ /* Discard cached connection, and clear reference counter. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+
+ PG_RETURN_TEXT_P(cstring_to_text("OK"));
+}
diff --git a/contrib/postgres_fdw/connection.h b/contrib/postgres_fdw/connection.h
new file mode 100644
index 0000000..4c9d850
--- /dev/null
+++ b/contrib/postgres_fdw/connection.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.h
+ * Connection management for postgres_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/connection.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECTION_H
+#define CONNECTION_H
+
+#include "foreign/foreign.h"
+#include "libpq-fe.h"
+
+/*
+ * Connection management
+ */
+PGconn *GetConnection(ForeignServer *server, UserMapping *user, bool use_tx);
+void ReleaseConnection(PGconn *conn);
+
+#endif /* CONNECTION_H */
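
The intended calling pattern of these two functions is roughly as follows. This is an illustrative sketch only, not part of the patch: rel stands for the scanned relation, sql for the deparsed remote query string, and use_tx = true is assumed to request that remote work be wrapped in a transaction.

    /* Sketch: look up catalog objects for the foreign table, then get a connection. */
    ForeignTable  *table  = GetForeignTable(RelationGetRelid(rel));
    ForeignServer *server = GetForeignServer(table->serverid);
    UserMapping   *user   = GetUserMapping(GetUserId(), server->serverid);
    PGconn        *conn   = GetConnection(server, user, true);

    /* ... issue the remote query through libpq, e.g. PQexec(conn, sql) ... */

    ReleaseConnection(conn);    /* drop this reference; the connection stays cached */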
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
new file mode 100644
index 0000000..69e6a3e
--- /dev/null
+++ b/contrib/postgres_fdw/deparse.c
@@ -0,0 +1,1192 @@
+/*-------------------------------------------------------------------------
+ *
+ * deparse.c
+ * query deparser for PostgreSQL
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/deparse.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/transam.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/var.h"
+#include "parser/parser.h"
+#include "parser/parsetree.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+#include "postgres_fdw.h"
+
+/*
+ * Context for walk-through the expression tree.
+ */
+typedef struct foreign_executable_cxt
+{
+ PlannerInfo *root;
+ RelOptInfo *foreignrel;
+ bool has_param;
+} foreign_executable_cxt;
+
+/*
+ * Get a string representation of a node that can be used in an SQL statement.
+ */
+static void deparseExpr(StringInfo buf, Expr *expr, PlannerInfo *root);
+static void deparseRelation(StringInfo buf, RangeTblEntry *rte);
+static void deparseVar(StringInfo buf, Var *node, PlannerInfo *root);
+static void deparseConst(StringInfo buf, Const *node, PlannerInfo *root);
+static void deparseBoolExpr(StringInfo buf, BoolExpr *node, PlannerInfo *root);
+static void deparseNullTest(StringInfo buf, NullTest *node, PlannerInfo *root);
+static void deparseDistinctExpr(StringInfo buf, DistinctExpr *node,
+ PlannerInfo *root);
+static void deparseRelabelType(StringInfo buf, RelabelType *node,
+ PlannerInfo *root);
+static void deparseFuncExpr(StringInfo buf, FuncExpr *node, PlannerInfo *root);
+static void deparseParam(StringInfo buf, Param *node, PlannerInfo *root);
+static void deparseScalarArrayOpExpr(StringInfo buf, ScalarArrayOpExpr *node,
+ PlannerInfo *root);
+static void deparseOpExpr(StringInfo buf, OpExpr *node, PlannerInfo *root);
+static void deparseArrayRef(StringInfo buf, ArrayRef *node, PlannerInfo *root);
+static void deparseArrayExpr(StringInfo buf, ArrayExpr *node, PlannerInfo *root);
+
+/*
+ * Determine whether an expression can be evaluated on remote side safely.
+ */
+static bool is_foreign_expr(PlannerInfo *root, RelOptInfo *baserel, Expr *expr,
+ bool *has_param);
+static bool foreign_expr_walker(Node *node, foreign_executable_cxt *context);
+static bool is_builtin(Oid procid);
+
+/*
+ * Deparse the query representation into an SQL statement suitable for the
+ * remote PostgreSQL server.  This function basically creates a simple query
+ * string consisting of only SELECT and FROM clauses.
+ *
+ * The remote SELECT clause contains only columns which are used in the
+ * target list or in local_conds (conditions which can't be pushed down and
+ * will be checked on the local side).
+ */
+void
+deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds)
+{
+ RangeTblEntry *rte;
+ ListCell *lc;
+ StringInfoData foreign_relname;
+ bool first;
+ AttrNumber attr;
+ List *attr_used = NIL; /* List of AttNumber used in the query */
+
+ initStringInfo(buf);
+ initStringInfo(&foreign_relname);
+
+ /*
+ * First of all, determine which columns need to be retrieved for this scan.
+ *
+ * We do this before deparsing the SELECT clause because attributes which
+ * are used in neither reltargetlist nor baserel->baserestrictinfo (quals
+ * evaluated locally) can be replaced with the literal "NULL" in the SELECT
+ * clause, reducing the overhead of tuple handling and data transfer.
+ */
+ foreach (lc, local_conds)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+ List *attrs;
+
+ /*
+ * We need to know which attributes are used in quals evaluated
+ * on the local server, because they must be listed in the
+ * SELECT clause of the remote query.  We can ignore attributes
+ * which are referenced only in ORDER BY/GROUP BY clauses because
+ * such attributes have already been added to reltargetlist.
+ */
+ attrs = pull_var_clause((Node *) ri->clause,
+ PVC_RECURSE_AGGREGATES,
+ PVC_RECURSE_PLACEHOLDERS);
+ attr_used = list_union(attr_used, attrs);
+ }
+
+ /*
+ * deparse SELECT clause
+ *
+ * List attributes which are used in either the target list or local
+ * restrictions.  Unused attributes are replaced with the literal "NULL"
+ * as an optimization.
+ *
+ * Note that nothing is emitted for dropped columns, even though the tuple
+ * constructor function requires entries for them.  Such entries must be
+ * initialized to NULL before calling the tuple constructor.
+ */
+ appendStringInfo(buf, "SELECT ");
+ rte = root->simple_rte_array[baserel->relid];
+ attr_used = list_union(attr_used, baserel->reltargetlist);
+ first = true;
+ for (attr = 1; attr <= baserel->max_attr; attr++)
+ {
+ Var *var = NULL;
+ ListCell *lc;
+
+ /* Ignore dropped attributes. */
+ if (get_rte_attribute_is_dropped(rte, attr))
+ continue;
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ first = false;
+
+ /*
+ * We use a linear search here, but that shouldn't be a problem since
+ * attr_used is not expected to grow very large.
+ */
+ foreach (lc, attr_used)
+ {
+ var = lfirst(lc);
+ if (var->varattno == attr)
+ break;
+ var = NULL;
+ }
+ if (var != NULL)
+ deparseVar(buf, var, root);
+ else
+ appendStringInfo(buf, "NULL");
+ }
+ appendStringInfoChar(buf, ' ');
+
+ /*
+ * deparse FROM clause, including alias if any
+ */
+ appendStringInfo(buf, "FROM ");
+ deparseRelation(buf, root->simple_rte_array[baserel->relid]);
+}
+
+/*
+ * Examine each element in baserel's baserestrictinfo list, and classify
+ * them into three groups:
+ * - remote_conds is push-down safe and doesn't contain any Param node
+ * - param_conds is push-down safe, but contains some Param node
+ * - local_conds is not push-down safe
+ *
+ * Only remote_conds can be used in remote EXPLAIN, while both remote_conds
+ * and param_conds can be used in the final remote query.
+ */
+void
+classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds)
+{
+ ListCell *lc;
+ bool has_param;
+
+ Assert(remote_conds);
+ Assert(param_conds);
+ Assert(local_conds);
+
+ foreach(lc, baserel->baserestrictinfo)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ if (is_foreign_expr(root, baserel, ri->clause, &has_param))
+ {
+ if (has_param)
+ *param_conds = lappend(*param_conds, ri);
+ else
+ *remote_conds = lappend(*remote_conds, ri);
+ }
+ else
+ *local_conds = lappend(*local_conds, ri);
+ }
+}
+
+/*
+ * Deparse into buf a SELECT statement that acquires sample rows of the given relation.
+ */
+void
+deparseAnalyzeSql(StringInfo buf, Relation rel)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+ char *nspname;
+ char *relname;
+ ForeignTable *table;
+
+ /* Deparse SELECT clause, use attribute name or colname option. */
+ appendStringInfo(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ colname = NameStr(tupdesc->attrs[i]->attname);
+ options = GetForeignColumnOptions(relid, tupdesc->attrs[i]->attnum);
+
+ foreach(lc, options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ appendStringInfo(buf, "%s", quote_identifier(colname));
+ first = false;
+ }
+
+ /*
+ * Deparse FROM clause; use the namespace and relation name, or the values
+ * of the nspname and relname FDW options if they are set.
+ */
+ nspname = get_namespace_name(get_rel_namespace(relid));
+ relname = get_rel_name(relid);
+ table = GetForeignTable(relid);
+ foreach(lc, table->options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ appendStringInfo(buf, " FROM %s.%s", quote_identifier(nspname),
+ quote_identifier(relname));
+}
+
+/*
+ * Deparse given expression into buf.  Actual string operations are
+ * delegated to node-type-specific functions.
+ *
+ * Note that the switch statement of this function MUST match the one in
+ * foreign_expr_walker to avoid "unsupported expression" errors.
+ */
+static void
+deparseExpr(StringInfo buf, Expr *node, PlannerInfo *root)
+{
+ /*
+ * This switch must be kept in sync with foreign_expr_walker.
+ */
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ deparseConst(buf, (Const *) node, root);
+ break;
+ case T_BoolExpr:
+ deparseBoolExpr(buf, (BoolExpr *) node, root);
+ break;
+ case T_NullTest:
+ deparseNullTest(buf, (NullTest *) node, root);
+ break;
+ case T_DistinctExpr:
+ deparseDistinctExpr(buf, (DistinctExpr *) node, root);
+ break;
+ case T_RelabelType:
+ deparseRelabelType(buf, (RelabelType *) node, root);
+ break;
+ case T_FuncExpr:
+ deparseFuncExpr(buf, (FuncExpr *) node, root);
+ break;
+ case T_Param:
+ deparseParam(buf, (Param *) node, root);
+ break;
+ case T_ScalarArrayOpExpr:
+ deparseScalarArrayOpExpr(buf, (ScalarArrayOpExpr *) node, root);
+ break;
+ case T_OpExpr:
+ deparseOpExpr(buf, (OpExpr *) node, root);
+ break;
+ case T_Var:
+ deparseVar(buf, (Var *) node, root);
+ break;
+ case T_ArrayRef:
+ deparseArrayRef(buf, (ArrayRef *) node, root);
+ break;
+ case T_ArrayExpr:
+ deparseArrayExpr(buf, (ArrayExpr *) node, root);
+ break;
+ default:
+ {
+ ereport(ERROR,
+ (errmsg("unsupported expression for deparse"),
+ errdetail("%s", nodeToString(node))));
+ }
+ break;
+ }
+}
+
+/*
+ * Deparse given Var node into buf.  If the column has a colname FDW option,
+ * use its value instead of the attribute name.
+ */
+static void
+deparseVar(StringInfo buf, Var *node, PlannerInfo *root)
+{
+ RangeTblEntry *rte;
+ char *colname = NULL;
+ const char *q_colname = NULL;
+ List *options;
+ ListCell *lc;
+
+ /* node must not be any of OUTER_VAR, INNER_VAR and INDEX_VAR. */
+ Assert(node->varno >= 1 && node->varno <= root->simple_rel_array_size);
+
+ /* Get RangeTblEntry from array in PlannerInfo. */
+ rte = root->simple_rte_array[node->varno];
+
+ /*
+ * If the node is a column of a foreign table, and it has colname FDW
+ * option, use its value.
+ */
+ options = GetForeignColumnOptions(rte->relid, node->varattno);
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ /*
+ * If the node refers to a column of a regular table, or the column doesn't
+ * have a colname FDW option, use the attribute name.
+ */
+ if (colname == NULL)
+ colname = get_attname(rte->relid, node->varattno);
+
+ q_colname = quote_identifier(colname);
+ appendStringInfo(buf, "%s", q_colname);
+}
+
+/*
+ * Deparse a RangeTblEntry node into buf. If rte represents a foreign table,
+ * use value of relname FDW option (if any) instead of relation's name.
+ * Similarly, nspname FDW option overrides schema name.
+ */
+static void
+deparseRelation(StringInfo buf, RangeTblEntry *rte)
+{
+ ForeignTable *table;
+ ListCell *lc;
+ const char *nspname = NULL; /* plain namespace name */
+ const char *relname = NULL; /* plain relation name */
+ const char *q_nspname; /* quoted namespace name */
+ const char *q_relname; /* quoted relation name */
+
+ /* obtain additional catalog information. */
+ table = GetForeignTable(rte->relid);
+
+ /*
+ * Use the values of the FDW options, if any, instead of the names of the
+ * objects themselves.
+ */
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ /* Quote each identifier, if necessary. */
+ if (nspname == NULL)
+ nspname = get_namespace_name(get_rel_namespace(rte->relid));
+ q_nspname = quote_identifier(nspname);
+
+ if (relname == NULL)
+ relname = get_rel_name(rte->relid);
+ q_relname = quote_identifier(relname);
+
+ /* Construct relation reference into the buffer. */
+ appendStringInfo(buf, "%s.%s", q_nspname, q_relname);
+}
+
+/*
+ * Deparse given constant value into buf.  This function has to be kept in
+ * sync with get_const_expr.
+ */
+static void
+deparseConst(StringInfo buf,
+ Const *node,
+ PlannerInfo *root)
+{
+ Oid typoutput;
+ bool typIsVarlena;
+ char *extval;
+ bool isfloat = false;
+ bool needlabel;
+
+ if (node->constisnull)
+ {
+ appendStringInfo(buf, "NULL");
+ return;
+ }
+
+ getTypeOutputInfo(node->consttype,
+ &typoutput, &typIsVarlena);
+ extval = OidOutputFunctionCall(typoutput, node->constvalue);
+
+ switch (node->consttype)
+ {
+ case ANYARRAYOID:
+ case ANYNONARRAYOID:
+ elog(ERROR, "anyarray and anyenum are not supported");
+ break;
+ case INT2OID:
+ case INT4OID:
+ case INT8OID:
+ case OIDOID:
+ case FLOAT4OID:
+ case FLOAT8OID:
+ case NUMERICOID:
+ {
+ /*
+ * No need to quote unless they contain special values such as
+ * 'NaN'.
+ */
+ if (strspn(extval, "0123456789+-eE.") == strlen(extval))
+ {
+ if (extval[0] == '+' || extval[0] == '-')
+ appendStringInfo(buf, "(%s)", extval);
+ else
+ appendStringInfoString(buf, extval);
+ if (strcspn(extval, "eE.") != strlen(extval))
+ isfloat = true; /* it looks like a float */
+ }
+ else
+ appendStringInfo(buf, "'%s'", extval);
+ }
+ break;
+ case BITOID:
+ case VARBITOID:
+ appendStringInfo(buf, "B'%s'", extval);
+ break;
+ case BOOLOID:
+ if (strcmp(extval, "t") == 0)
+ appendStringInfoString(buf, "true");
+ else
+ appendStringInfoString(buf, "false");
+ break;
+
+ default:
+ {
+ const char *valptr;
+
+ appendStringInfoChar(buf, '\'');
+ for (valptr = extval; *valptr; valptr++)
+ {
+ char ch = *valptr;
+
+ /*
+ * standard_conforming_strings of the remote session should be
+ * set to the same value as in the local session.
+ */
+ if (SQL_STR_DOUBLE(ch, !standard_conforming_strings))
+ appendStringInfoChar(buf, ch);
+ appendStringInfoChar(buf, ch);
+ }
+ appendStringInfoChar(buf, '\'');
+ }
+ break;
+ }
+
+ /*
+ * Append ::typename unless the constant will be implicitly typed as the
+ * right type when it is read in.
+ *
+ * XXX this code has to be kept in sync with the behavior of the parser,
+ * especially make_const.
+ */
+ switch (node->consttype)
+ {
+ case BOOLOID:
+ case INT4OID:
+ case UNKNOWNOID:
+ needlabel = false;
+ break;
+ case NUMERICOID:
+ needlabel = !isfloat || (node->consttypmod >= 0);
+ break;
+ default:
+ needlabel = true;
+ break;
+ }
+ if (needlabel)
+ {
+ appendStringInfo(buf, "::%s",
+ format_type_with_typemod(node->consttype,
+ node->consttypmod));
+ }
+}
+
+static void
+deparseBoolExpr(StringInfo buf,
+ BoolExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ char *op = NULL; /* keep compiler quiet */
+ bool first;
+
+ switch (node->boolop)
+ {
+ case AND_EXPR:
+ op = "AND";
+ break;
+ case OR_EXPR:
+ op = "OR";
+ break;
+ case NOT_EXPR:
+ appendStringInfo(buf, "(NOT ");
+ deparseExpr(buf, list_nth(node->args, 0), root);
+ appendStringInfo(buf, ")");
+ return;
+ }
+
+ first = true;
+ appendStringInfo(buf, "(");
+ foreach(lc, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, " %s ", op);
+ deparseExpr(buf, (Expr *) lfirst(lc), root);
+ first = false;
+ }
+ appendStringInfo(buf, ")");
+}
+
+/*
+ * Deparse given IS [NOT] NULL test expression into buf.
+ */
+static void
+deparseNullTest(StringInfo buf,
+ NullTest *node,
+ PlannerInfo *root)
+{
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ if (node->nulltesttype == IS_NULL)
+ appendStringInfo(buf, " IS NULL)");
+ else
+ appendStringInfo(buf, " IS NOT NULL)");
+}
+
+static void
+deparseDistinctExpr(StringInfo buf,
+ DistinctExpr *node,
+ PlannerInfo *root)
+{
+ Assert(list_length(node->args) == 2);
+
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, " IS DISTINCT FROM ");
+ deparseExpr(buf, lsecond(node->args), root);
+}
+
+static void
+deparseRelabelType(StringInfo buf,
+ RelabelType *node,
+ PlannerInfo *root)
+{
+ char *typname;
+
+ Assert(node->arg);
+
+ /* We don't need to deparse cast when argument has same type as result. */
+ if (IsA(node->arg, Const) &&
+ ((Const *) node->arg)->consttype == node->resulttype &&
+ ((Const *) node->arg)->consttypmod == -1)
+ {
+ deparseExpr(buf, node->arg, root);
+ return;
+ }
+
+ typname = format_type_with_typemod(node->resulttype, node->resulttypmod);
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ appendStringInfo(buf, ")::%s", typname);
+}
+
+/*
+ * Deparse the given function-call node into buf.  Not only explicit
+ * function calls and explicit casts but also implicit casts are deparsed
+ * here, to avoid problems caused by different cast settings between the
+ * local and remote sides.
+ *
+ * The function name (and type name) is always schema-qualified to avoid
+ * problems caused by a different search_path setting on the remote side.
+ */
+static void
+deparseFuncExpr(StringInfo buf,
+ FuncExpr *node,
+ PlannerInfo *root)
+{
+ Oid pronamespace;
+ const char *schemaname;
+ const char *funcname;
+ ListCell *arg;
+ bool first;
+
+ pronamespace = get_func_namespace(node->funcid);
+ schemaname = quote_identifier(get_namespace_name(pronamespace));
+ funcname = quote_identifier(get_func_name(node->funcid));
+
+ /*
+ * Deparse all arguments recursively, in parentheses after the function
+ * name.
+ */
+ appendStringInfo(buf, "%s.%s(", schemaname, funcname);
+ first = true;
+ foreach(arg, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(arg), root);
+ first = false;
+ }
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse given Param node into buf.
+ *
+ * We don't renumber parameter ids, because a gap such as an unused $1
+ * causes no problem as long as we pass through all arguments.
+ */
+static void
+deparseParam(StringInfo buf,
+ Param *node,
+ PlannerInfo *root)
+{
+ Assert(node->paramkind == PARAM_EXTERN);
+
+ appendStringInfo(buf, "$%d", node->paramid);
+}
+
+/*
+ * Deparse given ScalarArrayOpExpr expression into buf.  To avoid problems
+ * with operator precedence, we always parenthesize the arguments.  Also, we
+ * use OPERATOR(schema.operator) notation to identify the remote operator
+ * exactly.
+ */
+static void
+deparseScalarArrayOpExpr(StringInfo buf,
+ ScalarArrayOpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ Expr *arg1;
+ Expr *arg2;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert(list_length(node->args) == 2);
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Extract operands. */
+ arg1 = linitial(node->args);
+ arg2 = lsecond(node->args);
+
+ /* Deparse fully qualified operator name. */
+ deparseExpr(buf, arg1, root);
+ appendStringInfo(buf, " OPERATOR(%s.%s) %s (",
+ opnspname, opname, node->useOr ? "ANY" : "ALL");
+ deparseExpr(buf, arg2, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse given operator expression into buf.  To avoid problems with
+ * operator precedence, we always parenthesize the arguments.  Also, we use
+ * OPERATOR(schema.operator) notation to identify the remote operator
+ * exactly.
+ */
+static void
+deparseOpExpr(StringInfo buf,
+ OpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ char oprkind;
+ ListCell *arg;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ oprkind = form->oprkind;
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert((oprkind == 'r' && list_length(node->args) == 1) ||
+ (oprkind == 'l' && list_length(node->args) == 1) ||
+ (oprkind == 'b' && list_length(node->args) == 2));
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse first operand. */
+ arg = list_head(node->args);
+ if (oprkind == 'r' || oprkind == 'b')
+ {
+ deparseExpr(buf, lfirst(arg), root);
+ appendStringInfoChar(buf, ' ');
+ }
+
+ /* Deparse fully qualified operator name. */
+ appendStringInfo(buf, "OPERATOR(%s.%s)", opnspname, opname);
+
+ /* Deparse last operand. */
+ arg = list_tail(node->args);
+ if (oprkind == 'l' || oprkind == 'b')
+ {
+ appendStringInfoChar(buf, ' ');
+ deparseExpr(buf, lfirst(arg), root);
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+static void
+deparseArrayRef(StringInfo buf,
+ ArrayRef *node,
+ PlannerInfo *root)
+{
+ ListCell *lowlist_item;
+ ListCell *uplist_item;
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse referenced array expression first. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->refexpr, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Deparse subscripts expression. */
+ lowlist_item = list_head(node->reflowerindexpr); /* could be NULL */
+ foreach(uplist_item, node->refupperindexpr)
+ {
+ appendStringInfoChar(buf, '[');
+ if (lowlist_item)
+ {
+ deparseExpr(buf, lfirst(lowlist_item), root);
+ appendStringInfoChar(buf, ':');
+ lowlist_item = lnext(lowlist_item);
+ }
+ deparseExpr(buf, lfirst(uplist_item), root);
+ appendStringInfoChar(buf, ']');
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+
+/*
+ * Deparse given ArrayExpr node into buf.
+ */
+static void
+deparseArrayExpr(StringInfo buf,
+ ArrayExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfo(buf, "ARRAY[");
+ foreach(lc, node->elements)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(lc), root);
+
+ first = false;
+ }
+ appendStringInfoChar(buf, ']');
+
+ /* If the array is empty, we need explicit cast to the array type. */
+ if (node->elements == NIL)
+ {
+ char *typname;
+
+ typname = format_type_with_typemod(node->array_typeid, -1);
+ appendStringInfo(buf, "::%s", typname);
+ }
+}
+
+/*
+ * Returns true if the given expr is safe to evaluate on the foreign server.
+ * If the result is true, the output parameter has_param tells whether the
+ * expression contains any Param node; this is useful to determine whether
+ * the expression can be used in remote EXPLAIN.
+ */
+static bool
+is_foreign_expr(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Expr *expr,
+ bool *has_param)
+{
+ foreign_executable_cxt context;
+ context.root = root;
+ context.foreignrel = baserel;
+ context.has_param = false;
+
+ /*
+ * An expression which includes any mutable function can't be pushed down
+ * because its result is not stable.  For example, pushing now() down to the
+ * remote side could produce confusing results due to clock offset.
+ * If we get routine mapping infrastructure in a future release, we will be
+ * able to choose the functions to push down at a finer granularity.
+ */
+ if (contain_mutable_functions((Node *) expr))
+ {
+ elog(DEBUG3, "expr has mutable function");
+ return false;
+ }
+
+ /*
+ * Check that the expression consists of nodes which are known as safe to
+ * be pushed down.
+ */
+ if (foreign_expr_walker((Node *) expr, &context))
+ return false;
+
+ /*
+ * Tell caller whether the given expression contains any Param node, which
+ * can't be used in EXPLAIN statement before executor starts.
+ */
+ *has_param = context.has_param;
+
+ return true;
+}
+
+/*
+ * Return true if the node includes any node which is not known to be safe
+ * to push down.
+ */
+static bool
+foreign_expr_walker(Node *node, foreign_executable_cxt *context)
+{
+ if (node == NULL)
+ return false;
+
+ /*
+ * Special case handling for List; expression_tree_walker handles List as
+ * well as other Expr nodes.  For instance, List is used in RestrictInfo
+ * for the args of a FuncExpr node.
+ *
+ * The comments of expression_tree_walker mention that RangeTblRef,
+ * FromExpr, JoinExpr, and SetOperationStmt are handled as well, but we
+ * don't care about them because they are not used in RestrictInfo.  If one
+ * of them is passed in, the default label catches it and gives up
+ * traversing.
+ */
+ if (IsA(node, List))
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *) node)
+ {
+ if (foreign_expr_walker(lfirst(lc), context))
+ return true;
+ }
+ return false;
+ }
+
+ /*
+ * If the return type of the given expression is not built-in, it can't be
+ * pushed down because it might have incompatible semantics on the remote side.
+ */
+ if (!is_builtin(exprType(node)))
+ {
+ elog(DEBUG3, "expr has user-defined type");
+ return true;
+ }
+
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ /*
+ * Using anyarray and/or anyenum in remote query is not supported.
+ */
+ if (((Const *) node)->consttype == ANYARRAYOID ||
+ ((Const *) node)->consttype == ANYNONARRAYOID)
+ {
+ elog(DEBUG3, "expr has anyarray or anyenum");
+ return true;
+ }
+ break;
+ case T_BoolExpr:
+ case T_NullTest:
+ case T_DistinctExpr:
+ case T_RelabelType:
+ /*
+ * These types of nodes are known to be safe to push down.  Of course
+ * the subtrees of the node, if any, are still checked recursively at
+ * the tail of this function.
+ */
+ break;
+ /*
+ * If the function used by the expression is not built-in, it can't be
+ * pushed down because it might have incompatible semantics on the
+ * remote side.
+ */
+ case T_FuncExpr:
+ {
+ FuncExpr *fe = (FuncExpr *) node;
+ if (!is_builtin(fe->funcid))
+ {
+ elog(DEBUG3, "expr has user-defined function");
+ return true;
+ }
+ }
+ break;
+ case T_Param:
+ /*
+ * Only external parameters can be pushed down.
+ */
+ {
+ if (((Param *) node)->paramkind != PARAM_EXTERN)
+ {
+ elog(DEBUG3, "expr has non-external parameter");
+ return true;
+ }
+
+ /* Mark that this expression contains Param node. */
+ context->has_param = true;
+ }
+ break;
+ case T_ScalarArrayOpExpr:
+ /*
+ * Only built-in operators can be pushed down. In addition,
+ * the underlying function must be built-in and immutable, but we don't
+ * check volatility here; such a check must already have been done by
+ * contain_mutable_functions.
+ */
+ {
+ ScalarArrayOpExpr *oe = (ScalarArrayOpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined scalar-array operator");
+ return true;
+ }
+
+ /*
+ * If the operator takes collatable types as operands, we push
+ * down only "=" and "<>", which are not affected by collation.
+ * Other operators might be collation-safe as well, but these two
+ * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has scalar-array operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_OpExpr:
+ /*
+ * Only built-in operators can be pushed down. In addition,
+ * the underlying function must be built-in and immutable, but we don't
+ * check volatility here; such a check must already have been done by
+ * contain_mutable_functions.
+ */
+ {
+ OpExpr *oe = (OpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined operator");
+ return true;
+ }
+
+ /*
+ * If the operator takes collatable types as operands, we push
+ * down only "=" and "<>", which are not affected by collation.
+ * Other operators might be collation-safe as well, but these two
+ * seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_Var:
+ /*
+ * A Var can be pushed down if it belongs to the foreign table.
+ * XXX Can a Var of another relation appear here?
+ */
+ {
+ Var *var = (Var *) node;
+ foreign_executable_cxt *f_context;
+
+ f_context = (foreign_executable_cxt *) context;
+ if (var->varno != f_context->foreignrel->relid ||
+ var->varlevelsup != 0)
+ {
+ elog(DEBUG3, "expr has var of other relation");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayRef:
+ /*
+ * ArrayRef which holds non-built-in typed elements can't be pushed
+ * down.
+ */
+ {
+ ArrayRef *ar = (ArrayRef *) node;
+
+ if (!is_builtin(ar->refelemtype))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+
+ /* Assignment should not be in restrictions. */
+ if (ar->refassgnexpr != NULL)
+ {
+ elog(DEBUG3, "expr has assignment");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayExpr:
+ /*
+ * ArrayExpr which holds non-built-in typed elements can't be pushed
+ * down.
+ */
+ {
+ if (!is_builtin(((ArrayExpr *) node)->element_typeid))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+ }
+ break;
+ default:
+ {
+ elog(DEBUG3, "expression is too complex: %s",
+ nodeToString(node));
+ return true;
+ }
+ break;
+ }
+
+ return expression_tree_walker(node, foreign_expr_walker, context);
+}
+
+/*
+ * Return true if the given object is a built-in object.
+ */
+static bool
+is_builtin(Oid oid)
+{
+ return (oid < FirstNormalObjectId);
+}
+
+/*
+ * Deparse a WHERE clause from the given list of RestrictInfos and append it
+ * to buf.  If is_first is false, we assume that buf already holds a SQL
+ * statement which ends with a valid WHERE clause, and the new conditions are
+ * connected to it with AND.
+ *
+ * is_first should be true only on the first call for a given statement.
+ */
+void
+appendWhereClause(StringInfo buf,
+ bool is_first,
+ List *exprs,
+ PlannerInfo *root)
+{
+ bool first = true;
+ ListCell *lc;
+
+ foreach(lc, exprs)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ /* Connect expressions with "AND" and parenthesize each condition. */
+ if (is_first && first)
+ appendStringInfo(buf, " WHERE ");
+ else
+ appendStringInfo(buf, " AND ");
+
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, ri->clause, root);
+ appendStringInfoChar(buf, ')');
+
+ first = false;
+ }
+}
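
As an illustration of how classifyConditions and is_foreign_expr split up a WHERE clause (a hypothetical example against ft1 from the regression test below):

-- WHERE c1 = 1 AND c1 = $1 AND c8 = 'foo' would be classified roughly as:
--   remote_conds: c1 = 1     (push-down safe, no Param; also usable in remote EXPLAIN)
--   param_conds : c1 = $1    (push-down safe, but contains an external Param)
--   local_conds : c8 = 'foo' (c8 is of a user-defined enum type, so it is checked locally)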
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
new file mode 100644
index 0000000..6d2af42
--- /dev/null
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -0,0 +1,721 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+-- Clean up in case a prior regression run failed
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+DROP ROLE IF EXISTS postgres_fdw_user;
+RESET client_min_messages;
+CREATE ROLE postgres_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgres_fdw_user';
+CREATE EXTENSION postgres_fdw;
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgres_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+ OPTIONS (dbname 'contrib_regression');
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgres_fdw_user SERVER loopback2;
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+-- ===================================================================
+-- tests for validator
+-- ===================================================================
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure option
+ALTER SERVER loopback1 OPTIONS (
+ use_remote_explain 'false',
+ fdw_startup_cost '123.456',
+ fdw_tuple_cost '0.123',
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+ List of foreign-data wrappers
+ Name | Owner | Handler | Validator | Access privileges | FDW Options | Description
+--------------+-------------------+----------------------+------------------------+-------------------+-------------+-------------
+ postgres_fdw | postgres_fdw_user | postgres_fdw_handler | postgres_fdw_validator | | |
+(1 row)
+
+\des+
+ List of foreign servers
+ Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW Options | Description
+-----------+-------------------+----------------------+-------------------+------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
+ loopback1 | postgres_fdw_user | postgres_fdw | | | | (use_remote_explain 'false', fdw_startup_cost '123.456', fdw_tuple_cost '0.123', authtype 'value', service 'value', connect_timeout 'value', dbname 'value', host 'value', hostaddr 'value', port 'value', tty 'value', options 'value', application_name 'value', keepalives 'value', keepalives_idle 'value', keepalives_interval 'value', sslcompression 'value', sslmode 'value', sslcert 'value', sslkey 'value', sslrootcert 'value', sslcrl 'value') |
+ loopback2 | postgres_fdw_user | postgres_fdw | | | | (dbname 'contrib_regression') |
+(2 rows)
+
+\deu+
+ List of user mappings
+ Server | User name | FDW Options
+-----------+-------------------+-------------
+ loopback1 | public |
+ loopback2 | postgres_fdw_user |
+(2 rows)
+
+\det+
+ List of foreign tables
+ Schema | Table | Server | FDW Options | Description
+--------+-------+-----------+--------------------------------+-------------
+ public | ft1 | loopback2 | (nspname 'S 1', relname 'T 1') |
+ public | ft2 | loopback2 | (nspname 'S 1', relname 'T 1') |
+(2 rows)
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- empty result
+SELECT * FROM ft1 WHERE false;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+----+----+----+----+----+----
+(0 rows)
+
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c7 >= '1'::bpchar)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 101)) AND (((c6)::text OPERATOR(pg_catalog.=) '1'::text))
+(3 rows)
+
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+ count
+-------
+ 1000
+(1 row)
+
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1
+-----
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+------+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1000 | 0 | 01000 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo
+(1 row)
+
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+ c1 | c2 | c3 | c4
+----+----+-------+------------------------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST
+(10 rows)
+
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+ ?column? | ?column?
+----------+----------
+ fixed |
+(1 row)
+
+-- user-defined operator/function
+CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_abs(t1.c2);
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 = postgres_fdw_abs(c2))
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c1 === c2)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) pg_catalog.abs(c2)))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) c2))
+(2 rows)
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 100)) AND ((c2 OPERATOR(pg_catalog.=) 0))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((pg_catalog.round(pg_catalog."numeric"(pg_catalog.abs("C 1")), 0) OPERATOR(pg_catalog.=) 1::numeric))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) (OPERATOR(pg_catalog.-) "C 1")))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((1::numeric OPERATOR(pg_catalog.=) (pg_catalog.int8("C 1") OPERATOR(pg_catalog.!))))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ANY (ARRAY[c2, 1, ("C 1" OPERATOR(pg_catalog.+) 0)])))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ((ARRAY["C 1", c2, 3])[1])))
+(2 rows)
+
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Filter: (c8 = 'foo'::user_enum)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(3 rows)
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 2))
+(5 rows)
+
+EXECUTE st1(1, 1);
+ c3 | c3
+-------+-------
+ 00001 | 00001
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ -> Foreign Scan on ft2 t2
+ Filter: (date_part('dow'::text, c4) = 6::double precision)
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10))
+(10 rows)
+
+EXECUTE st2(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ -> Foreign Scan on ft2 t2
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10)) AND ((pg_catalog.date_part('dow'::text, c5) OPERATOR(pg_catalog.=) 6::double precision))
+(9 rows)
+
+EXECUTE st3(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st3(20, 30);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 23 | 3 | 00023 | Sat Jan 24 00:00:00 1970 PST | Sat Jan 24 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(2 rows)
+
+EXPLAIN (COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on ft1 t1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) $1))
+(2 rows)
+
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+ f_test
+--------
+ 100
+(1 row)
+
+DROP FUNCTION f_test(int);
+-- ===================================================================
+-- cost estimation options
+-- ===================================================================
+ALTER SERVER loopback1 OPTIONS (SET use_remote_explain 'true');
+ALTER SERVER loopback1 OPTIONS (SET fdw_startup_cost '0');
+ALTER SERVER loopback1 OPTIONS (SET fdw_tuple_cost '0');
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(5 rows)
+
+ALTER SERVER loopback1 OPTIONS (DROP use_remote_explain);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_startup_cost);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_tuple_cost);
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgres_fdw_connections;
+ srvname | usename
+-----------+-------------------
+ loopback2 | postgres_fdw_user
+(1 row)
+
+SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_get_connections();
+ postgres_fdw_disconnect
+-------------------------
+ OK
+(1 row)
+
+SELECT srvname, usename FROM postgres_fdw_connections;
+ srvname | usename
+---------+---------
+(0 rows)
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ERROR: invalid input syntax for integer: "foo"
+CONTEXT: column c8 of foreign table ft1
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+(1 row)
+
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ERROR: could not execute remote query
+DETAIL: ERROR: division by zero
+
+HINT: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((1 OPERATOR(pg_catalog./) ("C 1" OPERATOR(pg_catalog.-) 1)) OPERATOR(pg_catalog.>) 0))
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+COMMIT;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgres_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table "S 1"."T 1"
+drop cascades to table "S 1"."T 2"
+DROP TYPE user_enum CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to foreign table ft2 column c8
+drop cascades to foreign table ft1 column c8
+DROP EXTENSION postgres_fdw CASCADE;
+NOTICE: drop cascades to 6 other objects
+DETAIL: drop cascades to server loopback1
+drop cascades to user mapping for public
+drop cascades to server loopback2
+drop cascades to user mapping for postgres_fdw_user
+drop cascades to foreign table ft1
+drop cascades to foreign table ft2
+\c
+DROP ROLE postgres_fdw_user;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
new file mode 100644
index 0000000..3c127dc
--- /dev/null
+++ b/contrib/postgres_fdw/option.c
@@ -0,0 +1,291 @@
+/*-------------------------------------------------------------------------
+ *
+ * option.c
+ * FDW option handling
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/option.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "libpq-fe.h"
+
+#include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "fmgr.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+
+#include "postgres_fdw.h"
+
+/*
+ * SQL functions
+ */
+extern Datum postgres_fdw_validator(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_validator);
+
+/*
+ * Describes the valid options for objects that this wrapper uses.
+ */
+typedef struct PostgresFdwOption
+{
+ const char *keyword;
+ Oid optcontext; /* Oid of catalog in which options may appear */
+ bool is_libpq_opt; /* true if it's used in libpq */
+} PostgresFdwOption;
+
+/*
+ * Valid options for postgres_fdw.
+ * Allocated and filled in InitPostgresFdwOptions.
+ */
+static PostgresFdwOption *postgres_fdw_options;
+
+/*
+ * Valid options of libpq.
+ * Allocated and filled in InitPostgresFdwOptions.
+ */
+static PQconninfoOption *libpq_options;
+
+/*
+ * Helper functions
+ */
+static bool is_valid_option(const char *keyword, Oid context);
+
+/*
+ * Validate the generic options given to a FOREIGN DATA WRAPPER, SERVER,
+ * USER MAPPING or FOREIGN TABLE that uses postgres_fdw.
+ *
+ * Raise an ERROR if the option or its value is considered invalid.
+ */
+Datum
+postgres_fdw_validator(PG_FUNCTION_ARGS)
+{
+ List *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
+ Oid catalog = PG_GETARG_OID(1);
+ ListCell *cell;
+
+ /*
+ * Check that only options supported by postgres_fdw, and allowed for the
+ * current object type, are given.
+ */
+ foreach(cell, options_list)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (!is_valid_option(def->defname, catalog))
+ {
+ PostgresFdwOption *opt;
+ StringInfoData buf;
+
+ /*
+ * Unknown option specified, complain about it. Provide a hint
+			 * with a list of valid options for the object.
+ */
+ initStringInfo(&buf);
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (catalog == opt->optcontext)
+ appendStringInfo(&buf, "%s%s", (buf.len > 0) ? ", " : "",
+ opt->keyword);
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
+ errmsg("invalid option \"%s\"", def->defname),
+ errhint("Valid options in this context are: %s",
+ buf.data)));
+ }
+
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ /* use_remote_explain accepts only boolean values */
+ (void) defGetBoolean(def);
+ }
+ else if (strcmp(def->defname, "fdw_startup_cost") == 0)
+ {
+ double val;
+ char *endp;
+ val = strtod(defGetString(def), &endp);
+ if (*endp || val < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("fdw_startup_cost requires a non-negative numeric value")));
+ }
+ else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+ {
+ double val;
+ char *endp;
+ val = strtod(defGetString(def), &endp);
+ if (*endp || val < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("fdw_tuple_cost requires a non-negative numeric value")));
+ }
+ }
+
+ /*
+	 * We don't check option-specific limitations here; they will be validated
+	 * at execution time.
+ */
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Initialize the option-checking mechanism. This must be called before any
+ * other function in option.c is used, so _PG_init is the proper place.
+ */
+void
+InitPostgresFdwOptions(void)
+{
+ int libpq_opt_num;
+ PQconninfoOption *lopt;
+ PostgresFdwOption *popt;
+	/* FDW-specific options that are not libpq connection options */
+ static const PostgresFdwOption non_libpq_options[] = {
+ { "nspname", ForeignTableRelationId, false} ,
+ { "relname", ForeignTableRelationId, false} ,
+ { "colname", AttributeRelationId, false} ,
+ /* use_remote_explain is available on both server and table */
+ { "use_remote_explain", ForeignServerRelationId, false} ,
+ { "use_remote_explain", ForeignTableRelationId, false} ,
+ /* cost factors */
+ { "fdw_startup_cost", ForeignServerRelationId, false} ,
+ { "fdw_tuple_cost", ForeignServerRelationId, false} ,
+ { NULL, InvalidOid, false },
+ };
+
+ /* Prevent redundant initialization. */
+ if (postgres_fdw_options)
+ return;
+
+ /*
+ * Get list of valid libpq options.
+ *
+ * To avoid unnecessary work, we get the list once and use it throughout
+ * the lifetime of this backend process. We don't need to care about
+ * memory context issues, because PQconndefaults allocates with malloc.
+ */
+ libpq_options = PQconndefaults();
+ if (!libpq_options) /* assume reason for failure is OOM */
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_OUT_OF_MEMORY),
+ errmsg("out of memory"),
+ errdetail("could not get libpq's default connection options")));
+
+	/* Count how many libpq options are available. */
+ libpq_opt_num = 0;
+ for (lopt = libpq_options; lopt->keyword; lopt++)
+ libpq_opt_num++;
+
+ /*
+ * Construct an array which consists of all valid options for postgres_fdw,
+ * by appending FDW-specific options to libpq options.
+ *
+ * We use plain malloc here to allocate postgres_fdw_options because it
+ * lives as long as the backend process does. Besides, keeping
+ * libpq_options in memory allows us to avoid copying every keyword string.
+ */
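+	/* sizeof(non_libpq_options) also covers the terminating NULL entry. */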
+ postgres_fdw_options = (PostgresFdwOption *)
+ malloc(sizeof(PostgresFdwOption) * libpq_opt_num +
+ sizeof(non_libpq_options));
+ if (postgres_fdw_options == NULL)
+ elog(ERROR, "out of memory");
+ popt = postgres_fdw_options;
+ for (lopt = libpq_options; lopt->keyword; lopt++)
+ {
+		/*
+		 * Skip options that postgres_fdw manages itself or that are not
+		 * appropriate for a foreign server connection.
+		 */
+ if (strcmp(lopt->keyword, "replication") == 0 ||
+ strcmp(lopt->keyword, "fallback_application_name") == 0 ||
+ strcmp(lopt->keyword, "client_encoding") == 0)
+ continue;
+
+ /* We don't have to copy keyword string, as described above. */
+ popt->keyword = lopt->keyword;
+
+		/* "user" and any secret options are allowed only on user mappings. */
+ if (strcmp(lopt->keyword, "user") == 0 || strchr(lopt->dispchar, '*'))
+ popt->optcontext = UserMappingRelationId;
+ else
+ popt->optcontext = ForeignServerRelationId;
+ popt->is_libpq_opt = true;
+
+ /* Advance the position where next option will be placed. */
+ popt++;
+ }
+
+ /* Append FDW-specific options. */
+ memcpy(popt, non_libpq_options, sizeof(non_libpq_options));
+}
+
+/*
+ * Check whether the given option is one of the valid postgres_fdw options.
+ * context is the Oid of the catalog holding the object the option is for.
+ */
+static bool
+is_valid_option(const char *keyword, Oid context)
+{
+ PostgresFdwOption *opt;
+
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (context == opt->optcontext && strcmp(opt->keyword, keyword) == 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Check whether the given option is one of the valid libpq options.
+ */
+static bool
+is_libpq_option(const char *keyword)
+{
+ PostgresFdwOption *opt;
+
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (opt->is_libpq_opt && strcmp(opt->keyword, keyword) == 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Build keyword/value arrays containing only the libpq options found in the
+ * given list, which may contain any kind of options.
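+ *
+ * The caller is responsible for providing keywords/values arrays with room
+ * for at least list_length(defelems) entries; the return value is the number
+ * of entries actually filled.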
+ */
+int
+ExtractConnectionOptions(List *defelems, const char **keywords,
+ const char **values)
+{
+ ListCell *lc;
+ int i;
+
+ i = 0;
+ foreach(lc, defelems)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (is_libpq_option(d->defname))
+ {
+ keywords[i] = d->defname;
+ values[i] = defGetString(d);
+ i++;
+ }
+ }
+ return i;
+}
+
diff --git a/contrib/postgres_fdw/postgres_fdw--1.0.sql b/contrib/postgres_fdw/postgres_fdw--1.0.sql
new file mode 100644
index 0000000..56b39b9
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw--1.0.sql
@@ -0,0 +1,39 @@
+/* contrib/postgres_fdw/postgres_fdw--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION postgres_fdw" to load this file. \quit
+
+CREATE FUNCTION postgres_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgres_fdw_validator(text[], oid)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER postgres_fdw
+ HANDLER postgres_fdw_handler
+ VALIDATOR postgres_fdw_validator;
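+
+-- Example (see the regression tests):
+--   CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+--     OPTIONS (dbname 'contrib_regression');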
+
+/* connection management functions and view */
+CREATE FUNCTION postgres_fdw_get_connections(out srvid oid, out usesysid oid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgres_fdw_disconnect(oid, oid)
+RETURNS text
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE VIEW postgres_fdw_connections AS
+SELECT c.srvid srvid,
+ s.srvname srvname,
+ c.usesysid usesysid,
+ pg_get_userbyid(c.usesysid) usename
+ FROM postgres_fdw_get_connections() c
+ JOIN pg_catalog.pg_foreign_server s ON (s.oid = c.srvid);
+GRANT SELECT ON postgres_fdw_connections TO public;
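+
+-- Example usage (from the regression tests): list and close cached connections
+--   SELECT srvname, usename FROM postgres_fdw_connections;
+--   SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_get_connections();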
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
new file mode 100644
index 0000000..dc57dd4
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -0,0 +1,1428 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgres_fdw.c
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/postgres_fdw.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "commands/explain.h"
+#include "commands/vacuum.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+#include "postgres_fdw.h"
+#include "connection.h"
+
+PG_MODULE_MAGIC;
+
+/* Default cost to establish a connection. */
+#define DEFAULT_FDW_STARTUP_COST 100.0
+
+/* Default cost to process 1 row, including data transfer. */
+#define DEFAULT_FDW_TUPLE_COST 0.001
+
+/*
+ * FDW-specific information for RelOptInfo.fdw_private. This is used to pass
+ * information from postgresGetForeignRelSize to postgresGetForeignPaths.
+ */
+typedef struct PostgresFdwPlanState {
+ /*
+ * These are generated in GetForeignRelSize, and also used in subsequent
+ * GetForeignPaths.
+ */
+ StringInfoData sql;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds;
+ List *param_conds;
+ List *local_conds;
+ int width; /* obtained by remote EXPLAIN */
+
+ /* Cached catalog information. */
+ ForeignTable *table;
+ ForeignServer *server;
+} PostgresFdwPlanState;
+
+/*
+ * Index of FDW-private information stored in fdw_private list.
+ *
+ * We store various pieces of information in ForeignScan.fdw_private to pass
+ * them across the boundary between planner and executor. Currently the list
+ * holds the items below:
+ *
+ * 1) plain SELECT statement
+ *
+ * These items are indexed with the enum FdwPrivateIndex, so an item can be
+ * accessed directly via list_nth(). For example, the text of the SELECT
+ * statement is obtained as:
+ * sql = list_nth(fdw_private, FdwPrivateSelectSql)
+ */
+enum FdwPrivateIndex {
+ /* SQL statements */
+ FdwPrivateSelectSql,
+
+ /* # of elements stored in the list fdw_private */
+ FdwPrivateNum,
+};
+
+/*
+ * Identifies the attribute being converted, for use in error reporting.
+ */
+typedef struct ErrorPos {
+ Oid relid; /* oid of the foreign table */
+ AttrNumber cur_attno; /* attribute number under process */
+} ErrorPos;
+
+/*
+ * Describes an execution state of a foreign scan against a foreign table
+ * using postgres_fdw.
+ */
+typedef struct PostgresFdwExecutionState
+{
+ List *fdw_private; /* FDW-private information */
+
+ /* for remote query execution */
+ PGconn *conn; /* connection for the scan */
+ Oid *param_types; /* type array of external parameter */
+ const char **param_values; /* value array of external parameter */
+
+ /* for tuple generation. */
+ AttrNumber attnum; /* # of non-dropped attribute */
+ Datum *values; /* column value buffer */
+ bool *nulls; /* column null indicator buffer */
+ AttInMetadata *attinmeta; /* attribute metadata */
+
+ /* for storing result tuples */
+ MemoryContext scan_cxt; /* context for per-scan lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+ Tuplestorestate *tuples; /* result of the scan */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PostgresFdwExecutionState;
+
+/*
+ * Describes a state of analyze request for a foreign table.
+ */
+typedef struct PostgresAnalyzeState
+{
+ /* for tuple generation. */
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+ Datum *values;
+ bool *nulls;
+
+ /* for random sampling */
+ HeapTuple *rows; /* result buffer */
+ int targrows; /* target # of sample rows */
+ int numrows; /* # of samples collected */
+ double samplerows; /* # of rows fetched */
+ double rowstoskip; /* # of rows skipped before next sample */
+ double rstate; /* random state */
+
+ /* for storing result tuples */
+ MemoryContext anl_cxt; /* context for per-analyze lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PostgresAnalyzeState;
+
+/*
+ * SQL functions
+ */
+extern Datum postgres_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_handler);
+
+/*
+ * FDW callback routines
+ */
+static void postgresGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static void postgresGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static ForeignScan *postgresGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses);
+static void postgresExplainForeignScan(ForeignScanState *node,
+ ExplainState *es);
+static void postgresBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *postgresIterateForeignScan(ForeignScanState *node);
+static void postgresReScanForeignScan(ForeignScanState *node);
+static void postgresEndForeignScan(ForeignScanState *node);
+static bool postgresAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages);
+
+/*
+ * Helper functions
+ */
+static void get_remote_estimate(const char *sql,
+ PGconn *conn,
+ double *rows,
+ int *width,
+ Cost *startup_cost,
+ Cost *total_cost);
+static void execute_query(ForeignScanState *node);
+static void query_row_processor(PGresult *res, ForeignScanState *node,
+ bool first);
+static void analyze_row_processor(PGresult *res, PostgresAnalyzeState *astate,
+ bool first);
+static void postgres_fdw_error_callback(void *arg);
+static int postgresAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows);
+
+/* Exported functions, but not written in postgres_fdw.h. */
+void _PG_init(void);
+void _PG_fini(void);
+
+/*
+ * Module-specific initialization.
+ */
+void
+_PG_init(void)
+{
+ InitPostgresFdwOptions();
+}
+
+/*
+ * Module-specific clean up.
+ */
+void
+_PG_fini(void)
+{
+}
+
+/*
+ * Foreign-data wrapper handler function: return a struct with pointers
+ * to my callback routines.
+ */
+Datum
+postgres_fdw_handler(PG_FUNCTION_ARGS)
+{
+ FdwRoutine *routine = makeNode(FdwRoutine);
+
+ /* Required handler functions. */
+ routine->GetForeignRelSize = postgresGetForeignRelSize;
+ routine->GetForeignPaths = postgresGetForeignPaths;
+ routine->GetForeignPlan = postgresGetForeignPlan;
+ routine->ExplainForeignScan = postgresExplainForeignScan;
+ routine->BeginForeignScan = postgresBeginForeignScan;
+ routine->IterateForeignScan = postgresIterateForeignScan;
+ routine->ReScanForeignScan = postgresReScanForeignScan;
+ routine->EndForeignScan = postgresEndForeignScan;
+
+ /* Optional handler functions. */
+ routine->AnalyzeForeignTable = postgresAnalyzeForeignTable;
+
+ PG_RETURN_POINTER(routine);
+}
+
+/*
+ * postgresGetForeignRelSize
+ * Estimate # of rows and width of the result of the scan
+ *
+ * We estimate the number of rows returned by the scan in two steps. In the
+ * first step, we execute a remote EXPLAIN command to obtain the number of
+ * rows returned by the remote side. In the second step, we calculate the
+ * selectivity of the filtering done on the local side and adjust the first
+ * estimate.
+ *
+ * We have to fetch some catalog objects and generate the remote query string
+ * here, so we store this expensive information in the FDW-private area of
+ * RelOptInfo and pass it to subsequent functions for reuse.
+ */
+static void
+postgresGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ bool use_remote_explain = false;
+ ListCell *lc;
+ PostgresFdwPlanState *fpstate;
+ StringInfo sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ Selectivity sel;
+ double rows;
+ int width;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds = NIL;
+ List *param_conds = NIL;
+ List *local_conds = NIL;
+
+ /*
+ * We use PostgresFdwPlanState to pass various information to subsequent
+ * functions.
+ */
+ fpstate = palloc0(sizeof(PostgresFdwPlanState));
+ initStringInfo(&fpstate->sql);
+ sql = &fpstate->sql;
+
+ /*
+ * Determine whether we use remote estimate or not. Note that per-table
+ * setting overrides per-server setting.
+ */
+ table = GetForeignTable(foreigntableid);
+ server = GetForeignServer(table->serverid);
+ foreach (lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ use_remote_explain = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach (lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ use_remote_explain = defGetBoolean(def);
+ break;
+ }
+ }
+
+ /*
+ * Construct remote query which consists of SELECT, FROM, and WHERE
+ * clauses. Conditions which contain any Param node are excluded because
+	 * placeholders can't be used in an EXPLAIN statement. Such conditions are
+ * appended later.
+ */
+ classifyConditions(root, baserel, &remote_conds, ¶m_conds,
+ &local_conds);
+ deparseSimpleSql(sql, root, baserel, local_conds);
+ if (list_length(remote_conds) > 0)
+ appendWhereClause(sql, true, remote_conds, root);
+ elog(DEBUG3, "Query SQL: %s", sql->data);
+
+ /*
+	 * If the table or the server is configured to use remote EXPLAIN, connect
+	 * to the foreign server and execute EXPLAIN with the conditions which
+	 * don't contain any parameter references. Otherwise, estimate rows in a
+	 * way similar to ordinary tables.
+ */
+ if (use_remote_explain)
+ {
+ UserMapping *user;
+ PGconn *conn;
+
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, false);
+ get_remote_estimate(sql->data, conn, &rows, &width,
+ &startup_cost, &total_cost);
+ ReleaseConnection(conn);
+
+ /*
+ * Estimate selectivity of conditions which are not used in remote
+ * EXPLAIN by calling clauselist_selectivity(). The best we can do for
+		 * parameterized conditions is to estimate selectivity on the basis of
+		 * local statistics. When we actually obtain result rows, such
+		 * conditions are deparsed into the remote query and reduce the number
+		 * of rows transferred.
+ */
+ sel = 1;
+ sel *= clauselist_selectivity(root, param_conds,
+ baserel->relid, JOIN_INNER, NULL);
+ sel *= clauselist_selectivity(root, local_conds,
+ baserel->relid, JOIN_INNER, NULL);
+
+ /* Report estimated numbers to planner. */
+ baserel->rows = rows * sel;
+ }
+ else
+ {
+ /*
+ * Estimate rows from the result of the last ANALYZE, and all
+ * conditions specified in original query.
+ */
+ set_baserel_size_estimates(root, baserel);
+
+		/* Save estimated width to pass it to subsequent functions */
+ width = baserel->width;
+ }
+
+ /*
+ * Finish deparsing remote query by adding conditions which are unavailable
+ * in remote EXPLAIN since they contain parameter references.
+ */
+ if (list_length(param_conds) > 0)
+ appendWhereClause(sql, !(list_length(remote_conds) > 0), param_conds,
+ root);
+
+ /*
+	 * Pack the obtained information into an object and store it in the
+	 * FDW-private area of RelOptInfo to pass it to subsequent functions.
+ */
+ fpstate->startup_cost = startup_cost;
+ fpstate->total_cost = total_cost;
+ fpstate->remote_conds = remote_conds;
+ fpstate->param_conds = param_conds;
+ fpstate->local_conds = local_conds;
+ fpstate->width = width;
+ fpstate->table = table;
+ fpstate->server = server;
+ baserel->fdw_private = (void *) fpstate;
+}
+
+/*
+ * postgresGetForeignPaths
+ * Create possible scan paths for a scan on the foreign table
+ */
+static void
+postgresGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ PostgresFdwPlanState *fpstate;
+ ForeignPath *path;
+ ListCell *lc;
+ double fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
+ double fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
+ Cost startup_cost;
+ Cost total_cost;
+ List *fdw_private;
+
+ /* Cache frequently accessed value */
+ fpstate = (PostgresFdwPlanState *) baserel->fdw_private;
+
+ /*
+	 * We have cost values which were estimated on the remote side, so adjust
+	 * them to account for the additional work needed to complete the scan,
+	 * such as sending the query, transferring the result, and local filtering.
+ */
+ startup_cost = fpstate->startup_cost;
+ total_cost = fpstate->total_cost;
+
+ /*
+ * Adjust costs with factors of the corresponding foreign server:
+ * - add cost to establish connection to both startup and total
+ * - add cost to manipulate on remote, and transfer result to total
+ * - add cost to manipulate tuples on local side to total
+ */
+ foreach(lc, fpstate->server->options)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (strcmp(d->defname, "fdw_startup_cost") == 0)
+ fdw_startup_cost = strtod(defGetString(d), NULL);
+ else if (strcmp(d->defname, "fdw_tuple_cost") == 0)
+ fdw_tuple_cost = strtod(defGetString(d), NULL);
+ }
+ startup_cost += fdw_startup_cost;
+ total_cost += fdw_startup_cost;
+ total_cost += fdw_tuple_cost * baserel->rows;
+ total_cost += cpu_tuple_cost * baserel->rows;
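+	/*
+	 * In effect (a sketch of the adjustment applied above):
+	 *		total_cost = remote total_cost + fdw_startup_cost
+	 *					 + (fdw_tuple_cost + cpu_tuple_cost) * estimated rows
+	 */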
+
+ /* Pass SQL statement from planner to executor through FDW private area. */
+ fdw_private = list_make1(makeString(fpstate->sql.data));
+
+ /*
+ * Create simplest ForeignScan path node and add it to baserel. This path
+ * corresponds to SeqScan path of regular tables.
+ */
+ path = create_foreignscan_path(root, baserel,
+ baserel->rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no outer rel either */
+ fdw_private);
+ add_path(baserel, (Path *) path);
+
+ /*
+ * XXX We can consider sorted path or parameterized path here if we know
+ * that foreign table is indexed on remote end. For this purpose, we
+ * might have to support FOREIGN INDEX to represent possible sets of sort
+ * keys and/or filtering.
+ */
+}
+
+/*
+ * postgresGetForeignPlan
+ * Create ForeignScan plan node which implements selected best path
+ */
+static ForeignScan *
+postgresGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses)
+{
+ PostgresFdwPlanState *fpstate;
+ Index scan_relid = baserel->relid;
+ List *fdw_private = NIL;
+ List *fdw_exprs = NIL;
+ List *local_exprs = NIL;
+ ListCell *lc;
+
+ /* Cache frequently accessed value */
+ fpstate = (PostgresFdwPlanState *) baserel->fdw_private;
+
+ /*
+	 * We need lists of Expr nodes rather than lists of RestrictInfo. Now we
+	 * can merge remote_conds and param_conds into fdw_exprs, because both are
+	 * evaluated on the remote side by the actual remote query.
+ */
+ foreach(lc, fpstate->remote_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->param_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->local_conds)
+ local_exprs = lappend(local_exprs,
+ ((RestrictInfo *) lfirst(lc))->clause);
+
+ /*
+	 * Make a list containing the SELECT statement text, to be passed to the
+	 * executor along with the plan node for later use.
+ */
+ fdw_private = lappend(fdw_private, makeString(fpstate->sql.data));
+
+ /*
+ * Create the ForeignScan node from target list, local filtering
+ * expressions, remote filtering expressions, and FDW private information.
+ *
+	 * We remove expressions which are evaluated on the remote side from the
+	 * qual of the scan node to avoid redundant filtering. Such filter
+	 * reduction can be done only here, after the best path has been chosen,
+	 * because baserestrictinfo in RelOptInfo is shared by all possible paths
+	 * until the best path is chosen.
+ */
+ return make_foreignscan(tlist,
+ local_exprs,
+ scan_relid,
+ fdw_exprs,
+ fdw_private);
+}
+
+/*
+ * postgresExplainForeignScan
+ * Produce extra output for EXPLAIN
+ */
+static void
+postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
+{
+ List *fdw_private;
+ char *sql;
+
+ fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+ sql = strVal(list_nth(fdw_private, FdwPrivateSelectSql));
+ ExplainPropertyText("Remote SQL", sql, es);
+}
+
+/*
+ * postgresBeginForeignScan
+ * Initiate access to a foreign PostgreSQL table.
+ */
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+ PostgresFdwExecutionState *festate;
+ PGconn *conn;
+ Oid relid;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+ /*
+ * Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
+ */
+ if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+ return;
+
+ /*
+ * Save state in node->fdw_state.
+ */
+ festate = (PostgresFdwExecutionState *)
+ palloc(sizeof(PostgresFdwExecutionState));
+ festate->fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+
+ /*
+ * Create contexts for per-scan tuplestore under per-query context.
+ */
+ festate->scan_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgres_fdw per-scan data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ festate->temp_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgres_fdw temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ /*
+	 * Get a connection to the foreign server. The connection manager will
+	 * establish a new connection if necessary.
+ */
+ relid = RelationGetRelid(node->ss.ss_currentRelation);
+ table = GetForeignTable(relid);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+ festate->conn = conn;
+
+ /* Result will be filled in first Iterate call. */
+ festate->tuples = NULL;
+
+ /* Allocate buffers for column values. */
+ {
+ TupleDesc tupdesc = slot->tts_tupleDescriptor;
+ festate->values = palloc(sizeof(Datum) * tupdesc->natts);
+ festate->nulls = palloc(sizeof(bool) * tupdesc->natts);
+ festate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ }
+
+ /*
+ * Allocate buffers for query parameters.
+ *
+	 * ParamListInfo might include entries for pseudo-parameters such as
+	 * PL/pgSQL's FOUND variable, but we don't worry about that here, because
+	 * the wasted space is small.
+ */
+ {
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+
+ if (numParams > 0)
+ {
+ festate->param_types = palloc0(sizeof(Oid) * numParams);
+ festate->param_values = palloc0(sizeof(char *) * numParams);
+ }
+ else
+ {
+ festate->param_types = NULL;
+ festate->param_values = NULL;
+ }
+ }
+
+ /* Remember which foreign table we are scanning. */
+ festate->errpos.relid = relid;
+
+ /* Store FDW-specific state into ForeignScanState */
+ node->fdw_state = (void *) festate;
+
+ return;
+}
+
+/*
+ * postgresIterateForeignScan
+ * Retrieve next row from the result set, or clear tuple slot to indicate
+ * EOF.
+ *
+ * Note that we retrieve tuples from the tuplestore in the per-scan context so
+ * that returned tuples survive until the next iteration, since each tuple is
+ * released implicitly via ExecClearTuple. If we retrieved a tuple from the
+ * tuplestore in CurrentMemoryContext (a per-tuple context), ExecClearTuple
+ * would end up freeing a dangling pointer.
+ */
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+ /*
+ * If this is the first call after Begin or ReScan, we need to execute
+ * remote query and get result set.
+ */
+ if (festate->tuples == NULL)
+ execute_query(node);
+
+ /*
+ * If tuples are still left in tuplestore, just return next tuple from it.
+ *
+	 * It is necessary to switch to the per-scan context to keep the returned
+	 * tuple valid until the next IterateForeignScan call, because it will be
+	 * released with ExecClearTuple then. Otherwise, the retrieved tuple would
+	 * be allocated in a per-tuple context, and a double free might happen.
+ *
+ * If we don't have any result in tuplestore, clear result slot to tell
+ * executor that this scan is over.
+ */
+ MemoryContextSwitchTo(festate->scan_cxt);
+ tuplestore_gettupleslot(festate->tuples, true, false, slot);
+ MemoryContextSwitchTo(oldcontext);
+
+ return slot;
+}
+
+/*
+ * postgresReScanForeignScan
+ *		Restart the scan from the beginning by rewinding the stored result set.
+ */
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+	/* If we don't have a valid result yet, there is nothing to do. */
+ if (festate->tuples == NULL)
+ return;
+
+ /*
+	 * Just rewinding the current result set is enough.
+ */
+ tuplestore_rescan(festate->tuples);
+}
+
+/*
+ * postgresEndForeignScan
+ * Finish scanning foreign table and dispose objects used for this scan
+ */
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+ /* if festate is NULL, we are in EXPLAIN; nothing to do */
+ if (festate == NULL)
+ return;
+
+ /*
+	 * The connection used for this scan must remain valid until the end of
+	 * the scan, so that the lifespan of the remote transaction matches that
+	 * of the local query.
+ */
+ ReleaseConnection(festate->conn);
+ festate->conn = NULL;
+
+ /* Discard fetch results */
+ if (festate->tuples != NULL)
+ {
+ tuplestore_end(festate->tuples);
+ festate->tuples = NULL;
+ }
+
+ /* MemoryContext will be deleted automatically. */
+}
+
+/*
+ * Estimate costs of executing given SQL statement.
+ */
+static void
+get_remote_estimate(const char *sql, PGconn *conn,
+ double *rows, int *width,
+ Cost *startup_cost, Cost *total_cost)
+{
+ PGresult *volatile res = NULL;
+ StringInfoData buf;
+ char *plan;
+ char *p;
+ int n;
+
+ /*
+ * Construct EXPLAIN statement with given SQL statement.
+ */
+ initStringInfo(&buf);
+ appendStringInfo(&buf, "EXPLAIN %s", sql);
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ res = PQexec(conn, buf.data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
+ ereport(ERROR,
+ (errmsg("could not execute EXPLAIN for cost estimation"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /*
+		 * Find the cost estimates in the line for the top plan node. We search
+		 * for the opening parenthesis from the end of the line to avoid
+		 * matching unexpected parentheses earlier in the line.
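+		 *
+		 * For example (a hypothetical plan line, not taken from this test):
+		 *		Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)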
+ */
+ plan = PQgetvalue(res, 0, 0);
+ p = strrchr(plan, '(');
+ if (p == NULL)
+ elog(ERROR, "wrong EXPLAIN output: %s", plan);
+ n = sscanf(p,
+ "(cost=%lf..%lf rows=%lf width=%d)",
+ startup_cost, total_cost, rows, width);
+ if (n != 4)
+ elog(ERROR, "could not get estimation from EXPLAIN output");
+
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Execute remote query with current parameters.
+ */
+static void
+execute_query(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+ Oid *types = NULL;
+ const char **values = NULL;
+ char *sql;
+ PGconn *conn;
+ PGresult *volatile res = NULL;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+ types = festate->param_types;
+ values = festate->param_values;
+
+ /*
+ * Construct parameter array in text format. We don't release memory for
+ * the arrays explicitly, because the memory usage would not be very large,
+ * and anyway they will be released in context cleanup.
+ *
+	 * If this query is invoked from a PL/pgSQL function, we may have an extra
+	 * entry for the dummy variable FOUND in ParamListInfo, so we need to check
+	 * the type OID to exclude it from the remote parameters.
+ */
+ if (numParams > 0)
+ {
+ int i;
+
+ for (i = 0; i < numParams; i++)
+ {
+ ParamExternData *prm = ¶ms->params[i];
+
+ /* give hook a chance in case parameter is dynamic */
+ if (!OidIsValid(prm->ptype) && params->paramFetch != NULL)
+ params->paramFetch(params, i + 1);
+
+ /*
+ * Get string representation of each parameter value by invoking
+ * type-specific output function unless the value is null or it's
+ * not used in the query.
+ */
+ types[i] = prm->ptype;
+ if (!prm->isnull && OidIsValid(types[i]))
+ {
+ Oid out_func_oid;
+ bool isvarlena;
+ FmgrInfo func;
+
+ getTypeOutputInfo(types[i], &out_func_oid, &isvarlena);
+ fmgr_info(out_func_oid, &func);
+ values[i] = OutputFunctionCall(&func, prm->value);
+ }
+ else
+ values[i] = NULL;
+
+ /*
+			 * We use type "text" (an arbitrary but flexible choice) for unused
+			 * (and type-unknown) parameters. We can't remove entries for
+			 * unused parameters from the arrays, because parameter references
+			 * in the remote query ($n) are numbered based on the full-length
+			 * parameter list.
+ */
+ if (!OidIsValid(types[i]))
+ types[i] = TEXTOID;
+ }
+ }
+
+ conn = festate->conn;
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /*
+ * Execute remote query with parameters, and retrieve results with
+ * single-row-mode which returns results row by row.
+ */
+ sql = strVal(list_nth(festate->fdw_private, FdwPrivateSelectSql));
+ if (!PQsendQueryParams(conn, sql, numParams, types, values, NULL, NULL,
+ 0))
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /* Retrieve result rows one by one, and store them into tuplestore. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Store the result row into tuplestore */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+				 * A PGresult with status PGRES_TUPLES_OK means EOF, so we
+				 * still need to initialize the tuplestore if we have not
+				 * retrieved any tuple yet.
+ */
+ if (first)
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ }
+ }
+
+ /*
+		 * We can't tell from within the row processor whether the scan is
+		 * over, so mark the result as complete here.
+ */
+ tuplestore_donestoring(festate->tuples);
+
+ /* Discard result of SELECT statement. */
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ /* propagate error */
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Create tuples from PGresult and store them into tuplestore.
+ *
+ * The caller must use a PG_TRY block to catch exceptions and ensure that the
+ * PGresult is released.
+ */
+static void
+query_row_processor(PGresult *res, ForeignScanState *node, bool first)
+{
+ int i;
+ int j;
+ int attnum; /* number of non-dropped columns */
+ TupleTableSlot *slot;
+ TupleDesc tupdesc;
+ Form_pg_attribute *attrs;
+ PostgresFdwExecutionState *festate;
+ AttInMetadata *attinmeta;
+ HeapTuple tuple;
+ ErrorContextCallback errcallback;
+ MemoryContext oldcontext;
+
+ /* Cache frequently used values */
+ slot = node->ss.ss_ScanTupleSlot;
+ tupdesc = slot->tts_tupleDescriptor;
+ attrs = tupdesc->attrs;
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+ attinmeta = festate->attinmeta;
+
+ if (first)
+ {
+ int nfields = PQnfields(res);
+
+ /* count non-dropped columns */
+ for (attnum = 0, i = 0; i < tupdesc->natts; i++)
+ if (!attrs[i]->attisdropped)
+ attnum++;
+
+ /* check result and tuple descriptor have the same number of columns */
+ if (attnum > 0 && attnum != nfields)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("remote query result rowtype does not match "
+ "the specified FROM clause rowtype"),
+ errdetail("expected %d, actual %d", attnum, nfields)));
+
+ /* First, ensure that the tuplestore is empty. */
+ if (festate->tuples == NULL)
+ {
+
+ /*
+ * Create tuplestore to store result of the query in per-query
+ * context. Note that we use this memory context to avoid memory
+ * leak in error cases.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->scan_cxt);
+ festate->tuples = tuplestore_begin_heap(false, false, work_mem);
+ MemoryContextSwitchTo(oldcontext);
+ }
+ else
+ {
+ /* Clear old result just in case. */
+ tuplestore_clear(festate->tuples);
+ }
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->temp_cxt);
+
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ /* skip dropped columns. */
+ if (attrs[i]->attisdropped)
+ {
+ festate->nulls[i] = true;
+ continue;
+ }
+
+ /*
+ * Set NULL indicator, and convert text representation to internal
+ * representation if any.
+ */
+ if (PQgetisnull(res, 0, j))
+ festate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ festate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ festate->errpos.cur_attno = i + 1;
+ errcallback.callback = postgres_fdw_error_callback;
+ errcallback.arg = (void *) &festate->errpos;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ festate->values[i] = value;
+
+ /* Uninstall error context callback. */
+ error_context_stack = errcallback.previous;
+ }
+ j++;
+ }
+
+ /*
+ * Build the tuple and put it into the slot.
+ * We don't have to free the tuple explicitly because it's been
+ * allocated in the per-tuple context.
+ */
+ tuple = heap_form_tuple(tupdesc, festate->values, festate->nulls);
+ tuplestore_puttuple(festate->tuples, tuple);
+
+ /* Clean up */
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(festate->temp_cxt);
+
+ return;
+}
+
+/*
+ * Callback function which is called when an error occurs during column value
+ * conversion. Reports the names of the column and the relation.
+ */
+static void
+postgres_fdw_error_callback(void *arg)
+{
+ ErrorPos *errpos = (ErrorPos *) arg;
+ const char *relname;
+ const char *colname;
+
+ relname = get_rel_name(errpos->relid);
+ colname = get_attname(errpos->relid, errpos->cur_attno);
+ errcontext("column %s of foreign table %s",
+ quote_identifier(colname), quote_identifier(relname));
+}
+
+/*
+ * postgresAnalyzeForeignTable
+ * Test whether analyzing this foreign table is supported
+ */
+static bool
+postgresAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages)
+{
+ *totalpages = 0;
+ *func = postgresAcquireSampleRowsFunc;
+
+ return true;
+}
+
+/*
+ * Acquire a random sample of rows from foreign table managed by postgres_fdw.
+ *
+ * postgres_fdw doesn't provide direct access to remote buffer, so we execute
+ * simple SELECT statement which retrieves whole rows from remote side, and
+ * pick some samples from them.
+ */
+static int
+postgresAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows)
+{
+ PostgresAnalyzeState astate;
+ StringInfoData sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ PGconn *conn = NULL;
+ PGresult *volatile res = NULL;
+
+ /*
+	 * Only a small amount of information is needed as input to the row
+	 * processor. The rest of the initialization is done at the first row
+	 * processor call.
+ */
+ astate.anl_cxt = CurrentMemoryContext;
+ astate.temp_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "postgres_fdw analyze temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ astate.rows = rows;
+ astate.targrows = targrows;
+ astate.tupdesc = relation->rd_att;
+ astate.errpos.relid = relation->rd_id;
+
+ /*
+	 * Construct a SELECT statement which retrieves all rows from the remote
+	 * side. We can't avoid a sequential scan on the remote side to get useful
+	 * statistics, so this seems a reasonable compromise.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeSql(&sql, relation);
+ elog(DEBUG3, "Analyze SQL: %s", sql.data);
+
+ table = GetForeignTable(relation->rd_id);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+
+ /*
+ * Acquire sample rows from the result set.
+ */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /* Execute remote query and retrieve results row by row. */
+ if (!PQsendQuery(conn, sql.data))
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+
+ /* Retrieve result rows one by one, and store them into tuplestore. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Store the result row into tuplestore */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ analyze_row_processor(res, &astate, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+				 * A PGresult with status PGRES_TUPLES_OK means EOF, so we
+				 * still need to initialize the sampling state if we have not
+				 * retrieved any tuple yet.
+ */
+				if (first)
+ analyze_row_processor(res, &astate, first);
+
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+	/* We assume that there are no dead tuples. */
+ *totaldeadrows = 0.0;
+
+	/* We've retrieved all live tuples from the foreign server. */
+ *totalrows = astate.samplerows;
+
+ /*
+	 * We don't update pg_class.relpages because we don't use it for planning
+	 * at all.
+ */
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned with \"%s\", "
+ "containing %.0f live rows and %.0f dead rows; "
+ "%d rows in sample, %.0f estimated total rows",
+ RelationGetRelationName(relation), sql.data,
+ astate.samplerows, 0.0,
+ astate.numrows, astate.samplerows)));
+
+ return astate.numrows;
+}
+
+/*
+ * Custom row processor for acquire_sample_rows.
+ *
+ * Collect sample rows from the query result.
+ * - Use every tuple as a sample until the target number of samples has been
+ *   collected.
+ * - Once the target is reached, skip some tuples and replace already-sampled
+ *   tuples at random.
+ */
+static void
+analyze_row_processor(PGresult *res, PostgresAnalyzeState *astate, bool first)
+{
+ int targrows = astate->targrows;
+ TupleDesc tupdesc = astate->tupdesc;
+ int i;
+ int j;
+ int pos; /* position where next sample should be stored. */
+ HeapTuple tuple;
+ ErrorContextCallback errcallback;
+ MemoryContext callercontext;
+
+ if (first)
+ {
+ /* Prepare for sampling rows */
+ astate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ astate->values = (Datum *) palloc(sizeof(Datum) * tupdesc->natts);
+ astate->nulls = (bool *) palloc(sizeof(bool) * tupdesc->natts);
+		astate->numrows = 0;
+		astate->samplerows = 0;
+		astate->rowstoskip = -1;
+ astate->rstate = anl_init_selection_state(astate->targrows);
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ callercontext = MemoryContextSwitchTo(astate->temp_cxt);
+
+ /*
+	 * The first targrows rows are always taken as samples. If there are more
+	 * source rows, we skip some of them and replace already-sampled tuples at
+	 * random.
+	 *
+	 * Here we just determine the slot where the next sample should be stored;
+	 * pos is set to a negative value to indicate that the row should be
+	 * skipped.
+ */
+ if (astate->numrows < targrows)
+ pos = astate->numrows++;
+ else
+ {
+ /*
+ * The first targrows sample rows are simply copied into
+ * the reservoir. Then we start replacing tuples in the
+ * sample until we reach the end of the relation. This
+		 * algorithm is from Jeff Vitter's paper, similar to
+		 * acquire_sample_rows in analyze.c.
+		 *
+		 * We don't have block-level access to the remote data, so every
+		 * row in the PGresult is a candidate sample.
+ */
+ if (astate->rowstoskip < 0)
+ astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
+ &astate->rstate);
+
+ if (astate->rowstoskip <= 0)
+ {
+ int k = (int) (targrows * anl_random_fract());
+
+ Assert(k >= 0 && k < targrows);
+
+ /*
+				 * Free the previously sampled tuple at a random position;
+				 * the replacement tuple is built below.
+ */
+ heap_freetuple(astate->rows[k]);
+ pos = k;
+ }
+ else
+ pos = -1;
+
+ astate->rowstoskip -= 1;
+ }
+
+ /* Always increment sample row counter. */
+ astate->samplerows += 1;
+
+ if (pos >= 0)
+ {
+ AttInMetadata *attinmeta = astate->attinmeta;
+
+ /*
+		 * Create a sample tuple from the current result row, and store it at
+		 * the position determined above. Note that i and j index the tuple
+		 * descriptor and the result columns, respectively.
+ */
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ if (PQgetisnull(res, 0, j))
+ astate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ astate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ astate->errpos.cur_attno = i + 1;
+ errcallback.callback = postgres_fdw_error_callback;
+ errcallback.arg = (void *) &astate->errpos;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ astate->values[i] = value;
+
+ /* Uninstall error callback function. */
+ error_context_stack = errcallback.previous;
+ }
+ j++;
+ }
+
+ /*
+		 * Generate a tuple from the result row data, and store it into the
+		 * given buffer. Note that we need to allocate the tuple in the
+		 * analyze context so that it remains valid after the temporary
+		 * per-tuple context has been reset.
+ */
+ MemoryContextSwitchTo(astate->anl_cxt);
+ tuple = heap_form_tuple(tupdesc, astate->values, astate->nulls);
+ MemoryContextSwitchTo(astate->temp_cxt);
+ astate->rows[pos] = tuple;
+ }
+
+ /* Clean up */
+ MemoryContextSwitchTo(callercontext);
+ MemoryContextReset(astate->temp_cxt);
+
+ return;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.control b/contrib/postgres_fdw/postgres_fdw.control
new file mode 100644
index 0000000..f9ed490
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.control
@@ -0,0 +1,5 @@
+# postgres_fdw extension
+comment = 'foreign-data wrapper for remote PostgreSQL servers'
+default_version = '1.0'
+module_pathname = '$libdir/postgres_fdw'
+relocatable = true
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
new file mode 100644
index 0000000..b5cefb8
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -0,0 +1,45 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgres_fdw.h
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/postgres_fdw.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef POSTGRESQL_FDW_H
+#define POSTGRESQL_FDW_H
+
+#include "postgres.h"
+#include "foreign/foreign.h"
+#include "nodes/relation.h"
+#include "utils/relcache.h"
+
+/* in option.c */
+void InitPostgresFdwOptions(void);
+int ExtractConnectionOptions(List *defelems,
+ const char **keywords,
+ const char **values);
+int GetFetchCountOption(ForeignTable *table, ForeignServer *server);
+
+/* in deparse.c */
+void deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds);
+void appendWhereClause(StringInfo buf,
+ bool has_where,
+ List *exprs,
+ PlannerInfo *root);
+void classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds);
+void deparseAnalyzeSql(StringInfo buf, Relation rel);
+
+#endif /* POSTGRESQL_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
new file mode 100644
index 0000000..66439ba
--- /dev/null
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -0,0 +1,312 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+
+-- Clean up in case a prior regression run failed
+
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+
+DROP ROLE IF EXISTS postgres_fdw_user;
+
+RESET client_min_messages;
+
+CREATE ROLE postgres_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgres_fdw_user';
+
+CREATE EXTENSION postgres_fdw;
+
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgres_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+ OPTIONS (dbname 'contrib_regression');
+
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgres_fdw_user SERVER loopback2;
+
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+
+-- ===================================================================
+-- tests for validator
+-- ===================================================================
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure options
+ALTER SERVER loopback1 OPTIONS (
+ use_remote_explain 'false',
+ fdw_startup_cost '123.456',
+ fdw_tuple_cost '0.123',
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+\des+
+\deu+
+\det+
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- empty result
+SELECT * FROM ft1 WHERE false;
+-- with WHERE clause
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- user-defined operator/function
+CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+EXPLAIN (COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (COSTS false) EXECUTE st1(1, 2);
+EXECUTE st1(1, 1);
+EXECUTE st1(101, 101);
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st2(10, 20);
+EXECUTE st2(10, 20);
+EXECUTE st1(101, 101);
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (COSTS false) EXECUTE st3(10, 20);
+EXECUTE st3(10, 20);
+EXECUTE st3(20, 30);
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+EXPLAIN (COSTS false) EXECUTE st4(1);
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+DROP FUNCTION f_test(int);
+
+-- ===================================================================
+-- cost estimation options
+-- ===================================================================
+ALTER SERVER loopback1 OPTIONS (SET use_remote_explain 'true');
+ALTER SERVER loopback1 OPTIONS (SET fdw_startup_cost '0');
+ALTER SERVER loopback1 OPTIONS (SET fdw_tuple_cost '0');
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ALTER SERVER loopback1 OPTIONS (DROP use_remote_explain);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_startup_cost);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_tuple_cost);
+
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgres_fdw_connections;
+SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_get_connections();
+SELECT srvname, usename FROM postgres_fdw_connections;
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;
+
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+FETCH c;
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+FETCH c;
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+COMMIT;
+SELECT srvname FROM postgres_fdw_connections;
+ERROR OUT; -- ERROR
+SELECT srvname FROM postgres_fdw_connections;
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgres_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+DROP TYPE user_enum CASCADE;
+DROP EXTENSION postgres_fdw CASCADE;
+\c
+DROP ROLE postgres_fdw_user;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 6b13a0a..39e9827 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -132,6 +132,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
&pgstatstatements;
&pgstattuple;
&pgtrgm;
+ &postgres-fdw;
&seg;
&sepgsql;
&contrib-spi;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db4cc3a..354111a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
<!ENTITY pgtrgm SYSTEM "pgtrgm.sgml">
<!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
+<!ENTITY postgres-fdw SYSTEM "postgres-fdw.sgml">
<!ENTITY seg SYSTEM "seg.sgml">
<!ENTITY contrib-spi SYSTEM "contrib-spi.sgml">
<!ENTITY sepgsql SYSTEM "sepgsql.sgml">
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
new file mode 100644
index 0000000..1f00665
--- /dev/null
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -0,0 +1,434 @@
+<!-- doc/src/sgml/postgres-fdw.sgml -->
+
+<sect1 id="postgres-fdw" xreflabel="postgres_fdw">
+ <title>postgres_fdw</title>
+
+ <indexterm zone="postgres-fdw">
+ <primary>postgres_fdw</primary>
+ </indexterm>
+
+ <para>
+ The <filename>postgres_fdw</filename> module provides a foreign-data
+ wrapper for external <productname>PostgreSQL</productname> servers.
+ With this module, users can access data stored on external
+ <productname>PostgreSQL</productname> servers via plain SQL statements.
+ </para>
+
+ <para>
+ The foreign-data wrapper <literal>postgres_fdw</literal> is created
+ automatically by the <command>CREATE EXTENSION</command> command for
+ <application>postgres_fdw</application>, so the remaining steps needed
+ before you can execute queries are (a combined example follows the list):
+ <orderedlist spacing="compact">
+ <listitem>
+ <para>
+ Create a foreign server with the <command>CREATE SERVER</command> command
+ for each remote database you want to connect to. Specify all connection
+ information except <literal>user</literal> and <literal>password</literal>
+ on it.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Create a user mapping with the
+ <command>CREATE USER MAPPING</command> command for each user you want to
+ allow to access the foreign server. Specify
+ <literal>user</literal> and <literal>password</literal> on it.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Create a foreign table with the <command>CREATE FOREIGN TABLE</command>
+ command for each remote relation you want to access. If you want to use
+ names different from the remote ones, specify object name options (see below).
+ </para>
+ <para>
+ It is recommended to use the same data types as those of the remote columns,
+ though the libpq text protocol allows flexible conversion between similar
+ data types.
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
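
For example, a minimal setup following these steps might look like this (the
server, user, database, and table names here are only illustrative):

CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_srv FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.com', port '5432', dbname 'salesdb');

CREATE USER MAPPING FOR local_user SERVER remote_srv
  OPTIONS (user 'remote_user', password 'secret');

CREATE FOREIGN TABLE orders (id int, note text) SERVER remote_srv;
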
+
+ <sect2>
+ <title>FDW Options of postgres_fdw</title>
+
+ <sect3>
+ <title>Connection Options</title>
+ <para>
+ A foreign server and a user mapping created using this wrapper can have
+ any <application>libpq</> connection options, except the following:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ client_encoding (automatically determined from the local server encoding)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ fallback_application_name (fixed to <literal>postgres_fdw</literal>)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ replication (never used for foreign-data wrapper connection)
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ For details of <application>libpq</> connection options, see
+ <xref linkend="libpq-connect">.
+ </para>
+ <para>
+ <literal>user</literal> and <literal>password</literal> can be
+ specified on user mappings; all other options can be specified on foreign servers.
+ </para>
+ <para>
+ Note that only superusers may connect to foreign servers without password
+ authentication, so always specify the <literal>password</literal> FDW option
+ on the corresponding user mappings for non-superusers.
+ </para>
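
For instance, a generic libpq option such as connect_timeout would go on the
server, while the credentials go on the user mapping (object names here are
illustrative):

ALTER SERVER remote_srv OPTIONS (ADD connect_timeout '5');
ALTER USER MAPPING FOR app_user SERVER remote_srv
  OPTIONS (ADD user 'remote_user', ADD password 'secret');
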
+ </sect3>
+
+ <sect3>
+ <title>Object Name Options</title>
+ <para>
+ Foreign tables created using this wrapper, and their columns, can have
+ object name options. These options specify the names used in the SQL
+ statements sent to the remote <productname>PostgreSQL</productname>
+ server, and are useful when remote objects have names different from those
+ of the corresponding local ones (see the example after the list).
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>nspname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+ namespace (schema) reference in the SQL statement. If this option is
+ omitted, <literal>pg_namespace.nspname</literal> of the foreign table's
+ schema is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>relname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+ relation (table) reference in the SQL statement. If this option is
+ omitted, <literal>pg_class.relname</literal> of the foreign table is
+ used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>colname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a column of a foreign table, is
+ used as a column (attribute) reference in the SQL statement. If this
+ option is omitted, <literal>pg_attribute.attname</literal> of the column
+ of the foreign table is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </sect3>
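
For instance, with the naming used in the regression test, a local foreign
table ft1 with column c1 can be pointed at the remote table "T 1" in schema
"S 1", whose column is named "C 1":

ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
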
+
+ <sect3>
+ <title>Cost Estimation Options</title>
+ <para>
+ <application>postgres_fdw</> retrieves foreign data by executing queries
+ against foreign servers, so a foreign scan usually costs more than a scan
+ of a local table. To reflect the varying circumstances of foreign servers,
+ <application>postgres_fdw</> provides some options (see the example after
+ the list):
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>use_remote_estimate</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table or a foreign
+ server, controls how <application>postgres_fdw</> estimates the number
+ and width of result rows. If it is set to
+ <literal>true</literal>, a remote <command>EXPLAIN</command> is
+ executed early in planning. This gives better estimates
+ of rows and width, but also introduces some overhead. This option
+ defaults to <literal>false</literal>.
+ </para>
+ <para>
+ <application>postgres_fdw</> supports gathering statistics about
+ foreign data from foreign servers and storing them locally via
+ <command>ANALYZE</command>, so reasonable row and width estimates can be
+ derived from them. However, if the target foreign table is
+ updated frequently, the local statistics soon become stale.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>fdw_startup_cost</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign server, is added as
+ extra startup cost for each foreign scan. If the planner overestimates or
+ underestimates the startup cost of foreign scans, change this value to
+ reflect the actual overhead.
+ </para>
+ <para>
+ Defaults to <literal>100</literal>. The default value is somewhat
+ arbitrary, but it should be enough to make most foreign scans cost more
+ than local scans, even when the foreign scan returns nothing.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>fdw_tuple_cost</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign server, is an
+ additional cost charged per tuple, reflecting the overhead of tuple
+ manipulation and of transfer between the servers. If a foreign server is
+ particularly near or far on the network, or has different performance
+ characteristics, use this option to tell the planner about it.
+ </para>
+ <para>
+ Defaults to <literal>0.01</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </sect3>
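
For example, for a foreign server on a fast local network one might lower
both surcharges (the values below are purely illustrative):

ALTER SERVER remote_srv OPTIONS (fdw_startup_cost '25', fdw_tuple_cost '0.002');
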
+
+ </sect2>
+
+ <sect2>
+ <title>Connection Management</title>
+
+ <para>
+ <application>postgres_fdw</application> establishes a connection to a
+ foreign server at the beginning of the first query that uses a foreign
+ table associated with that server, and reuses the connection for following
+ queries and even for other foreign scans in the same query.
+
+ You can see the list of active connections in the
+ <structname>postgres_fdw_connections</structname> view. It shows the OID
+ and the name of the server and of the local role for each active connection
+ established by <application>postgres_fdw</application>. For security
+ reasons, only superusers can see other roles' connections.
+ </para>
+
+ <para>
+ Established connections are kept alive until the local role changes, the
+ current transaction aborts, or the user explicitly requests disconnection.
+ </para>
+
+ <para>
+ If the role has been changed, active connections established as the old
+ local role are kept alive but are not reused until the local role is
+ restored to the original one. This kind of situation happens with
+ <command>SET ROLE</command> and <command>SET SESSION AUTHORIZATION</command>.
+ </para>
+
+ <para>
+ If the current transaction is aborted by an error or a user request, all
+ active connections are closed automatically. This behavior avoids possible
+ connection leaks on error.
+ </para>
+
+ <para>
+ You can discard a persistent connection at any time with
+ <function>postgres_fdw_disconnect()</function>. It takes the server OID and
+ the user OID as arguments. This function can handle only connections
+ established in the current session; connections established by other
+ backends are not reachable.
+ </para>
+
+ <para>
+ You can discard all active and visible connections in the current session
+ by using <structname>postgres_fdw_connections</structname> and
+ <function>postgres_fdw_disconnect()</function> together:
+<synopsis>
+postgres=# SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_connections;
+ postgres_fdw_disconnect
+-------------------------
+ OK
+ OK
+(2 rows)
+</synopsis>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Transaction Management</title>
+ <para>
+ <application>postgres_fdw</application> begins a remote transaction at
+ the beginning of a local query, and terminates it with
+ <command>ABORT</command> at the end of the local query. This means that all
+ foreign scans on a given foreign server within one local query are executed
+ in a single remote transaction.
+ </para>
+ <para>
+ The isolation level of the remote transaction is determined from the local
+ transaction's isolation level.
+ <table id="postgres-fdw-isolation-level">
+ <title>Isolation Level Mapping</title>
+
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Local Isolation Level</entry>
+ <entry>Remote Isolation Level</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>READ UNCOMMITTED</entry>
+ <entry morerows="2">REPEATABLE READ</entry>
+ </row>
+ <row>
+ <entry>READ COMMITTED</entry>
+ </row>
+ <row>
+ <entry>REPEATABLE READ</entry>
+ </row>
+ <row>
+ <entry>SERIALIZABLE</entry>
+ <entry>SERIALIZABLE</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <para>
+ <literal>READ UNCOMMITTED</literal> and <literal>READ COMMITTED</literal>
+ are never used for remote transactions, because even a
+ <literal>READ COMMITTED</literal> remote transaction might produce
+ inconsistent results if the remote data is updated between two remote
+ queries (which can happen within a single local query).
+ </para>
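
For example, per the mapping above, a local transaction running at
READ COMMITTED causes postgres_fdw to open its remote transaction with:

START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
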
+ <para>
+ Note that even if the isolation level of the local transaction is
+ <literal>SERIALIZABLE</literal> or <literal>REPEATABLE READ</literal>,
+ executing the same query repeatedly might produce different results,
+ because foreign scans in different local queries are executed in different
+ remote transactions. For instance, if the external data is updated between
+ two executions of the same query in a <literal>SERIALIZABLE</literal> local
+ transaction, the client receives different results.
+ </para>
+ <para>
+ This restriction might be relaxed in a future release.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Estimate Costs, Rows and Width</title>
+ <para>
+ <application>postgres_fdw</application> estimates the costs of a
+ foreign scan in one of two ways. In either case, the selectivity of the
+ restrictions is taken into account to give a proper estimate.
+ </para>
+ <para>
+ If <literal>use_remote_estimate</literal> is set to
+ <literal>false</literal> (the default), <application>postgres_fdw</>
+ assumes that the external data has not changed much, and uses the local
+ statistics as-is. It is recommended to run <command>ANALYZE</command>
+ regularly so that the local statistics keep reflecting the characteristics
+ of the external data.
+ Otherwise, <application>postgres_fdw</> executes a remote
+ <command>EXPLAIN</command> early in planning a foreign scan to obtain the
+ remote server's estimate of the remote query. This provides better
+ estimates but requires some overhead.
+ </para>
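
For example, with the default setting, keeping the local statistics of a
foreign table fresh is simply a matter of running (using the regression
test's ft1):

ANALYZE ft1;
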
+ </sect2>
+
+ <sect2>
+ <title>Remote Query Optimization</title>
+ <para>
+ <application>postgres_fdw</> optimizes remote queries to reduce the amount
+ of data transferred from foreign servers.
+ <itemizedlist>
+ <listitem>
+ <para>
+ Restrictions that have the same semantics on the remote side are pushed
+ down (see the example after this list). Restrictions containing the
+ elements below might have different semantics on the remote side, and so
+ are generally not pushed down:
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ User defined objects, such as functions, operators, and types.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Functions defined as <literal>STABLE</literal> or
+ <literal>VOLATILE</literal>, and operators which use such functions.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Collatable types, such as text or varchar, with some exceptions (see
+ below).
+ </para>
+ <para>
+ Basically we assume that collatable expressions have different
+ semantics, because the remote server might have different collation
+ settings, but that assumption would also prevent simple and common
+ expressions such as <literal>text_col = 'string'</literal> from being
+ pushed down. So <application>postgres_fdw</application> treats the
+ operators <literal>=</literal> and <literal><></literal> as safe
+ to push down even if they take collatable types as arguments.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Unnecessary columns in the <literal>SELECT</literal> clause of remote
+ queries are replaced with <literal>NULL</literal> literals.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
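
For example (adapted from the regression test), a condition built from
built-in immutable operators and functions can be sent to the remote server,
while one using a user-defined function is evaluated locally:

EXPLAIN (COSTS false) SELECT * FROM ft1 WHERE c1 = abs(c2);              -- pushed down
EXPLAIN (COSTS false) SELECT * FROM ft1 WHERE c1 = postgres_fdw_abs(c2); -- kept local
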
+ </sect2>
+
+ <sect2>
+ <title>EXPLAIN Output</title>
+ <para>
+ For each foreign table using <literal>postgres_fdw</>, <command>EXPLAIN</>
+ shows the remote SQL statement that is sent to the remote
+ <productname>PostgreSQL</productname> server for the ForeignScan plan node.
+ For example:
+ </para>
+<synopsis>
+postgres=# EXPLAIN SELECT aid FROM pgbench_accounts WHERE abalance < 0;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on pgbench_accounts (cost=100.00..100.11 rows=1 width=97)
+ Remote SQL: SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE ((abalance OPERATOR(pg_catalog.<) 0))
+(2 rows)
+</synopsis>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Shigeru Hanada <email>shigeru.hanada@gmail.com</email>
+ </para>
+ </sect2>
+
+</sect1>
2012/11/21 Shigeru Hanada <shigeru.hanada@gmail.com>:
Thanks for the comment!
On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
I also think the new "use_remote_explain" option is good. It works fine
when we try to use this fdw over the network with latency more or less.
It seems to me its default is "false", thus, GetForeignRelSize will return
the estimated cost according to ANALYZE, instead of remote EXPLAIN.
Even though you mention the default behavior was not changed, is it
an expected one? My preference is the current one, as is.
Oops, I must have focused on only cost factors.
I too think that using local statistics is better as default behavior,
because foreign tables would be used for relatively stable tables.
If target tables are updated often, it would cause problems about
consistency, unless we provide full-fledged transaction mapping.
The deparseFuncExpr() still has case handling whether it is explicit cast,
implicit cast or regular functions. If my previous proposition has no
flaw,
could you fix up it using regular function invocation manner? In case when
remote node has incompatible implicit-cast definition, this logic can make
a problem.
Sorry, I overlooked this issue. Fixed to use function call notation
for all of explicit function calls, explicit casts, and implicit casts.
At InitPostgresFdwOptions(), the source comment says we don't use
malloc() here for simplification of code. Hmm. I'm not sure why it is more
simple. It seems to me we have no reason why to avoid malloc here, even
though libpq options are acquired using malloc().
I used "simple" because using palloc avoids null-check and error handling.
However, much backend code uses malloc to allocate memory which lives
as long as the backend process itself, so I fixed it.
Regarding the regression test:
[snip]
I guess this test tries to check a case when the remote column has an
incompatible data type with the local side. Please check timestamp_out().
Its output format follows the "datestyle" GUC setting, which is affected by
the OS configuration at initdb time.
(Please grep for "datestyle" in initdb.c!) I'd like to recommend using
another data type for this regression test to avoid false-positive detection.
Good catch. :)
I fixed the test to use another data type, user defined enum.
One other thing I noticed.
At execute_query(), it stores the retrieved rows onto tuplestore of
festate->tuples at once. Doesn't it make problems when remote-
table has very big number of rows?
IIRC, the previous code used cursor feature to fetch a set of rows
to avoid over-consumption of local memory. Do we have some
restriction if we fetch a certain number of rows with FETCH?
It seems to me, we can fetch 1000 rows for example, and tentatively
store them onto the tuplestore within one PG_TRY() block (so, no
need to worry about PQclear() timing), then we can fetch remote
rows again when IterateForeignScan reached end of tuplestore.
Please point out anything if I missed something.
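For reference, the cursor-based flow sketched above would amount to the remote
session running commands along these lines (batch size 1000 as mentioned,
cursor name chosen arbitrarily):

BEGIN;
DECLARE fdw_cur CURSOR FOR SELECT "C 1", c2, c3 FROM "S 1"."T 1";
FETCH 1000 FROM fdw_cur;  -- repeated until fewer than 1000 rows are returned
CLOSE fdw_cur;
COMMIT;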
Anyway, I'll check this v4 patch simultaneously.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Wed, Nov 21, 2012 at 7:31 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
At execute_query(), it stores the retrieved rows onto tuplestore of
festate->tuples at once. Doesn't it make problems when remote-
table has very big number of rows?
No. postgres_fdw uses single-row processing mode of libpq when
retrieving query results in execute_query, so memory usage will
be stable at a certain level.
IIRC, the previous code used cursor feature to fetch a set of rows
to avoid over-consumption of local memory. Do we have some
restriction if we fetch a certain number of rows with FETCH?
It seems to me, we can fetch 1000 rows for example, and tentatively
store them onto the tuplestore within one PG_TRY() block (so, no
need to worry about PQclear() timing), then we can fetch remote
rows again when IterateForeignScan reached end of tuplestore.
As you say, postgres_fdw had used a cursor to avoid possible memory
exhaustion on large result sets. I switched to single-row processing mode
(it could be called a "protocol-level cursor"), which was added in 9.2,
because it accomplishes the same task with fewer SQL calls than a cursor.
Regards,
--
Shigeru HANADA
After playing with some big SQLs for testing, I came to feel that
showing every remote query in EXPLAIN output is annoying, especially
when SELECT * is unfolded to a long column list.
AFAIK no plan node shows so much information on one line, so I'm
inclined to make postgres_fdw show it only when VERBOSE is
specified. This would make EXPLAIN output easier to read, even if many
foreign tables are used in a query.
Thoughts?
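Under that proposal, the remote statement would appear only when VERBOSE is
requested, e.g. (with the regression test's ft1):

EXPLAIN SELECT * FROM ft1 WHERE c1 = 1;          -- plan only
EXPLAIN VERBOSE SELECT * FROM ft1 WHERE c1 = 1;  -- plan plus the Remote SQL line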
--
Shigeru HANADA
2012/11/22 Shigeru Hanada <shigeru.hanada@gmail.com>:
After playing with some big SQLs for testing, I came to feel that
showing every remote query in EXPLAIN output is annoying, especially
when SELECT * is unfolded to a long column list.
AFAIK no plan node shows so much information on one line, so I'm
inclined to make postgres_fdw show it only when VERBOSE is
specified. This would make EXPLAIN output easier to read, even if many
foreign tables are used in a query.
Thoughts?
Indeed, I also think it is reasonable solution.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Kohei KaiGai wrote:
2012/11/22 Shigeru Hanada <shigeru.hanada@gmail.com>:
After playing with some big SQLs for testing, I came to feel that
showing every remote query in EXPLAIN output is annoying, especially
when SELECT * is unfolded to a long column list.
AFAIK no plan node shows so much information on one line, so I'm
inclined to make postgres_fdw show it only when VERBOSE is
specified. This would make EXPLAIN output easier to read, even if many
foreign tables are used in a query.
Thoughts?
Indeed, I also think it is reasonable solution.
+1
That's the way I do it for oracle_fdw.
Yours,
Laurenz Albe
Hanada-san,
I checked the v4 patch, and I have nothing to comment anymore.
So, could you update the remaining EXPLAIN with VERBOSE option
stuff?
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Sun, Nov 25, 2012 at 5:24 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
I checked the v4 patch, and I have nothing to comment anymore.
So, could you update the remaining EXPLAIN with VERBOSE option
stuff?
Thanks for the review. Here is the updated patch.
BTW, we have one more issue around the naming of the new FDW, which is
discussed in another thread.
http://archives.postgresql.org/message-id/9E59E6E7-39C9-4AE9-88D6-BB0098579017@gmail.com
Please follow that thread for the naming issue.
--
Shigeru HANADA
Attachment: postgres_fdw.v5.patch
diff --git a/contrib/Makefile b/contrib/Makefile
index d230451..7c6009d 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
pgcrypto \
pgrowlocks \
pgstattuple \
+ postgres_fdw \
seg \
spi \
tablefunc \
diff --git a/contrib/postgres_fdw/.gitignore b/contrib/postgres_fdw/.gitignore
new file mode 100644
index 0000000..0854728
--- /dev/null
+++ b/contrib/postgres_fdw/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/results/
+*.o
+*.so
diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
new file mode 100644
index 0000000..8dac777
--- /dev/null
+++ b/contrib/postgres_fdw/Makefile
@@ -0,0 +1,22 @@
+# contrib/postgres_fdw/Makefile
+
+MODULE_big = postgres_fdw
+OBJS = postgres_fdw.o option.o deparse.o connection.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+SHLIB_LINK = $(libpq)
+
+EXTENSION = postgres_fdw
+DATA = postgres_fdw--1.0.sql
+
+REGRESS = postgres_fdw
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/postgres_fdw
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
new file mode 100644
index 0000000..eab8b87
--- /dev/null
+++ b/contrib/postgres_fdw/connection.c
@@ -0,0 +1,605 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.c
+ * Connection management for postgres_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/connection.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_type.h"
+#include "foreign/foreign.h"
+#include "funcapi.h"
+#include "libpq-fe.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
+
+#include "postgres_fdw.h"
+#include "connection.h"
+
+/* ============================================================================
+ * Connection management functions
+ * ==========================================================================*/
+
+/*
+ * Connection cache entry managed with hash table.
+ */
+typedef struct ConnCacheEntry
+{
+ /* hash key must be first */
+ Oid serverid; /* oid of foreign server */
+ Oid userid; /* oid of local user */
+
+ bool use_tx; /* true when using remote transaction */
+ int refs; /* reference counter */
+ PGconn *conn; /* foreign server connection */
+} ConnCacheEntry;
+
+/*
+ * Hash table used to cache connections to PostgreSQL servers; it is
+ * initialized before the backend's first attempt to connect to a PostgreSQL server.
+ */
+static HTAB *ConnectionHash;
+
+/* ----------------------------------------------------------------------------
+ * prototype of private functions
+ * --------------------------------------------------------------------------*/
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg);
+static PGconn *connect_pg_server(ForeignServer *server, UserMapping *user);
+static void begin_remote_tx(PGconn *conn);
+static void abort_remote_tx(PGconn *conn);
+
+/*
+ * Get a PGconn which can be used to execute foreign query on the remote
+ * PostgreSQL server with the user's authorization. If this was the first
+ * request for the server, new connection is established.
+ *
+ * When use_tx is true, a remote transaction is started if the caller is the
+ * only user of the connection. The isolation level of the remote transaction
+ * is the same as the local transaction's, and the remote transaction will be
+ * aborted when the last user releases the connection.
+ *
+ * TODO: Note that caching connections requires a mechanism to detect change of
+ * FDW object to invalidate already established connections.
+ */
+PGconn *
+GetConnection(ForeignServer *server, UserMapping *user, bool use_tx)
+{
+ bool found;
+ ConnCacheEntry *entry;
+ ConnCacheEntry key;
+
+ /* initialize connection cache if it isn't */
+ if (ConnectionHash == NULL)
+ {
+ HASHCTL ctl;
+
+ /* hash key is a pair of oids: serverid and userid */
+ MemSet(&ctl, 0, sizeof(ctl));
+ ctl.keysize = sizeof(Oid) + sizeof(Oid);
+ ctl.entrysize = sizeof(ConnCacheEntry);
+ ctl.hash = tag_hash;
+ ctl.match = memcmp;
+ ctl.keycopy = memcpy;
+ /* allocate ConnectionHash in the cache context */
+ ctl.hcxt = CacheMemoryContext;
+ ConnectionHash = hash_create("postgres_fdw connections", 32,
+ &ctl,
+ HASH_ELEM | HASH_CONTEXT |
+ HASH_FUNCTION | HASH_COMPARE |
+ HASH_KEYCOPY);
+
+ /*
+ * Register postgres_fdw's own cleanup function for connection
+ * cleanup. This should be done just once for each backend.
+ */
+ RegisterResourceReleaseCallback(cleanup_connection, ConnectionHash);
+ }
+
+ /* Create key value for the entry. */
+ MemSet(&key, 0, sizeof(key));
+ key.serverid = server->serverid;
+ key.userid = GetOuterUserId();
+
+ /*
+ * Find cached entry for requested connection. If we couldn't find,
+ * callback function of ResourceOwner should be registered to clean the
+ * connection up on error including user interrupt.
+ */
+ entry = hash_search(ConnectionHash, &key, HASH_ENTER, &found);
+ if (!found)
+ {
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+
+ /*
+ * We don't check the health of cached connection here, because it would
+ * require some overhead. Broken connection and its cache entry will be
+ * cleaned up when the connection is actually used.
+ */
+
+ /*
+ * If cache entry doesn't have connection, we have to establish new
+ * connection.
+ */
+ if (entry->conn == NULL)
+ {
+ PGconn *volatile conn = NULL;
+
+ /*
+ * Use PG_TRY block to ensure closing connection on error.
+ */
+ PG_TRY();
+ {
+ /*
+ * Connect to the foreign PostgreSQL server, and store it in cache
+ * entry to keep new connection.
+ * Note: key items of entry has already been initialized in
+ * hash_search(HASH_ENTER).
+ */
+ conn = connect_pg_server(server, user);
+ }
+ PG_CATCH();
+ {
+ /* Clear connection cache entry on error case. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+ entry->conn = conn;
+ elog(DEBUG3, "new postgres_fdw connection %p for server %s",
+ entry->conn, server->servername);
+ }
+
+ /* Increase connection reference counter. */
+ entry->refs++;
+
+ /*
+ * If remote transaction is requested but it has not started, start remote
+ * transaction with the same isolation level as the local transaction we
+ * are in. We need to remember whether this connection uses remote
+ * transaction to abort it when this connection is released completely.
+ */
+ if (use_tx && !entry->use_tx)
+ {
+ begin_remote_tx(entry->conn);
+ entry->use_tx = use_tx;
+ }
+
+ return entry->conn;
+}
+
+/*
+ * For non-superusers, insist that the connstr specify a password. This
+ * prevents a password from being picked up from .pgpass, a service file,
+ * the environment, etc. We don't want the postgres user's passwords
+ * to be accessible to non-superusers.
+ */
+static void
+check_conn_params(const char **keywords, const char **values)
+{
+ int i;
+
+ /* no check required if superuser */
+ if (superuser())
+ return;
+
+ /* ok if params contain a non-empty password */
+ for (i = 0; keywords[i] != NULL; i++)
+ {
+ if (strcmp(keywords[i], "password") == 0 && values[i][0] != '\0')
+ return;
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superusers must provide a password in the connection string.")));
+}
+
+static PGconn *
+connect_pg_server(ForeignServer *server, UserMapping *user)
+{
+ const char *conname = server->servername;
+ PGconn *conn;
+ const char **all_keywords;
+ const char **all_values;
+ const char **keywords;
+ const char **values;
+ int n;
+ int i, j;
+
+ /*
+ * Construct connection params from generic options of ForeignServer and
+ * UserMapping. Those two objects hold only libpq options.
+ * Extra 3 items are for:
+ * *) fallback_application_name
+ * *) client_encoding
+ * *) NULL termination (end marker)
+ *
+ * Note: We don't omit any parameters even though the target database might
+ * be older than the local one, because unexpected parameters are just ignored.
+ */
+ n = list_length(server->options) + list_length(user->options) + 3;
+ all_keywords = (const char **) palloc(sizeof(char *) * n);
+ all_values = (const char **) palloc(sizeof(char *) * n);
+ keywords = (const char **) palloc(sizeof(char *) * n);
+ values = (const char **) palloc(sizeof(char *) * n);
+ n = 0;
+ n += ExtractConnectionOptions(server->options,
+ all_keywords + n, all_values + n);
+ n += ExtractConnectionOptions(user->options,
+ all_keywords + n, all_values + n);
+ all_keywords[n] = all_values[n] = NULL;
+
+ for (i = 0, j = 0; all_keywords[i]; i++)
+ {
+ keywords[j] = all_keywords[i];
+ values[j] = all_values[i];
+ j++;
+ }
+
+ /* Use "postgres_fdw" as fallback_application_name. */
+ keywords[j] = "fallback_application_name";
+ values[j++] = "postgres_fdw";
+
+ /* Set client_encoding so that libpq can convert encoding properly. */
+ keywords[j] = "client_encoding";
+ values[j++] = GetDatabaseEncodingName();
+
+ keywords[j] = values[j] = NULL;
+ pfree(all_keywords);
+ pfree(all_values);
+
+ /* verify connection parameters and do connect */
+ check_conn_params(keywords, values);
+ conn = PQconnectdbParams(keywords, values, 0);
+ if (!conn || PQstatus(conn) != CONNECTION_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
+ errmsg("could not connect to server \"%s\"", conname),
+ errdetail("%s", PQerrorMessage(conn))));
+ pfree(keywords);
+ pfree(values);
+
+ /*
+ * Check that non-superuser has used password to establish connection.
+ * This check logic is based on dblink_security_check() in contrib/dblink.
+ *
+ * XXX Should we check this even if we don't provide unsafe version like
+ * dblink_connect_u()?
+ */
+ if (!superuser() && !PQconnectionUsedPassword(conn))
+ {
+ PQfinish(conn);
+ ereport(ERROR,
+ (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+ errmsg("password is required"),
+ errdetail("Non-superuser cannot connect if the server does not request a password."),
+ errhint("Target server's authentication method must be changed.")));
+ }
+
+ return conn;
+}
+
+/*
+ * Start remote transaction with proper isolation level.
+ */
+static void
+begin_remote_tx(PGconn *conn)
+{
+ const char *sql = NULL; /* keep compiler quiet. */
+ PGresult *res;
+
+ switch (XactIsoLevel)
+ {
+ case XACT_READ_UNCOMMITTED:
+ case XACT_READ_COMMITTED:
+ case XACT_REPEATABLE_READ:
+ sql = "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
+ break;
+ case XACT_SERIALIZABLE:
+ sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
+ break;
+ default:
+ elog(ERROR, "unexpected isolation level: %d", XactIsoLevel);
+ break;
+ }
+
+ elog(DEBUG3, "starting remote transaction with \"%s\"", sql);
+
+ res = PQexec(conn, sql);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not start transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+static void
+abort_remote_tx(PGconn *conn)
+{
+ PGresult *res;
+
+ elog(DEBUG3, "aborting remote transaction");
+
+ res = PQexec(conn, "ABORT TRANSACTION");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ PQclear(res);
+ elog(ERROR, "could not abort transaction: %s", PQerrorMessage(conn));
+ }
+ PQclear(res);
+}
+
+/*
+ * Mark the connection as "unused", and close it if the caller was the last
+ * user of the connection.
+ */
+void
+ReleaseConnection(PGconn *conn)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+
+ if (conn == NULL)
+ return;
+
+ /*
+ * We need to scan sequentially since we use the address to find
+ * appropriate PGconn from the hash table.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ if (entry->conn == conn)
+ {
+ hash_seq_term(&scan);
+ break;
+ }
+ }
+
+ /*
+ * If the given connection is an orphan, it must be a dangling pointer to
+ * already released connection. Discarding connection due to remote query
+ * error would produce such situation (see comments below).
+ */
+ if (entry == NULL)
+ return;
+
+ /*
+ * If the connection being released is broken or its transaction has failed,
+ * discard the connection to recover from the error. PQfinish may leave
+ * dangling pointers to the shared PGconn object, but they won't be
+ * double-freed because their pointer values no longer match any cached entry
+ * and are ignored by the check above.
+ *
+ * A subsequent connection request via GetConnection will create a new
+ * connection.
+ */
+ if (PQstatus(conn) != CONNECTION_OK ||
+ (PQtransactionStatus(conn) != PQTRANS_IDLE &&
+ PQtransactionStatus(conn) != PQTRANS_INTRANS))
+ {
+ elog(DEBUG3, "discarding connection: %s %s",
+ PQstatus(conn) == CONNECTION_OK ? "OK" : "NG",
+ PQtransactionStatus(conn) == PQTRANS_IDLE ? "IDLE" :
+ PQtransactionStatus(conn) == PQTRANS_ACTIVE ? "ACTIVE" :
+ PQtransactionStatus(conn) == PQTRANS_INTRANS ? "INTRANS" :
+ PQtransactionStatus(conn) == PQTRANS_INERROR ? "INERROR" :
+ "UNKNOWN");
+ PQfinish(conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ return;
+ }
+
+ /*
+ * Decrease reference counter of this connection. Even if the caller was
+ * the last referrer, we don't unregister it from cache.
+ */
+ entry->refs--;
+ if (entry->refs < 0)
+ entry->refs = 0; /* just in case */
+
+ /*
+ * If this connection uses remote transaction and there is no user other
+ * than the caller, abort the remote transaction and forget about it.
+ */
+ if (entry->use_tx && entry->refs == 0)
+ {
+ abort_remote_tx(conn);
+ entry->use_tx = false;
+ }
+}
+
+/*
+ * Clean the connection up via ResourceOwner.
+ */
+static void
+cleanup_connection(ResourceReleasePhase phase,
+ bool isCommit,
+ bool isTopLevel,
+ void *arg)
+{
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry = (ConnCacheEntry *) arg;
+
+ /* If the transaction was committed, don't close connections. */
+ if (isCommit)
+ return;
+
+ /*
+ * We clean the connection up on post-lock because foreign connections are
+ * backend-internal resource.
+ */
+ if (phase != RESOURCE_RELEASE_AFTER_LOCKS)
+ return;
+
+ /*
+ * We ignore cleanup for ResourceOwners other than transaction. At this
+ * point, such a ResourceOwner is only Portal.
+ */
+ if (CurrentResourceOwner != CurTransactionResourceOwner)
+ return;
+
+ /*
+ * We don't need to clean up at end of subtransactions, because they might
+ * be recovered to consistent state with savepoints.
+ */
+ if (!isTopLevel)
+ return;
+
+ /*
+ * Here, it must be after abort of top level transaction. Disconnect all
+ * cached connections to clear error status out and reset their reference
+ * counters.
+ */
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ elog(DEBUG3, "discard postgres_fdw connection %p due to resowner cleanup",
+ entry->conn);
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+ }
+}
+
+/*
+ * Get list of connections currently active.
+ */
+Datum postgres_fdw_get_connections(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
+Datum
+postgres_fdw_get_connections(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ HASH_SEQ_STATUS scan;
+ ConnCacheEntry *entry;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ Tuplestorestate *tuplestore;
+ TupleDesc tupdesc;
+
+ /* We return the list of connections by storing them in a Tuplestore. */
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = NULL;
+ rsinfo->setDesc = NULL;
+
+ /* Create tuplestore and copy of TupleDesc in per-query context. */
+ MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+
+ tupdesc = CreateTemplateTupleDesc(2, false);
+ TupleDescInitEntry(tupdesc, 1, "srvid", OIDOID, -1, 0);
+ TupleDescInitEntry(tupdesc, 2, "usesysid", OIDOID, -1, 0);
+ rsinfo->setDesc = tupdesc;
+
+ tuplestore = tuplestore_begin_heap(false, false, work_mem);
+ rsinfo->setResult = tuplestore;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * We need to scan sequentially since we use the address to find
+ * appropriate PGconn from the hash table.
+ */
+ if (ConnectionHash != NULL)
+ {
+ hash_seq_init(&scan, ConnectionHash);
+ while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
+ {
+ Datum values[2];
+ bool nulls[2];
+ HeapTuple tuple;
+
+ /* Ignore inactive connections */
+ if (PQstatus(entry->conn) != CONNECTION_OK)
+ continue;
+
+ /*
+ * Ignore other users' connections if current user isn't a
+ * superuser.
+ */
+ if (!superuser() && entry->userid != GetUserId())
+ continue;
+
+ values[0] = ObjectIdGetDatum(entry->serverid);
+ values[1] = ObjectIdGetDatum(entry->userid);
+ nulls[0] = false;
+ nulls[1] = false;
+
+ tuple = heap_form_tuple(tupdesc, values, nulls);
+ tuplestore_puttuple(tuplestore, tuple);
+ }
+ }
+ tuplestore_donestoring(tuplestore);
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * Discard the persistent connection designated by the given server OID and user OID.
+ */
+Datum postgres_fdw_disconnect(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
+Datum
+postgres_fdw_disconnect(PG_FUNCTION_ARGS)
+{
+ Oid serverid = PG_GETARG_OID(0);
+ Oid userid = PG_GETARG_OID(1);
+ ConnCacheEntry key;
+ ConnCacheEntry *entry = NULL;
+ bool found;
+
+ /* Non-superuser can't discard other users' connection. */
+ if (!superuser() && userid != GetOuterUserId())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+ errmsg("only superuser can discard other user's connection")));
+
+ /*
+ * If no connection has been established, or there is no such connection,
+ * just return "NG" to indicate that nothing has been done.
+ */
+ if (ConnectionHash == NULL)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ key.serverid = serverid;
+ key.userid = userid;
+ entry = hash_search(ConnectionHash, &key, HASH_FIND, &found);
+ if (!found)
+ PG_RETURN_TEXT_P(cstring_to_text("NG"));
+
+ /* Discard cached connection, and clear reference counter. */
+ PQfinish(entry->conn);
+ entry->use_tx = false;
+ entry->refs = 0;
+ entry->conn = NULL;
+
+ PG_RETURN_TEXT_P(cstring_to_text("OK"));
+}
diff --git a/contrib/postgres_fdw/connection.h b/contrib/postgres_fdw/connection.h
new file mode 100644
index 0000000..4c9d850
--- /dev/null
+++ b/contrib/postgres_fdw/connection.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * connection.h
+ * Connection management for postgres_fdw
+ *
+ * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/connection.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECTION_H
+#define CONNECTION_H
+
+#include "foreign/foreign.h"
+#include "libpq-fe.h"
+
+/*
+ * Connection management
+ */
+PGconn *GetConnection(ForeignServer *server, UserMapping *user, bool use_tx);
+void ReleaseConnection(PGconn *conn);
+
+#endif /* CONNECTION_H */
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
new file mode 100644
index 0000000..69e6a3e
--- /dev/null
+++ b/contrib/postgres_fdw/deparse.c
@@ -0,0 +1,1192 @@
+/*-------------------------------------------------------------------------
+ *
+ * deparse.c
+ * query deparser for PostgreSQL
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/deparse.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/htup_details.h"
+#include "access/transam.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/var.h"
+#include "parser/parser.h"
+#include "parser/parsetree.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/syscache.h"
+
+#include "postgres_fdw.h"
+
+/*
+ * Context for walk-through the expression tree.
+ */
+typedef struct foreign_executable_cxt
+{
+ PlannerInfo *root;
+ RelOptInfo *foreignrel;
+ bool has_param;
+} foreign_executable_cxt;
+
+/*
+ * Get string representation which can be used in SQL statement from a node.
+ */
+static void deparseExpr(StringInfo buf, Expr *expr, PlannerInfo *root);
+static void deparseRelation(StringInfo buf, RangeTblEntry *rte);
+static void deparseVar(StringInfo buf, Var *node, PlannerInfo *root);
+static void deparseConst(StringInfo buf, Const *node, PlannerInfo *root);
+static void deparseBoolExpr(StringInfo buf, BoolExpr *node, PlannerInfo *root);
+static void deparseNullTest(StringInfo buf, NullTest *node, PlannerInfo *root);
+static void deparseDistinctExpr(StringInfo buf, DistinctExpr *node,
+ PlannerInfo *root);
+static void deparseRelabelType(StringInfo buf, RelabelType *node,
+ PlannerInfo *root);
+static void deparseFuncExpr(StringInfo buf, FuncExpr *node, PlannerInfo *root);
+static void deparseParam(StringInfo buf, Param *node, PlannerInfo *root);
+static void deparseScalarArrayOpExpr(StringInfo buf, ScalarArrayOpExpr *node,
+ PlannerInfo *root);
+static void deparseOpExpr(StringInfo buf, OpExpr *node, PlannerInfo *root);
+static void deparseArrayRef(StringInfo buf, ArrayRef *node, PlannerInfo *root);
+static void deparseArrayExpr(StringInfo buf, ArrayExpr *node, PlannerInfo *root);
+
+/*
+ * Determine whether an expression can be evaluated safely on the remote side.
+ */
+static bool is_foreign_expr(PlannerInfo *root, RelOptInfo *baserel, Expr *expr,
+ bool *has_param);
+static bool foreign_expr_walker(Node *node, foreign_executable_cxt *context);
+static bool is_builtin(Oid oid);
+
+/*
+ * Deparse query representation into a SQL statement suitable for the remote
+ * PostgreSQL server.  This function basically creates a simple query string
+ * which consists of only SELECT and FROM clauses.
+ *
+ * The remote SELECT clause contains only columns which are used in the
+ * targetlist or in local_conds (conditions which can't be pushed down and
+ * will be checked on the local side).
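+ *
+ * For example, for a hypothetical foreign table ft(a int, b text, c int) in
+ * which only column b is referenced by the local query, the generated remote
+ * query would look like:
+ *     SELECT NULL, b, NULL FROM public.ft
+ * (assuming no nspname/relname/colname options are set).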
+ */
+void
+deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds)
+{
+ RangeTblEntry *rte;
+ ListCell *lc;
+ StringInfoData foreign_relname;
+ bool first;
+ AttrNumber attr;
+ List *attr_used = NIL; /* List of AttNumber used in the query */
+
+ initStringInfo(buf);
+ initStringInfo(&foreign_relname);
+
+ /*
+	 * First of all, determine which columns should be retrieved for this scan.
+	 *
+	 * We do this before deparsing the SELECT clause because attributes which
+	 * are used in neither the reltargetlist nor the locally-evaluated quals
+	 * can be replaced with the literal "NULL" in the SELECT clause to reduce
+	 * the overhead of tuple handling and data transfer.
+ */
+ foreach (lc, local_conds)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+ List *attrs;
+
+ /*
+		 * We need to know which attributes are used in quals evaluated on
+		 * the local server, because they should be listed in the SELECT
+		 * clause of the remote query.  We can ignore attributes which are
+		 * referenced only in the ORDER BY/GROUP BY clauses, because such
+		 * attributes have already been included in the reltargetlist.
+ */
+ attrs = pull_var_clause((Node *) ri->clause,
+ PVC_RECURSE_AGGREGATES,
+ PVC_RECURSE_PLACEHOLDERS);
+ attr_used = list_union(attr_used, attrs);
+ }
+
+ /*
+ * deparse SELECT clause
+ *
+	 * List attributes which are in either the target list or the local
+	 * restrictions.  Unused attributes are replaced with a literal "NULL"
+	 * as an optimization.
+	 *
+	 * Note that nothing is added for dropped columns, though the tuple
+	 * constructor function requires entries for dropped columns.  Such
+	 * entries must be initialized to NULL before calling the tuple
+	 * constructor.
+ */
+ appendStringInfo(buf, "SELECT ");
+ rte = root->simple_rte_array[baserel->relid];
+ attr_used = list_union(attr_used, baserel->reltargetlist);
+ first = true;
+ for (attr = 1; attr <= baserel->max_attr; attr++)
+ {
+ Var *var = NULL;
+ ListCell *lc;
+
+ /* Ignore dropped attributes. */
+ if (get_rte_attribute_is_dropped(rte, attr))
+ continue;
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ first = false;
+
+ /*
+		 * We use a linear search here, but it shouldn't be a problem since
+		 * attr_used is not expected to grow very large.
+ */
+ foreach (lc, attr_used)
+ {
+ var = lfirst(lc);
+ if (var->varattno == attr)
+ break;
+ var = NULL;
+ }
+ if (var != NULL)
+ deparseVar(buf, var, root);
+ else
+ appendStringInfo(buf, "NULL");
+ }
+ appendStringInfoChar(buf, ' ');
+
+ /*
+ * deparse FROM clause, including alias if any
+ */
+ appendStringInfo(buf, "FROM ");
+ deparseRelation(buf, root->simple_rte_array[baserel->relid]);
+}
+
+/*
+ * Examine each element in baserel's baserestrictinfo list and classify it
+ * into one of three groups:
+ *   - remote_conds: push-down safe and doesn't contain any Param node
+ *   - param_conds: push-down safe, but contains some Param node
+ *   - local_conds: not push-down safe
+ *
+ * Only remote_conds can be used in remote EXPLAIN, while both remote_conds
+ * and param_conds can be used in the final remote query.
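+ *
+ * For example, a qual like "c1 = 1" on a built-in-typed column typically ends
+ * up in remote_conds, "c1 = $1" (an EXECUTE parameter) in param_conds, and a
+ * qual calling a user-defined function in local_conds.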
+ */
+void
+classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds)
+{
+ ListCell *lc;
+ bool has_param;
+
+ Assert(remote_conds);
+ Assert(param_conds);
+ Assert(local_conds);
+
+ foreach(lc, baserel->baserestrictinfo)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ if (is_foreign_expr(root, baserel, ri->clause, &has_param))
+ {
+ if (has_param)
+ *param_conds = lappend(*param_conds, ri);
+ else
+ *remote_conds = lappend(*remote_conds, ri);
+ }
+ else
+ *local_conds = lappend(*local_conds, ri);
+ }
+}
+
+/*
+ * Deparse a SELECT statement which acquires sample rows of the given relation
+ * into buf.
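+ *
+ * The result is of the form "SELECT col1, col2, ... FROM nspname.relname",
+ * with identifiers quoted as needed, and with per-column colname options and
+ * the nspname/relname table options applied when they are set.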
+ */
+void
+deparseAnalyzeSql(StringInfo buf, Relation rel)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+ char *nspname;
+ char *relname;
+ ForeignTable *table;
+
+	/* Deparse SELECT clause, using the attribute name or the colname option. */
+ appendStringInfo(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ colname = NameStr(tupdesc->attrs[i]->attname);
+ options = GetForeignColumnOptions(relid, tupdesc->attrs[i]->attnum);
+
+ foreach(lc, options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ if (!first)
+ appendStringInfo(buf, ", ");
+ appendStringInfo(buf, "%s", quote_identifier(colname));
+ first = false;
+ }
+
+ /*
+	 * Deparse FROM clause, using the namespace and relation name, or the
+	 * nspname and relname options respectively, if set.
+ */
+ nspname = get_namespace_name(get_rel_namespace(relid));
+ relname = get_rel_name(relid);
+ table = GetForeignTable(relid);
+ foreach(lc, table->options)
+ {
+ DefElem *def= (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ appendStringInfo(buf, " FROM %s.%s", quote_identifier(nspname),
+ quote_identifier(relname));
+}
+
+/*
+ * Deparse given expression into buf.  The actual string construction is
+ * delegated to node-type-specific functions.
+ *
+ * Note that the switch statement of this function MUST match the one in
+ * foreign_expr_walker to avoid "unsupported expression" errors.
+ */
+static void
+deparseExpr(StringInfo buf, Expr *node, PlannerInfo *root)
+{
+ /*
+	 * This part must match the handling in foreign_expr_walker.
+ */
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ deparseConst(buf, (Const *) node, root);
+ break;
+ case T_BoolExpr:
+ deparseBoolExpr(buf, (BoolExpr *) node, root);
+ break;
+ case T_NullTest:
+ deparseNullTest(buf, (NullTest *) node, root);
+ break;
+ case T_DistinctExpr:
+ deparseDistinctExpr(buf, (DistinctExpr *) node, root);
+ break;
+ case T_RelabelType:
+ deparseRelabelType(buf, (RelabelType *) node, root);
+ break;
+ case T_FuncExpr:
+ deparseFuncExpr(buf, (FuncExpr *) node, root);
+ break;
+ case T_Param:
+ deparseParam(buf, (Param *) node, root);
+ break;
+ case T_ScalarArrayOpExpr:
+ deparseScalarArrayOpExpr(buf, (ScalarArrayOpExpr *) node, root);
+ break;
+ case T_OpExpr:
+ deparseOpExpr(buf, (OpExpr *) node, root);
+ break;
+ case T_Var:
+ deparseVar(buf, (Var *) node, root);
+ break;
+ case T_ArrayRef:
+ deparseArrayRef(buf, (ArrayRef *) node, root);
+ break;
+ case T_ArrayExpr:
+ deparseArrayExpr(buf, (ArrayExpr *) node, root);
+ break;
+ default:
+ {
+ ereport(ERROR,
+ (errmsg("unsupported expression for deparse"),
+ errdetail("%s", nodeToString(node))));
+ }
+ break;
+ }
+}
+
+/*
+ * Deparse given Var node into buf.  If the column has the colname FDW option,
+ * use its value instead of the attribute name.
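+ *
+ * For example, a column c1 declared with OPTIONS (colname 'C 1') is deparsed
+ * as "C 1".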
+ */
+static void
+deparseVar(StringInfo buf, Var *node, PlannerInfo *root)
+{
+ RangeTblEntry *rte;
+ char *colname = NULL;
+ const char *q_colname = NULL;
+ List *options;
+ ListCell *lc;
+
+	/* The node must not be any of OUTER_VAR, INNER_VAR, or INDEX_VAR. */
+ Assert(node->varno >= 1 && node->varno <= root->simple_rel_array_size);
+
+ /* Get RangeTblEntry from array in PlannerInfo. */
+ rte = root->simple_rte_array[node->varno];
+
+ /*
+	 * If the node is a column of a foreign table and it has the colname FDW
+	 * option, use that value.
+ */
+ options = GetForeignColumnOptions(rte->relid, node->varattno);
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "colname") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ /*
+	 * If the node refers to a column of a regular table or it doesn't have
+	 * the colname FDW option, use the attribute name.
+ */
+ if (colname == NULL)
+ colname = get_attname(rte->relid, node->varattno);
+
+ q_colname = quote_identifier(colname);
+ appendStringInfo(buf, "%s", q_colname);
+}
+
+/*
+ * Deparse a RangeTblEntry node into buf.  If rte represents a foreign table,
+ * use the value of the relname FDW option (if any) instead of the relation's
+ * name.  Similarly, the nspname FDW option overrides the schema name.
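+ *
+ * For example, a foreign table ft1 declared with OPTIONS (nspname 'S 1',
+ * relname 'T 1') is deparsed as "S 1"."T 1".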
+ */
+static void
+deparseRelation(StringInfo buf, RangeTblEntry *rte)
+{
+ ForeignTable *table;
+ ListCell *lc;
+ const char *nspname = NULL; /* plain namespace name */
+ const char *relname = NULL; /* plain relation name */
+ const char *q_nspname; /* quoted namespace name */
+ const char *q_relname; /* quoted relation name */
+
+ /* obtain additional catalog information. */
+ table = GetForeignTable(rte->relid);
+
+ /*
+	 * Use the values of the FDW options, if any, instead of the names of the
+	 * objects themselves.
+ */
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "nspname") == 0)
+ nspname = defGetString(def);
+ else if (strcmp(def->defname, "relname") == 0)
+ relname = defGetString(def);
+ }
+
+ /* Quote each identifier, if necessary. */
+ if (nspname == NULL)
+ nspname = get_namespace_name(get_rel_namespace(rte->relid));
+ q_nspname = quote_identifier(nspname);
+
+ if (relname == NULL)
+ relname = get_rel_name(rte->relid);
+ q_relname = quote_identifier(relname);
+
+ /* Construct relation reference into the buffer. */
+ appendStringInfo(buf, "%s.%s", q_nspname, q_relname);
+}
+
+/*
+ * Deparse given constant value into buf.  This function has to be kept in
+ * sync with get_const_expr.
+ */
+static void
+deparseConst(StringInfo buf,
+ Const *node,
+ PlannerInfo *root)
+{
+ Oid typoutput;
+ bool typIsVarlena;
+ char *extval;
+ bool isfloat = false;
+ bool needlabel;
+
+ if (node->constisnull)
+ {
+ appendStringInfo(buf, "NULL");
+ return;
+ }
+
+ getTypeOutputInfo(node->consttype,
+ &typoutput, &typIsVarlena);
+ extval = OidOutputFunctionCall(typoutput, node->constvalue);
+
+ switch (node->consttype)
+ {
+ case ANYARRAYOID:
+ case ANYNONARRAYOID:
+ elog(ERROR, "anyarray and anyenum are not supported");
+ break;
+ case INT2OID:
+ case INT4OID:
+ case INT8OID:
+ case OIDOID:
+ case FLOAT4OID:
+ case FLOAT8OID:
+ case NUMERICOID:
+ {
+ /*
+				 * No need to quote unless the value contains special values
+				 * such as 'NaN'.
+ */
+ if (strspn(extval, "0123456789+-eE.") == strlen(extval))
+ {
+ if (extval[0] == '+' || extval[0] == '-')
+ appendStringInfo(buf, "(%s)", extval);
+ else
+ appendStringInfoString(buf, extval);
+ if (strcspn(extval, "eE.") != strlen(extval))
+ isfloat = true; /* it looks like a float */
+ }
+ else
+ appendStringInfo(buf, "'%s'", extval);
+ }
+ break;
+ case BITOID:
+ case VARBITOID:
+ appendStringInfo(buf, "B'%s'", extval);
+ break;
+ case BOOLOID:
+ if (strcmp(extval, "t") == 0)
+ appendStringInfoString(buf, "true");
+ else
+ appendStringInfoString(buf, "false");
+ break;
+
+ default:
+ {
+ const char *valptr;
+
+ appendStringInfoChar(buf, '\'');
+ for (valptr = extval; *valptr; valptr++)
+ {
+ char ch = *valptr;
+
+ /*
+					 * standard_conforming_strings of the remote session should
+					 * be set to the same value as in the local session.
+ */
+ if (SQL_STR_DOUBLE(ch, !standard_conforming_strings))
+ appendStringInfoChar(buf, ch);
+ appendStringInfoChar(buf, ch);
+ }
+ appendStringInfoChar(buf, '\'');
+ }
+ break;
+ }
+
+ /*
+ * Append ::typename unless the constant will be implicitly typed as the
+ * right type when it is read in.
+ *
+ * XXX this code has to be kept in sync with the behavior of the parser,
+ * especially make_const.
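+	 *
+	 * For example, a text constant is emitted as 'foo'::text, while a plain
+	 * int4 constant such as 1 needs no label.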
+ */
+ switch (node->consttype)
+ {
+ case BOOLOID:
+ case INT4OID:
+ case UNKNOWNOID:
+ needlabel = false;
+ break;
+ case NUMERICOID:
+ needlabel = !isfloat || (node->consttypmod >= 0);
+ break;
+ default:
+ needlabel = true;
+ break;
+ }
+ if (needlabel)
+ {
+ appendStringInfo(buf, "::%s",
+ format_type_with_typemod(node->consttype,
+ node->consttypmod));
+ }
+}
+
+static void
+deparseBoolExpr(StringInfo buf,
+ BoolExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ char *op = NULL; /* keep compiler quiet */
+ bool first;
+
+ switch (node->boolop)
+ {
+ case AND_EXPR:
+ op = "AND";
+ break;
+ case OR_EXPR:
+ op = "OR";
+ break;
+ case NOT_EXPR:
+ appendStringInfo(buf, "(NOT ");
+ deparseExpr(buf, list_nth(node->args, 0), root);
+ appendStringInfo(buf, ")");
+ return;
+ }
+
+ first = true;
+ appendStringInfo(buf, "(");
+ foreach(lc, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, " %s ", op);
+ deparseExpr(buf, (Expr *) lfirst(lc), root);
+ first = false;
+ }
+ appendStringInfo(buf, ")");
+}
+
+/*
+ * Deparse given IS [NOT] NULL test expression into buf.
+ */
+static void
+deparseNullTest(StringInfo buf,
+ NullTest *node,
+ PlannerInfo *root)
+{
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ if (node->nulltesttype == IS_NULL)
+ appendStringInfo(buf, " IS NULL)");
+ else
+ appendStringInfo(buf, " IS NOT NULL)");
+}
+
+static void
+deparseDistinctExpr(StringInfo buf,
+ DistinctExpr *node,
+ PlannerInfo *root)
+{
+ Assert(list_length(node->args) == 2);
+
+ deparseExpr(buf, linitial(node->args), root);
+ appendStringInfo(buf, " IS DISTINCT FROM ");
+ deparseExpr(buf, lsecond(node->args), root);
+}
+
+static void
+deparseRelabelType(StringInfo buf,
+ RelabelType *node,
+ PlannerInfo *root)
+{
+ char *typname;
+
+ Assert(node->arg);
+
+	/* No need to deparse the cast when the argument has the same type as the result. */
+ if (IsA(node->arg, Const) &&
+ ((Const *) node->arg)->consttype == node->resulttype &&
+ ((Const *) node->arg)->consttypmod == -1)
+ {
+ deparseExpr(buf, node->arg, root);
+ return;
+ }
+
+ typname = format_type_with_typemod(node->resulttype, node->resulttypmod);
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->arg, root);
+ appendStringInfo(buf, ")::%s", typname);
+}
+
+/*
+ * Deparse given node which represents a function call into buf.  Here not
+ * only explicit function calls and explicit casts but also implicit casts are
+ * deparsed, to avoid problems caused by different cast settings between the
+ * local and remote sides.
+ *
+ * The function name (and type name) is always qualified by its schema name to
+ * avoid problems caused by a different search_path setting on the remote side.
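+ *
+ * For example, a call of the built-in abs() function is deparsed as
+ * pg_catalog.abs(...).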
+ */
+static void
+deparseFuncExpr(StringInfo buf,
+ FuncExpr *node,
+ PlannerInfo *root)
+{
+ Oid pronamespace;
+ const char *schemaname;
+ const char *funcname;
+ ListCell *arg;
+ bool first;
+
+ pronamespace = get_func_namespace(node->funcid);
+ schemaname = quote_identifier(get_namespace_name(pronamespace));
+ funcname = quote_identifier(get_func_name(node->funcid));
+
+ /*
+	 * Deparse all arguments recursively, in parentheses after the function
+	 * name.
+ */
+ appendStringInfo(buf, "%s.%s(", schemaname, funcname);
+ first = true;
+ foreach(arg, node->args)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(arg), root);
+ first = false;
+ }
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse given Param node into buf.
+ *
+ * We don't renumber parameter ids, because skipping $1 doesn't cause any
+ * problem as long as we pass through all arguments.
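+ *
+ * For example, a reference to the second EXECUTE parameter is always emitted
+ * as $2, even if $1 does not appear in the remote query.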
+ */
+static void
+deparseParam(StringInfo buf,
+ Param *node,
+ PlannerInfo *root)
+{
+ Assert(node->paramkind == PARAM_EXTERN);
+
+ appendStringInfo(buf, "$%d", node->paramid);
+}
+
+/*
+ * Deparse given ScalarArrayOpExpr expression into buf.  To avoid problems
+ * with operator precedence, we always parenthesize the arguments.  We also
+ * use the OPERATOR(schema.operator) notation to identify the remote operator
+ * exactly.
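+ *
+ * For example, "c1 = ANY(ARRAY[c2, 1])" is deparsed as
+ * (c1 OPERATOR(pg_catalog.=) ANY (ARRAY[c2, 1])).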
+ */
+static void
+deparseScalarArrayOpExpr(StringInfo buf,
+ ScalarArrayOpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ Expr *arg1;
+ Expr *arg2;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert(list_length(node->args) == 2);
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Extract operands. */
+ arg1 = linitial(node->args);
+ arg2 = lsecond(node->args);
+
+ /* Deparse fully qualified operator name. */
+ deparseExpr(buf, arg1, root);
+ appendStringInfo(buf, " OPERATOR(%s.%s) %s (",
+ opnspname, opname, node->useOr ? "ANY" : "ALL");
+ deparseExpr(buf, arg2, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse given operator expression into buf.  To avoid problems with
+ * operator precedence, we always parenthesize the arguments.  We also use the
+ * OPERATOR(schema.operator) notation to identify the remote operator exactly.
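+ *
+ * For example, "c2 = 0" is deparsed as (c2 OPERATOR(pg_catalog.=) 0).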
+ */
+static void
+deparseOpExpr(StringInfo buf,
+ OpExpr *node,
+ PlannerInfo *root)
+{
+ HeapTuple tuple;
+ Form_pg_operator form;
+ const char *opnspname;
+ char *opname;
+ char oprkind;
+ ListCell *arg;
+
+ /* Retrieve necessary information about the operator from system catalog. */
+ tuple = SearchSysCache1(OPEROID, ObjectIdGetDatum(node->opno));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for operator %u", node->opno);
+ form = (Form_pg_operator) GETSTRUCT(tuple);
+ opnspname = quote_identifier(get_namespace_name(form->oprnamespace));
+ /* opname is not a SQL identifier, so we don't need to quote it. */
+ opname = NameStr(form->oprname);
+ oprkind = form->oprkind;
+ ReleaseSysCache(tuple);
+
+ /* Sanity check. */
+ Assert((oprkind == 'r' && list_length(node->args) == 1) ||
+ (oprkind == 'l' && list_length(node->args) == 1) ||
+ (oprkind == 'b' && list_length(node->args) == 2));
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse first operand. */
+ arg = list_head(node->args);
+ if (oprkind == 'r' || oprkind == 'b')
+ {
+ deparseExpr(buf, lfirst(arg), root);
+ appendStringInfoChar(buf, ' ');
+ }
+
+ /* Deparse fully qualified operator name. */
+ appendStringInfo(buf, "OPERATOR(%s.%s)", opnspname, opname);
+
+ /* Deparse last operand. */
+ arg = list_tail(node->args);
+ if (oprkind == 'l' || oprkind == 'b')
+ {
+ appendStringInfoChar(buf, ' ');
+ deparseExpr(buf, lfirst(arg), root);
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+static void
+deparseArrayRef(StringInfo buf,
+ ArrayRef *node,
+ PlannerInfo *root)
+{
+ ListCell *lowlist_item;
+ ListCell *uplist_item;
+
+ /* Always parenthesize the expression. */
+ appendStringInfoChar(buf, '(');
+
+ /* Deparse referenced array expression first. */
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, node->refexpr, root);
+ appendStringInfoChar(buf, ')');
+
+ /* Deparse subscripts expression. */
+ lowlist_item = list_head(node->reflowerindexpr); /* could be NULL */
+ foreach(uplist_item, node->refupperindexpr)
+ {
+ appendStringInfoChar(buf, '[');
+ if (lowlist_item)
+ {
+ deparseExpr(buf, lfirst(lowlist_item), root);
+ appendStringInfoChar(buf, ':');
+ lowlist_item = lnext(lowlist_item);
+ }
+ deparseExpr(buf, lfirst(uplist_item), root);
+ appendStringInfoChar(buf, ']');
+ }
+
+ appendStringInfoChar(buf, ')');
+}
+
+/*
+ * Deparse given ArrayExpr node into buf.
+ */
+static void
+deparseArrayExpr(StringInfo buf,
+ ArrayExpr *node,
+ PlannerInfo *root)
+{
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfo(buf, "ARRAY[");
+ foreach(lc, node->elements)
+ {
+ if (!first)
+ appendStringInfo(buf, ", ");
+ deparseExpr(buf, lfirst(lc), root);
+
+ first = false;
+ }
+ appendStringInfoChar(buf, ']');
+
+ /* If the array is empty, we need explicit cast to the array type. */
+ if (node->elements == NIL)
+ {
+ char *typname;
+
+ typname = format_type_with_typemod(node->array_typeid, -1);
+ appendStringInfo(buf, "::%s", typname);
+ }
+}
+
+/*
+ * Returns true if the given expr is safe to evaluate on the foreign server.
+ * If the result is true, the extra output has_param tells whether the
+ * expression contains any Param node.  This is useful to determine whether
+ * the expression can be used in remote EXPLAIN.
+ */
+static bool
+is_foreign_expr(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Expr *expr,
+ bool *has_param)
+{
+ foreign_executable_cxt context;
+ context.root = root;
+ context.foreignrel = baserel;
+ context.has_param = false;
+
+ /*
+	 * An expression which includes any mutable function can't be pushed down
+	 * because its result is not stable.  For example, pushing now() down to
+	 * the remote side would give confusing results due to clock offset.
+	 * If we get routine mapping infrastructure in a future release, we will
+	 * be able to choose functions to push down at a finer granularity.
+ */
+ if (contain_mutable_functions((Node *) expr))
+ {
+ elog(DEBUG3, "expr has mutable function");
+ return false;
+ }
+
+ /*
+	 * Check that the expression consists only of nodes which are known to be
+	 * safe to push down.
+ */
+ if (foreign_expr_walker((Node *) expr, &context))
+ return false;
+
+ /*
+	 * Tell the caller whether the given expression contains any Param node,
+	 * which can't be used in an EXPLAIN statement before the executor starts.
+ */
+ *has_param = context.has_param;
+
+ return true;
+}
+
+/*
+ * Return true if the expression includes any node which is not known to be
+ * safe to push down.
+ */
+static bool
+foreign_expr_walker(Node *node, foreign_executable_cxt *context)
+{
+ if (node == NULL)
+ return false;
+
+ /*
+	 * Special case handling for List; expression_tree_walker handles List as
+	 * well as other Expr nodes.  For instance, a List is used in RestrictInfo
+	 * for the args of a FuncExpr node.
+	 *
+	 * Although the comments of expression_tree_walker mention that
+	 * RangeTblRef, FromExpr, JoinExpr, and SetOperationStmt are handled as
+	 * well, we don't care about them because they are not used in
+	 * RestrictInfo.  If one of them is passed in, the default label catches
+	 * it and gives up traversing.
+ */
+ if (IsA(node, List))
+ {
+ ListCell *lc;
+
+ foreach(lc, (List *) node)
+ {
+ if (foreign_expr_walker(lfirst(lc), context))
+ return true;
+ }
+ return false;
+ }
+
+ /*
+	 * If the return type of the given expression is not built-in, it can't be
+	 * pushed down because it might have incompatible semantics on the remote
+	 * side.
+ */
+ if (!is_builtin(exprType(node)))
+ {
+ elog(DEBUG3, "expr has user-defined type");
+ return true;
+ }
+
+ switch (nodeTag(node))
+ {
+ case T_Const:
+ /*
+			 * Using anyarray and/or anyenum in a remote query is not supported.
+ */
+ if (((Const *) node)->consttype == ANYARRAYOID ||
+ ((Const *) node)->consttype == ANYNONARRAYOID)
+ {
+ elog(DEBUG3, "expr has anyarray or anyenum");
+ return true;
+ }
+ break;
+ case T_BoolExpr:
+ case T_NullTest:
+ case T_DistinctExpr:
+ case T_RelabelType:
+ /*
+			 * These types of nodes are known to be safe to push down.
+			 * Of course the subtree of the node, if any, is checked
+			 * later, at the tail of this function.
+ */
+ break;
+ /*
+		 * If the function used by the expression is not built-in, it can't be
+		 * pushed down because it might have incompatible semantics on the
+		 * remote side.
+ */
+ case T_FuncExpr:
+ {
+ FuncExpr *fe = (FuncExpr *) node;
+ if (!is_builtin(fe->funcid))
+ {
+ elog(DEBUG3, "expr has user-defined function");
+ return true;
+ }
+ }
+ break;
+ case T_Param:
+ /*
+			 * Only external parameters can be pushed down.
+ */
+ {
+ if (((Param *) node)->paramkind != PARAM_EXTERN)
+ {
+ elog(DEBUG3, "expr has non-external parameter");
+ return true;
+ }
+
+ /* Mark that this expression contains Param node. */
+ context->has_param = true;
+ }
+ break;
+ case T_ScalarArrayOpExpr:
+ /*
+			 * Only built-in operators can be pushed down.  In addition, the
+			 * underlying function must be built-in and immutable, but we
+			 * don't check volatility here; that check has already been done
+			 * by contain_mutable_functions.
+ */
+ {
+ ScalarArrayOpExpr *oe = (ScalarArrayOpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined scalar-array operator");
+ return true;
+ }
+
+ /*
+				 * If the operator takes a collatable type as its operands, we
+				 * push down only "=" and "<>", which are not affected by
+				 * collation.  Other operators might also be collation-safe,
+				 * but these two seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has scalar-array operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_OpExpr:
+ /*
+			 * Only built-in operators can be pushed down.  In addition, the
+			 * underlying function must be built-in and immutable, but we
+			 * don't check volatility here; that check has already been done
+			 * by contain_mutable_functions.
+ */
+ {
+ OpExpr *oe = (OpExpr *) node;
+
+ if (!is_builtin(oe->opno) || !is_builtin(oe->opfuncid))
+ {
+ elog(DEBUG3, "expr has user-defined operator");
+ return true;
+ }
+
+ /*
+				 * If the operator takes a collatable type as its operands, we
+				 * push down only "=" and "<>", which are not affected by
+				 * collation.  Other operators might also be collation-safe,
+				 * but these two seem enough to cover practical use cases.
+ */
+ if (exprInputCollation(node) != InvalidOid)
+ {
+ char *opname = get_opname(oe->opno);
+
+ if (strcmp(opname, "=") != 0 && strcmp(opname, "<>") != 0)
+ {
+ elog(DEBUG3, "expr has operator which takes collatable as operand");
+ return true;
+ }
+ }
+
+ /* operands are checked later */
+ }
+ break;
+ case T_Var:
+ /*
+			 * A Var can be pushed down if it belongs to the foreign table.
+			 * XXX Can a Var of another relation appear here?
+ */
+ {
+ Var *var = (Var *) node;
+ foreign_executable_cxt *f_context;
+
+ f_context = (foreign_executable_cxt *) context;
+ if (var->varno != f_context->foreignrel->relid ||
+ var->varlevelsup != 0)
+ {
+ elog(DEBUG3, "expr has var of other relation");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayRef:
+ /*
+			 * An ArrayRef whose elements are of a non-built-in type can't be
+			 * pushed down.
+ */
+ {
+				ArrayRef   *ar = (ArrayRef *) node;
+
+ if (!is_builtin(ar->refelemtype))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+
+ /* Assignment should not be in restrictions. */
+ if (ar->refassgnexpr != NULL)
+ {
+ elog(DEBUG3, "expr has assignment");
+ return true;
+ }
+ }
+ break;
+ case T_ArrayExpr:
+ /*
+			 * An ArrayExpr whose elements are of a non-built-in type can't be
+			 * pushed down.
+ */
+ {
+ if (!is_builtin(((ArrayExpr *) node)->element_typeid))
+ {
+ elog(DEBUG3, "expr has user-defined type as array element");
+ return true;
+ }
+ }
+ break;
+ default:
+ {
+ elog(DEBUG3, "expression is too complex: %s",
+ nodeToString(node));
+ return true;
+ }
+ break;
+ }
+
+ return expression_tree_walker(node, foreign_expr_walker, context);
+}
+
+/*
+ * Return true if the given object is a built-in object.
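+ *
+ * Objects with OIDs below FirstNormalObjectId are created during initdb and
+ * are assumed to exist, with the same definition, on the remote server as
+ * well.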
+ */
+static bool
+is_builtin(Oid oid)
+{
+ return (oid < FirstNormalObjectId);
+}
+
+/*
+ * Deparse conditions from the given list of RestrictInfos and append them to
+ * buf as a WHERE clause.  If is_first is false, we assume that buf already
+ * holds a SQL statement which ends with a valid WHERE clause, and the new
+ * conditions are appended with AND.
+ *
+ * is_first should be true only on the first call for a given statement.
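+ *
+ * For example, two conditions passed on the first call produce a fragment of
+ * the form " WHERE (cond1) AND (cond2)".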
+ */
+void
+appendWhereClause(StringInfo buf,
+ bool is_first,
+ List *exprs,
+ PlannerInfo *root)
+{
+ bool first = true;
+ ListCell *lc;
+
+ foreach(lc, exprs)
+ {
+ RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+
+ /* Connect expressions with "AND" and parenthesize whole condition. */
+ if (is_first && first)
+ appendStringInfo(buf, " WHERE ");
+ else
+ appendStringInfo(buf, " AND ");
+
+ appendStringInfoChar(buf, '(');
+ deparseExpr(buf, ri->clause, root);
+ appendStringInfoChar(buf, ')');
+
+ first = false;
+ }
+}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
new file mode 100644
index 0000000..f81c727
--- /dev/null
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -0,0 +1,761 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+-- Clean up in case a prior regression run failed
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+DROP ROLE IF EXISTS postgres_fdw_user;
+RESET client_min_messages;
+CREATE ROLE postgres_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgres_fdw_user';
+CREATE EXTENSION postgres_fdw;
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgres_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+ OPTIONS (dbname 'contrib_regression');
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgres_fdw_user SERVER loopback2;
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+-- ===================================================================
+-- tests for validator
+-- ===================================================================
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure option
+ALTER SERVER loopback1 OPTIONS (
+ use_remote_explain 'false',
+ fdw_startup_cost '123.456',
+ fdw_tuple_cost '0.123',
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+ List of foreign-data wrappers
+ Name | Owner | Handler | Validator | Access privileges | FDW Options | Description
+--------------+-------------------+----------------------+------------------------+-------------------+-------------+-------------
+ postgres_fdw | postgres_fdw_user | postgres_fdw_handler | postgres_fdw_validator | | |
+(1 row)
+
+\des+
+ List of foreign servers
+ Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW Options | Description
+-----------+-------------------+----------------------+-------------------+------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
+ loopback1 | postgres_fdw_user | postgres_fdw | | | | (use_remote_explain 'false', fdw_startup_cost '123.456', fdw_tuple_cost '0.123', authtype 'value', service 'value', connect_timeout 'value', dbname 'value', host 'value', hostaddr 'value', port 'value', tty 'value', options 'value', application_name 'value', keepalives 'value', keepalives_idle 'value', keepalives_interval 'value', sslcompression 'value', sslmode 'value', sslcert 'value', sslkey 'value', sslrootcert 'value', sslcrl 'value') |
+ loopback2 | postgres_fdw_user | postgres_fdw | | | | (dbname 'contrib_regression') |
+(2 rows)
+
+\deu+
+ List of user mappings
+ Server | User name | FDW Options
+-----------+-------------------+-------------
+ loopback1 | public |
+ loopback2 | postgres_fdw_user |
+(2 rows)
+
+\det+
+ List of foreign tables
+ Schema | Table | Server | FDW Options | Description
+--------+-------+-----------+--------------------------------+-------------
+ public | ft1 | loopback2 | (nspname 'S 1', relname 'T 1') |
+ public | ft2 | loopback2 | (nspname 'S 1', relname 'T 1') |
+(2 rows)
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+---------------------------------
+ Limit
+ -> Sort
+ Sort Key: c3, c1
+ -> Foreign Scan on ft1
+(4 rows)
+
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ -> Sort
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Sort Key: t1.c3, t1.c1
+ -> Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(8 rows)
+
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 102 | 2 | 00102 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 103 | 3 | 00103 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 104 | 4 | 00104 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 105 | 5 | 00105 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 106 | 6 | 00106 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 107 | 7 | 00107 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 108 | 8 | 00108 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 109 | 9 | 00109 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 110 | 0 | 00110 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- empty result
+SELECT * FROM ft1 WHERE false;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+----+----+----+----+----+----
+(0 rows)
+
+-- with WHERE clause
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Filter: (t1.c7 >= '1'::bpchar)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 101)) AND (((c6)::text OPERATOR(pg_catalog.=) '1'::text))
+(4 rows)
+
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+-----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+ count
+-------
+ 1000
+(1 row)
+
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1
+-----
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST | Mon Jan 05 00:00:00 1970 | 4 | 4 | foo
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST | Tue Jan 06 00:00:00 1970 | 5 | 5 | foo
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6 | 6 | foo
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST | Thu Jan 08 00:00:00 1970 | 7 | 7 | foo
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST | Fri Jan 09 00:00:00 1970 | 8 | 8 | foo
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | 9 | foo
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST | Sun Jan 11 00:00:00 1970 | 0 | 0 | foo
+(10 rows)
+
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+------+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1000 | 0 | 01000 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo
+(1 row)
+
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+ c1 | c2 | c3 | c4
+----+----+-------+------------------------------
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST
+ 4 | 4 | 00004 | Mon Jan 05 00:00:00 1970 PST
+ 5 | 5 | 00005 | Tue Jan 06 00:00:00 1970 PST
+ 6 | 6 | 00006 | Wed Jan 07 00:00:00 1970 PST
+ 7 | 7 | 00007 | Thu Jan 08 00:00:00 1970 PST
+ 8 | 8 | 00008 | Fri Jan 09 00:00:00 1970 PST
+ 9 | 9 | 00009 | Sat Jan 10 00:00:00 1970 PST
+ 10 | 0 | 00010 | Sun Jan 11 00:00:00 1970 PST
+(10 rows)
+
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+ ?column? | ?column?
+----------+----------
+ fixed |
+(1 row)
+
+-- user-defined operator/function
+CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_abs(t1.c2);
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Filter: (t1.c1 = postgres_fdw_abs(t1.c2))
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(4 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Filter: (t1.c1 === t1.c2)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(4 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) pg_catalog.abs(c2)))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) c2))
+(3 rows)
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 100)) AND ((c2 OPERATOR(pg_catalog.=) 0))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((pg_catalog.round(pg_catalog."numeric"(pg_catalog.abs("C 1")), 0) OPERATOR(pg_catalog.=) 1::numeric))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) (OPERATOR(pg_catalog.-) "C 1")))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((1::numeric OPERATOR(pg_catalog.=) (pg_catalog.int8("C 1") OPERATOR(pg_catalog.!))))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL) IS DISTINCT FROM ("C 1" IS NOT NULL))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ANY (ARRAY[c2, 1, ("C 1" OPERATOR(pg_catalog.+) 0)])))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) ((ARRAY["C 1", c2, 3])[1])))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+ QUERY PLAN
+-------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Filter: (t1.c8 = 'foo'::user_enum)
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(4 rows)
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.c3, t2.c3
+ -> Foreign Scan on public.ft1 t1
+ Output: t1.c3
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+ -> Foreign Scan on public.ft2 t2
+ Output: t2.c3
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 2))
+(8 rows)
+
+EXECUTE st1(1, 1);
+ c3 | c3
+-------+-------
+ 00001 | 00001
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on public.ft1 t1
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ Output: t2.c3
+ -> Foreign Scan on public.ft2 t2
+ Output: t2.c3
+ Filter: (date_part('dow'::text, t2.c4) = 6::double precision)
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10))
+(15 rows)
+
+EXECUTE st2(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st1(101, 101);
+ c3 | c3
+-------+-------
+ 00101 | 00101
+(1 row)
+
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Sort Key: t1.c1
+ -> Nested Loop Semi Join
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Join Filter: (t1.c3 = t2.c3)
+ -> Foreign Scan on public.ft1 t1
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.<) 20))
+ -> Materialize
+ Output: t2.c3
+ -> Foreign Scan on public.ft2 t2
+ Output: t2.c3
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.>) 10)) AND ((pg_catalog.date_part('dow'::text, c5) OPERATOR(pg_catalog.=) 6::double precision))
+(14 rows)
+
+EXECUTE st3(10, 20);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 16 | 6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6 | 6 | foo
+(1 row)
+
+EXECUTE st3(20, 30);
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 23 | 3 | 00023 | Sat Jan 24 00:00:00 1970 PST | Sat Jan 24 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) 1))
+(3 rows)
+
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on public.ft1 t1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" OPERATOR(pg_catalog.=) $1))
+(3 rows)
+
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+ f_test
+--------
+ 100
+(1 row)
+
+DROP FUNCTION f_test(int);
+-- ===================================================================
+-- cost estimation options
+-- ===================================================================
+ALTER SERVER loopback1 OPTIONS (SET use_remote_explain 'true');
+ALTER SERVER loopback1 OPTIONS (SET fdw_startup_cost '0');
+ALTER SERVER loopback1 OPTIONS (SET fdw_tuple_cost '0');
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ -> Sort
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Sort Key: ft1.c3, ft1.c1
+ -> Foreign Scan on public.ft1
+ Output: c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(8 rows)
+
+ALTER SERVER loopback1 OPTIONS (DROP use_remote_explain);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_startup_cost);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_tuple_cost);
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgres_fdw_connections;
+ srvname | usename
+-----------+-------------------
+ loopback2 | postgres_fdw_user
+(1 row)
+
+SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_get_connections();
+ postgres_fdw_disconnect
+-------------------------
+ OK
+(1 row)
+
+SELECT srvname, usename FROM postgres_fdw_connections;
+ srvname | usename
+---------+---------
+(0 rows)
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ERROR: invalid input syntax for integer: "foo"
+CONTEXT: column c8 of foreign table ft1
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 2 | 2 | 00002 | Sat Jan 03 00:00:00 1970 PST | Sat Jan 03 00:00:00 1970 | 2 | 2 | foo
+(1 row)
+
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ERROR: could not execute remote query
+DETAIL: ERROR: division by zero
+
+HINT: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (((1 OPERATOR(pg_catalog./) ("C 1" OPERATOR(pg_catalog.-) 1)) OPERATOR(pg_catalog.>) 0))
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+FETCH c;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 3 | 3 | 00003 | Sun Jan 04 00:00:00 1970 PST | Sun Jan 04 00:00:00 1970 | 3 | 3 | foo
+(1 row)
+
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+ c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
+----+----+-------+------------------------------+--------------------------+----+------------+-----
+ 1 | 1 | 00001 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
+(1 row)
+
+COMMIT;
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+-----------
+ loopback2
+(1 row)
+
+ERROR OUT; -- ERROR
+ERROR: syntax error at or near "ERROR"
+LINE 1: ERROR OUT;
+ ^
+SELECT srvname FROM postgres_fdw_connections;
+ srvname
+---------
+(0 rows)
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgres_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table "S 1"."T 1"
+drop cascades to table "S 1"."T 2"
+DROP TYPE user_enum CASCADE;
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to foreign table ft2 column c8
+drop cascades to foreign table ft1 column c8
+DROP EXTENSION postgres_fdw CASCADE;
+NOTICE: drop cascades to 6 other objects
+DETAIL: drop cascades to server loopback1
+drop cascades to user mapping for public
+drop cascades to server loopback2
+drop cascades to user mapping for postgres_fdw_user
+drop cascades to foreign table ft1
+drop cascades to foreign table ft2
+\c
+DROP ROLE postgres_fdw_user;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
new file mode 100644
index 0000000..3c127dc
--- /dev/null
+++ b/contrib/postgres_fdw/option.c
@@ -0,0 +1,291 @@
+/*-------------------------------------------------------------------------
+ *
+ * option.c
+ * FDW option handling
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/option.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "libpq-fe.h"
+
+#include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_user_mapping.h"
+#include "commands/defrem.h"
+#include "fmgr.h"
+#include "foreign/foreign.h"
+#include "lib/stringinfo.h"
+#include "miscadmin.h"
+#include "utils/memutils.h"
+
+#include "postgres_fdw.h"
+
+/*
+ * SQL functions
+ */
+extern Datum postgres_fdw_validator(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_validator);
+
+/*
+ * Describes the valid options for objects that this wrapper uses.
+ */
+typedef struct PostgresFdwOption
+{
+ const char *keyword;
+ Oid optcontext; /* Oid of catalog in which options may appear */
+ bool is_libpq_opt; /* true if it's used in libpq */
+} PostgresFdwOption;
+
+/*
+ * Valid options for postgres_fdw.
+ * Allocated and filled in InitPostgresFdwOptions.
+ */
+static PostgresFdwOption *postgres_fdw_options;
+
+/*
+ * Valid options of libpq.
+ * Allocated and filled in InitPostgresFdwOptions.
+ */
+static PQconninfoOption *libpq_options;
+
+/*
+ * Helper functions
+ */
+static bool is_valid_option(const char *keyword, Oid context);
+
+/*
+ * Validate the generic options given to a FOREIGN DATA WRAPPER, SERVER,
+ * USER MAPPING or FOREIGN TABLE that uses postgres_fdw.
+ *
+ * Raise an ERROR if the option or its value is considered invalid.
+ */
+Datum
+postgres_fdw_validator(PG_FUNCTION_ARGS)
+{
+ List *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
+ Oid catalog = PG_GETARG_OID(1);
+ ListCell *cell;
+
+ /*
+ * Check that only options supported by postgres_fdw, and allowed for the
+ * current object type, are given.
+ */
+ foreach(cell, options_list)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (!is_valid_option(def->defname, catalog))
+ {
+ PostgresFdwOption *opt;
+ StringInfoData buf;
+
+ /*
+ * Unknown option specified, complain about it. Provide a hint
+ * with list of valid options for the object.
+ */
+ initStringInfo(&buf);
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (catalog == opt->optcontext)
+ appendStringInfo(&buf, "%s%s", (buf.len > 0) ? ", " : "",
+ opt->keyword);
+ }
+
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
+ errmsg("invalid option \"%s\"", def->defname),
+ errhint("Valid options in this context are: %s",
+ buf.data)));
+ }
+
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ /* use_remote_explain accepts only boolean values */
+ (void) defGetBoolean(def);
+ }
+ else if (strcmp(def->defname, "fdw_startup_cost") == 0)
+ {
+ double val;
+ char *endp;
+ val = strtod(defGetString(def), &endp);
+ if (*endp || val < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("fdw_startup_cost requires positive numeric value or zero")));
+ }
+ else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
+ {
+ double val;
+ char *endp;
+ val = strtod(defGetString(def), &endp);
+ if (*endp || val < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("fdw_tuple_cost requires positive numeric value or zero")));
+ }
+ }
+
+ /*
+	 * We don't check other option-specific limitations here; they will be
+	 * validated at execution time.
+ */
+
+ PG_RETURN_VOID();
+}
+
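+/*
+ * Example (illustrative only) of the postgres_fdw-specific options this
+ * validator accepts, as exercised in the regression test:
+ *
+ *   ALTER SERVER loopback1 OPTIONS (use_remote_explain 'false',
+ *                                   fdw_startup_cost '123.456',
+ *                                   fdw_tuple_cost '0.123');
+ *   ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ *   ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ */
+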
+/*
+ * Initialize the option check mechanism. This must be called before any
+ * other function in option.c is used, so _PG_init is the proper place.
+ */
+void
+InitPostgresFdwOptions(void)
+{
+ int libpq_opt_num;
+ PQconninfoOption *lopt;
+ PostgresFdwOption *popt;
+	/* non-libpq, FDW-specific options */
+ static const PostgresFdwOption non_libpq_options[] = {
+ { "nspname", ForeignTableRelationId, false} ,
+ { "relname", ForeignTableRelationId, false} ,
+ { "colname", AttributeRelationId, false} ,
+ /* use_remote_explain is available on both server and table */
+ { "use_remote_explain", ForeignServerRelationId, false} ,
+ { "use_remote_explain", ForeignTableRelationId, false} ,
+ /* cost factors */
+ { "fdw_startup_cost", ForeignServerRelationId, false} ,
+ { "fdw_tuple_cost", ForeignServerRelationId, false} ,
+ { NULL, InvalidOid, false },
+ };
+
+ /* Prevent redundant initialization. */
+ if (postgres_fdw_options)
+ return;
+
+ /*
+ * Get list of valid libpq options.
+ *
+ * To avoid unnecessary work, we get the list once and use it throughout
+ * the lifetime of this backend process. We don't need to care about
+ * memory context issues, because PQconndefaults allocates with malloc.
+ */
+ libpq_options = PQconndefaults();
+ if (!libpq_options) /* assume reason for failure is OOM */
+ ereport(ERROR,
+ (errcode(ERRCODE_FDW_OUT_OF_MEMORY),
+ errmsg("out of memory"),
+ errdetail("could not get libpq's default connection options")));
+
+	/* Count how many libpq options are available. */
+ libpq_opt_num = 0;
+ for (lopt = libpq_options; lopt->keyword; lopt++)
+ libpq_opt_num++;
+
+ /*
+ * Construct an array which consists of all valid options for postgres_fdw,
+ * by appending FDW-specific options to libpq options.
+ *
+ * We use plain malloc here to allocate postgres_fdw_options because it
+ * lives as long as the backend process does. Besides, keeping
+ * libpq_options in memory allows us to avoid copying every keyword string.
+ */
+ postgres_fdw_options = (PostgresFdwOption *)
+ malloc(sizeof(PostgresFdwOption) * libpq_opt_num +
+ sizeof(non_libpq_options));
+ if (postgres_fdw_options == NULL)
+ elog(ERROR, "out of memory");
+ popt = postgres_fdw_options;
+ for (lopt = libpq_options; lopt->keyword; lopt++)
+ {
+ /* Disallow some debug options. */
+ if (strcmp(lopt->keyword, "replication") == 0 ||
+ strcmp(lopt->keyword, "fallback_application_name") == 0 ||
+ strcmp(lopt->keyword, "client_encoding") == 0)
+ continue;
+
+ /* We don't have to copy keyword string, as described above. */
+ popt->keyword = lopt->keyword;
+
+		/* "user" and any secret options are allowed only on user mappings. */
+ if (strcmp(lopt->keyword, "user") == 0 || strchr(lopt->dispchar, '*'))
+ popt->optcontext = UserMappingRelationId;
+ else
+ popt->optcontext = ForeignServerRelationId;
+ popt->is_libpq_opt = true;
+
+ /* Advance the position where next option will be placed. */
+ popt++;
+ }
+
+ /* Append FDW-specific options. */
+ memcpy(popt, non_libpq_options, sizeof(non_libpq_options));
+}
+
+/*
+ * Check whether the given option is one of the valid postgres_fdw options.
+ * context is the Oid of the catalog holding the object the option is for.
+ */
+static bool
+is_valid_option(const char *keyword, Oid context)
+{
+ PostgresFdwOption *opt;
+
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (context == opt->optcontext && strcmp(opt->keyword, keyword) == 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Check whether the given option is one of the valid libpq options.
+ * Unlike is_valid_option, this does not consider the catalog context.
+ */
+static bool
+is_libpq_option(const char *keyword)
+{
+ PostgresFdwOption *opt;
+
+ for (opt = postgres_fdw_options; opt->keyword; opt++)
+ {
+ if (opt->is_libpq_opt && strcmp(opt->keyword, keyword) == 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Generate keyword/value arrays that include only the libpq options from the
+ * given list, which may contain any kind of options.
+ */
+int
+ExtractConnectionOptions(List *defelems, const char **keywords,
+ const char **values)
+{
+ ListCell *lc;
+ int i;
+
+ i = 0;
+ foreach(lc, defelems)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (is_libpq_option(d->defname))
+ {
+ keywords[i] = d->defname;
+ values[i] = defGetString(d);
+ i++;
+ }
+ }
+ return i;
+}
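+
+/*
+ * Usage sketch (illustrative only; the real caller lives in connection.c,
+ * which is not shown here): a connection routine could combine server and
+ * user mapping options into libpq parameter arrays like this:
+ *
+ *   n = list_length(server->options) + list_length(user->options);
+ *   keywords = palloc((n + 1) * sizeof(char *));
+ *   values = palloc((n + 1) * sizeof(char *));
+ *   n = ExtractConnectionOptions(server->options, keywords, values);
+ *   n += ExtractConnectionOptions(user->options, keywords + n, values + n);
+ *   keywords[n] = values[n] = NULL;
+ *   conn = PQconnectdbParams(keywords, values, 0);
+ */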
+
diff --git a/contrib/postgres_fdw/postgres_fdw--1.0.sql b/contrib/postgres_fdw/postgres_fdw--1.0.sql
new file mode 100644
index 0000000..56b39b9
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw--1.0.sql
@@ -0,0 +1,39 @@
+/* contrib/postgres_fdw/postgres_fdw--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION postgres_fdw" to load this file. \quit
+
+CREATE FUNCTION postgres_fdw_handler()
+RETURNS fdw_handler
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgres_fdw_validator(text[], oid)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FOREIGN DATA WRAPPER postgres_fdw
+ HANDLER postgres_fdw_handler
+ VALIDATOR postgres_fdw_validator;
+
+/* connection management functions and view */
+CREATE FUNCTION postgres_fdw_get_connections(out srvid oid, out usesysid oid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION postgres_fdw_disconnect(oid, oid)
+RETURNS text
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
+CREATE VIEW postgres_fdw_connections AS
+SELECT c.srvid srvid,
+ s.srvname srvname,
+ c.usesysid usesysid,
+ pg_get_userbyid(c.usesysid) usename
+ FROM postgres_fdw_get_connections() c
+ JOIN pg_catalog.pg_foreign_server s ON (s.oid = c.srvid);
+GRANT SELECT ON postgres_fdw_connections TO public;
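+
+/*
+ * Usage sketch (illustrative only): the view and functions above can be used
+ * to inspect and drop cached connections, e.g.:
+ *
+ *   -- list connections kept by this backend
+ *   SELECT srvname, usename FROM postgres_fdw_connections;
+ *
+ *   -- discard all of them
+ *   SELECT postgres_fdw_disconnect(srvid, usesysid)
+ *     FROM postgres_fdw_get_connections();
+ */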
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
new file mode 100644
index 0000000..6b870ab
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -0,0 +1,1431 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgres_fdw.c
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/postgres_fdw.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "fmgr.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_foreign_server.h"
+#include "catalog/pg_foreign_table.h"
+#include "catalog/pg_type.h"
+#include "commands/defrem.h"
+#include "commands/explain.h"
+#include "commands/vacuum.h"
+#include "foreign/fdwapi.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "optimizer/cost.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
+
+#include "postgres_fdw.h"
+#include "connection.h"
+
+PG_MODULE_MAGIC;
+
+/* Default cost to establish a connection. */
+#define DEFAULT_FDW_STARTUP_COST 100.0
+
+/* Default cost to process one row, including data transfer. */
+#define DEFAULT_FDW_TUPLE_COST 0.001
+
+/*
+ * FDW-specific information for RelOptInfo.fdw_private. This is used to pass
+ * information from postgresGetForeignRelSize to the subsequent planning
+ * callbacks postgresGetForeignPaths and postgresGetForeignPlan.
+ */
+typedef struct PostgresFdwPlanState {
+ /*
+ * These are generated in GetForeignRelSize, and also used in subsequent
+ * GetForeignPaths.
+ */
+ StringInfoData sql;
+ Cost startup_cost;
+ Cost total_cost;
+ List *remote_conds;
+ List *param_conds;
+ List *local_conds;
+ int width; /* obtained by remote EXPLAIN */
+
+ /* Cached catalog information. */
+ ForeignTable *table;
+ ForeignServer *server;
+} PostgresFdwPlanState;
+
+/*
+ * Index of FDW-private information stored in fdw_private list.
+ *
+ * We store various information in ForeignScan.fdw_private to pass it across
+ * the boundary between planner and executor. The fdw_private list holds the
+ * items below:
+ *
+ * 1) plain SELECT statement
+ *
+ * These items are indexed with the enum FdwPrivateIndex, so an item
+ * can be accessed directly via list_nth(). For example, the SELECT statement
+ * can be obtained as:
+ * sql = list_nth(fdw_private, FdwPrivateSelectSql)
+ */
+enum FdwPrivateIndex {
+ /* SQL statements */
+ FdwPrivateSelectSql,
+
+ /* # of elements stored in the list fdw_private */
+ FdwPrivateNum,
+};
+
+/*
+ * Describe the attribute where data conversion fails.
+ */
+typedef struct ErrorPos {
+ Oid relid; /* oid of the foreign table */
+ AttrNumber cur_attno; /* attribute number under process */
+} ErrorPos;
+
+/*
+ * Describes an execution state of a foreign scan against a foreign table
+ * using postgres_fdw.
+ */
+typedef struct PostgresFdwExecutionState
+{
+ List *fdw_private; /* FDW-private information */
+
+ /* for remote query execution */
+ PGconn *conn; /* connection for the scan */
+ Oid *param_types; /* type array of external parameter */
+ const char **param_values; /* value array of external parameter */
+
+ /* for tuple generation. */
+ AttrNumber attnum; /* # of non-dropped attribute */
+ Datum *values; /* column value buffer */
+ bool *nulls; /* column null indicator buffer */
+ AttInMetadata *attinmeta; /* attribute metadata */
+
+ /* for storing result tuples */
+ MemoryContext scan_cxt; /* context for per-scan lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+ Tuplestorestate *tuples; /* result of the scan */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PostgresFdwExecutionState;
+
+/*
+ * Describes a state of analyze request for a foreign table.
+ */
+typedef struct PostgresAnalyzeState
+{
+ /* for tuple generation. */
+ TupleDesc tupdesc;
+ AttInMetadata *attinmeta;
+ Datum *values;
+ bool *nulls;
+
+ /* for random sampling */
+ HeapTuple *rows; /* result buffer */
+ int targrows; /* target # of sample rows */
+ int numrows; /* # of samples collected */
+ double samplerows; /* # of rows fetched */
+ double rowstoskip; /* # of rows skipped before next sample */
+ double rstate; /* random state */
+
+ /* for storing result tuples */
+ MemoryContext anl_cxt; /* context for per-analyze lifespan data */
+ MemoryContext temp_cxt; /* context for per-tuple temporary data */
+
+ /* for error handling. */
+ ErrorPos errpos;
+} PostgresAnalyzeState;
+
+/*
+ * SQL functions
+ */
+extern Datum postgres_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(postgres_fdw_handler);
+
+/*
+ * FDW callback routines
+ */
+static void postgresGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static void postgresGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+static ForeignScan *postgresGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses);
+static void postgresExplainForeignScan(ForeignScanState *node,
+ ExplainState *es);
+static void postgresBeginForeignScan(ForeignScanState *node, int eflags);
+static TupleTableSlot *postgresIterateForeignScan(ForeignScanState *node);
+static void postgresReScanForeignScan(ForeignScanState *node);
+static void postgresEndForeignScan(ForeignScanState *node);
+static bool postgresAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages);
+
+/*
+ * Helper functions
+ */
+static void get_remote_estimate(const char *sql,
+ PGconn *conn,
+ double *rows,
+ int *width,
+ Cost *startup_cost,
+ Cost *total_cost);
+static void execute_query(ForeignScanState *node);
+static void query_row_processor(PGresult *res, ForeignScanState *node,
+ bool first);
+static void analyze_row_processor(PGresult *res, PostgresAnalyzeState *astate,
+ bool first);
+static void postgres_fdw_error_callback(void *arg);
+static int postgresAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows);
+
+/* Exported functions, but not written in postgres_fdw.h. */
+void _PG_init(void);
+void _PG_fini(void);
+
+/*
+ * Module-specific initialization.
+ */
+void
+_PG_init(void)
+{
+ InitPostgresFdwOptions();
+}
+
+/*
+ * Module-specific clean up.
+ */
+void
+_PG_fini(void)
+{
+}
+
+/*
+ * Foreign-data wrapper handler function: return a struct with pointers
+ * to my callback routines.
+ */
+Datum
+postgres_fdw_handler(PG_FUNCTION_ARGS)
+{
+ FdwRoutine *routine = makeNode(FdwRoutine);
+
+ /* Required handler functions. */
+ routine->GetForeignRelSize = postgresGetForeignRelSize;
+ routine->GetForeignPaths = postgresGetForeignPaths;
+ routine->GetForeignPlan = postgresGetForeignPlan;
+ routine->ExplainForeignScan = postgresExplainForeignScan;
+ routine->BeginForeignScan = postgresBeginForeignScan;
+ routine->IterateForeignScan = postgresIterateForeignScan;
+ routine->ReScanForeignScan = postgresReScanForeignScan;
+ routine->EndForeignScan = postgresEndForeignScan;
+
+ /* Optional handler functions. */
+ routine->AnalyzeForeignTable = postgresAnalyzeForeignTable;
+
+ PG_RETURN_POINTER(routine);
+}
+
+/*
+ * postgresGetForeignRelSize
+ * Estimate # of rows and width of the result of the scan
+ *
+ * Here we estimate the number of rows returned by the scan in two steps. In
+ * the first step, we execute a remote EXPLAIN command to obtain the number of
+ * rows returned from the remote side. In the second step, we calculate the
+ * selectivity of the filtering done on the local side, and adjust the first
+ * estimate accordingly.
+ *
+ * We have to fetch some catalog objects and generate the remote query string
+ * here, so we store such expensive information in the FDW-private area of
+ * RelOptInfo and pass it to subsequent functions for reuse.
+ */
+static void
+postgresGetForeignRelSize(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ bool use_remote_explain = false;
+ ListCell *lc;
+ PostgresFdwPlanState *fpstate;
+ StringInfo sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ Selectivity sel;
+ double rows;
+ int width;
+	Cost		startup_cost = 0.0;	/* set only when remote EXPLAIN is used */
+	Cost		total_cost = 0.0;	/* set only when remote EXPLAIN is used */
+ List *remote_conds = NIL;
+ List *param_conds = NIL;
+ List *local_conds = NIL;
+
+ /*
+ * We use PostgresFdwPlanState to pass various information to subsequent
+ * functions.
+ */
+ fpstate = palloc0(sizeof(PostgresFdwPlanState));
+ initStringInfo(&fpstate->sql);
+ sql = &fpstate->sql;
+
+ /*
+ * Determine whether we use remote estimate or not. Note that per-table
+ * setting overrides per-server setting.
+ */
+ table = GetForeignTable(foreigntableid);
+ server = GetForeignServer(table->serverid);
+ foreach (lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ use_remote_explain = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach (lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+ if (strcmp(def->defname, "use_remote_explain") == 0)
+ {
+ use_remote_explain = defGetBoolean(def);
+ break;
+ }
+ }
+
+ /*
+	 * Construct the remote query, which consists of SELECT, FROM, and WHERE
+	 * clauses. Conditions which contain any Param node are excluded because
+	 * placeholders can't be used in an EXPLAIN statement. Such conditions are
+	 * appended later.
+ */
+ classifyConditions(root, baserel, &remote_conds, ¶m_conds,
+ &local_conds);
+ deparseSimpleSql(sql, root, baserel, local_conds);
+ if (list_length(remote_conds) > 0)
+ appendWhereClause(sql, true, remote_conds, root);
+ elog(DEBUG3, "Query SQL: %s", sql->data);
+
+ /*
+	 * If the table or the server is configured to use remote EXPLAIN, connect
+	 * to the foreign server and execute EXPLAIN with the conditions which
+	 * don't contain any parameter reference. Otherwise, estimate rows in a
+	 * way similar to ordinary tables.
+ */
+ if (use_remote_explain)
+ {
+ UserMapping *user;
+ PGconn *conn;
+
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, false);
+ get_remote_estimate(sql->data, conn, &rows, &width,
+ &startup_cost, &total_cost);
+ ReleaseConnection(conn);
+
+ /*
+		 * Estimate the selectivity of conditions which were not used in the
+		 * remote EXPLAIN by calling clauselist_selectivity(). The best we can
+		 * do for parameterized conditions is to estimate selectivity on the
+		 * basis of local statistics. When we actually obtain result rows,
+		 * such conditions are deparsed into the remote query and reduce the
+		 * number of rows transferred.
+ */
+ sel = 1;
+ sel *= clauselist_selectivity(root, param_conds,
+ baserel->relid, JOIN_INNER, NULL);
+ sel *= clauselist_selectivity(root, local_conds,
+ baserel->relid, JOIN_INNER, NULL);
+
+ /* Report estimated numbers to planner. */
+ baserel->rows = rows * sel;
+ }
+ else
+ {
+ /*
+ * Estimate rows from the result of the last ANALYZE, and all
+ * conditions specified in original query.
+ */
+ set_baserel_size_estimates(root, baserel);
+
+		/* Save the estimated width to pass it to subsequent functions */
+ width = baserel->width;
+ }
+
+ /*
+	 * Finish deparsing the remote query by adding conditions which were not
+	 * usable in the remote EXPLAIN because they contain parameter references.
+ */
+ if (list_length(param_conds) > 0)
+ appendWhereClause(sql, !(list_length(remote_conds) > 0), param_conds,
+ root);
+
+ /*
+	 * Pack the obtained information into an object and store it in the
+	 * FDW-private area of RelOptInfo to pass it to subsequent functions.
+ */
+ fpstate->startup_cost = startup_cost;
+ fpstate->total_cost = total_cost;
+ fpstate->remote_conds = remote_conds;
+ fpstate->param_conds = param_conds;
+ fpstate->local_conds = local_conds;
+ fpstate->width = width;
+ fpstate->table = table;
+ fpstate->server = server;
+ baserel->fdw_private = (void *) fpstate;
+}
+
+/*
+ * postgresGetForeignPaths
+ * Create possible scan paths for a scan on the foreign table
+ */
+static void
+postgresGetForeignPaths(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid)
+{
+ PostgresFdwPlanState *fpstate;
+ ForeignPath *path;
+ ListCell *lc;
+ double fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
+ double fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
+ Cost startup_cost;
+ Cost total_cost;
+ List *fdw_private;
+
+ /* Cache frequently accessed value */
+ fpstate = (PostgresFdwPlanState *) baserel->fdw_private;
+
+ /*
+	 * We have cost values which were estimated on the remote side, so adjust
+	 * them to account for the additional work needed to complete the scan,
+	 * such as sending the query, transferring the result, and local filtering.
+ */
+ startup_cost = fpstate->startup_cost;
+ total_cost = fpstate->total_cost;
+
+ /*
+ * Adjust costs with factors of the corresponding foreign server:
+ * - add cost to establish connection to both startup and total
+ * - add cost to manipulate on remote, and transfer result to total
+ * - add cost to manipulate tuples on local side to total
+ */
+ foreach(lc, fpstate->server->options)
+ {
+ DefElem *d = (DefElem *) lfirst(lc);
+ if (strcmp(d->defname, "fdw_startup_cost") == 0)
+ fdw_startup_cost = strtod(defGetString(d), NULL);
+ else if (strcmp(d->defname, "fdw_tuple_cost") == 0)
+ fdw_tuple_cost = strtod(defGetString(d), NULL);
+ }
+ startup_cost += fdw_startup_cost;
+ total_cost += fdw_startup_cost;
+ total_cost += fdw_tuple_cost * baserel->rows;
+ total_cost += cpu_tuple_cost * baserel->rows;
+
+ /* Pass SQL statement from planner to executor through FDW private area. */
+ fdw_private = list_make1(makeString(fpstate->sql.data));
+
+ /*
+	 * Create the simplest ForeignScan path node and add it to baserel. This
+	 * path corresponds to the SeqScan path of a regular table.
+ */
+ path = create_foreignscan_path(root, baserel,
+ baserel->rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no outer rel either */
+ fdw_private);
+ add_path(baserel, (Path *) path);
+
+ /*
+ * XXX We can consider sorted path or parameterized path here if we know
+ * that foreign table is indexed on remote end. For this purpose, we
+ * might have to support FOREIGN INDEX to represent possible sets of sort
+ * keys and/or filtering.
+ */
+}
+
+/*
+ * postgresGetForeignPlan
+ * Create ForeignScan plan node which implements selected best path
+ */
+static ForeignScan *
+postgresGetForeignPlan(PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses)
+{
+ PostgresFdwPlanState *fpstate;
+ Index scan_relid = baserel->relid;
+ List *fdw_private = NIL;
+ List *fdw_exprs = NIL;
+ List *local_exprs = NIL;
+ ListCell *lc;
+
+ /* Cache frequently accessed value */
+ fpstate = (PostgresFdwPlanState *) baserel->fdw_private;
+
+ /*
+	 * We need lists of Expr nodes rather than lists of RestrictInfo. We can
+	 * merge remote_conds and param_conds into fdw_exprs, because both are
+	 * evaluated on the remote side by the actual remote query.
+ */
+ foreach(lc, fpstate->remote_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->param_conds)
+ fdw_exprs = lappend(fdw_exprs, ((RestrictInfo *) lfirst(lc))->clause);
+ foreach(lc, fpstate->local_conds)
+ local_exprs = lappend(local_exprs,
+ ((RestrictInfo *) lfirst(lc))->clause);
+
+ /*
+	 * Make a list containing the SELECT statement, to be passed to the
+	 * executor along with the plan node for later use.
+ */
+ fdw_private = lappend(fdw_private, makeString(fpstate->sql.data));
+
+ /*
+ * Create the ForeignScan node from target list, local filtering
+ * expressions, remote filtering expressions, and FDW private information.
+ *
+	 * We remove expressions which are evaluated on the remote side from the
+	 * qual of the scan node to avoid redundant filtering. Such filter
+	 * reduction can be done only here, after the best path has been chosen,
+	 * because baserestrictinfo in RelOptInfo is shared by all possible paths
+	 * until then.
+ */
+ return make_foreignscan(tlist,
+ local_exprs,
+ scan_relid,
+ fdw_exprs,
+ fdw_private);
+}
+
+/*
+ * postgresExplainForeignScan
+ * Produce extra output for EXPLAIN
+ */
+static void
+postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
+{
+ List *fdw_private;
+ char *sql;
+
+ if (es->verbose)
+ {
+ fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+ sql = strVal(list_nth(fdw_private, FdwPrivateSelectSql));
+ ExplainPropertyText("Remote SQL", sql, es);
+ }
+}
+
+/*
+ * postgresBeginForeignScan
+ * Initiate access to a foreign PostgreSQL table.
+ */
+static void
+postgresBeginForeignScan(ForeignScanState *node, int eflags)
+{
+ PostgresFdwExecutionState *festate;
+ PGconn *conn;
+ Oid relid;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+
+ /*
+ * Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
+ */
+ if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
+ return;
+
+ /*
+ * Save state in node->fdw_state.
+ */
+ festate = (PostgresFdwExecutionState *)
+ palloc(sizeof(PostgresFdwExecutionState));
+ festate->fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+
+ /*
+ * Create contexts for per-scan tuplestore under per-query context.
+ */
+ festate->scan_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgres_fdw per-scan data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ festate->temp_cxt = AllocSetContextCreate(node->ss.ps.state->es_query_cxt,
+ "postgres_fdw temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+
+ /*
+	 * Get a connection to the foreign server. The connection manager will
+	 * establish a new connection if necessary.
+ */
+ relid = RelationGetRelid(node->ss.ss_currentRelation);
+ table = GetForeignTable(relid);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+ festate->conn = conn;
+
+ /* Result will be filled in first Iterate call. */
+ festate->tuples = NULL;
+
+ /* Allocate buffers for column values. */
+ {
+ TupleDesc tupdesc = slot->tts_tupleDescriptor;
+ festate->values = palloc(sizeof(Datum) * tupdesc->natts);
+ festate->nulls = palloc(sizeof(bool) * tupdesc->natts);
+ festate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ }
+
+ /*
+ * Allocate buffers for query parameters.
+ *
+	 * ParamListInfo might include entries for pseudo-parameters such as
+	 * PL/pgSQL's FOUND variable, but we don't worry about that here, because
+	 * the wasted space is small.
+ */
+ {
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+
+ if (numParams > 0)
+ {
+ festate->param_types = palloc0(sizeof(Oid) * numParams);
+ festate->param_values = palloc0(sizeof(char *) * numParams);
+ }
+ else
+ {
+ festate->param_types = NULL;
+ festate->param_values = NULL;
+ }
+ }
+
+ /* Remember which foreign table we are scanning. */
+ festate->errpos.relid = relid;
+
+ /* Store FDW-specific state into ForeignScanState */
+ node->fdw_state = (void *) festate;
+
+ return;
+}
+
+/*
+ * postgresIterateForeignScan
+ * Retrieve next row from the result set, or clear tuple slot to indicate
+ * EOF.
+ *
+ * Note that we retrieve tuples from the tuplestore in the per-scan context
+ * so that the returned tuple survives until the next iteration; the tuple is
+ * released implicitly via ExecClearTuple. If the tuple were retrieved in
+ * CurrentMemoryContext (a per-tuple context), ExecClearTuple would free a
+ * dangling pointer.
+ */
+static TupleTableSlot *
+postgresIterateForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+ TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+ /*
+ * If this is the first call after Begin or ReScan, we need to execute
+ * remote query and get result set.
+ */
+ if (festate->tuples == NULL)
+ execute_query(node);
+
+ /*
+	 * If tuples are still left in the tuplestore, just return the next tuple
+	 * from it.
+	 *
+	 * It is necessary to switch to the per-scan context to keep the returned
+	 * tuple valid until the next IterateForeignScan call, because it will be
+	 * released with ExecClearTuple then. Otherwise, the tuple would be
+	 * allocated in a per-tuple context, and a double-free might happen.
+	 *
+	 * If there is no result left in the tuplestore, the slot is cleared to
+	 * tell the executor that this scan is over.
+ */
+ MemoryContextSwitchTo(festate->scan_cxt);
+ tuplestore_gettupleslot(festate->tuples, true, false, slot);
+ MemoryContextSwitchTo(oldcontext);
+
+ return slot;
+}
+
+/*
+ * postgresReScanForeignScan
+ *		- Restart this scan by rewinding the already-fetched result set.
+ */
+static void
+postgresReScanForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+	/* If we don't have a valid result yet, there is nothing to do. */
+ if (festate->tuples == NULL)
+ return;
+
+ /*
+	 * Rewinding the current result set is enough.
+ */
+ tuplestore_rescan(festate->tuples);
+}
+
+/*
+ * postgresEndForeignScan
+ *		Finish scanning the foreign table and dispose of objects used for this scan
+ */
+static void
+postgresEndForeignScan(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+
+ /* if festate is NULL, we are in EXPLAIN; nothing to do */
+ if (festate == NULL)
+ return;
+
+ /*
+	 * The connection which was used for this scan must stay valid until the
+	 * end of the scan so that the lifespan of the remote transaction matches
+	 * that of the local query.
+ */
+ ReleaseConnection(festate->conn);
+ festate->conn = NULL;
+
+ /* Discard fetch results */
+ if (festate->tuples != NULL)
+ {
+ tuplestore_end(festate->tuples);
+ festate->tuples = NULL;
+ }
+
+ /* MemoryContext will be deleted automatically. */
+}
+
+/*
+ * Estimate the costs of executing the given SQL statement.
+ */
+static void
+get_remote_estimate(const char *sql, PGconn *conn,
+ double *rows, int *width,
+ Cost *startup_cost, Cost *total_cost)
+{
+ PGresult *volatile res = NULL;
+ StringInfoData buf;
+ char *plan;
+ char *p;
+ int n;
+
+ /*
+ * Construct EXPLAIN statement with given SQL statement.
+ */
+ initStringInfo(&buf);
+ appendStringInfo(&buf, "EXPLAIN %s", sql);
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ res = PQexec(conn, buf.data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
+ ereport(ERROR,
+ (errmsg("could not execute EXPLAIN for cost estimation"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /*
+		 * Extract the cost estimates from the top plan node. We search for
+		 * the opening parenthesis from the end of the line to avoid matching
+		 * an unexpected parenthesis.
+ */
+ plan = PQgetvalue(res, 0, 0);
+ p = strrchr(plan, '(');
+ if (p == NULL)
+ elog(ERROR, "wrong EXPLAIN output: %s", plan);
+ n = sscanf(p,
+ "(cost=%lf..%lf rows=%lf width=%d)",
+ startup_cost, total_cost, rows, width);
+ if (n != 4)
+ elog(ERROR, "could not get estimation from EXPLAIN output");
+
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Execute remote query with current parameters.
+ */
+static void
+execute_query(ForeignScanState *node)
+{
+ PostgresFdwExecutionState *festate;
+ ParamListInfo params = node->ss.ps.state->es_param_list_info;
+ int numParams = params ? params->numParams : 0;
+ Oid *types = NULL;
+ const char **values = NULL;
+ char *sql;
+ PGconn *conn;
+ PGresult *volatile res = NULL;
+
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+ types = festate->param_types;
+ values = festate->param_values;
+
+ /*
+	 * Construct the parameter array in text format. We don't release memory
+	 * for the arrays explicitly, because the memory usage would not be very
+	 * large, and anyway it will be released at context cleanup.
+	 *
+	 * If this query is invoked from a PL/pgSQL function, ParamListInfo has an
+	 * extra entry for the dummy variable FOUND, so we need to check the type
+	 * oid to exclude it from the remote parameters.
+ */
+ if (numParams > 0)
+ {
+ int i;
+
+ for (i = 0; i < numParams; i++)
+ {
+ ParamExternData *prm = ¶ms->params[i];
+
+ /* give hook a chance in case parameter is dynamic */
+ if (!OidIsValid(prm->ptype) && params->paramFetch != NULL)
+ params->paramFetch(params, i + 1);
+
+ /*
+ * Get string representation of each parameter value by invoking
+ * type-specific output function unless the value is null or it's
+ * not used in the query.
+ */
+ types[i] = prm->ptype;
+ if (!prm->isnull && OidIsValid(types[i]))
+ {
+ Oid out_func_oid;
+ bool isvarlena;
+ FmgrInfo func;
+
+ getTypeOutputInfo(types[i], &out_func_oid, &isvarlena);
+ fmgr_info(out_func_oid, &func);
+ values[i] = OutputFunctionCall(&func, prm->value);
+ }
+ else
+ values[i] = NULL;
+
+ /*
+			 * We use type "text" (an arbitrary but flexible choice) for
+			 * unused (and type-unknown) parameters. We can't remove the entry
+			 * for an unused parameter from the arrays, because parameter
+			 * references in the remote query ($n) are indexed against the
+			 * full parameter list.
+ */
+ if (!OidIsValid(types[i]))
+ types[i] = TEXTOID;
+ }
+ }
+
+ conn = festate->conn;
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /*
+		 * Execute the remote query with parameters, and retrieve the results
+		 * in single-row mode, which returns results row by row.
+ */
+ sql = strVal(list_nth(festate->fdw_private, FdwPrivateSelectSql));
+ if (!PQsendQueryParams(conn, sql, numParams, types, values, NULL, NULL,
+ 0))
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+
+ /* Retrieve result rows one by one, and store them into tuplestore. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+ /* Store the result row into tuplestore */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ /*
+ * PGresult with PGRES_TUPLES_OK means EOF, so we need to
+ * initialize tuplestore if we have not retrieved any tuple.
+ */
+ if (first)
+ query_row_processor(res, node, first);
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql)));
+ }
+ }
+
+ /*
+		 * We can't know whether the scan is over or not inside the row
+		 * processor, so mark the result as valid here.
+ */
+ tuplestore_donestoring(festate->tuples);
+
+ /* Discard result of SELECT statement. */
+ PQclear(res);
+ res = NULL;
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ /* propagate error */
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Create tuples from PGresult and store them into tuplestore.
+ *
+ * The caller must use a PG_TRY block to catch exceptions and make sure the
+ * PGresult is released.
+ */
+static void
+query_row_processor(PGresult *res, ForeignScanState *node, bool first)
+{
+ int i;
+ int j;
+ int attnum; /* number of non-dropped columns */
+ TupleTableSlot *slot;
+ TupleDesc tupdesc;
+ Form_pg_attribute *attrs;
+ PostgresFdwExecutionState *festate;
+ AttInMetadata *attinmeta;
+ HeapTuple tuple;
+ ErrorContextCallback errcallback;
+ MemoryContext oldcontext;
+
+ /* Cache frequently used values */
+ slot = node->ss.ss_ScanTupleSlot;
+ tupdesc = slot->tts_tupleDescriptor;
+ attrs = tupdesc->attrs;
+ festate = (PostgresFdwExecutionState *) node->fdw_state;
+ attinmeta = festate->attinmeta;
+
+ if (first)
+ {
+ int nfields = PQnfields(res);
+
+ /* count non-dropped columns */
+ for (attnum = 0, i = 0; i < tupdesc->natts; i++)
+ if (!attrs[i]->attisdropped)
+ attnum++;
+
+ /* check result and tuple descriptor have the same number of columns */
+ if (attnum > 0 && attnum != nfields)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("remote query result rowtype does not match "
+ "the specified FROM clause rowtype"),
+ errdetail("expected %d, actual %d", attnum, nfields)));
+
+ /* First, ensure that the tuplestore is empty. */
+ if (festate->tuples == NULL)
+ {
+
+ /*
+			 * Create a tuplestore for the query result in the per-scan
+			 * context. Note that we use this memory context to avoid memory
+			 * leaks in error cases.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->scan_cxt);
+ festate->tuples = tuplestore_begin_heap(false, false, work_mem);
+ MemoryContextSwitchTo(oldcontext);
+ }
+ else
+ {
+ /* Clear old result just in case. */
+ tuplestore_clear(festate->tuples);
+ }
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ oldcontext = MemoryContextSwitchTo(festate->temp_cxt);
+
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ /* skip dropped columns. */
+ if (attrs[i]->attisdropped)
+ {
+ festate->nulls[i] = true;
+ continue;
+ }
+
+ /*
+		 * Set the NULL indicator, and convert the text representation to the
+		 * internal representation unless the value is NULL.
+ */
+ if (PQgetisnull(res, 0, j))
+ festate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ festate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ festate->errpos.cur_attno = i + 1;
+ errcallback.callback = postgres_fdw_error_callback;
+ errcallback.arg = (void *) &festate->errpos;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ festate->values[i] = value;
+
+ /* Uninstall error context callback. */
+ error_context_stack = errcallback.previous;
+ }
+ j++;
+ }
+
+ /*
+	 * Build the tuple and put it into the tuplestore.
+ * We don't have to free the tuple explicitly because it's been
+ * allocated in the per-tuple context.
+ */
+ tuple = heap_form_tuple(tupdesc, festate->values, festate->nulls);
+ tuplestore_puttuple(festate->tuples, tuple);
+
+ /* Clean up */
+ MemoryContextSwitchTo(oldcontext);
+ MemoryContextReset(festate->temp_cxt);
+
+ return;
+}
+
+/*
+ * Callback function which is called when an error occurs during column value
+ * conversion. Prints the names of the column and the relation.
+ */
+static void
+postgres_fdw_error_callback(void *arg)
+{
+ ErrorPos *errpos = (ErrorPos *) arg;
+ const char *relname;
+ const char *colname;
+
+ relname = get_rel_name(errpos->relid);
+ colname = get_attname(errpos->relid, errpos->cur_attno);
+ errcontext("column %s of foreign table %s",
+ quote_identifier(colname), quote_identifier(relname));
+}
+
+/*
+ * postgresAnalyzeForeignTable
+ * Test whether analyzing this foreign table is supported
+ */
+static bool
+postgresAnalyzeForeignTable(Relation relation,
+ AcquireSampleRowsFunc *func,
+ BlockNumber *totalpages)
+{
+ *totalpages = 0;
+ *func = postgresAcquireSampleRowsFunc;
+
+ return true;
+}
+
+/*
+ * Acquire a random sample of rows from a foreign table managed by
+ * postgres_fdw.
+ *
+ * postgres_fdw doesn't provide direct access to remote buffers, so we execute
+ * a simple SELECT statement which retrieves all rows from the remote side,
+ * and pick some samples from them.
+ */
+static int
+postgresAcquireSampleRowsFunc(Relation relation, int elevel,
+ HeapTuple *rows, int targrows,
+ double *totalrows,
+ double *totaldeadrows)
+{
+ PostgresAnalyzeState astate;
+ StringInfoData sql;
+ ForeignTable *table;
+ ForeignServer *server;
+ UserMapping *user;
+ PGconn *conn = NULL;
+ PGresult *volatile res = NULL;
+
+ /*
+	 * Only a little information is needed as input to the row processor.
+	 * Other initialization is done at the first row processor call.
+ */
+ astate.anl_cxt = CurrentMemoryContext;
+ astate.temp_cxt = AllocSetContextCreate(CurrentMemoryContext,
+ "postgres_fdw analyze temporary data",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ astate.rows = rows;
+ astate.targrows = targrows;
+ astate.tupdesc = relation->rd_att;
+ astate.errpos.relid = relation->rd_id;
+
+ /*
+	 * Construct a SELECT statement which retrieves all rows from the remote
+	 * table. We can't avoid running a sequential scan on the remote side to
+	 * get practical statistics, so this seems a reasonable compromise.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeSql(&sql, relation);
+ elog(DEBUG3, "Analyze SQL: %s", sql.data);
+
+ table = GetForeignTable(relation->rd_id);
+ server = GetForeignServer(table->serverid);
+ user = GetUserMapping(GetOuterUserId(), server->serverid);
+ conn = GetConnection(server, user, true);
+
+ /*
+ * Acquire sample rows from the result set.
+ */
+ PG_TRY();
+ {
+ bool first = true;
+
+ /* Execute remote query and retrieve results row by row. */
+ if (!PQsendQuery(conn, sql.data))
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ if (!PQsetSingleRowMode(conn))
+ ereport(ERROR,
+ (errmsg("could not set single-row mode"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+
+		/* Retrieve result rows one by one, and feed them to the row processor. */
+ for (;;)
+ {
+ /* Allow users to cancel long query */
+ CHECK_FOR_INTERRUPTS();
+
+ res = PQgetResult(conn);
+ if (res == NULL)
+ break;
+
+			/* Process the result row */
+ if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
+ {
+ analyze_row_processor(res, &astate, first);
+ PQclear(res);
+ res = NULL;
+ first = false;
+ }
+ else if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+				 * A PGresult with PGRES_TUPLES_OK means EOF, so we need to
+				 * initialize the sampling state if no tuple was retrieved.
+ * initialize tuplestore if we have not retrieved any tuple.
+ */
+				if (first)
+ analyze_row_processor(res, &astate, first);
+
+ PQclear(res);
+ res = NULL;
+ first = true;
+ }
+ else
+ {
+				/* Something went wrong; report the error. */
+ ereport(ERROR,
+ (errmsg("could not execute remote query for analyze"),
+ errdetail("%s", PQerrorMessage(conn)),
+ errhint("%s", sql.data)));
+ }
+ }
+ }
+ PG_CATCH();
+ {
+ PQclear(res);
+
+ /* Release connection and let connection manager cleanup. */
+ ReleaseConnection(conn);
+
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+	/* We assume that we have no dead tuples. */
+ *totaldeadrows = 0.0;
+
+	/* We've retrieved all live tuples from the foreign server. */
+ *totalrows = astate.samplerows;
+
+ /*
+	 * We don't update pg_class.relpages because we don't use it for
+	 * planning at all.
+ */
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned with \"%s\", "
+ "containing %.0f live rows and %.0f dead rows; "
+ "%d rows in sample, %.0f estimated total rows",
+ RelationGetRelationName(relation), sql.data,
+ astate.samplerows, 0.0,
+ astate.numrows, astate.samplerows)));
+
+ return astate.numrows;
+}
+
+/*
+ * Custom row processor for acquire_sample_rows.
+ *
+ * Collect sample rows from the query result.
+ * - Use every tuple as a sample until targrows samples have been collected.
+ * - Once the target is reached, skip some tuples and replace already-sampled
+ *   tuples at random.
+ */
+static void
+analyze_row_processor(PGresult *res, PostgresAnalyzeState *astate, bool first)
+{
+ int targrows = astate->targrows;
+ TupleDesc tupdesc = astate->tupdesc;
+ int i;
+ int j;
+ int pos; /* position where next sample should be stored. */
+ HeapTuple tuple;
+ ErrorContextCallback errcallback;
+ MemoryContext callercontext;
+
+ if (first)
+ {
+ /* Prepare for sampling rows */
+ astate->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ astate->values = (Datum *) palloc(sizeof(Datum) * tupdesc->natts);
+ astate->nulls = (bool *) palloc(sizeof(bool) * tupdesc->natts);
+ astate->numrows = 0;
+ astate->samplerows = 0;
+ astate->rowstoskip = -1;
+ astate->rstate = anl_init_selection_state(astate->targrows);
+
+ /* Do nothing for empty result */
+ if (PQntuples(res) == 0)
+ return;
+ }
+
+ /* Should have a single-row result if we get here */
+ Assert(PQntuples(res) == 1);
+
+ /*
+ * Do the following work in a temp context that we reset after each tuple.
+ * This cleans up not only the data we have direct access to, but any
+ * cruft the I/O functions might leak.
+ */
+ callercontext = MemoryContextSwitchTo(astate->temp_cxt);
+
+ /*
+	 * The first targrows rows are always sampled. If there are more source
+	 * rows, pick some of them by skipping, and replace already-sampled
+	 * tuples at random.
+	 *
+	 * Here we just determine the slot where the next sample should be stored;
+	 * pos is set to a negative value to indicate that the row should be
+	 * skipped.
+ */
+ if (astate->numrows < targrows)
+ pos = astate->numrows++;
+ else
+ {
+ /*
+ * The first targrows sample rows are simply copied into
+ * the reservoir. Then we start replacing tuples in the
+ * sample until we reach the end of the relation. This
+ * algorithm is from Jeff Vitter's paper, similarly to
+ * acquire_sample_rows in analyze.c.
+ *
+		 * We don't have block-level access, so every row in
+		 * the result is a candidate for sampling.
+ */
+ if (astate->rowstoskip < 0)
+ astate->rowstoskip = anl_get_next_S(astate->samplerows, targrows,
+ &astate->rstate);
+
+ if (astate->rowstoskip <= 0)
+ {
+ int k = (int) (targrows * anl_random_fract());
+
+ Assert(k >= 0 && k < targrows);
+
+ /*
+			 * Free the victim tuple; the current row will be stored
+			 * in its place below.
+ */
+ heap_freetuple(astate->rows[k]);
+ pos = k;
+ }
+ else
+ pos = -1;
+
+ astate->rowstoskip -= 1;
+ }
+
+ /* Always increment sample row counter. */
+ astate->samplerows += 1;
+
+ if (pos >= 0)
+ {
+ AttInMetadata *attinmeta = astate->attinmeta;
+
+ /*
+		 * Create a sample tuple from the current result row, and store it at
+		 * the position determined above. Note that i indexes the tuple
+		 * descriptor and j indexes the result columns.
+ */
+ for (i = 0, j = 0; i < tupdesc->natts; i++)
+ {
+ if (tupdesc->attrs[i]->attisdropped)
+ continue;
+
+ if (PQgetisnull(res, 0, j))
+ astate->nulls[i] = true;
+ else
+ {
+ Datum value;
+
+ astate->nulls[i] = false;
+
+ /*
+ * Set up and install callback to report where conversion error
+ * occurs.
+ */
+ astate->errpos.cur_attno = i + 1;
+ errcallback.callback = postgres_fdw_error_callback;
+ errcallback.arg = (void *) &astate->errpos;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ value = InputFunctionCall(&attinmeta->attinfuncs[i],
+ PQgetvalue(res, 0, j),
+ attinmeta->attioparams[i],
+ attinmeta->atttypmods[i]);
+ astate->values[i] = value;
+
+ /* Uninstall error callback function. */
+ error_context_stack = errcallback.previous;
+ }
+ j++;
+ }
+
+ /*
+		 * Generate a tuple from the result row data, and store it into the
+		 * given buffer. Note that we need to allocate the tuple in the
+		 * analyze context to keep it valid even after the temporary per-tuple
+		 * context has been reset.
+ */
+ MemoryContextSwitchTo(astate->anl_cxt);
+ tuple = heap_form_tuple(tupdesc, astate->values, astate->nulls);
+ MemoryContextSwitchTo(astate->temp_cxt);
+ astate->rows[pos] = tuple;
+ }
+
+ /* Clean up */
+ MemoryContextSwitchTo(callercontext);
+ MemoryContextReset(astate->temp_cxt);
+
+ return;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.control b/contrib/postgres_fdw/postgres_fdw.control
new file mode 100644
index 0000000..f9ed490
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.control
@@ -0,0 +1,5 @@
+# postgres_fdw extension
+comment = 'foreign-data wrapper for remote PostgreSQL servers'
+default_version = '1.0'
+module_pathname = '$libdir/postgres_fdw'
+relocatable = true
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
new file mode 100644
index 0000000..b5cefb8
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -0,0 +1,45 @@
+/*-------------------------------------------------------------------------
+ *
+ * postgres_fdw.h
+ * foreign-data wrapper for remote PostgreSQL servers.
+ *
+ * Copyright (c) 2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/postgres_fdw/postgres_fdw.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef POSTGRESQL_FDW_H
+#define POSTGRESQL_FDW_H
+
+#include "postgres.h"
+#include "foreign/foreign.h"
+#include "nodes/relation.h"
+#include "utils/relcache.h"
+
+/* in option.c */
+void InitPostgresFdwOptions(void);
+int ExtractConnectionOptions(List *defelems,
+ const char **keywords,
+ const char **values);
+int GetFetchCountOption(ForeignTable *table, ForeignServer *server);
+
+/* in deparse.c */
+void deparseSimpleSql(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *local_conds);
+void appendWhereClause(StringInfo buf,
+ bool has_where,
+ List *exprs,
+ PlannerInfo *root);
+void classifyConditions(PlannerInfo *root,
+ RelOptInfo *baserel,
+ List **remote_conds,
+ List **param_conds,
+ List **local_conds);
+void deparseAnalyzeSql(StringInfo buf, Relation rel);
+
+#endif /* POSTGRESQL_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
new file mode 100644
index 0000000..7845e70
--- /dev/null
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -0,0 +1,312 @@
+-- ===================================================================
+-- create FDW objects
+-- ===================================================================
+
+-- Clean up in case a prior regression run failed
+
+-- Suppress NOTICE messages when roles don't exist
+SET client_min_messages TO 'error';
+
+DROP ROLE IF EXISTS postgres_fdw_user;
+
+RESET client_min_messages;
+
+CREATE ROLE postgres_fdw_user LOGIN SUPERUSER;
+SET SESSION AUTHORIZATION 'postgres_fdw_user';
+
+CREATE EXTENSION postgres_fdw;
+
+CREATE SERVER loopback1 FOREIGN DATA WRAPPER postgres_fdw;
+CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+ OPTIONS (dbname 'contrib_regression');
+
+CREATE USER MAPPING FOR public SERVER loopback1
+ OPTIONS (user 'value', password 'value');
+CREATE USER MAPPING FOR postgres_fdw_user SERVER loopback2;
+
+-- ===================================================================
+-- create objects used through FDW
+-- ===================================================================
+CREATE TYPE user_enum AS ENUM ('foo', 'bar', 'buz');
+CREATE SCHEMA "S 1";
+CREATE TABLE "S 1"."T 1" (
+ "C 1" int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum,
+ CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
+);
+CREATE TABLE "S 1"."T 2" (
+ c1 int NOT NULL,
+ c2 text,
+ CONSTRAINT t2_pkey PRIMARY KEY (c1)
+);
+
+BEGIN;
+TRUNCATE "S 1"."T 1";
+INSERT INTO "S 1"."T 1"
+ SELECT id,
+ id % 10,
+ to_char(id, 'FM00000'),
+ '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
+ '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
+ id % 10,
+ id % 10,
+ 'foo'::user_enum
+ FROM generate_series(1, 1000) id;
+TRUNCATE "S 1"."T 2";
+INSERT INTO "S 1"."T 2"
+ SELECT id,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+COMMIT;
+
+-- ===================================================================
+-- create foreign tables
+-- ===================================================================
+CREATE FOREIGN TABLE ft1 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft1 DROP COLUMN c0;
+
+CREATE FOREIGN TABLE ft2 (
+ c0 int,
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ c4 timestamptz,
+ c5 timestamp,
+ c6 varchar(10),
+ c7 char(10),
+ c8 user_enum
+) SERVER loopback2;
+ALTER FOREIGN TABLE ft2 DROP COLUMN c0;
+
+-- ===================================================================
+-- tests for validator
+-- ===================================================================
+-- requiressl, krbsrvname and gsslib are omitted because they depend on
+-- configure options
+ALTER SERVER loopback1 OPTIONS (
+ use_remote_explain 'false',
+ fdw_startup_cost '123.456',
+ fdw_tuple_cost '0.123',
+ authtype 'value',
+ service 'value',
+ connect_timeout 'value',
+ dbname 'value',
+ host 'value',
+ hostaddr 'value',
+ port 'value',
+ --client_encoding 'value',
+ tty 'value',
+ options 'value',
+ application_name 'value',
+ --fallback_application_name 'value',
+ keepalives 'value',
+ keepalives_idle 'value',
+ keepalives_interval 'value',
+ -- requiressl 'value',
+ sslcompression 'value',
+ sslmode 'value',
+ sslcert 'value',
+ sslkey 'value',
+ sslrootcert 'value',
+ sslcrl 'value'
+ --requirepeer 'value',
+ -- krbsrvname 'value',
+ -- gsslib 'value',
+ --replication 'value'
+);
+ALTER USER MAPPING FOR public SERVER loopback1
+ OPTIONS (DROP user, DROP password);
+ALTER FOREIGN TABLE ft1 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft2 OPTIONS (nspname 'S 1', relname 'T 1');
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (colname 'C 1');
+\dew+
+\des+
+\deu+
+\det+
+
+-- Use only Nested loop for stable results.
+SET enable_mergejoin TO off;
+SET enable_hashjoin TO off;
+
+-- ===================================================================
+-- simple queries
+-- ===================================================================
+-- single table, with/without alias
+EXPLAIN (COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT * FROM ft1 t1 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- empty result
+SELECT * FROM ft1 WHERE false;
+-- with WHERE clause
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+SELECT * FROM ft1 t1 WHERE t1.c1 = 101 AND t1.c6 = '1' AND t1.c7 >= '1';
+-- aggregate
+SELECT COUNT(*) FROM ft1 t1;
+-- join two tables
+SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- subquery
+SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
+-- subquery+MAX
+SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
+-- used in CTE
+WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
+-- fixed values
+SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- user-defined operator/function
+CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
+BEGIN
+RETURN abs($1);
+END
+$$ LANGUAGE plpgsql IMMUTABLE;
+CREATE OPERATOR === (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = ===,
+ NEGATOR = !==
+);
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_abs(t1.c2);
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+
+-- ===================================================================
+-- WHERE push down
+-- ===================================================================
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL; -- NullTest
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL; -- NullTest
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l)
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE 1 = c1!; -- OpExpr(r)
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = ANY(ARRAY[c2, 1, c1 + 0]); -- ScalarArrayOpExpr
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = (ARRAY[c1,c2,3])[1]; -- ArrayRef
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- no push-down
+
+-- ===================================================================
+-- parameterized queries
+-- ===================================================================
+-- simple join
+PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
+EXECUTE st1(1, 1);
+EXECUTE st1(101, 101);
+-- subquery using stable function (can't be pushed down)
+PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c4) = 6) ORDER BY c1;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
+EXECUTE st2(10, 20);
+EXECUTE st1(101, 101);
+-- subquery using immutable function (can be pushed down)
+PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND EXTRACT(dow FROM c5) = 6) ORDER BY c1;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
+EXECUTE st3(10, 20);
+EXECUTE st3(20, 30);
+-- custom plan should be chosen
+PREPARE st4(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 = $1;
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+EXPLAIN (VERBOSE, COSTS false) EXECUTE st4(1);
+-- cleanup
+DEALLOCATE st1;
+DEALLOCATE st2;
+DEALLOCATE st3;
+DEALLOCATE st4;
+
+-- ===================================================================
+-- used in pl/pgsql function
+-- ===================================================================
+CREATE OR REPLACE FUNCTION f_test(p_c1 int) RETURNS int AS $$
+DECLARE
+ v_c1 int;
+BEGIN
+ SELECT c1 INTO v_c1 FROM ft1 WHERE c1 = p_c1 LIMIT 1;
+ PERFORM c1 FROM ft1 WHERE c1 = p_c1 AND p_c1 = v_c1 LIMIT 1;
+ RETURN v_c1;
+END;
+$$ LANGUAGE plpgsql;
+SELECT f_test(100);
+DROP FUNCTION f_test(int);
+
+-- ===================================================================
+-- cost estimation options
+-- ===================================================================
+ALTER SERVER loopback1 OPTIONS (SET use_remote_explain 'true');
+ALTER SERVER loopback1 OPTIONS (SET fdw_startup_cost '0');
+ALTER SERVER loopback1 OPTIONS (SET fdw_tuple_cost '0');
+EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 ORDER BY c3, c1 OFFSET 100 LIMIT 10;
+ALTER SERVER loopback1 OPTIONS (DROP use_remote_explain);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_startup_cost);
+ALTER SERVER loopback1 OPTIONS (DROP fdw_tuple_cost);
+
+-- ===================================================================
+-- connection management
+-- ===================================================================
+SELECT srvname, usename FROM postgres_fdw_connections;
+SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_get_connections();
+SELECT srvname, usename FROM postgres_fdw_connections;
+
+-- ===================================================================
+-- conversion error
+-- ===================================================================
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
+SELECT * FROM ft1 WHERE c1 = 1; -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;
+
+-- ===================================================================
+-- subtransaction
+-- + local/remote error doesn't break cursor
+-- + remote error discards connection
+-- ===================================================================
+BEGIN;
+DECLARE c CURSOR FOR SELECT * FROM ft1 ORDER BY c1;
+FETCH c;
+SAVEPOINT s;
+ERROR OUT; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+FETCH c;
+SAVEPOINT s;
+SELECT * FROM ft1 WHERE 1 / (c1 - 1) > 0; -- ERROR
+ROLLBACK TO s;
+SELECT srvname FROM postgres_fdw_connections;
+FETCH c;
+SELECT * FROM ft1 ORDER BY c1 LIMIT 1;
+COMMIT;
+SELECT srvname FROM postgres_fdw_connections;
+ERROR OUT; -- ERROR
+SELECT srvname FROM postgres_fdw_connections;
+
+-- ===================================================================
+-- cleanup
+-- ===================================================================
+DROP OPERATOR === (int, int) CASCADE;
+DROP OPERATOR !== (int, int) CASCADE;
+DROP FUNCTION postgres_fdw_abs(int);
+DROP SCHEMA "S 1" CASCADE;
+DROP TYPE user_enum CASCADE;
+DROP EXTENSION postgres_fdw CASCADE;
+\c
+DROP ROLE postgres_fdw_user;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 6b13a0a..39e9827 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -132,6 +132,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
&pgstatstatements;
&pgstattuple;
&pgtrgm;
+ &postgres-fdw;
&seg;
&sepgsql;
&contrib-spi;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db4cc3a..354111a 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY pgtesttiming SYSTEM "pgtesttiming.sgml">
<!ENTITY pgtrgm SYSTEM "pgtrgm.sgml">
<!ENTITY pgupgrade SYSTEM "pgupgrade.sgml">
+<!ENTITY postgres-fdw SYSTEM "postgres-fdw.sgml">
<!ENTITY seg SYSTEM "seg.sgml">
<!ENTITY contrib-spi SYSTEM "contrib-spi.sgml">
<!ENTITY sepgsql SYSTEM "sepgsql.sgml">
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
new file mode 100644
index 0000000..1f00665
--- /dev/null
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -0,0 +1,434 @@
+<!-- doc/src/sgml/postgres-fdw.sgml -->
+
+<sect1 id="postgres-fdw" xreflabel="postgres_fdw">
+ <title>postgres_fdw</title>
+
+ <indexterm zone="postgres-fdw">
+ <primary>postgres_fdw</primary>
+ </indexterm>
+
+ <para>
+ The <filename>postgres_fdw</filename> module provides a foreign-data
+ wrapper for external <productname>PostgreSQL</productname> servers.
+ With this module, users can access data stored in external
+   <productname>PostgreSQL</productname> servers via plain SQL statements.
+ </para>
+
+ <para>
+   The default wrapper <literal>postgres_fdw</literal> is created
+   automatically by the <command>CREATE EXTENSION</command> command for
+   <application>postgres_fdw</application>, so all you need to do to execute
+   queries is:
+ <orderedlist spacing="compact">
+ <listitem>
+ <para>
+     Create a foreign server with the <command>CREATE SERVER</command> command
+     for each remote database you want to connect to. You need to specify
+     connection information other than <literal>user</literal> and
+     <literal>password</literal> on it.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+     Create a user mapping with the
+     <command>CREATE USER MAPPING</command> command for each user you want to
+     allow to access the foreign server. You need to specify
+     <literal>user</literal> and <literal>password</literal> on it.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+     Create a foreign table with the <command>CREATE FOREIGN TABLE</command>
+     command for each relation you want to access. If you want to use a name
+     different from the remote one, you need to specify the object name
+     options (see below).
+ </para>
+ <para>
+     It is recommended to use the same data types as those of the remote
+     columns, though the libpq text protocol allows flexible conversions
+     between similar data types.
+ </para>
+ </listitem>
+ </orderedlist>
+ </para>
+
+ <sect2>
+ <title>FDW Options of postgres_fdw</title>
+
+ <sect3>
+ <title>Connection Options</title>
+ <para>
+     A foreign server and user mapping created using this wrapper can have
+     <application>libpq</> connection options, except the following:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ client_encoding (automatically determined from the local server encoding)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ fallback_application_name (fixed to <literal>postgres_fdw</literal>)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ replication (never used for foreign-data wrapper connection)
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ For details of <application>libpq</> connection options, see
+ <xref linkend="libpq-connect">.
+ </para>
+ <para>
+ <literal>user</literal> and <literal>password</literal> can be
+ specified on user mappings, and others can be specified on foreign servers.
+ </para>
+ <para>
+     Note that only superusers may connect to foreign servers without password
+     authentication, so specify the <literal>password</literal> FDW option on
+     the corresponding user mappings for non-superusers.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Object Name Options</title>
+ <para>
+     Foreign tables created using this wrapper, and their columns, can have
+     object name options. These options specify the names used in the SQL
+     statements sent to the remote <productname>PostgreSQL</productname>
+     server, and are useful when remote objects have different names from the
+     corresponding local ones.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>nspname</literal></term>
+ <listitem>
+ <para>
+      This option, which can be specified on a foreign table, is used as the
+      namespace (schema) reference in the SQL statement. If this option is
+      omitted, the <literal>pg_namespace.nspname</literal> of the foreign
+      table's schema is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>relname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign table, is used as a
+      relation (table) reference in the SQL statement. If this option is
+ omitted, <literal>pg_class.relname</literal> of the foreign table is
+ used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>colname</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a column of a foreign table, is
+ used as a column (attribute) reference in the SQL statement. If this
+ option is omitted, <literal>pg_attribute.attname</literal> of the column
+ of the foreign table is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </sect3>
+
+ <sect3>
+ <title>Cost Estimation Options</title>
+ <para>
+     The <application>postgres_fdw</> retrieves foreign data by executing queries
+     against foreign servers, so foreign scans usually cost more than scans of
+     local tables. To reflect the various circumstances of foreign servers,
+     <application>postgres_fdw</> provides some options:
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>use_remote_estimate</literal></term>
+ <listitem>
+ <para>
+      This option, which can be specified on a foreign table or a foreign
+      server, controls <application>postgres_fdw</>'s estimation of rows and
+      width. If this is set to
+      <literal>true</literal>, a remote <command>EXPLAIN</command> is
+      executed at an early step of planning. This gives a better estimate
+      of rows and width, but it also introduces some overhead. This option
+      defaults to <literal>false</literal>.
+ </para>
+ <para>
+      The <application>postgres_fdw</> supports gathering statistics of
+      foreign data from foreign servers and storing them on the local side via
+      <command>ANALYZE</command>, so reasonable estimates of the rows and width
+      of a query's result can be derived from them. However, if the target
+      foreign table is frequently updated, the local statistics soon become
+      obsolete.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>fdw_startup_cost</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign server, is used as
+      an additional startup cost per scan. If the planner overestimates or
+      underestimates the startup cost of a foreign scan, change this to reflect
+ the actual overhead.
+ </para>
+ <para>
+      Defaults to <literal>100</literal>. The default value is somewhat
+      arbitrary, but it should be enough to make most foreign scans cost more
+      than local scans, even if the foreign scan returns nothing.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>fdw_tuple_cost</literal></term>
+ <listitem>
+ <para>
+ This option, which can be specified on a foreign server, is used as
+      an additional cost per tuple, reflecting the overhead of tuple
+      manipulation and transfer between servers. If a foreign server is
+      particularly far away or close by on the network, or has different
+      performance characteristics, use this option to tell the planner.
+ </para>
+ <para>
+ Defaults to <literal>0.01</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+ </sect3>
+
+ </sect2>
+
+ <sect2>
+ <title>Connection Management</title>
+
+ <para>
+   <application>postgres_fdw</application> establishes a connection to a
+   foreign server at the beginning of the first query which uses a foreign
+   table associated with that foreign server, and reuses the connection for
+   following queries and even for other foreign scans in the same query.
+
+   You can see the list of active connections via the
+   <structname>postgres_fdw_connections</structname> view. It shows the oid
+   and name of the server and local role for each active connection
+   established by <application>postgres_fdw</application>. For security
+   reasons, only superusers can see other roles' connections.
+ </para>
+
+ <para>
+   Established connections are kept alive until the local role changes, the
+   current transaction aborts, or the user requests disconnection.
+ </para>
+
+ <para>
+   If the role has been changed, active connections established as the old
+   local role are kept alive but are never reused until the local role is
+   restored to the original
+ role. This kind of situation happens with <command>SET ROLE</command> and
+ <command>SET SESSION AUTHORIZATION</command>.
+ </para>
+
+   If the current transaction aborts due to an error or a user request, all active
+ If current transaction aborts by error or user request, all active
+ connections are disconnected automatically. This behavior avoids possible
+ connection leaks on error.
+ </para>
+
+   You can discard a persistent connection at any time with
+   <function>postgres_fdw_disconnect()</function>. It takes a server oid and a
+   user oid as arguments. This function can handle only connections
+   established in the current session; connections established by other backends
+ established in current session; connections established by other backends
+ are not reachable.
+ </para>
+
+ <para>
+   You can discard all active and visible connections in the current session
+   by using <structname>postgres_fdw_connections</structname> and
+ <function>postgres_fdw_disconnect()</function> together:
+<synopsis>
+postgres=# SELECT postgres_fdw_disconnect(srvid, usesysid) FROM postgres_fdw_connections;
+ postgres_fdw_disconnect
+-------------------------
+ OK
+ OK
+(2 rows)
+</synopsis>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Transaction Management</title>
+ <para>
+   <application>postgres_fdw</application> begins a remote transaction at
+   the beginning of a local query, and terminates it with
+   <command>ABORT</command> at the end of the local query. This means that all
+   foreign scans on a foreign server within a local query are executed in one
+ transaction.
+ </para>
+ <para>
+   The isolation level of the remote transaction is determined from the local
+   transaction's isolation level.
+ <table id="postgres-fdw-isolation-level">
+ <title>Isolation Level Mapping</title>
+
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Local Isolation Level</entry>
+ <entry>Remote Isolation Level</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>READ UNCOMMITTED</entry>
+ <entry morerows="2">REPEATABLE READ</entry>
+ </row>
+ <row>
+ <entry>READ COMMITTED</entry>
+ </row>
+ <row>
+ <entry>REPEATABLE READ</entry>
+ </row>
+ <row>
+ <entry>SERIALIZABLE</entry>
+ <entry>SERIALIZABLE</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <para>
+ <literal>READ UNCOMMITTED</literal> and <literal>READ COMMITTED</literal>
+ are never used for remote transactions, because even
+ <literal>READ COMMITTED</literal> transactions might produce inconsistent
+   results if remote data has been updated between two remote queries (which
+   can happen within a single local query).
+ </para>
+ <para>
+   Note that even if the isolation level of the local transaction is
+   <literal>SERIALIZABLE</literal> or <literal>REPEATABLE READ</literal>,
+   executing the same query repeatedly might produce different results,
+   because foreign scans in different local queries are executed in different
+   remote transactions. For instance, if external data is updated between two
+   identical queries in a <literal>SERIALIZABLE</literal> local transaction,
+   the client receives different results.
+ </para>
+ <para>
+   This restriction might be relaxed in a future release.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Estimate Costs, Rows and Width</title>
+ <para>
+   <application>postgres_fdw</application> estimates the costs of a
+   foreign scan in one of two ways. In either case, the selectivity of
+   restrictions is taken into account to give a proper estimate.
+ </para>
+ <para>
+   If <literal>use_remote_estimate</literal> is set to
+   <literal>false</literal> (the default behavior), <application>postgres_fdw</>
+   assumes that the external data have not changed much, and uses the local
+   statistics as-is. It is recommended to execute <command>ANALYZE</command>
+   so that the local statistics keep reflecting the characteristics of the
+   external data. Otherwise, <application>postgres_fdw</> executes a remote
+   <command>EXPLAIN</command> at the beginning of a foreign scan to get a
+   remote estimate of the remote query. This provides a better estimate but
+   incurs some overhead.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Remote Query Optimization</title>
+ <para>
+   <application>postgres_fdw</> optimizes remote queries to reduce the
+   amount of data transferred from foreign servers.
+ <itemizedlist>
+ <listitem>
+ <para>
+      Restrictions which have the same semantics on the remote side are pushed
+      down. For example, restrictions which contain the elements below might
+      have different semantics on the remote side.
+ <itemizedlist spacing="compact">
+ <listitem>
+ <para>
+ User defined objects, such as functions, operators, and types.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Functions defined as <literal>STABLE</literal> or
+ <literal>VOLATILE</literal>, and operators which use such functions.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+         Collatable types, such as text or varchar, with some exceptions (see
+         below).
+ </para>
+ <para>
+         Basically we assume that collatable expressions have different
+         semantics, because the remote server might have a different collation
+         setting, but this assumption would prevent simple and common
+         expressions, such as <literal>text_col = 'string'</literal>, from
+         being pushed down. So <application>postgres_fdw</application> treats
+         the operators <literal>=</literal> and <literal><></literal> as
+         safe to push down even if they take collatable types as arguments.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+      Unnecessary columns in the <literal>SELECT</literal> clause of remote
+      queries are replaced with <literal>NULL</literal> literals.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>EXPLAIN Output</title>
+ <para>
+ For each foreign table using <literal>postgres_fdw</>, <command>EXPLAIN</>
+   shows the remote SQL statement which is sent to the remote
+   <productname>PostgreSQL</productname> server for each ForeignScan plan node.
+ For example:
+ </para>
+<synopsis>
+postgres=# EXPLAIN SELECT aid FROM pgbench_accounts WHERE abalance < 0;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan on pgbench_accounts (cost=100.00..100.11 rows=1 width=97)
+ Remote SQL: SELECT aid, bid, abalance, filler FROM public.pgbench_accounts WHERE ((abalance OPERATOR(pg_catalog.<) 0))
+(2 rows)
+</synopsis>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Shigeru Hanada <email>shigeru.hanada@gmail.com</email>
+ </para>
+ </sect2>
+
+</sect1>
2012/11/28 Shigeru Hanada <shigeru.hanada@gmail.com>:
On Sun, Nov 25, 2012 at 5:24 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
I checked the v4 patch, and I have nothing to comment anymore.
So, could you update the remaining EXPLAIN with VERBOSE option
stuff?
Thanks for the review. Here is the updated patch.
I checked the patch. The new VERBOSE option of EXPLAIN statement seems to me
working fine. I think it is time to hand over this patch to committer.
It is not a matter to be solved, but just my preference.
postgres=# EXPLAIN(VERBOSE) SELECT * FROM ftbl WHERE a > 0 AND b like '%a%';
QUERY PLAN
--------------------------------------------------------------------------------
Foreign Scan on public.ftbl (cost=100.00..100.01 rows=1 width=36)
Output: a, b
Filter: (ftbl.b ~~ '%a%'::text)
Remote SQL: SELECT a, b FROM public.tbl WHERE ((a OPERATOR(pg_catalog.>) 0))
(4 rows)
postgres=# EXPLAIN SELECT * FROM ftbl WHERE a > 0 AND b like '%a%';
QUERY PLAN
-------------------------------------------------------------
Foreign Scan on ftbl (cost=100.00..100.01 rows=1 width=36)
Filter: (b ~~ '%a%'::text)
(2 rows)
Do you think the qualifier being pushed-down should be explained if VERBOSE
option was not given?
BTW, we have one more issue around naming of new FDW, and it is discussed in
another thread.
http://archives.postgresql.org/message-id/9E59E6E7-39C9-4AE9-88D6-BB0098579017@gmail.com
I don't have any strong opinion about this naming discussion.
As long as it does not conflict with an existing name and is not
misleading, I think
it is reasonable. So, "postgre_fdw" is OK for me. "pgsql_fdw" is also welcome.
"posugure_fdw" may make sense only in Japan. "pg_fdw" is a bit misleading.
"postgresql_fdw" might be the best, but do we have some clear advantage
on this name to take an additional effort to solve the conflict with existing
built-in postgresql_fdw_validator() function?
I think, "postgres_fdw" is enough reasonable choice.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2012/11/28 Kohei KaiGai <kaigai@kaigai.gr.jp>:
it is reasonable. So, "postgre_fdw" is OK for me. "pgsql_fdw" is also welcome.
Sorry, s/"postgre_fdw"/"postgres_fdw"/g
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
[ postgres_fdw.v5.patch ]
I started to look at this patch today. There seems to be quite a bit
left to do to make it committable. I'm willing to work on it, but
there are some things that need discussion:
* The code seems to always use GetOuterUserId() to select the foreign
user mapping to use. This seems entirely wrong. For instance it will
do the wrong thing inside a SECURITY DEFINER function, where surely the
relevant privileges should be those of the function owner, not the
session user. I would also argue that if Alice has access to a foreign
table owned by Bob, and Alice creates a view that selects from that
table and grants select privilege on the view to Charlie, then when
Charlie selects from the view the user mapping to use ought to be
Alice's. (If anyone thinks differently about that, speak up!)
To implement that for queries, we need code similar to what
ExecCheckRTEPerms does, ie "rte->checkAsUser ? rte->checkAsUser :
GetUserId()". It's a bit of a pain to get hold of the RTE from
postgresGetForeignRelSize or postgresBeginForeignScan, but it's doable.
(Should we modify the APIs for these functions to make that easier?)
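For illustration only, a minimal sketch of that lookup (the helper name and
its placement are assumptions, not code from the patch); it just picks the
checked-as user when the RTE carries one and otherwise the current user:

#include "postgres.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/relation.h"
#include "parser/parsetree.h"

/* Minimal sketch, not the committed code. */
static UserMapping *
lookup_user_mapping(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)
{
	RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
	ForeignTable  *table = GetForeignTable(foreigntableid);
	Oid			userid;

	/* Use the view owner etc. if set, else the current user. */
	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();

	return GetUserMapping(userid, table->serverid);
}

With something like this, Charlie selecting from Alice's view would get
Alice's mapping, and a SECURITY DEFINER function would get its owner's.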
I think possibly postgresAcquireSampleRowsFunc should use the foreign
table's owner regardless of the current user ID - if the user has
permission to run ANALYZE then we don't really want the command to
succeed or fail depending on exactly who the user is. That's perhaps
debatable, anybody have another theory?
* AFAICT, the patch expects to use a single connection for all
operations initiated under one foreign server + user mapping pair.
I don't think this can possibly be workable. For instance, we don't
really want postgresIterateForeignScan executing the entire remote query
to completion and stashing the results locally -- what if that's many
megabytes? It ought to be pulling the rows back a few at a time, and
that's not going to work well if multiple scans are sharing the same
connection. (We might be able to dodge that by declaring a cursor
for each scan, but I'm not convinced that such a solution will scale up
to writable foreign tables, nested queries, subtransactions, etc.)
I think we'd better be prepared to allow multiple similar connections.
The main reason I'm bringing this up now is that it breaks the
assumption embodied in postgres_fdw_get_connections() and
postgres_fdw_disconnect() that foreign server + user mapping can
constitute a unique key for identifying connections. However ...
* I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
to be a bad idea altogether. These connections ought to be a hidden
implementation matter, not something that the user has a view of, much
less control over. Aside from the previous issue, I believe it's a
trivial matter to crash the patch as it now stands by applying
postgres_fdw_disconnect() to a connection that's in active use. I can
see the potential value in being able to shut down connections when a
session has stopped using them, but this is a pretty badly-designed
approach to that. I suggest that we just drop these functions for now
and revisit that problem later. (One idea is some sort of GUC setting
to control how many connections can be held open speculatively for
future use.)
* deparse.c contains a depressingly large amount of duplication of logic
from ruleutils.c, and can only need more as we expand the set of
constructs that can be pushed to the remote end. This doesn't seem like
a maintainable approach. Was there a specific reason not to try to use
ruleutils.c for this? I'd much rather tweak ruleutils to expose some
additional APIs, if that's what it takes, than have all this redundant
logic.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/2/14 Tom Lane <tgl@sss.pgh.pa.us>:
* deparse.c contains a depressingly large amount of duplication of logic
from ruleutils.c, and can only need more as we expand the set of
constructs that can be pushed to the remote end. This doesn't seem like
a maintainable approach. Was there a specific reason not to try to use
ruleutils.c for this? I'd much rather tweak ruleutils to expose some
additional APIs, if that's what it takes, than have all this redundant
logic.
The original pgsql_fdw design utilized ruleutils.c logic.
Previously, you suggested implementing its own logic for query deparsing,
and then Hanada-san rewrote the relevant code.
/messages/by-id/12181.1331223482@sss.pgh.pa.us
Indeed, most of the logic is duplicated. However, it is to avoid bugs in
some corner cases; for instance, a function name is not qualified with its
schema even if the function is owned by a different schema on the remote side.
Do we add a flag to deparse_expression() to show that the call intends to
construct a remotely executable query? It may be reasonable, but such
case-branches make the code complicated in general....
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Feb 14, 2013 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* The code seems to always use GetOuterUserId() to select the foreign
user mapping to use. This seems entirely wrong. For instance it will
do the wrong thing inside a SECURITY DEFINER function, where surely the
relevant privileges should be those of the function owner, not the
session user. I would also argue that if Alice has access to a foreign
table owned by Bob, and Alice creates a view that selects from that
table and grants select privilege on the view to Charlie, then when
Charlie selects from the view the user mapping to use ought to be
Alice's. (If anyone thinks differently about that, speak up!)
Agreed that OuterUserId is wrong for user mapping. Also agreed that
Charlie doesn't need his own mapping for the server, if he is
accessing via a valid view.
To implement that for queries, we need code similar to what
ExecCheckRTEPerms does, ie "rte->checkAsUser ? rte->checkAsUser :
GetUserId()". It's a bit of a pain to get hold of the RTE from
postgresGetForeignRelSize or postgresBeginForeignScan, but it's doable.
(Should we modify the APIs for these functions to make that easier?)
This issue seems not specific to postgres_fdw. Currently
GetUserMapping takes a userid and the server's oid as parameters, but we
would be able to hide the complex rule by replacing userid with the RTE or
something.
I think possibly postgresAcquireSampleRowsFunc should use the foreign
table's owner regardless of the current user ID - if the user has
permission to run ANALYZE then we don't really want the command to
succeed or fail depending on exactly who the user is. That's perhaps
debatable, anybody have another theory?
+1. This allows a non-owner to ANALYZE foreign tables without having a
per-user mapping, though a public mapping also solves this issue.
At the implementation level, postgresAcquireSampleRowsFunc has the Relation of
the target table, so we can get the owner's oid by reading
rd_rel->relowner.
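A minimal sketch of that, assuming a hypothetical helper name (not code from
the patch):

#include "postgres.h"
#include "foreign/foreign.h"
#include "utils/rel.h"

/* Sketch: pick the user mapping for ANALYZE based on the table's owner. */
static UserMapping *
analyze_user_mapping(Relation rel)
{
	ForeignTable *table = GetForeignTable(RelationGetRelid(rel));

	/* rd_rel->relowner is pg_class.relowner of the foreign table */
	return GetUserMapping(rel->rd_rel->relowner, table->serverid);
}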
* AFAICT, the patch expects to use a single connection for all
operations initiated under one foreign server + user mapping pair.
I don't think this can possibly be workable. For instance, we don't
really want postgresIterateForeignScan executing the entire remote query
to completion and stashing the results locally -- what if that's many
megabytes?
It uses the single-row mode of libpq and a TuplestoreState to keep the result
locally, so it uses limited memory at a time. If the result is larger
than work_mem, overflowed tuples are written to a temp file. I think
this is similar to materializing query results.
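For reference, a minimal sketch of that retrieval loop (the function name is
assumed and error paths are trimmed); only one PGresult holding a single row
is live at a time, and each row would then be appended to the tuplestore:

#include "postgres.h"
#include "libpq-fe.h"

static void
fetch_result_single_row_mode(PGconn *conn, const char *sql)
{
	PGresult   *res;

	if (!PQsendQuery(conn, sql))
		elog(ERROR, "could not send remote query: %s", PQerrorMessage(conn));
	if (!PQsetSingleRowMode(conn))
		elog(ERROR, "could not enable single-row mode");

	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
		{
			/* convert PQgetvalue(res, 0, col) values and store them here */
		}
		else if (PQresultStatus(res) != PGRES_TUPLES_OK)
			elog(ERROR, "remote query failed: %s", PQresultErrorMessage(res));
		PQclear(res);
	}
}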
It ought to be pulling the rows back a few at a time, and
that's not going to work well if multiple scans are sharing the same
connection. (We might be able to dodge that by declaring a cursor
for each scan, but I'm not convinced that such a solution will scale up
to writable foreign tables, nested queries, subtransactions, etc.)
Indeed the FDW used CURSOR in older versions. Sorry, I have not looked
at the writable foreign table patch closely yet, but it would
require (possibly multiple) remote update query executions during
scanning?
I think we'd better be prepared to allow multiple similar connections.
The main reason I'm bringing this up now is that it breaks the
assumption embodied in postgres_fdw_get_connections() and
postgres_fdw_disconnect() that foreign server + user mapping can
constitute a unique key for identifying connections. However ...
The main reason to use a single connection is to make multiple results
retrieved from the same server in a local query consistent. A shared
snapshot might be helpful for this consistency issue, but I've not
tried that with FDW.
* I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
to be a bad idea altogether. These connections ought to be a hidden
implementation matter, not something that the user has a view of, much
less control over. Aside from the previous issue, I believe it's a
trivial matter to crash the patch as it now stands by applying
postgres_fdw_disconnect() to a connection that's in active use. I can
see the potential value in being able to shut down connections when a
session has stopped using them, but this is a pretty badly-designed
approach to that. I suggest that we just drop these functions for now
and revisit that problem later. (One idea is some sort of GUC setting
to control how many connections can be held open speculatively for
future use.)
Actually these functions follow dblink's similar functions, but
having them was a bad decision because an FDW can't connect explicitly.
As you mentioned, postgres_fdw_disconnect is provided for clean
shutdown on the remote side (I needed it in my testing).
I agree to separate the issue from the FDW core.
* deparse.c contains a depressingly large amount of duplication of logic
from ruleutils.c, and can only need more as we expand the set of
constructs that can be pushed to the remote end. This doesn't seem like
a maintainable approach. Was there a specific reason not to try to use
ruleutils.c for this? I'd much rather tweak ruleutils to expose some
additional APIs, if that's what it takes, than have all this redundant
logic.
I got a comment about that issue from you, but I might have misunderstood.
(2012/03/09 1:18), Tom Lane wrote:
I've been looking at this patch a little bit over the past day or so.
I'm pretty unhappy with deparse.c --- it seems like a real kluge,
inefficient and full of corner-case bugs. After some thought I believe
that you're ultimately going to have to abandon depending on ruleutils.c
for reverse-listing services, and it would be best to bite that bullet
now and rewrite this code from scratch.
I thought that writing a ruleutils-free SQL constructor in postgres_fdw
was a better approach because ruleutils might be changed for its own
purposes in the future. Besides that, as Kaigai-san mentioned upthread,
ruleutils seems to have insufficient capability for building a remote
PostgreSQL query.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Shigeru Hanada wrote:
Tom Lane wrote:
It ought to be pulling the rows back a few at a time, and
that's not going to work well if multiple scans are sharing the same
connection. (We might be able to dodge that by declaring a cursor
for each scan, but I'm not convinced that such a solution will scale up
to writable foreign tables, nested queries, subtransactions, etc.)
Indeed the FDW used CURSOR in older versions. Sorry, I have not looked
at the writable foreign table patch closely yet, but it would
require (possibly multiple) remote update query executions during
scanning?
It would for example call ExecForeignUpdate after each call to
IterateForeignScan that produces a row that meets the UPDATE
condition.
Yours,
Laurenz Albe
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Feb 14, 2013 at 6:45 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Shigeru Hanada wrote:
Tom Lane wrote:
It ought to be pulling the rows back a few at a time, and
that's not going to work well if multiple scans are sharing the same
connection. (We might be able to dodge that by declaring a cursor
for each scan, but I'm not convinced that such a solution will scale up
to writable foreign tables, nested queries, subtransactions, etc.)
Indeed the FDW used CURSOR in older versions. Sorry, I have not looked
at the writable foreign table patch closely yet, but it would
require (possibly multiple) remote update query executions during
scanning?
It would for example call ExecForeignUpdate after each call to
IterateForeignScan that produces a row that meets the UPDATE
condition.
Thanks! It seems that ExecForeignUpdate needs another connection for the
update query, or we need to retrieve all results at the first Iterate
call to prepare for possible subsequent update queries.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
2013/2/14 Tom Lane <tgl@sss.pgh.pa.us>:
* deparse.c contains a depressingly large amount of duplication of logic
from ruleutils.c, and can only need more as we expand the set of
constructs that can be pushed to the remote end. This doesn't seem like
a maintainable approach. Was there a specific reason not to try to use
ruleutils.c for this?
Previously, you suggested implementing its own logic for query deparsing,
and then Hanada-san rewrote the relevant code.
/messages/by-id/12181.1331223482@sss.pgh.pa.us
[ rereads that... ] Hm, I did make some good points. But having seen
the end result of this way, I'm still not very happy; it still looks
like a maintenance problem. Maybe some additional flags in ruleutils.c
is the least evil way after all. Needs more thought.
Indeed, most of the logic is duplicated. However, it is to avoid bugs in
some corner cases; for instance, a function name is not qualified with its
schema even if the function is owned by a different schema on the remote side.
That particular reason doesn't seem to hold a lot of water when we're
restricting the code to only push over built-in functions/operators
anyway.
I find it tempting to think about setting search_path explicitly to
"pg_catalog" (only) on the remote side, whereupon we'd have to
explicitly schema-qualify references to user tables, but built-in
functions/operators would not need that (and it wouldn't really matter
if ruleutils did try to qualify them).
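A minimal sketch of what that could look like at connection setup time (the
function name is an assumption, not code from the patch):

#include "postgres.h"
#include "libpq-fe.h"

/*
 * Sketch: pin the remote search_path to pg_catalog when the connection is
 * established, so built-in functions and operators need no qualification
 * while user tables are schema-qualified explicitly in the deparsed query.
 */
static void
configure_remote_session(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "SET search_path = pg_catalog");

	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		elog(ERROR, "could not set remote search_path: %s",
			 PQresultErrorMessage(res));
	PQclear(res);
}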
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
On Thu, Feb 14, 2013 at 1:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* AFAICT, the patch expects to use a single connection for all
operations initiated under one foreign server + user mapping pair.
I don't think this can possibly be workable. For instance, we don't
really want postgresIterateForeignScan executing the entire remote query
to completion and stashing the results locally -- what if that's many
megabytes?
It uses the single-row mode of libpq and a TuplestoreState to keep the result
locally, so it uses limited memory at a time. If the result is larger
than work_mem, overflowed tuples are written to a temp file. I think
this is similar to materializing query results.
Well, yeah, but that doesn't make it an acceptable solution. Consider
for instance "SELECT * FROM huge_foreign_table LIMIT 10". People are
not going to be satisfied if that pulls back the entire foreign table
before handing them the 10 rows. Comparable performance problems can
arise even without LIMIT, for instance in handling of nestloop inner
scans.
I think we'd better be prepared to allow multiple similar connections.
The main reason to use a single connection is to make multiple results
retrieved from the same server in a local query consistent.
Hmm. That could be a good argument, although the current patch pretty
much destroys any such advantage by being willing to use READ COMMITTED
mode on the far end --- with that, you lose any right to expect
snapshot-consistent data anyway. I'm inclined to think that maybe we
should always use at least REPEATABLE READ mode, rather than blindly
copying the local transaction mode. Or maybe this should be driven by a
foreign-server option instead of looking at the local mode at all?
Anyway, it does seem like maybe we need to use cursors so that we can
have several active scans that we are pulling back just a few rows at a
time from.
I'm not convinced that that gets us out of the woods though WRT needing
only one connection. Consider a query that is scanning some foreign
table, and it calls a plpgsql function, and that function (inside an
EXCEPTION block) does a query that scans another foreign table on the
same server. This second query gets an error on the remote side. If
the error is caught via the exception block, and the outer query
continues, what then? We could imagine adding enough infrastructure
to establish a remote savepoint for each local subtransaction and clean
things up on failure, but no such logic is in the patch now, and I think
it wouldn't be too simple either. The least painful way to make this
scenario work, given what's in the patch, is to allow such a
subtransaction to use a separate connection.
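For concreteness, a minimal sketch of the kind of infrastructure that would
be needed (the callback and the single tracked connection are assumptions,
not code from the patch): mirror each local subtransaction with a remote
savepoint so a caught remote error doesn't poison the shared connection.

#include "postgres.h"
#include "access/xact.h"
#include "libpq-fe.h"

static PGconn *tracked_conn;	/* assumed: the connection being tracked */

static void
remote_subxact_callback(SubXactEvent event, SubTransactionId mySubid,
						SubTransactionId parentSubid, void *arg)
{
	char		sql[64];
	int			level = GetCurrentTransactionNestLevel();

	if (event == SUBXACT_EVENT_START_SUB)
		snprintf(sql, sizeof(sql), "SAVEPOINT s%d", level);
	else if (event == SUBXACT_EVENT_COMMIT_SUB)
		snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", level);
	else if (event == SUBXACT_EVENT_ABORT_SUB)
		snprintf(sql, sizeof(sql), "ROLLBACK TO SAVEPOINT s%d", level);
	else
		return;

	if (tracked_conn)
		PQclear(PQexec(tracked_conn, sql));		/* error handling omitted */
}

/* registered once, e.g. when the first connection is made: */
/* RegisterSubXactCallback(remote_subxact_callback, NULL); */

A real implementation would also have to track savepoint depth per connection
and cope with errors raised while already aborting.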
In any case, I'm pretty well convinced that the connection-bookkeeping
logic needs a major rewrite to have any hope of working in
subtransactions. I'm going to work on that first and see where it leads.
* I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
to be a bad idea altogether.
I agree to separate the issue from the FDW core.
OK, so we'll drop these from the current version of the patch and
revisit the problem of closing connections later.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Continuing to look at this patch ... I'm wondering if any particular
discussion went into choosing the FDW option names "nspname", "relname",
and "colname". These don't seem to me like names that we ought to be
exposing at the SQL command level. Why not just "schema", "table",
"column"? Or perhaps "schema_name", "table_name", "column_name" if you
feel it's essential to distinguish that these are names.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Feb 17, 2013 at 8:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Continuing to look at this patch ... I'm wondering if any particular
discussion went into choosing the FDW option names "nspname", "relname",
and "colname".
IIRC, there was no deep discussion about those option names. I simply
chose "relname" and "nspname" from pg_class and pg_namespace. At that
time I thought users would understand those options easily because those
names are the ones used in the catalogs.
These don't seem to me like names that we ought to be
exposing at the SQL command level. Why not just "schema", "table",
"column"? Or perhaps "schema_name", "table_name", "column_name" if you
feel it's essential to distinguish that these are names.
I think not-shortened names (words used in documents or conversations)
are better now. I prefer "table_name" to "table", because it would be
easy to distinguish as a name, even if we add new options like
"table_foo".
Besides, I found a strange(?) behavior in psql \d+ command in
no-postfix case, though it wouldn't be a serious problem.
In psql \d+ result for postgres_fdw foreign tables, "table" and
"column" are quoted, but "schema" is not. Is this behavior of
quote_ident() intentional?
postgres=# \d+ pgbench1_branches
                       Foreign table "public.pgbench1_branches"
  Column  |     Type      | Modifiers |   FDW Options    | Storage  | Stats target | Description
----------+---------------+-----------+------------------+----------+--------------+-------------
 bid      | integer       | not null  | ("column" 'bid') | plain    |              |
 bbalance | integer       |           |                  | plain    |              |
 filler   | character(88) |           |                  | extended |              |
Server: pgbench1
FDW Options: (schema 'public', "table" 'foo')
Has OIDs: no
We can use "table" and "column" options without quoting (or with quote
of course) in CREATE/ALTER FOREIGN TABLE commands, so this is not a
barrier against choosing no-postfix names.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
On Sun, Feb 17, 2013 at 8:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
These don't seem to me like names that we ought to be
exposing at the SQL command level. Why not just "schema", "table",
"column"? Or perhaps "schema_name", "table_name", "column_name" if you
feel it's essential to distinguish that these are names.
I think not-shortened names (words used in documents or conversations)
are better now. I prefer "table_name" to "table", because it would be
easy to distinguish as a name, even if we add new options like
"table_foo".
Yeah. I doubt that these options will be commonly used anyway ---
surely it's easier and less confusing to choose names that match the
remote table in the first place. So there's no very good reason to
keep the option names short.
I'll go with "schema_name", "table_name", "column_name" unless someone
comes along with a contrary opinion.
In psql \d+ result for postgres_fdw foreign tables, "table" and
"column" are quoted, but "schema" is not. Is this behavior of
quote_ident() intentional?
That's probably a consequence of these being keywords of different
levels of reserved-ness. If we go with the longer names it won't
happen.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Feb 15, 2013 at 12:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ rereads that... ] Hm, I did make some good points. But having seen
the end result of this way, I'm still not very happy; it still looks
like a maintenance problem. Maybe some additional flags in ruleutils.c
is the least evil way after all. Needs more thought.
I'm working on revising the deparser so that it uses ruleutils routines to
construct the remote query, and re-found an FDW-specific problem which I
encountered some months ago.
So far ruleutils routines require "deparse context", which is a list
of namespace information. Currently deparse_context_for() seems to
fit postgres_fdw's purpose, but it always uses names stored in
catalogs (pg_class, pg_attribute and pg_namespace), though
postgres_fdw wants to replace column/table/schema name with the name
specified in relevant FDW options if any.
Proper remote query will be generated If postgres_fdw can modify
deparse context, but deparse_context is hidden detail of ruleutils.c.
IMO disclosing it is bad idea.
Given these, I'm thinking to add new deparse context generator which
basically construct namespaces from catalogs, but replace one if FDW
option *_name was specified for an object. With this context,
existing ruleutils would generate expression-strings with proper
names, without any change.
Is this idea acceptable?
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
On Fri, Feb 15, 2013 at 12:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ rereads that... ] Hm, I did make some good points. But having seen
the end result of this way, I'm still not very happy; it still looks
like a maintenance problem. Maybe some additional flags in ruleutils.c
is the least evil way after all. Needs more thought.
I'm working on revising the deparser so that it uses ruleutils routines to
construct the remote query, and re-found an FDW-specific problem which I
encountered some months ago.
After further review I'm unconvinced that we can really do much better
than what's there now --- the idea of sharing code with ruleutils sounds
attractive, but once you look at all the specific issues that ruleutils
would have to be taught about, it gets much less so. (In particular
I fear we'll find that we have to do some weird stuff to deal with
cross-server-version issues.) I've been revising the patch on the
assumption that we'll keep deparse.c more or less as is.
Having said that, I remain pretty unhappy with the namespace handling in
deparse.c. I don't think it serves much purpose to schema-qualify
everything when we're restricting what we can access to built-in
operators and functions --- the loss of readability outweighs the
benefits IMO. Also, there is very little point in schema-qualifying
almost everything rather than everything; if you're not 100% then you
have no safety against search_path issues. But that's what we've got
because the code still relies on format_type to print type names.
Now we could get around that complaint by duplicating format_type as
well as ruleutils, but I don't think that's the right direction to
proceed. I still think it might be a good idea to set search_path to
pg_catalog on the remote side, and then schema-qualify only what is not
in pg_catalog (which would be nothing, in the current code, so far as
types/functions/operators are concerned).
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
[ postgres_fdw.v5.patch ]
Applied with a lot of revisions.
There are still a number of loose ends and things that need to be
discussed:
I'm not at all happy with the planner support functions --- as designed,
that code is basically incapable of thinking about more than one remote
access path. It needs to be restructured and then extended to be able
to generate parameterized remote paths. The "local estimation" mode is
pretty bogus as well. I thought the patch could be committed anyway,
but I'm going to go back and work on that part later.
I doubt that deparseFuncExpr is doing the right thing by printing
casts as their underlying function calls. The comment claims that
this way is more robust but I rather think it's less so, because it's
effectively assuming that the remote server implements casts exactly
like the local one, which might be incorrect if the remote is a different
Postgres version. I think we should probably change that, but would like
to know what the argument was for coding it like this. Also, if this
is to be the approach to printing casts, why is RelabelType handled
differently?
As I mentioned earlier, I think it would be better to force the remote
session's search_path setting to just "pg_catalog" and then reduce the
amount of explicit schema naming in the queries --- any opinions about
that?
I took out the checks on collations of operators because I thought they
were thoroughly broken. In the first place, looking at operator names
to deduce semantics is unsafe (if we were to try to distinguish equality,
looking up btree opclass membership would be the way to do that). In the
second place, restricting only collation-sensitive operators and not
collation-sensitive functions seems just about useless for guaranteeing
safety. But we don't have any very good handle on which functions might
be safe to send despite having collatable input types, so taking that
approach would greatly restrict our ability to send function calls at all.
The bigger picture here though is that we're already relying on the user
to make sure that remote tables have column data types matching the local
definition, so why can't we say that they've got to make sure collations
match too? So I think this is largely a documentation issue and we don't
need any automated enforcement mechanism, or at least it's silly to try
to enforce this when we're not enforcing column type matching (yet?).
What might make sense is to try to determine whether a WHERE clause uses
any collations different from those of the contained foreign-column Vars,
and send it over only if not. That would prevent us from sending clauses
that explicitly use collations that might not exist on the remote server.
I didn't try to code this though.
Another thing that I find fairly suspicious in this connection is that
deparse.c isn't bothering to print collations attached to Const nodes.
That may be a good idea to avoid needing the assumption that the remote
server uses the same collation names we do, but if we're going to do it
like this, those Const collations need to be considered when deciding
if the expression is safe to send at all.
A more general idea that follows on from that is that if we're relying on
the user to be sure the semantics are the same, maybe we don't need to be
quite so tight about what we'll send over. In particular, if the user has
declared a foreign-table column of a non-built-in type, the current code
will never send any constraints for that column at all, which seems overly
conservative if you believe he's matched the type correctly. I'm not sure
exactly how to act on that thought, but I think there's room for
improvement there.
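As a concrete (invented) example of the current restriction:

    -- the same enum type is assumed to exist identically on both servers
    CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
    CREATE FOREIGN TABLE people (name text, current_mood mood)
        SERVER remote_srv;   -- server name is arbitrary

    -- this qual is always evaluated locally today, because mood is not a
    -- built-in type, even though the remote could run it unchanged
    SELECT * FROM people WHERE current_mood = 'happy';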
A related issue is that as coded, is_builtin() is pretty flaky, because
what's built-in on our server might not exist at all on the remote side,
if it's a different major Postgres version. So what we've got is that
the code is overly conservative about what it will send and yet still
perfectly capable of sending remote queries that will fail outright,
which is not a happy combination. I have no very good idea how to deal
with that though.
Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?
regards, tom lane
Tom Lane wrote:
Applied with a lot of revisions.
I am thrilled.
As I mentioned earlier, I think it would be better to force the remote
session's search_path setting to just "pg_catalog" and then reduce the
amount of explicit schema naming in the queries --- any opinions about
that?
I think that that would make the remote query much more readable.
That would improve EXPLAIN VERBOSE output, which is a user visible
improvement.
I took out the checks on collations of operators because I thought they
were thoroughly broken. In the first place, looking at operator names
to deduce semantics is unsafe (if we were to try to distinguish equality,
looking up btree opclass membership would be the way to do that). In the
second place, restricting only collation-sensitive operators and not
collation-sensitive functions seems just about useless for guaranteeing
safety. But we don't have any very good handle on which functions might
be safe to send despite having collatable input types, so taking that
approach would greatly restrict our ability to send function calls at all.
The bigger picture here though is that we're already relying on the user
to make sure that remote tables have column data types matching the local
definition, so why can't we say that they've got to make sure collations
match too? So I think this is largely a documentation issue and we don't
need any automated enforcement mechanism, or at least it's silly to try
to enforce this when we're not enforcing column type matching (yet?).
I think that the question of what to push down is a different question
from checking column data types, because there we can rely on the
type input functions to reject bad values.
Being permissive on collation issues would lead to user problems
along the lines of "my query results are different when I
select from a remote table on a different operating system".
In my experience many users are blissfully ignorant of issues like
collation and encoding.
What about the following design principle:
Only push down conditions which are sure to return the correct
result, provided that the PostgreSQL system objects have not
been tampered with.
Would it be reasonable to push down operators and functions
only if
a) they are in the pg_catalog schema
b) they have been around for a couple of releases
c) they are not collation sensitive?
It makes me uncomfortable to think of a FDW that would happily
push down conditions that may lead to wrong query results.
Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?
That seems to me like it would be the right thing to do.
Yours,
Laurenz Albe
On 2013-02-21 14:23:35 +0000, Albe Laurenz wrote:
Tom Lane wrote:
Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?
That seems to me like it would be the right thing to do.
I am not 100% convinced of that. There might be valid use cases where a
standby executes queries on the primary that do DML. And
there would be no way out of it, I think?
Greetings,
Andres Freund
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
Tom Lane wrote:
As I mentioned earlier, I think it would be better to force the remote
session's search_path setting to just "pg_catalog" and then reduce the
amount of explicit schema naming in the queries --- any opinions about
that?
I think that that would make the remore query much more readable.
That would improve EXPLAIN VERBOSE output, which is a user visible
improvement.
Yeah, that's really the main point. OPERATOR() is tremendously ugly...
The bigger picture here though is that we're already relying on the user
to make sure that remote tables have column data types matching the local
definition, so why can't we say that they've got to make sure collations
match too? So I think this is largely a documentation issue and we don't
need any automated enforcement mechanism, or at least it's silly to try
to enforce this when we're not enforcing column type matching (yet?).
I think that the question of what to push down is a different question
from checking column data types, because there we can rely on the
type input functions to reject bad values.
Unfortunately, that's a very myopic view of the situation: there
are many cases where datatype semantics can vary without the I/O
functions having any idea that anything is wrong. To take one example,
what if the underlying column is type citext but the user wrote "text"
in the foreign table definition? postgres_fdw would see no reason not
to push "col = 'foo'" across, but that clause would behave quite
differently on the remote side. Another example is that float8 and
numeric will have different opinions about the truth of
"1.000000000000000000001 = 1.000000000000000000002", so you're going
to get into trouble if you declare an FT column as one when the
underlying column is the other, even though the I/O functions for these
types will happily take each other's output.
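Spelled out as SQL (object and server names are made up):

    -- on the remote server
    CREATE TABLE users (login citext);
    -- on the local server: foreign-table column declared with the wrong type
    CREATE FOREIGN TABLE users (login text) SERVER remote_srv;

    -- pushed down, this comparison follows citext semantics and is
    -- case-insensitive; evaluated locally against text it would be
    -- case-sensitive, so the two plans can give different answers
    SELECT * FROM users WHERE login = 'Foo';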
So I think (and have written in the committed docs) that users had
better be careful to ensure that FT columns are declared as the same
type as the underlying columns, even though we can't readily enforce
that, at least not for non-builtin types.
And there's really no difference between that situation and the
collation situation, though I agree with you that the latter is a lot
more likely to bite careless users.
What about the following design principle:
Only push down conditions which are sure to return the correct
result, provided that the PostgreSQL system objects have not
been tampered with.
That's a nice, simple, and useless-in-practice design principle,
because it will toss out many situations that users will want to work;
situations that in fact *would* work as long as the users adhere to
safe coding conventions. I do not believe that when people ask "why
does performance of LIKE suck on my foreign table", they will accept an
answer of "we don't allow that to be pushed across because we think
you're too stupid to make the remote collation match".
If you want something provably correct, the way to get there is
to work out a way to check if the remote types and collations
really match. But that's a hard problem AFAICT, so if we want
something usable in the meantime, we are going to have to accept
some uncertainty about what will happen if the user messes up.
Would it be reasonable to push down operators and functions
only if
a) they are in the pg_catalog schema
b) they have been around for a couple of releases
c) they are not collation sensitive?
We don't track (b) nor (c), so this suggestion is entirely
unimplementable in the 9.3 time frame; nor is it provably safe,
unless by (b) you mean "must have been around since 7.4 or so".
On a longer time horizon this might be doable, but it would be a lot
of work to implement a solution that most people will find far too
restrictive. I'd rather see the long-term focus be on doing
type/collation matching, so that we can expand not restrict the set
of things we can push across.
regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-02-21 14:23:35 +0000, Albe Laurenz wrote:
Tom Lane wrote:
Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?
That seems to me like it would be the right thing to do.
I am not 100% convinced of that. There might be valid use cases where a
standby executes queries on the primary that do DML. And
there would be no way out of it, I think?
How exactly would it do that via an FDW? Surely if the user tries to
execute INSERT/UPDATE/DELETE against a foreign table, the command would
get rejected in a read-only transaction, long before we even figure out
that the target is a foreign table?
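That check fires long before postgres_fdw sees the statement, e.g. (the
foreign table name here is invented):

    BEGIN TRANSACTION READ ONLY;
    INSERT INTO remote_log VALUES (now(), 'test');
    -- ERROR:  cannot execute INSERT in a read-only transaction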
Even granting that there's some loophole that lets the command get sent
to the foreign server, why's it a good idea to allow that? I rather
thought the idea of READ ONLY was to prevent the transaction from making
any permanent changes. It's not clear why changes on a remote database
would be exempted from that.
(Doubtless you could escape the restriction anyway with dblink, but that
doesn't mean that postgres_fdw should be similarly ill-defined.)
regards, tom lane
On 2013-02-21 09:58:57 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-02-21 14:23:35 +0000, Albe Laurenz wrote:
Tom Lane wrote:
Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?
That seems to me like it would be the right thing to do.
I am not 100% convinced of that. There might be valid use cases where a
standby executes queries on the primary that do DML. And
there would be no way out of it, I think?
How exactly would it do that via an FDW? Surely if the user tries to
execute INSERT/UPDATE/DELETE against a foreign table, the command would
get rejected in a read-only transaction, long before we even figure out
that the target is a foreign table?
I was thinking of querying a remote table that's actually a view. Which
might be using a function that does caching into a table or something.
Not a completely unreasonable design.
Greetings,
Andres Freund
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-02-21 09:58:57 -0500, Tom Lane wrote:
How exactly would it do that via an FDW? Surely if the user tries to
execute INSERT/UPDATE/DELETE against a foreign table, the command would
get rejected in a read-only transaction, long before we even figure out
that the target is a foreign table?
I was thinking of querying a remote table that's actually a view. Which
might be using a function that does caching into a table or something.
Not a completely unreasonable design.
Yeah, referencing a remote view is something that should work fine, but
it's not clear to me why it should work differently than it does on the
remote server. If you select from that same view in a READ ONLY
transaction on the remote, won't it fail? If so, why should that work
if it's selected from via a foreign table?
regards, tom lane
On 2013-02-21 10:21:34 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2013-02-21 09:58:57 -0500, Tom Lane wrote:
How exactly would it do that via an FDW? Surely if the user tries to
execute INSERT/UPDATE/DELETE against a foreign table, the command would
get rejected in a read-only transaction, long before we even figure out
that the target is a foreign table?
I was thinking of querying a remote table that's actually a view. Which
might be using a function that does caching into a table or something.
Not a completely unreasonable design.
Yeah, referencing a remote view is something that should work fine, but
it's not clear to me why it should work differently than it does on the
remote server. If you select from that same view in a READ ONLY
transaction on the remote, won't it fail? If so, why should that work
if it's selected from via a foreign table?
Sure, it might fail if you use READ ONLY explicitly. Or the code might
check it. The point is that one might not have a choice about the READ
ONLY state of the local transaction if it's a HS standby, as all
transactions are READ ONLY there.
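For context, on a hot standby that state is forced by the server rather than
chosen by the client:

    SHOW transaction_read_only;        -- reports "on" on a standby
    SET transaction_read_only = off;   -- rejected while in recovery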
Greetings,
Andres Freund
Andres Freund <andres@2ndquadrant.com> writes:
Sure, it might fail if you use READ ONLY explicitly. Or the code might
check it. The point is that one might not have a choice about the READ
ONLY state of the local transaction if it's a HS standby, as all
transactions are READ ONLY there.
[ shrug... ] If you want to use a remote DB to cheat on READ ONLY,
there's always dblink. It's not apparent to me that the FDW
implementation should try to be complicit in such cheating.
(Not that it would work anyway given the command-level checks.)
regards, tom lane
Tom Lane wrote:
I think that the question of what to push down is a different question
from checking column data types, because there we can rely on the
type input functions to reject bad values.
Unfortunately, that's a very myopic view of the situation: there
are many cases where datatype semantics can vary without the I/O
functions having any idea that anything is wrong. To take one example,
what if the underlying column is type citext but the user wrote "text"
in the foreign table definition? postgres_fdw would see no reason not
to push "col = 'foo'" across, but that clause would behave quite
differently on the remote side. Another example is that float8 and
numeric will have different opinions about the truth of
"1.000000000000000000001 = 1.000000000000000000002", so you're going
to get into trouble if you declare an FT column as one when the
underlying column is the other, even though the I/O functions for these
types will happily take each other's output.
You are right.
So I think (and have written in the committed docs) that users had
better be careful to ensure that FT columns are declared as the same
type as the underlying columns, even though we can't readily enforce
that, at least not for non-builtin types.
And there's really no difference between that situation and the
collation situation, though I agree with you that the latter is a lot
more likely to bite careless users.
That's what I am worried about.
What about the following design principle:
Only push down conditions which are sure to return the correct
result, provided that the PostgreSQL system objects have not
been tampered with.
That's a nice, simple, and useless-in-practice design principle,
because it will toss out many situations that users will want to work;
situations that in fact *would* work as long as the users adhere to
safe coding conventions. I do not believe that when people ask "why
does performance of LIKE suck on my foreign table", they will accept an
answer of "we don't allow that to be pushed across because we think
you're too stupid to make the remote collation match".
I think that it will be pretty hard to get both reliability
and performance to an optimum.
I'd rather hear complaints about bad performance than
about bad results, but that's just my opinion.
Would it be reasonable to push down operators and functions
only if
a) they are in the pg_catalog schema
b) they have been around for a couple of releases
c) they are not collation sensitive?
We don't track (b) nor (c), so this suggestion is entirely
unimplementable in the 9.3 time frame; nor is it provably safe,
unless by (b) you mean "must have been around since 7.4 or so".
My idea was to have a (hand picked) list of functions and operators
that are considered safe, but I understand that that is rather
ugly. There could of course be no generic method because,
as you say, b) and c) are not tracked.
On a longer time horizon this might be doable, but it would be a lot
of work to implement a solution that most people will find far too
restrictive. I'd rather see the long-term focus be on doing
type/collation matching, so that we can expand not restrict the set
of things we can push across.
I like that vision, and of course my above idea does not
go well with it.
Yours,
Laurenz Albe
On 21 February 2013 10:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
[ postgres_fdw.v5.patch ]
Applied with a lot of revisions.
Bit of an issue with selecting rows:
postgres=# SELECT * FROM animals;
id | animal_name | animal_type | lifespan
----+-------------+-------------+----------
1 | cat | mammal | 20
2 | dog | mammal | 12
3 | robin | bird | 12
4 | dolphin | mammal | 30
5 | gecko | reptile | 18
6 | human | mammal | 85
7 | elephant | mammal | 70
8 | tortoise | reptile | 150
(8 rows)
postgres=# SELECT animals FROM animals;
animals
---------
(,,,)
(,,,)
(,,,)
(,,,)
(,,,)
(,,,)
(,,,)
(,,,)
(8 rows)
postgres=# SELECT animals, animal_name FROM animals;
animals | animal_name
---------------+-------------
(,cat,,) | cat
(,dog,,) | dog
(,robin,,) | robin
(,dolphin,,) | dolphin
(,gecko,,) | gecko
(,human,,) | human
(,elephant,,) | elephant
(,tortoise,,) | tortoise
(8 rows)
postgres=# EXPLAIN (ANALYSE, VERBOSE) SELECT animals FROM animals;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Foreign Scan on public.animals (cost=100.00..100.24 rows=8 width=45)
(actual time=0.253..0.255 rows=8 loops=1)
Output: animals.*
Remote SQL: SELECT NULL, NULL, NULL, NULL FROM public.animals
Total runtime: 0.465 ms
(4 rows)
--
Thom
Thom Brown <thom@linux.com> writes:
Bit of an issue with selecting rows:
Ooops, looks like I screwed up the logic for whole-row references.
Will fix, thanks for the test case!
regards, tom lane
On 22 February 2013 14:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thom Brown <thom@linux.com> writes:
Bit of an issue with selecting rows:
Ooops, looks like I screwed up the logic for whole-row references.
Will fix, thanks for the test case!
Retried after your changes and all is well. Thanks Tom.
--
Thom
Tom, all,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?
Apologies for bringing this up pretty late, but wrt writable FDW
transaction levels, I was *really* hoping that we'd be able to implement
autonomous transactions on top of writeable FDWs. It looks like there's
no way to do this using the postgres_fdw due to it COMMIT'ing only when
the client transaction commits. Would it be possible to have a simple
function which could be called to say "commit the transaction on the
foreign side for this server/table/connection/whatever"? A nice
addition on top of that would be the ability to define 'auto-commit' for a
given table or server.
I'll try and find time to work on this, but I'd love feedback on if this
is possible and where the landmines are.
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Apologies for bringing this up pretty late, but wrt writable FDW
transaction levels, I was *really* hoping that we'd be able to implement
autonomous transactions on top of writeable FDWs. It looks like there's
no way to do this using the postgres_fdw due to it COMMIT'ing only when
the client transaction commits. Would it be possible to have a simple
function which could be called to say "commit the transaction on the
foreign side for this server/table/connection/whatever"? A nice
addition on top of that would be the ability to define 'auto-commit' for a
given table or server.
TBH I think this is a fairly bad idea. You can get that behavior via
dblink if you need it, but there's no way to do it in an FDW without
ending up with astonishing (and not in a good way) semantics. A commit
would force committal of everything that'd been done through that
connection, regardless of transaction/subtransaction structure up to
that point; and it would also destroy open cursors. The only way to
make this sane at all would be to provide user control of which
operations go to which connections; which is inherent in dblink's API
but is simply not a concept in the FDW universe. And I don't want to
try to plaster it on, either.
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
TBH I think this is a fairly bad idea. You can get that behavior via
dblink if you need it,
While I appreciate that dblink can do it, I simply don't see it as a
good solution to this.
but there's no way to do it in an FDW without
ending up with astonishing (and not in a good way) semantics. A commit
would force committal of everything that'd been done through that
connection, regardless of transaction/subtransaction structure up to
that point; and it would also destroy open cursors. The only way to
make this sane at all would be to provide user control of which
operations go to which connections; which is inherent in dblink's API
but is simply not a concept in the FDW universe. And I don't want to
try to plaster it on, either.
This concern would make a lot more sense to me if we were sharing a
given FDW connection between multiple client backends/sessions; I admit
that I've not looked through the code but the documentation seems to
imply that we create one or more FDW connections per backend session and
there's no sharing going on.
A single backend will be operating in a linear fashion through the
commands sent to it. As such, I'm not sure that it's quite as bad as it
may seem.
Perhaps a reasonable compromise would be to have a SERVER option which
is along the lines of 'autocommit', where a user could request that any
query to this server is automatically committed independent of the
client transaction. I'd be happier if we could allow the user to
control it, but this would at least allow for 'log tables', which are
defined using this server definition, where long-running pl/pgsql code
could log progress where other connections could see it.
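Roughly what I have in mind, sketched with a hypothetical option (note that
"autocommit" is not an existing postgres_fdw option, and the object names are
invented):

    CREATE SERVER log_srv FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'loghost', dbname 'logs', autocommit 'true');
    CREATE FOREIGN TABLE job_log (ts timestamptz, msg text) SERVER log_srv;

    -- with such an option, each write to job_log would be committed on the
    -- remote immediately, independent of the local transaction's outcome
    INSERT INTO job_log VALUES (now(), 'step 1 done');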
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
... The only way to
make this sane at all would be to provide user control of which
operations go to which connections; which is inherent in dblink's API
but is simply not a concept in the FDW universe. And I don't want to
try to plaster it on, either.
This concern would make a lot more sense to me if we were sharing a
given FDW connection between multiple client backends/sessions; I admit
that I've not looked through the code but the documentation seems to
imply that we create one or more FDW connections per backend session and
there's no sharing going on.
Well, ATM postgres_fdw shares connections across tables and queries;
but my point is that that's all supposed to be transparent and invisible
to the user. I don't want to have API features that make connections
explicit, because I don't think that can be shoehorned into the FDW
model without considerable strain and weird corner cases.
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
I don't want to have API features that make connections
explicit, because I don't think that can be shoehorned into the FDW
model without considerable strain and weird corner cases.
It seems we're talking past each other here. I'm not particularly
interested in exposing what connections have been made to other servers
via some API (though I could see some debugging use there). What I was
hoping for is a way for a given user to say "I want this specific
change, to this table, to be persisted immediately". I'd love to have
that ability *without* FDWs too. It just happens that FDWs provide a
simple way to get there from here and without a lot of muddying of the
waters, imv.
FDWs are no stranger to remote connections which don't have transactions
either, file_fdw will happily return whatever the current contents of
the file are with no concern for PG transactions. I would expect a
writable file_fdw to act the same and immediately write out any data
written to it.
Hope that clarifies things a bit.
Thanks,
Stephen