postgres_fdw: using TABLESAMPLE to collect remote sample
Hi,
here's a small patch modifying postgres_fdw to use TABLESAMPLE to
collect a sample when running ANALYZE on a foreign table. Currently the
sampling happens locally, i.e. we fetch all data from the remote server
and then pick rows. But that's hugely expensive for large relations
and/or with limited network bandwidth, of course.
Alexander Lepikhov mentioned this in [1], but it was actually proposed
by Stephen in 2020 [2]; no patch ever materialized, though.
So here we go. The patch does a very simple thing - it uses TABLESAMPLE
to collect/transfer just a small sample from the remote node, saving
both CPU and network.
And indeed, that helps quite a bit:
---------------------------------------------------------------------
create table t (a int);
Time: 10.223 ms
insert into t select i from generate_series(1,10000000) s(i);
Time: 552.207 ms
analyze t;
Time: 310.445 ms
CREATE FOREIGN TABLE ft (a int)
SERVER foreign_server
OPTIONS (schema_name 'public', table_name 't');
Time: 4.838 ms
analyze ft;
WARNING: SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t TABLESAMPLE
SYSTEM(0.375001)
Time: 44.632 ms
alter foreign table ft options (sample 'false');
Time: 4.821 ms
analyze ft;
WARNING: SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t
Time: 6690.425 ms (00:06.690)
---------------------------------------------------------------------
6690ms without sampling, and 44ms with sampling - quite an improvement.
Of course, the difference may be much smaller/larger in other cases.
Now, there are a couple of issues ...
Firstly, the FDW API forces a somewhat strange division of work between
different methods and duplicates some of it (this is already mentioned
in postgresAnalyzeForeignTable). But that's minor and can be fixed.
The other issue is which sampling method to use - we have SYSTEM and
BERNOULLI built in, and the tsm_system_rows as an extension (and _time,
but that's not very useful here). I guess we'd use one of the built-in
ones, because that'll work on more systems out of the box.
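For illustration, the variants would look something like this on the
remote side (just a sketch; the last one requires the tsm_system_rows
extension to be installed on the remote server):
---------------------------------------------------------------------
SELECT * FROM t TABLESAMPLE SYSTEM (0.375);      -- ~0.375% of pages
SELECT * FROM t TABLESAMPLE BERNOULLI (0.375);   -- ~0.375% of rows
SELECT * FROM t TABLESAMPLE SYSTEM_ROWS (30000); -- fixed-size sample
---------------------------------------------------------------------
SYSTEM samples whole pages (fast, but the result may correlate with
physical order), while BERNOULLI considers each row independently
(slower, but more uniform).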
But that leads to the main issue - determining the fraction of rows to
sample. We know how many rows we want to sample, but we have no idea how
many rows there are in total. We can look at reltuples, but we can't be
sure how accurate / up-to-date that value is.
The patch just trusts it unless it's obviously bogus (-1, 0, etc.) and
applies some simple sanity checks, but I wonder if we need to do more
(e.g. look at relation size and adjust reltuples by current/relpages).
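Something like this on the remote side would do (just a sketch, with
the remote table name hard-coded; GREATEST() avoids division by zero
for empty tables):
---------------------------------------------------------------------
SELECT reltuples
       * (pg_catalog.pg_relation_size(oid)
          / pg_catalog.current_setting('block_size')::int)
       / GREATEST(relpages, 1)
  FROM pg_catalog.pg_class
 WHERE oid = 'public.t'::pg_catalog.regclass;
---------------------------------------------------------------------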
FWIW this gets a bit more convoluted when analyzing a partitioned table
with foreign partitions, because we only ever look at relation sizes to
determine how many rows to sample from each partition.
regards
[1]: /messages/by-id/bdb0bea2-a0da-1f1d-5c92-96ff90c198eb@postgrespro.ru
[2]: /messages/by-id/20200829162231.GE29590@tamriel.snowman.net
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
postgres-fdw-analyze-sample-20220211.patch (text/x-patch)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..68f875c47a6 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
}
+/*
+ * Construct SELECT statement to acquire the number of rows of given relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+ StringInfoData relname;
+
+ /* We'll need the remote relation name as a literal. */
+ initStringInfo(&relname);
+ deparseRelation(&relname, rel);
+
+ appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+ deparseStringLiteral(buf, relname.data);
+ appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
/*
* Construct SELECT statement to acquire sample rows of given relation.
*
@@ -2328,6 +2348,72 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
deparseRelation(buf, rel);
}
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+}
+
/*
* Construct a simple "TRUNCATE rel" statement
*/
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index fc3ce6a53a2..02be48ec82c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -121,7 +121,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
strcmp(def->defname, "updatable") == 0 ||
strcmp(def->defname, "truncatable") == 0 ||
strcmp(def->defname, "async_capable") == 0 ||
- strcmp(def->defname, "keep_connections") == 0)
+ strcmp(def->defname, "keep_connections") == 0 ||
+ strcmp(def->defname, "sample") == 0)
{
/* these accept only boolean values */
(void) defGetBoolean(def);
@@ -252,6 +253,10 @@ InitPgFdwOptions(void)
{"keep_connections", ForeignServerRelationId, false},
{"password_required", UserMappingRelationId, false},
+ /* sampling is available on both server and table */
+ {"sample", ForeignServerRelationId, false},
+ {"sample", ForeignTableRelationId, false},
+
/*
* sslcert and sslkey are in fact libpq options, but we repeat them
* here to allow them to appear in both foreign server context (when
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 56654844e8f..be04ceb01ef 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4961,6 +4961,68 @@ postgresAnalyzeForeignTable(Relation relation,
return true;
}
+/*
+ * postgresCountTuplesForForeignTable
+ * Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * Note: It's unclear how accurate reltuples is; maybe estimate that using
+ * relpages and simple assumptions (1 tuple per page, ...)? Using
+ * tsm_system_rows would make this somewhat unnecessary.
+ */
+static double
+postgresCountTuplesForForeignTable(Relation relation)
+{
+ ForeignTable *table;
+ UserMapping *user;
+ PGconn *conn;
+ StringInfoData sql;
+ PGresult *volatile res = NULL;
+ double reltuples = -1;
+
+ /*
+ * Now we have to get the number of tuples. It's annoying that the ANALYZE
+ * API requires us to return that now, because it forces some duplication
+ * of effort between this routine and postgresAcquireSampleRowsFunc. But
+ * it's probably not worth redefining that API at this point.
+ */
+
+ /*
+ * Get the connection to use. We do the remote access as the table's
+ * owner, even if the ANALYZE was started by some other user.
+ */
+ table = GetForeignTable(RelationGetRelid(relation));
+ user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
+ conn = GetConnection(user, false, NULL);
+
+ /*
+ * Construct command to get row count for relation.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeTuplesSql(&sql, relation);
+
+ /* In what follows, do not risk leaking any PGresults. */
+ PG_TRY();
+ {
+ res = pgfdw_exec_query(conn, sql.data, NULL);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 1)
+ elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+ reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+ }
+ PG_FINALLY();
+ {
+ if (res)
+ PQclear(res);
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ return reltuples;
+}
+
/*
* Acquire a random sample of rows from foreign table managed by postgres_fdw.
*
@@ -4991,6 +5053,12 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
unsigned int cursor_number;
StringInfoData sql;
PGresult *volatile res = NULL;
+ ListCell *lc;
+
+ /* sampling enabled by default */
+ bool do_sample = true;
+ double sample_frac = -1.0;
+ double reltuples;
/* Initialize workspace state */
astate.rel = relation;
@@ -5018,20 +5086,74 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
conn = GetConnection(user, false, NULL);
+ /*
+ * Should we use TABLESAMPLE to collect the remote sample?
+ */
+ foreach(lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+
+ if (do_sample)
+ {
+ reltuples = postgresCountTuplesForForeignTable(relation);
+
+ if ((reltuples <= 0) || (targrows >= reltuples))
+ do_sample = false;
+
+ sample_frac = targrows / reltuples;
+
+ /* Let's sample a bit more, we'll reduce the sample locally. */
+ sample_frac *= 1.25;
+
+ /* Sanity checks. */
+ sample_frac = Min(1.0, Max(0.0, sample_frac));
+
+ /*
+ * When sampling too large fraction, just read everything.
+ *
+ * XXX It's not clear where exactly the threshold is, with slow
+ * network it may be cheaper to sample even 90%.
+ */
+ if (sample_frac > 0.5)
+ do_sample = false;
+ }
+
/*
* Construct cursor that retrieves whole rows from remote.
*/
cursor_number = GetCursorNumber(conn);
initStringInfo(&sql);
appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
- deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ if (do_sample)
+ deparseAnalyzeSampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else
+ deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ elog(WARNING, "SQL: %s", sql.data);
/* In what follows, do not risk leaking any PGresults. */
PG_TRY();
{
char fetch_sql[64];
int fetch_size;
- ListCell *lc;
res = pgfdw_exec_query(conn, sql.data, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -5120,7 +5242,10 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
*totaldeadrows = 0.0;
/* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ if (do_sample)
+ *totalrows = reltuples;
+ else
+ *totalrows = astate.samplerows;
/*
* Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8ae79e97e44..5e4f732e563 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -208,8 +208,12 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
List **retrieved_attrs);
+extern void deparseAnalyzeSampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
extern void deparseTruncateSql(StringInfo buf,
List *rels,
DropBehavior behavior,
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
So here we go. The patch does a very simple thing - it uses TABLESAMPLE
to collect/transfer just a small sample from the remote node, saving
both CPU and network.
This is great if the remote end has TABLESAMPLE, but pre-9.5 servers
don't, and postgres_fdw is supposed to still work with old servers.
So you need some conditionality for that.
regards, tom lane
On Fri, Feb 11, 2022 at 12:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
So here we go. The patch does a very simple thing - it uses TABLESAMPLE
to collect/transfer just a small sample from the remote node, saving
both CPU and network.
This is great if the remote end has TABLESAMPLE, but pre-9.5 servers
don't, and postgres_fdw is supposed to still work with old servers.
So you need some conditionality for that.
I think it's going to be necessary to compromise on that at some
point. I don't, for example, think it would be reasonable for
postgres_fdw to have detailed knowledge of which operators can be
pushed down as a function of the remote PostgreSQL version. Nor do I
think that we care about whether this works at all against, say,
PostgreSQL 8.0. I'm not sure where it's reasonable to draw a line and
say we're not going to expend any more effort, and maybe 15 with 9.5
is a small enough gap that we still care at least somewhat about
compatibility. But even that is not 100% obvious to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
I think it's going to be necessary to compromise on that at some
point.
Sure. The existing postgres_fdw documentation [1] already addresses
this issue:
postgres_fdw can be used with remote servers dating back to PostgreSQL
8.3. Read-only capability is available back to 8.1. A limitation
however is that postgres_fdw generally assumes that immutable built-in
functions and operators are safe to send to the remote server for
execution, if they appear in a WHERE clause for a foreign table. Thus,
a built-in function that was added since the remote server's release
might be sent to it for execution, resulting in “function does not
exist” or a similar error. This type of failure can be worked around
by rewriting the query, for example by embedding the foreign table
reference in a sub-SELECT with OFFSET 0 as an optimization fence, and
placing the problematic function or operator outside the sub-SELECT.
While I'm not opposed to moving those goalposts at some point,
I think pushing them all the way up to 9.5 for this one easily-fixed
problem is not very reasonable.
Given other recent discussion, an argument could be made for moving
the cutoff to 9.2, on the grounds that it's too hard to test against
anything older. But that still leaves us needing a version check
if we want to use TABLESAMPLE.
regards, tom lane
[1]: https://www.postgresql.org/docs/devel/postgres-fdw.html#id-1.11.7.45.17
On Fri, Feb 11, 2022 at 1:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
While I'm not opposed to moving those goalposts at some point,
I think pushing them all the way up to 9.5 for this one easily-fixed
problem is not very reasonable.
Given other recent discussion, an argument could be made for moving
the cutoff to 9.2, on the grounds that it's too hard to test against
anything older. But that still leaves us needing a version check
if we want to use TABLESAMPLE.
OK, thanks.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2/11/22 19:13, Robert Haas wrote:
On Fri, Feb 11, 2022 at 1:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
While I'm not opposed to moving those goalposts at some point,
I think pushing them all the way up to 9.5 for this one easily-fixed
problem is not very reasonable.
Given other recent discussion, an argument could be made for moving
the cutoff to 9.2, on the grounds that it's too hard to test against
anything older. But that still leaves us needing a version check
if we want to use TABLESAMPLE.
OK, thanks.
Yeah, I think checking server_version is fairly simple. Which is mostly
why I haven't done anything about that in the submitted patch, because
the other issues (what fraction to sample) seemed more important.
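The remote version is easy to get with a trivial query, too
(TABLESAMPLE needs a result of at least 90500, i.e. 9.5):
---------------------------------------------------------------------
SHOW server_version_num;
---------------------------------------------------------------------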
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello,
I find it a great idea to use TABLESAMPLE in postgres_fdw ANALYZE.
Let me offer some ideas on how to resolve your problems.
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
The other issue is which sampling method to use - we have SYSTEM and
BERNOULLI built in, and the tsm_system_rows as an extension (and _time,
but that's not very useful here). I guess we'd use one of the built-in
ones, because that'll work on more systems out of the box.
It’s hard to choose between speed and quality, but I think we need
the SYSTEM method here. This patch is about speeding up ANALYZE,
and the SYSTEM method will be faster than BERNOULLI for fraction
values up to 50%.
But that leads to the main issue - determining the fraction of rows to
sample. We know how many rows we want to sample, but we have no idea how
many rows there are in total. We can look at reltuples, but we can't be
sure how accurate / up-to-date that value is.
The patch just trusts it unless it's obviously bogus (-1, 0, etc.) and
applies some simple sanity checks, but I wonder if we need to do more
(e.g. look at relation size and adjust reltuples by current/relpages).
I found a query on Stackoverflow (it does a similar thing to what
estimate_rel_size does) that may help with it. So that makes
tsm_system_rows unnecessary.
----------------------------------------------------------------------
SELECT
(CASE WHEN relpages = 0 THEN float8 '0'
ELSE reltuples / relpages END
* (pg_relation_size(oid) / pg_catalog.current_setting('block_size')::int)
)::bigint
FROM pg_class c
WHERE c.oid = 'tablename'::regclass;
----------------------------------------------------------------------
And one more refactoring note. The new function deparseAnalyzeSampleSql
duplicates deparseAnalyzeSql (the previous function in deparse.c)
except for the new last line. I guess it would be better to add a new
parameter — double sample_frac — to the existing function
deparseAnalyzeSql and use it as a condition for adding
"TABLESAMPLE SYSTEM..." to the SQL query (set it to zero when
do_sample is false). You could also add do_sample as a parameter to
deparseAnalyzeSql, but to me that seems redundant.
Stackoverflow: https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql
--
Sofia Kopikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi,
here's a slightly updated version of the patch series. The 0001 part
adds tracking of server_version_num, so that it's possible to enable
other features depending on it. In this case it's used to decide whether
TABLESAMPLE is supported.
The 0002 part modifies the sampling. I realized we can do something
similar even on pre-9.5 releases, by running "WHERE random() < $1". Not
perfect, because it still has to read the whole table, but still better
than also sending it over the network.
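For the 10M-row table from my first message (targrows = 30000 with the
default statistics target, times the 1.25 oversampling factor, i.e. the
same ~0.375% fraction as in the TABLESAMPLE case) that comes out as:
---------------------------------------------------------------------
SELECT a FROM public.t WHERE random() < 0.00375;
---------------------------------------------------------------------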
There's a "sample" option for foreign server/table, which can be used to
disable the sampling if needed.
A simple measurement on a table with 10M rows, on localhost.
old: 6600ms
random: 450ms
tablesample: 40ms (system)
tablesample: 200ms (bernoulli)
Local analyze takes ~190ms, so that's quite close.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-postgres_fdw-track-server-version-for-conne-20220218.patch (text/x-patch)
From 6524f8c6f0db13c5dca0438bdda194ee5bedebbb Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 18 Feb 2022 01:32:25 +0100
Subject: [PATCH 1/2] postgres_fdw: track server version for connections
To allow using features that only exist on newer Postgres versions, we
need some information about the version of the remote node. We simply
request server_version_num from the remote node. We only fetch it if
the version number is actually needed, and we cache it.
---
contrib/postgres_fdw/connection.c | 44 +++++++++++++++++++++++++++++
contrib/postgres_fdw/postgres_fdw.h | 2 ++
2 files changed, 46 insertions(+)
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index f753c6e2324..3ea2948d3ec 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -279,6 +279,50 @@ GetConnection(UserMapping *user, bool will_prep_stmt, PgFdwConnState **state)
return entry->conn;
}
+/*
+ * Determine remote server version (as an int value).
+ *
+ * The value is determined only once and then cached in PgFdwConnState.
+ */
+int
+GetServerVersion(PGconn *conn, PgFdwConnState *state)
+{
+ PGresult *volatile res = NULL;
+
+ /*
+ * If we already know server version for this connection, we're done.
+ */
+ if (state->server_version_num != 0)
+ return state->server_version_num;
+
+ /* PGresult must be released before leaving this function. */
+ PG_TRY();
+ {
+ char *line;
+
+ /*
+ * Execute SHOW server_version_num remotely.
+ */
+ res = pgfdw_exec_query(conn, "SHOW server_version_num", NULL);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ pgfdw_report_error(ERROR, res, conn, false, "SHOW server_version_num");
+
+ /*
+ * Extract the server version number.
+ */
+ line = PQgetvalue(res, 0, 0);
+ state->server_version_num = strtol(line, NULL, 10);
+ }
+ PG_FINALLY();
+ {
+ if (res)
+ PQclear(res);
+ }
+ PG_END_TRY();
+
+ return state->server_version_num;
+}
+
/*
* Reset all transient state fields in the cached connection entry and
* establish new connection to the remote server.
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8ae79e97e44..1687c62df2d 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -132,6 +132,7 @@ typedef struct PgFdwRelationInfo
typedef struct PgFdwConnState
{
AsyncRequest *pendingAreq; /* pending async request */
+ int server_version_num; /* remote server version */
} PgFdwConnState;
/* in postgres_fdw.c */
@@ -143,6 +144,7 @@ extern void process_pending_request(AsyncRequest *areq);
extern PGconn *GetConnection(UserMapping *user, bool will_prep_stmt,
PgFdwConnState **state);
extern void ReleaseConnection(PGconn *conn);
+extern int GetServerVersion(PGconn *conn, PgFdwConnState *state);
extern unsigned int GetCursorNumber(PGconn *conn);
extern unsigned int GetPrepStmtNumber(PGconn *conn);
extern void do_sql_command(PGconn *conn, const char *sql);
--
2.34.1
0002-postgres_fdw-sample-data-on-remote-node-for-20220218.patch (text/x-patch)
From 546657dd0d3ab3bebdb1145785c62809b78e7644 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 18 Feb 2022 00:33:19 +0100
Subject: [PATCH 2/2] postgres_fdw: sample data on remote node for ANALYZE
When performing ANALYZE on a foreign table, we need to collect a sample
of rows. Until now, we simply read all data from the remote node and
built the sample locally. That is very expensive, especially in terms of
network traffic etc. But it's possible to move the sampling to the
remote node, and use either TABLESAMPLE or simply random() to transfer
just a much smaller amount of data.
So we run either
SELECT * FROM table TABLESAMPLE SYSTEM(fraction)
or
SELECT * FROM table WHERE random() < fraction
depending on the server version (TABLESAMPLE is supported since 9.5).
To do that, we need to determine what fraction of the table to sample.
We rely on reltuples (fetched from the remote node) to be sufficiently
accurate and up to date, and calculate the fraction based on that. We
increase the sample size a bit (in case the table shrunk), and we still
do the reservoir sampling (in case it grew).
Using tsm_system_rows would allow specifying sample size in rows,
without determining sampling rate. But the sampling method may not be
installed, and we'd still have to determine the relation size.
This adds 'sample' option for remote servers / tables. By default, it's
set to 'true' which enables remote sampling. Setting it to 'false' uses
the old approach of fetching everything and sampling locally.
---
contrib/postgres_fdw/deparse.c | 152 ++++++++++++++++++++++++++++
contrib/postgres_fdw/option.c | 7 +-
contrib/postgres_fdw/postgres_fdw.c | 145 +++++++++++++++++++++++++-
contrib/postgres_fdw/postgres_fdw.h | 7 ++
4 files changed, 306 insertions(+), 5 deletions(-)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..32f2c0d5fb3 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
}
+/*
+ * Construct SELECT statement to acquire the number of rows of given relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+ StringInfoData relname;
+
+ /* We'll need the remote relation name as a literal. */
+ initStringInfo(&relname);
+ deparseRelation(&relname, rel);
+
+ appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+ deparseStringLiteral(buf, relname.data);
+ appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
/*
* Construct SELECT statement to acquire sample rows of given relation.
*
@@ -2328,6 +2348,138 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
deparseRelation(buf, rel);
}
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+}
+
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using random().
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " WHERE random() < %f", sample_frac);
+}
+
/*
* Construct a simple "TRUNCATE rel" statement
*/
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index fc3ce6a53a2..02be48ec82c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -121,7 +121,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
strcmp(def->defname, "updatable") == 0 ||
strcmp(def->defname, "truncatable") == 0 ||
strcmp(def->defname, "async_capable") == 0 ||
- strcmp(def->defname, "keep_connections") == 0)
+ strcmp(def->defname, "keep_connections") == 0 ||
+ strcmp(def->defname, "sample") == 0)
{
/* these accept only boolean values */
(void) defGetBoolean(def);
@@ -252,6 +253,10 @@ InitPgFdwOptions(void)
{"keep_connections", ForeignServerRelationId, false},
{"password_required", UserMappingRelationId, false},
+ /* sampling is available on both server and table */
+ {"sample", ForeignServerRelationId, false},
+ {"sample", ForeignTableRelationId, false},
+
/*
* sslcert and sslkey are in fact libpq options, but we repeat them
* here to allow them to appear in both foreign server context (when
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 56654844e8f..6eafe520c4e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4961,6 +4961,68 @@ postgresAnalyzeForeignTable(Relation relation,
return true;
}
+/*
+ * postgresCountTuplesForForeignTable
+ * Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * Note: It's unclear how accurate reltuples is; maybe estimate that using
+ * relpages and simple assumptions (1 tuple per page, ...)? Using
+ * tsm_system_rows would make this somewhat unnecessary.
+ */
+static double
+postgresCountTuplesForForeignTable(Relation relation)
+{
+ ForeignTable *table;
+ UserMapping *user;
+ PGconn *conn;
+ StringInfoData sql;
+ PGresult *volatile res = NULL;
+ double reltuples = -1;
+
+ /*
+ * Now we have to get the number of tuples. It's annoying that the ANALYZE
+ * API requires us to return that now, because it forces some duplication
+ * of effort between this routine and postgresAcquireSampleRowsFunc. But
+ * it's probably not worth redefining that API at this point.
+ */
+
+ /*
+ * Get the connection to use. We do the remote access as the table's
+ * owner, even if the ANALYZE was started by some other user.
+ */
+ table = GetForeignTable(RelationGetRelid(relation));
+ user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
+ conn = GetConnection(user, false, NULL);
+
+ /*
+ * Construct command to get row count for relation.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeTuplesSql(&sql, relation);
+
+ /* In what follows, do not risk leaking any PGresults. */
+ PG_TRY();
+ {
+ res = pgfdw_exec_query(conn, sql.data, NULL);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 1)
+ elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+ reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+ }
+ PG_FINALLY();
+ {
+ if (res)
+ PQclear(res);
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ return reltuples;
+}
+
/*
* Acquire a random sample of rows from foreign table managed by postgres_fdw.
*
@@ -4991,6 +5053,15 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
unsigned int cursor_number;
StringInfoData sql;
PGresult *volatile res = NULL;
+ ListCell *lc;
+ PgFdwConnState *state;
+ int server_version_num;
+
+ /* sampling enabled by default */
+ bool do_sample = true;
+ bool use_tablesample = true;
+ double sample_frac = -1.0;
+ double reltuples;
/* Initialize workspace state */
astate.rel = relation;
@@ -5016,7 +5087,63 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
table = GetForeignTable(RelationGetRelid(relation));
server = GetForeignServer(table->serverid);
user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
- conn = GetConnection(user, false, NULL);
+ conn = GetConnection(user, false, &state);
+
+ /* We'll need server version, so fetch it now. */
+ server_version_num = GetServerVersion(conn, state);
+
+ /* disable tablesample on remote servers older than 9.5 */
+ if (server_version_num < 90500)
+ use_tablesample = false;
+
+ /*
+ * Should we use TABLESAMPLE to collect the remote sample?
+ */
+ foreach(lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+
+ if (do_sample)
+ {
+ reltuples = postgresCountTuplesForForeignTable(relation);
+
+ if ((reltuples <= 0) || (targrows >= reltuples))
+ do_sample = false;
+
+ sample_frac = targrows / reltuples;
+
+ /* Let's sample a bit more, we'll reduce the sample locally. */
+ sample_frac *= 1.25;
+
+ /* Sanity checks. */
+ sample_frac = Min(1.0, Max(0.0, sample_frac));
+
+ /*
+ * When sampling too large fraction, just read everything.
+ *
+ * XXX It's not clear where exactly the threshold is, with slow
+ * network it may be cheaper to sample even 90%.
+ */
+ if (sample_frac > 0.5)
+ do_sample = false;
+ }
/*
* Construct cursor that retrieves whole rows from remote.
@@ -5024,14 +5151,21 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
cursor_number = GetCursorNumber(conn);
initStringInfo(&sql);
appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
- deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ if (do_sample && use_tablesample)
+ deparseAnalyzeTableSampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else if (do_sample)
+ deparseAnalyzeLegacySampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else
+ deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ elog(WARNING, "SQL: %s", sql.data);
/* In what follows, do not risk leaking any PGresults. */
PG_TRY();
{
char fetch_sql[64];
int fetch_size;
- ListCell *lc;
res = pgfdw_exec_query(conn, sql.data, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -5120,7 +5254,10 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
*totaldeadrows = 0.0;
/* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ if (do_sample)
+ *totalrows = reltuples;
+ else
+ *totalrows = astate.samplerows;
/*
* Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 1687c62df2d..54b6556b20c 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -210,8 +210,15 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
List **retrieved_attrs);
+extern void deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
+extern void deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
extern void deparseTruncateSql(StringInfo buf,
List *rels,
DropBehavior behavior,
--
2.34.1
On 2022/02/18 22:28, Tomas Vondra wrote:
Hi,
here's a slightly updated version of the patch series.
Thanks for updating the patches!
The 0001 part
adds tracking of server_version_num, so that it's possible to enable
other features depending on it.
Like configure_remote_session() does, can't we use PQserverVersion() instead of implementing new function GetServerVersion()?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2/22/22 01:36, Fujii Masao wrote:
On 2022/02/18 22:28, Tomas Vondra wrote:
Hi,
here's a slightly updated version of the patch series.
Thanks for updating the patches!
The 0001 part
adds tracking of server_version_num, so that it's possible to enable
other features depending on it.
Like configure_remote_session() does, can't we use PQserverVersion()
instead of implementing new function GetServerVersion()?
Ah! My knowledge of libpq is limited, so I wasn't sure this function
exists. It'll simplify the patch.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2/22/22 21:12, Tomas Vondra wrote:
On 2/22/22 01:36, Fujii Masao wrote:
On 2022/02/18 22:28, Tomas Vondra wrote:
Hi,
here's a slightly updated version of the patch series.
Thanks for updating the patches!
The 0001 part
adds tracking of server_version_num, so that it's possible to enable
other features depending on it.
Like configure_remote_session() does, can't we use PQserverVersion()
instead of implementing new function GetServerVersion()?
Ah! My knowledge of libpq is limited, so I wasn't sure this function
exists. It'll simplify the patch.
And here's the slightly simplified patch, without the part adding the
unnecessary GetServerVersion() function.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-postgres_fdw-sample-data-on-remote-node-for-20220222.patch (text/x-patch)
From 089ecf8c9ee44b89becfbcfb9c7a0ddc6c8f197a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Fri, 18 Feb 2022 00:33:19 +0100
Subject: [PATCH] postgres_fdw: sample data on remote node for ANALYZE
When performing ANALYZE on a foreign table, we need to collect a sample
of rows. Until now, we simply read all data from the remote node and
built the sample locally. That is very expensive, especially in terms of
network traffic etc. But it's possible to move the sampling to the
remote node, and use either TABLESAMPLE or simply random() to transfer
just a much smaller amount of data.
So we run either
SELECT * FROM table TABLESAMPLE SYSTEM(fraction)
or
SELECT * FROM table WHERE random() < fraction
depending on the server version (TABLESAMPLE is supported since 9.5).
To do that, we need to determine what fraction of the table to sample.
We rely on reltuples (fetched from the remote node) to be sufficiently
accurate and up to date, and calculate the fraction based on that. We
increase the sample size a bit (in case the table shrunk), and we still
do the reservoir sampling (in case it grew).
Using tsm_system_rows would allow specifying sample size in rows,
without determining sampling rate. But the sampling method may not be
installed, and we'd still have to determine the relation size.
This adds 'sample' option for remote servers / tables. By default, it's
set to 'true' which enables remote sampling. Setting it to 'false' uses
the old approach of fetching everything and sampling locally.
---
contrib/postgres_fdw/deparse.c | 152 ++++++++++++++++++++++++++++
contrib/postgres_fdw/option.c | 7 +-
contrib/postgres_fdw/postgres_fdw.c | 142 +++++++++++++++++++++++++-
contrib/postgres_fdw/postgres_fdw.h | 7 ++
4 files changed, 304 insertions(+), 4 deletions(-)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index bf12eac0288..32f2c0d5fb3 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
}
+/*
+ * Construct SELECT statement to acquire the number of rows of given relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+ StringInfoData relname;
+
+ /* We'll need the remote relation name as a literal. */
+ initStringInfo(&relname);
+ deparseRelation(&relname, rel);
+
+ appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+ deparseStringLiteral(buf, relname.data);
+ appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
/*
* Construct SELECT statement to acquire sample rows of given relation.
*
@@ -2328,6 +2348,138 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
deparseRelation(buf, rel);
}
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+}
+
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using random().
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " WHERE random() < %f", sample_frac);
+}
+
/*
* Construct a simple "TRUNCATE rel" statement
*/
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 2c6b2894b96..65fc1acadcd 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -121,7 +121,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
strcmp(def->defname, "updatable") == 0 ||
strcmp(def->defname, "truncatable") == 0 ||
strcmp(def->defname, "async_capable") == 0 ||
- strcmp(def->defname, "keep_connections") == 0)
+ strcmp(def->defname, "keep_connections") == 0 ||
+ strcmp(def->defname, "sample") == 0)
{
/* these accept only boolean values */
(void) defGetBoolean(def);
@@ -252,6 +253,10 @@ InitPgFdwOptions(void)
{"keep_connections", ForeignServerRelationId, false},
{"password_required", UserMappingRelationId, false},
+ /* sampling is available on both server and table */
+ {"sample", ForeignServerRelationId, false},
+ {"sample", ForeignTableRelationId, false},
+
/*
* sslcert and sslkey are in fact libpq options, but we repeat them
* here to allow them to appear in both foreign server context (when
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 56654844e8f..6040ee199aa 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4961,6 +4961,68 @@ postgresAnalyzeForeignTable(Relation relation,
return true;
}
+/*
+ * postgresCountTuplesForForeignTable
+ * Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * Note: It's unclear how accurate reltuples is; maybe estimate that using
+ * relpages and simple assumptions (1 tuple per page, ...)? Using
+ * tsm_system_rows would make this somewhat unnecessary.
+ */
+static double
+postgresCountTuplesForForeignTable(Relation relation)
+{
+ ForeignTable *table;
+ UserMapping *user;
+ PGconn *conn;
+ StringInfoData sql;
+ PGresult *volatile res = NULL;
+ double reltuples = -1;
+
+ /*
+ * Now we have to get the number of tuples. It's annoying that the ANALYZE
+ * API requires us to return that now, because it forces some duplication
+ * of effort between this routine and postgresAcquireSampleRowsFunc. But
+ * it's probably not worth redefining that API at this point.
+ */
+
+ /*
+ * Get the connection to use. We do the remote access as the table's
+ * owner, even if the ANALYZE was started by some other user.
+ */
+ table = GetForeignTable(RelationGetRelid(relation));
+ user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
+ conn = GetConnection(user, false, NULL);
+
+ /*
+ * Construct command to get row count for relation.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeTuplesSql(&sql, relation);
+
+ /* In what follows, do not risk leaking any PGresults. */
+ PG_TRY();
+ {
+ res = pgfdw_exec_query(conn, sql.data, NULL);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 1)
+ elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+ reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+ }
+ PG_FINALLY();
+ {
+ if (res)
+ PQclear(res);
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ return reltuples;
+}
+
/*
* Acquire a random sample of rows from foreign table managed by postgres_fdw.
*
@@ -4991,6 +5053,14 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
unsigned int cursor_number;
StringInfoData sql;
PGresult *volatile res = NULL;
+ ListCell *lc;
+ int server_version_num;
+
+ /* sampling enabled by default */
+ bool do_sample = true;
+ bool use_tablesample = true;
+ double sample_frac = -1.0;
+ double reltuples;
/* Initialize workspace state */
astate.rel = relation;
@@ -5018,20 +5088,83 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
conn = GetConnection(user, false, NULL);
+ /* We'll need server version, so fetch it now. */
+ server_version_num = PQserverVersion(conn);
+
+ /* disable tablesample on remote servers older than 9.5 */
+ if (server_version_num < 90500)
+ use_tablesample = false;
+
+ /*
+ * Should we use TABLESAMPLE to collect the remote sample?
+ */
+ foreach(lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+
+ if (do_sample)
+ {
+ reltuples = postgresCountTuplesForForeignTable(relation);
+
+ if ((reltuples <= 0) || (targrows >= reltuples))
+ do_sample = false;
+
+ sample_frac = targrows / reltuples;
+
+ /* Let's sample a bit more, we'll reduce the sample locally. */
+ sample_frac *= 1.25;
+
+ /* Sanity checks. */
+ sample_frac = Min(1.0, Max(0.0, sample_frac));
+
+ /*
+ * When sampling too large fraction, just read everything.
+ *
+ * XXX It's not clear where exactly the threshold is, with slow
+ * network it may be cheaper to sample even 90%.
+ */
+ if (sample_frac > 0.5)
+ do_sample = false;
+ }
+
/*
* Construct cursor that retrieves whole rows from remote.
*/
cursor_number = GetCursorNumber(conn);
initStringInfo(&sql);
appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
- deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ if (do_sample && use_tablesample)
+ deparseAnalyzeTableSampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else if (do_sample)
+ deparseAnalyzeLegacySampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else
+ deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ elog(WARNING, "SQL: %s", sql.data);
/* In what follows, do not risk leaking any PGresults. */
PG_TRY();
{
char fetch_sql[64];
int fetch_size;
- ListCell *lc;
res = pgfdw_exec_query(conn, sql.data, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -5120,7 +5253,10 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
*totaldeadrows = 0.0;
/* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ if (do_sample)
+ *totalrows = reltuples;
+ else
+ *totalrows = astate.samplerows;
/*
* Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8ae79e97e44..6bc0a93de68 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -208,8 +208,15 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
List **retrieved_attrs);
+extern void deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
+extern void deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
extern void deparseTruncateSql(StringInfo buf,
List *rels,
DropBehavior behavior,
--
2.34.1
Hi,
On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote:
And here's the slightly simplified patch, without the part adding the
unnecessary GetServerVersion() function.
Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log
Marked as waiting-on-author.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote:
And here's the slightly simplified patch, without the part adding the
unnecessary GetServerVersion() function.
Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log
Marked as waiting-on-author.
Here's a rebased version that should at least pass regression tests.
I've not reviewed it in any detail, but:
* I'm not really on board with defaulting to SYSTEM sample method,
and definitely not on board with not allowing any other choice.
We don't know enough about the situation in a remote table to be
confident that potentially-nonrandom sampling is OK. So personally
I'd default to BERNOULLI, which is more reliable and seems plenty fast
enough given your upthread results. It could be an idea to extend the
sample option to be like "sample [ = methodname ]", if you want more
flexibility, but I'd be happy leaving that for another time.
* The code depending on reltuples is broken in recent server versions,
and might give useless results in older ones too (if reltuples =
relpages = 0). Ideally we'd retrieve all of reltuples, relpages, and
pg_relation_size(rel), and do the same calculation the planner does.
Not sure if pg_relation_size() exists far enough back though.
* Copying-and-pasting all of deparseAnalyzeSql (twice!) seems pretty
bletcherous. Why not call that and then add a WHERE clause to its
result, or just add some parameters to it so it can do that itself?
* More attention to updating relevant comments would be appropriate,
eg here you've not bothered to fix the adjacent falsified comment:
/* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ if (do_sample)
+ *totalrows = reltuples;
+ else
+ *totalrows = astate.samplerows;
* Needs docs obviously. I'm not sure if the existing regression
testing covers the new code adequately or if we need more cases.
Having said that much, I'm going to leave it in Waiting on Author
state.
regards, tom lane
Attachments:
0001-postgres_fdw-sample-data-on-remote-node-for-20220716.patch (text/x-diff)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 8f4d8a5022..cdf12e53e8 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2297,6 +2297,31 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
}
+/*
+ * Construct SELECT statement to acquire the number of rows in given relation.
+ *
+ * We just need an estimate here, so consult pg_class.reltuples instead
+ * of doing anything as expensive as COUNT(*).
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly, like the planner does?
+ * XXX: it needs some work anyway, because it won't do anything sane
+ * for a never-analyzed table with reltuples = -1.
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+ StringInfoData relname;
+
+ /* We'll need the remote relation name as a literal. */
+ initStringInfo(&relname);
+ deparseRelation(&relname, rel);
+
+ appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+ deparseStringLiteral(buf, relname.data);
+ appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
/*
* Construct SELECT statement to acquire sample rows of given relation.
*
@@ -2358,6 +2383,135 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
deparseRelation(buf, rel);
}
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+}
+
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using random().
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ */
+void
+deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " WHERE random() < %f", sample_frac);
+}
+
/*
* Construct a simple "TRUNCATE rel" statement
*/
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 5f2ef88cf3..482c21032b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9532,7 +9532,7 @@ DO $d$
END;
$d$;
ERROR: invalid option "password"
-HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections
+HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections, sample
CONTEXT: SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
PL/pgSQL function inline_code_block line 3 at EXECUTE
-- If we add a password for our user mapping instead, we should get a different
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 572591a558..3749d4701a 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -122,7 +122,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
strcmp(def->defname, "truncatable") == 0 ||
strcmp(def->defname, "async_capable") == 0 ||
strcmp(def->defname, "parallel_commit") == 0 ||
- strcmp(def->defname, "keep_connections") == 0)
+ strcmp(def->defname, "keep_connections") == 0 ||
+ strcmp(def->defname, "sample") == 0)
{
/* these accept only boolean values */
(void) defGetBoolean(def);
@@ -254,6 +255,10 @@ InitPgFdwOptions(void)
{"keep_connections", ForeignServerRelationId, false},
{"password_required", UserMappingRelationId, false},
+ /* sampling is available on both server and table */
+ {"sample", ForeignServerRelationId, false},
+ {"sample", ForeignTableRelationId, false},
+
/*
* sslcert and sslkey are in fact libpq options, but we repeat them
* here to allow them to appear in both foreign server context (when
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 955a428e3d..cd2088e2ae 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4961,6 +4961,68 @@ postgresAnalyzeForeignTable(Relation relation,
return true;
}
+/*
+ * postgresCountTuplesForForeignTable
+ * Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * Note: It's unclear how accurate reltuples is, maybe estimate it using
+ * relpages and simple assumptions (1 tuple per page, ...)? Using
+ * tsm_system_rows would make this somewhat unnecessary.
+ */
+static double
+postgresCountTuplesForForeignTable(Relation relation)
+{
+ ForeignTable *table;
+ UserMapping *user;
+ PGconn *conn;
+ StringInfoData sql;
+ PGresult *volatile res = NULL;
+ double reltuples = -1;
+
+ /*
+ * Now we have to get the number of tuples. It's annoying that the ANALYZE
+ * API requires us to return that now, because it forces some duplication
+ * of effort between this routine and postgresAcquireSampleRowsFunc. But
+ * it's probably not worth redefining that API at this point.
+ */
+
+ /*
+ * Get the connection to use. We do the remote access as the table's
+ * owner, even if the ANALYZE was started by some other user.
+ */
+ table = GetForeignTable(RelationGetRelid(relation));
+ user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
+ conn = GetConnection(user, false, NULL);
+
+ /*
+ * Construct command to get the tuple count for the relation.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeTuplesSql(&sql, relation);
+
+ /* In what follows, do not risk leaking any PGresults. */
+ PG_TRY();
+ {
+ res = pgfdw_exec_query(conn, sql.data, NULL);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 1)
+ elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+ reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+ }
+ PG_FINALLY();
+ {
+ if (res)
+ PQclear(res);
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ return reltuples;
+}
+
/*
* Acquire a random sample of rows from foreign table managed by postgres_fdw.
*
@@ -4991,6 +5053,12 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
unsigned int cursor_number;
StringInfoData sql;
PGresult *volatile res = NULL;
+ ListCell *lc;
+ int server_version_num;
+ bool do_sample = true; /* enabled by default */
+ bool use_tablesample = true;
+ double sample_frac = -1.0;
+ double reltuples;
/* Initialize workspace state */
astate.rel = relation;
@@ -5018,20 +5086,81 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
conn = GetConnection(user, false, NULL);
+ /* We'll need server version, so fetch it now. */
+ server_version_num = PQserverVersion(conn);
+
+ /* disable tablesample on old remote servers */
+ if (server_version_num < 95000)
+ use_tablesample = false;
+
+ /*
+ * Should we use TABLESAMPLE to collect the remote sample?
+ */
+ foreach(lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+
+ if (do_sample)
+ {
+ reltuples = postgresCountTuplesForForeignTable(relation);
+
+ if ((reltuples <= 0) || (targrows >= reltuples))
+ do_sample = false;
+
+ sample_frac = targrows / reltuples;
+
+ /* Let's sample a bit more, we'll reduce the sample locally. */
+ sample_frac *= 1.25;
+
+ /* Sanity checks. */
+ sample_frac = Min(1.0, Max(0.0, sample_frac));
+
+ /*
+ * When sampling too large fraction, just read everything.
+ *
+ * XXX It's not clear where exactly the threshold is, with slow
+ * network it may be cheaper to sample even 90%.
+ */
+ if (sample_frac > 0.5)
+ do_sample = false;
+ }
+
/*
* Construct cursor that retrieves whole rows from remote.
*/
cursor_number = GetCursorNumber(conn);
initStringInfo(&sql);
appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
- deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ if (do_sample && use_tablesample)
+ deparseAnalyzeTableSampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else if (do_sample)
+ deparseAnalyzeLegacySampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else
+ deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
/* In what follows, do not risk leaking any PGresults. */
PG_TRY();
{
char fetch_sql[64];
int fetch_size;
- ListCell *lc;
res = pgfdw_exec_query(conn, sql.data, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -5119,7 +5248,10 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
*totaldeadrows = 0.0;
/* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ if (do_sample)
+ *totalrows = reltuples;
+ else
+ *totalrows = astate.samplerows;
/*
* Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 21f2b20ce8..b0d9cf4298 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -211,8 +211,15 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
List **retrieved_attrs);
+extern void deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
+extern void deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
extern void deparseTruncateSql(StringInfo buf,
List *rels,
DropBehavior behavior,
On 7/16/22 23:57, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2022-02-23 00:51:24 +0100, Tomas Vondra wrote:
And here's the slightly simplified patch, without the part adding the
unnecessary GetServerVersion() function.
Doesn't apply anymore: http://cfbot.cputube.org/patch_37_3535.log
Marked as waiting-on-author.
Here's a rebased version that should at least pass regression tests.
Thanks. I've been hacking on this over the past few days, and by
coincidence I've been improving exactly the stuff you've pointed out in
the review. 0001 is just the original patch rebased, 0002 includes all
the various changes.
I've not reviewed it in any detail, but:
* I'm not really on board with defaulting to SYSTEM sample method,
and definitely not on board with not allowing any other choice.
We don't know enough about the situation in a remote table to be
confident that potentially-nonrandom sampling is OK. So personally
I'd default to BERNOULLI, which is more reliable and seems plenty fast
enough given your upthread results. It could be an idea to extend the
sample option to be like "sample [ = methodname ]", if you want more
flexibility, but I'd be happy leaving that for another time.
I agree, I came roughly to the same conclusion, so I replaced the simple
on/off option with these options:
off - Disables the remote sampling, so we just fetch everything and do
sampling on the local node, just like today.
random - Remote sampling, but a "naive" implementation using the
random() function. The advantage is that this reduces the amount of
data we need to transfer, but it still reads the whole table. This
should work on all server versions, I believe.
system - TABLESAMPLE system method.
bernoulli - TABLESAMPLE bernoulli (default for 9.5+)
auto - picks bernoulli on 9.5+, random on older servers.
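The option can be set at either level, with the table-level setting
overriding the server-level one. For example (borrowing the loopback
server and partition names from the regression test in 0002):

    ALTER SERVER loopback OPTIONS (ADD analyze_sampling 'bernoulli');
    ALTER FOREIGN TABLE analyze_ftable__p1
        OPTIONS (ADD analyze_sampling 'off');
    ANALYZE analyze_ftable__p1;  -- table-level 'off' wins here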
I'm not sure about custom TABLESAMPLE methods - that adds more
complexity (detecting whether the extension is installed), it's
trickier to decide what the best choice is (for "auto" to decide), and
the parameter is also different (e.g. system_rows uses a number of rows
vs. a sampling rate).
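For comparison, this is roughly what tsm_system_rows usage would look
like - it takes a row count (e.g. targrows) rather than a sampling
rate, which is why it doesn't fit the current deparsing logic:

    CREATE EXTENSION tsm_system_rows;
    SELECT * FROM public.t TABLESAMPLE SYSTEM_ROWS (30000);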
* The code depending on reltuples is broken in recent server versions,
and might give useless results in older ones too (if reltuples =
relpages = 0). Ideally we'd retrieve all of reltuples, relpages, and
pg_relation_size(rel), and do the same calculation the planner does.
Not sure if pg_relation_size() exists far enough back though.
Yes, I noticed that too, and the reworked code should deal with the
reltuples=0 case (by just disabling remote sampling).
I haven't implemented the reltuples/relpages correction yet, but I don't
think compatibility would be an issue - deparseAnalyzeSizeSql() already
calls pg_relation_size(), after all.
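For reference, what deparseAnalyzeSizeSql() sends today is roughly this
(with BLCKSZ typically being 8192):

    SELECT pg_catalog.pg_relation_size('public.t'::pg_catalog.regclass) / 8192;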
FWIW it seems a bit weird being so careful about adjusting reltuples,
when acquire_inherited_sample_rows() only really looks at relpages when
deciding how many rows to sample from each partition. If our goal is to
use a more accurate reltuples, maybe we should do that in the first step
already. Otherwise we may end up building a sample that does not
reflect sizes of the partitions correctly.
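To make that concrete, here's a rough SQL paraphrase of the relpages
split (3000 being targrows for statistics target 10; for foreign
partitions the page counts actually come from the FDW size callback,
so this is just illustrative):

    SELECT c.relname,
           round(3000 * c.relpages::numeric / sum(c.relpages) OVER ())
               AS child_targrows
    FROM pg_catalog.pg_class c
    WHERE c.relname IN ('analyze_rtable1', 'analyze_rtable2');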
Of course, the sample rate also matters for non-partitioned tables.
* Copying-and-pasting all of deparseAnalyzeSql (twice!) seems pretty
bletcherous. Why not call that and then add a WHERE clause to its
result, or just add some parameters to it so it can do that itself?
Right. I ended up refactoring this into a single function, with a
"method" parameter that determines if/how we do the remote sampling.
* More attention to updating relevant comments would be appropriate,
eg here you've not bothered to fix the adjacent falsified comment:
/* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ if (do_sample)
+ *totalrows = reltuples;
+ else
+ *totalrows = astate.samplerows;
Yep, fixed.
* Needs docs obviously. I'm not sure if the existing regression
testing covers the new code adequately or if we need more cases.
Yep, I added the "sampling_method" to postgres-fdw.sgml.
Having said that much, I'm going to leave it in Waiting on Author
state.
Thanks. I'll switch this to "needs review" now.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-postgres_fdw-sample-data-on-remote-node-for-20220718.patch (text/x-patch; charset=UTF-8)
From 77589ba90e8f3007b0d58f522f9e498b7d8ab277 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sat, 16 Jul 2022 00:37:20 +0200
Subject: [PATCH 1/2] postgres_fdw: sample data on remote node for ANALYZE
When performing ANALYZE on a foreign table, we need to collect a sample
of rows. Until now, we simply read all data from the remote node and
built the sample locally. That is very expensive, especially in terms of
network traffic etc. But it's possible to move the sampling to the
remote node, and use either TABLESAMPLE or simply random() to transfer
just a much smaller amount of data.
So we run either
SELECT * FROM table TABLESAMPLE SYSTEM(fraction)
or
SELECT * FROM table WHERE random() < fraction
depending on the server version (TABLESAMPLE is supported since 9.5).
To do that, we need to determine what fraction of the table to sample.
We rely on reltuples (fetched from the remote node) to be sufficiently
accurate and up to date, and calculate the fraction based on that. We
increase the sample size a bit (in case the table shrunk), and we still
do the reservoir sampling (in case it grew).
Using tsm_system_rows would allow specifying sample size in rows,
without determining sampling rate. But the sampling method may not be
installed, and we'd still have to determine the relation size.
This adds 'sample' option for remote servers / tables. By default, it's
set to 'true' which enables remote sampling. Setting it to 'false' uses
the old approach of fetching everything and sampling locally.
---
contrib/postgres_fdw/deparse.c | 152 ++++++++++++++++++++++++++++
contrib/postgres_fdw/option.c | 7 +-
contrib/postgres_fdw/postgres_fdw.c | 142 +++++++++++++++++++++++++-
contrib/postgres_fdw/postgres_fdw.h | 7 ++
4 files changed, 304 insertions(+), 4 deletions(-)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 8f4d8a50226..8454f489161 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2297,6 +2297,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
}
+/*
+ * Construct SELECT statement to acquire numbe of rows of given relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+ StringInfoData relname;
+
+ /* We'll need the remote relation name as a literal. */
+ initStringInfo(&relname);
+ deparseRelation(&relname, rel);
+
+ appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+ deparseStringLiteral(buf, relname.data);
+ appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
/*
* Construct SELECT statement to acquire sample rows of given relation.
*
@@ -2358,6 +2378,138 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
deparseRelation(buf, rel);
}
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+}
+
+/*
+ * Construct SELECT statement to acquire sample rows of given relation,
+ * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
+ *
+ * SELECT command is appended to buf, and list of columns retrieved
+ * is returned to *retrieved_attrs.
+ *
+ * Note: We could allow selecting system/bernoulli, and maybe even the
+ * optional TSM modules (especially tsm_system_rows would help).
+ */
+void
+deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+{
+ Oid relid = RelationGetRelid(rel);
+ TupleDesc tupdesc = RelationGetDescr(rel);
+ int i;
+ char *colname;
+ List *options;
+ ListCell *lc;
+ bool first = true;
+
+ *retrieved_attrs = NIL;
+
+ appendStringInfoString(buf, "SELECT ");
+ for (i = 0; i < tupdesc->natts; i++)
+ {
+ /* Ignore dropped columns. */
+ if (TupleDescAttr(tupdesc, i)->attisdropped)
+ continue;
+
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
+ /* Use attribute name or column_name option. */
+ colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
+ options = GetForeignColumnOptions(relid, i + 1);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "column_name") == 0)
+ {
+ colname = defGetString(def);
+ break;
+ }
+ }
+
+ appendStringInfoString(buf, quote_identifier(colname));
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ }
+
+ /* Don't generate bad syntax for zero-column relation. */
+ if (first)
+ appendStringInfoString(buf, "NULL");
+
+ /*
+ * Construct FROM clause
+ */
+ appendStringInfoString(buf, " FROM ");
+ deparseRelation(buf, rel);
+ appendStringInfo(buf, " WHERE random() < %f", sample_frac);
+}
+
/*
* Construct a simple "TRUNCATE rel" statement
*/
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 572591a558d..3749d4701a7 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -122,7 +122,8 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
strcmp(def->defname, "truncatable") == 0 ||
strcmp(def->defname, "async_capable") == 0 ||
strcmp(def->defname, "parallel_commit") == 0 ||
- strcmp(def->defname, "keep_connections") == 0)
+ strcmp(def->defname, "keep_connections") == 0 ||
+ strcmp(def->defname, "sample") == 0)
{
/* these accept only boolean values */
(void) defGetBoolean(def);
@@ -254,6 +255,10 @@ InitPgFdwOptions(void)
{"keep_connections", ForeignServerRelationId, false},
{"password_required", UserMappingRelationId, false},
+ /* sampling is available on both server and table */
+ {"sample", ForeignServerRelationId, false},
+ {"sample", ForeignTableRelationId, false},
+
/*
* sslcert and sslkey are in fact libpq options, but we repeat them
* here to allow them to appear in both foreign server context (when
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 955a428e3da..3adf518b676 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4961,6 +4961,68 @@ postgresAnalyzeForeignTable(Relation relation,
return true;
}
+/*
+ * postgresCountTuplesForForeignTable
+ * Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * Note: It's unclear how accurate reltuples is, maybe estimate it using
+ * relpages and simple assumptions (1 tuple per page, ...)? Using
+ * tsm_system_rows would make this somewhat unnecessary.
+ */
+static double
+postgresCountTuplesForForeignTable(Relation relation)
+{
+ ForeignTable *table;
+ UserMapping *user;
+ PGconn *conn;
+ StringInfoData sql;
+ PGresult *volatile res = NULL;
+ double reltuples = -1;
+
+ /*
+ * Now we have to get the number of tuples. It's annoying that the ANALYZE
+ * API requires us to return that now, because it forces some duplication
+ * of effort between this routine and postgresAcquireSampleRowsFunc. But
+ * it's probably not worth redefining that API at this point.
+ */
+
+ /*
+ * Get the connection to use. We do the remote access as the table's
+ * owner, even if the ANALYZE was started by some other user.
+ */
+ table = GetForeignTable(RelationGetRelid(relation));
+ user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
+ conn = GetConnection(user, false, NULL);
+
+ /*
+ * Construct command to get the tuple count for the relation.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeTuplesSql(&sql, relation);
+
+ /* In what follows, do not risk leaking any PGresults. */
+ PG_TRY();
+ {
+ res = pgfdw_exec_query(conn, sql.data, NULL);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 1)
+ elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+ reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+ }
+ PG_FINALLY();
+ {
+ if (res)
+ PQclear(res);
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ return reltuples;
+}
+
/*
* Acquire a random sample of rows from foreign table managed by postgres_fdw.
*
@@ -4991,6 +5053,14 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
unsigned int cursor_number;
StringInfoData sql;
PGresult *volatile res = NULL;
+ ListCell *lc;
+ int server_version_num;
+
+ /* sampling enabled by default */
+ bool do_sample = true;
+ bool use_tablesample = true;
+ double sample_frac = -1.0;
+ double reltuples;
/* Initialize workspace state */
astate.rel = relation;
@@ -5018,20 +5088,83 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
conn = GetConnection(user, false, NULL);
+ /* We'll need server version, so fetch it now. */
+ server_version_num = PQserverVersion(conn);
+
+ /* disable tablesample on old remote servers */
+ if (server_version_num < 95000)
+ use_tablesample = false;
+
+ /*
+ * Should we use TABLESAMPLE to collect the remote sample?
+ */
+ foreach(lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "sample") == 0)
+ {
+ do_sample = defGetBoolean(def);
+ break;
+ }
+ }
+
+ if (do_sample)
+ {
+ reltuples = postgresCountTuplesForForeignTable(relation);
+
+ if ((reltuples <= 0) || (targrows >= reltuples))
+ do_sample = false;
+
+ sample_frac = targrows / reltuples;
+
+ /* Let's sample a bit more, we'll reduce the sample locally. */
+ sample_frac *= 1.25;
+
+ /* Sanity checks. */
+ sample_frac = Min(1.0, Max(0.0, sample_frac));
+
+ /*
+ * When sampling too large fraction, just read everything.
+ *
+ * XXX It's not clear where exactly the threshold is, with slow
+ * network it may be cheaper to sample even 90%.
+ */
+ if (sample_frac > 0.5)
+ do_sample = false;
+ }
+
/*
* Construct cursor that retrieves whole rows from remote.
*/
cursor_number = GetCursorNumber(conn);
initStringInfo(&sql);
appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
- deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ if (do_sample && use_tablesample)
+ deparseAnalyzeTableSampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else if (do_sample)
+ deparseAnalyzeLegacySampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
+ else
+ deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ elog(WARNING, "SQL: %s", sql.data);
/* In what follows, do not risk leaking any PGresults. */
PG_TRY();
{
char fetch_sql[64];
int fetch_size;
- ListCell *lc;
res = pgfdw_exec_query(conn, sql.data, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -5119,7 +5252,10 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
*totaldeadrows = 0.0;
/* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ if (do_sample)
+ *totalrows = reltuples;
+ else
+ *totalrows = astate.samplerows;
/*
* Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 21f2b20ce8d..b0d9cf42982 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -211,8 +211,15 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
List **retrieved_attrs);
+extern void deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
+extern void deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel,
+ List **retrieved_attrs,
+ double sample_frac);
extern void deparseTruncateSql(StringInfo buf,
List *rels,
DropBehavior behavior,
--
2.34.3
0002-rework-postgres_fdw-analyze-sampling-20220718.patch (text/x-patch; charset=UTF-8)
From 8549f29f8a94674ee75fbeb6c699ef1750f51df2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Mon, 18 Jul 2022 09:23:23 +0200
Subject: [PATCH 2/2] rework postgres_fdw analyze sampling
---
contrib/postgres_fdw/deparse.c | 167 +++++-------------
.../postgres_fdw/expected/postgres_fdw.out | 44 ++++-
contrib/postgres_fdw/option.c | 24 ++-
contrib/postgres_fdw/postgres_fdw.c | 131 ++++++++++----
contrib/postgres_fdw/postgres_fdw.h | 20 ++-
contrib/postgres_fdw/sql/postgres_fdw.sql | 54 ++++++
doc/src/sgml/postgres-fdw.sgml | 22 +++
7 files changed, 289 insertions(+), 173 deletions(-)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 8454f489161..e9d7869397e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2298,7 +2298,7 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
}
/*
- * Construct SELECT statement to acquire numbe of rows of given relation.
+ * Construct SELECT statement to acquire a number of rows of a relation.
*
* Note: Maybe this should compare relpages and current relation size
* and adjust reltuples accordingly?
@@ -2322,74 +2322,31 @@ deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
*
* SELECT command is appended to buf, and list of columns retrieved
* is returned to *retrieved_attrs.
- */
-void
-deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
-{
- Oid relid = RelationGetRelid(rel);
- TupleDesc tupdesc = RelationGetDescr(rel);
- int i;
- char *colname;
- List *options;
- ListCell *lc;
- bool first = true;
-
- *retrieved_attrs = NIL;
-
- appendStringInfoString(buf, "SELECT ");
- for (i = 0; i < tupdesc->natts; i++)
- {
- /* Ignore dropped columns. */
- if (TupleDescAttr(tupdesc, i)->attisdropped)
- continue;
-
- if (!first)
- appendStringInfoString(buf, ", ");
- first = false;
-
- /* Use attribute name or column_name option. */
- colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
- options = GetForeignColumnOptions(relid, i + 1);
-
- foreach(lc, options)
- {
- DefElem *def = (DefElem *) lfirst(lc);
-
- if (strcmp(def->defname, "column_name") == 0)
- {
- colname = defGetString(def);
- break;
- }
- }
-
- appendStringInfoString(buf, quote_identifier(colname));
-
- *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
- }
-
- /* Don't generate bad syntax for zero-column relation. */
- if (first)
- appendStringInfoString(buf, "NULL");
-
- /*
- * Construct FROM clause
- */
- appendStringInfoString(buf, " FROM ");
- deparseRelation(buf, rel);
-}
-
-/*
- * Construct SELECT statement to acquire sample rows of given relation,
- * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
*
- * SELECT command is appended to buf, and list of columns retrieved
- * is returned to *retrieved_attrs.
+ * XXX We allow customizing the sampling method, but we only support methods
+ * we can decide based on server version. Allowing custom TSM modules (for
+ * example tsm_system_rows) might be useful, but it would require detecting
+ * which extensions are installed, to allow automatic fall-back. Moreover, the
+ * methods use different parameters (not sampling rate). So we don't do this
+ * for now, leaving it for future improvements.
+ *
+ * XXX Using remote random() to sample rows has advantages & disadvantages.
+ * The advantages are that this works on all PostgreSQL versions (unlike
+ * TABLESAMPLE), and that it does the sampling on the remote side (unlike
+ * the old approach, which transfers everything and then discards most data).
+ * We could also do "ORDER BY random() LIMIT x", which would always pick
+ * the expected number of rows, but it requires sorting so it's a bit more
+ * expensive.
+ *
+ * The disadvantage is that we still have to read all rows and evaluate the
+ * random(), while TABLESAMPLE skips most of the pages entirely.
*
- * Note: We could allow selecting system/bernoulli, and maybe even the
- * optional TSM modules (especially tsm_system_rows would help).
+ * XXX What if we need only a subset of columns, e.g. ANALYZE t(a,b)?
*/
void
-deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
+deparseAnalyzeSql(StringInfo buf, Relation rel,
+ PgFdwSamplingMethod sample_method, double sample_frac,
+ List **retrieved_attrs)
{
Oid relid = RelationGetRelid(rel);
TupleDesc tupdesc = RelationGetDescr(rel);
@@ -2437,77 +2394,35 @@ deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel, List **retrieved_attr
appendStringInfoString(buf, "NULL");
/*
- * Construct FROM clause
+ * Construct FROM clause, and perhaps WHERE clause too, depending on the
+ * selected sampling method.
*/
appendStringInfoString(buf, " FROM ");
deparseRelation(buf, rel);
- appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
-}
-/*
- * Construct SELECT statement to acquire sample rows of given relation,
- * by sampling a fraction of the table using TABLESAMPLE SYSTEM.
- *
- * SELECT command is appended to buf, and list of columns retrieved
- * is returned to *retrieved_attrs.
- *
- * Note: We could allow selecting system/bernoulli, and maybe even the
- * optional TSM modules (especially tsm_system_rows would help).
- */
-void
-deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac)
-{
- Oid relid = RelationGetRelid(rel);
- TupleDesc tupdesc = RelationGetDescr(rel);
- int i;
- char *colname;
- List *options;
- ListCell *lc;
- bool first = true;
-
- *retrieved_attrs = NIL;
-
- appendStringInfoString(buf, "SELECT ");
- for (i = 0; i < tupdesc->natts; i++)
+ switch (sample_method)
{
- /* Ignore dropped columns. */
- if (TupleDescAttr(tupdesc, i)->attisdropped)
- continue;
-
- if (!first)
- appendStringInfoString(buf, ", ");
- first = false;
-
- /* Use attribute name or column_name option. */
- colname = NameStr(TupleDescAttr(tupdesc, i)->attname);
- options = GetForeignColumnOptions(relid, i + 1);
+ case ANALYZE_SAMPLE_OFF:
+ /* nothing to do here */
+ break;
- foreach(lc, options)
- {
- DefElem *def = (DefElem *) lfirst(lc);
+ case ANALYZE_SAMPLE_RANDOM:
+ appendStringInfo(buf, " WHERE pg_catalog.random() < %f", sample_frac);
+ break;
- if (strcmp(def->defname, "column_name") == 0)
- {
- colname = defGetString(def);
- break;
- }
- }
+ case ANALYZE_SAMPLE_SYSTEM:
+ appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+ break;
- appendStringInfoString(buf, quote_identifier(colname));
+ case ANALYZE_SAMPLE_BERNOULLI:
+ appendStringInfo(buf, " TABLESAMPLE BERNOULLI(%f)", (100.0 * sample_frac));
+ break;
- *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ case ANALYZE_SAMPLE_AUTO:
+ /* should have been resolved into actual method */
+ elog(ERROR, "unexpected sampling method");
+ break;
}
-
- /* Don't generate bad syntax for zero-column relation. */
- if (first)
- appendStringInfoString(buf, "NULL");
-
- /*
- * Construct FROM clause
- */
- appendStringInfoString(buf, " FROM ");
- deparseRelation(buf, rel);
- appendStringInfo(buf, " WHERE random() < %f", sample_frac);
}
/*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 5f2ef88cf38..e838d31815e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9532,7 +9532,7 @@ DO $d$
END;
$d$;
ERROR: invalid option "password"
-HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections
+HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections, analyze_sampling
CONTEXT: SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
PL/pgSQL function inline_code_block line 3 at EXECUTE
-- If we add a password for our user mapping instead, we should get a different
@@ -11296,3 +11296,45 @@ SELECT * FROM prem2;
ALTER SERVER loopback OPTIONS (DROP parallel_commit);
ALTER SERVER loopback2 OPTIONS (DROP parallel_commit);
+-- ===================================================================
+-- test for ANALYZE
+-- ===================================================================
+CREATE TABLE analyze_rtable1 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_rtable2 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_ptable (id int, a text, b bigint) PARTITION BY HASH(id);
+CREATE FOREIGN TABLE analyze_ftable__p1 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable1');
+CREATE FOREIGN TABLE analyze_ftable__p2 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable2');
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(1,5000) x);
+-- analyze the 'local' tables to update relpages/reltuples
+ANALYZE analyze_rtable1, analyze_rtable2;
+-- now analyze the remote tables directly - this expects to scan everything,
+-- so should not do any sampling
+ANALYZE analyze_ftable__p1;
+ANALYZE analyze_ftable__p2;
+-- now analyze the parent - this should scan everything too, because 30k
+-- sample requires everything to be scanned
+ANALYZE analyze_ptable;
+-- now lower the target to 10, which requires only 3k rows sample, so about
+-- 1500 rows from each partition, so sampling will kick in, by default with
+-- the 'bernoulli' tablesample method
+SET default_statistics_target = 10;
+ANALYZE analyze_ptable;
+-- now alter the method for remote server to 'system'
+ALTER SERVER loopback OPTIONS (analyze_sampling 'system');
+ANALYZE analyze_ptable;
+-- now alter the method for remote table to 'random', to not use tablesample
+-- but the 'legacy' sampling, and disable sampling for the other partition
+ALTER FOREIGN TABLE analyze_ftable__p1 OPTIONS (ADD analyze_sampling 'random');
+ALTER FOREIGN TABLE analyze_ftable__p2 OPTIONS (ADD analyze_sampling 'off');
+ANALYZE analyze_ptable;
+-- now add more data, so that each partition exceeds the statistics target
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(5001, 10000) x);
+ANALYZE analyze_rtable1, analyze_rtable2;
+ANALYZE analyze_ptable;
+-- cleanup
+DROP FOREIGN TABLE analyze_ftable__p1, analyze_ftable__p2;
+DROP TABLE analyze_ptable, analyze_rtable1, analyze_rtable2;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 3749d4701a7..7c4a7d8d4b7 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -122,8 +122,7 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
strcmp(def->defname, "truncatable") == 0 ||
strcmp(def->defname, "async_capable") == 0 ||
strcmp(def->defname, "parallel_commit") == 0 ||
- strcmp(def->defname, "keep_connections") == 0 ||
- strcmp(def->defname, "sample") == 0)
+ strcmp(def->defname, "keep_connections") == 0)
{
/* these accept only boolean values */
(void) defGetBoolean(def);
@@ -208,6 +207,23 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
errmsg("sslcert and sslkey are superuser-only"),
errhint("User mappings with the sslcert or sslkey options set may only be created or modified by the superuser")));
}
+ else if (strcmp(def->defname, "analyze_sampling") == 0)
+ {
+ char *value;
+
+ value = defGetString(def);
+
+ /* we recognize off/auto/random/system/bernoulli */
+ if (strcmp(value, "off") != 0 &&
+ strcmp(value, "auto") != 0 &&
+ strcmp(value, "random") != 0 &&
+ strcmp(value, "system") != 0 &&
+ strcmp(value, "bernoulli") != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for string option \"%s\": %s",
+ def->defname, value)));
+ }
}
PG_RETURN_VOID();
@@ -256,8 +272,8 @@ InitPgFdwOptions(void)
{"password_required", UserMappingRelationId, false},
/* sampling is available on both server and table */
- {"sample", ForeignServerRelationId, false},
- {"sample", ForeignTableRelationId, false},
+ {"analyze_sampling", ForeignServerRelationId, false},
+ {"analyze_sampling", ForeignTableRelationId, false},
/*
* sslcert and sslkey are in fact libpq options, but we repeat them
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 3adf518b676..ea21c5c5119 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5056,9 +5056,8 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
ListCell *lc;
int server_version_num;
- /* sampling enabled by default */
- bool do_sample = true;
- bool use_tablesample = true;
+ /* analyze sampling enabled by default, if available */
+ PgFdwSamplingMethod method = ANALYZE_SAMPLE_AUTO;
double sample_frac = -1.0;
double reltuples;
@@ -5091,57 +5090,120 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
/* We'll need server version, so fetch it now. */
server_version_num = PQserverVersion(conn);
- /* disable tablesample on old remote servers */
- if (server_version_num < 95000)
- use_tablesample = false;
-
/*
- * Should we use TABLESAMPLE to collect the remote sample?
+ * Should we try to do the sampling for ANALYZE on the remote server?
*/
foreach(lc, server->options)
{
DefElem *def = (DefElem *) lfirst(lc);
- if (strcmp(def->defname, "sample") == 0)
+ if (strcmp(def->defname, "analyze_sampling") == 0)
{
- do_sample = defGetBoolean(def);
+ char *value = defGetString(def);
+
+ if (strcmp(value, "off") == 0)
+ method = ANALYZE_SAMPLE_OFF;
+ else if (strcmp(value, "auto") == 0)
+ method = ANALYZE_SAMPLE_AUTO;
+ else if (strcmp(value, "random") == 0)
+ method = ANALYZE_SAMPLE_RANDOM;
+ else if (strcmp(value, "system") == 0)
+ method = ANALYZE_SAMPLE_SYSTEM;
+ else if (strcmp(value, "bernoulli") == 0)
+ method = ANALYZE_SAMPLE_BERNOULLI;
+
break;
}
}
+
foreach(lc, table->options)
{
DefElem *def = (DefElem *) lfirst(lc);
- if (strcmp(def->defname, "sample") == 0)
+ if (strcmp(def->defname, "analyze_sampling") == 0)
{
- do_sample = defGetBoolean(def);
+ char *value = defGetString(def);
+
+ if (strcmp(value, "off") == 0)
+ method = ANALYZE_SAMPLE_OFF;
+ else if (strcmp(value, "auto") == 0)
+ method = ANALYZE_SAMPLE_AUTO;
+ else if (strcmp(value, "random") == 0)
+ method = ANALYZE_SAMPLE_RANDOM;
+ else if (strcmp(value, "system") == 0)
+ method = ANALYZE_SAMPLE_SYSTEM;
+ else if (strcmp(value, "bernoulli") == 0)
+ method = ANALYZE_SAMPLE_BERNOULLI;
+
break;
}
}
- if (do_sample)
+ /*
+ * Error out if one of the TABLESAMPLE methods was explicitly requested but
+ * the server does not support it.
+ */
+ if ((server_version_num < 95000) &&
+ (method == ANALYZE_SAMPLE_SYSTEM ||
+ method == ANALYZE_SAMPLE_BERNOULLI))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("remote server does not support TABLESAMPLE feature")));
+
+ /*
+ * For "auto" method, pick the one we believe is best. For servers with
+ * TABLESAMPLE support we pick BERNOULLI, for old servers we fall back to
+ * random() to at least reduce network transfer.
+ */
+ if (method == ANALYZE_SAMPLE_AUTO)
+ {
+ if (server_version_num < 95000)
+ method = ANALYZE_SAMPLE_RANDOM;
+ else
+ method = ANALYZE_SAMPLE_BERNOULLI;
+ }
+
+ /*
+ * If we've decided to do remote sampling, calculate the sampling rate. We
+ * need to get the number of tuples from the remote server, so we skip the
+ * network round-trip if not needed.
+ */
+ if (method != ANALYZE_SAMPLE_OFF)
{
reltuples = postgresCountTuplesForForeignTable(relation);
+ /*
+ * No rows or we expect to sample everything - disable sampling after
+ * all (and make sure we don't divide by 0 in the sample_frac calculation).
+ */
if ((reltuples <= 0) || (targrows >= reltuples))
- do_sample = false;
+ method = ANALYZE_SAMPLE_OFF;
- sample_frac = targrows / reltuples;
+ /* Make sure we don't divide by 0 when calculating the rate. */
+ sample_frac = targrows / Max(1.0, reltuples);
- /* Let's sample a bit more, we'll reduce the sample locally. */
- sample_frac *= 1.25;
+ /*
+ * Let's sample a bit more (10%), we'll reduce the sample locally.
+ *
+ * XXX Not sure this is really necessary. If we don't trust the remote
+ * sampling to sample the right number of rows, we should not use it.
+ */
+ sample_frac *= 1.1;
- /* Sanity checks. */
+ /*
+ * Ensure the sampling rate is between 0.0 and 1.0, even after the
+ * 10% adjustment above.
+ */
sample_frac = Min(1.0, Max(0.0, sample_frac));
/*
- * When sampling too large fraction, just read everything.
- *
- * XXX It's not clear where exactly the threshold is, with slow
- * network it may be cheaper to sample even 90%.
+ * If we expect the sampling to reduce very few rows, just disable it
+ * and read the whole remote table. We decide based on the number of
+ * rows we expect to "eliminate" by sampling. If saving than 100 rows,
+ * we disable sampling.
*/
- if (sample_frac > 0.5)
- do_sample = false;
+ if (reltuples * (1 - sample_frac) < 100.0)
+ method = ANALYZE_SAMPLE_OFF;
}
/*
@@ -5151,14 +5213,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
initStringInfo(&sql);
appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
- if (do_sample && use_tablesample)
- deparseAnalyzeTableSampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
- else if (do_sample)
- deparseAnalyzeLegacySampleSql(&sql, relation, &astate.retrieved_attrs, sample_frac);
- else
- deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
-
- elog(WARNING, "SQL: %s", sql.data);
+ deparseAnalyzeSql(&sql, relation, method, sample_frac, &astate.retrieved_attrs);
/* In what follows, do not risk leaking any PGresults. */
PG_TRY();
@@ -5251,11 +5306,15 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
/* We assume that we have no dead tuple. */
*totaldeadrows = 0.0;
- /* We've retrieved all living tuples from foreign server. */
- if (do_sample)
- *totalrows = reltuples;
- else
+ /*
+ * Without ANALYZE sampling, we've retrieved all living tuples from foreign
+ * server, so just use that. Otherwise we have the reltuples estimate we
+ * got from the remote side.
+ */
+ if (method == ANALYZE_SAMPLE_OFF)
*totalrows = astate.samplerows;
+ else
+ *totalrows = reltuples;
/*
* Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index b0d9cf42982..1c2a6045a94 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -134,6 +134,18 @@ typedef struct PgFdwConnState
AsyncRequest *pendingAreq; /* pending async request */
} PgFdwConnState;
+/*
+ * Method used by ANALYZE to sample remote rows.
+ */
+typedef enum PgFdwSamplingMethod
+{
+ ANALYZE_SAMPLE_OFF, /* no remote sampling */
+ ANALYZE_SAMPLE_AUTO, /* choose by server version */
+ ANALYZE_SAMPLE_RANDOM, /* remote random() */
+ ANALYZE_SAMPLE_SYSTEM, /* TABLESAMPLE system */
+ ANALYZE_SAMPLE_BERNOULLI /* TABLESAMPLE bernoulli */
+} PgFdwSamplingMethod;
+
/* in postgres_fdw.c */
extern int set_transmission_modes(void);
extern void reset_transmission_modes(int nestlevel);
@@ -213,13 +225,9 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
+ PgFdwSamplingMethod sample_method,
+ double sample_frac,
List **retrieved_attrs);
-extern void deparseAnalyzeTableSampleSql(StringInfo buf, Relation rel,
- List **retrieved_attrs,
- double sample_frac);
-extern void deparseAnalyzeLegacySampleSql(StringInfo buf, Relation rel,
- List **retrieved_attrs,
- double sample_frac);
extern void deparseTruncateSql(StringInfo buf,
List *rels,
DropBehavior behavior,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index ae1fc8f58be..9c1daecc1a2 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3636,3 +3636,57 @@ SELECT * FROM prem2;
ALTER SERVER loopback OPTIONS (DROP parallel_commit);
ALTER SERVER loopback2 OPTIONS (DROP parallel_commit);
+
+
+-- ===================================================================
+-- test for ANALYZE
+-- ===================================================================
+CREATE TABLE analyze_rtable1 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_rtable2 (id int primary key, a text, b bigint);
+
+CREATE TABLE analyze_ptable (id int, a text, b bigint) PARTITION BY HASH(id);
+CREATE FOREIGN TABLE analyze_ftable__p1 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable1');
+CREATE FOREIGN TABLE analyze_ftable__p2 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable2');
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(1,5000) x);
+
+-- analyze the 'local' tables to update relpages/reltuples
+ANALYZE analyze_rtable1, analyze_rtable2;
+
+-- now analyze the remote tables directly - this expects to scan everything,
+-- so should not do any sampling
+ANALYZE analyze_ftable__p1;
+ANALYZE analyze_ftable__p2;
+
+-- now analyze the parent - this should scan everything too, because 30k
+-- sample requires everything to be scanned
+ANALYZE analyze_ptable;
+
+-- now lower the target to 10, which requires only 3k rows sample, so about
+-- 1500 rows from each partition, so sampling will kick in, by default with
+-- the 'bernoulli' tablesample method
+SET default_statistics_target = 10;
+ANALYZE analyze_ptable;
+
+-- now alter the method for remote server to 'system'
+ALTER SERVER loopback OPTIONS (analyze_sampling 'system');
+ANALYZE analyze_ptable;
+
+-- now alter the method for remote table to 'random', to not use tablesample
+-- but the 'legacy' sampling, and disable sampling for the other partition
+ALTER FOREIGN TABLE analyze_ftable__p1 OPTIONS (ADD analyze_sampling 'random');
+ALTER FOREIGN TABLE analyze_ftable__p2 OPTIONS (ADD analyze_sampling 'off');
+ANALYZE analyze_ptable;
+
+-- now add more data, so that each partition exceeds the statistics target
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(5001, 10000) x);
+
+ANALYZE analyze_rtable1, analyze_rtable2;
+ANALYZE analyze_ptable;
+
+-- cleanup
+DROP FOREIGN TABLE analyze_ftable__p1, analyze_ftable__p2;
+DROP TABLE analyze_ptable, analyze_rtable1, analyze_rtable2;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bfd344cdc0e..d44c8cdd71e 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -305,6 +305,28 @@ OPTIONS (ADD password_required 'false');
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>analyze_sampling</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ This option determines if <command>ANALYZE</command> on a foreign
+ table samples the data on the remote node, or reads and transfers
+ all data and performs the sampling locally. The supported values
+ are <literal>off</literal>, <literal>random</literal>,
+ <literal>system</literal>, <literal>bernoulli</literal> and
+ <literal>auto</literal>. <literal>off</literal> disables remote
+ sampling, so all data are transferred and sampled locally.
+ <literal>random</literal> performs remote sampling using the
+ <literal>random()</literal> function, while <literal>system</literal>
+ and <literal>bernoulli</literal> rely on the built-in <literal>TABLESAMPLE</literal>
+ methods. <literal>random</literal> works on all server versions,
+ while <literal>TABLESAMPLE</literal> is supported only since 9.5.
+ <literal>auto</literal> checks the server version and picks the
+ best remote sampling method automatically.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
--
2.34.3
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
Thanks. I'll switch this to "needs review" now.
OK, I looked through this, and attach some review suggestions in the
form of a delta patch. (0001 below is your two patches merged, 0002
is my delta.) A lot of the delta is comment-smithing, but not all.
After reflection I think that what you've got, ie use reltuples but
don't try to sample if reltuples <= 0, is just fine. The remote
would only have reltuples <= 0 in a never-analyzed table, which
shouldn't be a situation that persists for long unless the table
is tiny. Also, if reltuples is in error, the way to bet is that
it's too small thanks to the table having been enlarged. But
an error in that direction doesn't hurt us: we'll overestimate
the required sample_frac and pull back more data than we need,
but we'll still end up with a valid sample of the right size.
So I doubt it's worth the complication to try to correct based
on relpages etc. (Note that any such correction would almost
certainly end in increasing our estimate of reltuples. But
it's safer to have an underestimate than an overestimate.)
I messed around with the sample_frac choosing logic slightly,
to make it skip pointless calculations if we decide right off
the bat to disable sampling. That way we don't need to worry
about avoiding zero divide, nor do we have to wonder if any
of the later calculations could misbehave.
I left your logic about "disable if saving fewer than 100 rows"
alone, but I have to wonder if using an absolute threshold rather
than a relative one is well-conceived. Sampling at a rate of
99.9 percent seems pretty pointless, but this code is perfectly
capable of accepting that if reltuples is big enough. So
personally I'd do that more like
if (sample_frac > 0.95)
method = ANALYZE_SAMPLE_OFF;
which is simpler and would also eliminate the need for the previous
range-clamp step. I'm not sure what the right cutoff is, but
your "100 tuples" constant is just as arbitrary.
I rearranged the docs patch too. Where you had it, analyze_sampling
was between fdw_startup_cost/fdw_tuple_cost and the following para
discussing them, which didn't seem to me to flow well at all. I ended
up putting analyze_sampling in its own separate list. You could almost
make a case for giving it its own <sect3>, but I concluded that was
probably overkill.
One thing I'm not happy about, but did not touch here, is the expense
of the test cases you added. On my machine, that adds a full 10% to
the already excessively long runtime of postgres_fdw.sql --- and I
do not think it's buying us anything. It is not this module's job
to test whether bernoulli sampling works on partitioned tables.
I think you should just add enough to make sure we exercise the
relevant code paths in postgres_fdw itself.
With these issues addressed, I think this'd be committable.
regards, tom lane
Attachments:
0001-postgres_fdw-sample-data-tomas.patchtext/x-diff; charset=us-ascii; name=0001-postgres_fdw-sample-data-tomas.patchDownload
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a9766f9734..ea2139fbfa 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2369,14 +2369,56 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
}
+/*
+ * Construct SELECT statement to acquire a number of rows of a relation.
+ *
+ * Note: Maybe this should compare relpages and current relation size
+ * and adjust reltuples accordingly?
+ */
+void
+deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
+{
+ StringInfoData relname;
+
+ /* We'll need the remote relation name as a literal. */
+ initStringInfo(&relname);
+ deparseRelation(&relname, rel);
+
+ appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = ");
+ deparseStringLiteral(buf, relname.data);
+ appendStringInfoString(buf, "::pg_catalog.regclass");
+}
+
/*
* Construct SELECT statement to acquire sample rows of given relation.
*
* SELECT command is appended to buf, and list of columns retrieved
* is returned to *retrieved_attrs.
+ *
+ * XXX We allow customizing the sampling method, but we only support methods
+ * we can decide based on server version. Allowing custom TSM modules (for
+ * example tsm_system_rows) might be useful, but it would require detecting
+ * which extensions are installed, to allow automatic fall-back. Moreover, the
+ * methods use different parameters (not sampling rate). So we don't do this
+ * for now, leaving it for future improvements.
+ *
+ * XXX Using remote random() to sample rows has advantages & disadvantages.
+ * The advantages are that this works on all PostgreSQL versions (unlike
+ * TABLESAMPLE), and that it does the sampling on the remote side (unlike
+ * the old approach, which transfers everything and then discards most data).
+ * We could also do "ORDER BY random() LIMIT x", which would always pick
+ * the expected number of rows, but it requires sorting so it's a bit more
+ * expensive.
+ *
+ * The disadvantage is that we still have to read all rows and evaluate the
+ * random(), while TABLESAMPLE skips most of the pages entirely.
+ *
+ * XXX What if we need only a subset of columns, e.g. ANALYZE t(a,b)?
*/
void
-deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
+deparseAnalyzeSql(StringInfo buf, Relation rel,
+ PgFdwSamplingMethod sample_method, double sample_frac,
+ List **retrieved_attrs)
{
Oid relid = RelationGetRelid(rel);
TupleDesc tupdesc = RelationGetDescr(rel);
@@ -2424,10 +2466,35 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
appendStringInfoString(buf, "NULL");
/*
- * Construct FROM clause
+ * Construct FROM clause, and perhaps WHERE clause too, depending on the
+ * selected sampling method.
*/
appendStringInfoString(buf, " FROM ");
deparseRelation(buf, rel);
+
+ switch (sample_method)
+ {
+ case ANALYZE_SAMPLE_OFF:
+ /* nothing to do here */
+ break;
+
+ case ANALYZE_SAMPLE_RANDOM:
+ appendStringInfo(buf, " WHERE pg_catalog.random() < %f", sample_frac);
+ break;
+
+ case ANALYZE_SAMPLE_SYSTEM:
+ appendStringInfo(buf, " TABLESAMPLE SYSTEM(%f)", (100.0 * sample_frac));
+ break;
+
+ case ANALYZE_SAMPLE_BERNOULLI:
+ appendStringInfo(buf, " TABLESAMPLE BERNOULLI(%f)", (100.0 * sample_frac));
+ break;
+
+ case ANALYZE_SAMPLE_AUTO:
+ /* should have been resolved into actual method */
+ elog(ERROR, "unexpected sampling method");
+ break;
+ }
}
/*
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ebf9ea3598..a37d824d44 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9575,7 +9575,7 @@ DO $d$
END;
$d$;
ERROR: invalid option "password"
-HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections
+HINT: Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, sslcrldir, sslsni, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, truncatable, fetch_size, batch_size, async_capable, parallel_commit, keep_connections, analyze_sampling
CONTEXT: SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
PL/pgSQL function inline_code_block line 3 at EXECUTE
-- If we add a password for our user mapping instead, we should get a different
@@ -11339,3 +11339,45 @@ SELECT * FROM prem2;
ALTER SERVER loopback OPTIONS (DROP parallel_commit);
ALTER SERVER loopback2 OPTIONS (DROP parallel_commit);
+-- ===================================================================
+-- test for ANALYZE
+-- ===================================================================
+CREATE TABLE analyze_rtable1 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_rtable2 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_ptable (id int, a text, b bigint) PARTITION BY HASH(id);
+CREATE FOREIGN TABLE analyze_ftable__p1 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable1');
+CREATE FOREIGN TABLE analyze_ftable__p2 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable2');
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(1,5000) x);
+-- analyze the 'local' tables to update relpages/reltuples
+ANALYZE analyze_rtable1, analyze_rtable2;
+-- now analyze the remote tables directly - this expects to scan everything,
+-- so should not do any sampling
+ANALYZE analyze_ftable__p1;
+ANALYZE analyze_ftable__p2;
+-- now analyze the parent - this should scan everything too, because 30k
+-- sample requires everything to be scanned
+ANALYZE analyze_ptable;
+-- now lower the target to 10, which requires only 3k rows sample, so about
+-- 1500 rows from each partition, so sampling will kick in, by default with
+-- the 'bernoulli' tablesample method
+SET default_statistics_target = 10;
+ANALYZE analyze_ptable;
+-- now alter the method for remote server to 'system'
+ALTER SERVER loopback OPTIONS (analyze_sampling 'system');
+ANALYZE analyze_ptable;
+-- now alter the method for remote table to 'random', to not use tablesample
+-- but the 'legacy' sampling, and disable sampling for the other partition
+ALTER FOREIGN TABLE analyze_ftable__p1 OPTIONS (ADD analyze_sampling 'random');
+ALTER FOREIGN TABLE analyze_ftable__p2 OPTIONS (ADD analyze_sampling 'off');
+ANALYZE analyze_ptable;
+-- now add more data, so that each partition exceeds the statistics target
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(5001, 10000) x);
+ANALYZE analyze_rtable1, analyze_rtable2;
+ANALYZE analyze_ptable;
+-- cleanup
+DROP FOREIGN TABLE analyze_ftable__p1, analyze_ftable__p2;
+DROP TABLE analyze_ptable, analyze_rtable1, analyze_rtable2;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index cd2ef234d6..99f5099929 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -205,6 +205,23 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
errmsg("sslcert and sslkey are superuser-only"),
errhint("User mappings with the sslcert or sslkey options set may only be created or modified by the superuser")));
}
+ else if (strcmp(def->defname, "analyze_sampling") == 0)
+ {
+ char *value;
+
+ value = defGetString(def);
+
+ /* we recognize off/auto/random/system/bernoulli */
+ if (strcmp(value, "off") != 0 &&
+ strcmp(value, "auto") != 0 &&
+ strcmp(value, "random") != 0 &&
+ strcmp(value, "system") != 0 &&
+ strcmp(value, "bernoulli") != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for string option \"%s\": %s",
+ def->defname, value)));
+ }
}
PG_RETURN_VOID();
@@ -252,6 +269,10 @@ InitPgFdwOptions(void)
{"keep_connections", ForeignServerRelationId, false},
{"password_required", UserMappingRelationId, false},
+ /* sampling is available on both server and table */
+ {"analyze_sampling", ForeignServerRelationId, false},
+ {"analyze_sampling", ForeignTableRelationId, false},
+
/*
* sslcert and sslkey are in fact libpq options, but we repeat them
* here to allow them to appear in both foreign server context (when
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f3b93954ee..e2df32830f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4969,6 +4969,68 @@ postgresAnalyzeForeignTable(Relation relation,
return true;
}
+/*
+ * postgresCountTuplesForForeignTable
+ * Count tuples in foreign table (just get pg_class.reltuples).
+ *
+ * Note: It's unclear how accurate reltuples is, maybe size that using
+ * relpages and simple assumptions (1 tuple per page, ...)? Using
+ * tsm_system_rows would make this somewhat unnecessary.
+ */
+static double
+postgresCountTuplesForForeignTable(Relation relation)
+{
+ ForeignTable *table;
+ UserMapping *user;
+ PGconn *conn;
+ StringInfoData sql;
+ PGresult *volatile res = NULL;
+ double reltuples = -1;
+
+ /*
+ * Now we have to get the number of pages. It's annoying that the ANALYZE
+ * API requires us to return that now, because it forces some duplication
+ * of effort between this routine and postgresAcquireSampleRowsFunc. But
+ * it's probably not worth redefining that API at this point.
+ */
+
+ /*
+ * Get the connection to use. We do the remote access as the table's
+ * owner, even if the ANALYZE was started by some other user.
+ */
+ table = GetForeignTable(RelationGetRelid(relation));
+ user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
+ conn = GetConnection(user, false, NULL);
+
+ /*
+ * Construct command to get page count for relation.
+ */
+ initStringInfo(&sql);
+ deparseAnalyzeTuplesSql(&sql, relation);
+
+ /* In what follows, do not risk leaking any PGresults. */
+ PG_TRY();
+ {
+ res = pgfdw_exec_query(conn, sql.data, NULL);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+ if (PQntuples(res) != 1 || PQnfields(res) != 1)
+ elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+ reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
+ }
+ PG_FINALLY();
+ {
+ if (res)
+ PQclear(res);
+ }
+ PG_END_TRY();
+
+ ReleaseConnection(conn);
+
+ return reltuples;
+}
+
/*
* Acquire a random sample of rows from foreign table managed by postgres_fdw.
*
@@ -4999,6 +5061,13 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
unsigned int cursor_number;
StringInfoData sql;
PGresult *volatile res = NULL;
+ ListCell *lc;
+ int server_version_num;
+
+ /* analyze sampling enabled by default, if available */
+ PgFdwSamplingMethod method = ANALYZE_SAMPLE_AUTO;
+ double sample_frac = -1.0;
+ double reltuples;
/* Initialize workspace state */
astate.rel = relation;
@@ -5026,20 +5095,139 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
conn = GetConnection(user, false, NULL);
+ /* We'll need server version, so fetch it now. */
+ server_version_num = PQserverVersion(conn);
+
+ /*
+	 * Should we try to do the sampling for analyze on the remote server?
+ */
+ foreach(lc, server->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "analyze_sampling") == 0)
+ {
+ char *value = defGetString(def);
+
+ if (strcmp(value, "off") == 0)
+ method = ANALYZE_SAMPLE_OFF;
+ else if (strcmp(value, "auto") == 0)
+ method = ANALYZE_SAMPLE_AUTO;
+ else if (strcmp(value, "random") == 0)
+ method = ANALYZE_SAMPLE_RANDOM;
+ else if (strcmp(value, "system") == 0)
+ method = ANALYZE_SAMPLE_SYSTEM;
+ else if (strcmp(value, "bernoulli") == 0)
+ method = ANALYZE_SAMPLE_BERNOULLI;
+
+ break;
+ }
+ }
+
+ foreach(lc, table->options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "analyze_sampling") == 0)
+ {
+ char *value = defGetString(def);
+
+ if (strcmp(value, "off") == 0)
+ method = ANALYZE_SAMPLE_OFF;
+ else if (strcmp(value, "auto") == 0)
+ method = ANALYZE_SAMPLE_AUTO;
+ else if (strcmp(value, "random") == 0)
+ method = ANALYZE_SAMPLE_RANDOM;
+ else if (strcmp(value, "system") == 0)
+ method = ANALYZE_SAMPLE_SYSTEM;
+ else if (strcmp(value, "bernoulli") == 0)
+ method = ANALYZE_SAMPLE_BERNOULLI;
+
+ break;
+ }
+ }
+
+ /*
+ * Error-out if explicitly required one of the TABLESAMPLE methods, but
+ * the server does not support it.
+ */
+ if ((server_version_num < 95000) &&
+ (method == ANALYZE_SAMPLE_SYSTEM ||
+ method == ANALYZE_SAMPLE_BERNOULLI))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("remote server does not support TABLESAMPLE feature")));
+
+ /*
+ * For "auto" method, pick the one we believe is best. For servers with
+ * TABLESAMPLE support we pick BERNOULLI, for old servers we fall-back to
+ * random() to at least reduce network transfer.
+ */
+ if (method == ANALYZE_SAMPLE_AUTO)
+ {
+ if (server_version_num < 95000)
+ method = ANALYZE_SAMPLE_RANDOM;
+ else
+ method = ANALYZE_SAMPLE_BERNOULLI;
+ }
+
+ /*
+ * If we've decided to do remote sampling, calculate the sampling rate. We
+ * need to get the number of tuples from the remote server, so we skip the
+ * network round-trip if not needed.
+ */
+ if (method != ANALYZE_SAMPLE_OFF)
+ {
+ reltuples = postgresCountTuplesForForeignTable(relation);
+
+ /*
+ * No rows or we expect to sample everything - disable sampling after
+ * all (and make sure we don't divide by 0 in sample_frac calculation.)
+ */
+ if ((reltuples <= 0) || (targrows >= reltuples))
+ method = ANALYZE_SAMPLE_OFF;
+
+ /* Make sure we don't divide by 0 when calculating the rate. */
+ sample_frac = targrows / Max(1.0, reltuples);
+
+ /*
+ * Let's sample a bit more (10%), we'll reduce the sample locally.
+ *
+ * XXX Not sure this is really necessary. If we don't trust the remote
+ * sampling to sample the right number of rows, we should not use it.
+ */
+ sample_frac *= 1.1;
+
+ /*
+ * Ensure the sampling rate is between 0.0 and 1.0, even after the
+ * 10% adjustment above.
+ */
+ sample_frac = Min(1.0, Max(0.0, sample_frac));
+
+ /*
+ * If we expect the sampling to reduce very few rows, just disable it
+ * and read the whole remote table. We decide based on the number of
+		 * rows we expect to "eliminate" by sampling. If saving fewer than 100 rows,
+ * we disable sampling.
+ */
+ if (reltuples * (1 - sample_frac) < 100.0)
+ method = ANALYZE_SAMPLE_OFF;
+ }
+
/*
* Construct cursor that retrieves whole rows from remote.
*/
cursor_number = GetCursorNumber(conn);
initStringInfo(&sql);
appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
- deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+
+ deparseAnalyzeSql(&sql, relation, method, sample_frac, &astate.retrieved_attrs);
/* In what follows, do not risk leaking any PGresults. */
PG_TRY();
{
char fetch_sql[64];
int fetch_size;
- ListCell *lc;
res = pgfdw_exec_query(conn, sql.data, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
@@ -5126,8 +5314,15 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
/* We assume that we have no dead tuple. */
*totaldeadrows = 0.0;
- /* We've retrieved all living tuples from foreign server. */
- *totalrows = astate.samplerows;
+ /*
+ * Without ANALYZE sampling, we've retrieved all living tuples from foreign
+ * server, so just use that. Otherwise we have the reltuples estimate we
+ * got from the remote side.
+ */
+ if (method == ANALYZE_SAMPLE_OFF)
+ *totalrows = astate.samplerows;
+ else
+ *totalrows = reltuples;
/*
* Emit some interesting relation info
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 21f2b20ce8..1c2a6045a9 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -134,6 +134,18 @@ typedef struct PgFdwConnState
AsyncRequest *pendingAreq; /* pending async request */
} PgFdwConnState;
+/*
+ * Method used by ANALYZE to sample remote rows.
+ */
+typedef enum PgFdwSamplingMethod
+{
+ ANALYZE_SAMPLE_OFF, /* no remote sampling */
+ ANALYZE_SAMPLE_AUTO, /* choose by server version */
+ ANALYZE_SAMPLE_RANDOM, /* remote random() */
+ ANALYZE_SAMPLE_SYSTEM, /* TABLESAMPLE system */
+ ANALYZE_SAMPLE_BERNOULLI /* TABLESAMPLE bernoulli */
+} PgFdwSamplingMethod;
+
/* in postgres_fdw.c */
extern int set_transmission_modes(void);
extern void reset_transmission_modes(int nestlevel);
@@ -211,7 +223,10 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
+extern void deparseAnalyzeTuplesSql(StringInfo buf, Relation rel);
extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
+ PgFdwSamplingMethod sample_method,
+ double sample_frac,
List **retrieved_attrs);
extern void deparseTruncateSql(StringInfo buf,
List *rels,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index b7817c5a41..d9bd800aa2 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3655,3 +3655,57 @@ SELECT * FROM prem2;
ALTER SERVER loopback OPTIONS (DROP parallel_commit);
ALTER SERVER loopback2 OPTIONS (DROP parallel_commit);
+
+
+-- ===================================================================
+-- test for ANALYZE
+-- ===================================================================
+CREATE TABLE analyze_rtable1 (id int primary key, a text, b bigint);
+CREATE TABLE analyze_rtable2 (id int primary key, a text, b bigint);
+
+CREATE TABLE analyze_ptable (id int, a text, b bigint) PARTITION BY HASH(id);
+CREATE FOREIGN TABLE analyze_ftable__p1 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable1');
+CREATE FOREIGN TABLE analyze_ftable__p2 PARTITION OF analyze_ptable
+ FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+ SERVER loopback OPTIONS (table_name 'analyze_rtable2');
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(1,5000) x);
+
+-- analyze the 'local' tables to update relpages/reltuples
+ANALYZE analyze_rtable1, analyze_rtable2;
+
+-- now analyze the remote tables directly - this expects to scan everything,
+-- so should not do any sampling
+ANALYZE analyze_ftable__p1;
+ANALYZE analyze_ftable__p2;
+
+-- now analyze the parent - this should scan everything too, because 30k
+-- sample requires everything to be scanned
+ANALYZE analyze_ptable;
+
+-- now lower the target to 10, which requires only 3k rows sample, so about
+-- 1500 rows from each partition, so sampling will kick in, by default with
+-- the 'bernoulli' tablesample method
+SET default_statistics_target = 10;
+ANALYZE analyze_ptable;
+
+-- now alter the method for remote server to 'system'
+ALTER SERVER loopback OPTIONS (analyze_sampling 'system');
+ANALYZE analyze_ptable;
+
+-- now alter the method for remote table to 'random', to not use tablesample
+-- but the 'legacy' sampling, and disable sampling for the other partition
+ALTER FOREIGN TABLE analyze_ftable__p1 OPTIONS (ADD analyze_sampling 'random');
+ALTER FOREIGN TABLE analyze_ftable__p2 OPTIONS (ADD analyze_sampling 'off');
+ANALYZE analyze_ptable;
+
+-- now add more data, so that each partition exceeds the statistics target
+INSERT INTO analyze_ptable (SELECT x FROM generate_series(5001, 10000) x);
+
+ANALYZE analyze_rtable1, analyze_rtable2;
+ANALYZE analyze_ptable;
+
+-- cleanup
+DROP FOREIGN TABLE analyze_ftable__p1, analyze_ftable__p2;
+DROP TABLE analyze_ptable, analyze_rtable1, analyze_rtable2;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index bfd344cdc0..d44c8cdd71 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -305,6 +305,28 @@ OPTIONS (ADD password_required 'false');
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>analyze_sampling</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ This option determines if <command>ANALYZE</command> on a foreign
+ table samples the data on the remote node, or reads and transfers
+ all data and performs the sampling locally. The supported values
+ are <literal>off</literal>, <literal>random</literal>,
+ <literal>system</literal>, <literal>bernoulli</literal> and
+ <literal>auto</literal>. <literal>off</literal> disables remote
+ sampling, so all data are transferred and sampled locally.
+ <literal>random</literal> performs remote sampling using
+ <literal>random()</literal> function, while <literal>system</literal>
+ and <literal>bernoulli</literal> rely on built-in <literal>TABLESAMPLE</literal>
+ methods. <literal>random</literal> works on all server versions,
+ while <literal>TABLESAMPLE</literal> is supported only since 9.5.
+ <literal>auto</literal> checks the server version and picks the
+ best remote sampling method automatically.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>
0002-toms-review-delta.patchtext/x-diff; charset=us-ascii; name=0002-toms-review-delta.patchDownload
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ea2139fbfa..48bce9744d 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2370,10 +2370,11 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
}
/*
- * Construct SELECT statement to acquire a number of rows of a relation.
+ * Construct SELECT statement to acquire the number of rows of a relation.
*
- * Note: Maybe this should compare relpages and current relation size
- * and adjust reltuples accordingly?
+ * Note: we just return the remote server's reltuples value, which might
+ * be off a good deal, but it doesn't seem worth working harder. See
+ * comments in postgresAcquireSampleRowsFunc.
*/
void
deparseAnalyzeTuplesSql(StringInfo buf, Relation rel)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e2df32830f..8c2a0c6ca1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4972,10 +4972,6 @@ postgresAnalyzeForeignTable(Relation relation,
/*
* postgresCountTuplesForForeignTable
* Count tuples in foreign table (just get pg_class.reltuples).
- *
- * Note: It's unclear how accurate reltuples is, maybe size that using
- * relpages and simple assumptions (1 tuple per page, ...)? Using
- * tsm_system_rows would make this somewhat unnecessary.
*/
static double
postgresCountTuplesForForeignTable(Relation relation)
@@ -4985,14 +4981,7 @@ postgresCountTuplesForForeignTable(Relation relation)
PGconn *conn;
StringInfoData sql;
PGresult *volatile res = NULL;
- double reltuples = -1;
-
- /*
- * Now we have to get the number of pages. It's annoying that the ANALYZE
- * API requires us to return that now, because it forces some duplication
- * of effort between this routine and postgresAcquireSampleRowsFunc. But
- * it's probably not worth redefining that API at this point.
- */
+ volatile double reltuples = -1;
/*
* Get the connection to use. We do the remote access as the table's
@@ -5016,7 +5005,7 @@ postgresCountTuplesForForeignTable(Relation relation)
pgfdw_report_error(ERROR, res, conn, false, sql.data);
if (PQntuples(res) != 1 || PQnfields(res) != 1)
- elog(ERROR, "unexpected result from deparseAnalyzeSizeSql query");
+ elog(ERROR, "unexpected result from deparseAnalyzeTuplesSql query");
reltuples = strtod(PQgetvalue(res, 0, 0), NULL);
}
PG_FINALLY();
@@ -5034,8 +5023,6 @@ postgresCountTuplesForForeignTable(Relation relation)
/*
* Acquire a random sample of rows from foreign table managed by postgres_fdw.
*
- * We fetch the whole table from the remote side and pick out some sample rows.
- *
* Selected rows are returned in the caller-allocated array rows[],
* which must have at least targrows entries.
* The actual number of rows selected is returned as the function result.
@@ -5058,16 +5045,14 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
ForeignServer *server;
UserMapping *user;
PGconn *conn;
+ int server_version_num;
+ PgFdwSamplingMethod method = ANALYZE_SAMPLE_AUTO; /* auto is default */
+ double sample_frac = -1.0;
+ double reltuples;
unsigned int cursor_number;
StringInfoData sql;
PGresult *volatile res = NULL;
ListCell *lc;
- int server_version_num;
-
- /* analyze sampling enabled by default, if available */
- PgFdwSamplingMethod method = ANALYZE_SAMPLE_AUTO;
- double sample_frac = -1.0;
- double reltuples;
/* Initialize workspace state */
astate.rel = relation;
@@ -5099,7 +5084,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
server_version_num = PQserverVersion(conn);
/*
-	 * Should we try to do the sampling for analyze on the remote server?
+ * What sampling method should we use?
*/
foreach(lc, server->options)
{
@@ -5107,7 +5092,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
if (strcmp(def->defname, "analyze_sampling") == 0)
{
- char *value = defGetString(def);
+ char *value = defGetString(def);
if (strcmp(value, "off") == 0)
method = ANALYZE_SAMPLE_OFF;
@@ -5130,7 +5115,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
if (strcmp(def->defname, "analyze_sampling") == 0)
{
- char *value = defGetString(def);
+ char *value = defGetString(def);
if (strcmp(value, "off") == 0)
method = ANALYZE_SAMPLE_OFF;
@@ -5173,7 +5158,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
/*
* If we've decided to do remote sampling, calculate the sampling rate. We
- * need to get the number of tuples from the remote server, so we skip the
+ * need to get the number of tuples from the remote server, but skip that
* network round-trip if not needed.
*/
if (method != ANALYZE_SAMPLE_OFF)
@@ -5181,37 +5166,42 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
reltuples = postgresCountTuplesForForeignTable(relation);
/*
- * No rows or we expect to sample everything - disable sampling after
- * all (and make sure we don't divide by 0 in sample_frac calculation.)
+ * Remote's reltuples could be 0 or -1 if the table has never been
+ * vacuumed/analyzed. In that case, or if we expect to need all the
+ * rows, disable sampling after all.
*/
if ((reltuples <= 0) || (targrows >= reltuples))
method = ANALYZE_SAMPLE_OFF;
+ else
+ {
+ sample_frac = targrows / reltuples;
- /* Make sure we don't divide by 0 when calculating the rate. */
- sample_frac = targrows / Max(1.0, reltuples);
-
- /*
- * Let's sample a bit more (10%), we'll reduce the sample locally.
- *
- * XXX Not sure this is really necessary. If we don't trust the remote
- * sampling to sample the right number of rows, we should not use it.
- */
- sample_frac *= 1.1;
+ /*
+ * Let's sample a bit more (10%) to compensate for any small
+ * inaccuracy in the reltuples value. We'll reduce the sample
+ * locally if it's too large. Of course, this won't fix a big
+ * error in reltuples --- but big errors in it are most commonly
+ * underestimates due to the table having grown since the last
+ * ANALYZE. That will lead to overestimating the required
+ * sample_frac, which is fine.
+ */
+ sample_frac *= 1.1;
- /*
- * Ensure the sampling rate is between 0.0 and 1.0, even after the
- * 10% adjustment above.
- */
- sample_frac = Min(1.0, Max(0.0, sample_frac));
+ /*
+ * Ensure the sampling rate is between 0.0 and 1.0, even after the
+ * 10% adjustment above. (Clamping to 0.0 is just paranoia.)
+ */
+ sample_frac = Min(1.0, Max(0.0, sample_frac));
- /*
- * If we expect the sampling to reduce very few rows, just disable it
- * and read the whole remote table. We decide based on the number of
-		 * rows we expect to "eliminate" by sampling. If saving fewer than 100 rows,
- * we disable sampling.
- */
- if (reltuples * (1 - sample_frac) < 100.0)
- method = ANALYZE_SAMPLE_OFF;
+ /*
+ * If we expect sampling to remove very few rows, just disable it
+ * and read the whole remote table; the overhead won't be worth
+ * it. We decide based on the number of rows we expect sampling
+ * to eliminate. If saving fewer than 100 rows, disable sampling.
+ */
+ if (reltuples * (1 - sample_frac) < 100.0)
+ method = ANALYZE_SAMPLE_OFF;
+ }
}
/*
@@ -5315,9 +5305,9 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
*totaldeadrows = 0.0;
/*
- * Without ANALYZE sampling, we've retrieved all living tuples from foreign
- * server, so just use that. Otherwise we have the reltuples estimate we
- * got from the remote side.
+ * Without sampling, we've retrieved all living tuples from foreign
+ * server, so report that as totalrows. Otherwise use the reltuples
+ * estimate we got from the remote side.
*/
if (method == ANALYZE_SAMPLE_OFF)
*totalrows = astate.samplerows;
@@ -5330,7 +5320,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
ereport(elevel,
(errmsg("\"%s\": table contains %.0f rows, %d rows in sample",
RelationGetRelationName(relation),
- astate.samplerows, astate.numrows)));
+ *totalrows, astate.numrows)));
return astate.numrows;
}
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index d44c8cdd71..d681d1675e 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -305,28 +305,6 @@ OPTIONS (ADD password_required 'false');
</listitem>
</varlistentry>
- <varlistentry>
- <term><literal>analyze_sampling</literal> (<type>text</type>)</term>
- <listitem>
- <para>
- This option determines if <command>ANALYZE</command> on a foreign
- table samples the data on the remote node, or reads and transfers
- all data and performs the sampling locally. The supported values
- are <literal>off</literal>, <literal>random</literal>,
- <literal>system</literal>, <literal>bernoulli</literal> and
- <literal>auto</literal>. <literal>off</literal> disables remote
- sampling, so all data are transferred and sampled locally.
- <literal>random</literal> performs remote sampling using
- <literal>random()</literal> function, while <literal>system</literal>
- and <literal>bernoulli</literal> rely on built-in <literal>TABLESAMPLE</literal>
- methods. <literal>random</literal> works on all server versions,
- while <literal>TABLESAMPLE</literal> is supported only since 9.5.
- <literal>auto</literal> checks the server version and picks the
- best remote sampling method automatically.
- </para>
- </listitem>
- </varlistentry>
-
</variablelist>
<para>
@@ -348,6 +326,41 @@ OPTIONS (ADD password_required 'false');
frequently updated, the local statistics will soon be obsolete.
</para>
+ <para>
+ The following option controls how such an <command>ANALYZE</command>
+ operation behaves:
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term><literal>analyze_sampling</literal> (<type>text</type>)</term>
+ <listitem>
+ <para>
+ This option, which can be specified for a foreign table or a foreign
+ server, determines if <command>ANALYZE</command> on a foreign table
+ samples the data on the remote side, or reads and transfers all data
+ and performs the sampling locally. The supported values
+ are <literal>off</literal>, <literal>random</literal>,
+ <literal>system</literal>, <literal>bernoulli</literal>
+ and <literal>auto</literal>. <literal>off</literal> disables remote
+ sampling, so all data are transferred and sampled locally.
+ <literal>random</literal> performs remote sampling using the
+ <literal>random()</literal> function to choose returned rows,
+ while <literal>system</literal> and <literal>bernoulli</literal> rely
+ on the built-in <literal>TABLESAMPLE</literal> methods of those
+ names. <literal>random</literal> works on all remote server versions,
+ while <literal>TABLESAMPLE</literal> is supported only since 9.5.
+ <literal>auto</literal> (the default) picks the recommended sampling
+ method automatically; currently it means
+ either <literal>bernoulli</literal> or <literal>random</literal>
+ depending on the remote server version.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
</sect3>
<sect3>
On 7/18/22 20:45, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
Thanks. I'll switch this to "needs review" now.
OK, I looked through this, and attach some review suggestions in the
form of a delta patch. (0001 below is your two patches merged, 0002
is my delta.) A lot of the delta is comment-smithing, but not all.
Thanks!
After reflection I think that what you've got, ie use reltuples but
don't try to sample if reltuples <= 0, is just fine. The remote
would only have reltuples <= 0 in a never-analyzed table, which
shouldn't be a situation that persists for long unless the table
is tiny. Also, if reltuples is in error, the way to bet is that
it's too small thanks to the table having been enlarged. But
an error in that direction doesn't hurt us: we'll overestimate
the required sample_frac and pull back more data than we need,
but we'll still end up with a valid sample of the right size.
So I doubt it's worth the complication to try to correct based
on relpages etc. (Note that any such correction would almost
certainly end in increasing our estimate of reltuples. But
it's safer to have an underestimate than an overestimate.)
I mostly agree, particularly for the non-partitioned case.
If we want to improve sampling for partitioned cases (where the foreign
table is just one of many partitions), I think we'd have to rework how
we determine sample size for each partition. Now we simply calculate
that from relpages, which seems quite fragile (different amounts of
bloat, different tuple densities) and somewhat strange for FDW servers
that don't use the same "page" concept.
So it may easily happen that we determine bogus sample sizes for each
partition. The difficulty in calculating the sample_frac is just a
secondary issue.
OTOH the concept of a "row" seems way more general, so perhaps
acquire_inherited_sample_rows should use reltuples, and if we want to do
correction it should happen at this stage already.
I messed around with the sample_frac choosing logic slightly,
to make it skip pointless calculations if we decide right off
the bat to disable sampling. That way we don't need to worry
about avoiding zero divide, nor do we have to wonder if any
of the later calculations could misbehave.
Thanks.
I left your logic about "disable if saving fewer than 100 rows"
alone, but I have to wonder if using an absolute threshold rather
than a relative one is well-conceived. Sampling at a rate of
99.9 percent seems pretty pointless, but this code is perfectly
capable of accepting that if reltuples is big enough. So
personally I'd do that more like
if (sample_frac > 0.95)
method = ANALYZE_SAMPLE_OFF;
which is simpler and would also eliminate the need for the previous
range-clamp step. I'm not sure what the right cutoff is, but
your "100 tuples" constant is just as arbitrary.
I agree there probably is not much difference between a threshold on
sample_frac directly and number of rows, at least in general. My
reasoning for switching to "100 rows" is that in most cases the network
transfer is probably more costly than "local costs", and 5% may be quite
a few rows (particularly with higher statistics target). I guess the
proper approach would be to make some simple costing, but that seems
like overkill.
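An invented example of where the two rules diverge (numbers are made
up, and I ignore the 10% inflation):

    targrows    = 300 * 10000 = 3,000,000    (maximum statistics target)
    reltuples   = 3,100,000
    sample_frac = 3.0M / 3.1M ~ 0.97    -> above 0.95, so that rule disables sampling
    rows saved  = 3.1M - 3.0M = 100,000 -> well above 100, so the row rule keeps it

i.e. the relative rule would give up ~100k rows of transfer savings
exactly where the network cost dominates.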
I rearranged the docs patch too. Where you had it, analyze_sampling
was between fdw_startup_cost/fdw_tuple_cost and the following para
discussing them, which didn't seem to me to flow well at all. I ended
up putting analyze_sampling in its own separate list. You could almost
make a case for giving it its own <sect3>, but I concluded that was
probably overkill.
Thanks.
One thing I'm not happy about, but did not touch here, is the expense
of the test cases you added. On my machine, that adds a full 10% to
the already excessively long runtime of postgres_fdw.sql --- and I
do not think it's buying us anything. It is not this module's job
to test whether bernoulli sampling works on partitioned tables.
I think you should just add enough to make sure we exercise the
relevant code paths in postgres_fdw itself.
Right, I should have commented on that. The purpose of those tests was
verifying that if we change the sampling method on server/table, the
generated query changes accordingly, etc. But that's a bit futile
because we don't have a good way of verifying what query was used - it
worked during development, as I added elog(WARNING).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
If we want to improve sampling for partitioned cases (where the foreign
table is just one of many partitions), I think we'd have to rework how
we determine sample size for each partition. Now we simply calculate
that from relpages, which seems quite fragile (different amounts of
bloat, different tuple densities) and somewhat strange for FDW servers
that don't use the same "page" concept.
So it may easily happen that we determine bogus sample sizes for each
partition. The difficulty in calculating the sample_frac is just a
secondary issue.
OTOH the concept of a "row" seems way more general, so perhaps
acquire_inherited_sample_rows should use reltuples, and if we want to do
correction it should happen at this stage already.
Yeah, there's definitely something to be said for changing that to be
based on rowcount estimates instead of physical size. I think it's
a matter for a different patch though, and not a reason to hold up
this one.
regards, tom lane
Hi!
Here's an updated patch, including all the changes proposed by Tom in
his review. There were two more points left [1], namely:
1) The useless test added to postgres_fdw.sql consuming a lot of time.
I decided to just rip this out and replace it with a much simpler test
that simply changes the sampling method and does ANALYZE. Ideally we'd
also verify we actually generated the right SQL query to be executed on
the remote server, but there doesn't seem to be a good way to do that
(e.g. we can't do EXPLAIN to show the "Remote SQL" on the "ANALYZE").
So I guess just changing the method and running ANALYZE will have to be
enough for now (coverage report should at least tell us we got to all
the SQL variants).
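The replacement test is essentially of this shape (just a sketch - the
object names in the actual regression test may differ):

    ALTER SERVER loopback OPTIONS (ADD analyze_sampling 'system');
    ANALYZE analyze_table;
    ALTER SERVER loopback OPTIONS (SET analyze_sampling 'bernoulli');
    ANALYZE analyze_table;
    ALTER SERVER loopback OPTIONS (SET analyze_sampling 'random');
    ANALYZE analyze_table;
    ALTER SERVER loopback OPTIONS (SET analyze_sampling 'off');
    ANALYZE analyze_table;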
2) the logic disabling sampling when the fraction gets too high
I based the condition on the absolute number of "unsampled" rows; Tom
suggested maybe it should be just a simple condition on sample_frac
(e.g. 95%). But neither of us was sure what would be a good value.
So I did a couple tests to get at least some idea what would be a good
threshold, and it turns out the threshold is mostly useless. I'll
discuss the measurements I did in a bit (some of the findings are a bit
amusing or even hilarious, actually), but the main reasons are:
- The sampling overhead is negligible (compared to transferring the
data, even on localhost), and the behavior is very smooth. So it almost
never exceeds the cost of reading everything, unless sample_frac >= 0.99,
and even there it's a tiny difference. So there's almost no risk, I think.
- For small tables it doesn't really matter. It's going to be fast
anyway, and the difference between reading 10000 or 11000 rows is going
to be just noise. Who cares if this takes 10 or 11 ms ...
- For large tables, we'll never even get to these high sample_frac
values. Imagine a table with 10M rows - the highest stats target we
allow is 10k, and the sample size is 300 * target, so 3M rows. It
doesn't matter if the condition is 0.95 or 0.99, because for this table
we'll never ask for a sample above 30% (see the arithmetic sketch right
after this list).
- For the tables in between it might be more relevant, but the simple
truth is that reading a row and sampling it remotely is way cheaper
than the network transfer, even on localhost. The data suggest that
reading+sampling a row costs ~0.2us at most, but sending it is ~1.5us
(localhost) or ~5.5us (local network).
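To spell out the large-table arithmetic from the list above
(illustrative numbers):

    targrows    = 300 * 10000 = 3,000,000   (highest statistics target)
    reltuples   = 10,000,000
    sample_frac = 3M / 10M = 0.3            -- nowhere near 0.95 or 0.99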
So I just removed the threshold from the patch, and we'll request
sampling even with sample_frac=100% (if that happens).
sampling test
-------------
I did a simple test to collect some data - create a table, and sample
various fractions in the ways discussed in this patch - either locally
or through a FDW.
This required a bit of care to ensure the sampling happens in the right
place (and e.g. we don't push the random() or tablesample down), which I
did by pointing the FDW table not to a remote table, but to a view with
an optimization fence (OFFSET 0). See the run-local.sh script.
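A minimal sketch of that setup (names are made up; run-local.sh has the
real details):

    CREATE TABLE t (a int);
    -- OFFSET 0 acts as an optimization fence, so sampling applied on top
    -- of the view is not pushed down into t
    CREATE VIEW t_view AS SELECT * FROM t OFFSET 0;
    CREATE FOREIGN TABLE ft (a int)
      SERVER loopback OPTIONS (table_name 't_view');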
The script populates the table with different numbers of rows, samples
different fractions of it, etc. I also did this from a different
machine, to see what a bit more network latency would do (that's what
run-remote.sh is for).
results (fdw-sampling-test.pdf)
-------------------------------
The attached PDF shows the results - first page is for the foreign table
in the same instance (i.e. localhost, latency ~0.01ms), second page is
for FDW pointing to a machine in the same network (latency ~0.1ms).
Left column is always the table "directly", right column is through the
FDW. On the x-axis is the fraction of the table we sample, y-axis is
duration in milliseconds. "full" means "reading everything" (i.e. what
the FDW does now), the other options should be clear I think.
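For reference, the sampling options correspond to remote queries of
roughly this shape (the 10% figure is just an example):

    SELECT * FROM t;                             -- full
    SELECT * FROM t WHERE random() < 0.10;       -- random
    SELECT * FROM t TABLESAMPLE SYSTEM (10);     -- system (percent)
    SELECT * FROM t TABLESAMPLE BERNOULLI (10);  -- bernoulli (percent)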
The first two dataset sizes (10k and 10M rows) are tiny (10M is ~2GB),
and fit into RAM, which is 8GB. The 100M is ~21GB, so much larger.
In the "direct" (non-FDW) sampling, the various sampling methods start
losing to seqscan fairly soon - "random" is consistently slower,
"bernoulli" starts losing at ~30%, "system" as ~80%. This is not very
surprising, particularly for bernoulli/random which actually read all
the rows anyway. But the overhead is pretty limited to ~30% on top of
the seqscan.
But in the "FDW" sampling (right column), it's an entirely different story,
and all the methods clearly win over just transferring everything and
only then doing the sampling.
Who cares if the remote sampling means we have to pay 0.2us
instead of 0.15us (per row), when the transfer costs 1.5us per row?
The 100M case shows an interesting behavior for the "system" method,
which quickly spikes to ~2x of the "full" method when sampling ~20% of
the table, and then gradually improves again.
My explanation is that this is due to "system" making the I/O pattern
increasingly random, because it jumps blocks in a way that makes
readahead impossible. And then as the fraction increases, it becomes
more sequential again.
All the other methods are pretty much equal to just scanning everything
sequentially, and sampling rows one by one.
The "system" method in TABLESAMPLE would probably benefit from explicit
prefetching, I guess. For ANALYZE this probably is not a huge issue, as
we'll never sample such a large fraction of a large table (for 100M rows
we peak at ~3% with target 10000, which is way before the peak). And for smaller
tables we're more likely to hit cache (which is why the smaller data
sets don't have this issue). But for explicit TABLESAMPLE queries that
may not be the case.
Although, ANALYZE uses something like "system" to sample rows too, no?
However, even this is not an issue for the FDW case - in that case it
still clearly wins over the current "local sample" approach, because
transferring the data is so expensive that it turns the "peak" into a tiny
hump.
The second page (different machine, local network) tells the same story,
except that the differences are even clearer.
ANALYZE test
------------
So I changed the patch, and did a similar test by running ANALYZE either
on the local or foreign table, using the different sampling methods.
This does not require the hacks to prevent pushdown etc. but it also
means we can't determine sample_frac directly, only through statistics
target (which is capped to 10k).
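So the knob driving sample_frac here is the statistics target, e.g.
(illustrative values and table name):

    -- ANALYZE samples 300 * statistics_target rows
    SET default_statistics_target = 100;     -- asks for a 30k-row sample
    ANALYZE ft;                              -- sample_frac ~ 30k / reltuples
    SET default_statistics_target = 10000;   -- 3M rows, the maximum allowed
    ANALYZE ft;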
In this case, "local" means ANALYZE on the local table (which you might
think of as the "baseline"), and "off" means reading all data without
remote sampling.
For the two smaller data sets (10k and 10M rows), the benefits are
pretty obvious. We're very close to the "local" results, because we save
a lot by copying only some of the rows. For 10M we only get to ~30%
before we hit target=10k, which is why we don't see it get closer to "off".
But now we get to the *hilarious* thing - if you look at the 10M result,
you may notice that *all* the sampling methods beat ANALYZE on the local
table.
For "system" (which wins from the very beginning) we might make some
argument that the algorithm is simpler than what ANALYZE does, skips
blocks differently, etc. - perhaps ...
But bernoulli/random are pretty much ideal sampling, reading/sampling
all rows. And yet both methods start winning after crossing ~1% on this
table. In other words, it's about 3x faster to ANALYZE a table through
FDW than directly ;-)
Anyway, I don't think those issues have much impact on this patch. I
believe the results show what the patch does is reasonable.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
fdw-sampling-test.pdfapplication/pdf; name=fdw-sampling-test.pdfDownload