WIP: Collecting statistics on CSV file data
Hi there,
To enable file_fdw to estimate costs of scanning a CSV file more
accurately, I would like to propose a new FDW callback routine,
AnalyzeForeignTable, which allows to ANALYZE command to collect
statistics on a foreign table, and a corresponding file_fdw function,
fileAnalyzeForeignTable. Attached is my WIP patch.
Here's a summary of the implementation:
void AnalyzeForeignTable (Relation relation, VacuumStmt *vacstmt, int
elevel);
This is a new FDW callback routine to collect statistics on a foreign
table and store the results in the pg_class and pg_statistic system
catalogs. This is called when ANALYZE command is executed. (ANALYZE
command should be executed because autovacuum does not analyze foreign
tables.)
static void fileAnalyzeForeignTable(Relation relation, VacuumStmt
*vacstmt, int elevel);
This new file_fdw function collects and stores the same statistics on
CSV file data as collected on a local table except for index related
statistics by executing the sequential scan on the CSV file and
acquiring sample rows using Vitter's algorithm. (It is time-consuming
for a large file.)
estimate_costs() (more precisely, clauselist_selectivity() in
estimate_costs()) estimates baserel->rows using the statistics stored in
the pg_statistic system catalog. If there are no statistics,
estimate_costs() estimates it using the default statistics as in
PostgreSQL 9.1.
I am able to demonstrate the effectiveness of this patch. The following
run is performed on a single core of a 3.00GHz Intel Xeon CPU with 8GB
of RAM. Configuration settings are default except for work_mem = 256MB.
We can see from this result that the optimiser selects a good plan when
the foreign tables have been analyzed.
I appreciate your comments and suggestions.
[sample csv file data]
postgres=# COPY (SELECT s.a, repeat('a', 100) FROM generate_series(1,
5000000) AS s(a)) TO '/home/pgsql/sample_csv_data1.csv' (FORMAT csv,
DELIMITER ',');
COPY 5000000
postgres=# COPY (SELECT (random()*10000)::int, repeat('b', 100) FROM
generate_series(1, 5000000)) TO '/home/pgsql/sample_csv_data2.csv'
(FORMAT csv, DELIMITER ',');
COPY 5000000
[Unpatched]
postgres=# CREATE FOREIGN TABLE tab1 (aid INTEGER, msg text) SERVER
file_fs OPTIONS (filename '/home/pgsql/sample_csv_data1.csv', format
'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# CREATE FOREIGN TABLE tab2 (aid INTEGER, msg text) SERVER
file_fs OPTIONS (filename '/home/pgsql/sample_csv_data2.csv', format
'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# SELECT count(*) FROM tab1;
count
---------
5000000
(1 row)
postgres=# SELECT count(*) FROM tab2;
count
---------
5000000
(1 row)
postgres=# EXPLAIN ANALYZE SELECT count(*) FROM tab1, tab2 WHERE
tab1.aid >= 0 AND tab1.aid <= 10000 AND tab1.aid = tab2.aid;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
---
Aggregate (cost=128859182.29..128859182.30 rows=1 width=0) (actual
time=27321.304..27321.304 rows=1 loops=1)
-> Merge Join (cost=5787102.68..111283426.33 rows=7030302383
width=0) (actual time=22181.428..26736.194 rows=4999745 loops=1)
Merge Cond: (tab1.aid = tab2.aid)
-> Sort (cost=1857986.37..1858198.83 rows=84983 width=4)
(actual time=5964.282..5965.958 rows=10000 loops=1)
Sort Key: tab1.aid
Sort Method: quicksort Memory: 853kB
-> Foreign Scan on tab1 (cost=0.00..1851028.44
rows=84983 width=4) (actual time=0.071..5962.382 rows=10000 loops=1)
Filter: ((aid >= 0) AND (aid <= 10000))
Foreign File: /home/pgsql/sample_csv_data1.csv
Foreign File Size: 543888896
-> Materialize (cost=3929116.30..4011842.29 rows=16545197
width=4) (actual time=16216.953..19550.846 rows=5000000 loops=1)
-> Sort (cost=3929116.30..3970479.30 rows=16545197
width=4) (actual time=16216.947..18418.684 rows=5000000 loops=1)
Sort Key: tab2.aid
Sort Method: external merge Disk: 68424kB
-> Foreign Scan on tab2 (cost=0.00..1719149.70
rows=16545197 width=4) (actual time=0.081..6059.630 rows=5000000 loops=1)
Foreign File: /home/pgsql/sample_csv_data2.csv
Foreign File Size: 529446313
Total runtime: 27350.673 ms
(18 rows)
[Patched]
postgres=# CREATE FOREIGN TABLE tab1 (aid INTEGER, msg text) SERVER
file_fs OPTIONS (filename '/home/pgsql/sample_csv_data1.csv', format
'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# CREATE FOREIGN TABLE tab2 (aid INTEGER, msg text) SERVER
file_fs OPTIONS (filename '/home/pgsql/sample_csv_data2.csv', format
'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# ANALYZE VERBOSE tab1;
INFO: analyzing "public.tab1"
INFO: "tab1": scanned, containing 5000000 rows; 30000 rows in sample
ANALYZE
postgres=# ANALYZE VERBOSE tab2;
INFO: analyzing "public.tab2"
INFO: "tab2": scanned, containing 5000000 rows; 30000 rows in sample
ANALYZE
postgres=# EXPLAIN ANALYZE SELECT count(*) FROM tab1, tab2 WHERE
tab1.aid >= 0 AND tab1.aid <= 10000 AND tab1.aid = tab2.aid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1282725.25..1282725.26 rows=1 width=0) (actual
time=15114.325..15114.325 rows=1 loops=1)
-> Hash Join (cost=591508.50..1271157.90 rows=4626940 width=0)
(actual time=5964.449..14526.822 rows=4999745 loops=1)
Hash Cond: (tab2.aid = tab1.aid)
-> Foreign Scan on tab2 (cost=0.00..564630.00 rows=5000000
width=4) (actual time=0.070..6253.257 rows=5000000 loops=1)
Foreign File: /home/pgsql/sample_csv_data2.csv
Foreign File Size: 529446313
-> Hash (cost=591393.00..591393.00 rows=9240 width=4) (actual
time=5964.346..5964.346 rows=10000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 352kB
-> Foreign Scan on tab1 (cost=0.00..591393.00 rows=9240
width=4) (actual time=0.066..5962.222 rows=10000 loops=1)
Filter: ((aid >= 0) AND (aid <= 10000))
Foreign File: /home/pgsql/sample_csv_data1.csv
Foreign File Size: 543888896
Total runtime: 15114.480 ms
(13 rows)
Best regards,
Etsuro Fujita
Attachments:
postgresql-analyze-v1.patchtext/plain; name=postgresql-analyze-v1.patchDownload
diff -crNB original/postgresql-9.1beta1/contrib/file_fdw/file_fdw.c changed/postgresql-9.1beta1/contrib/file_fdw/file_fdw.c
*** original/postgresql-9.1beta1/contrib/file_fdw/file_fdw.c 2011-04-28 06:17:22.000000000 +0900
--- changed/postgresql-9.1beta1/contrib/file_fdw/file_fdw.c 2011-09-12 15:19:28.000000000 +0900
***************
*** 15,29 ****
--- 15,41 ----
#include <sys/stat.h>
#include <unistd.h>
+ #include "access/htup.h"
#include "access/reloptions.h"
+ #include "access/transam.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+ #include "commands/dbcommands.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
#include "optimizer/cost.h"
+ #include "optimizer/plancat.h"
+ #include "pgstat.h"
+ #include "parser/parse_relation.h"
+ #include "utils/attoptcache.h"
+ #include "utils/guc.h"
+ #include "utils/lsyscache.h"
+ #include "utils/memutils.h"
+ /* #include "utils/pg_rusage.h" */
PG_MODULE_MAGIC;
***************
*** 101,106 ****
--- 113,119 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation relation, VacuumStmt *vacstmt, int elevel);
/*
* Helper functions
***************
*** 111,117 ****
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 124,131 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
! static void file_do_analyze_rel(Relation relation, VacuumStmt *vacstmt, int elevel, const char *filename, CopyState cstate);
! static int file_acquire_sample_rows(Relation onerel, int elevel, CopyState cstate, HeapTuple *rows, int targrows, double *totalrows);
/*
* Foreign-data wrapper handler function: return a struct with pointers
***************
*** 128,133 ****
--- 142,148 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
***************
*** 464,469 ****
--- 479,509 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation relation, VacuumStmt *vacstmt, int elevel)
+ {
+ char *filename;
+ List *options;
+ CopyState cstate;
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(relation), &filename, &options);
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(relation, filename, NIL, options);
+
+ file_do_analyze_rel(relation, vacstmt, elevel, filename, cstate);
+
+ EndCopyFrom(cstate);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*/
static void
***************
*** 473,479 ****
{
struct stat stat_buf;
BlockNumber pages;
! int tuple_width;
double ntuples;
double nrows;
Cost run_cost = 0;
--- 513,520 ----
{
struct stat stat_buf;
BlockNumber pages;
! BlockNumber relpages;
! double reltuples;
double ntuples;
double nrows;
Cost run_cost = 0;
***************
*** 493,508 ****
if (pages < 1)
pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
/*
* Now estimate the number of rows returned by the scan after applying the
--- 534,565 ----
if (pages < 1)
pages = 1;
! relpages = baserel->pages;
! reltuples = baserel->tuples;
!
! if (relpages > 0)
! {
! double density;
!
! density = reltuples / (double) relpages;
!
! ntuples = clamp_row_est(density * (double) pages);
! }
! else
! {
! int tuple_width;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
!
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
! }
/*
* Now estimate the number of rows returned by the scan after applying the
***************
*** 534,536 ****
--- 591,960 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * file_do_analyze_rel() -- analyze one foreign table
+ */
+ static void
+ file_do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel, const char *filename, CopyState cstate)
+ {
+ int i,
+ attr_cnt,
+ tcnt,
+ numrows = 0,
+ targrows;
+ double totalrows = 0;
+ HeapTuple *rows;
+ struct stat stat_buf;
+ BlockNumber pages;
+ VacAttrStats **vacattrstats;
+ MemoryContext caller_context;
+ MemoryContext anl_context;
+
+ ereport(elevel,
+ (errmsg("analyzing \"%s.%s\"",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ RelationGetRelationName(onerel))));
+
+ /*
+ * Set up a working context so that we can easily free whatever junk gets
+ * created.
+ */
+ anl_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Analyze",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ caller_context = MemoryContextSwitchTo(anl_context);
+
+ /*
+ * Switch to the table owner's userid, so that any index functions are run
+ * as that user. Also lock down security-restricted operations and
+ * arrange to make GUC variable changes local to this command.
+ */
+ /*
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(onerel->rd_rel->relowner,
+ save_sec_context | SECURITY_RESTRICTED_OPERATION);
+ save_nestlevel = NewGUCNestLevel();
+ */
+
+ /*
+ * Determine which columns to analyze
+ *
+ * Note that system attributes are never analyzed.
+ */
+ if (vacstmt->va_cols != NIL)
+ {
+ ListCell *le;
+
+ vacattrstats = (VacAttrStats **) palloc(list_length(vacstmt->va_cols) *
+ sizeof(VacAttrStats *));
+ tcnt = 0;
+ foreach(le, vacstmt->va_cols)
+ {
+ char *col = strVal(lfirst(le));
+
+ i = attnameAttNum(onerel, col, false);
+ if (i == InvalidAttrNumber)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" of relation \"%s\" does not exist",
+ col, RelationGetRelationName(onerel))));
+ vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
+ if (vacattrstats[tcnt] != NULL)
+ tcnt++;
+ }
+ attr_cnt = tcnt;
+ }
+ else
+ {
+ attr_cnt = onerel->rd_att->natts;
+ vacattrstats = (VacAttrStats **) palloc(attr_cnt * sizeof(VacAttrStats *));
+ tcnt = 0;
+ for (i = 1; i <= attr_cnt; i++)
+ {
+ vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
+ if (vacattrstats[tcnt] != NULL)
+ tcnt++;
+ }
+ attr_cnt = tcnt;
+ }
+
+ /*
+ * Quit if no analyzable columns.
+ */
+ if (attr_cnt <= 0)
+ goto cleanup;
+
+ /*
+ * Determine how many rows we need to sample, using the worst case from
+ * all analyzable columns. We use a lower bound of 100 rows to avoid
+ * possible overflow in Vitter's algorithm.
+ */
+ targrows = 100;
+ for (i = 0; i < attr_cnt; i++)
+ {
+ if (targrows < vacattrstats[i]->minrows)
+ targrows = vacattrstats[i]->minrows;
+ }
+
+ /*
+ * Acquire the sample rows
+ */
+ rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
+ numrows = file_acquire_sample_rows(onerel, elevel, cstate, rows, targrows, &totalrows);
+
+ /*
+ * Compute the statistics. Temporary results during the calculations for
+ * each column are stored in a child context. The calc routines are
+ * responsible to make sure that whatever they store into the VacAttrStats
+ * structure is allocated in anl_context.
+ */
+ if (numrows > 0)
+ {
+ MemoryContext col_context, old_context;
+
+ col_context = AllocSetContextCreate(anl_context,
+ "Analyze Column",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_context = MemoryContextSwitchTo(col_context);
+
+ for (i = 0; i < attr_cnt; i++)
+ {
+ VacAttrStats *stats = vacattrstats[i];
+ AttributeOpts *aopt = get_attribute_options(onerel->rd_id, stats->attr->attnum);
+
+ stats->rows = rows;
+ stats->tupDesc = onerel->rd_att;
+ (*stats->compute_stats) (stats,
+ std_fetch_func,
+ numrows,
+ totalrows);
+
+ /*
+ * If the appropriate flavor of the n_distinct option is
+ * specified, override with the corresponding value.
+ */
+ if (aopt != NULL)
+ {
+ float8 n_distinct = aopt->n_distinct;
+
+ if (n_distinct != 0.0)
+ stats->stadistinct = n_distinct;
+ }
+
+ MemoryContextResetAndDeleteChildren(col_context);
+ }
+
+ MemoryContextSwitchTo(old_context);
+ MemoryContextDelete(col_context);
+
+ /*
+ * Emit the completed stats rows into pg_statistic, replacing any
+ * previous statistics for the target columns. (If there are stats in
+ * pg_statistic for columns we didn't process, we leave them alone.)
+ */
+ update_attstats(onerel->rd_id, false, attr_cnt, vacattrstats);
+ }
+
+ /*
+ * Get size of the file. It might not be there at plan time, though, in
+ * which case we have to use a default estimate.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ stat_buf.st_size = 10 * BLCKSZ;
+
+ /*
+ * Convert size to pages for use in I/O cost estimate below.
+ */
+ pages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (pages < 1)
+ pages = 1;
+
+ /*
+ * Update pages/tuples stats in pg_class.
+ */
+ vac_update_relstats(onerel, pages, totalrows, false, InvalidTransactionId);
+
+ /*
+ * Report ANALYZE to the stats collector, too; likewise, tell it to adopt
+ * these numbers only if we're not inside a VACUUM that got a better
+ * number. However, a call with inh = true shouldn't reset the stats.
+ */
+ pgstat_report_analyze(onerel, true, totalrows, 0);
+
+ /* We skip to here if there were no analyzable columns */
+ cleanup:
+
+ /* Restore current context and release memory */
+ MemoryContextSwitchTo(caller_context);
+ MemoryContextDelete(anl_context);
+ anl_context = NULL;
+ }
+
+ /*
+ * file_acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which
+ * must have at least targrows entries.
+ * The actual number of rows selected is returned as the function result.
+ * We also count the number of rows in the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ file_acquire_sample_rows(Relation onerel, int elevel, CopyState cstate, HeapTuple *rows, int targrows, double *totalrows)
+ {
+ int numrows = 0;
+ double samplerows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ BlockNumber blknum;
+ OffsetNumber offnum;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ for (;;)
+ {
+ sample_it = true;
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ {
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[numrows++] = heap_copytuple(tuple);
+ }
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(samplerows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ samplerows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /*
+ * If we didn't find as many tuples as we wanted then we're done. No sort
+ * is needed, since they're already in order.
+ *
+ * Otherwise we need to sort the collected tuples by position
+ * (itempointer). It's not worth worrying about corner cases where the
+ * tuples are already sorted.
+ */
+ if (numrows == targrows)
+ qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
+
+ *totalrows = samplerows;
+
+ pfree(values);
+ pfree(nulls);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "containing %d rows; "
+ "%d rows in sample",
+ RelationGetRelationName(onerel), (int) samplerows, numrows)));
+
+ return numrows;
+ }
diff -crNB original/postgresql-9.1beta1/contrib/file_fdw/input/file_fdw.source changed/postgresql-9.1beta1/contrib/file_fdw/input/file_fdw.source
*** original/postgresql-9.1beta1/contrib/file_fdw/input/file_fdw.source 2011-04-28 06:17:22.000000000 +0900
--- changed/postgresql-9.1beta1/contrib/file_fdw/input/file_fdw.source 2011-09-04 19:29:23.000000000 +0900
***************
*** 94,99 ****
--- 94,104 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
diff -crNB original/postgresql-9.1beta1/contrib/file_fdw/output/file_fdw.source changed/postgresql-9.1beta1/contrib/file_fdw/output/file_fdw.source
*** original/postgresql-9.1beta1/contrib/file_fdw/output/file_fdw.source 2011-04-28 06:17:22.000000000 +0900
--- changed/postgresql-9.1beta1/contrib/file_fdw/output/file_fdw.source 2011-09-04 19:31:15.000000000 +0900
***************
*** 141,146 ****
--- 141,161 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
diff -crNB original/postgresql-9.1beta1/src/backend/commands/analyze.c changed/postgresql-9.1beta1/src/backend/commands/analyze.c
*** original/postgresql-9.1beta1/src/backend/commands/analyze.c 2011-04-28 06:17:22.000000000 +0900
--- changed/postgresql-9.1beta1/src/backend/commands/analyze.c 2011-09-12 13:21:04.000000000 +0900
***************
*** 24,35 ****
--- 24,38 ----
#include "catalog/index.h"
#include "catalog/indexing.h"
#include "catalog/namespace.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
***************
*** 94,114 ****
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
MemoryContext col_context);
! static VacAttrStats *examine_attribute(Relation onerel, int attnum,
! Node *index_expr);
static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
int targrows, double *totalrows, double *totaldeadrows);
! static double random_fract(void);
! static double init_selection_state(int n);
! static double get_next_S(double t, int n, double *stateptr);
! static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
double *totalrows, double *totaldeadrows);
! static void update_attstats(Oid relid, bool inh,
! int natts, VacAttrStats **vacattrstats);
! static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
! static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static bool std_typanalyze(VacAttrStats *stats);
--- 97,117 ----
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
MemoryContext col_context);
! /* static VacAttrStats *examine_attribute(Relation onerel, int attnum, */
! /* Node *index_expr); */
static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
int targrows, double *totalrows, double *totaldeadrows);
! /* static double random_fract(void); */
! /* static double init_selection_state(int n); */
! /* static double get_next_S(double t, int n, double *stateptr); */
! /* static int compare_rows(const void *a, const void *b); */
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
double *totalrows, double *totaldeadrows);
! /* static void update_attstats(Oid relid, bool inh, */
! /* int natts, VacAttrStats **vacattrstats); */
! /* static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); */
! /* static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull); */
static bool std_typanalyze(VacAttrStats *stats);
***************
*** 129,134 ****
--- 132,138 ----
BufferAccessStrategy bstrategy, bool update_reltuples)
{
Relation onerel;
+ MemoryContext caller_context;
/* Set up static variables */
if (vacstmt->options & VACOPT_VERBOSE)
***************
*** 196,202 ****
* Check that it's a plain table; we used to do this in get_rel_oids() but
* seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 200,207 ----
* Check that it's a plain table; we used to do this in get_rel_oids() but
* seems safer to check after we've locked the relation.
*/
! if (!(onerel->rd_rel->relkind == RELKIND_RELATION ||
! onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE))
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
***************
*** 238,250 ****
/*
* Do the normal non-recursive ANALYZE.
*/
! do_analyze_rel(onerel, vacstmt, update_reltuples, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, false, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 243,272 ----
/*
* Do the normal non-recursive ANALYZE.
*/
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! ForeignDataWrapper *wrapper;
! ForeignServer *server;
! ForeignTable *table;
! FdwRoutine *fdwroutine;
!
! table = GetForeignTable(RelationGetRelid(onerel));
! server = GetForeignServer(table->serverid);
! wrapper = GetForeignDataWrapper(server->fdwid);
! fdwroutine = GetFdwRoutine(wrapper->fdwhandler);
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! do_analyze_rel(onerel, vacstmt, update_reltuples, false);
!
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, false, true);
! }
/*
* Close source relation now, but keep lock so that no one deletes it
***************
*** 354,360 ****
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" of relation \"%s\" does not exist",
col, RelationGetRelationName(onerel))));
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 376,382 ----
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" of relation \"%s\" does not exist",
col, RelationGetRelationName(onerel))));
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 368,374 ****
tcnt = 0;
for (i = 1; i <= attr_cnt; i++)
{
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 390,396 ----
tcnt = 0;
for (i = 1; i <= attr_cnt; i++)
{
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 423,429 ****
indexkey = (Node *) lfirst(indexpr_item);
indexpr_item = lnext(indexpr_item);
thisdata->vacattrstats[tcnt] =
! examine_attribute(Irel[ind], i + 1, indexkey);
if (thisdata->vacattrstats[tcnt] != NULL)
{
tcnt++;
--- 445,451 ----
indexkey = (Node *) lfirst(indexpr_item);
indexpr_item = lnext(indexpr_item);
thisdata->vacattrstats[tcnt] =
! examine_attribute(Irel[ind], i + 1, indexkey, anl_context);
if (thisdata->vacattrstats[tcnt] != NULL)
{
tcnt++;
***************
*** 825,832 ****
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
! static VacAttrStats *
! examine_attribute(Relation onerel, int attnum, Node *index_expr)
{
Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
HeapTuple typtuple;
--- 847,854 ----
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
! VacAttrStats *
! examine_attribute(Relation onerel, int attnum, Node *index_expr, MemoryContext anl_context)
{
Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
HeapTuple typtuple;
***************
*** 1272,1278 ****
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1294,1300 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
***************
*** 1292,1305 ****
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1314,1327 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
***************
*** 1384,1390 ****
/*
* qsort comparator for sorting rows[] array
*/
! static int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(HeapTuple *) a;
--- 1406,1412 ----
/*
* qsort comparator for sorting rows[] array
*/
! int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(HeapTuple *) a;
***************
*** 1578,1584 ****
* ANALYZE the same table concurrently. Presently, we lock that out
* by taking a self-exclusive lock on the relation in analyze_rel().
*/
! static void
update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
{
Relation sd;
--- 1600,1606 ----
* ANALYZE the same table concurrently. Presently, we lock that out
* by taking a self-exclusive lock on the relation in analyze_rel().
*/
! void
update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
{
Relation sd;
***************
*** 1712,1718 ****
* This exists to provide some insulation between compute_stats routines
* and the actual storage of the sample data.
*/
! static Datum
std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int attnum = stats->tupattnum;
--- 1734,1740 ----
* This exists to provide some insulation between compute_stats routines
* and the actual storage of the sample data.
*/
! Datum
std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int attnum = stats->tupattnum;
***************
*** 1728,1734 ****
* We have not bothered to construct index tuples, instead the data is
* just in Datum arrays.
*/
! static Datum
ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int i;
--- 1750,1756 ----
* We have not bothered to construct index tuples, instead the data is
* just in Datum arrays.
*/
! Datum
ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int i;
diff -crNB original/postgresql-9.1beta1/src/include/commands/vacuum.h changed/postgresql-9.1beta1/src/include/commands/vacuum.h
*** original/postgresql-9.1beta1/src/include/commands/vacuum.h 2011-04-28 06:17:22.000000000 +0900
--- changed/postgresql-9.1beta1/src/include/commands/vacuum.h 2011-09-12 15:03:51.000000000 +0900
***************
*** 130,137 ****
/* GUC parameters */
! extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for
! * PostGIS */
extern int vacuum_freeze_min_age;
extern int vacuum_freeze_table_age;
--- 130,137 ----
/* GUC parameters */
! extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for
! * PostGIS */
extern int vacuum_freeze_min_age;
extern int vacuum_freeze_table_age;
***************
*** 161,166 ****
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
! BufferAccessStrategy bstrategy, bool update_reltuples);
#endif /* VACUUM_H */
--- 161,175 ----
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
! BufferAccessStrategy bstrategy, bool update_reltuples);
! extern VacAttrStats * examine_attribute(Relation onerel, int attnum, Node *index_expr,
! MemoryContext anl_context);
! extern double random_fract(void);
! extern double init_selection_state(int n);
! extern double get_next_S(double t, int n, double *stateptr);
! extern int compare_rows(const void *a, const void *b);
! extern void update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats);
! extern Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
! extern Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
#endif /* VACUUM_H */
diff -crNB original/postgresql-9.1beta1/src/include/foreign/fdwapi.h changed/postgresql-9.1beta1/src/include/foreign/fdwapi.h
*** original/postgresql-9.1beta1/src/include/foreign/fdwapi.h 2011-04-28 06:17:22.000000000 +0900
--- changed/postgresql-9.1beta1/src/include/foreign/fdwapi.h 2011-09-12 15:08:31.000000000 +0900
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
***************
*** 68,73 ****
--- 69,77 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
***************
*** 88,93 ****
--- 92,98 ----
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
Hi Fujita-san,
(2011/09/12 19:40), Etsuro Fujita wrote:
Hi there,
To enable file_fdw to estimate costs of scanning a CSV file more
accurately, I would like to propose a new FDW callback routine,
AnalyzeForeignTable, which allows to ANALYZE command to collect
statistics on a foreign table, and a corresponding file_fdw function,
fileAnalyzeForeignTable. Attached is my WIP patch.
<snip>
I think this is a very nice feature so that planner would be able to
create smarter plan for a query which uses foreign tables.
I took a look at the patch, and found that it couldn't be applied
cleanly against HEAD. Please rebase your patch against current HEAD of
master branch, rather than 9.1beta1.
The wiki pages below would be helpful for you.
http://wiki.postgresql.org/wiki/Submitting_a_Patch
http://wiki.postgresql.org/wiki/Creating_Clean_Patches
http://wiki.postgresql.org/wiki/Reviewing_a_Patch
And it would be easy to use git to follow changes made by other
developers in master branch.
http://wiki.postgresql.org/wiki/Working_with_Git
Regards,
--
Shigeru Hanada
2011/9/12 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
This is called when ANALYZE command is executed. (ANALYZE
command should be executed because autovacuum does not analyze foreign
tables.)
This is a good idea.
However, if adding these statistics requires an explicit ANALYZE
command, then we should also have a command for resetting the
collected statistics -- to get it back into the un-analyzed state.
Currently it looks like the only way to reset statistics is to tamper
with catalogs directly, or recreate the foreign table.
Regards,
Marti
Marti Raudsepp <marti@juffo.org> writes:
2011/9/12 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
This is called when ANALYZE command is executed. (ANALYZE
command should be executed because autovacuum does not analyze foreign
tables.)
This is a good idea.
However, if adding these statistics requires an explicit ANALYZE
command, then we should also have a command for resetting the
collected statistics -- to get it back into the un-analyzed state.
Uh, why? There is no UNANALYZE operation for ordinary tables, and
I've never heard anyone ask for one.
If you're desperate you could manually delete the relevant rows in
pg_statistic, a solution that would presumably work for foreign tables
too.
Probably a more interesting question is why we wouldn't change
autovacuum so that it calls this automatically for foreign tables.
(Note: I'm unconvinced that there's a use-case for this in the case of
"real" foreign tables on a remote server --- it seems likely that the
wrapper ought to ask the remote server for its current stats, instead.
But it's clearly useful for non-server-backed sources such as file_fdw.)
regards, tom lane
On 20-09-2011 11:12, Marti Raudsepp wrote:
2011/9/12 Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>:
This is called when ANALYZE command is executed. (ANALYZE
command should be executed because autovacuum does not analyze foreign
tables.)This is a good idea.
However, if adding these statistics requires an explicit ANALYZE
command, then we should also have a command for resetting the
collected statistics -- to get it back into the un-analyzed state.
Why would you want this? If the stats aren't up to date, run ANALYZE
periodically. Remember that it is part of the DBA maintenance tasks [1]http://www.postgresql.org/docs/current/static/maintenance.html.
[1]: http://www.postgresql.org/docs/current/static/maintenance.html
--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On Tue, Sep 20, 2011 at 11:13:05AM -0400, Tom Lane wrote:
Marti Raudsepp <marti@juffo.org> writes:
2011/9/12 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
This is called when ANALYZE command is executed. (ANALYZE
command should be executed because autovacuum does not analyze foreign
tables.)This is a good idea.
However, if adding these statistics requires an explicit ANALYZE
command, then we should also have a command for resetting the
collected statistics -- to get it back into the un-analyzed state.Uh, why? There is no UNANALYZE operation for ordinary tables, and
I've never heard anyone ask for one.If you're desperate you could manually delete the relevant rows in
pg_statistic, a solution that would presumably work for foreign tables
too.Probably a more interesting question is why we wouldn't change
autovacuum so that it calls this automatically for foreign tables.
How about a per-table setting that tells autovacuum whether to do
this? Come to think of it, all of per-FDW, per-remote and per-table
settings would be handy, so people could express things like, "all CSV
files except these three, all PostgreSQL connections on the
10.1.0.0/16 network, and these two tables in Oracle."
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Excerpts from David Fetter's message of mar sep 20 21:22:32 -0300 2011:
On Tue, Sep 20, 2011 at 11:13:05AM -0400, Tom Lane wrote:
Probably a more interesting question is why we wouldn't change
autovacuum so that it calls this automatically for foreign tables.How about a per-table setting that tells autovacuum whether to do
this?
Seems reasonable. Have autovacuum assume that foreign tables are not to
be analyzed, unless some reloption is set.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi Hanada-san,
I'm very sorry for late reply.
(2011/09/20 18:49), Shigeru Hanada wrote:
I took a look at the patch, and found that it couldn't be applied
cleanly against HEAD. Please rebase your patch against current HEAD of
master branch, rather than 9.1beta1.The wiki pages below would be helpful for you.
http://wiki.postgresql.org/wiki/Submitting_a_Patch
http://wiki.postgresql.org/wiki/Creating_Clean_Patches
http://wiki.postgresql.org/wiki/Reviewing_a_PatchAnd it would be easy to use git to follow changes made by other
developers in master branch.
http://wiki.postgresql.org/wiki/Working_with_Git
Thank you for the review and the helpful information.
I rebased. Please find attached a patch. I'll add the patch to the next CF.
Changes:
* cleanups and fixes
* addition of the following to ALTER FOREIGN TABLE
ALTER [COLUMN] column SET STATISTICS integer
ALTER [COLUMN] column SET ( n_distinct = val ) (n_distinct only)
ALTER [COLUMN] column RESET ( n_distinct )
* reflection of the force_not_null info in acquiring sample rows
* documentation
Best regards,
Etsuro Fujita
Attachments:
postgresql-analyze-v2.patchtext/plain; name=postgresql-analyze-v2.patchDownload
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 15,30 ****
--- 15,42 ----
#include <sys/stat.h>
#include <unistd.h>
+ #include "access/htup.h"
#include "access/reloptions.h"
+ #include "access/transam.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+ #include "commands/dbcommands.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/cost.h"
+ #include "optimizer/plancat.h"
+ #include "parser/parse_relation.h"
+ #include "pgstat.h"
+ #include "utils/attoptcache.h"
+ #include "utils/elog.h"
+ #include "utils/guc.h"
+ #include "utils/lsyscache.h"
+ #include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/syscache.h"
***************
*** 101,106 **** static void fileBeginForeignScan(ForeignScanState *node, int eflags);
--- 113,119 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel);
/*
* Helper functions
***************
*** 112,118 **** static List *get_file_fdw_attribute_options(Oid relid);
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 125,132 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
! static void file_fdw_do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel);
! static int file_fdw_acquire_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, BlockNumber *totalpages, double *totalrows);
/*
* Foreign-data wrapper handler function: return a struct with pointers
***************
*** 129,134 **** file_fdw_handler(PG_FUNCTION_ARGS)
--- 143,149 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
***************
*** 575,580 **** fileReScanForeignScan(ForeignScanState *node)
--- 590,605 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ file_fdw_do_analyze_rel(onerel, vacstmt, elevel);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*/
static void
***************
*** 584,590 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
{
struct stat stat_buf;
BlockNumber pages;
! int tuple_width;
double ntuples;
double nrows;
Cost run_cost = 0;
--- 609,616 ----
{
struct stat stat_buf;
BlockNumber pages;
! BlockNumber relpages;
! double reltuples;
double ntuples;
double nrows;
Cost run_cost = 0;
***************
*** 604,619 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
if (pages < 1)
pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
/*
* Now estimate the number of rows returned by the scan after applying the
--- 630,661 ----
if (pages < 1)
pages = 1;
! relpages = baserel->pages;
! reltuples = baserel->tuples;
! if (relpages > 0)
! {
! double density;
!
! density = reltuples / (double) relpages;
!
! ntuples = clamp_row_est(density * (double) pages);
! }
! else
! {
! int tuple_width;
!
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
!
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
! }
/*
* Now estimate the number of rows returned by the scan after applying the
***************
*** 645,647 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
--- 687,1092 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * file_fdw_do_analyze_rel() -- analyze one foreign table
+ */
+ static void
+ file_fdw_do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ int i,
+ attr_cnt,
+ tcnt,
+ numrows = 0,
+ targrows;
+ double totalrows = 0;
+ BlockNumber totalpages = 0;
+ HeapTuple *rows;
+ VacAttrStats **vacattrstats;
+ MemoryContext anl_context;
+ MemoryContext caller_context;
+
+ ereport(elevel,
+ (errmsg("analyzing \"%s.%s\"",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ RelationGetRelationName(onerel))));
+
+ /*
+ * Set up a working context so that we can easily free whatever junk gets
+ * created.
+ */
+ anl_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Analyze",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ caller_context = MemoryContextSwitchTo(anl_context);
+
+ /*
+ * Determine which columns to analyze
+ *
+ * Note that system attributes are never analyzed.
+ */
+ if (vacstmt->va_cols != NIL)
+ {
+ ListCell *le;
+
+ vacattrstats = (VacAttrStats **) palloc(list_length(vacstmt->va_cols) *
+ sizeof(VacAttrStats *));
+ tcnt = 0;
+ foreach(le, vacstmt->va_cols)
+ {
+ char *col = strVal(lfirst(le));
+
+ i = attnameAttNum(onerel, col, false);
+ if (i == InvalidAttrNumber)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" of relation \"%s\" does not exist",
+ col, RelationGetRelationName(onerel))));
+ vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
+ if (vacattrstats[tcnt] != NULL)
+ tcnt++;
+ }
+ attr_cnt = tcnt;
+ }
+ else
+ {
+ attr_cnt = onerel->rd_att->natts;
+ vacattrstats = (VacAttrStats **) palloc(attr_cnt * sizeof(VacAttrStats *));
+ tcnt = 0;
+ for (i = 1; i <= attr_cnt; i++)
+ {
+ vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
+ if (vacattrstats[tcnt] != NULL)
+ tcnt++;
+ }
+ attr_cnt = tcnt;
+ }
+
+ /*
+ * Quit if no analyzable columns.
+ */
+ if (attr_cnt <= 0)
+ goto cleanup;
+
+ /*
+ * Determine how many rows we need to sample, using the worst case from
+ * all analyzable columns. We use a lower bound of 100 rows to avoid
+ * possible overflow in Vitter's algorithm.
+ */
+ targrows = 100;
+ for (i = 0; i < attr_cnt; i++)
+ {
+ if (targrows < vacattrstats[i]->minrows)
+ targrows = vacattrstats[i]->minrows;
+ }
+
+ /*
+ * Acquire the sample rows
+ */
+ rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
+ numrows = file_fdw_acquire_sample_rows(onerel, elevel, rows, targrows, &totalpages, &totalrows);
+
+ /*
+ * Compute the statistics. Temporary results during the calculations for
+ * each column are stored in a child context. The calc routines are
+ * responsible to make sure that whatever they store into the VacAttrStats
+ * structure is allocated in anl_context.
+ */
+ if (numrows > 0)
+ {
+ MemoryContext col_context, old_context;
+
+ col_context = AllocSetContextCreate(anl_context,
+ "Analyze Column",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_context = MemoryContextSwitchTo(col_context);
+
+ for (i = 0; i < attr_cnt; i++)
+ {
+ VacAttrStats *stats = vacattrstats[i];
+ AttributeOpts *aopt = get_attribute_options(onerel->rd_id, stats->attr->attnum);
+
+ stats->rows = rows;
+ stats->tupDesc = onerel->rd_att;
+ (*stats->compute_stats) (stats,
+ std_fetch_func,
+ numrows,
+ totalrows);
+
+ /*
+ * If the appropriate flavor of the n_distinct option is
+ * specified, override with the corresponding value.
+ */
+ if (aopt != NULL)
+ {
+ float8 n_distinct = aopt->n_distinct;
+
+ if (n_distinct != 0.0)
+ stats->stadistinct = n_distinct;
+ }
+
+ MemoryContextResetAndDeleteChildren(col_context);
+ }
+
+ MemoryContextSwitchTo(old_context);
+ MemoryContextDelete(col_context);
+
+ /*
+ * Emit the completed stats rows into pg_statistic, replacing any
+ * previous statistics for the target columns. (If there are stats in
+ * pg_statistic for columns we didn't process, we leave them alone.)
+ */
+ update_attstats(onerel->rd_id, false, attr_cnt, vacattrstats);
+ }
+
+ /*
+ * Update pages/tuples stats in pg_class.
+ */
+ vac_update_relstats(onerel, totalpages, totalrows, false, InvalidTransactionId);
+
+ /*
+ * Report ANALYZE to the stats collector, too; likewise, tell it to adopt
+ * these numbers only if we're not inside a VACUUM that got a better
+ * number. However, a call with inh = true shouldn't reset the stats.
+ */
+ pgstat_report_analyze(onerel, totalrows, 0);
+
+ /* We skip to here if there were no analyzable columns */
+ cleanup:
+
+ /* Restore current context and release memory */
+ MemoryContextSwitchTo(caller_context);
+ MemoryContextDelete(anl_context);
+ anl_context = NULL;
+ }
+
+ /*
+ * file_fdw_acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which
+ * must have at least targrows entries.
+ * The actual number of rows selected is returned as the function result.
+ * We also count the number of rows in the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ file_fdw_acquire_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, BlockNumber *totalpages, double *totalrows)
+ {
+ int numrows = 0;
+ double samplerows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ BlockNumber blknum;
+ OffsetNumber offnum;
+ char *filename;
+ struct stat stat_buf;
+ List *options;
+ ListCell *lc;
+ List *force_notnull = NIL;
+ bool *force_notnull_flags;
+ CopyState cstate;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+ force_notnull_flags = (bool *) palloc0(natts * sizeof(bool));
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+
+ foreach(lc, options)
+ {
+ DefElem *def = (DefElem *) lfirst(lc);
+
+ if (strcmp(def->defname, "force_not_null") == 0)
+ {
+ force_notnull = (List *) def->arg;
+ break;
+ }
+ }
+
+ if (force_notnull)
+ {
+ List *attnums;
+ ListCell *cur;
+
+ attnums = CopyGetAttnums(tupDesc, onerel, force_notnull);
+
+ foreach(cur, attnums)
+ {
+ int attnum = lfirst_int(cur);
+
+ force_notnull_flags[attnum - 1] = true;
+ }
+ }
+
+ /*
+ * Get size of the file.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ filename)));
+
+ /*
+ * Convert size to pages for use in I/O cost estimate.
+ */
+ *totalpages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (*totalpages < 1)
+ *totalpages = 1;
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(onerel, filename, NIL, options);
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ for (;;)
+ {
+ sample_it = true;
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ !force_notnull_flags[attrChk - 1] &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ {
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[numrows++] = heap_copytuple(tuple);
+ }
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(samplerows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ samplerows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /*
+ * If we didn't find as many tuples as we wanted then we're done. No sort
+ * is needed, since they're already in order.
+ *
+ * Otherwise we need to sort the collected tuples by position
+ * (itempointer). It's not worth worrying about corner cases where the
+ * tuples are already sorted.
+ */
+ if (numrows == targrows)
+ qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
+
+ *totalrows = samplerows;
+
+ EndCopyFrom(cstate);
+
+ pfree(values);
+ pfree(nulls);
+ pfree(force_notnull_flags);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "containing %d rows; "
+ "%d rows in sample",
+ RelationGetRelationName(onerel), (int) samplerows, numrows)));
+
+ return numrows;
+ }
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***************
*** 111,116 **** EXECUTE st(100);
--- 111,121 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***************
*** 174,179 **** EXECUTE st(100);
--- 174,194 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 233,238 **** EndForeignScan (ForeignScanState *node);
--- 233,257 ----
for additional details.
</para>
+ <para>
+ <programlisting>
+ void
+ AnalyzeForeignTable (Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
+ </programlisting>
+
+ Collect statistics on a foreign table and store the results in the
+ pg_class and pg_statistics system catalog.
+ This is called when <command>ANALYZE</> command is run.
+ </para>
+
+ <para>
+ The <structname>FdwRoutine</> and <structname>FdwPlan</> struct types
+ are declared in <filename>src/include/foreign/fdwapi.h</>, which see
+ for additional details.
+ </para>
+
</sect1>
</chapter>
*** a/doc/src/sgml/ref/alter_foreign_table.sgml
--- b/doc/src/sgml/ref/alter_foreign_table.sgml
***************
*** 36,41 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 36,44 ----
DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable>
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
***************
*** 94,99 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 97,146 ----
</varlistentry>
<varlistentry>
+ <term><literal>SET STATISTICS</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets the per-column statistics-gathering target for subsequent
+ <xref linkend="sql-analyze"> operations.
+ The target can be set in the range 0 to 10000; alternatively, set it
+ to -1 to revert to using the system default statistics
+ target (<xref linkend="guc-default-statistics-target">).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )</literal></term>
+ <term><literal>RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets or resets per-attribute options. Currently, the only defined
+ per-attribute options is <literal>n_distinct</>, which override
+ the number-of-distinct-values estimates made by subsequent
+ <xref linkend="sql-analyze"> operations. <literal>n_distinct</>
+ affects the statistics for the foreign table itself. When set to
+ a positive value, <command>ANALYZE</> will assume that the column
+ contains exactly the specified number of distinct nonnull values.
+ When set to a negative value, which must be greater than or equal
+ to -1, <command>ANALYZE</> will assume that the number of distinct
+ nonnull values in the column is linear in the size of the foreign
+ table; the exact count is to be computed by multiplying
+ the estimated foreign table size by the absolute value of
+ the given number. For example,
+ a value of -1 implies that all values in the column are distinct,
+ while a value of -0.5 implies that each value appears twice on the
+ average.
+ This can be useful when the size of the foreign table changes over
+ time, since the multiplication by the number of rows in the foreign
+ table is not performed until query planning time. Specify a value
+ of 0 to revert to estimating the number of distinct values normally.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>OWNER</literal></term>
<listitem>
<para>
*** a/doc/src/sgml/ref/analyze.sgml
--- b/doc/src/sgml/ref/analyze.sgml
***************
*** 39,47 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database. With a parameter, <command>ANALYZE</command> examines
! only that table. It is further possible to give a list of column names,
! in which case only the statistics for those columns are collected.
</para>
</refsect1>
--- 39,48 ----
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database except for foreign tables. With a parameter, <command>
! ANALYZE</command> examines only that table. It is further possible to
! give a list of column names, in which case only the statistics for those
! columns are collected.
</para>
</refsect1>
***************
*** 63,69 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database.
</para>
</listitem>
</varlistentry>
--- 64,71 ----
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database except
! for foreign tables.
</para>
</listitem>
</varlistentry>
***************
*** 138,143 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
--- 140,148 ----
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
<command>ANALYZE</command>, as described below.
+ Note that the time needed to analyze a foreign table depends on
+ the implementation of a foreign data wrappers via which that table is
+ attached.
</para>
<para>
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 22,27 ****
--- 22,28 ----
#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
***************
*** 29,34 ****
--- 30,37 ----
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
***************
*** 93,112 **** static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
MemoryContext col_context);
- static VacAttrStats *examine_attribute(Relation onerel, int attnum,
- Node *index_expr);
static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
int targrows, double *totalrows, double *totaldeadrows);
- static double random_fract(void);
- static double init_selection_state(int n);
- static double get_next_S(double t, int n, double *stateptr);
- static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
double *totalrows, double *totaldeadrows);
- static void update_attstats(Oid relid, bool inh,
- int natts, VacAttrStats **vacattrstats);
- static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static bool std_typanalyze(VacAttrStats *stats);
--- 96,106 ----
***************
*** 183,192 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
}
/*
! * Check that it's a plain table; we used to do this in get_rel_oids() but
! * seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 177,187 ----
}
/*
! * Check that it's a plain table or a foreign table; we used to do this in
! * get_rel_oids() but seems safer to check after we've locked the relation.
*/
! if (!(onerel->rd_rel->relkind == RELKIND_RELATION ||
! onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE))
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
***************
*** 228,240 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
/*
* Do the normal non-recursive ANALYZE.
*/
! do_analyze_rel(onerel, vacstmt, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 223,252 ----
/*
* Do the normal non-recursive ANALYZE.
*/
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! ForeignDataWrapper *wrapper;
! ForeignServer *server;
! ForeignTable *table;
! FdwRoutine *fdwroutine;
! table = GetForeignTable(RelationGetRelid(onerel));
! server = GetForeignServer(table->serverid);
! wrapper = GetForeignDataWrapper(server->fdwid);
! fdwroutine = GetFdwRoutine(wrapper->fdwhandler);
!
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! do_analyze_rel(onerel, vacstmt, false);
!
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
! }
/*
* Close source relation now, but keep lock so that no one deletes it
***************
*** 342,348 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" of relation \"%s\" does not exist",
col, RelationGetRelationName(onerel))));
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 354,360 ----
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" of relation \"%s\" does not exist",
col, RelationGetRelationName(onerel))));
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 356,362 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
tcnt = 0;
for (i = 1; i <= attr_cnt; i++)
{
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 368,374 ----
tcnt = 0;
for (i = 1; i <= attr_cnt; i++)
{
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 410,416 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
indexkey = (Node *) lfirst(indexpr_item);
indexpr_item = lnext(indexpr_item);
thisdata->vacattrstats[tcnt] =
! examine_attribute(Irel[ind], i + 1, indexkey);
if (thisdata->vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 422,428 ----
indexkey = (Node *) lfirst(indexpr_item);
indexpr_item = lnext(indexpr_item);
thisdata->vacattrstats[tcnt] =
! examine_attribute(Irel[ind], i + 1, indexkey, anl_context);
if (thisdata->vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 800,807 **** compute_index_stats(Relation onerel, double totalrows,
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
! static VacAttrStats *
! examine_attribute(Relation onerel, int attnum, Node *index_expr)
{
Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
HeapTuple typtuple;
--- 812,819 ----
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
! VacAttrStats *
! examine_attribute(Relation onerel, int attnum, Node *index_expr, MemoryContext anl_context)
{
Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
HeapTuple typtuple;
***************
*** 1247,1253 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1259,1265 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
***************
*** 1267,1280 **** random_fract(void)
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1279,1292 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
***************
*** 1359,1365 **** get_next_S(double t, int n, double *stateptr)
/*
* qsort comparator for sorting rows[] array
*/
! static int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(const HeapTuple *) a;
--- 1371,1377 ----
/*
* qsort comparator for sorting rows[] array
*/
! int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(const HeapTuple *) a;
***************
*** 1554,1560 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
* ANALYZE the same table concurrently. Presently, we lock that out
* by taking a self-exclusive lock on the relation in analyze_rel().
*/
! static void
update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
{
Relation sd;
--- 1566,1572 ----
* ANALYZE the same table concurrently. Presently, we lock that out
* by taking a self-exclusive lock on the relation in analyze_rel().
*/
! void
update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
{
Relation sd;
***************
*** 1691,1697 **** update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
* This exists to provide some insulation between compute_stats routines
* and the actual storage of the sample data.
*/
! static Datum
std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int attnum = stats->tupattnum;
--- 1703,1709 ----
* This exists to provide some insulation between compute_stats routines
* and the actual storage of the sample data.
*/
! Datum
std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int attnum = stats->tupattnum;
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***************
*** 288,295 **** static Datum CopyReadBinaryAttribute(CopyState cstate,
static void CopyAttributeOutText(CopyState cstate, char *string);
static void CopyAttributeOutCSV(CopyState cstate, char *string,
bool use_quote, bool single_attr);
- static List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
- List *attnamelist);
static char *limit_printout_length(const char *str);
/* Low-level communications functions */
--- 288,293 ----
***************
*** 3725,3731 **** CopyAttributeOutCSV(CopyState cstate, char *string,
*
* rel can be NULL ... it's only used for error reports.
*/
! static List *
CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
{
List *attnums = NIL;
--- 3723,3729 ----
*
* rel can be NULL ... it's only used for error reports.
*/
! List *
CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
{
List *attnums = NIL;
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 311,316 **** static void ATPrepSetStatistics(Relation rel, const char *colName,
--- 311,318 ----
Node *newValue, LOCKMODE lockmode);
static void ATExecSetStatistics(Relation rel, const char *colName,
Node *newValue, LOCKMODE lockmode);
+ static void ATPrepSetOptions(Relation rel, const char *colName,
+ Node *options, LOCKMODE lockmode);
static void ATExecSetOptions(Relation rel, const char *colName,
Node *options, bool isReset, LOCKMODE lockmode);
static void ATExecSetStorage(Relation rel, const char *colName,
***************
*** 2886,2892 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
--- 2888,2895 ----
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX | ATT_FOREIGN_TABLE);
! ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
***************
*** 4822,4828 **** ATPrepSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("\"%s\" is not a table or index",
--- 4825,4832 ----
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX &&
! rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
errmsg("\"%s\" is not a table or index",
***************
*** 4894,4899 **** ATExecSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
--- 4898,4923 ----
}
static void
+ ATPrepSetOptions(Relation rel, const char *colName, Node *options,
+ LOCKMODE lockmode)
+ {
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ ListCell *cell;
+
+ foreach(cell, (List *) options)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (pg_strncasecmp(def->defname, "n_distinct_inherited", strlen("n_distinct_inherited")) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("option \"n_distinct_inherited\" is not supported for a foreign table")));
+ }
+ }
+ }
+
+ static void
ATExecSetOptions(Relation rel, const char *colName, Node *options,
bool isReset, LOCKMODE lockmode)
{
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
***************
*** 32,37 **** extern bool NextCopyFrom(CopyState cstate, ExprContext *econtext,
--- 32,38 ----
extern bool NextCopyFromRawFields(CopyState cstate,
char ***fields, int *nfields);
extern void CopyFromErrorCallback(void *arg);
+ extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist);
extern DestReceiver *CreateCopyDestReceiver(void);
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
***************
*** 166,170 **** extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
--- 166,178 ----
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
BufferAccessStrategy bstrategy);
+ extern VacAttrStats * examine_attribute(Relation onerel, int attnum, Node *index_expr,
+ MemoryContext anl_context);
+ extern double random_fract(void);
+ extern double init_selection_state(int n);
+ extern double get_next_S(double t, int n, double *stateptr);
+ extern int compare_rows(const void *a, const void *b);
+ extern void update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats);
+ extern Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
#endif /* VACUUM_H */
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
***************
*** 68,73 **** typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
--- 70,78 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
***************
*** 88,93 **** typedef struct FdwRoutine
--- 93,99 ----
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
Hi,
I'm very sorry for the late reply.
(2011/09/21 10:00), Alvaro Herrera wrote:
Excerpts from David Fetter's message of mar sep 20 21:22:32 -0300 2011:
On Tue, Sep 20, 2011 at 11:13:05AM -0400, Tom Lane wrote:
Probably a more interesting question is why we wouldn't change
autovacuum so that it calls this automatically for foreign tables.How about a per-table setting that tells autovacuum whether to do
this?Seems reasonable. Have autovacuum assume that foreign tables are not to
be analyzed, unless some reloption is set.
Thank you for the comments. I'd like to leave that feature for future work.
(But this is BTW. I'm interested in developing CREATE FOREIGN INDEX.
I've examined whether there are discussions about the design and
implementation of it in the archive, but could not find information. If
you know anything, please tell me.)
Best regards,
Etsuro Fujita
On Fri, Oct 07, 2011 at 08:09:44PM +0900, Etsuro Fujita wrote:
Hi,
I'm very sorry for the late reply.
(2011/09/21 10:00), Alvaro Herrera wrote:
Excerpts from David Fetter's message of mar sep 20 21:22:32 -0300 2011:
On Tue, Sep 20, 2011 at 11:13:05AM -0400, Tom Lane wrote:
Probably a more interesting question is why we wouldn't change
autovacuum so that it calls this automatically for foreign tables.How about a per-table setting that tells autovacuum whether to do
this?Seems reasonable. Have autovacuum assume that foreign tables are not to
be analyzed, unless some reloption is set.Thank you for the comments. I'd like to leave that feature for future work.
OK
(But this is BTW. I'm interested in developing CREATE FOREIGN INDEX.
I've examined whether there are discussions about the design and
implementation of it in the archive, but could not find information.
If you know anything, please tell me.)
Look into the "virtual index interface" from Informix. We might want
to start a wiki page on this.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
(2011/10/07 21:56), David Fetter wrote:
(But this is BTW. I'm interested in developing CREATE FOREIGN INDEX.
I've examined whether there are discussions about the design and
implementation of it in the archive, but could not find information.
If you know anything, please tell me.)Look into the "virtual index interface" from Informix.
Thank you for the information.
We might want to start a wiki page on this.
Yeah, I think it might be better to add information to the SQL/MED wiki
page:
http://wiki.postgresql.org/wiki/SQL/MED
Best regards,
Etsuro Fujita
(2011/10/07 18:09), Etsuro Fujita wrote:
Thank you for the review and the helpful information.
I rebased. Please find attached a patch. I'll add the patch to the next CF.Changes:
* cleanups and fixes
* addition of the following to ALTER FOREIGN TABLE
ALTER [COLUMN] column SET STATISTICS integer
ALTER [COLUMN] column SET ( n_distinct = val ) (n_distinct only)
ALTER [COLUMN] column RESET ( n_distinct )
* reflection of the force_not_null info in acquiring sample rows
* documentation
The new patch could be applied with some shifts. Regression tests of
core and file_fdw have passed cleanly. Though I've tested only simple
tests, ANALYZE works for foreign tables for file_fdw, and estimation of
costs and selectivity seem appropriate.
New API AnalyzeForeignTable
===========================
In your design, new handler function is called instead of
do_analylze_rel. IMO this hook point would be good for FDWs which can
provide statistics in optimized way. For instance, pgsql_fdw can simply
copy statistics from remote PostgreSQL server if they are compatible.
Possible another idea is to replace acquire_sample_rows so that other
FDWs can reuse most part of fileAnalyzeForeignTable and
file_fdw_do_analyze_rel.
And I think that AnalyzeForeignTable should be optional, and it would be
very useful if a default handler is provided. Probably a default
handler can use basic FDW APIs to acquire sample rows from the result of
"SELECT * FROM foreign_table" with skipping periodically. It won't be
efficient but I think it's not so unreasonable.
Other issues
============
There are some other comments about non-critical issues.
- When there is no analyzable column, vac_update_relstats is not called.
Is this behavior intentional?
- psql can't complete foreign table name after ANALYZE.
- A new parameter has been added to vac_update_relstats in a recent
commit. Perhaps 0 is OK for that parameter.
- ANALYZE without relation name ignores foreign tables because
get_rel_oids doesn't list foreign tables.
- IMO logging "analyzing foo.bar" should not be done in
AnalyzeForeignTable handler of each FDW because some FDW might forget to
do it. Maybe it should be pulled up to analyze_rel or somewhere in core.
- It should be mentioned in a document that foreign tables are not
analyzed automatically because they are read-only.
Regards,
--
Shigeru Hanada
(2011/10/18 2:27), Shigeru Hanada wrote:
The new patch could be applied with some shifts. Regression tests of
core and file_fdw have passed cleanly. Though I've tested only simple
tests, ANALYZE works for foreign tables for file_fdw, and estimation of
costs and selectivity seem appropriate.
Thank you for your testing.
New API AnalyzeForeignTable
===========================
And I think that AnalyzeForeignTable should be optional, and it would be
very useful if a default handler is provided. Probably a default
handler can use basic FDW APIs to acquire sample rows from the result of
"SELECT * FROM foreign_table" with skipping periodically. It won't be
efficient but I think it's not so unreasonable.
I agree with you. However, I think that it is difficult to support such
a default handler in a unified way because there exist external data
sources for which we cannot execute "SELECT * FROM foreign_table", e.g.,
web-accessible DBs limiting full access to the contents.
Other issues
============
There are some other comments about non-critical issues.
- When there is no analyzable column, vac_update_relstats is not called.
Is this behavior intentional?
- psql can't complete foreign table name after ANALYZE.
- A new parameter has been added to vac_update_relstats in a recent
commit. Perhaps 0 is OK for that parameter.
I'll check.
- ANALYZE without relation name ignores foreign tables because
get_rel_oids doesn't list foreign tables.
I think that it might be better to ignore foreign tables by default
because analyzing such tables may take long depending on FDW.
- IMO logging "analyzing foo.bar" should not be done in
AnalyzeForeignTable handler of each FDW because some FDW might forget to
do it. Maybe it should be pulled up to analyze_rel or somewhere in core.
- It should be mentioned in a document that foreign tables are not
analyzed automatically because they are read-only.
OK. I'll revise.
Regards,
Best regards,
Etsuro Fujita
New API AnalyzeForeignTable
I didn't look at the patch, but I'm using CSV foreign tables with named pipes
to get near-realtime KPI calculated by postgresql. Of course, pipes can be
read just once, so I wouldn't want an "automatic analyze" of foreign tables...
Hi,
(2011/10/18 16:32), Leonardo Francalanci wrote:
New API AnalyzeForeignTable
I didn't look at the patch, but I'm using CSV foreign tables with named pipes
to get near-realtime KPI calculated by postgresql. Of course, pipes can be
read just once, so I wouldn't want an "automatic analyze" of foreign tables...
The patch does not analyze on foreign tables automatically. (The issue
of auto-analyze on foreign tables has been discussed. Please refer to [1]http://archives.postgresql.org/pgsql-hackers/2011-09/msg00992.php.)
[1]: http://archives.postgresql.org/pgsql-hackers/2011-09/msg00992.php
Best regards,
Etsuro Fujita
Hi,
I revised the patch according to Hanada-san's comments. Attached is the
updated version of the patch.
Changes:
* pull up of logging "analyzing foo.bar"
* new vac_update_relstats always called
* tab-completion in psql
* add "foreign tables are not analyzed automatically..." to 23.1.3
Updating Planner Statistics
* some other modifications
Best regards,
Etsuro Fujita
Attachments:
postgresql-analyze-v3.patchtext/plain; name=postgresql-analyze-v3.patchDownload
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 15,30 ****
--- 15,42 ----
#include <sys/stat.h>
#include <unistd.h>
+ #include "access/htup.h"
#include "access/reloptions.h"
+ #include "access/transam.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
+ #include "commands/dbcommands.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/cost.h"
+ #include "optimizer/plancat.h"
+ #include "parser/parse_relation.h"
+ #include "pgstat.h"
+ #include "utils/attoptcache.h"
+ #include "utils/elog.h"
+ #include "utils/guc.h"
+ #include "utils/lsyscache.h"
+ #include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/syscache.h"
***************
*** 101,106 **** static void fileBeginForeignScan(ForeignScanState *node, int eflags);
--- 113,119 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel);
/*
* Helper functions
***************
*** 112,118 **** static List *get_file_fdw_attribute_options(Oid relid);
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 125,132 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
! static void file_fdw_do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel);
! static int file_fdw_acquire_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, BlockNumber *totalpages, double *totalrows);
/*
* Foreign-data wrapper handler function: return a struct with pointers
***************
*** 129,134 **** file_fdw_handler(PG_FUNCTION_ARGS)
--- 143,149 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
***************
*** 575,580 **** fileReScanForeignScan(ForeignScanState *node)
--- 590,605 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ file_fdw_do_analyze_rel(onerel, vacstmt, elevel);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*/
static void
***************
*** 584,590 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
{
struct stat stat_buf;
BlockNumber pages;
! int tuple_width;
double ntuples;
double nrows;
Cost run_cost = 0;
--- 609,616 ----
{
struct stat stat_buf;
BlockNumber pages;
! BlockNumber relpages;
! double reltuples;
double ntuples;
double nrows;
Cost run_cost = 0;
***************
*** 604,619 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
if (pages < 1)
pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
/*
* Now estimate the number of rows returned by the scan after applying the
--- 630,661 ----
if (pages < 1)
pages = 1;
! relpages = baserel->pages;
! reltuples = baserel->tuples;
!
! if (relpages > 0)
! {
! double density;
! density = reltuples / (double) relpages;
!
! ntuples = clamp_row_est(density * (double) pages);
! }
! else
! {
! int tuple_width;
!
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
!
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
! }
/*
* Now estimate the number of rows returned by the scan after applying the
***************
*** 645,647 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
--- 687,1046 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * file_fdw_do_analyze_rel() -- analyze one foreign table
+ */
+ static void
+ file_fdw_do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ int i,
+ attr_cnt,
+ tcnt,
+ numrows = 0,
+ targrows;
+ double totalrows = 0;
+ BlockNumber totalpages = 0;
+ HeapTuple *rows;
+ VacAttrStats **vacattrstats;
+ MemoryContext anl_context;
+ MemoryContext caller_context;
+
+ /*
+ * Set up a working context so that we can easily free whatever junk gets
+ * created.
+ */
+ anl_context = AllocSetContextCreate(CurrentMemoryContext,
+ "Analyze",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ caller_context = MemoryContextSwitchTo(anl_context);
+
+ /*
+ * Determine which columns to analyze
+ *
+ * Note that system attributes are never analyzed.
+ */
+ if (vacstmt->va_cols != NIL)
+ {
+ ListCell *le;
+
+ vacattrstats = (VacAttrStats **) palloc(list_length(vacstmt->va_cols) *
+ sizeof(VacAttrStats *));
+ tcnt = 0;
+ foreach(le, vacstmt->va_cols)
+ {
+ char *col = strVal(lfirst(le));
+
+ i = attnameAttNum(onerel, col, false);
+ if (i == InvalidAttrNumber)
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_COLUMN),
+ errmsg("column \"%s\" of relation \"%s\" does not exist",
+ col, RelationGetRelationName(onerel))));
+ vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
+ if (vacattrstats[tcnt] != NULL)
+ tcnt++;
+ }
+ attr_cnt = tcnt;
+ }
+ else
+ {
+ attr_cnt = onerel->rd_att->natts;
+ vacattrstats = (VacAttrStats **) palloc(attr_cnt * sizeof(VacAttrStats *));
+ tcnt = 0;
+ for (i = 1; i <= attr_cnt; i++)
+ {
+ vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
+ if (vacattrstats[tcnt] != NULL)
+ tcnt++;
+ }
+ attr_cnt = tcnt;
+ }
+
+ /*
+ * Determine how many rows we need to sample, using the worst case from
+ * all analyzable columns. We use a lower bound of 100 rows to avoid
+ * possible overflow in Vitter's algorithm.
+ */
+ targrows = 100;
+ for (i = 0; i < attr_cnt; i++)
+ {
+ if (targrows < vacattrstats[i]->minrows)
+ targrows = vacattrstats[i]->minrows;
+ }
+
+ /*
+ * Acquire the sample rows
+ */
+ rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
+ numrows = file_fdw_acquire_sample_rows(onerel, elevel, rows, targrows, &totalpages, &totalrows);
+
+ /*
+ * Compute the statistics. Temporary results during the calculations for
+ * each column are stored in a child context. The calc routines are
+ * responsible to make sure that whatever they store into the VacAttrStats
+ * structure is allocated in anl_context.
+ */
+ if (numrows > 0)
+ {
+ MemoryContext col_context, old_context;
+
+ col_context = AllocSetContextCreate(anl_context,
+ "Analyze Column",
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+ old_context = MemoryContextSwitchTo(col_context);
+
+ for (i = 0; i < attr_cnt; i++)
+ {
+ VacAttrStats *stats = vacattrstats[i];
+ AttributeOpts *aopt = get_attribute_options(onerel->rd_id, stats->attr->attnum);
+
+ stats->rows = rows;
+ stats->tupDesc = onerel->rd_att;
+ (*stats->compute_stats) (stats,
+ std_fetch_func,
+ numrows,
+ totalrows);
+
+ /*
+ * If the appropriate flavor of the n_distinct option is
+ * specified, override with the corresponding value.
+ */
+ if (aopt != NULL)
+ {
+ float8 n_distinct = aopt->n_distinct;
+
+ if (n_distinct != 0.0)
+ stats->stadistinct = n_distinct;
+ }
+
+ MemoryContextResetAndDeleteChildren(col_context);
+ }
+
+ MemoryContextSwitchTo(old_context);
+ MemoryContextDelete(col_context);
+
+ /*
+ * Emit the completed stats rows into pg_statistic, replacing any
+ * previous statistics for the target columns. (If there are stats in
+ * pg_statistic for columns we didn't process, we leave them alone.)
+ */
+ update_attstats(onerel->rd_id, false, attr_cnt, vacattrstats);
+ }
+
+ /*
+ * Update pages/tuples stats in pg_class.
+ */
+ vac_update_relstats(onerel, totalpages, totalrows, 0, false, InvalidTransactionId);
+
+ /*
+ * Report ANALYZE to the stats collector, too.
+ */
+ pgstat_report_analyze(onerel, totalrows, 0);
+
+ /* Restore current context and release memory */
+ MemoryContextSwitchTo(caller_context);
+ MemoryContextDelete(anl_context);
+ anl_context = NULL;
+ }
+
+ /*
+ * file_fdw_acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which
+ * must have at least targrows entries.
+ * The actual number of rows selected is returned as the function result.
+ * We also count the number of rows in the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ file_fdw_acquire_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, BlockNumber *totalpages, double *totalrows)
+ {
+ int numrows = 0;
+ double samplerows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ BlockNumber blknum;
+ OffsetNumber offnum;
+ char *filename;
+ struct stat stat_buf;
+ List *options;
+ CopyState cstate;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+
+ /*
+ * Get size of the file.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ filename)));
+
+ /*
+ * Convert size to pages for use in I/O cost estimate.
+ */
+ *totalpages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (*totalpages < 1)
+ *totalpages = 1;
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(onerel, filename, NIL, options);
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ for (;;)
+ {
+ sample_it = true;
+
+ CHECK_FOR_INTERRUPTS();
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ !(cstate->force_notnull_flags[attrChk - 1]) &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ {
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[numrows++] = heap_copytuple(tuple);
+ }
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(samplerows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ samplerows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ /*
+ * If we didn't find as many tuples as we wanted then we're done. No sort
+ * is needed, since they're already in order.
+ *
+ * Otherwise we need to sort the collected tuples by position
+ * (itempointer). It's not worth worrying about corner cases where the
+ * tuples are already sorted.
+ */
+ if (numrows == targrows)
+ qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
+
+ *totalrows = samplerows;
+
+ EndCopyFrom(cstate);
+
+ pfree(values);
+ pfree(nulls);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "%d rows in sample, %d total rows",
+ RelationGetRelationName(onerel), numrows, (int) *totalrows)));
+
+ return numrows;
+ }
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***************
*** 111,116 **** EXECUTE st(100);
--- 111,121 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***************
*** 174,179 **** EXECUTE st(100);
--- 174,194 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 228,233 **** EndForeignScan (ForeignScanState *node);
--- 228,246 ----
</para>
<para>
+ <programlisting>
+ void
+ AnalyzeForeignTable (Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
+ </programlisting>
+
+ Collect statistics on a foreign table and store the results in the
+ pg_class and pg_statistics system catalogs.
+ This is called when <command>ANALYZE</> command is run.
+ </para>
+
+ <para>
The <structname>FdwRoutine</> and <structname>FdwPlan</> struct types
are declared in <filename>src/include/foreign/fdwapi.h</>, which see
for additional details.
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***************
*** 279,284 ****
--- 279,288 ----
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
+ Note that the autovacuum daemon does not issue <command>ANALYZE</>
+ commands on foreign tables. It is recommended to run manually-managed
+ <command>ANALYZE</> commands as needed, which typically are executed
+ according to a schedule by cron or Task Scheduler scripts.
</para>
<para>
*** a/doc/src/sgml/ref/alter_foreign_table.sgml
--- b/doc/src/sgml/ref/alter_foreign_table.sgml
***************
*** 36,41 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 36,44 ----
DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable>
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
***************
*** 94,99 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 97,146 ----
</varlistentry>
<varlistentry>
+ <term><literal>SET STATISTICS</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets the per-column statistics-gathering target for subsequent
+ <xref linkend="sql-analyze"> operations.
+ The target can be set in the range 0 to 10000; alternatively, set it
+ to -1 to revert to using the system default statistics
+ target (<xref linkend="guc-default-statistics-target">).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )</literal></term>
+ <term><literal>RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets or resets a per-attribute option. Currently, the only defined
+ per-attribute option is <literal>n_distinct</>, which overrides
+ the number-of-distinct-values estimates made by subsequent
+ <xref linkend="sql-analyze"> operations.
+ When set to a positive value, <command>ANALYZE</> will assume that
+ the column contains exactly the specified number of distinct nonnull
+ values.
+ When set to a negative value, which must be greater than or equal
+ to -1, <command>ANALYZE</> will assume that the number of distinct
+ nonnull values in the column is linear in the size of the foreign
+ table; the exact count is to be computed by multiplying the estimated
+ foreign table size by the absolute value of the given number.
+ For example,
+ a value of -1 implies that all values in the column are distinct,
+ while a value of -0.5 implies that each value appears twice on the
+ average.
+ This can be useful when the size of the foreign table changes over
+ time, since the multiplication by the number of rows in the foreign
+ table is not performed until query planning time. Specify a value
+ of 0 to revert to estimating the number of distinct values normally.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>OWNER</literal></term>
<listitem>
<para>
*** a/doc/src/sgml/ref/analyze.sgml
--- b/doc/src/sgml/ref/analyze.sgml
***************
*** 39,47 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database. With a parameter, <command>ANALYZE</command> examines
! only that table. It is further possible to give a list of column names,
! in which case only the statistics for those columns are collected.
</para>
</refsect1>
--- 39,48 ----
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database except for foreign tables. With a parameter, <command>
! ANALYZE</command> examines only that table. It is further possible to
! give a list of column names, in which case only the statistics for those
! columns are collected.
</para>
</refsect1>
***************
*** 63,69 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database.
</para>
</listitem>
</varlistentry>
--- 64,71 ----
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database except
! for foreign tables.
</para>
</listitem>
</varlistentry>
***************
*** 137,143 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below.
</para>
<para>
--- 139,147 ----
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below. Note that the time
! needed to analyze on foreign tables depends on the implementation of
! the foreign data wrapper via which such tables are attached.
</para>
<para>
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 23,28 ****
--- 23,29 ----
#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
***************
*** 30,35 ****
--- 31,38 ----
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
***************
*** 94,113 **** static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
MemoryContext col_context);
- static VacAttrStats *examine_attribute(Relation onerel, int attnum,
- Node *index_expr);
static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
int targrows, double *totalrows, double *totaldeadrows);
- static double random_fract(void);
- static double init_selection_state(int n);
- static double get_next_S(double t, int n, double *stateptr);
- static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
double *totalrows, double *totaldeadrows);
- static void update_attstats(Oid relid, bool inh,
- int natts, VacAttrStats **vacattrstats);
- static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static bool std_typanalyze(VacAttrStats *stats);
--- 97,107 ----
***************
*** 184,193 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
}
/*
! * Check that it's a plain table; we used to do this in get_rel_oids() but
! * seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 178,188 ----
}
/*
! * Check that it's a plain table or foreign table; we used to do this in
! * get_rel_oids() but seems safer to check after we've locked the relation.
*/
! if (!(onerel->rd_rel->relkind == RELKIND_RELATION ||
! onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE))
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
***************
*** 226,241 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
MyProc->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! /*
! * Do the normal non-recursive ANALYZE.
! */
! do_analyze_rel(onerel, vacstmt, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 221,251 ----
MyProc->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! FdwRoutine *fdwroutine;
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
!
! fdwroutine = GetFdwRoutineByRelId(RelationGetRelid(onerel));
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! /*
! * Do the normal non-recursive ANALYZE.
! */
! do_analyze_rel(onerel, vacstmt, false);
!
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
! }
/*
* Close source relation now, but keep lock so that no one deletes it
***************
*** 343,349 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" of relation \"%s\" does not exist",
col, RelationGetRelationName(onerel))));
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 353,359 ----
(errcode(ERRCODE_UNDEFINED_COLUMN),
errmsg("column \"%s\" of relation \"%s\" does not exist",
col, RelationGetRelationName(onerel))));
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 357,363 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
tcnt = 0;
for (i = 1; i <= attr_cnt; i++)
{
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 367,373 ----
tcnt = 0;
for (i = 1; i <= attr_cnt; i++)
{
! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL, anl_context);
if (vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 411,417 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
indexkey = (Node *) lfirst(indexpr_item);
indexpr_item = lnext(indexpr_item);
thisdata->vacattrstats[tcnt] =
! examine_attribute(Irel[ind], i + 1, indexkey);
if (thisdata->vacattrstats[tcnt] != NULL)
tcnt++;
}
--- 421,427 ----
indexkey = (Node *) lfirst(indexpr_item);
indexpr_item = lnext(indexpr_item);
thisdata->vacattrstats[tcnt] =
! examine_attribute(Irel[ind], i + 1, indexkey, anl_context);
if (thisdata->vacattrstats[tcnt] != NULL)
tcnt++;
}
***************
*** 807,814 **** compute_index_stats(Relation onerel, double totalrows,
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
! static VacAttrStats *
! examine_attribute(Relation onerel, int attnum, Node *index_expr)
{
Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
HeapTuple typtuple;
--- 817,824 ----
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
! VacAttrStats *
! examine_attribute(Relation onerel, int attnum, Node *index_expr, MemoryContext anl_context)
{
Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
HeapTuple typtuple;
***************
*** 1254,1260 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1264,1270 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
***************
*** 1274,1287 **** random_fract(void)
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1284,1297 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
***************
*** 1366,1372 **** get_next_S(double t, int n, double *stateptr)
/*
* qsort comparator for sorting rows[] array
*/
! static int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(const HeapTuple *) a;
--- 1376,1382 ----
/*
* qsort comparator for sorting rows[] array
*/
! int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(const HeapTuple *) a;
***************
*** 1561,1567 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
* ANALYZE the same table concurrently. Presently, we lock that out
* by taking a self-exclusive lock on the relation in analyze_rel().
*/
! static void
update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
{
Relation sd;
--- 1571,1577 ----
* ANALYZE the same table concurrently. Presently, we lock that out
* by taking a self-exclusive lock on the relation in analyze_rel().
*/
! void
update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
{
Relation sd;
***************
*** 1698,1704 **** update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
* This exists to provide some insulation between compute_stats routines
* and the actual storage of the sample data.
*/
! static Datum
std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int attnum = stats->tupattnum;
--- 1708,1714 ----
* This exists to provide some insulation between compute_stats routines
* and the actual storage of the sample data.
*/
! Datum
std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull)
{
int attnum = stats->tupattnum;
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***************
*** 42,192 ****
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
- #include "utils/rel.h"
#include "utils/snapmgr.h"
#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
#define OCTVALUE(c) ((c) - '0')
- /*
- * Represents the different source/dest cases we need to worry about at
- * the bottom level
- */
- typedef enum CopyDest
- {
- COPY_FILE, /* to/from file */
- COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
- COPY_NEW_FE /* to/from frontend (3.0 protocol) */
- } CopyDest;
-
- /*
- * Represents the end-of-line terminator type of the input
- */
- typedef enum EolType
- {
- EOL_UNKNOWN,
- EOL_NL,
- EOL_CR,
- EOL_CRNL
- } EolType;
-
- /*
- * This struct contains all the state variables used throughout a COPY
- * operation. For simplicity, we use the same struct for all variants of COPY,
- * even though some fields are used in only some cases.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
- * when we have to do it the hard way.
- */
- typedef struct CopyStateData
- {
- /* low-level state data */
- CopyDest copy_dest; /* type of copy source/destination */
- FILE *copy_file; /* used if copy_dest == COPY_FILE */
- StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
- * dest == COPY_NEW_FE in COPY FROM */
- bool fe_eof; /* true if detected end of copy data */
- EolType eol_type; /* EOL type of input */
- int file_encoding; /* file or remote side's character encoding */
- bool need_transcoding; /* file encoding diff from server? */
- bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
-
- /* parameters from the COPY command */
- Relation rel; /* relation to copy to or from */
- QueryDesc *queryDesc; /* executable query to copy from */
- List *attnumlist; /* integer list of attnums to copy */
- char *filename; /* filename, or NULL for STDIN/STDOUT */
- bool binary; /* binary format? */
- bool oids; /* include OIDs? */
- bool csv_mode; /* Comma Separated Value format? */
- bool header_line; /* CSV header line? */
- char *null_print; /* NULL marker string (server encoding!) */
- int null_print_len; /* length of same */
- char *null_print_client; /* same converted to file encoding */
- char *delim; /* column delimiter (must be 1 byte) */
- char *quote; /* CSV quote char (must be 1 byte) */
- char *escape; /* CSV escape char (must be 1 byte) */
- List *force_quote; /* list of column names */
- bool force_quote_all; /* FORCE QUOTE *? */
- bool *force_quote_flags; /* per-column CSV FQ flags */
- List *force_notnull; /* list of column names */
- bool *force_notnull_flags; /* per-column CSV FNN flags */
-
- /* these are just for error messages, see CopyFromErrorCallback */
- const char *cur_relname; /* table name for error messages */
- int cur_lineno; /* line number for error messages */
- const char *cur_attname; /* current att for error messages */
- const char *cur_attval; /* current att value for error messages */
-
- /*
- * Working state for COPY TO/FROM
- */
- MemoryContext copycontext; /* per-copy execution context */
-
- /*
- * Working state for COPY TO
- */
- FmgrInfo *out_functions; /* lookup info for output functions */
- MemoryContext rowcontext; /* per-row evaluation context */
-
- /*
- * Working state for COPY FROM
- */
- AttrNumber num_defaults;
- bool file_has_oids;
- FmgrInfo oid_in_function;
- Oid oid_typioparam;
- FmgrInfo *in_functions; /* array of input functions for each attrs */
- Oid *typioparams; /* array of element types for in_functions */
- int *defmap; /* array of default att numbers */
- ExprState **defexprs; /* array of default att expressions */
-
- /*
- * These variables are used to reduce overhead in textual COPY FROM.
- *
- * attribute_buf holds the separated, de-escaped text for each field of
- * the current line. The CopyReadAttributes functions return arrays of
- * pointers into this buffer. We avoid palloc/pfree overhead by re-using
- * the buffer on each cycle.
- */
- StringInfoData attribute_buf;
-
- /* field raw data pointers found by COPY FROM */
-
- int max_fields;
- char **raw_fields;
-
- /*
- * Similarly, line_buf holds the whole input line being processed. The
- * input cycle is first to read the whole line into line_buf, convert it
- * to server encoding there, and then extract the individual attribute
- * fields into attribute_buf. line_buf is preserved unmodified so that we
- * can display it in error messages if appropriate.
- */
- StringInfoData line_buf;
- bool line_buf_converted; /* converted to server encoding? */
-
- /*
- * Finally, raw_buf holds raw data read from the data source (file or
- * client connection). CopyReadLine parses this data sufficiently to
- * locate line boundaries, then transfers the data to line_buf and
- * converts it. Note: we guarantee that there is a \0 at
- * raw_buf[raw_buf_len].
- */
- #define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
- char *raw_buf;
- int raw_buf_index; /* next byte to process */
- int raw_buf_len; /* total # of bytes stored */
- } CopyStateData;
/* DestReceiver for COPY (SELECT) TO */
typedef struct
--- 42,53 ----
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 311,316 **** static void ATPrepSetStatistics(Relation rel, const char *colName,
--- 311,318 ----
Node *newValue, LOCKMODE lockmode);
static void ATExecSetStatistics(Relation rel, const char *colName,
Node *newValue, LOCKMODE lockmode);
+ static void ATPrepSetOptions(Relation rel, const char *colName,
+ Node *options, LOCKMODE lockmode);
static void ATExecSetOptions(Relation rel, const char *colName,
Node *options, bool isReset, LOCKMODE lockmode);
static void ATExecSetStorage(Relation rel, const char *colName,
***************
*** 2886,2892 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
--- 2888,2895 ----
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX | ATT_FOREIGN_TABLE);
! ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
***************
*** 4822,4831 **** ATPrepSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table or index",
RelationGetRelationName(rel))));
/* Permissions checks */
--- 4825,4835 ----
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX &&
! rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table, index, or foreign table",
RelationGetRelationName(rel))));
/* Permissions checks */
***************
*** 4893,4898 **** ATExecSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
--- 4897,4923 ----
heap_close(attrelation, RowExclusiveLock);
}
+
+ static void
+ ATPrepSetOptions(Relation rel, const char *colName, Node *options,
+ LOCKMODE lockmode)
+ {
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ ListCell *cell;
+
+ foreach(cell, (List *) options)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (pg_strncasecmp(def->defname, "n_distinct_inherited", strlen("n_distinct_inherited")) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot support option \"n_distinct_inherited\" for foreign tables")));
+ }
+ }
+ }
+
static void
ATExecSetOptions(Relation rel, const char *colName, Node *options,
bool isReset, LOCKMODE lockmode)
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***************
*** 399,404 **** static const SchemaQuery Query_for_list_of_tsvf = {
--- 399,419 ----
NULL
};
+ static const SchemaQuery Query_for_list_of_tf = {
+ /* catname */
+ "pg_catalog.pg_class c",
+ /* selcondition */
+ "c.relkind IN ('r', 'f')",
+ /* viscondition */
+ "pg_catalog.pg_table_is_visible(c.oid)",
+ /* namespace */
+ "c.relnamespace",
+ /* result */
+ "pg_catalog.quote_ident(c.relname)",
+ /* qualresult */
+ NULL
+ };
+
static const SchemaQuery Query_for_list_of_views = {
/* catname */
"pg_catalog.pg_class c",
***************
*** 2755,2761 **** psql_completion(char *text, int start, int end)
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
--- 2770,2776 ----
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tf, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
*** a/src/include/commands/copy.h
--- b/src/include/commands/copy.h
***************
*** 14,22 ****
--- 14,168 ----
#ifndef COPY_H
#define COPY_H
+ #include "access/attnum.h"
+ #include "executor/execdesc.h"
+ #include "fmgr.h"
+ #include "lib/stringinfo.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
+ #include "nodes/pg_list.h"
#include "tcop/dest.h"
+ #include "utils/palloc.h"
+ #include "utils/rel.h"
+
+ /*
+ * Represents the different source/dest cases we need to worry about at
+ * the bottom level
+ */
+ typedef enum CopyDest
+ {
+ COPY_FILE, /* to/from file */
+ COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
+ COPY_NEW_FE /* to/from frontend (3.0 protocol) */
+ } CopyDest;
+
+ /*
+ * Represents the end-of-line terminator type of the input
+ */
+ typedef enum EolType
+ {
+ EOL_UNKNOWN,
+ EOL_NL,
+ EOL_CR,
+ EOL_CRNL
+ } EolType;
+
+ /*
+ * This struct contains all the state variables used throughout a COPY
+ * operation. For simplicity, we use the same struct for all variants of COPY,
+ * even though some fields are used in only some cases.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
+ * when we have to do it the hard way.
+ */
+ typedef struct CopyStateData
+ {
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ FILE *copy_file; /* used if copy_dest == COPY_FILE */
+ StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
+ * dest == COPY_NEW_FE in COPY FROM */
+ bool fe_eof; /* true if detected end of copy data */
+ EolType eol_type; /* EOL type of input */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ Relation rel; /* relation to copy to or from */
+ QueryDesc *queryDesc; /* executable query to copy from */
+ List *attnumlist; /* integer list of attnums to copy */
+ char *filename; /* filename, or NULL for STDIN/STDOUT */
+ bool binary; /* binary format? */
+ bool oids; /* include OIDs? */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ char *null_print; /* NULL marker string (server encoding!) */
+ int null_print_len; /* length of same */
+ char *null_print_client; /* same converted to file encoding */
+ char *delim; /* column delimiter (must be 1 byte) */
+ char *quote; /* CSV quote char (must be 1 byte) */
+ char *escape; /* CSV escape char (must be 1 byte) */
+ List *force_quote; /* list of column names */
+ bool force_quote_all; /* FORCE QUOTE *? */
+ bool *force_quote_flags; /* per-column CSV FQ flags */
+ List *force_notnull; /* list of column names */
+ bool *force_notnull_flags; /* per-column CSV FNN flags */
+
+ /* these are just for error messages, see CopyFromErrorCallback */
+ const char *cur_relname; /* table name for error messages */
+ int cur_lineno; /* line number for error messages */
+ const char *cur_attname; /* current att for error messages */
+ const char *cur_attval; /* current att value for error messages */
+
+ /*
+ * Working state for COPY TO/FROM
+ */
+ MemoryContext copycontext; /* per-copy execution context */
+
+ /*
+ * Working state for COPY TO
+ */
+ FmgrInfo *out_functions; /* lookup info for output functions */
+ MemoryContext rowcontext; /* per-row evaluation context */
+
+ /*
+ * Working state for COPY FROM
+ */
+ AttrNumber num_defaults;
+ bool file_has_oids;
+ FmgrInfo oid_in_function;
+ Oid oid_typioparam;
+ FmgrInfo *in_functions; /* array of input functions for each attrs */
+ Oid *typioparams; /* array of element types for in_functions */
+ int *defmap; /* array of default att numbers */
+ ExprState **defexprs; /* array of default att expressions */
+
+ /*
+ * These variables are used to reduce overhead in textual COPY FROM.
+ *
+ * attribute_buf holds the separated, de-escaped text for each field of
+ * the current line. The CopyReadAttributes functions return arrays of
+ * pointers into this buffer. We avoid palloc/pfree overhead by re-using
+ * the buffer on each cycle.
+ */
+ StringInfoData attribute_buf;
+
+ /* field raw data pointers found by COPY FROM */
+
+ int max_fields;
+ char **raw_fields;
+
+ /*
+ * Similarly, line_buf holds the whole input line being processed. The
+ * input cycle is first to read the whole line into line_buf, convert it
+ * to server encoding there, and then extract the individual attribute
+ * fields into attribute_buf. line_buf is preserved unmodified so that we
+ * can display it in error messages if appropriate.
+ */
+ StringInfoData line_buf;
+ bool line_buf_converted; /* converted to server encoding? */
+
+ /*
+ * Finally, raw_buf holds raw data read from the data source (file or
+ * client connection). CopyReadLine parses this data sufficiently to
+ * locate line boundaries, then transfers the data to line_buf and
+ * converts it. Note: we guarantee that there is a \0 at
+ * raw_buf[raw_buf_len].
+ */
+ #define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+ char *raw_buf;
+ int raw_buf_index; /* next byte to process */
+ int raw_buf_len; /* total # of bytes stored */
+ } CopyStateData;
/* CopyStateData is private in commands/copy.c */
typedef struct CopyStateData *CopyState;
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
***************
*** 167,171 **** extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
--- 167,179 ----
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
BufferAccessStrategy bstrategy);
+ extern VacAttrStats * examine_attribute(Relation onerel, int attnum, Node *index_expr,
+ MemoryContext anl_context);
+ extern double random_fract(void);
+ extern double init_selection_state(int n);
+ extern double get_next_S(double t, int n, double *stateptr);
+ extern int compare_rows(const void *a, const void *b);
+ extern void update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats);
+ extern Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
#endif /* VACUUM_H */
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
***************
*** 68,73 **** typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
--- 70,78 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
***************
*** 88,93 **** typedef struct FdwRoutine
--- 93,99 ----
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
(2011/10/20 18:56), Etsuro Fujita wrote:
I revised the patch according to Hanada-san's comments. Attached is the
updated version of the patch.Changes:
* pull up of logging "analyzing foo.bar"
* new vac_update_relstats always called
* tab-completion in psql
* add "foreign tables are not analyzed automatically..." to 23.1.3
Updating Planner Statistics
* some other modifications
Submission review
=================
- Patch can be applied, and all regression tests passed. :)
Random comments
===============
- Some headers are not necessary for file_fdw.c
#include "access/htup.h"
#include "commands/dbcommands.h"
#include "optimizer/plancat.h"
#include "utils/elog.h"
#include "utils/guc.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
- It might be better to mention in document that users need to
explicitly specify foreign table name to ANALYZE command.
- I think backend should be aware the case which a handler is NULL. For
the case of ANALYZE of foreign table, it would be worth telling user
that request was not accomplished.
- file_fdw_do_analyze_rel is almost copy of do_analyze_rel. IIUC,
difference against do_analyze_rel are:
* don't logging analyze target
* don't switch userid to the owner of target table
* don't measure elapsed time for autoanalyze deamon
* don't handle index
* some comments are removed.
* sample rows are acquired by file_fdw's routine
I don't see any problem here, but would you confirm that all of them are
intentional?
Besides, keeping format (mainly indent and folding) of this function
similar to do_analyze_rel would help to follow changes in do_analyze_rel.
- IMHO exporting CopyState should be avoided. One possible idea is
adding new COPY API which allows to extract records from the file with
skipping specified number or rate.
- In your design, each FDW have to copy most of do_analyze_rel to their
own source. It means that FDW authors must know much details of ANALYZE
to implement ANALYZE handler. Actually, your patch exports some static
functions from analyze.c. Have you considered hooking
acquire_sample_rows()? Such handler should be more simple, and
FDW-specific. As you say, such design requires FDWs to skip some
records, but it would be difficult for some FDW (e.g. twitter_fdw) which
can't pick sample data up easily. IMHO such problem *must* be solved by
FDW itself.
--
Shigeru Hanada
(2011/11/07 20:26), Shigeru Hanada wrote:
(2011/10/20 18:56), Etsuro Fujita wrote:
I revised the patch according to Hanada-san's comments. Attached is the
updated version of the patch.Changes:
* pull up of logging "analyzing foo.bar"
* new vac_update_relstats always called
* tab-completion in psql
* add "foreign tables are not analyzed automatically..." to 23.1.3
Updating Planner Statistics
* some other modificationsSubmission review
=================- Patch can be applied, and all regression tests passed. :)
Thank you for your testing. I updated the patch according to your
comments. Attached is the updated version of the patch.
- file_fdw_do_analyze_rel is almost copy of do_analyze_rel. IIUC,
difference against do_analyze_rel are:
* don't logging analyze target
* don't switch userid to the owner of target table
* don't measure elapsed time for autoanalyze deamon
* don't handle index
* some comments are removed.
* sample rows are acquired by file_fdw's routineI don't see any problem here, but would you confirm that all of them are
intentional?
Yes. But in the updated version, I've refactored analyze.c a little bit
to allow FDW authors to simply call do_analyze_rel().
- In your design, each FDW have to copy most of do_analyze_rel to their
own source. It means that FDW authors must know much details of ANALYZE
to implement ANALYZE handler. Actually, your patch exports some static
functions from analyze.c. Have you considered hooking
acquire_sample_rows()? Such handler should be more simple, and
FDW-specific. As you say, such design requires FDWs to skip some
records, but it would be difficult for some FDW (e.g. twitter_fdw) which
can't pick sample data up easily. IMHO such problem *must* be solved by
FDW itself.
The updated version enables FDW authors to just write their own
acquire_sample_rows(). On the other hand, by retaining to hook
AnalyzeForeignTable routine at analyze_rel(), higher level than
acquire_sample_rows() as before, it allows FDW authors to write
AnalyzeForeignTable routine for foreign tables on a remote server to ask
the server for its current stats instead, as pointed out earlier by Tom
Lane.
Best regards,
Etsuro Fujita
Attachments:
postgresql-analyze-v4.patchtext/plain; name=postgresql-analyze-v4.patchDownload
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 16,31 ****
#include <unistd.h>
#include "access/reloptions.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
#include "commands/defrem.h"
#include "commands/explain.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/cost.h"
! #include "utils/rel.h"
#include "utils/syscache.h"
PG_MODULE_MAGIC;
--- 16,36 ----
#include <unistd.h>
#include "access/reloptions.h"
+ #include "access/transam.h"
#include "catalog/pg_foreign_table.h"
#include "commands/copy.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/cost.h"
! #include "parser/parse_relation.h"
! #include "pgstat.h"
! #include "utils/attoptcache.h"
! #include "utils/memutils.h"
#include "utils/syscache.h"
PG_MODULE_MAGIC;
***************
*** 101,106 **** static void fileBeginForeignScan(ForeignScanState *node, int eflags);
--- 106,112 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel);
/*
* Helper functions
***************
*** 112,118 **** static List *get_file_fdw_attribute_options(Oid relid);
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 118,127 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
/*
* Foreign-data wrapper handler function: return a struct with pointers
***************
*** 129,134 **** file_fdw_handler(PG_FUNCTION_ARGS)
--- 138,144 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
***************
*** 575,580 **** fileReScanForeignScan(ForeignScanState *node)
--- 585,602 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ do_analyze_rel(onerel, vacstmt,
+ elevel, false,
+ acquire_sample_rows);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*/
static void
***************
*** 584,590 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
{
struct stat stat_buf;
BlockNumber pages;
! int tuple_width;
double ntuples;
double nrows;
Cost run_cost = 0;
--- 606,613 ----
{
struct stat stat_buf;
BlockNumber pages;
! BlockNumber relpages;
! double reltuples;
double ntuples;
double nrows;
Cost run_cost = 0;
***************
*** 604,619 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
if (pages < 1)
pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
/*
* Now estimate the number of rows returned by the scan after applying the
--- 627,658 ----
if (pages < 1)
pages = 1;
! relpages = baserel->pages;
! reltuples = baserel->tuples;
!
! if (relpages > 0)
! {
! double density;
! density = reltuples / (double) relpages;
!
! ntuples = clamp_row_est(density * (double) pages);
! }
! else
! {
! int tuple_width;
!
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
!
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
! }
/*
* Now estimate the number of rows returned by the scan after applying the
***************
*** 645,647 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
--- 684,888 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which
+ * must have at least targrows entries.
+ * The actual number of rows selected is returned as the function result.
+ * We also count the number of rows in the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel)
+ {
+ int numrows = 0;
+ double samplerows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ BlockNumber blknum;
+ OffsetNumber offnum;
+ char *filename;
+ struct stat stat_buf;
+ List *options;
+ CopyState cstate;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+
+ /*
+ * Get size of the file.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ filename)));
+
+ /*
+ * Convert size to pages for use in I/O cost estimate.
+ */
+ *totalpages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (*totalpages < 1)
+ *totalpages = 1;
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(onerel, filename, NIL, options);
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ for (;;)
+ {
+ sample_it = true;
+
+ /*
+ * Check for user-requested abort.
+ */
+ CHECK_FOR_INTERRUPTS();
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ {
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[numrows++] = heap_copytuple(tuple);
+ }
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(samplerows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+
+ blknum = (BlockNumber) samplerows / MaxOffsetNumber;
+ offnum = (OffsetNumber) samplerows % MaxOffsetNumber + 1;
+ ItemPointerSet(&tuple->t_self, blknum, offnum);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ samplerows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ /*
+ * If we didn't find as many tuples as we wanted then we're done. No sort
+ * is needed, since they're already in order.
+ *
+ * Otherwise we need to sort the collected tuples by position
+ * (itempointer). It's not worth worrying about corner cases where the
+ * tuples are already sorted.
+ */
+ if (numrows == targrows)
+ qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
+
+ *totalrows = samplerows;
+ *totaldeadrows = 0;
+
+ EndCopyFrom(cstate);
+
+ pfree(values);
+ pfree(nulls);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "%d rows in sample, %d total rows",
+ RelationGetRelationName(onerel), numrows, (int) *totalrows)));
+
+ return numrows;
+ }
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***************
*** 111,116 **** EXECUTE st(100);
--- 111,121 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***************
*** 174,179 **** EXECUTE st(100);
--- 174,194 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 228,233 **** EndForeignScan (ForeignScanState *node);
--- 228,246 ----
</para>
<para>
+ <programlisting>
+ void
+ AnalyzeForeignTable (Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
+ </programlisting>
+
+ Collect statistics on a foreign table and store the results in the
+ pg_class and pg_statistics system catalogs.
+ This is called when <command>ANALYZE</> command is run.
+ </para>
+
+ <para>
The <structname>FdwRoutine</> and <structname>FdwPlan</> struct types
are declared in <filename>src/include/foreign/fdwapi.h</>, which see
for additional details.
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***************
*** 279,284 ****
--- 279,288 ----
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
+ Note that the autovacuum daemon does not issue <command>ANALYZE</>
+ commands on foreign tables. It is recommended to run manually-managed
+ <command>ANALYZE</> commands as needed, which typically are executed
+ according to a schedule by cron or Task Scheduler scripts.
</para>
<para>
*** a/doc/src/sgml/ref/alter_foreign_table.sgml
--- b/doc/src/sgml/ref/alter_foreign_table.sgml
***************
*** 36,41 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 36,44 ----
DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable>
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
***************
*** 94,99 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 97,146 ----
</varlistentry>
<varlistentry>
+ <term><literal>SET STATISTICS</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets the per-column statistics-gathering target for subsequent
+ <xref linkend="sql-analyze"> operations.
+ The target can be set in the range 0 to 10000; alternatively, set it
+ to -1 to revert to using the system default statistics
+ target (<xref linkend="guc-default-statistics-target">).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )</literal></term>
+ <term><literal>RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets or resets a per-attribute option. Currently, the only defined
+ per-attribute option is <literal>n_distinct</>, which overrides
+ the number-of-distinct-values estimates made by subsequent
+ <xref linkend="sql-analyze"> operations.
+ When set to a positive value, <command>ANALYZE</> will assume that
+ the column contains exactly the specified number of distinct nonnull
+ values.
+ When set to a negative value, which must be greater than or equal
+ to -1, <command>ANALYZE</> will assume that the number of distinct
+ nonnull values in the column is linear in the size of the foreign
+ table; the exact count is to be computed by multiplying the estimated
+ foreign table size by the absolute value of the given number.
+ For example,
+ a value of -1 implies that all values in the column are distinct,
+ while a value of -0.5 implies that each value appears twice on the
+ average.
+ This can be useful when the size of the foreign table changes over
+ time, since the multiplication by the number of rows in the foreign
+ table is not performed until query planning time. Specify a value
+ of 0 to revert to estimating the number of distinct values normally.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>OWNER</literal></term>
<listitem>
<para>
*** a/doc/src/sgml/ref/analyze.sgml
--- b/doc/src/sgml/ref/analyze.sgml
***************
*** 39,47 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database. With a parameter, <command>ANALYZE</command> examines
! only that table. It is further possible to give a list of column names,
! in which case only the statistics for those columns are collected.
</para>
</refsect1>
--- 39,49 ----
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database except for foreign tables. With a parameter, <command>
! ANALYZE</command> examines only that table. For a foreign table, it is
! necessary to spcify the name of that table. It is further possible to
! give a list of column names, in which case only the statistics for those
! columns are collected.
</para>
</refsect1>
***************
*** 63,69 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database.
</para>
</listitem>
</varlistentry>
--- 65,72 ----
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database except
! for foreign tables.
</para>
</listitem>
</varlistentry>
***************
*** 137,143 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below.
</para>
<para>
--- 140,148 ----
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below. Note that the time
! needed to analyze on foreign tables depends on the implementation of
! the foreign data wrapper via which such tables are attached.
</para>
<para>
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 23,28 ****
--- 23,29 ----
#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
***************
*** 30,35 ****
--- 31,38 ----
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
***************
*** 78,91 **** typedef struct AnlIndexData
int default_statistics_target = 100;
/* A few variables that don't seem worth passing around as parameters */
- static int elevel = -1;
-
static MemoryContext anl_context = NULL;
static BufferAccessStrategy vac_strategy;
- static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh);
static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
int samplesize);
static bool BlockSampler_HasMore(BlockSampler bs);
--- 81,91 ----
***************
*** 96,110 **** static void compute_index_stats(Relation onerel, double totalrows,
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
! int targrows, double *totalrows, double *totaldeadrows);
! static double random_fract(void);
! static double init_selection_state(int n);
! static double get_next_S(double t, int n, double *stateptr);
! static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
--- 96,109 ----
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
***************
*** 119,125 **** static bool std_typanalyze(VacAttrStats *stats);
--- 118,126 ----
void
analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
{
+ int elevel;
Relation onerel;
+ FdwRoutine *fdwroutine;
/* Set up static variables */
if (vacstmt->options & VACOPT_VERBOSE)
***************
*** 184,193 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
}
/*
! * Check that it's a plain table; we used to do this in get_rel_oids() but
* seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 185,195 ----
}
/*
! * Check that it's a plain table or foreign table; we used to do this in get_rel_oids() but
* seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION &&
! onerel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
***************
*** 212,217 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
--- 214,221 ----
/*
* We can ANALYZE any table except pg_statistic. See update_attstats
+ * We can ANALYZE foreign tables if the underlying foreign-data wrappers
+ * have their AnalyzeForeignTable callback routines.
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
***************
*** 219,224 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
--- 223,242 ----
return;
}
+ if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ fdwroutine = GetFdwRoutineByRelId(RelationGetRelid(onerel));
+
+ if (fdwroutine->AnalyzeForeignTable == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("skipping \"%s\" --- underlying foreign-data wrapper cannot analyze it",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, ShareUpdateExclusiveLock);
+ return;
+ }
+ }
+
/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
***************
*** 226,241 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
MyProc->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! /*
! * Do the normal non-recursive ANALYZE.
! */
! do_analyze_rel(onerel, vacstmt, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 244,280 ----
MyProc->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! /*
! * Do the normal non-recursive ANALYZE.
! */
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, false, acquire_sample_rows);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\" inheritance tree",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, true, acquire_inherited_sample_rows);
! }
! }
/*
* Close source relation now, but keep lock so that no one deletes it
***************
*** 257,264 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! static void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
{
int attr_cnt,
tcnt,
--- 296,304 ----
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
! bool inh, int (*sample_row_acquirer) ())
{
int attr_cnt,
tcnt,
***************
*** 273,278 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
--- 313,319 ----
numrows;
double totalrows,
totaldeadrows;
+ BlockNumber totalpages;
HeapTuple *rows;
PGRUsage ru0;
TimestampTz starttime = 0;
***************
*** 281,297 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
int save_sec_context;
int save_nestlevel;
- if (inh)
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\" inheritance tree",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
- else
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
-
/*
* Set up a working context so that we can easily free whatever junk gets
* created.
--- 322,327 ----
***************
*** 449,459 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
else
! numrows = acquire_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
/*
* Compute the statistics. Temporary results during the calculations for
--- 479,491 ----
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = sample_row_acquirer(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! NULL, elevel);
else
! numrows = sample_row_acquirer(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! &totalpages, elevel);
/*
* Compute the statistics. Temporary results during the calculations for
***************
*** 534,540 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
*/
if (!inh)
vac_update_relstats(onerel,
! RelationGetNumberOfBlocks(onerel),
totalrows,
visibilitymap_count(onerel),
hasindex,
--- 566,572 ----
*/
if (!inh)
vac_update_relstats(onerel,
! totalpages,
totalrows,
visibilitymap_count(onerel),
hasindex,
***************
*** 1017,1023 **** BlockSampler_Next(BlockSampler bs)
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
--- 1049,1056 ----
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
***************
*** 1032,1037 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
--- 1065,1072 ----
Assert(targrows > 0);
totalblocks = RelationGetNumberOfBlocks(onerel);
+ if (totalpages)
+ *totalpages = totalblocks;
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
***************
*** 1254,1260 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1289,1295 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
***************
*** 1274,1287 **** random_fract(void)
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1309,1322 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
***************
*** 1366,1372 **** get_next_S(double t, int n, double *stateptr)
/*
* qsort comparator for sorting rows[] array
*/
! static int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(const HeapTuple *) a;
--- 1401,1407 ----
/*
* qsort comparator for sorting rows[] array
*/
! int
compare_rows(const void *a, const void *b)
{
HeapTuple ha = *(const HeapTuple *) a;
***************
*** 1397,1403 **** compare_rows(const void *a, const void *b)
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
List *tableOIDs;
Relation *rels;
--- 1432,1439 ----
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
List *tableOIDs;
Relation *rels;
***************
*** 1460,1465 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
--- 1496,1503 ----
totalblocks += relblocks[nrels];
nrels++;
}
+ if (totalpages)
+ *totalpages = totalblocks;
/*
* Now sample rows from each relation, proportionally to its fraction of
***************
*** 1493,1499 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
rows + numrows,
childtargrows,
&trows,
! &tdrows);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
--- 1531,1539 ----
rows + numrows,
childtargrows,
&trows,
! &tdrows,
! NULL,
! elevel);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 311,316 **** static void ATPrepSetStatistics(Relation rel, const char *colName,
--- 311,318 ----
Node *newValue, LOCKMODE lockmode);
static void ATExecSetStatistics(Relation rel, const char *colName,
Node *newValue, LOCKMODE lockmode);
+ static void ATPrepSetOptions(Relation rel, const char *colName,
+ Node *options, LOCKMODE lockmode);
static void ATExecSetOptions(Relation rel, const char *colName,
Node *options, bool isReset, LOCKMODE lockmode);
static void ATExecSetStorage(Relation rel, const char *colName,
***************
*** 2887,2892 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
--- 2889,2895 ----
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
+ ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
***************
*** 4822,4831 **** ATPrepSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table or index",
RelationGetRelationName(rel))));
/* Permissions checks */
--- 4825,4835 ----
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX &&
! rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table, index or foreign table",
RelationGetRelationName(rel))));
/* Permissions checks */
***************
*** 4894,4899 **** ATExecSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
--- 4898,4923 ----
}
static void
+ ATPrepSetOptions(Relation rel, const char *colName, Node *options,
+ LOCKMODE lockmode)
+ {
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ ListCell *cell;
+
+ foreach(cell, (List *) options)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (pg_strncasecmp(def->defname, "n_distinct_inherited", strlen("n_distinct_inherited")) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot support option \"n_distinct_inherited\" for foreign tables")));
+ }
+ }
+ }
+
+ static void
ATExecSetOptions(Relation rel, const char *colName, Node *options,
bool isReset, LOCKMODE lockmode)
{
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***************
*** 399,404 **** static const SchemaQuery Query_for_list_of_tsvf = {
--- 399,419 ----
NULL
};
+ static const SchemaQuery Query_for_list_of_tf = {
+ /* catname */
+ "pg_catalog.pg_class c",
+ /* selcondition */
+ "c.relkind IN ('r', 'f')",
+ /* viscondition */
+ "pg_catalog.pg_table_is_visible(c.oid)",
+ /* namespace */
+ "c.relnamespace",
+ /* result */
+ "pg_catalog.quote_ident(c.relname)",
+ /* qualresult */
+ NULL
+ };
+
static const SchemaQuery Query_for_list_of_views = {
/* catname */
"pg_catalog.pg_class c",
***************
*** 2769,2775 **** psql_completion(char *text, int start, int end)
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
--- 2784,2790 ----
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tf, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
***************
*** 167,171 **** extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
--- 167,178 ----
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
BufferAccessStrategy bstrategy);
+ extern void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
+ bool inh, int (*sample_row_acquirer) ());
+ extern double random_fract(void);
+ extern double init_selection_state(int n);
+ extern double get_next_S(double t, int n, double *stateptr);
+ extern int compare_rows(const void *a, const void *b);
+
#endif /* VACUUM_H */
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
***************
*** 68,73 **** typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
--- 70,78 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
***************
*** 88,93 **** typedef struct FdwRoutine
--- 93,99 ----
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
(2011/11/18 16:25), Etsuro Fujita wrote:
Thank you for your testing. I updated the patch according to your
comments. Attached is the updated version of the patch.
I'd like to share result of my review even though it's not fully
finished. So far I looked from viewpoint of API design, code
formatting, and documentation. I'll examine effectiveness of the patch
and details of implementation next week, and hopefully try writing
ANALYZE handler for pgsql_fdw :)
New patch has correct format, and it could be applied to HEAD of master
branch cleanly. All regression tests including those of contrib modules
have passed. It contains changes of codes and regression tests related
to the issue, and they have enough comments. IMO the document in this
patch is not enough to show how to write analyze handler to FDW authors,
but it can be enhanced afterward. With this patch, FDW author can
provide optional ANALYZE handler which updates statistics of foreign
tables. Planner would be able to generate better plan by using statistics.
Yes. But in the updated version, I've refactored analyze.c a little bit
to allow FDW authors to simply call do_analyze_rel().
<snip>
The updated version enables FDW authors to just write their own
acquire_sample_rows(). On the other hand, by retaining to hook
AnalyzeForeignTable routine at analyze_rel(), higher level than
acquire_sample_rows() as before, it allows FDW authors to write
AnalyzeForeignTable routine for foreign tables on a remote server to ask
the server for its current stats instead, as pointed out earlier by Tom
Lane.
IIUC, this patch offers three options to FDWs: a) set
AnalyzeForeignTable to NULL to indicate lack of capability, b) provide
AnalyzeForeignTable which calls do_analyze_rel with custom
sample_row_acquirer, and c) create statistics data from scratch by FDW
itself by doing similar things to do_analyze_rel with given argument or
copying statistics from remote PostgreSQL server.
ISTM that this design is well-balanced between simplicity and
flexibility. Maybe these three options would suit web-based wrappers,
file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that
adding more details of FdwRoutine, such as purpose of new callback
function and difference from required ones, would help FDW authors,
including me :)
I have some random comments:
- I think separated typedef of sample_acquire_rows would make codes more
readable. In addition, parameters of the function should be described
explicitly.
- I couldn't see the reason why file_fdw sets ctid of sample tuples,
though I guess it's for Vitter's random sampling algorithm. If every
FDW must set valid ctid to sample tuples, it should be mentioned in
document of AnalyzeForeignTable. Exporting some functions from
analyze.c relates this issue?
- Why file_fdw skips sample tuples which have NULL value? AFAIS
original acquire_sample_rows doesn't do so.
- Some comment lines go past 80 columns.
- Some headers included in file_fdw.c seems unnecessary.
Regards,
--
Shigeru Hanada
2011/11/18 Shigeru Hanada <shigeru.hanada@gmail.com>:
- I couldn't see the reason why file_fdw sets ctid of sample tuples,
though I guess it's for Vitter's random sampling algorithm. If every
FDW must set valid ctid to sample tuples, it should be mentioned in
document of AnalyzeForeignTable. Exporting some functions from
analyze.c relates this issue?
If every FDW must set valid ctid to sample tuples, it should be fixed
so that they don't have to, I would think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Hanada-san,
Thank you for your valuable comments. I will improve the items pointed
out by you at the next version of the patch, including documentation on
the purpose of AnalyzeForeignTable, how to write it, and so on. Here I
comment only one point:
- Why file_fdw skips sample tuples which have NULL value? AFAIS
original acquire_sample_rows doesn't do so.
To be precise, I've implemented to skip tuples that have null values in
certain column(s) but that are not allowed to contain null values in
that(those) column(s) by NOT NULL constrain. file_fdw's
sample_row_acquirer considers those tuples as "dead" tuples. This is
for the consistency with NOT NULL constrain. (But I don't know why
fileIterateForeignScan routine allows such dead tuples. I may have
missed something.)
Best regards,
Etsuro Fujita
(2011/11/18 21:00), Shigeru Hanada wrote:
Show quoted text
(2011/11/18 16:25), Etsuro Fujita wrote:
Thank you for your testing. I updated the patch according to your
comments. Attached is the updated version of the patch.I'd like to share result of my review even though it's not fully
finished. So far I looked from viewpoint of API design, code
formatting, and documentation. I'll examine effectiveness of the patch
and details of implementation next week, and hopefully try writing
ANALYZE handler for pgsql_fdw :)New patch has correct format, and it could be applied to HEAD of master
branch cleanly. All regression tests including those of contrib modules
have passed. It contains changes of codes and regression tests related
to the issue, and they have enough comments. IMO the document in this
patch is not enough to show how to write analyze handler to FDW authors,
but it can be enhanced afterward. With this patch, FDW author can
provide optional ANALYZE handler which updates statistics of foreign
tables. Planner would be able to generate better plan by using statistics.Yes. But in the updated version, I've refactored analyze.c a little bit
to allow FDW authors to simply call do_analyze_rel().<snip>
The updated version enables FDW authors to just write their own
acquire_sample_rows(). On the other hand, by retaining to hook
AnalyzeForeignTable routine at analyze_rel(), higher level than
acquire_sample_rows() as before, it allows FDW authors to write
AnalyzeForeignTable routine for foreign tables on a remote server to ask
the server for its current stats instead, as pointed out earlier by Tom
Lane.IIUC, this patch offers three options to FDWs: a) set
AnalyzeForeignTable to NULL to indicate lack of capability, b) provide
AnalyzeForeignTable which calls do_analyze_rel with custom
sample_row_acquirer, and c) create statistics data from scratch by FDW
itself by doing similar things to do_analyze_rel with given argument or
copying statistics from remote PostgreSQL server.ISTM that this design is well-balanced between simplicity and
flexibility. Maybe these three options would suit web-based wrappers,
file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that
adding more details of FdwRoutine, such as purpose of new callback
function and difference from required ones, would help FDW authors,
including me :)I have some random comments:
- I think separated typedef of sample_acquire_rows would make codes more
readable. In addition, parameters of the function should be described
explicitly.
- I couldn't see the reason why file_fdw sets ctid of sample tuples,
though I guess it's for Vitter's random sampling algorithm. If every
FDW must set valid ctid to sample tuples, it should be mentioned in
document of AnalyzeForeignTable. Exporting some functions from
analyze.c relates this issue?
- Why file_fdw skips sample tuples which have NULL value? AFAIS
original acquire_sample_rows doesn't do so.
- Some comment lines go past 80 columns.
- Some headers included in file_fdw.c seems unnecessary.Regards,
(2011/11/19 0:54), Robert Haas wrote:
2011/11/18 Shigeru Hanada<shigeru.hanada@gmail.com>:
- I couldn't see the reason why file_fdw sets ctid of sample tuples,
though I guess it's for Vitter's random sampling algorithm. If every
FDW must set valid ctid to sample tuples, it should be mentioned in
document of AnalyzeForeignTable. Exporting some functions from
analyze.c relates this issue?If every FDW must set valid ctid to sample tuples, it should be fixed
so that they don't have to, I would think.
It's for neither Vitter's algorithm nor exporting functions from
analyze.c. It's for "foreign index scan" on CSV file data that I plan
to propose in the next CF. So, it is meaningless for now. I'm sorry.
I will fix it at the next version of the patch so that they don't have to.
Best regards,
Etsuro Fujita
Hi Hanada-san,
I updated the patch. Please find attached a patch.
Best regards,
Etsuro Fujita
Show quoted text
(2011/11/18 21:00), Shigeru Hanada wrote:
(2011/11/18 16:25), Etsuro Fujita wrote:
Thank you for your testing. I updated the patch according to your
comments. Attached is the updated version of the patch.I'd like to share result of my review even though it's not fully
finished. So far I looked from viewpoint of API design, code
formatting, and documentation. I'll examine effectiveness of the patch
and details of implementation next week, and hopefully try writing
ANALYZE handler for pgsql_fdw :)New patch has correct format, and it could be applied to HEAD of master
branch cleanly. All regression tests including those of contrib modules
have passed. It contains changes of codes and regression tests related
to the issue, and they have enough comments. IMO the document in this
patch is not enough to show how to write analyze handler to FDW authors,
but it can be enhanced afterward. With this patch, FDW author can
provide optional ANALYZE handler which updates statistics of foreign
tables. Planner would be able to generate better plan by using statistics.Yes. But in the updated version, I've refactored analyze.c a little bit
to allow FDW authors to simply call do_analyze_rel().<snip>
The updated version enables FDW authors to just write their own
acquire_sample_rows(). On the other hand, by retaining to hook
AnalyzeForeignTable routine at analyze_rel(), higher level than
acquire_sample_rows() as before, it allows FDW authors to write
AnalyzeForeignTable routine for foreign tables on a remote server to ask
the server for its current stats instead, as pointed out earlier by Tom
Lane.IIUC, this patch offers three options to FDWs: a) set
AnalyzeForeignTable to NULL to indicate lack of capability, b) provide
AnalyzeForeignTable which calls do_analyze_rel with custom
sample_row_acquirer, and c) create statistics data from scratch by FDW
itself by doing similar things to do_analyze_rel with given argument or
copying statistics from remote PostgreSQL server.ISTM that this design is well-balanced between simplicity and
flexibility. Maybe these three options would suit web-based wrappers,
file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that
adding more details of FdwRoutine, such as purpose of new callback
function and difference from required ones, would help FDW authors,
including me :)I have some random comments:
- I think separated typedef of sample_acquire_rows would make codes more
readable. In addition, parameters of the function should be described
explicitly.
- I couldn't see the reason why file_fdw sets ctid of sample tuples,
though I guess it's for Vitter's random sampling algorithm. If every
FDW must set valid ctid to sample tuples, it should be mentioned in
document of AnalyzeForeignTable. Exporting some functions from
analyze.c relates this issue?
- Why file_fdw skips sample tuples which have NULL value? AFAIS
original acquire_sample_rows doesn't do so.
- Some comment lines go past 80 columns.
- Some headers included in file_fdw.c seems unnecessary.Regards,
Attachments:
postgresql-analyze-v5.patchtext/plain; name=postgresql-analyze-v5.patchDownload
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 20,25 ****
--- 20,26 ----
#include "commands/copy.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
***************
*** 101,106 **** static void fileBeginForeignScan(ForeignScanState *node, int eflags);
--- 102,108 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel);
/*
* Helper functions
***************
*** 112,118 **** static List *get_file_fdw_attribute_options(Oid relid);
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 114,123 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
/*
* Foreign-data wrapper handler function: return a struct with pointers
***************
*** 129,134 **** file_fdw_handler(PG_FUNCTION_ARGS)
--- 134,140 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
***************
*** 575,580 **** fileReScanForeignScan(ForeignScanState *node)
--- 581,598 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ do_analyze_rel(onerel, vacstmt,
+ elevel, false,
+ acquire_sample_rows);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*/
static void
***************
*** 584,590 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
{
struct stat stat_buf;
BlockNumber pages;
! int tuple_width;
double ntuples;
double nrows;
Cost run_cost = 0;
--- 602,609 ----
{
struct stat stat_buf;
BlockNumber pages;
! BlockNumber relpages;
! double reltuples;
double ntuples;
double nrows;
Cost run_cost = 0;
***************
*** 604,619 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
if (pages < 1)
pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
/*
* Now estimate the number of rows returned by the scan after applying the
--- 623,654 ----
if (pages < 1)
pages = 1;
! relpages = baserel->pages;
! reltuples = baserel->tuples;
!
! if (relpages > 0)
! {
! double density;
! density = reltuples / (double) relpages;
!
! ntuples = clamp_row_est(density * (double) pages);
! }
! else
! {
! int tuple_width;
!
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
!
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
! }
/*
* Now estimate the number of rows returned by the scan after applying the
***************
*** 645,647 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
--- 680,867 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which
+ * must have at least targrows entries.
+ * The actual number of rows selected is returned as the function result.
+ * We also count the number of rows in the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel)
+ {
+ int numrows = 0;
+ int invalrows = 0; /* total # rows violating the NOT NULL constraints */
+ double validrows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ char *filename;
+ struct stat stat_buf;
+ List *options;
+ CopyState cstate;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+
+ /*
+ * Get size of the file.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ filename)));
+
+ /*
+ * Convert size to pages for use in I/O cost estimate.
+ */
+ *totalpages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (*totalpages < 1)
+ *totalpages = 1;
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(onerel, filename, NIL, options);
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ for (;;)
+ {
+ sample_it = true;
+
+ /*
+ * Check for user-requested abort.
+ */
+ CHECK_FOR_INTERRUPTS();
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ invalrows += 1;
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ rows[numrows++] = heap_copytuple(tuple);
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(validrows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ validrows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ *totalrows = validrows;
+ *totaldeadrows = 0;
+
+ EndCopyFrom(cstate);
+
+ pfree(values);
+ pfree(nulls);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "containing %d valid rows and %d invalid rows; "
+ "%d rows in sample, %d total rows",
+ RelationGetRelationName(onerel),
+ (int) validrows, invalrows,
+ numrows, (int) *totalrows)));
+
+ return numrows;
+ }
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***************
*** 111,116 **** EXECUTE st(100);
--- 111,121 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***************
*** 174,179 **** EXECUTE st(100);
--- 174,194 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 228,233 **** EndForeignScan (ForeignScanState *node);
--- 228,257 ----
</para>
<para>
+ <programlisting>
+ void
+ AnalyzeForeignTable (Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
+ </programlisting>
+
+ Collect statistics on a foreign table and store the results in the
+ pg_class and pg_statistics system catalogs.
+ This is optional, and if implemented, called when <command>ANALYZE
+ </> command is run.
+ The statistics are used by the query planner in order to make good
+ choices of query plans.
+ The function can be implemented by writing a sampling function that
+ acquires a random sample of rows from an external data source and
+ then by calling <function>do_analyze_rel</>, where you should pass
+ the sampling function as an argument.
+ Or the function can be directly implemented to get the statistics
+ from an external data source, transform it if neccesary, and store
+ it in the pg_class and pg_statistics system catalogs.
+ The function must be set to NULL if it doesn't be implemented.
+ </para>
+
+ <para>
The <structname>FdwRoutine</> and <structname>FdwPlan</> struct types
are declared in <filename>src/include/foreign/fdwapi.h</>, which see
for additional details.
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***************
*** 279,284 ****
--- 279,288 ----
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
+ Note that the autovacuum daemon does not issue <command>ANALYZE</>
+ commands on foreign tables. It is recommended to run manually-managed
+ <command>ANALYZE</> commands as needed, which typically are executed
+ according to a schedule by cron or Task Scheduler scripts.
</para>
<para>
*** a/doc/src/sgml/ref/alter_foreign_table.sgml
--- b/doc/src/sgml/ref/alter_foreign_table.sgml
***************
*** 36,41 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 36,44 ----
DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable>
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
***************
*** 94,99 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 97,146 ----
</varlistentry>
<varlistentry>
+ <term><literal>SET STATISTICS</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets the per-column statistics-gathering target for subsequent
+ <xref linkend="sql-analyze"> operations.
+ The target can be set in the range 0 to 10000; alternatively, set it
+ to -1 to revert to using the system default statistics
+ target (<xref linkend="guc-default-statistics-target">).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )</literal></term>
+ <term><literal>RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets or resets a per-attribute option. Currently, the only defined
+ per-attribute option is <literal>n_distinct</>, which overrides
+ the number-of-distinct-values estimates made by subsequent
+ <xref linkend="sql-analyze"> operations.
+ When set to a positive value, <command>ANALYZE</> will assume that
+ the column contains exactly the specified number of distinct nonnull
+ values.
+ When set to a negative value, which must be greater than or equal
+ to -1, <command>ANALYZE</> will assume that the number of distinct
+ nonnull values in the column is linear in the size of the foreign
+ table; the exact count is to be computed by multiplying the estimated
+ foreign table size by the absolute value of the given number.
+ For example,
+ a value of -1 implies that all values in the column are distinct,
+ while a value of -0.5 implies that each value appears twice on the
+ average.
+ This can be useful when the size of the foreign table changes over
+ time, since the multiplication by the number of rows in the foreign
+ table is not performed until query planning time. Specify a value
+ of 0 to revert to estimating the number of distinct values normally.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>OWNER</literal></term>
<listitem>
<para>
*** a/doc/src/sgml/ref/analyze.sgml
--- b/doc/src/sgml/ref/analyze.sgml
***************
*** 39,47 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database. With a parameter, <command>ANALYZE</command> examines
! only that table. It is further possible to give a list of column names,
! in which case only the statistics for those columns are collected.
</para>
</refsect1>
--- 39,49 ----
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database except for foreign tables. With a parameter, <command>
! ANALYZE</command> examines only that table. For a foreign table, it is
! necessary to spcify the name of that table. It is further possible to
! give a list of column names, in which case only the statistics for those
! columns are collected.
</para>
</refsect1>
***************
*** 63,69 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database.
</para>
</listitem>
</varlistentry>
--- 65,72 ----
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database except
! for foreign tables.
</para>
</listitem>
</varlistentry>
***************
*** 137,143 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below.
</para>
<para>
--- 140,148 ----
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below. Note that the time
! needed to analyze on foreign tables depends on the implementation of
! the foreign data wrapper via which such tables are attached.
</para>
<para>
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 23,28 ****
--- 23,29 ----
#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
***************
*** 30,35 ****
--- 31,38 ----
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
***************
*** 78,91 **** typedef struct AnlIndexData
int default_statistics_target = 100;
/* A few variables that don't seem worth passing around as parameters */
- static int elevel = -1;
-
static MemoryContext anl_context = NULL;
static BufferAccessStrategy vac_strategy;
- static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh);
static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
int samplesize);
static bool BlockSampler_HasMore(BlockSampler bs);
--- 81,91 ----
***************
*** 96,110 **** static void compute_index_stats(Relation onerel, double totalrows,
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
! int targrows, double *totalrows, double *totaldeadrows);
! static double random_fract(void);
! static double init_selection_state(int n);
! static double get_next_S(double t, int n, double *stateptr);
! static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
--- 96,110 ----
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
! static int compare_rows(const void *a, const void *b);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
***************
*** 119,125 **** static bool std_typanalyze(VacAttrStats *stats);
--- 119,127 ----
void
analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
{
+ int elevel;
Relation onerel;
+ FdwRoutine *fdwroutine;
/* Set up static variables */
if (vacstmt->options & VACOPT_VERBOSE)
***************
*** 184,193 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
}
/*
! * Check that it's a plain table; we used to do this in get_rel_oids() but
! * seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 186,197 ----
}
/*
! * Check that it's a plain table or foreign table; we used to do this
! * in get_rel_oids() but seems safer to check after we've locked the
! * relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION &&
! onerel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
***************
*** 212,217 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
--- 216,223 ----
/*
* We can ANALYZE any table except pg_statistic. See update_attstats
+ * We can ANALYZE foreign tables if the underlying foreign-data wrappers
+ * have their AnalyzeForeignTable callback routines.
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
***************
*** 219,224 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
--- 225,244 ----
return;
}
+ if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ fdwroutine = GetFdwRoutineByRelId(RelationGetRelid(onerel));
+
+ if (fdwroutine->AnalyzeForeignTable == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("skipping \"%s\" --- underlying foreign-data wrapper cannot analyze it",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, ShareUpdateExclusiveLock);
+ return;
+ }
+ }
+
/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
***************
*** 226,241 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! /*
! * Do the normal non-recursive ANALYZE.
! */
! do_analyze_rel(onerel, vacstmt, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 246,282 ----
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! /*
! * Do the normal non-recursive ANALYZE.
! */
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, false, acquire_sample_rows);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\" inheritance tree",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, true, acquire_inherited_sample_rows);
! }
! }
/*
* Close source relation now, but keep lock so that no one deletes it
***************
*** 257,264 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! static void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
{
int attr_cnt,
tcnt,
--- 298,306 ----
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
! bool inh, SampleRowAcquireFunc acquirefunc)
{
int attr_cnt,
tcnt,
***************
*** 273,278 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
--- 315,321 ----
numrows;
double totalrows,
totaldeadrows;
+ BlockNumber totalpages;
HeapTuple *rows;
PGRUsage ru0;
TimestampTz starttime = 0;
***************
*** 281,297 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
int save_sec_context;
int save_nestlevel;
- if (inh)
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\" inheritance tree",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
- else
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
-
/*
* Set up a working context so that we can easily free whatever junk gets
* created.
--- 324,329 ----
***************
*** 449,459 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
else
! numrows = acquire_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
/*
* Compute the statistics. Temporary results during the calculations for
--- 481,493 ----
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! NULL, elevel);
else
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! &totalpages, elevel);
/*
* Compute the statistics. Temporary results during the calculations for
***************
*** 534,540 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
*/
if (!inh)
vac_update_relstats(onerel,
! RelationGetNumberOfBlocks(onerel),
totalrows,
visibilitymap_count(onerel),
hasindex,
--- 568,574 ----
*/
if (!inh)
vac_update_relstats(onerel,
! totalpages,
totalrows,
visibilitymap_count(onerel),
hasindex,
***************
*** 1017,1023 **** BlockSampler_Next(BlockSampler bs)
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
--- 1051,1058 ----
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
***************
*** 1032,1037 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
--- 1067,1074 ----
Assert(targrows > 0);
totalblocks = RelationGetNumberOfBlocks(onerel);
+ if (totalpages)
+ *totalpages = totalblocks;
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
***************
*** 1254,1260 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1291,1297 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
***************
*** 1274,1287 **** random_fract(void)
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1311,1324 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
***************
*** 1397,1403 **** compare_rows(const void *a, const void *b)
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
List *tableOIDs;
Relation *rels;
--- 1434,1441 ----
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
List *tableOIDs;
Relation *rels;
***************
*** 1460,1465 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
--- 1498,1505 ----
totalblocks += relblocks[nrels];
nrels++;
}
+ if (totalpages)
+ *totalpages = totalblocks;
/*
* Now sample rows from each relation, proportionally to its fraction of
***************
*** 1493,1499 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
rows + numrows,
childtargrows,
&trows,
! &tdrows);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
--- 1533,1541 ----
rows + numrows,
childtargrows,
&trows,
! &tdrows,
! NULL,
! elevel);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 317,322 **** static void ATPrepSetStatistics(Relation rel, const char *colName,
--- 317,324 ----
Node *newValue, LOCKMODE lockmode);
static void ATExecSetStatistics(Relation rel, const char *colName,
Node *newValue, LOCKMODE lockmode);
+ static void ATPrepSetOptions(Relation rel, const char *colName,
+ Node *options, LOCKMODE lockmode);
static void ATExecSetOptions(Relation rel, const char *colName,
Node *options, bool isReset, LOCKMODE lockmode);
static void ATExecSetStorage(Relation rel, const char *colName,
***************
*** 2916,2921 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
--- 2918,2924 ----
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
+ ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
***************
*** 4851,4860 **** ATPrepSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table or index",
RelationGetRelationName(rel))));
/* Permissions checks */
--- 4854,4864 ----
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX &&
! rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table, index or foreign table",
RelationGetRelationName(rel))));
/* Permissions checks */
***************
*** 4923,4928 **** ATExecSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
--- 4927,4952 ----
}
static void
+ ATPrepSetOptions(Relation rel, const char *colName, Node *options,
+ LOCKMODE lockmode)
+ {
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ ListCell *cell;
+
+ foreach(cell, (List *) options)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (pg_strncasecmp(def->defname, "n_distinct_inherited", strlen("n_distinct_inherited")) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot support option \"n_distinct_inherited\" for foreign tables")));
+ }
+ }
+ }
+
+ static void
ATExecSetOptions(Relation rel, const char *colName, Node *options,
bool isReset, LOCKMODE lockmode)
{
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***************
*** 399,404 **** static const SchemaQuery Query_for_list_of_tsvf = {
--- 399,419 ----
NULL
};
+ static const SchemaQuery Query_for_list_of_tf = {
+ /* catname */
+ "pg_catalog.pg_class c",
+ /* selcondition */
+ "c.relkind IN ('r', 'f')",
+ /* viscondition */
+ "pg_catalog.pg_table_is_visible(c.oid)",
+ /* namespace */
+ "c.relnamespace",
+ /* result */
+ "pg_catalog.quote_ident(c.relname)",
+ /* qualresult */
+ NULL
+ };
+
static const SchemaQuery Query_for_list_of_views = {
/* catname */
"pg_catalog.pg_class c",
***************
*** 2769,2775 **** psql_completion(char *text, int start, int end)
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
--- 2784,2790 ----
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tf, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
***************
*** 165,171 **** extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
--- 165,182 ----
BufferAccessStrategy bstrategy);
/* in commands/analyze.c */
+ typedef int (*SampleRowAcquireFunc) (Relation onerel, HeapTuple *rows,
+ int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel);
+
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
BufferAccessStrategy bstrategy);
+ extern void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
+ bool inh, SampleRowAcquireFunc acquirefunc);
+ extern double random_fract(void);
+ extern double init_selection_state(int n);
+ extern double get_next_S(double t, int n, double *stateptr);
+
#endif /* VACUUM_H */
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
***************
*** 68,73 **** typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
--- 70,78 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
***************
*** 88,93 **** typedef struct FdwRoutine
--- 93,99 ----
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
(2011/12/09 21:16), Etsuro Fujita wrote:
I updated the patch. Please find attached a patch.
I've examined v5 patch, and got reasonable EXPLAIN results which reflect
collected statistics! As increasing STATISTICS option, estimated rows
become better. Please see attached stats_*.txt for what I
tested.
stats_none.txt : before ANALYZE
stats_100.txt : SET STATISTICS = 100 for all columns, and ANALYZE
stats_10000.txt : SET STATISTICS = 10000 for all columns, and ANALYZE
I think that this patch become ready for committer after some
minor corrections:
Couldn't set n_distinct
=======================
I couldn't set n_distinct to columns of foreign tables. With some
research, I noticed that ATSimplePermissions should accept
ATT_FOREIGN_TABLE too for that case. In addition, regression tests for
ALTER FOREIGN TABLE should be added to detect this kind of problem.
Showing stats target
====================
We can see stats target of ordinary tables with \d+, but it is not
available for foreign tables. I think "Stats target" column should be
added even though output of \d+ for foreign tables become wider. One
possible idea is to remove useless "Storage" column instead, but views
have that column even though their source could come from foreign tables.
Please see attached patch for these two items.
Comments of FdwRoutine
======================
Mention about optional handler is obsolete. We should clearly say
AnalyzeForeignTable is optional (can be set to NULL) and rest are
required. IMO separating them with comment would help FDW authors to
understand requirements, e.g.:
typedef struct FdwRoutine
{
NodeTag type;
/*
* These Handlers are required to execute simple scan on a foreign
* table. If any of them was set to NULL, scan on a foreign table
* managed by such FDW would fail.
*/
PlanForeignScan_function PlanForeignScan;
ExplainForeignScan_function ExplainForeignScan;
BeginForeignScan_function BeginForeignScan;
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
/*
* Handlers below are optional. You can set any of them to
* NULL to tell PostgreSQL that the FDW doesn't have the
* capability.
*/
AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
Code formatting
===============
Some code lines go past 80 columns.
Message style
=============
The terms 'cannot support option "n_distinct"...' used in
ATPrepSetOptions seems little unusual in PostgreSQL. Should we say
'cannot set "n_distinct_inherited" for foreign tables' for that case?
Typo
====
Typo "spcify" is in document of analyze.
Regards,
--
Shigeru Hanada
Attachments:
show_stats_target.patchtext/plain; name=show_stats_target.patchDownload
commit b056c0cc38a9460c083741bc021a9b5eddee22f1
Author: Shigeru Hanada <shigeru.hanada@gmail.com>
Date: Mon Dec 12 18:14:26 2011 +0900
Fix psql to show stats target for foreign tables too.
Regression tests are also added for this change, and one simple bug
is detected and fixed.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5db476b..6dc736d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2917,7 +2917,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
- ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
+ ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX | ATT_FOREIGN_TABLE);
ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index dcafdd2..802abf2 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1099,7 +1099,7 @@ describeOneTableDetails(const char *schemaname,
bool printTableInitialized = false;
int i;
char *view_def = NULL;
- char *headers[6];
+ char *headers[7];
char **seq_values = NULL;
char **modifiers = NULL;
char **ptr;
@@ -1390,7 +1390,7 @@ describeOneTableDetails(const char *schemaname,
if (verbose)
{
headers[cols++] = gettext_noop("Storage");
- if (tableinfo.relkind == 'r')
+ if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
headers[cols++] = gettext_noop("Stats target");
/* Column comments, if the relkind supports this feature. */
if (tableinfo.relkind == 'r' || tableinfo.relkind == 'v' ||
@@ -1493,7 +1493,7 @@ describeOneTableDetails(const char *schemaname,
false, false);
/* Statistics target, if the relkind supports this feature */
- if (tableinfo.relkind == 'r')
+ if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
{
printTableAddCell(&cont, PQgetvalue(res, i, firstvcol + 1),
false, false);
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 122e285..4a16238 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -678,12 +678,12 @@ CREATE FOREIGN TABLE ft1 (
COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
\d+ ft1
- Foreign table "public.ft1"
- Column | Type | Modifiers | FDW Options | Storage | Description
---------+---------+-----------+--------------------------------+----------+-------------
- c1 | integer | not null | ("param 1" 'val1') | plain | ft1.c1
- c2 | text | | (param2 'val2', param3 'val3') | extended |
- c3 | date | | | plain |
+ Foreign table "public.ft1"
+ Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
+--------+---------+-----------+--------------------------------+----------+--------------+-------------
+ c1 | integer | not null | ("param 1" 'val1') | plain | | ft1.c1
+ c2 | text | | (param2 'val2', param3 'val3') | extended | |
+ c3 | date | | | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
@@ -729,19 +729,24 @@ ERROR: cannot alter system column "xmin"
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ERROR: cannot set "n_distinct_inherited" on foreign tables
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
- Foreign table "public.ft1"
- Column | Type | Modifiers | FDW Options | Storage | Description
---------+---------+-----------+--------------------------------+----------+-------------
- c1 | integer | not null | ("param 1" 'val1') | plain |
- c2 | text | | (param2 'val2', param3 'val3') | extended |
- c3 | date | | | plain |
- c4 | integer | | | plain |
- c6 | integer | not null | | plain |
- c7 | integer | | (p1 'v1', p2 'v2') | plain |
- c8 | text | | (p2 'V2') | extended |
- c9 | integer | | | plain |
- c10 | integer | | (p1 'v1') | plain |
+ Foreign table "public.ft1"
+ Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
+--------+---------+-----------+--------------------------------+----------+--------------+-------------
+ c1 | integer | not null | ("param 1" 'val1') | plain | 10000 |
+ c2 | text | | (param2 'val2', param3 'val3') | extended | |
+ c3 | date | | | plain | |
+ c4 | integer | | | plain | |
+ c6 | integer | not null | | plain | |
+ c7 | integer | | (p1 'v1', p2 'v2') | plain | |
+ c8 | text | | (p2 'V2') | extended | |
+ c9 | integer | | | plain | |
+ c10 | integer | | (p1 'v1') | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql
index e99e707..5908ff3 100644
--- a/src/test/regress/sql/foreign_data.sql
+++ b/src/test/regress/sql/foreign_data.sql
@@ -306,6 +306,10 @@ ALTER FOREIGN TABLE ft1 ALTER COLUMN xmin OPTIONS (ADD p1 'v1'); -- ERROR
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
-- can't change the column type if it's used elsewhere
CREATE TABLE use_ft1_column_type (x ft1);
(2011/12/12 19:33), Shigeru Hanada wrote:
(2011/12/09 21:16), Etsuro Fujita wrote:
I updated the patch. Please find attached a patch.
I've examined v5 patch, and got reasonable EXPLAIN results which reflect
collected statistics! As increasing STATISTICS option, estimated rows
become better. Please see attached stats_*.txt for what I
tested.stats_none.txt : before ANALYZE
stats_100.txt : SET STATISTICS = 100 for all columns, and ANALYZE
stats_10000.txt : SET STATISTICS = 10000 for all columns, and ANALYZEI think that this patch become ready for committer after some
minor corrections:Couldn't set n_distinct
=======================
I couldn't set n_distinct to columns of foreign tables. With some
research, I noticed that ATSimplePermissions should accept
ATT_FOREIGN_TABLE too for that case. In addition, regression tests for
ALTER FOREIGN TABLE should be added to detect this kind of problem.Showing stats target
====================
We can see stats target of ordinary tables with \d+, but it is not
available for foreign tables. I think "Stats target" column should be
added even though output of \d+ for foreign tables become wider. One
possible idea is to remove useless "Storage" column instead, but views
have that column even though their source could come from foreign tables.Please see attached patch for these two items.
Comments of FdwRoutine
======================
Mention about optional handler is obsolete. We should clearly say
AnalyzeForeignTable is optional (can be set to NULL) and rest are
required. IMO separating them with comment would help FDW authors to
understand requirements, e.g.:typedef struct FdwRoutine
{
NodeTag type;/*
* These Handlers are required to execute simple scan on a foreign
* table. If any of them was set to NULL, scan on a foreign table
* managed by such FDW would fail.
*/
PlanForeignScan_function PlanForeignScan;
ExplainForeignScan_function ExplainForeignScan;
BeginForeignScan_function BeginForeignScan;
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;/*
* Handlers below are optional. You can set any of them to
* NULL to tell PostgreSQL that the FDW doesn't have the
* capability.
*/
AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;Code formatting
===============
Some code lines go past 80 columns.Message style
=============
The terms 'cannot support option "n_distinct"...' used in
ATPrepSetOptions seems little unusual in PostgreSQL. Should we say
'cannot set "n_distinct_inherited" for foreign tables' for that case?Typo
====
Typo "spcify" is in document of analyze.
Thank you for your effectiveness experiments and proposals for
improvements. I updated the patch according to your proposals.
Attached is the updated version of the patch.
Best regards,
Etsuro Fujita
Show quoted text
Regards,
Attachments:
postgresql-analyze-v6.patchtext/plain; name=postgresql-analyze-v6.patchDownload
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 20,25 ****
--- 20,26 ----
#include "commands/copy.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
***************
*** 101,106 **** static void fileBeginForeignScan(ForeignScanState *node, int eflags);
--- 102,110 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* Helper functions
***************
*** 112,118 **** static List *get_file_fdw_attribute_options(Oid relid);
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 116,125 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
/*
* Foreign-data wrapper handler function: return a struct with pointers
***************
*** 129,134 **** file_fdw_handler(PG_FUNCTION_ARGS)
--- 136,142 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
***************
*** 575,580 **** fileReScanForeignScan(ForeignScanState *node)
--- 583,600 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze foreign table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ do_analyze_rel(onerel, vacstmt,
+ elevel, false,
+ acquire_sample_rows);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*/
static void
***************
*** 584,590 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
{
struct stat stat_buf;
BlockNumber pages;
! int tuple_width;
double ntuples;
double nrows;
Cost run_cost = 0;
--- 604,611 ----
{
struct stat stat_buf;
BlockNumber pages;
! BlockNumber relpages;
! double reltuples;
double ntuples;
double nrows;
Cost run_cost = 0;
***************
*** 604,619 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
if (pages < 1)
pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
/*
* Now estimate the number of rows returned by the scan after applying the
--- 625,658 ----
if (pages < 1)
pages = 1;
! relpages = baserel->pages;
! reltuples = baserel->tuples;
!
! if (relpages > 0)
! {
! double density;
! density = reltuples / (double) relpages;
!
! ntuples = clamp_row_est(density * (double) pages);
! }
! else
! {
! int tuple_width;
!
! /*
! * Estimate the number of tuples in the file. We back into this
! * estimate using the planner's idea of the relation width; which is
! * bogus if not all columns are being read, not to mention that the
! * text representation of a row probably isn't the same size as its
! * internal representation. FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) +
! MAXALIGN(sizeof(HeapTupleHeaderData));
!
! ntuples = clamp_row_est((double) stat_buf.st_size /
! (double) tuple_width);
! }
/*
* Now estimate the number of rows returned by the scan after applying the
***************
*** 645,647 **** estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
--- 684,872 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which must
+ * have at least targrows entries. The actual number of rows selected is
+ * returned as the function result. We also count the number of valid rows in
+ * the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel)
+ {
+ int numrows = 0;
+ int invalrows = 0; /* total # rows violating
+ the NOT NULL constraints */
+ double validrows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ char *filename;
+ struct stat stat_buf;
+ List *options;
+ CopyState cstate;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+
+ /*
+ * Get size of the file.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ filename)));
+
+ /*
+ * Convert size to pages for use in I/O cost estimate.
+ */
+ *totalpages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (*totalpages < 1)
+ *totalpages = 1;
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(onerel, filename, NIL, options);
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ for (;;)
+ {
+ sample_it = true;
+
+ /*
+ * Check for user-requested abort.
+ */
+ CHECK_FOR_INTERRUPTS();
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ invalrows += 1;
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ rows[numrows++] = heap_copytuple(tuple);
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(validrows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ validrows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ *totalrows = validrows;
+ *totaldeadrows = 0;
+
+ EndCopyFrom(cstate);
+
+ pfree(values);
+ pfree(nulls);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "containing %d valid rows and %d invalid rows; "
+ "%d rows in sample, %d total rows",
+ RelationGetRelationName(onerel),
+ (int) validrows, invalrows,
+ numrows, (int) *totalrows)));
+
+ return numrows;
+ }
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***************
*** 111,116 **** EXECUTE st(100);
--- 111,121 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***************
*** 174,179 **** EXECUTE st(100);
--- 174,194 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 228,233 **** EndForeignScan (ForeignScanState *node);
--- 228,257 ----
</para>
<para>
+ <programlisting>
+ void
+ AnalyzeForeignTable (Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
+ </programlisting>
+
+ Collect statistics on a foreign table and store the results in the
+ pg_class and pg_statistics system catalogs.
+ This is optional, and if implemented, called when <command>ANALYZE
+ </> command is run.
+ The statistics are used by the query planner in order to make good
+ choices of query plans.
+ The function can be implemented by writing a sampling function that
+ acquires a random sample of rows from an external data source and
+ then by calling <function>do_analyze_rel</>, where you should pass
+ the sampling function as an argument.
+ Or the function can be directly implemented to get the statistics
+ from an external data source, transform it if neccesary, and store
+ it in the pg_class and pg_statistics system catalogs.
+ The function must be set to NULL if it isn't implemented.
+ </para>
+
+ <para>
The <structname>FdwRoutine</> and <structname>FdwPlan</> struct types
are declared in <filename>src/include/foreign/fdwapi.h</>, which see
for additional details.
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***************
*** 279,284 ****
--- 279,288 ----
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
+ Note that the autovacuum daemon does not issue <command>ANALYZE</>
+ commands on foreign tables. It is recommended to run manually-managed
+ <command>ANALYZE</> commands as needed, which typically are executed
+ according to a schedule by cron or Task Scheduler scripts.
</para>
<para>
*** a/doc/src/sgml/ref/alter_foreign_table.sgml
--- b/doc/src/sgml/ref/alter_foreign_table.sgml
***************
*** 36,41 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 36,44 ----
DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable>
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
***************
*** 94,99 **** ALTER FOREIGN TABLE <replaceable class="PARAMETER">name</replaceable>
--- 97,146 ----
</varlistentry>
<varlistentry>
+ <term><literal>SET STATISTICS</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets the per-column statistics-gathering target for subsequent
+ <xref linkend="sql-analyze"> operations.
+ The target can be set in the range 0 to 10000; alternatively, set it
+ to -1 to revert to using the system default statistics
+ target (<xref linkend="guc-default-statistics-target">).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )</literal></term>
+ <term><literal>RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets or resets a per-attribute option. Currently, the only defined
+ per-attribute option is <literal>n_distinct</>, which overrides
+ the number-of-distinct-values estimates made by subsequent
+ <xref linkend="sql-analyze"> operations.
+ When set to a positive value, <command>ANALYZE</> will assume that
+ the column contains exactly the specified number of distinct nonnull
+ values.
+ When set to a negative value, which must be greater than or equal
+ to -1, <command>ANALYZE</> will assume that the number of distinct
+ nonnull values in the column is linear in the size of the foreign
+ table; the exact count is to be computed by multiplying the estimated
+ foreign table size by the absolute value of the given number.
+ For example,
+ a value of -1 implies that all values in the column are distinct,
+ while a value of -0.5 implies that each value appears twice on the
+ average.
+ This can be useful when the size of the foreign table changes over
+ time, since the multiplication by the number of rows in the foreign
+ table is not performed until query planning time. Specify a value
+ of 0 to revert to estimating the number of distinct values normally.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>OWNER</literal></term>
<listitem>
<para>
*** a/doc/src/sgml/ref/analyze.sgml
--- b/doc/src/sgml/ref/analyze.sgml
***************
*** 39,47 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database. With a parameter, <command>ANALYZE</command> examines
! only that table. It is further possible to give a list of column names,
! in which case only the statistics for those columns are collected.
</para>
</refsect1>
--- 39,49 ----
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database except for foreign tables. With a parameter, <command>
! ANALYZE</command> examines only that table. For a foreign table, it is
! necessary to specify the name of that table. It is further possible to
! give a list of column names, in which case only the statistics for those
! columns are collected.
</para>
</refsect1>
***************
*** 63,69 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database.
</para>
</listitem>
</varlistentry>
--- 65,72 ----
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database except
! for foreign tables.
</para>
</listitem>
</varlistentry>
***************
*** 137,143 **** ANALYZE [ VERBOSE ] [ <replaceable class="PARAMETER">table</replaceable> [ ( <re
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below.
</para>
<para>
--- 140,148 ----
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below. Note that the time
! needed to analyze on foreign tables depends on the implementation of
! the foreign data wrapper via which such tables are attached.
</para>
<para>
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 23,28 ****
--- 23,29 ----
#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
***************
*** 30,35 ****
--- 31,38 ----
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
***************
*** 78,91 **** typedef struct AnlIndexData
int default_statistics_target = 100;
/* A few variables that don't seem worth passing around as parameters */
- static int elevel = -1;
-
static MemoryContext anl_context = NULL;
static BufferAccessStrategy vac_strategy;
- static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh);
static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
int samplesize);
static bool BlockSampler_HasMore(BlockSampler bs);
--- 81,91 ----
***************
*** 96,110 **** static void compute_index_stats(Relation onerel, double totalrows,
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
! int targrows, double *totalrows, double *totaldeadrows);
! static double random_fract(void);
! static double init_selection_state(int n);
! static double get_next_S(double t, int n, double *stateptr);
! static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
--- 96,110 ----
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
! static int compare_rows(const void *a, const void *b);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
***************
*** 119,125 **** static bool std_typanalyze(VacAttrStats *stats);
--- 119,127 ----
void
analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
{
+ int elevel;
Relation onerel;
+ FdwRoutine *fdwroutine;
/* Set up static variables */
if (vacstmt->options & VACOPT_VERBOSE)
***************
*** 184,193 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
}
/*
! * Check that it's a plain table; we used to do this in get_rel_oids() but
! * seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 186,197 ----
}
/*
! * Check that it's a plain table or foreign table; we used to do this
! * in get_rel_oids() but seems safer to check after we've locked the
! * relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION &&
! onerel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
***************
*** 211,217 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
}
/*
! * We can ANALYZE any table except pg_statistic. See update_attstats
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
--- 215,223 ----
}
/*
! * We can ANALYZE any table except pg_statistic. See update_attstats. In
! * addition, We can ANALYZE foreign tables if AnalyzeForeignTable callback
! * routines of underlying foreign-data wrappers are implemented.
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
***************
*** 219,224 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
--- 225,245 ----
return;
}
+ if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ fdwroutine = GetFdwRoutineByRelId(RelationGetRelid(onerel));
+
+ if (fdwroutine->AnalyzeForeignTable == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("skipping \"%s\" --- underlying foreign-data "
+ "wrapper cannot analyze it",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, ShareUpdateExclusiveLock);
+ return;
+ }
+ }
+
/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
***************
*** 226,241 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! /*
! * Do the normal non-recursive ANALYZE.
! */
! do_analyze_rel(onerel, vacstmt, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 247,285 ----
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! /*
! * Do the normal non-recursive ANALYZE.
! */
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel,
! false, acquire_sample_rows);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\" inheritance tree",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel,
! true, acquire_inherited_sample_rows);
! }
! }
/*
* Close source relation now, but keep lock so that no one deletes it
***************
*** 257,264 **** analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! static void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
{
int attr_cnt,
tcnt,
--- 301,309 ----
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
! bool inh, SampleRowAcquireFunc acquirefunc)
{
int attr_cnt,
tcnt,
***************
*** 273,278 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
--- 318,324 ----
numrows;
double totalrows,
totaldeadrows;
+ BlockNumber totalpages;
HeapTuple *rows;
PGRUsage ru0;
TimestampTz starttime = 0;
***************
*** 281,297 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
int save_sec_context;
int save_nestlevel;
- if (inh)
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\" inheritance tree",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
- else
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
-
/*
* Set up a working context so that we can easily free whatever junk gets
* created.
--- 327,332 ----
***************
*** 449,459 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
else
! numrows = acquire_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
/*
* Compute the statistics. Temporary results during the calculations for
--- 484,496 ----
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! NULL, elevel);
else
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! &totalpages, elevel);
/*
* Compute the statistics. Temporary results during the calculations for
***************
*** 534,540 **** do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
*/
if (!inh)
vac_update_relstats(onerel,
! RelationGetNumberOfBlocks(onerel),
totalrows,
visibilitymap_count(onerel),
hasindex,
--- 571,577 ----
*/
if (!inh)
vac_update_relstats(onerel,
! totalpages,
totalrows,
visibilitymap_count(onerel),
hasindex,
***************
*** 1017,1023 **** BlockSampler_Next(BlockSampler bs)
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
--- 1054,1061 ----
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
***************
*** 1032,1037 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
--- 1070,1077 ----
Assert(targrows > 0);
totalblocks = RelationGetNumberOfBlocks(onerel);
+ if (totalpages)
+ *totalpages = totalblocks;
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
***************
*** 1254,1260 **** acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1294,1300 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
***************
*** 1274,1287 **** random_fract(void)
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1314,1327 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
***************
*** 1397,1403 **** compare_rows(const void *a, const void *b)
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
List *tableOIDs;
Relation *rels;
--- 1437,1444 ----
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
List *tableOIDs;
Relation *rels;
***************
*** 1460,1465 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
--- 1501,1508 ----
totalblocks += relblocks[nrels];
nrels++;
}
+ if (totalpages)
+ *totalpages = totalblocks;
/*
* Now sample rows from each relation, proportionally to its fraction of
***************
*** 1493,1499 **** acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
rows + numrows,
childtargrows,
&trows,
! &tdrows);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
--- 1536,1544 ----
rows + numrows,
childtargrows,
&trows,
! &tdrows,
! NULL,
! elevel);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 317,322 **** static void ATPrepSetStatistics(Relation rel, const char *colName,
--- 317,324 ----
Node *newValue, LOCKMODE lockmode);
static void ATExecSetStatistics(Relation rel, const char *colName,
Node *newValue, LOCKMODE lockmode);
+ static void ATPrepSetOptions(Relation rel, const char *colName,
+ Node *options, LOCKMODE lockmode);
static void ATExecSetOptions(Relation rel, const char *colName,
Node *options, bool isReset, LOCKMODE lockmode);
static void ATExecSetStorage(Relation rel, const char *colName,
***************
*** 2915,2921 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
--- 2917,2924 ----
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX | ATT_FOREIGN_TABLE);
! ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
***************
*** 4851,4860 **** ATPrepSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table or index",
RelationGetRelationName(rel))));
/* Permissions checks */
--- 4854,4864 ----
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX &&
! rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table, index or foreign table",
RelationGetRelationName(rel))));
/* Permissions checks */
***************
*** 4923,4928 **** ATExecSetStatistics(Relation rel, const char *colName, Node *newValue, LOCKMODE
--- 4927,4954 ----
}
static void
+ ATPrepSetOptions(Relation rel, const char *colName, Node *options,
+ LOCKMODE lockmode)
+ {
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ ListCell *cell;
+
+ foreach(cell, (List *) options)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (pg_strncasecmp(def->defname, "n_distinct_inherited",
+ strlen("n_distinct_inherited")) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot set \"n_distinct_inherited\" "
+ "for foreign tables")));
+ }
+ }
+ }
+
+ static void
ATExecSetOptions(Relation rel, const char *colName, Node *options,
bool isReset, LOCKMODE lockmode)
{
*** a/src/bin/psql/describe.c
--- b/src/bin/psql/describe.c
***************
*** 1099,1105 **** describeOneTableDetails(const char *schemaname,
bool printTableInitialized = false;
int i;
char *view_def = NULL;
! char *headers[6];
char **seq_values = NULL;
char **modifiers = NULL;
char **ptr;
--- 1099,1105 ----
bool printTableInitialized = false;
int i;
char *view_def = NULL;
! char *headers[7];
char **seq_values = NULL;
char **modifiers = NULL;
char **ptr;
***************
*** 1390,1396 **** describeOneTableDetails(const char *schemaname,
if (verbose)
{
headers[cols++] = gettext_noop("Storage");
! if (tableinfo.relkind == 'r')
headers[cols++] = gettext_noop("Stats target");
/* Column comments, if the relkind supports this feature. */
if (tableinfo.relkind == 'r' || tableinfo.relkind == 'v' ||
--- 1390,1396 ----
if (verbose)
{
headers[cols++] = gettext_noop("Storage");
! if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
headers[cols++] = gettext_noop("Stats target");
/* Column comments, if the relkind supports this feature. */
if (tableinfo.relkind == 'r' || tableinfo.relkind == 'v' ||
***************
*** 1493,1499 **** describeOneTableDetails(const char *schemaname,
false, false);
/* Statistics target, if the relkind supports this feature */
! if (tableinfo.relkind == 'r')
{
printTableAddCell(&cont, PQgetvalue(res, i, firstvcol + 1),
false, false);
--- 1493,1499 ----
false, false);
/* Statistics target, if the relkind supports this feature */
! if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
{
printTableAddCell(&cont, PQgetvalue(res, i, firstvcol + 1),
false, false);
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***************
*** 399,404 **** static const SchemaQuery Query_for_list_of_tsvf = {
--- 399,419 ----
NULL
};
+ static const SchemaQuery Query_for_list_of_tf = {
+ /* catname */
+ "pg_catalog.pg_class c",
+ /* selcondition */
+ "c.relkind IN ('r', 'f')",
+ /* viscondition */
+ "pg_catalog.pg_table_is_visible(c.oid)",
+ /* namespace */
+ "c.relnamespace",
+ /* result */
+ "pg_catalog.quote_ident(c.relname)",
+ /* qualresult */
+ NULL
+ };
+
static const SchemaQuery Query_for_list_of_views = {
/* catname */
"pg_catalog.pg_class c",
***************
*** 2769,2775 **** psql_completion(char *text, int start, int end)
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
--- 2784,2790 ----
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tf, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
***************
*** 165,171 **** extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
--- 165,182 ----
BufferAccessStrategy bstrategy);
/* in commands/analyze.c */
+ typedef int (*SampleRowAcquireFunc) (Relation onerel, HeapTuple *rows,
+ int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel);
+
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
BufferAccessStrategy bstrategy);
+ extern void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
+ bool inh, SampleRowAcquireFunc acquirefunc);
+ extern double random_fract(void);
+ extern double init_selection_state(int n);
+ extern double get_next_S(double t, int n, double *stateptr);
+
#endif /* VACUUM_H */
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
***************
*** 68,73 **** typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
--- 70,78 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
***************
*** 82,93 **** typedef struct FdwRoutine
--- 87,109 ----
{
NodeTag type;
+ /*
+ * These handlers are required to execute simple scans on a foreign
+ * table. If any of them was set to NULL, scans on a foreign table
+ * managed by FDW would fail.
+ */
PlanForeignScan_function PlanForeignScan;
ExplainForeignScan_function ExplainForeignScan;
BeginForeignScan_function BeginForeignScan;
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+
+ /*
+ * Handlers below are optional. You can set any of them to NULL to
+ * tell PostgreSQL that FDW doesn't have the capability.
+ */
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
*** a/src/test/regress/expected/foreign_data.out
--- b/src/test/regress/expected/foreign_data.out
***************
*** 647,658 **** CREATE FOREIGN TABLE ft1 (
COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Description
! --------+---------+-----------+--------------------------------+----------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | ft1.c1
! c2 | text | | (param2 'val2', param3 'val3') | extended |
! c3 | date | | | plain |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
--- 647,658 ----
COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
! --------+---------+-----------+--------------------------------+----------+--------------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | | ft1.c1
! c2 | text | | (param2 'val2', param3 'val3') | extended | |
! c3 | date | | | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
***************
*** 698,716 **** ERROR: cannot alter system column "xmin"
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Description
! --------+---------+-----------+--------------------------------+----------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain |
! c2 | text | | (param2 'val2', param3 'val3') | extended |
! c3 | date | | | plain |
! c4 | integer | | | plain |
! c6 | integer | not null | | plain |
! c7 | integer | | (p1 'v1', p2 'v2') | plain |
! c8 | text | | (p2 'V2') | extended |
! c9 | integer | | | plain |
! c10 | integer | | (p1 'v1') | plain |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
--- 698,721 ----
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ ERROR: cannot set "n_distinct_inherited" for foreign tables
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
! --------+---------+-----------+--------------------------------+----------+--------------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | 10000 |
! c2 | text | | (param2 'val2', param3 'val3') | extended | |
! c3 | date | | | plain | |
! c4 | integer | | | plain | |
! c6 | integer | not null | | plain | |
! c7 | integer | | (p1 'v1', p2 'v2') | plain | |
! c8 | text | | (p2 'V2') | extended | |
! c9 | integer | | | plain | |
! c10 | integer | | (p1 'v1') | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
*** a/src/test/regress/sql/foreign_data.sql
--- b/src/test/regress/sql/foreign_data.sql
***************
*** 298,303 **** ALTER FOREIGN TABLE ft1 ALTER COLUMN xmin OPTIONS (ADD p1 'v1'); -- ERROR
--- 298,307 ----
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
-- can't change the column type if it's used elsewhere
CREATE TABLE use_ft1_column_type (x ft1);
(2011/12/13 22:00), Etsuro Fujita wrote:
Thank you for your effectiveness experiments and proposals for
improvements. I updated the patch according to your proposals.
Attached is the updated version of the patch.
I think this patch could be marked as "Ready for committer" with some
minor fixes. Please find attached a revised patch (v6.1).
Changes from Fujita-san's patch are:
* Fix a typo in src/backend/commands/analyze.c.
* Connect multi-lined message string literals, because PG code style
allows a line including message string literals to go past 80 columns.
* Fix ATPrepSetOptions so that it uses pg_strcasecmp instead of
pg_strncasecmp, becuase it's guaranteed that a) given strings are
null-terminated, and b) they have no trailing.
This feature would enhance cost estimation of foreign scans
substantially. Great!
Regards,
--
Shigeru Hanada
* 不明 - 自動検出
* 英語
* 日本語
* 英語
* 日本語
<javascript:void(0);>
Attachments:
postgresql-analyze-v6.1.patchtext/plain; name=postgresql-analyze-v6.1.patchDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 1cf3b3c..4eb4e72 100644
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 20,25 ****
--- 20,26 ----
#include "commands/copy.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
*************** static void fileBeginForeignScan(Foreign
*** 101,106 ****
--- 102,110 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* Helper functions
*************** static List *get_file_fdw_attribute_opti
*** 112,118 ****
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 116,125 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
const char *filename,
Cost *startup_cost, Cost *total_cost);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
/*
* Foreign-data wrapper handler function: return a struct with pointers
*************** file_fdw_handler(PG_FUNCTION_ARGS)
*** 129,134 ****
--- 136,142 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
*************** fileReScanForeignScan(ForeignScanState *
*** 575,580 ****
--- 583,600 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze foreign table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ do_analyze_rel(onerel, vacstmt,
+ elevel, false,
+ acquire_sample_rows);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*/
static void
*************** estimate_costs(PlannerInfo *root, RelOpt
*** 584,590 ****
{
struct stat stat_buf;
BlockNumber pages;
! int tuple_width;
double ntuples;
double nrows;
Cost run_cost = 0;
--- 604,611 ----
{
struct stat stat_buf;
BlockNumber pages;
! BlockNumber relpages;
! double reltuples;
double ntuples;
double nrows;
Cost run_cost = 0;
*************** estimate_costs(PlannerInfo *root, RelOpt
*** 604,619 ****
if (pages < 1)
pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
/*
* Now estimate the number of rows returned by the scan after applying the
--- 625,658 ----
if (pages < 1)
pages = 1;
! relpages = baserel->pages;
! reltuples = baserel->tuples;
! if (relpages > 0)
! {
! double density;
!
! density = reltuples / (double) relpages;
!
! ntuples = clamp_row_est(density * (double) pages);
! }
! else
! {
! int tuple_width;
!
! /*
! * Estimate the number of tuples in the file. We back into this
! * estimate using the planner's idea of the relation width; which is
! * bogus if not all columns are being read, not to mention that the
! * text representation of a row probably isn't the same size as its
! * internal representation. FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) +
! MAXALIGN(sizeof(HeapTupleHeaderData));
!
! ntuples = clamp_row_est((double) stat_buf.st_size /
! (double) tuple_width);
! }
/*
* Now estimate the number of rows returned by the scan after applying the
*************** estimate_costs(PlannerInfo *root, RelOpt
*** 645,647 ****
--- 684,872 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which must
+ * have at least targrows entries. The actual number of rows selected is
+ * returned as the function result. We also count the number of valid rows in
+ * the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel)
+ {
+ int numrows = 0;
+ int invalrows = 0; /* total # rows violating
+ the NOT NULL constraints */
+ double validrows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ char *filename;
+ struct stat stat_buf;
+ List *options;
+ CopyState cstate;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+
+ /*
+ * Get size of the file.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ filename)));
+
+ /*
+ * Convert size to pages for use in I/O cost estimate.
+ */
+ *totalpages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (*totalpages < 1)
+ *totalpages = 1;
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(onerel, filename, NIL, options);
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ for (;;)
+ {
+ sample_it = true;
+
+ /*
+ * Check for user-requested abort.
+ */
+ CHECK_FOR_INTERRUPTS();
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ invalrows += 1;
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ rows[numrows++] = heap_copytuple(tuple);
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(validrows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ validrows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ *totalrows = validrows;
+ *totaldeadrows = 0;
+
+ EndCopyFrom(cstate);
+
+ pfree(values);
+ pfree(nulls);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "containing %d valid rows and %d invalid rows; "
+ "%d rows in sample, %d total rows",
+ RelationGetRelationName(onerel),
+ (int) validrows, invalrows,
+ numrows, (int) *totalrows)));
+
+ return numrows;
+ }
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 8e3d553..21b6fb4 100644
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
*************** EXECUTE st(100);
*** 111,116 ****
--- 111,121 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 84f0750..fe0d67f 100644
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
*************** EXECUTE st(100);
*** 174,179 ****
--- 174,194 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv';
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 76ff243..7d6443a 100644
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
*************** EndForeignScan (ForeignScanState *node);
*** 228,233 ****
--- 228,257 ----
</para>
<para>
+ <programlisting>
+ void
+ AnalyzeForeignTable (Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
+ </programlisting>
+
+ Collect statistics on a foreign table and store the results in the
+ pg_class and pg_statistics system catalogs.
+ This is optional, and if implemented, called when <command>ANALYZE
+ </> command is run.
+ The statistics are used by the query planner in order to make good
+ choices of query plans.
+ The function can be implemented by writing a sampling function that
+ acquires a random sample of rows from an external data source and
+ then by calling <function>do_analyze_rel</>, where you should pass
+ the sampling function as an argument.
+ Or the function can be directly implemented to get the statistics
+ from an external data source, transform it if neccesary, and store
+ it in the pg_class and pg_statistics system catalogs.
+ The function must be set to NULL if it isn't implemented.
+ </para>
+
+ <para>
The <structname>FdwRoutine</> and <structname>FdwPlan</> struct types
are declared in <filename>src/include/foreign/fdwapi.h</>, which see
for additional details.
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 03cc6c9..82795d1 100644
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***************
*** 279,284 ****
--- 279,288 ----
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
+ Note that the autovacuum daemon does not issue <command>ANALYZE</>
+ commands on foreign tables. It is recommended to run manually-managed
+ <command>ANALYZE</> commands as needed, which typically are executed
+ according to a schedule by cron or Task Scheduler scripts.
</para>
<para>
diff --git a/doc/src/sgml/ref/alter_foreign_table.sgml b/doc/src/sgml/ref/alter_foreign_table.sgml
index 5c7a86f..9fbb108 100644
*** a/doc/src/sgml/ref/alter_foreign_table.sgml
--- b/doc/src/sgml/ref/alter_foreign_table.sgml
*************** ALTER FOREIGN TABLE <replaceable class="
*** 36,41 ****
--- 36,44 ----
DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable>
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
*************** ALTER FOREIGN TABLE <replaceable class="
*** 94,99 ****
--- 97,146 ----
</varlistentry>
<varlistentry>
+ <term><literal>SET STATISTICS</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets the per-column statistics-gathering target for subsequent
+ <xref linkend="sql-analyze"> operations.
+ The target can be set in the range 0 to 10000; alternatively, set it
+ to -1 to revert to using the system default statistics
+ target (<xref linkend="guc-default-statistics-target">).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )</literal></term>
+ <term><literal>RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets or resets a per-attribute option. Currently, the only defined
+ per-attribute option is <literal>n_distinct</>, which overrides
+ the number-of-distinct-values estimates made by subsequent
+ <xref linkend="sql-analyze"> operations.
+ When set to a positive value, <command>ANALYZE</> will assume that
+ the column contains exactly the specified number of distinct nonnull
+ values.
+ When set to a negative value, which must be greater than or equal
+ to -1, <command>ANALYZE</> will assume that the number of distinct
+ nonnull values in the column is linear in the size of the foreign
+ table; the exact count is to be computed by multiplying the estimated
+ foreign table size by the absolute value of the given number.
+ For example,
+ a value of -1 implies that all values in the column are distinct,
+ while a value of -0.5 implies that each value appears twice on the
+ average.
+ This can be useful when the size of the foreign table changes over
+ time, since the multiplication by the number of rows in the foreign
+ table is not performed until query planning time. Specify a value
+ of 0 to revert to estimating the number of distinct values normally.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>OWNER</literal></term>
<listitem>
<para>
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index 7545fa5..fe13b2b 100644
*** a/doc/src/sgml/ref/analyze.sgml
--- b/doc/src/sgml/ref/analyze.sgml
*************** ANALYZE [ VERBOSE ] [ <replaceable class
*** 39,47 ****
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database. With a parameter, <command>ANALYZE</command> examines
! only that table. It is further possible to give a list of column names,
! in which case only the statistics for those columns are collected.
</para>
</refsect1>
--- 39,49 ----
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database except for foreign tables. With a parameter, <command>
! ANALYZE</command> examines only that table. For a foreign table, it is
! necessary to specify the name of that table. It is further possible to
! give a list of column names, in which case only the statistics for those
! columns are collected.
</para>
</refsect1>
*************** ANALYZE [ VERBOSE ] [ <replaceable class
*** 63,69 ****
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database.
</para>
</listitem>
</varlistentry>
--- 65,72 ----
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database except
! for foreign tables.
</para>
</listitem>
</varlistentry>
*************** ANALYZE [ VERBOSE ] [ <replaceable class
*** 137,143 ****
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below.
</para>
<para>
--- 140,148 ----
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below. Note that the time
! needed to analyze on foreign tables depends on the implementation of
! the foreign data wrapper via which such tables are attached.
</para>
<para>
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c3d3958..fa5fa7e 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 23,28 ****
--- 23,29 ----
#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
***************
*** 30,35 ****
--- 31,38 ----
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
*************** typedef struct AnlIndexData
*** 78,91 ****
int default_statistics_target = 100;
/* A few variables that don't seem worth passing around as parameters */
- static int elevel = -1;
-
static MemoryContext anl_context = NULL;
static BufferAccessStrategy vac_strategy;
- static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh);
static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
int samplesize);
static bool BlockSampler_HasMore(BlockSampler bs);
--- 81,91 ----
*************** static void compute_index_stats(Relation
*** 96,110 ****
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
! int targrows, double *totalrows, double *totaldeadrows);
! static double random_fract(void);
! static double init_selection_state(int n);
! static double get_next_S(double t, int n, double *stateptr);
! static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
--- 96,110 ----
MemoryContext col_context);
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
! static int compare_rows(const void *a, const void *b);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
*************** static bool std_typanalyze(VacAttrStats
*** 119,125 ****
--- 119,127 ----
void
analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
{
+ int elevel;
Relation onerel;
+ FdwRoutine *fdwroutine;
/* Set up static variables */
if (vacstmt->options & VACOPT_VERBOSE)
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 184,193 ****
}
/*
! * Check that it's a plain table; we used to do this in get_rel_oids() but
! * seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 186,197 ----
}
/*
! * Check that it's a plain table or foreign table; we used to do this
! * in get_rel_oids() but seems safer to check after we've locked the
! * relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION &&
! onerel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 211,217 ****
}
/*
! * We can ANALYZE any table except pg_statistic. See update_attstats
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
--- 215,223 ----
}
/*
! * We can ANALYZE any table except pg_statistic. See update_attstats. In
! * addition, we can ANALYZE foreign tables if AnalyzeForeignTable callback
! * routines of underlying foreign-data wrappers are implemented.
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 219,224 ****
--- 225,244 ----
return;
}
+ if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ fdwroutine = GetFdwRoutineByRelId(RelationGetRelid(onerel));
+
+ if (fdwroutine->AnalyzeForeignTable == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("skipping \"%s\" --- underlying foreign-data wrapper cannot analyze it",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, ShareUpdateExclusiveLock);
+ return;
+ }
+ }
+
/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 226,241 ****
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! /*
! * Do the normal non-recursive ANALYZE.
! */
! do_analyze_rel(onerel, vacstmt, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 246,283 ----
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! /*
! * Do the normal non-recursive ANALYZE.
! */
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, false, acquire_sample_rows);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\" inheritance tree",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, true,
! acquire_inherited_sample_rows);
! }
! }
/*
* Close source relation now, but keep lock so that no one deletes it
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 257,264 ****
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! static void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
{
int attr_cnt,
tcnt,
--- 299,307 ----
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
! bool inh, SampleRowAcquireFunc acquirefunc)
{
int attr_cnt,
tcnt,
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 273,278 ****
--- 316,322 ----
numrows;
double totalrows,
totaldeadrows;
+ BlockNumber totalpages;
HeapTuple *rows;
PGRUsage ru0;
TimestampTz starttime = 0;
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 281,297 ****
int save_sec_context;
int save_nestlevel;
- if (inh)
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\" inheritance tree",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
- else
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
-
/*
* Set up a working context so that we can easily free whatever junk gets
* created.
--- 325,330 ----
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 449,459 ****
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
else
! numrows = acquire_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
/*
* Compute the statistics. Temporary results during the calculations for
--- 482,494 ----
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! NULL, elevel);
else
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! &totalpages, elevel);
/*
* Compute the statistics. Temporary results during the calculations for
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 534,540 ****
*/
if (!inh)
vac_update_relstats(onerel,
! RelationGetNumberOfBlocks(onerel),
totalrows,
visibilitymap_count(onerel),
hasindex,
--- 569,575 ----
*/
if (!inh)
vac_update_relstats(onerel,
! totalpages,
totalrows,
visibilitymap_count(onerel),
hasindex,
*************** BlockSampler_Next(BlockSampler bs)
*** 1017,1023 ****
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
--- 1052,1059 ----
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
*************** acquire_sample_rows(Relation onerel, Hea
*** 1032,1037 ****
--- 1068,1075 ----
Assert(targrows > 0);
totalblocks = RelationGetNumberOfBlocks(onerel);
+ if (totalpages)
+ *totalpages = totalblocks;
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
*************** acquire_sample_rows(Relation onerel, Hea
*** 1254,1260 ****
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1292,1298 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
*************** random_fract(void)
*** 1274,1287 ****
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1312,1325 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
*************** compare_rows(const void *a, const void *
*** 1397,1403 ****
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
List *tableOIDs;
Relation *rels;
--- 1435,1442 ----
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
List *tableOIDs;
Relation *rels;
*************** acquire_inherited_sample_rows(Relation o
*** 1460,1465 ****
--- 1499,1506 ----
totalblocks += relblocks[nrels];
nrels++;
}
+ if (totalpages)
+ *totalpages = totalblocks;
/*
* Now sample rows from each relation, proportionally to its fraction of
*************** acquire_inherited_sample_rows(Relation o
*** 1493,1499 ****
rows + numrows,
childtargrows,
&trows,
! &tdrows);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
--- 1534,1542 ----
rows + numrows,
childtargrows,
&trows,
! &tdrows,
! NULL,
! elevel);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1ee201c..9e7c063 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*************** static void ATPrepSetStatistics(Relation
*** 317,322 ****
--- 317,324 ----
Node *newValue, LOCKMODE lockmode);
static void ATExecSetStatistics(Relation rel, const char *colName,
Node *newValue, LOCKMODE lockmode);
+ static void ATPrepSetOptions(Relation rel, const char *colName,
+ Node *options, LOCKMODE lockmode);
static void ATExecSetOptions(Relation rel, const char *colName,
Node *options, bool isReset, LOCKMODE lockmode);
static void ATExecSetStorage(Relation rel, const char *colName,
*************** ATPrepCmd(List **wqueue, Relation rel, A
*** 2915,2921 ****
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
--- 2917,2924 ----
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX | ATT_FOREIGN_TABLE);
! ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
*************** ATPrepSetStatistics(Relation rel, const
*** 4851,4860 ****
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table or index",
RelationGetRelationName(rel))));
/* Permissions checks */
--- 4854,4864 ----
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX &&
! rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table, index or foreign table",
RelationGetRelationName(rel))));
/* Permissions checks */
*************** ATExecSetStatistics(Relation rel, const
*** 4923,4928 ****
--- 4927,4952 ----
}
static void
+ ATPrepSetOptions(Relation rel, const char *colName, Node *options,
+ LOCKMODE lockmode)
+ {
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ ListCell *cell;
+
+ foreach(cell, (List *) options)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (pg_strcasecmp(def->defname, "n_distinct_inherited") == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot set \"n_distinct_inherited\" for foreign tables")));
+ }
+ }
+ }
+
+ static void
ATExecSetOptions(Relation rel, const char *colName, Node *options,
bool isReset, LOCKMODE lockmode)
{
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index dcafdd2..802abf2 100644
*** a/src/bin/psql/describe.c
--- b/src/bin/psql/describe.c
*************** describeOneTableDetails(const char *sche
*** 1099,1105 ****
bool printTableInitialized = false;
int i;
char *view_def = NULL;
! char *headers[6];
char **seq_values = NULL;
char **modifiers = NULL;
char **ptr;
--- 1099,1105 ----
bool printTableInitialized = false;
int i;
char *view_def = NULL;
! char *headers[7];
char **seq_values = NULL;
char **modifiers = NULL;
char **ptr;
*************** describeOneTableDetails(const char *sche
*** 1390,1396 ****
if (verbose)
{
headers[cols++] = gettext_noop("Storage");
! if (tableinfo.relkind == 'r')
headers[cols++] = gettext_noop("Stats target");
/* Column comments, if the relkind supports this feature. */
if (tableinfo.relkind == 'r' || tableinfo.relkind == 'v' ||
--- 1390,1396 ----
if (verbose)
{
headers[cols++] = gettext_noop("Storage");
! if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
headers[cols++] = gettext_noop("Stats target");
/* Column comments, if the relkind supports this feature. */
if (tableinfo.relkind == 'r' || tableinfo.relkind == 'v' ||
*************** describeOneTableDetails(const char *sche
*** 1493,1499 ****
false, false);
/* Statistics target, if the relkind supports this feature */
! if (tableinfo.relkind == 'r')
{
printTableAddCell(&cont, PQgetvalue(res, i, firstvcol + 1),
false, false);
--- 1493,1499 ----
false, false);
/* Statistics target, if the relkind supports this feature */
! if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
{
printTableAddCell(&cont, PQgetvalue(res, i, firstvcol + 1),
false, false);
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index bb0fa09..1507a61 100644
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*************** static const SchemaQuery Query_for_list_
*** 399,404 ****
--- 399,419 ----
NULL
};
+ static const SchemaQuery Query_for_list_of_tf = {
+ /* catname */
+ "pg_catalog.pg_class c",
+ /* selcondition */
+ "c.relkind IN ('r', 'f')",
+ /* viscondition */
+ "pg_catalog.pg_table_is_visible(c.oid)",
+ /* namespace */
+ "c.relnamespace",
+ /* result */
+ "pg_catalog.quote_ident(c.relname)",
+ /* qualresult */
+ NULL
+ };
+
static const SchemaQuery Query_for_list_of_views = {
/* catname */
"pg_catalog.pg_class c",
*************** psql_completion(char *text, int start, i
*** 2769,2775 ****
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
--- 2784,2790 ----
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tf, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d8fd0ca..0c24e2d 100644
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
*************** extern void lazy_vacuum_rel(Relation one
*** 165,171 ****
--- 165,182 ----
BufferAccessStrategy bstrategy);
/* in commands/analyze.c */
+ typedef int (*SampleRowAcquireFunc) (Relation onerel, HeapTuple *rows,
+ int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel);
+
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
BufferAccessStrategy bstrategy);
+ extern void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
+ bool inh, SampleRowAcquireFunc acquirefunc);
+ extern double random_fract(void);
+ extern double init_selection_state(int n);
+ extern double get_next_S(double t, int n, double *stateptr);
+
#endif /* VACUUM_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 3378ba9..3543f15 100644
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
*************** typedef void (*ReScanForeignScan_functio
*** 68,73 ****
--- 70,78 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
*************** typedef struct FdwRoutine
*** 82,93 ****
--- 87,109 ----
{
NodeTag type;
+ /*
+ * These handlers are required to execute simple scans on a foreign
+ * table. If any of them was set to NULL, scans on a foreign table
+ * managed by FDW would fail.
+ */
PlanForeignScan_function PlanForeignScan;
ExplainForeignScan_function ExplainForeignScan;
BeginForeignScan_function BeginForeignScan;
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+
+ /*
+ * Handlers below are optional. You can set any of them to NULL to
+ * tell PostgreSQL that FDW doesn't have the capability.
+ */
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 122e285..48a07f3 100644
*** a/src/test/regress/expected/foreign_data.out
--- b/src/test/regress/expected/foreign_data.out
*************** CREATE FOREIGN TABLE ft1 (
*** 678,689 ****
COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Description
! --------+---------+-----------+--------------------------------+----------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | ft1.c1
! c2 | text | | (param2 'val2', param3 'val3') | extended |
! c3 | date | | | plain |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
--- 678,689 ----
COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
! --------+---------+-----------+--------------------------------+----------+--------------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | | ft1.c1
! c2 | text | | (param2 'val2', param3 'val3') | extended | |
! c3 | date | | | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
*************** ERROR: cannot alter system column "xmin
*** 729,747 ****
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Description
! --------+---------+-----------+--------------------------------+----------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain |
! c2 | text | | (param2 'val2', param3 'val3') | extended |
! c3 | date | | | plain |
! c4 | integer | | | plain |
! c6 | integer | not null | | plain |
! c7 | integer | | (p1 'v1', p2 'v2') | plain |
! c8 | text | | (p2 'V2') | extended |
! c9 | integer | | | plain |
! c10 | integer | | (p1 'v1') | plain |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
--- 729,752 ----
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ ERROR: cannot set "n_distinct_inherited" for foreign tables
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
! --------+---------+-----------+--------------------------------+----------+--------------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | 10000 |
! c2 | text | | (param2 'val2', param3 'val3') | extended | |
! c3 | date | | | plain | |
! c4 | integer | | | plain | |
! c6 | integer | not null | | plain | |
! c7 | integer | | (p1 'v1', p2 'v2') | plain | |
! c8 | text | | (p2 'V2') | extended | |
! c9 | integer | | | plain | |
! c10 | integer | | (p1 'v1') | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql
index e99e707..5908ff3 100644
*** a/src/test/regress/sql/foreign_data.sql
--- b/src/test/regress/sql/foreign_data.sql
*************** ALTER FOREIGN TABLE ft1 ALTER COLUMN xmi
*** 306,311 ****
--- 306,315 ----
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
-- can't change the column type if it's used elsewhere
CREATE TABLE use_ft1_column_type (x ft1);
(2011/12/14 15:34), Shigeru Hanada wrote:
(2011/12/13 22:00), Etsuro Fujita wrote:
Thank you for your effectiveness experiments and proposals for
improvements. I updated the patch according to your proposals.
Attached is the updated version of the patch.I think this patch could be marked as "Ready for committer" with some
minor fixes. Please find attached a revised patch (v6.1).
Many thanks.
Best regards,
Etsuro Fujita
(2011/12/15 11:30), Etsuro Fujita wrote:
(2011/12/14 15:34), Shigeru Hanada wrote:
I think this patch could be marked as "Ready for committer" with some
minor fixes. Please find attached a revised patch (v6.1).
I've tried to make pgsql_fdw work with this feature, and found that few
static functions to be needed to exported to implement ANALYZE handler
in short-cut style. The "Short-cut style" means the way to generate
statistics (pg_class and pg_statistic) for foreign tables without
retrieving sample data from foreign server.
Attached patch (export_funcs.patch) exports examine_attribute and
update_attstats which are necessary to implement ANALYZE handler for
pgsql_fdw. In addition to exporting, update_attstats is also renamed to
vac_update_attstats to fit with already exported function
vac_update_relstats.
I also attached archive of WIP pgsql_fdw with ANALYZE support. This
version has better estimation than original pgsql_fdw, because it can
use selectivity of qualifiers evaluated on local side to estimate number
of result rows. To show the effect of ANALYZE clearly, WHERE push-down
feature is disabled. Please see pgsqlAnalyzeForeignTable and
store_remote_stats in pgsql_fdw.c.
I used pgbench_accounts tables with 30000 records, and got reasonable
rows estimation for queries below.
<on remote side>
postgres=# UPDATE pgbench_accounts SET filler = NULL
postgres-# WHERE aid % 3 = 0;
postgres=# ANALYZE;
<on local side>
postgres=# ANALYZE pgbench_accounts; -- needs explicit table name
postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE filler IS NULL;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Foreign Scan on pgbench_accounts (cost=100.00..40610.00 rows=100030
width=97)
Filter: (filler IS NULL)
Remote SQL: DECLARE pgsql_fdw_cursor_13 SCROLL CURSOR FOR SELECT aid,
bid, abalance, filler FROM public.pgbench_accounts
(3 rows)
postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Foreign Scan on pgbench_accounts (cost=100.00..40610.00 rows=96 width=97)
Filter: (aid < 100)
Remote SQL: DECLARE pgsql_fdw_cursor_14 SCROLL CURSOR FOR SELECT aid,
bid, abalance, filler FROM public.pgbench_accounts
(3 rows)
postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 1000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Foreign Scan on pgbench_accounts (cost=100.00..40610.00 rows=1004
width=97)
Filter: (aid < 1000)
Remote SQL: DECLARE pgsql_fdw_cursor_15 SCROLL CURSOR FOR SELECT aid,
bid, abalance, filler FROM public.pgbench_accounts
(3 rows)
In implementing ANALYZE handler, hardest part was copying anyarray
values from remote to local. If we can make it common in core, it would
help FDW authors who want to implement ANALYZE handler without
retrieving sample rows from remote server.
Regards,
--
Shigeru Hanada
Attachments:
export_funcs.patchtext/plain; name=export_funcs.patchDownload
commit bb28cb5a69aae3bd9c7fbebc8b9483d23711bec4
Author: Shigeru Hanada <shigeru.hanada@gmail.com>
Date: Thu Feb 9 16:06:14 2012 +0900
Export functions which are useful for FDW analyze support.
Export examine_attribute and update_attstats (with renaming to
vac_update_attstats) which are useful (and nealy required) to implement
short-cut version of ANALYZE handler in FDWs.
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 6a22d49..d0a323a 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -94,8 +94,6 @@ static void compute_index_stats(Relation onerel, double totalrows,
AnlIndexData *indexdata, int nindexes,
HeapTuple *rows, int numrows,
MemoryContext col_context);
-static VacAttrStats *examine_attribute(Relation onerel, int attnum,
- Node *index_expr);
static int acquire_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
double *totalrows, double *totaldeadrows,
@@ -105,8 +103,6 @@ static int acquire_inherited_sample_rows(Relation onerel,
double *totalrows, double *totaldeadrows,
BlockNumber *totalpages, int elevel);
static int compare_rows(const void *a, const void *b);
-static void update_attstats(Oid relid, bool inh,
- int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
@@ -215,9 +211,9 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
}
/*
- * We can ANALYZE any table except pg_statistic. See update_attstats. In
- * addition, we can ANALYZE foreign tables if AnalyzeForeignTable callback
- * routines of underlying foreign-data wrappers are implemented.
+ * We can ANALYZE any table except pg_statistic. See vac_update_attstats.
+ * In addition, we can ANALYZE foreign tables if AnalyzeForeignTable
+ * callback routines of underlying foreign-data wrappers are implemented.
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
@@ -283,7 +279,7 @@ analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
* Close source relation now, but keep lock so that no one deletes it
* before we commit. (If someone did, they'd fail to clean up the entries
* we made in pg_statistic. Also, releasing the lock before commit would
- * expose us to concurrent-update failures in update_attstats.)
+ * expose us to concurrent-update failures in vac_update_attstats.)
*/
relation_close(onerel, NoLock);
@@ -551,15 +547,15 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
* previous statistics for the target columns. (If there are stats in
* pg_statistic for columns we didn't process, we leave them alone.)
*/
- update_attstats(RelationGetRelid(onerel), inh,
- attr_cnt, vacattrstats);
+ vac_update_attstats(RelationGetRelid(onerel), inh,
+ attr_cnt, vacattrstats);
for (ind = 0; ind < nindexes; ind++)
{
AnlIndexData *thisdata = &indexdata[ind];
- update_attstats(RelationGetRelid(Irel[ind]), false,
- thisdata->attr_cnt, thisdata->vacattrstats);
+ vac_update_attstats(RelationGetRelid(Irel[ind]), false,
+ thisdata->attr_cnt, thisdata->vacattrstats);
}
}
@@ -842,7 +838,7 @@ compute_index_stats(Relation onerel, double totalrows,
* If index_expr isn't NULL, then we're trying to analyze an expression index,
* and index_expr is the expression tree representing the column's data.
*/
-static VacAttrStats *
+VacAttrStats *
examine_attribute(Relation onerel, int attnum, Node *index_expr)
{
Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
@@ -1583,7 +1579,7 @@ acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
/*
- * update_attstats() -- update attribute statistics for one relation
+ * vac_update_attstats() -- update attribute statistics for one relation
*
* Statistics are stored in several places: the pg_class row for the
* relation has stats about the whole relation, and there is a
@@ -1604,8 +1600,8 @@ acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
* ANALYZE the same table concurrently. Presently, we lock that out
* by taking a self-exclusive lock on the relation in analyze_rel().
*/
-static void
-update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
+void
+vac_update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
{
Relation sd;
int attno;
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 1530970..b165953 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -152,6 +152,10 @@ extern void vac_update_relstats(Relation relation,
BlockNumber num_all_visible_pages,
bool hasindex,
TransactionId frozenxid);
+extern void vac_update_attstats(Oid relid,
+ bool inh,
+ int natts,
+ VacAttrStats **vacattrstats);
extern void vacuum_set_xid_limits(int freeze_min_age, int freeze_table_age,
bool sharedRel,
TransactionId *oldestXmin,
@@ -177,6 +181,8 @@ extern void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
extern double random_fract(void);
extern double init_selection_state(int n);
extern double get_next_S(double t, int n, double *stateptr);
+extern VacAttrStats *examine_attribute(Relation onerel, int attnum,
+ Node *index_expr);
#endif /* VACUUM_H */
Hi Hanada-san,
Sorry for the late response.
(2012/02/10 22:05), Shigeru Hanada wrote:
(2011/12/15 11:30), Etsuro Fujita wrote:
(2011/12/14 15:34), Shigeru Hanada wrote:
I think this patch could be marked as "Ready for committer" with some
minor fixes. Please find attached a revised patch (v6.1).I've tried to make pgsql_fdw work with this feature, and found that few
static functions to be needed to exported to implement ANALYZE handler
in short-cut style. The "Short-cut style" means the way to generate
statistics (pg_class and pg_statistic) for foreign tables without
retrieving sample data from foreign server.
That's great! Here is my review.
The patch applies with some modifications and compiles cleanly. But
regression tests on subqueries failed in addition to role related tests
as discussed earlier.
While I've not looked at the patch in detail, I have some comments:
1. The patch might need codes to handle the irregular case where
ANALYZE-related catalog data such as attstattarget are different between
the local and the remote. (Although we don't have the options to set
such a data on a foreign table in ALTER FOREIGN TABLE.) For example,
while attstattarget = -1 for some column on the local, attstattarget = 0
for that column on the remote meaning that there can be no stats
available for that column. In such a case it would be better to inform
the user of it.
2. It might be better for the FDW to estimate the costs of a remote
query for itself without doing EXPLAIN if stats were available using
this feature. While this approach is less accurate compared to the
EXPLAIN approach due to the lack of information such as seq_page_cost or
randam_page_cost on the remote, it is cheaper! I think such a
information may be added to generic options for a foreign table, which
may have been previously discussed.
3.
In implementing ANALYZE handler, hardest part was copying anyarray
values from remote to local. If we can make it common in core, it would
help FDW authors who want to implement ANALYZE handler without
retrieving sample rows from remote server.
+1 from me.
Best regards,
Etsuro Fujita
(2011/12/13 22:00), Etsuro Fujita wrote:
Thank you for your effectiveness experiments and proposals for
improvements. I updated the patch according to your proposals.
Attached is the updated version of the patch.
Hi all,
I've revised the v6.1 patch and created v7 patch, though dead line of
this CF is coming closer. I think that this feature provides a way to
improve plans for foreign tables significantly, so I hope that this
feature is available in 9.2.
I'd like to show overview of the patch again for ease of review.
New FDW API function
====================
This patch adds new FDW API function AnalyzeForeignTable to FdwRoutine
which can be used to support updating local statistics of foreign table.
This function is invoked when ANALYZE command is executed against a
foreign table explicitly. This handler function is optional, so if
underlying FDW set NULL for the pointer, PostgreSQL doesn't touch
statistics but emits a message about skipping.
It's not required to FDWs to implement fully featured analyzer by
itself. They can use core routines, do_analyze_rel and others, for most
difficult part of analyzing. What FDWs should do is to provide a
sampling function and call do_analyze_rel with passing the sampling
function as argument in their concrete AnalyzeForeignTable.
AnalyzeForeignTable (or sampling function) can report FDW-specific
additional information by calling ereport() with given elevel.
Once we've considered an idea that FDW stores statistics information
without calling do_analyze_rel, but it seems very hard to implement, and
not so efficient. I tried to implement such handler in pgsql_fdw (which
seeems easiest to achieve) by getting remote statistics by "SELECT *
FROM pg_statistics", but it has several issues such as:
1) Highly privileged user on remote side should be mapped to ANALYZE invoker
2) Structure and semantics might be difficult on remote side, if
versions are not same.
3) We need to convert anyarray to anyarray through text representation.
We know type of elements, but bothering works are needed.
DDL changes
===========
ALTER FOREIGN TABLE supports SET STATISTICS clause and n_distinct
option. Former changes per-attribute statistics target, and latter
overrides calculated statistics. n_distinct_inherited is not available
because foreign tables can't be inherited.
psql support
============
psql completes foreign table names after the keyword "ANALYZE" in
addition to ordinary tables. Of course newly added statistics target is
shown in \d+ command.
file_fdw
========
This patch contains a use case of new handler function in
contrib/file_fdw. Since file_fdw reads data from a flat file,
fileAnalyzeForeignTable uses similar algorithm to ordinary tables; it
samples first N rows first, and replaces them randomly with subsequent
rows. Also file_fdw updates pg_class.relpages by calculating number of
pages from size of the data file.
To allow FDWs to implement sampling argorighm like this, several
functions are exported from analyze.c, e.g. random_fract,
init_selection_state, and get_next_S.
pgsql_fdw
=========
Though it's not fully finished, I've implemented ANALYZE handler for
pgsql_fdw. Please extract pgsql_fdw.tar.gz into contrib (or use pgxs)
to try it.
Regards,
--
Shigeru HANADA
Attachments:
postgresql-analyze-v7.patchtext/plain; charset=Shift_JIS; name=postgresql-analyze-v7.patchDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index e890770..f84a01f 100644
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 20,25 ****
--- 20,26 ----
#include "commands/copy.h"
#include "commands/defrem.h"
#include "commands/explain.h"
+ #include "commands/vacuum.h"
#include "foreign/fdwapi.h"
#include "foreign/foreign.h"
#include "miscadmin.h"
*************** static void fileBeginForeignScan(Foreign
*** 123,128 ****
--- 124,132 ----
static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
static void fileReScanForeignScan(ForeignScanState *node);
static void fileEndForeignScan(ForeignScanState *node);
+ static void fileAnalyzeForeignTable(Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* Helper functions
*************** static void estimate_size(PlannerInfo *r
*** 136,142 ****
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
FileFdwPlanState *fdw_private,
Cost *startup_cost, Cost *total_cost);
!
/*
* Foreign-data wrapper handler function: return a struct with pointers
--- 140,149 ----
static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
FileFdwPlanState *fdw_private,
Cost *startup_cost, Cost *total_cost);
! static int acquire_sample_rows(Relation onerel,
! HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
/*
* Foreign-data wrapper handler function: return a struct with pointers
*************** file_fdw_handler(PG_FUNCTION_ARGS)
*** 155,160 ****
--- 162,168 ----
fdwroutine->IterateForeignScan = fileIterateForeignScan;
fdwroutine->ReScanForeignScan = fileReScanForeignScan;
fdwroutine->EndForeignScan = fileEndForeignScan;
+ fdwroutine->AnalyzeForeignTable = fileAnalyzeForeignTable;
PG_RETURN_POINTER(fdwroutine);
}
*************** estimate_size(PlannerInfo *root, RelOptI
*** 662,693 ****
double nrows;
/*
! * Get size of the file. It might not be there at plan time, though, in
! * which case we have to use a default estimate.
*/
! if (stat(fdw_private->filename, &stat_buf) < 0)
! stat_buf.st_size = 10 * BLCKSZ;
! /*
! * Convert size to pages for use in I/O cost estimate later.
! */
! pages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
! if (pages < 1)
! pages = 1;
! fdw_private->pages = pages;
! /*
! * Estimate the number of tuples in the file. We back into this estimate
! * using the planner's idea of the relation width; which is bogus if not
! * all columns are being read, not to mention that the text representation
! * of a row probably isn't the same size as its internal representation.
! * FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) + MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size / (double) tuple_width);
fdw_private->ntuples = ntuples;
/*
--- 670,716 ----
double nrows;
/*
! * Use statistics stored in pg_class as is if any. Otherwise, calculate
! * them from file size and average tuple width.
*/
! if (baserel->pages > 0)
! {
! pages = baserel->pages;
! ntuples = baserel->tuples;
! }
! else
! {
! /*
! * Get size of the file. It might not be there at plan time, though,
! * in which case we have to use a default estimate.
! */
! if (stat(fdw_private->filename, &stat_buf) < 0)
! stat_buf.st_size = 10 * BLCKSZ;
! /*
! * Convert size to pages for use in I/O cost estimate later.
! */
! pages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
! if (pages < 1)
! pages = 1;
! /*
! * Estimate the number of tuples in the file. We back into this
! * estimate using the planner's idea of the relation width; which is
! * bogus if not all columns are being read, not to mention that the
! * text representation of a row probably isn't the same size as its
! * internal representation. FIXME later.
! */
! tuple_width = MAXALIGN(baserel->width) +
! MAXALIGN(sizeof(HeapTupleHeaderData));
! ntuples = clamp_row_est((double) stat_buf.st_size /
! (double) tuple_width);
! }
+ /* Pass estimates to subsequent functions via FileFdwPlanState. */
+ fdw_private->pages = pages;
fdw_private->ntuples = ntuples;
/*
*************** estimate_size(PlannerInfo *root, RelOptI
*** 709,714 ****
--- 732,747 ----
}
/*
+ * fileAnalyzeForeignTable
+ * Analyze foreign table
+ */
+ static void
+ fileAnalyzeForeignTable(Relation onerel, VacuumStmt *vacstmt, int elevel)
+ {
+ do_analyze_rel(onerel, vacstmt, elevel, false, acquire_sample_rows);
+ }
+
+ /*
* Estimate costs of scanning a foreign table.
*
* Results are returned in *startup_cost and *total_cost.
*************** estimate_costs(PlannerInfo *root, RelOpt
*** 736,738 ****
--- 769,957 ----
run_cost += cpu_per_tuple * ntuples;
*total_cost = *startup_cost + run_cost;
}
+
+ /*
+ * acquire_sample_rows -- acquire a random sample of rows from the table
+ *
+ * Selected rows are returned in the caller-allocated array rows[], which must
+ * have at least targrows entries. The actual number of rows selected is
+ * returned as the function result. We also count the number of valid rows in
+ * the table, and return it into *totalrows.
+ *
+ * The returned list of tuples is in order by physical position in the table.
+ * (We will rely on this later to derive correlation estimates.)
+ */
+ static int
+ acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
+ double *totalrows, double *totaldeadrows,
+ BlockNumber *totalpages, int elevel)
+ {
+ int numrows = 0;
+ int invalrows = 0; /* total # rows violating
+ the NOT NULL constraints */
+ double validrows = 0; /* total # rows collected */
+ double rowstoskip = -1; /* -1 means not set yet */
+ double rstate;
+ HeapTuple tuple;
+ TupleDesc tupDesc;
+ TupleConstr *constr;
+ int natts;
+ int attrChk;
+ Datum *values;
+ bool *nulls;
+ bool found;
+ bool sample_it = false;
+ char *filename;
+ struct stat stat_buf;
+ List *options;
+ CopyState cstate;
+ ErrorContextCallback errcontext;
+
+ Assert(onerel);
+ Assert(targrows > 0);
+
+ tupDesc = RelationGetDescr(onerel);
+ constr = tupDesc->constr;
+ natts = tupDesc->natts;
+ values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
+ nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
+
+ /* Fetch options of foreign table */
+ fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+
+ /*
+ * Get size of the file.
+ */
+ if (stat(filename, &stat_buf) < 0)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not stat file \"%s\": %m",
+ filename)));
+
+ /*
+ * Convert size to pages for use in I/O cost estimate.
+ */
+ *totalpages = (stat_buf.st_size + (BLCKSZ - 1)) / BLCKSZ;
+ if (*totalpages < 1)
+ *totalpages = 1;
+
+ /*
+ * Create CopyState from FDW options. We always acquire all columns, so
+ * as to match the expected ScanTupleSlot signature.
+ */
+ cstate = BeginCopyFrom(onerel, filename, NIL, options);
+
+ /* Prepare for sampling rows */
+ rstate = init_selection_state(targrows);
+
+ /* Set up callback to identify error line number. */
+ errcontext.callback = CopyFromErrorCallback;
+ errcontext.arg = (void *) cstate;
+ errcontext.previous = error_context_stack;
+ error_context_stack = &errcontext;
+
+ for (;;)
+ {
+ sample_it = true;
+
+ /*
+ * Check for user-requested abort.
+ */
+ CHECK_FOR_INTERRUPTS();
+
+ found = NextCopyFrom(cstate, NULL, values, nulls, NULL);
+
+ if (!found)
+ break;
+
+ tuple = heap_form_tuple(tupDesc, values, nulls);
+
+ if (constr && constr->has_not_null)
+ {
+ for (attrChk = 1; attrChk <= natts; attrChk++)
+ {
+ if (onerel->rd_att->attrs[attrChk - 1]->attnotnull &&
+ heap_attisnull(tuple, attrChk))
+ {
+ sample_it = false;
+ break;
+ }
+ }
+ }
+
+ if (!sample_it)
+ {
+ invalrows += 1;
+ heap_freetuple(tuple);
+ continue;
+ }
+
+ /*
+ * The first targrows sample rows are simply copied into the
+ * reservoir. Then we start replacing tuples in the sample
+ * until we reach the end of the relation. This algorithm is
+ * from Jeff Vitter's paper (see full citation below). It
+ * works by repeatedly computing the number of tuples to skip
+ * before selecting a tuple, which replaces a randomly chosen
+ * element of the reservoir (current set of tuples). At all
+ * times the reservoir is a true random sample of the tuples
+ * we've passed over so far, so when we fall off the end of
+ * the relation we're done.
+ */
+ if (numrows < targrows)
+ rows[numrows++] = heap_copytuple(tuple);
+ else
+ {
+ /*
+ * t in Vitter's paper is the number of records already
+ * processed. If we need to compute a new S value, we
+ * must use the not-yet-incremented value of samplerows as
+ * t.
+ */
+ if (rowstoskip < 0)
+ rowstoskip = get_next_S(validrows, targrows, &rstate);
+
+ if (rowstoskip <= 0)
+ {
+ /*
+ * Found a suitable tuple, so save it, replacing one
+ * old tuple at random
+ */
+ int k = (int) (targrows * random_fract());
+
+ Assert(k >= 0 && k < targrows);
+ heap_freetuple(rows[k]);
+ rows[k] = heap_copytuple(tuple);
+ }
+
+ rowstoskip -= 1;
+ }
+
+ validrows += 1;
+ heap_freetuple(tuple);
+ }
+
+ /* Remove error callback. */
+ error_context_stack = errcontext.previous;
+
+ *totalrows = validrows;
+ *totaldeadrows = 0;
+
+ EndCopyFrom(cstate);
+
+ pfree(values);
+ pfree(nulls);
+
+ /*
+ * Emit some interesting relation info
+ */
+ ereport(elevel,
+ (errmsg("\"%s\": scanned, "
+ "containing %d valid rows and %d invalid rows; "
+ "%d rows in sample, %d total rows",
+ RelationGetRelationName(onerel),
+ (int) validrows, invalrows,
+ numrows, (int) *totalrows)));
+
+ return numrows;
+ }
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 8e3d553..fddd9cd 100644
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
*************** EXECUTE st(100);
*** 111,116 ****
--- 111,121 ----
EXECUTE st(100);
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv' ORDER BY attname;
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 84f0750..5be5fe0 100644
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
*************** EXECUTE st(100);
*** 174,179 ****
--- 174,194 ----
(1 row)
DEALLOCATE st;
+ -- statistics collection tests
+ ANALYZE agg_csv;
+ SELECT relpages, reltuples FROM pg_class WHERE relname = 'agg_csv';
+ relpages | reltuples
+ ----------+-----------
+ 1 | 3
+ (1 row)
+
+ SELECT * FROM pg_stats WHERE tablename = 'agg_csv' ORDER BY attname;
+ schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
+ ------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-------------------------+-------------+-------------------+------------------------+----------------------
+ public | agg_csv | a | f | 0 | 2 | -1 | | | {0,42,100} | -0.5 | | |
+ public | agg_csv | b | f | 0 | 4 | -1 | | | {0.09561,99.097,324.78} | 0.5 | | |
+ (2 rows)
+
-- tableoid
SELECT tableoid::regclass, b FROM agg_csv;
tableoid | b
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index f7bf3d8..4f962b7 100644
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
*************** EndForeignScan (ForeignScanState *node);
*** 277,282 ****
--- 277,305 ----
</para>
<para>
+ <programlisting>
+ void
+ AnalyzeForeignTable (Relation onerel,
+ VacuumStmt *vacstmt,
+ int elevel);
+ </programlisting>
+
+ Collect statistics on a foreign table and store the results in the
+ pg_class and pg_statistics system catalogs.
+ This is optional, and if implemented, called when <command>ANALYZE</>
+ command is run. The statistics are used by the query planner in order to
+ make good choices of query plans.
+ </para>
+
+ <para>
+ The function can be implemented by writing a sampling function that
+ acquires a random sample of rows from an external data source and
+ then by calling <function>do_analyze_rel</>, where you should pass
+ the sampling function as an argument.
+ The function must be set to NULL if it isn't implemented.
+ </para>
+
+ <para>
The <structname>FdwRoutine</> struct type is declared in
<filename>src/include/foreign/fdwapi.h</>, which see for additional
details.
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 93c3ff5..54d0838 100644
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***************
*** 284,289 ****
--- 284,293 ----
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
+ Note that the autovacuum daemon does not issue <command>ANALYZE</>
+ commands on foreign tables. It is recommended to run manually-managed
+ <command>ANALYZE</> commands as needed, which typically are executed
+ according to a schedule by cron or Task Scheduler scripts.
</para>
<para>
diff --git a/doc/src/sgml/ref/alter_foreign_table.sgml b/doc/src/sgml/ref/alter_foreign_table.sgml
index c4cdaa8..af5c0a8 100644
*** a/doc/src/sgml/ref/alter_foreign_table.sgml
--- b/doc/src/sgml/ref/alter_foreign_table.sgml
*************** ALTER FOREIGN TABLE [ IF EXISTS ] <repla
*** 36,41 ****
--- 36,44 ----
DROP [ COLUMN ] [ IF EXISTS ] <replaceable class="PARAMETER">column</replaceable> [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> [ SET DATA ] TYPE <replaceable class="PARAMETER">type</replaceable>
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> { SET | DROP } NOT NULL
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET STATISTICS <replaceable class="PARAMETER">integer</replaceable>
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )
+ ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )
ALTER [ COLUMN ] <replaceable class="PARAMETER">column</replaceable> OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
OWNER TO <replaceable class="PARAMETER">new_owner</replaceable>
OPTIONS ( [ ADD | SET | DROP ] <replaceable class="PARAMETER">option</replaceable> ['<replaceable class="PARAMETER">value</replaceable>'] [, ... ])
*************** ALTER FOREIGN TABLE [ IF EXISTS ] <repla
*** 104,109 ****
--- 107,156 ----
</varlistentry>
<varlistentry>
+ <term><literal>SET STATISTICS</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets the per-column statistics-gathering target for subsequent
+ <xref linkend="sql-analyze"> operations.
+ The target can be set in the range 0 to 10000; alternatively, set it
+ to -1 to revert to using the system default statistics
+ target (<xref linkend="guc-default-statistics-target">).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>SET ( <replaceable class="PARAMETER">attribute_option</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] )</literal></term>
+ <term><literal>RESET ( <replaceable class="PARAMETER">attribute_option</replaceable> [, ... ] )</literal></term>
+ <listitem>
+ <para>
+ This form
+ sets or resets a per-attribute option. Currently, the only defined
+ per-attribute option is <literal>n_distinct</>, which overrides
+ the number-of-distinct-values estimates made by subsequent
+ <xref linkend="sql-analyze"> operations.
+ When set to a positive value, <command>ANALYZE</> will assume that
+ the column contains exactly the specified number of distinct nonnull
+ values.
+ When set to a negative value, which must be greater than or equal
+ to -1, <command>ANALYZE</> will assume that the number of distinct
+ nonnull values in the column is linear in the size of the foreign
+ table; the exact count is to be computed by multiplying the estimated
+ foreign table size by the absolute value of the given number.
+ For example,
+ a value of -1 implies that all values in the column are distinct,
+ while a value of -0.5 implies that each value appears twice on the
+ average.
+ This can be useful when the size of the foreign table changes over
+ time, since the multiplication by the number of rows in the foreign
+ table is not performed until query planning time. Specify a value
+ of 0 to revert to estimating the number of distinct values normally.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>OWNER</literal></term>
<listitem>
<para>
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index 8c9057b..97e9ad6 100644
*** a/doc/src/sgml/ref/analyze.sgml
--- b/doc/src/sgml/ref/analyze.sgml
*************** ANALYZE [ VERBOSE ] [ <replaceable class
*** 39,47 ****
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database. With a parameter, <command>ANALYZE</command> examines
! only that table. It is further possible to give a list of column names,
! in which case only the statistics for those columns are collected.
</para>
</refsect1>
--- 39,49 ----
<para>
With no parameter, <command>ANALYZE</command> examines every table in the
! current database except for foreign tables. With a parameter,
! <command>ANALYZE</command> examines only that table. For a foreign table,
! it is necessary to specify the name of that table. It is further possible
! to give a list of column names, in which case only the statistics for those
! columns are collected.
</para>
</refsect1>
*************** ANALYZE [ VERBOSE ] [ <replaceable class
*** 63,69 ****
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database.
</para>
</listitem>
</varlistentry>
--- 65,72 ----
<listitem>
<para>
The name (possibly schema-qualified) of a specific table to
! analyze. Defaults to all tables in the current database except
! for foreign tables.
</para>
</listitem>
</varlistentry>
*************** ANALYZE [ VERBOSE ] [ <replaceable class
*** 137,143 ****
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below.
</para>
<para>
--- 140,148 ----
In rare situations, this non-determinism will cause the planner's
choices of query plans to change after <command>ANALYZE</command> is run.
To avoid this, raise the amount of statistics collected by
! <command>ANALYZE</command>, as described below. Note that the time
! needed to analyze on foreign tables depends on the implementation of
! the foreign data wrapper via which such tables are attached.
</para>
<para>
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 9cd6e67..6ac9aee 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 23,28 ****
--- 23,29 ----
#include "access/xact.h"
#include "catalog/index.h"
#include "catalog/indexing.h"
+ #include "catalog/pg_class.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_namespace.h"
***************
*** 30,35 ****
--- 31,38 ----
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
#include "executor/executor.h"
+ #include "foreign/foreign.h"
+ #include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_oper.h"
*************** typedef struct AnlIndexData
*** 78,91 ****
int default_statistics_target = 100;
/* A few variables that don't seem worth passing around as parameters */
- static int elevel = -1;
-
static MemoryContext anl_context = NULL;
static BufferAccessStrategy vac_strategy;
- static void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh);
static void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
int samplesize);
static bool BlockSampler_HasMore(BlockSampler bs);
--- 81,91 ----
*************** static void compute_index_stats(Relation
*** 97,110 ****
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
! int targrows, double *totalrows, double *totaldeadrows);
! static double random_fract(void);
! static double init_selection_state(int n);
! static double get_next_S(double t, int n, double *stateptr);
static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
--- 97,109 ----
static VacAttrStats *examine_attribute(Relation onerel, int attnum,
Node *index_expr);
static int acquire_sample_rows(Relation onerel, HeapTuple *rows,
! int targrows, double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
static int compare_rows(const void *a, const void *b);
static int acquire_inherited_sample_rows(Relation onerel,
HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel);
static void update_attstats(Oid relid, bool inh,
int natts, VacAttrStats **vacattrstats);
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
*************** void
*** 118,123 ****
--- 117,124 ----
analyze_rel(Oid relid, VacuumStmt *vacstmt, BufferAccessStrategy bstrategy)
{
Relation onerel;
+ int elevel;
+ FdwRoutine *fdwroutine;
/* Set up static variables */
if (vacstmt->options & VACOPT_VERBOSE)
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 182,191 ****
}
/*
! * Check that it's a plain table; we used to do this in get_rel_oids() but
! * seems safer to check after we've locked the relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
--- 183,194 ----
}
/*
! * Check that it's a plain table or foreign table; we used to do this
! * in get_rel_oids() but seems safer to check after we've locked the
! * relation.
*/
! if (onerel->rd_rel->relkind != RELKIND_RELATION &&
! onerel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
{
/* No need for a WARNING if we already complained during VACUUM */
if (!(vacstmt->options & VACOPT_VACUUM))
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 209,215 ****
}
/*
! * We can ANALYZE any table except pg_statistic. See update_attstats
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
--- 212,220 ----
}
/*
! * We can ANALYZE any table except pg_statistic. See update_attstats.
! * In addition, we can ANALYZE foreign tables if AnalyzeForeignTable
! * callback routines of underlying foreign-data wrappers are implemented.
*/
if (RelationGetRelid(onerel) == StatisticRelationId)
{
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 217,222 ****
--- 222,241 ----
return;
}
+ if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ fdwroutine = GetFdwRoutineByRelId(RelationGetRelid(onerel));
+
+ if (fdwroutine->AnalyzeForeignTable == NULL)
+ {
+ ereport(WARNING,
+ (errmsg("skipping \"%s\" --- underlying foreign-data wrapper cannot analyze it",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, ShareUpdateExclusiveLock);
+ return;
+ }
+ }
+
/*
* OK, let's do it. First let other backends know I'm in ANALYZE.
*/
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 224,239 ****
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! /*
! * Do the normal non-recursive ANALYZE.
! */
! do_analyze_rel(onerel, vacstmt, false);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! do_analyze_rel(onerel, vacstmt, true);
/*
* Close source relation now, but keep lock so that no one deletes it
--- 243,280 ----
MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
LWLockRelease(ProcArrayLock);
! if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! fdwroutine->AnalyzeForeignTable(onerel, vacstmt, elevel);
! }
! else
! {
! /*
! * Do the normal non-recursive ANALYZE.
! */
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\"",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, false, acquire_sample_rows);
! /*
! * If there are child tables, do recursive ANALYZE.
! */
! if (onerel->rd_rel->relhassubclass)
! {
! ereport(elevel,
! (errmsg("analyzing \"%s.%s\" inheritance tree",
! get_namespace_name(RelationGetNamespace(onerel)),
! RelationGetRelationName(onerel))));
! do_analyze_rel(onerel, vacstmt, elevel, true,
! acquire_inherited_sample_rows);
! }
! }
/*
* Close source relation now, but keep lock so that no one deletes it
*************** analyze_rel(Oid relid, VacuumStmt *vacst
*** 255,262 ****
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! static void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
{
int attr_cnt,
tcnt,
--- 296,304 ----
/*
* do_analyze_rel() -- analyze one relation, recursively or not
*/
! void
! do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
! bool inh, SampleRowAcquireFunc acquirefunc)
{
int attr_cnt,
tcnt,
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 271,276 ****
--- 313,319 ----
numrows;
double totalrows,
totaldeadrows;
+ BlockNumber totalpages;
HeapTuple *rows;
PGRUsage ru0;
TimestampTz starttime = 0;
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 279,295 ****
int save_sec_context;
int save_nestlevel;
- if (inh)
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\" inheritance tree",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
- else
- ereport(elevel,
- (errmsg("analyzing \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- RelationGetRelationName(onerel))));
-
/*
* Set up a working context so that we can easily free whatever junk gets
* created.
--- 322,327 ----
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 447,457 ****
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquire_inherited_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
else
! numrows = acquire_sample_rows(onerel, rows, targrows,
! &totalrows, &totaldeadrows);
/*
* Compute the statistics. Temporary results during the calculations for
--- 479,491 ----
*/
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
if (inh)
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! NULL, elevel);
else
! numrows = acquirefunc(onerel, rows, targrows,
! &totalrows, &totaldeadrows,
! &totalpages, elevel);
/*
* Compute the statistics. Temporary results during the calculations for
*************** do_analyze_rel(Relation onerel, VacuumSt
*** 532,538 ****
*/
if (!inh)
vac_update_relstats(onerel,
! RelationGetNumberOfBlocks(onerel),
totalrows,
visibilitymap_count(onerel),
hasindex,
--- 566,572 ----
*/
if (!inh)
vac_update_relstats(onerel,
! totalpages,
totalrows,
visibilitymap_count(onerel),
hasindex,
*************** BlockSampler_Next(BlockSampler bs)
*** 1015,1021 ****
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
--- 1049,1056 ----
*/
static int
acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
int numrows = 0; /* # rows now in reservoir */
double samplerows = 0; /* total # rows collected */
*************** acquire_sample_rows(Relation onerel, Hea
*** 1030,1035 ****
--- 1065,1072 ----
Assert(targrows > 0);
totalblocks = RelationGetNumberOfBlocks(onerel);
+ if (totalpages)
+ *totalpages = totalblocks;
/* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
OldestXmin = GetOldestXmin(onerel->rd_rel->relisshared, true);
*************** acquire_sample_rows(Relation onerel, Hea
*** 1252,1258 ****
}
/* Select a random value R uniformly distributed in (0 - 1) */
! static double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
--- 1289,1295 ----
}
/* Select a random value R uniformly distributed in (0 - 1) */
! double
random_fract(void)
{
return ((double) random() + 1) / ((double) MAX_RANDOM_VALUE + 2);
*************** random_fract(void)
*** 1272,1285 ****
* determines the number of records to skip before the next record is
* processed.
*/
! static double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! static double
get_next_S(double t, int n, double *stateptr)
{
double S;
--- 1309,1322 ----
* determines the number of records to skip before the next record is
* processed.
*/
! double
init_selection_state(int n)
{
/* Initial value of W (for use when Algorithm Z is first applied) */
return exp(-log(random_fract()) / n);
}
! double
get_next_S(double t, int n, double *stateptr)
{
double S;
*************** compare_rows(const void *a, const void *
*** 1395,1401 ****
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows)
{
List *tableOIDs;
Relation *rels;
--- 1432,1439 ----
*/
static int
acquire_inherited_sample_rows(Relation onerel, HeapTuple *rows, int targrows,
! double *totalrows, double *totaldeadrows,
! BlockNumber *totalpages, int elevel)
{
List *tableOIDs;
Relation *rels;
*************** acquire_inherited_sample_rows(Relation o
*** 1458,1463 ****
--- 1496,1503 ----
totalblocks += relblocks[nrels];
nrels++;
}
+ if (totalpages)
+ *totalpages = totalblocks;
/*
* Now sample rows from each relation, proportionally to its fraction of
*************** acquire_inherited_sample_rows(Relation o
*** 1491,1497 ****
rows + numrows,
childtargrows,
&trows,
! &tdrows);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
--- 1531,1539 ----
rows + numrows,
childtargrows,
&trows,
! &tdrows,
! NULL,
! elevel);
/* We may need to convert from child's rowtype to parent's */
if (childrows > 0 &&
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9853686..3031496 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*************** static void ATPrepSetStatistics(Relation
*** 320,325 ****
--- 320,327 ----
Node *newValue, LOCKMODE lockmode);
static void ATExecSetStatistics(Relation rel, const char *colName,
Node *newValue, LOCKMODE lockmode);
+ static void ATPrepSetOptions(Relation rel, const char *colName,
+ Node *options, LOCKMODE lockmode);
static void ATExecSetOptions(Relation rel, const char *colName,
Node *options, bool isReset, LOCKMODE lockmode);
static void ATExecSetStorage(Relation rel, const char *colName,
*************** ATPrepCmd(List **wqueue, Relation rel, A
*** 3021,3027 ****
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
--- 3023,3030 ----
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) */
case AT_ResetOptions: /* ALTER COLUMN RESET ( options ) */
! ATSimplePermissions(rel, ATT_TABLE | ATT_INDEX | ATT_FOREIGN_TABLE);
! ATPrepSetOptions(rel, cmd->name, cmd->def, lockmode);
/* This command never recurses */
pass = AT_PASS_MISC;
break;
*************** ATPrepSetStatistics(Relation rel, const
*** 4999,5008 ****
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table or index",
RelationGetRelationName(rel))));
/* Permissions checks */
--- 5002,5012 ----
* allowSystemTableMods to be turned on.
*/
if (rel->rd_rel->relkind != RELKIND_RELATION &&
! rel->rd_rel->relkind != RELKIND_INDEX &&
! rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
! errmsg("\"%s\" is not a table, index or foreign table",
RelationGetRelationName(rel))));
/* Permissions checks */
*************** ATExecSetStatistics(Relation rel, const
*** 5071,5076 ****
--- 5075,5100 ----
}
static void
+ ATPrepSetOptions(Relation rel, const char *colName, Node *options,
+ LOCKMODE lockmode)
+ {
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ ListCell *cell;
+
+ foreach(cell, (List *) options)
+ {
+ DefElem *def = (DefElem *) lfirst(cell);
+
+ if (pg_strcasecmp(def->defname, "n_distinct_inherited") == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot set \"n_distinct_inherited\" for foreign tables")));
+ }
+ }
+ }
+
+ static void
ATExecSetOptions(Relation rel, const char *colName, Node *options,
bool isReset, LOCKMODE lockmode)
{
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index dc2248b..b3d2078 100644
*** a/src/bin/psql/describe.c
--- b/src/bin/psql/describe.c
*************** describeOneTableDetails(const char *sche
*** 1104,1110 ****
bool printTableInitialized = false;
int i;
char *view_def = NULL;
! char *headers[6];
char **seq_values = NULL;
char **modifiers = NULL;
char **ptr;
--- 1104,1110 ----
bool printTableInitialized = false;
int i;
char *view_def = NULL;
! char *headers[7];
char **seq_values = NULL;
char **modifiers = NULL;
char **ptr;
*************** describeOneTableDetails(const char *sche
*** 1395,1401 ****
if (verbose)
{
headers[cols++] = gettext_noop("Storage");
! if (tableinfo.relkind == 'r')
headers[cols++] = gettext_noop("Stats target");
/* Column comments, if the relkind supports this feature. */
if (tableinfo.relkind == 'r' || tableinfo.relkind == 'v' ||
--- 1395,1401 ----
if (verbose)
{
headers[cols++] = gettext_noop("Storage");
! if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
headers[cols++] = gettext_noop("Stats target");
/* Column comments, if the relkind supports this feature. */
if (tableinfo.relkind == 'r' || tableinfo.relkind == 'v' ||
*************** describeOneTableDetails(const char *sche
*** 1498,1504 ****
false, false);
/* Statistics target, if the relkind supports this feature */
! if (tableinfo.relkind == 'r')
{
printTableAddCell(&cont, PQgetvalue(res, i, firstvcol + 1),
false, false);
--- 1498,1504 ----
false, false);
/* Statistics target, if the relkind supports this feature */
! if (tableinfo.relkind == 'r' || tableinfo.relkind == 'f')
{
printTableAddCell(&cont, PQgetvalue(res, i, firstvcol + 1),
false, false);
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 975d655..d113adf 100644
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*************** static const SchemaQuery Query_for_list_
*** 409,414 ****
--- 409,429 ----
NULL
};
+ static const SchemaQuery Query_for_list_of_tf = {
+ /* catname */
+ "pg_catalog.pg_class c",
+ /* selcondition */
+ "c.relkind IN ('r', 'f')",
+ /* viscondition */
+ "pg_catalog.pg_table_is_visible(c.oid)",
+ /* namespace */
+ "c.relnamespace",
+ /* result */
+ "pg_catalog.quote_ident(c.relname)",
+ /* qualresult */
+ NULL
+ };
+
static const SchemaQuery Query_for_list_of_views = {
/* catname */
"pg_catalog.pg_class c",
*************** psql_completion(char *text, int start, i
*** 2833,2839 ****
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
--- 2848,2854 ----
/* ANALYZE */
/* If the previous word is ANALYZE, produce list of tables */
else if (pg_strcasecmp(prev_wd, "ANALYZE") == 0)
! COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tf, NULL);
/* WHERE */
/* Simple case of the word before the where being the table name */
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 3deee66..3e89e82 100644
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
*************** extern void lazy_vacuum_rel(Relation one
*** 170,174 ****
--- 170,183 ----
extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
BufferAccessStrategy bstrategy);
extern bool std_typanalyze(VacAttrStats *stats);
+ typedef int (*SampleRowAcquireFunc) (Relation onerel, HeapTuple *rows,
+ int targrows, double *totalrows,
+ double *totaldeadrows,
+ BlockNumber *totalpages, int elevel);
+ extern void do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, int elevel,
+ bool inh, SampleRowAcquireFunc acquirefunc);
+ extern double random_fract(void);
+ extern double init_selection_state(int n);
+ extern double get_next_S(double t, int n, double *stateptr);
#endif /* VACUUM_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 854f177..d7181c7 100644
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 12,19 ****
--- 12,21 ----
#ifndef FDWAPI_H
#define FDWAPI_H
+ #include "foreign/foreign.h"
#include "nodes/execnodes.h"
#include "nodes/relation.h"
+ #include "utils/rel.h"
/* To avoid including explain.h here, reference ExplainState thus: */
struct ExplainState;
*************** typedef void (*ReScanForeignScan_functio
*** 50,55 ****
--- 52,60 ----
typedef void (*EndForeignScan_function) (ForeignScanState *node);
+ typedef void (*AnalyzeForeignTable_function) (Relation relation,
+ VacuumStmt *vacstmt,
+ int elevel);
/*
* FdwRoutine is the struct returned by a foreign-data wrapper's handler
*************** typedef struct FdwRoutine
*** 64,69 ****
--- 69,78 ----
{
NodeTag type;
+ /*
+ * These handlers are required to execute a scan on a foreign table. If
+ * any of them was NULL, scans on a foreign table managed by such FDW fail.
+ */
GetForeignRelSize_function GetForeignRelSize;
GetForeignPaths_function GetForeignPaths;
GetForeignPlan_function GetForeignPlan;
*************** typedef struct FdwRoutine
*** 72,77 ****
--- 81,92 ----
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;
+
+ /*
+ * Handlers below are optional. You can set any of them to NULL to tell
+ * PostgreSQL backend that the FDW doesn't have the capability.
+ */
+ AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index ba86883..f1379a6 100644
*** a/src/test/regress/expected/foreign_data.out
--- b/src/test/regress/expected/foreign_data.out
*************** CREATE FOREIGN TABLE ft1 (
*** 679,690 ****
COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Description
! --------+---------+-----------+--------------------------------+----------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | ft1.c1
! c2 | text | | (param2 'val2', param3 'val3') | extended |
! c3 | date | | | plain |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
--- 679,690 ----
COMMENT ON FOREIGN TABLE ft1 IS 'ft1';
COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
! --------+---------+-----------+--------------------------------+----------+--------------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | | ft1.c1
! c2 | text | | (param2 'val2', param3 'val3') | extended | |
! c3 | date | | | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
*************** ERROR: cannot alter system column "xmin
*** 730,748 ****
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Description
! --------+---------+-----------+--------------------------------+----------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain |
! c2 | text | | (param2 'val2', param3 'val3') | extended |
! c3 | date | | | plain |
! c4 | integer | | | plain |
! c6 | integer | not null | | plain |
! c7 | integer | | (p1 'v1', p2 'v2') | plain |
! c8 | text | | (p2 'V2') | extended |
! c9 | integer | | | plain |
! c10 | integer | | (p1 'v1') | plain |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
--- 730,753 ----
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ ERROR: cannot set "n_distinct_inherited" for foreign tables
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
! Foreign table "public.ft1"
! Column | Type | Modifiers | FDW Options | Storage | Stats target | Description
! --------+---------+-----------+--------------------------------+----------+--------------+-------------
! c1 | integer | not null | ("param 1" 'val1') | plain | 10000 |
! c2 | text | | (param2 'val2', param3 'val3') | extended | |
! c3 | date | | | plain | |
! c4 | integer | | | plain | |
! c6 | integer | not null | | plain | |
! c7 | integer | | (p1 'v1', p2 'v2') | plain | |
! c8 | text | | (p2 'V2') | extended | |
! c9 | integer | | | plain | |
! c10 | integer | | (p1 'v1') | plain | |
Server: s0
FDW Options: (delimiter ',', quote '"', "be quoted" 'value')
Has OIDs: no
diff --git a/src/test/regress/sql/foreign_data.sql b/src/test/regress/sql/foreign_data.sql
index 0c95672..03b5680 100644
*** a/src/test/regress/sql/foreign_data.sql
--- b/src/test/regress/sql/foreign_data.sql
*************** ALTER FOREIGN TABLE ft1 ALTER COLUMN xmi
*** 307,312 ****
--- 307,316 ----
ALTER FOREIGN TABLE ft1 ALTER COLUMN c7 OPTIONS (ADD p1 'v1', ADD p2 'v2'),
ALTER COLUMN c8 OPTIONS (ADD p1 'v1', ADD p2 'v2');
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 OPTIONS (SET p2 'V2', DROP p1);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET STATISTICS 10000;
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct = 100);
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 SET (n_distinct_inherited = 100); -- ERROR
+ ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 SET STATISTICS -1;
\d+ ft1
-- can't change the column type if it's used elsewhere
CREATE TABLE use_ft1_column_type (x ft1);
pgsql_fdw.tar.gzapplication/gzip; name=pgsql_fdw.tar.gzDownload
� b�}O �\�s�����������_y���s�`ls�� n��v4BZ@��TI�qn����c%��I��s�-���v����������y��g����'�� >�^���/�~�����W�����<|rpxt����x��-)������x��|����������0���i~����~Y��.���/Y��/^��_�|u�D<�
��?��W"���w��c7�
;X.���7bgD���{��X�6��PF�H.�D�a'�H���D,��;�#g��KL�
�!��������'��J������;u�(��#�^`�lS{$�J5���~r���������������x~|����%�����G��������{��!v����~�T�hg���8�."����� �n1 �#w�HD���# :��{���������y�BE�{��O�g�vk���*������*�#���W�o{+G�j�K�����=�l[����e'��7 �y��j&��,6*H�W?K�+��B��@6�yoV��D^�����J
nl[��-�^%��[Qd�oj��\/q�xS�"�Vd/65-���lj����6�%���q<(15d�,�����w��G���mQs�4vy� A�`�im�^H�F�j ��u��XX�B�Km�f�R��1�W;!m����`���@ 0RL���|�����T���uN�{�:"� ��*�{�b��uFG� l��Fe�4�;���_�.��\��Id�����9\?�9"9�y|���m v�����9�����o����\Sp��wNR�_d<����B�1n�I���g��R��y�Z$��n�Z���0��$�p+I�2L�,\'%��"���}-}���-qmq1i��g�\Op��
�hx��'�(HT.�n�7��`�7_��P'�����B3gzm���l9���!h�!��0P[@P�@pmx���t�=�0�s�H�Z��~���������,��k�X?m�+�r a4*�.�~N��O���MVs���>A��5w�����s :���5��|4�T[���W((e?�$ j-��g�\�:��������JA���"��;�V0�[+�n��H*�0G�1 6�����n50z����B�V�����e��������]�tg�g��%�������aO0�g������G��@�g�4�� 5�&C�����l�l���@���,f�+b�M�=�^�>�YX �jQ�������/%xv���(bG&�d�����N��`��C���q�`-/��s��^���%U6�+��������.G����91*E�������
4Q<P]��A�n��w�����2"�s��^�nT`5�����=�U*v����"�C�%�S�_�������T�K����
�i����a������H�~h�c�wU+���x���v%>;Y����Bm��!>?^Z���!�����bn�<DN���h0�b�[.�C��y�����;,���At��.Hw���Y��Kd��tH�l�jC1WCOPdf��������?�|��_���]����]/��QgS�o;�������X���P�0���p���$6�b�N���W;�lf*�Fi��?���{(=z� >